Students use calculators to do math. Now, let them use ChatGPT

When ChatGPT was launched to the public a little under a year ago, users were amazed by the AI chatbot’s knowledge, conversation skills, and writing ability. After being prompted by a human user, the large language model generates replies by predicting what word should follow the previous one, taking into consideration the prompt, all the prior words, and what it learned from training on a vast body of text gathered from the internet.
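To make the "predict the next word" idea concrete, here is a deliberately tiny toy sketch. It is nothing like ChatGPT internally. It counts which word follows which in a small made-up corpus (a simple "bigram" model, swapped in here purely for illustration), and it looks only at the single previous word rather than the whole prompt. The names `CORPUS`, `train_bigrams`, and `generate` are hypothetical.

```python
from collections import Counter, defaultdict

# A tiny made-up training corpus (hypothetical; real models train on vastly more text).
CORPUS = (
    "students write essays . teachers grade essays . "
    "students use calculators . teachers use calculators ."
)


def train_bigrams(text: str) -> dict[str, Counter]:
    """Count how often each word is followed by each other word."""
    words = text.split()
    follows: dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows


def generate(follows: dict[str, Counter], prompt: str, max_words: int = 5) -> str:
    """Extend the prompt one word at a time, always picking the most frequent next word."""
    words = prompt.split()
    for _ in range(max_words):
        candidates = follows.get(words[-1])
        if not candidates:
            break  # no known continuation for this word
        # most_common breaks ties in favor of the word encountered first
        next_word, _count = candidates.most_common(1)[0]
        words.append(next_word)
        if next_word == ".":
            break  # stop at the end of a sentence
    return " ".join(words)


model = train_bigrams(CORPUS)
print(generate(model, "teachers"))  # prints: teachers grade essays .
```

A real large language model replaces these simple counts with a neural network that conditions on the entire prompt and every word generated so far, and it typically samples from its predictions rather than always taking the single most likely word.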

With this formula, ChatGPT can create apparently “thoughtful,” coherent, and well-written content. Many onlookers were soon predicting that high school and college students would use the AI to write their essays for them. At the time these prognostications were made, however, it wasn’t clear whether essays created by ChatGPT were actually superior to those penned by students.

Students vs. AI

We now have strong evidence that they are. In a new study published in the journal Scientific Reports, researchers in the faculty of computer science and mathematics at the University of Passau in Germany provided 111 high school teachers in Germany each with six essays to rate. The teachers were told to use a scale from zero to six to rate each essay for topic and completeness, logic and composition, expressiveness and comprehensiveness, language mastery, complexity, vocabulary and text linking, and language constructs.

Unbeknownst to the teachers, some of the essays were written by real students aged 16 to 18, while others were generated by ChatGPT-3.5 or ChatGPT-4, the latter being more advanced. The essays were all “argumentative,” requiring students (or AIs) to think critically about a topic, then establish a position and support it with evidence. In total, the teachers rated 270 essays across 90 topics.

When the researchers tallied the results, they found that ChatGPT clearly outperformed the students across all the criteria. Student essays received an average score of 3.69 out of 6, while ChatGPT-3.5 scored 4.36 and ChatGPT-4 scored 4.68. (The researchers examined only essays by high school students; college students might well have performed better.)

Interestingly, the researchers noticed that ChatGPT had some very robotic writing tics. For example, every one of the AI-written essays began the concluding paragraph with the phrase “in conclusion.” Every introduction opened with a general statement built around the main concepts of the essay topic.

“Although this corresponds to the general structure that is sought after for argumentative essays, it is striking to see that the ChatGPT models are so rigid,” the authors commented.

ChatGPT as a learning tool

Such tendencies could help educators spot AI-generated essays passed off as students’ own work and penalize those students accordingly. But the researchers think such a push to maintain the educational status quo by excluding AIs like ChatGPT would be a missed opportunity. As AI models improve and make certain human efforts increasingly obsolete, why not instead refocus education on pursuits better suited to this undeniably tech-driven era?

“Advanced chatbots could be used as powerful classroom aids that make lessons more interactive, teach students media literacy, generate personalized lesson plans, save teachers time on admin, and more,” Will Douglas Heaven wrote for MIT Technology Review earlier this year.

Heaven interviewed an educator whose approach is directly relevant to the new study. She used to require her students to write argumentative essays; now she asks them to have ChatGPT generate the essays instead. The students then edit the work and assess the AI’s arguments, considering how effective they would be with specific audiences, before turning in a rewrite.

Like a calculator, but for essays

The University of Passau researchers think that ChatGPT should be viewed not as a cheating tool but as “the new calculator,” a device that took much of the grunt work out of math. In their view, students should first be taught extensively to write in class, then be permitted to use ChatGPT once they have attained sufficient mastery. They can then correct, stylize, and hone the AI’s work.

“Our results provide a strong indication that the fear many teaching professionals have is warranted: the way students do homework and teachers assess it needs to change in a world of generative AI models,” the researchers wrote.

Fear of change is natural, but it usually fades with time. ChatGPT may finally be the impetus that brings about long-overdue changes in education, transforming it from a system that revolves around memorizing and regurgitating facts to one that teaches logic, reasoning, and critical thinking. Education often treats students like robots instead of humans. Perhaps we should leave the robotic work to AI.