
Stanford Researchers Say ChatGPT Got Dumber In Recent Months, Forgot How to Do Math


Is ChatGPT not working as well as it used to? That’s exactly what researchers from Stanford are claiming in a new paper, and no, they’re not talking about creative writing that could lead to a model collapse.

Instead, the researchers found that ChatGPT got dumber in easily quantifiable ways: at both programming and basic math.

According to their findings, the percentage of GPT-4 code generations that are directly executable dropped from 52.0% in March to 10.0% in June. The drop was also large for GPT-3.5 (from 22.0% to 2.0%).
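That “directly executable” metric is easy to reproduce at home. Here’s a minimal sketch of the idea, assuming (as the paper describes) that an answer only counts if the raw output runs in Python as-is; the sample_outputs list below is a hypothetical stand-in for real ChatGPT responses:

```python
# Minimal sketch: count how many raw model outputs run as-is in Python.
# `sample_outputs` is a hypothetical placeholder for real ChatGPT responses.

FENCE = "`" * 3  # a literal markdown code fence

sample_outputs = [
    "print(sum(range(10)))",                    # plain code: runs directly
    f"{FENCE}python\nprint('hello')\n{FENCE}",  # fenced in markdown: it does not
]

def is_directly_executable(code: str) -> bool:
    """Return True if the raw output runs without raising an exception."""
    try:
        exec(compile(code, "<generation>", "exec"), {})
        return True
    except Exception:
        return False

runnable = sum(is_directly_executable(out) for out in sample_outputs)
print(f"{runnable}/{len(sample_outputs)} outputs directly executable")
```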

Also read: How to Prove You Didn’t Use ChatGPT: One Simple Trick to Avoid ChatGPT Plagiarism Accusations

As for basic math, it would be a great idea to stop relying on ChatGPT for your papers right now, because the numbers look grim:

“GPT-4 (March 2023) was very good at identifying prime numbers (accuracy 97.6%) but GPT-4 (June 2023) was very poor on these same questions (accuracy 2.4%). Interestingly, GPT-3.5 (June 2023) was much better than GPT-3.5 (March 2023) in this task,” wrote the researchers.
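For context, that task is trivial to grade automatically, which is why the drop is so stark. A quick sketch of the grading, with 17077 as an example of the kind of number reportedly used in the paper and a hypothetical yes/no model reply:

```python
# Sketch of auto-grading the prime-number task against ground truth.
# `model_answer` is a hypothetical reply; the actual prompts asked the
# model to think step by step and then answer yes or no.

def is_prime(n: int) -> bool:
    """Ground truth via simple trial division."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

question = 17077      # example of the kind of number the paper asked about
model_answer = "no"   # hypothetical model reply to "Is 17077 prime?"
truth = is_prime(question)
graded_correct = (model_answer == "yes") == truth
print(f"ground truth: {truth}, model graded correct: {graded_correct}")
```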

Their findings caused quite a storm of comments on Reddit. Beyond jokes like “Proof that ultimately no intelligence survives exposure to talking to people on the internet” and “It’s like reading the second half of Flowers for Algernon,” some users argued that the researchers don’t know programming well enough to claim ChatGPT has gotten worse.

“But the paper is weird. I mainly use ChatGPT for code so I just went through that section. They are basing that quality drop on GPT generating markdown syntax text and the number of characters (the paper does not say what kind of characters it is adding; could be increased comments, could be random characters, or it could be doing more of the annoying story explanations it gives). Not sure how either one of those things directly relates to code quality though,” wrote Reddit user u/lost-mars.


Another user quipped, “Guess they don’t teach Python at Stanford, or realize you should ask for a specific language if you want to actually compile your code,” and yet another chimed in to say that the researchers misled people with claims that ChatGPT can’t code properly.

“Chatgpt actually performed just as well in the paper in making code. It just added triple quotes to the beginning and end, making it not work directly from copy and paste, but was otherwise fine,” wrote u/TitleToAI.
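In other words, the model started wrapping its answers in markdown code fences, which the evaluation never stripped before trying to run the code. A minimal sketch of that missing post-processing step, assuming outputs wrapped the way u/TitleToAI describes:

```python
import re

# Sketch: strip a surrounding markdown code fence so the snippet runs again.
# Assumes the model wrapped its code the way u/TitleToAI describes.

FENCE = "`" * 3  # a literal ``` fence, built here to keep this example tidy

def strip_code_fence(text: str) -> str:
    """Remove a ```lang ... ``` wrapper if present; otherwise return text unchanged."""
    pattern = rf"^{FENCE}\w*\n(.*?)\n?{FENCE}\s*$"
    match = re.match(pattern, text.strip(), re.DOTALL)
    return match.group(1) if match else text

fenced = f"{FENCE}python\nprint('hello')\n{FENCE}"
exec(strip_code_fence(fenced))  # prints: hello
```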

Regardless of these results, many researchers agree that generative AI does have an inherent weakness, one that could lead to a model collapse.

The more a generative AI is trained on AI-generated content (so-called synthetic training data, as opposed to original, human-made sources), the more errors it tends to produce, and eventually the large language model (LLM) could collapse entirely.
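You can get a feel for the mechanism with a toy simulation, nothing like real LLM training, just the statistical intuition: fit a simple Gaussian “model” to data, sample from it, refit on those samples, and repeat. The spread drifts toward zero, and the model forgets its own tails:

```python
import random
import statistics

# Toy illustration of model collapse, not real LLM training: a Gaussian
# "model" is repeatedly refit on its own synthetic samples. Sampling noise
# plus a biased variance estimate make the spread decay over generations.

random.seed(0)
mu, sigma = 0.0, 1.0  # model initially fit on "human" data
for generation in range(1, 101):
    synthetic = [random.gauss(mu, sigma) for _ in range(20)]  # model output
    mu = statistics.mean(synthetic)    # refit on synthetic data only
    sigma = statistics.stdev(synthetic)
    if generation % 20 == 0:
        print(f"generation {generation:3d}: mu={mu:+.3f}, sigma={sigma:.3f}")
# sigma tends toward zero: each pass loses diversity it can never recover
```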

Also read: Comedian Sarah Silverman Sues Meta and OpenAI For Training Their AI On Her Book
