AI users have to choose between accuracy and sustainability

Cheap or free access to AI models keeps improving, with Google the latest firm to make its newest models available to all users, not just paying ones. But that access comes at a cost: the environment.
In a new study, German researchers tested 14 large language models (LLMs) of various sizes from leading developers such as Meta, Alibaba, and others. Each model answered 1,000 difficult academic questions spanning topics from world history to advanced mathematics. The tests ran on a powerful, energy-intensive NVIDIA A100 GPU, using a specialized framework to precisely measure electricity consumption per answer. This data was then converted into carbon dioxide equivalent emissions, providing a clear comparison of each model’s environmental impact.
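The core conversion is simple: energy measured per answer, multiplied by the carbon intensity of the electricity grid. The article does not give the study's exact intensity figure, so the value below is a hypothetical placeholder used only to show the arithmetic:

```python
# Minimal sketch of converting per-answer energy use into CO2-equivalent
# emissions. The grid intensity is a hypothetical placeholder, NOT the
# figure used in the study.

GRID_INTENSITY_G_PER_KWH = 480.0  # assumed grid carbon intensity (gCO2e/kWh)

def co2e_grams(energy_wh: float,
               intensity_g_per_kwh: float = GRID_INTENSITY_G_PER_KWH) -> float:
    """Convert energy in watt-hours into grams of CO2 equivalent."""
    return energy_wh / 1000.0 * intensity_g_per_kwh

# An answer that consumed 2 Wh, under the assumed intensity:
print(round(co2e_grams(2.0), 2))  # 0.96 g
```

The same per-answer figures can then be summed across the 1,000-question benchmark to compare models, which is what the study's per-model totals reflect.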
The researchers found that many LLMs are far more powerful than needed for everyday queries. Smaller, less energy-hungry models can answer many factual questions just as well. The carbon and water footprints of a single prompt vary dramatically depending on model size and task type. Prompts requiring reasoning, which force models to “think aloud,” are especially polluting because they generate many more tokens.
One model, Cogito, topped the accuracy table—answering nearly 85% of questions correctly—but produced three times more emissions than similar-sized models, highlighting a trade-off rarely visible to AI developers or users. (Cogito did not respond to a request for comment.) “Do we really need a 400-billion parameter GPT model to answer when World War II was, for example,” says Maximilian Dauner, a researcher at Hochschule München University of Applied Sciences and one of the study’s authors.
The results underscored the balance between accuracy and emissions. The least-polluting model tested, Qwen 7B, answered just one in three questions correctly but emitted only 27.7 grams of carbon dioxide equivalent. In contrast, DeepSeek's R1 70B reasoning model answered nearly eight in 10 questions correctly, while producing more than 70 times the emissions for the same workload.
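A back-of-envelope way to see the trade-off is to divide each model's total emissions by its number of correct answers. The figures below are approximations taken from the article's own numbers ("one in three", "nearly eight in 10", "more than 70 times"), not exact study data:

```python
# Rough emissions-per-correct-answer comparison using the article's figures.
# All inputs are approximate readings of the reported results.

QUESTIONS = 1000

qwen_emissions_g = 27.7           # total gCO2e reported for Qwen 7B
qwen_correct = QUESTIONS // 3     # "one in three" answered correctly

r1_emissions_g = 27.7 * 70        # "more than 70 times" Qwen's emissions
r1_correct = int(QUESTIONS * 0.78)  # "nearly eight in 10" correct

print(f"Qwen 7B: {qwen_emissions_g / qwen_correct:.3f} gCO2e per correct answer")
print(f"R1 70B:  {r1_emissions_g / r1_correct:.3f} gCO2e per correct answer")
```

Even normalized for accuracy, the reasoning model costs roughly 30 times more carbon per correct answer under these assumptions, which is the trade-off the researchers describe.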
The type of question also affects environmental impact. Prompts on abstract algebra or philosophy produced up to six times more emissions than simpler subjects, such as high school history.
“Companies should be more transparent about the real emissions and water consumption from prompts,” says Dauner. But at the same time, users ought to be more aware—and more judicious—about their AI use.