Tech companies love to brag about how smart their AI is, but when it comes to how much energy, water, and carbon that intelligence burns, suddenly everyone goes quiet. Every time you ask a large language model (LLM) like ChatGPT to write an email, solve a math problem, or draft code, you trigger a chain of high-intensity computations. That seemingly harmless “thank you” to ChatGPT? Let’s look at what it really costs, and at the broader environmental impact of weaving AI into our daily tasks.

A new study, published on May 14, 2025, by researchers from the University of Rhode Island and the University of Tunis, offers the most comprehensive look yet at the real environmental toll of LLM inference: what happens not during training, but each time a user sends a prompt. The results are significant, and they underscore a growing paradox: as LLMs become faster, cheaper, and more widely available, their cumulative environmental footprint intensifies with every prompt we type.

Inference Is the Real Cost

The AI community has long focused on the carbon and energy implications of model training, and rightly so, given the megawatt-hours and tons of CO₂ needed to train GPT-3 or similar behemoths. But inference, the part that powers your actual conversations, has now become the dominant environmental factor.

The study estimates that inference may account for up to 90% of a model’s total lifecycle energy use. Unlike training, which occurs once per model iteration, inference happens billions of times per day.

How Much Does a Single Prompt Cost?

Let’s break it down, starting with the most widely used LLM, OpenAI’s ChatGPT: a single short GPT-4o query consumes around 0.42 watt-hours of energy, emits 0.15 grams of CO₂, and evaporates about 0.13 millilitres of freshwater. This may seem negligible, until you consider that OpenAI’s GPT-4o handles around 700 million queries per day.

Over a year, this adds up to:

  • 391,509 to 463,269 MWh of electricity, equal to the annual consumption of roughly 35,000 U.S. homes.
  • Over 1.5 million kilolitres of freshwater evaporated, roughly the yearly drinking needs of 1.2 million people.
  • Up to 163,000 tons of CO₂ emissions, which would take a forest the size of Chicago to offset.
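As a back-of-envelope check, the short-query figures above can be scaled to annual totals. This is a minimal sketch using only the short-query estimates, so it lands below the study’s full range, which also accounts for longer and reasoning-heavy queries:

```python
# Back-of-envelope annual scaling of the per-query figures cited above.
# Short-query estimates for GPT-4o (per the study):
ENERGY_WH = 0.42      # watt-hours per short query
CO2_G = 0.15          # grams of CO2 per short query
WATER_ML = 0.13       # millilitres of freshwater per short query

QUERIES_PER_DAY = 700_000_000
DAYS = 365

queries_per_year = QUERIES_PER_DAY * DAYS

energy_mwh = ENERGY_WH * queries_per_year / 1e6   # Wh  -> MWh
co2_tons = CO2_G * queries_per_year / 1e6         # g   -> metric tons
water_kl = WATER_ML * queries_per_year / 1e6      # mL  -> kilolitres

print(f"Energy: {energy_mwh:,.0f} MWh/year")
print(f"CO2:    {co2_tons:,.0f} t/year")
print(f"Water:  {water_kl:,.0f} kL/year")
```

Short queries alone already come to roughly 107,000 MWh per year; the study’s 391,509 to 463,269 MWh range is higher precisely because real traffic is not made up of short prompts only.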

These numbers aren’t speculative. They’re modelled using public API data, real GPU specifications (such as H100 and A100 power draws), and region-specific environmental multipliers: Power Usage Effectiveness (PUE), Water Usage Effectiveness (WUE), and Carbon Intensity Factors (CIF). For the first time, inference costs are grounded in real-world usage rather than lab simulations.

Does Size Matter?

One of the more surprising findings is that model size isn’t everything. GPT-4o mini, a smaller model variant, actually consumes more energy per query than GPT-4o. Why? Hardware. GPT-4o mini runs on older A100 GPUs, while GPT-4o is deployed on newer H100 or H200 systems. Efficiency gains from newer hardware can offset the size advantage of a smaller model.

Even more dramatic is the contrast between Claude-3.7 Sonnet and DeepSeek-R1. Both are capable reasoning models, but while Claude uses around 17 watt-hours for a long-form query, DeepSeek-R1 burns over 33 watt-hours, nearly double. Architecture plays a role, but the gap stems mostly from the inefficiencies of DeepSeek’s regional infrastructure.

OpenAI has acknowledged these infrastructure challenges and, in its Economic Blueprint, proposes significant investments in AI infrastructure, including data centres and power plants, to support sustainable AI growth and mitigate environmental impacts.

Who Are the Worst “Offenders”?

Among the 30 models evaluated, DeepSeek-R1, GPT-4.5, and o3 stand out for their inefficiency. DeepSeek-R1, in particular, consumes more than 150 mL of water per long-form query, a stunning number when scaled. These models are over 70 times more resource-intensive than their lightweight counterparts like GPT-4.1 nano, which handles long prompts using just 0.45 watt-hours and mere drops of water.

On the other end of the spectrum, Claude-3.7 Sonnet ranks as the most eco-efficient model when balancing performance and resource use, thanks to both optimized architecture and efficient AWS infrastructure.

Faster, Cheaper, Dirtier

Efficiency often seduces us into complacency. But the Jevons Paradox, named after the 19th-century economist William Stanley Jevons, warns us otherwise: as something becomes more efficient, total consumption often increases.

This is precisely what’s happening with AI. GPT-4o is cheaper and faster than its predecessors, which leads to more queries, more usage, and ultimately a higher environmental cost. OpenAI’s own reports suggest explosive growth in ChatGPT usage, reaching 800 million weekly users by April 2025. Efficiency improvements, instead of reducing net impact, have enabled exponential demand.

Jegham, A., Al-Ali, A., & Al-Qurishi, M. (2025). How Hungry is AI? arXiv preprint arXiv:2505.09598

Eco-Efficiency Is a Relative Metric

To make sense of this trade-off, the researchers employed Data Envelopment Analysis (DEA), a tool that evaluates how effectively each model converts energy, water, and carbon into actual performance across reasoning, math and coding benchmarks.

This allows us to stop thinking in absolute terms and instead ask: for every watt-hour or gram of CO₂, how much useful intelligence do we get? A model that emits less but performs poorly may be less eco-efficient than one that emits more but achieves tenfold utility. The DEA framework helps shift sustainability debates from raw consumption to meaningful return-on-footprint.
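This return-on-footprint idea can be illustrated with a toy calculation. Real DEA solves a linear program per model to weight multiple inputs and outputs; the sketch below uses a naive score-per-watt-hour ratio on hypothetical models and numbers, purely as an intuition pump:

```python
# Naive eco-efficiency: useful output per unit of resource consumed.
# Real DEA (Data Envelopment Analysis) solves a linear program per model;
# this sketch ranks models by benchmark-score-per-watt-hour instead.
# All model names and numbers below are illustrative, not from the study.

models = {
    #            (benchmark score, Wh per long query)
    "model_a": (90.0, 17.0),   # strong and efficient
    "model_b": (92.0, 33.0),   # slightly stronger, far costlier
    "model_c": (60.0, 0.45),   # weak but extremely cheap
}

def eco_efficiency(score: float, energy_wh: float) -> float:
    """Useful output per watt-hour of inference energy."""
    return score / energy_wh

ranking = sorted(models, key=lambda m: eco_efficiency(*models[m]), reverse=True)
for name in ranking:
    score, wh = models[name]
    print(f"{name}: {eco_efficiency(score, wh):.2f} points per Wh")
```

Note that this single-ratio metric lets the cheap-but-weak model dominate; DEA’s strength is combining several inputs (energy, water, carbon) and outputs (reasoning, math, coding) so that neither raw frugality nor raw capability wins by default.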

Infrastructure Is the Variable

Beyond model design, the true environmental lever lies in infrastructure. A smaller model running on outdated hardware can consume more energy than a larger model on optimized systems. PUE, WUE, and CIF, the infrastructure multipliers, can dramatically swing environmental impacts.
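The way these multipliers compound can be sketched directly. The values below are illustrative placeholders (the study applies region-specific figures per data centre), and WUE is expressed in mL/Wh, which is numerically equal to the conventional L/kWh:

```python
# How PUE, WUE, and CIF turn raw GPU energy into facility-level impact.
# All multiplier values here are illustrative, not from the study.

def facility_impact(gpu_wh: float, pue: float, wue_ml_per_wh: float,
                    cif_g_per_wh: float) -> dict:
    """Scale per-query GPU energy to facility energy, water, and carbon."""
    facility_wh = gpu_wh * pue             # PUE: total facility / IT energy
    return {
        "energy_wh": facility_wh,
        "water_ml": facility_wh * wue_ml_per_wh,  # cooling water demand
        "co2_g": facility_wh * cif_g_per_wh,      # grid carbon intensity
    }

# The same hypothetical 0.30 Wh query on two very different sites:
efficient_site = facility_impact(gpu_wh=0.30, pue=1.1,
                                 wue_ml_per_wh=0.2, cif_g_per_wh=0.35)
inefficient_site = facility_impact(gpu_wh=0.30, pue=1.8,
                                   wue_ml_per_wh=1.0, cif_g_per_wh=0.6)

print(efficient_site)
print(inefficient_site)
```

Identical GPU work, yet the inefficient site uses several times the water and carbon, which is exactly the DeepSeek-versus-Anthropic pattern the study observes.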

For instance, U.S.-based deployments like those of OpenAI and Anthropic generally perform better due to higher-quality data centre infrastructure. By contrast, DeepSeek’s operations in Chinese data centres suffer from higher cooling demands and carbon intensity, which inflates water and CO₂ footprints.

This finding underscores a regulatory blind spot: it’s not just about what the model is, but where and how it runs.

The Path Forward: Regulation, Transparency and Smart Use

As we have argued in previous articles, we need global regulatory thresholds on energy, water, and carbon use per inference, designed to work in concert with existing AI regulations. As individuals, we should educate ourselves about AI’s cognitive impacts and stay alert to its misuse, including AI-generated slop.

Another point we actively advocate for is greater transparency, pushing companies to disclose real-time footprints, including training data rights and energy waste. Without this data, users and policymakers remain blind to the environmental consequences of AI.

Awareness Is No Longer Optional

We cannot afford to treat AI like magic. Behind every output is a real, measurable and often avoidable environmental cost. As users, we are complicit. As developers, we are responsible.

This is not a call to abandon LLMs. It is a call to use them mindfully, to demand transparency and to advocate for infrastructure that matches our ambitions. Artificial intelligence is not free. And the bill, increasingly, is paid by our climate and our water.

So next time you say “thanks, ChatGPT,” consider this: your gratitude costs water, and multiplied across hundreds of millions of daily queries, those drops add up to the drinking needs of entire cities.