Rate limits on Claude and other tools could hint at a deeper squeeze on the chips, power and data centers needed to run advanced AI. Researcher Lennart Heim explains
Compute is the computing power that AI runs on. Training a model requires enormous compute that scales with model size and the amount of training data, and serving the trained model to users (inference) is compute-intensive as well. Crucially, demand compounds: if engagement with AI tools grows tenfold, measured in tokens processed and intensity of use, the required compute can surge a hundredfold, because more users, heavier per-user usage, and longer interactions all multiply together. That suggests current AI demand may already be outstripping available processing capacity.
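A back-of-the-envelope model makes the compounding visible. The sketch below is illustrative only; every number in it is an assumption, not a figure from Heim.

```python
# Back-of-the-envelope model of inference compute demand.
# All figures here are illustrative assumptions, not real measurements.

def inference_flops(users: float, tokens_per_user: float,
                    flops_per_token: float) -> float:
    """Total FLOPs needed to serve every user's tokens."""
    return users * tokens_per_user * flops_per_token

baseline = inference_flops(users=1e6, tokens_per_user=1e4, flops_per_token=1e12)

# "Engagement grows tenfold" can mean several multiplicative factors rising
# at once: ten times the users AND ten times the tokens per user.
surged = inference_flops(users=1e7, tokens_per_user=1e5, flops_per_token=1e12)

print(f"compute demand grew {surged / baseline:.0f}x")  # -> 100x
```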
Flat-rate subscriptions work for most internet services, such as Google Workspace, because the marginal cost of an extra user is negligible. AI is different: each additional unit of usage translates directly into higher costs for the provider, roughly in proportion to how intensively the service is used. Under a flat monthly fee, heavy users routinely consume more compute than their payment covers, which is why providers impose rate limits on these subscription plans.
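To see why flat fees break down, compare a user's serving cost against the subscription price. The fee and per-token cost below are hypothetical round numbers chosen only to show the shape of the problem.

```python
# Illustrative flat-rate economics: cost scales with tokens, revenue doesn't.
# The fee and per-token cost are hypothetical, not any provider's real prices.

FLAT_FEE = 20.00                 # monthly subscription, in dollars (assumed)
COST_PER_MILLION_TOKENS = 10.00  # provider's serving cost (assumed)

def monthly_cost(tokens_used: float) -> float:
    return tokens_used / 1e6 * COST_PER_MILLION_TOKENS

for tokens in (0.5e6, 2e6, 20e6):
    cost = monthly_cost(tokens)
    status = "profitable" if cost < FLAT_FEE else "loss -> rate-limit candidate"
    print(f"{tokens / 1e6:>5.1f}M tokens: cost ${cost:7.2f} vs fee ${FLAT_FEE:.2f} ({status})")
```

The light user subsidizes the heavy one until the heavy one's consumption outruns the fee; rate limits cap that downside.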
AI companies employ several strategies to manage compute usage. ChatGPT, for instance, defaults to an 'Auto' mode that routes each query to an appropriate model, sending simpler tasks to less powerful, cheaper models. Anthropic has similarly made its smaller Claude Sonnet model the default: it is cheaper to run, though less intelligent. Users add to the problem by using these tools inefficiently, throwing the most powerful models at simple requests, the equivalent of over-engineering, which burns compute unnecessarily.
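A toy router shows the idea behind an 'Auto' mode. This is purely a sketch: the model names and the keyword heuristic are invented, and production routers use learned classifiers rather than string matching.

```python
# Hypothetical "auto" router: cheap queries go to a small model, hard ones to
# a large model. Model names and the difficulty heuristic are made up here;
# real systems route with trained classifiers, not keyword rules.

SMALL_MODEL = "small-fast-model"
LARGE_MODEL = "large-reasoning-model"

HARD_HINTS = ("prove", "derive", "debug", "step by step", "analyze")

def route(query: str) -> str:
    looks_hard = len(query) > 500 or any(h in query.lower() for h in HARD_HINTS)
    return LARGE_MODEL if looks_hard else SMALL_MODEL

print(route("What's the capital of France?"))            # -> small-fast-model
print(route("Debug this race condition step by step"))   # -> large-reasoning-model
```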
OpenAI benefits from greater financial resources and a higher valuation, which buy it access to more compute. Building data centers and manufacturing chips are complex, expensive endeavors, and while OpenAI can currently afford more generous usage, companies like Anthropic struggle to forecast demand. That creates a dilemma: overbuild expensive data centers and risk paying for idle capacity, or underbuild and be forced to throttle users. The market is likely to remain compute-constrained and prices will eventually rise, though for now companies prefer rate limiting so that more users retain at least some access.
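The forecasting dilemma can be framed as a simple expected-cost trade-off between idle capacity and unmet demand. All costs and demand scenarios below are invented for illustration.

```python
# Toy capacity-planning trade-off: overbuild and pay for idle data centers,
# or underbuild and throttle users. All figures are assumptions.

IDLE_COST_PER_UNIT = 1.0       # cost of a built-but-unused unit of capacity
SHORTFALL_COST_PER_UNIT = 3.0  # lost revenue/goodwill per unit of unmet demand

def expected_cost(capacity: float, demand_scenarios: list[float]) -> float:
    total = 0.0
    for demand in demand_scenarios:
        idle = max(0.0, capacity - demand)
        shortfall = max(0.0, demand - capacity)
        total += idle * IDLE_COST_PER_UNIT + shortfall * SHORTFALL_COST_PER_UNIT
    return total / len(demand_scenarios)

scenarios = [80, 100, 150, 250]  # equally likely demand outcomes (assumed)
for capacity in (100, 150, 200, 250):
    print(f"capacity {capacity}: expected cost {expected_cost(capacity, scenarios):.1f}")
```

When the penalty for turning users away exceeds the cost of idle racks, the math favors building ahead of demand, but only if you can afford the idle case.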
The traditional Silicon Valley ethos of scaling software rapidly, free of physical constraints, collides with AI's physical demands. Bottlenecks run across the supply chain: chip fabrication, power generation, and memory production. Chip manufacturers like TSMC carry enormous fixed costs and need high utilization rates to stay solvent, so they are reluctant to scale up production drastically on speculative demand. Manufacturing capacity for power equipment such as gas turbines has been roughly flat for years and cannot absorb the sudden surge in AI data center requirements. A shortage of clean-room space and specialized chip factories (fabs) likewise limits memory production, driving up prices and rippling into broader technology markets such as smartphones. The physical infrastructure simply cannot expand as fast as digital demand.
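Why fab operators resist speculative expansion comes down to fixed costs. The break-even sketch below uses invented figures, not real TSMC economics, just to show how quickly profit disappears below high utilization.

```python
# Why fabs need high utilization: fixed costs dominate. Figures are invented
# to illustrate the shape of the economics, not actual fab numbers.

FIXED_COST = 9.0e9          # annual cost of building/running a fab (assumed)
CAPACITY_WAFERS = 1.0e6     # wafers per year at full utilization (assumed)
PRICE_PER_WAFER = 15_000.0  # revenue per wafer (assumed)
VARIABLE_COST = 3_000.0     # per-wafer materials and labor (assumed)

def annual_profit(utilization: float) -> float:
    wafers = CAPACITY_WAFERS * utilization
    return wafers * (PRICE_PER_WAFER - VARIABLE_COST) - FIXED_COST

for u in (0.5, 0.75, 0.9):
    print(f"utilization {u:.0%}: profit ${annual_profit(u) / 1e9:+.1f}B")
```

Under these assumptions the fab only breaks even at 75% utilization, which is why adding capacity for demand that might not materialize is so risky.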
AI companies must balance compute for developing more capable models (R&D, including training) against compute for serving current users (inference). Training runs continuously, while inference spikes during peak user hours, so the two constantly compete for the same hardware; recent reports suggest a significant share, around 60%, of compute is allocated to R&D. The tension captures the core challenge: innovating on future models while still providing continuous, high-quality service to today's users.
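A simple scheduler sketch illustrates the trade-off: inference gets priority, and training soaks up whatever is left, so peak hours squeeze training. The cluster size and hourly load curve are assumptions, and the resulting split is illustrative rather than a real allocation.

```python
# Toy compute allocator: inference gets priority, training gets the remainder.
# The cluster size and demand curve are assumptions; the ~60% R&D share from
# reports is an aggregate figure, not a per-hour schedule.

CLUSTER_GPUS = 10_000

def inference_load(hour: int) -> int:
    """Assumed demand curve: heavy midday, light overnight."""
    return 7_000 if 9 <= hour < 21 else 2_000

training_gpu_hours = 0
total_gpu_hours = CLUSTER_GPUS * 24
for hour in range(24):
    training_gpu_hours += CLUSTER_GPUS - inference_load(hour)

print(f"training share of the day: {training_gpu_hours / total_gpu_hours:.0%}")
```

Under these assumed numbers training ends up with roughly half the cluster's GPU-hours, most of it claimed overnight when users are asleep.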