Rate limits on Claude and other tools could hint at a deeper squeeze on the chips, power and data centers needed to run advanced AI. Researcher Lennart Heim explains
Compute is the computing power that AI runs on. Training a model requires enormous compute that scales with model size and the amount of training data, and serving the trained model to users (inference) is compute-intensive as well. Crucially, demand compounds: if engagement with AI tools grows tenfold, measured in tokens processed and intensity of use, the required compute can surge a hundredfold, because more users, heavier per-user usage, and longer interactions all multiply together. That suggests current AI demand may already be outstripping available processing capacity.
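A back-of-the-envelope model makes the compounding visible. The sketch below is illustrative only; every number in it is an assumption, not a figure from Heim.

```python
# Back-of-the-envelope model of inference compute demand.
# All figures here are illustrative assumptions, not real measurements.

def inference_flops(users: float, tokens_per_user: float,
                    flops_per_token: float) -> float:
    """Total FLOPs needed to serve every user's tokens."""
    return users * tokens_per_user * flops_per_token

baseline = inference_flops(users=1e6, tokens_per_user=1e4, flops_per_token=1e12)

# "Engagement grows tenfold" can mean several multiplicative factors rising
# at once: ten times the users AND ten times the tokens per user.
surged = inference_flops(users=1e7, tokens_per_user=1e5, flops_per_token=1e12)

print(f"compute demand grew {surged / baseline:.0f}x")  # -> 100x
```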
Flat-rate subscriptions work for most internet services, such as Google Workspace, because the marginal cost of an extra user is negligible. AI is different: each additional unit of usage translates directly into higher costs for the provider, roughly in proportion to how intensively the service is used. Under a flat monthly fee, heavy users routinely consume more compute than their payment covers, which is why providers impose rate limits on these subscription plans.
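To see why flat fees break down, compare a user's serving cost against the subscription price. The fee and per-token cost below are hypothetical round numbers chosen only to show the shape of the problem.

```python
# Illustrative flat-rate economics: cost scales with tokens, revenue doesn't.
# The fee and per-token cost are hypothetical, not any provider's real prices.

FLAT_FEE = 20.00                 # monthly subscription, in dollars (assumed)
COST_PER_MILLION_TOKENS = 10.00  # provider's serving cost (assumed)

def monthly_cost(tokens_used: float) -> float:
    return tokens_used / 1e6 * COST_PER_MILLION_TOKENS

for tokens in (0.5e6, 2e6, 20e6):
    cost = monthly_cost(tokens)
    status = "profitable" if cost < FLAT_FEE else "loss -> rate-limit candidate"
    print(f"{tokens / 1e6:>5.1f}M tokens: cost ${cost:7.2f} vs fee ${FLAT_FEE:.2f} ({status})")
```

The light user subsidizes the heavy one until the heavy one's consumption outruns the fee; rate limits cap that downside.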
AI companies employ several strategies to manage compute usage. ChatGPT, for instance, defaults to an 'Auto' mode that routes each query to an appropriate model, sending simpler tasks to less powerful, cheaper models. Anthropic has similarly made its smaller Claude Sonnet model the default: it is cheaper to run, though less intelligent. Users add to the problem by using these tools inefficiently, throwing the most powerful models at simple requests, the equivalent of over-engineering, which burns compute unnecessarily.
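A toy router shows the idea behind an 'Auto' mode. This is purely a sketch: the model names and the keyword heuristic are invented, and production routers use learned classifiers rather than string matching.

```python
# Hypothetical "auto" router: cheap queries go to a small model, hard ones to
# a large model. Model names and the difficulty heuristic are made up here;
# real systems route with trained classifiers, not keyword rules.

SMALL_MODEL = "small-fast-model"
LARGE_MODEL = "large-reasoning-model"

HARD_HINTS = ("prove", "derive", "debug", "step by step", "analyze")

def route(query: str) -> str:
    looks_hard = len(query) > 500 or any(h in query.lower() for h in HARD_HINTS)
    return LARGE_MODEL if looks_hard else SMALL_MODEL

print(route("What's the capital of France?"))            # -> small-fast-model
print(route("Debug this race condition step by step"))   # -> large-reasoning-model
```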
OpenAI benefits from greater financial resources and a higher valuation, which buy it access to more compute. Building data centers and manufacturing chips are complex, expensive endeavors, and while OpenAI can currently afford more generous usage, companies like Anthropic struggle to forecast demand. That creates a dilemma: overbuild expensive data centers and risk paying for idle capacity, or underbuild and be forced to throttle users. The market is likely to remain compute-constrained and prices will eventually rise, though for now companies prefer rate limiting so that more users retain at least some access.
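The forecasting dilemma can be framed as a simple expected-cost trade-off between idle capacity and unmet demand. All costs and demand scenarios below are invented for illustration.

```python
# Toy capacity-planning trade-off: overbuild and pay for idle data centers,
# or underbuild and throttle users. All figures are assumptions.

IDLE_COST_PER_UNIT = 1.0       # cost of a built-but-unused unit of capacity
SHORTFALL_COST_PER_UNIT = 3.0  # lost revenue/goodwill per unit of unmet demand

def expected_cost(capacity: float, demand_scenarios: list[float]) -> float:
    total = 0.0
    for demand in demand_scenarios:
        idle = max(0.0, capacity - demand)
        shortfall = max(0.0, demand - capacity)
        total += idle * IDLE_COST_PER_UNIT + shortfall * SHORTFALL_COST_PER_UNIT
    return total / len(demand_scenarios)

scenarios = [80, 100, 150, 250]  # equally likely demand outcomes (assumed)
for capacity in (100, 150, 200, 250):
    print(f"capacity {capacity}: expected cost {expected_cost(capacity, scenarios):.1f}")
```

When the penalty for turning users away exceeds the cost of idle racks, the math favors building ahead of demand, but only if you can afford the idle case.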
The traditional Silicon Valley ethos of scaling software rapidly, free of physical constraints, collides with AI's physical demands. Bottlenecks run across the supply chain: chip fabrication, power generation, and memory production. Chip manufacturers like TSMC carry enormous fixed costs and need high utilization rates to stay solvent, so they are reluctant to scale up production drastically on speculative demand. Manufacturing capacity for power equipment such as gas turbines has been roughly flat for years and cannot absorb the sudden surge in AI data center requirements. A shortage of clean-room space and specialized chip factories (fabs) likewise limits memory production, driving up prices and rippling into broader technology markets such as smartphones. The physical infrastructure simply cannot expand as fast as digital demand.
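Why fab operators resist speculative expansion comes down to fixed costs. The break-even sketch below uses invented figures, not real TSMC economics, just to show how quickly profit disappears below high utilization.

```python
# Why fabs need high utilization: fixed costs dominate. Figures are invented
# to illustrate the shape of the economics, not actual fab numbers.

FIXED_COST = 9.0e9          # annual cost of building/running a fab (assumed)
CAPACITY_WAFERS = 1.0e6     # wafers per year at full utilization (assumed)
PRICE_PER_WAFER = 15_000.0  # revenue per wafer (assumed)
VARIABLE_COST = 3_000.0     # per-wafer materials and labor (assumed)

def annual_profit(utilization: float) -> float:
    wafers = CAPACITY_WAFERS * utilization
    return wafers * (PRICE_PER_WAFER - VARIABLE_COST) - FIXED_COST

for u in (0.5, 0.75, 0.9):
    print(f"utilization {u:.0%}: profit ${annual_profit(u) / 1e9:+.1f}B")
```

Under these assumptions the fab only breaks even at 75% utilization, which is why adding capacity for demand that might not materialize is so risky.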
AI companies must balance compute for developing more capable models (R&D, including training) against compute for serving current users (inference). Training runs continuously, while inference spikes during peak user hours, so the two constantly compete for the same hardware; recent reports suggest a significant share, around 60%, of compute is allocated to R&D. The tension captures the core challenge: innovating on future models while still providing continuous, high-quality service to today's users.
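A simple scheduler sketch illustrates the trade-off: inference gets priority, and training soaks up whatever is left, so peak hours squeeze training. The cluster size and hourly load curve are assumptions, and the resulting split is illustrative rather than a real allocation.

```python
# Toy compute allocator: inference gets priority, training gets the remainder.
# The cluster size and demand curve are assumptions; the ~60% R&D share from
# reports is an aggregate figure, not a per-hour schedule.

CLUSTER_GPUS = 10_000

def inference_load(hour: int) -> int:
    """Assumed demand curve: heavy midday, light overnight."""
    return 7_000 if 9 <= hour < 21 else 2_000

training_gpu_hours = 0
total_gpu_hours = CLUSTER_GPUS * 24
for hour in range(24):
    training_gpu_hours += CLUSTER_GPUS - inference_load(hour)

print(f"training share of the day: {training_gpu_hours / total_gpu_hours:.0%}")
```

Under these assumed numbers training ends up with roughly half the cluster's GPU-hours, most of it claimed overnight when users are asleep.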