At Hot Chips 2024, OpenAI gave an hour-long keynote about building scalable AI infrastructure. This makes a lot of sense since OpenAI uses a ton of compute as an organization, and will likely use even more in the coming years.
Please note that we are doing these live at Hot Chips 2024 this week, so please excuse typos.
OpenAI Keynote on Building Scalable AI Infrastructure
I think most of our readers are familiar with ChatGPT and OpenAI and how LLMs work, so we are going to simply show the next few slides without much commentary.
In terms of scale, the idea is that in 2018, GPT-1 was cool. GPT-2 was more coherent. GPT-3 had in-context learning. GPT-4 is actually useful. The expectation is that future models will be even more useful, with new behaviors.
A major observation is that scaling up yields better and more useful AI.
The question was how OpenAI would know that training a bigger model would yield a better model. OpenAI observed that every time compute was doubled, it got better results. The chart below shows a four-order-of-magnitude increase in compute, and the scaling still holds.
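To make the scaling-law idea concrete, here is a minimal sketch of fitting a power law to loss-versus-compute points. This is our own illustration, not OpenAI's methodology, and the data values are made up:

```python
import numpy as np

# Hypothetical loss-vs-compute points spanning several orders of magnitude.
# These values are invented for illustration; they are not OpenAI's data.
compute = np.array([1e18, 1e19, 1e20, 1e21, 1e22])  # training FLOPs
loss = np.array([4.0, 3.3, 2.7, 2.2, 1.8])          # validation loss

# A power law loss = a * compute^(-b) is a straight line in log-log space:
# log(loss) = log(a) - b * log(compute). Fit it with a degree-1 polyfit.
slope, intercept = np.polyfit(np.log(compute), np.log(loss), 1)

# The slope is negative; the scaling exponent b is its magnitude.
print(f"scaling exponent b ~= {-slope:.3f}")

# Extrapolate one more order of magnitude out, as scaling-law charts do.
pred = np.exp(intercept + slope * np.log(1e23))
print(f"predicted loss at 1e23 FLOPs: {pred:.2f}")
```

The point of the straight-line fit is exactly the observation above: if the log-log relationship holds, doubling compute yields a predictable improvement before the bigger model is ever trained.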
OpenAI looked at tasks like coding and found that a similar pattern held. This is measured on a mean log scale, so that the metric is not overly weighted towards solving easy coding problems.
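As a rough illustration of why a mean log scale helps, averaging log pass rates keeps easy problems (pass rates near 1) from dominating the metric. A minimal sketch with made-up pass rates:

```python
import numpy as np

# Hypothetical per-problem pass rates for a coding benchmark.
# Values are invented for illustration; this is not OpenAI's data.
easy = np.array([0.95, 0.90, 0.92])   # problems most models solve
hard = np.array([0.02, 0.05, 0.01])   # problems that track capability

pass_rates = np.concatenate([easy, hard])

# A plain mean is dominated by the easy problems...
print(f"mean pass rate:     {pass_rates.mean():.3f}")

# ...while the mean of log pass rates rewards progress on hard problems,
# since moving 0.01 -> 0.02 changes log(p) as much as 0.45 -> 0.90 does.
print(f"mean log pass rate: {np.log(pass_rates).mean():.3f}")
```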
This is the MMLU benchmark. It is an attempt to be the end-all machine learning benchmark, but given the pace of progress, GPT-4 was already scoring ~90% on the test.
This is a plot of industry compute used to train different frontier models. Since 2018, it has increased by about 4x per year.
GPT-1 trained on a single box for a few weeks. Training has since scaled to huge clusters of GPUs.
In 2018, the rate of compute growth went from 6-7x per year to 4x per year. The idea is that by 2018, a lot of the low-hanging fruit had been tackled. In the future, things like cost and power will become bigger challenges.
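For a sense of what those rates compound to, here is a quick back-of-the-envelope sketch. The growth rates are from the slide; everything else is just arithmetic:

```python
# Back-of-the-envelope compounding of the growth rates from the slide.
def total_growth(rate_per_year: float, years: int) -> float:
    """Total multiple of compute after compounding annually."""
    return rate_per_year ** years

# 4x per year since 2018 is roughly six years to 2024:
print(f"4x/year over 6 years: {total_growth(4, 6):,.0f}x")  # ~4,096x

# Versus the earlier 6-7x per year regime over the same span:
print(f"6x/year over 6 years: {total_growth(6, 6):,.0f}x")  # ~46,656x
```

Even the "slower" 4x regime is a roughly 4,000x increase in six years, which is why cost and power become the binding constraints.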
On the inference side, the demand is driven by intelligence. Most of the compute for inference is being used by the top-end models; the smaller models consume much smaller amounts of compute. Inference GPU demand is growing significantly.
Here is the three-bullet bull case for AI compute.
The thought is that the world needs more AI infrastructure than it is currently planning for.
Here is the real solar demand in black, alongside expert forecasts of demand. Even though the actual line kept going up, the forecasts repeatedly underestimated it.
Moore's Law is similar: it kept going straight up for 50 years or so, longer than many thought was possible.
As a result, OpenAI thinks AI needs massive investment, because the increase in compute has already yielded benefits across eight orders of magnitude.
OpenAI says we must design for mass deployment. One example is RAS (reliability, availability, and serviceability). The clusters are getting so big that hard and soft failures occur. Silent data corruption happens and is sometimes not reproducible, even if one can isolate the GPU. Cluster failures have a wide blast radius.
OpenAI says that the cost to repair needs to come down. The blast radius needs to come down so that if one thing fails, fewer other components fail.
One idea is to use graceful degradation. This is very similar to what we do at STH in our hosting clusters so that failures do not require technician time. Validation is also important at scale.
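To illustrate the graceful degradation idea, here is a hypothetical sketch of our own (not OpenAI's implementation): instead of failing a whole job when one node misbehaves, a scheduler health-checks nodes and drains the bad one while the rest keep running.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    healthy: bool = True

@dataclass
class Cluster:
    nodes: list = field(default_factory=list)

    def health_check(self):
        """Drain unhealthy nodes instead of failing the whole job.
        A real system would look at ECC counters, interconnect errors,
        thermals, etc.; here we just read a flag for illustration."""
        active, drained = [], []
        for node in self.nodes:
            (active if node.healthy else drained).append(node)
        self.nodes = active
        return drained

# Hypothetical 8-node cluster where one GPU node develops a fault.
cluster = Cluster([Node(f"gpu-{i:02d}") for i in range(8)])
cluster.nodes[3].healthy = False

drained = cluster.health_check()
print(f"drained: {[n.name for n in drained]}")         # ['gpu-03']
print(f"job continues on {len(cluster.nodes)} nodes")  # degraded, not dead
```

That is the blast-radius point in miniature: one failure costs a node, not the job, and no technician has to touch it immediately.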
Power is going to be a major challenge, as there is only so much power in the world. GPUs in a training cluster all spin up and down at the same time, which creates data center load challenges.
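A toy illustration of why synchronized ramping is hard on a facility (the numbers are invented): if every GPU swings between idle and full power on the same training-step boundary, the facility sees the entire delta at once.

```python
# Toy numbers, invented for illustration: a 10,000-GPU cluster where each
# GPU swings between 100 W (communication phase) and 700 W (compute phase).
GPUS = 10_000
IDLE_W, PEAK_W = 100, 700

# If every GPU hits the step boundary together, the facility sees the
# whole swing at once:
swing_mw = GPUS * (PEAK_W - IDLE_W) / 1e6
print(f"synchronized load swing: {swing_mw:.0f} MW")  # 6 MW

# If that load could hypothetically be staggered across 10 groups, the
# instantaneous step would be a tenth of that:
print(f"staggered (10 groups):   {swing_mw / 10:.1f} MW")
```

A multi-megawatt load swing every few seconds is a real problem for power delivery equipment, which is the data center load challenge the keynote points at.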
Like our key lessons learned, OpenAI has takeaways. I will let you read those:
It is interesting that performance is only one of the four points, even though everyone focuses on performance.
Final Words
The scaling challenge and cluster-level challenges are enormous. When we look at the Top500, today's big AI clusters are roughly on par with the top 3-4 systems on that list combined. It was cool to see a big customer talk about how they see the need for AI hardware.
Honestly, those final bullet points left me a bit annoyed.
“delivering AI will require massive infrastructure buildout.” Not done by the AI companies though. It’s like they expect everyone else to pick up the slack to make up for the fact that they are on the “performance at any power” hype train.
Sooner or later, the pendulum needs to swing back to greater efficiency. We have hit the Pentium 4 era once again, it would seem.