OpenAI Keynote on Building Scalable AI Infrastructure

OpenAI Hot Chips 2024_Page_19

At Hot Chips 2024, OpenAI gave an hour-long keynote about building scalable AI infrastructure. That makes a lot of sense since OpenAI consumes an enormous amount of compute as an organization, and will likely use even more in the coming years.

Please note that we are doing these live at Hot Chips 2024 this week, so please excuse typos.

OpenAI Keynote on Building Scalable AI Infrastructure

I think most of our readers are familiar with ChatGPT, OpenAI, and how LLMs work, so we are just going to show the next few slides without additional commentary.

OpenAI Hot Chips 2024_Page_03
OpenAI Hot Chips 2024_Page_04
OpenAI Hot Chips 2024_Page_05

In terms of scale, the idea is that GPT-1 in 2018 was cool. GPT-2 was more coherent. GPT-3 had in-context learning. GPT-4 is actually useful. The expectation is that future models will be even more useful and show new behaviors.

OpenAI Hot Chips 2024_Page_06

A major observation is that scaling up yields better and more useful AI.

OpenAI Hot Chips 2024_Page_07

The question was how OpenAI would know whether training a bigger model would yield a better model. OpenAI observed that every time it doubled compute, it got better results. The chart below shows a four-order-of-magnitude increase in compute, and the scaling still holds.
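
As a rough illustration of why that matters, scaling curves like this are usually modeled as a power law in compute. Here is a toy sketch; the constants are invented for illustration and are not OpenAI's actual fit:

```python
def power_law_loss(compute: float, a: float = 1.0, alpha: float = 0.05) -> float:
    """Toy power-law scaling fit: loss = a * compute**(-alpha).

    The constants a and alpha are made up for illustration; OpenAI did
    not share fit parameters in the talk.
    """
    return a * compute ** (-alpha)

# Each doubling of compute cuts loss by a constant factor of 2**(-alpha),
# which is why the curve looks like a straight line on a log-log plot.
for doublings in range(0, 14, 2):  # up to 4096x, roughly four orders of magnitude
    c = 2.0 ** doublings
    print(f"{c:>8.0f}x compute -> predicted loss {power_law_loss(c):.4f}")
```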

OpenAI Hot Chips 2024_Page_08

OpenAI looked at tasks like coding and found that a similar pattern held. This is measured on a mean log scale so that the pass/fail metric is not overly weighted towards solving easy coding problems.
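
OpenAI did not show the exact formula, but a mean-log pass metric over coding problems might look like the sketch below; the pass rates here are invented for illustration:

```python
import math

def mean_pass_rate(p: list[float]) -> float:
    return sum(p) / len(p)

def mean_log_pass_rate(p: list[float], eps: float = 1e-6) -> float:
    """Average of log pass rates (a hypothetical sketch of the metric).

    Taking the log before averaging keeps a model that only nails easy
    problems from dominating the score: tiny pass rates on hard problems
    pull the mean down strongly.
    """
    return sum(math.log(max(x, eps)) for x in p) / len(p)

easy_only = [0.99, 0.98, 0.01, 0.01]  # aces easy problems, fails hard ones
balanced  = [0.70, 0.65, 0.30, 0.25]  # moderate on everything

print(mean_pass_rate(easy_only), mean_pass_rate(balanced))          # ~0.50 vs ~0.48
print(mean_log_pass_rate(easy_only), mean_log_pass_rate(balanced))  # ~-2.31 vs ~-0.84
```

Note that on a plain mean, the easy-only model would actually score slightly higher; the log transform is what rewards the balanced model.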

OpenAI Hot Chips 2024_Page_09

This is the MMLU benchmark, which was an attempt to be the end-all for machine learning benchmarks. Because progress kept compounding, GPT-4 was already scoring ~90% on the test and effectively saturating it.

OpenAI Hot Chips 2024_Page_10

This is a plot of industry compute used to train different frontier models. Since 2018, it has increased by about 4x per year.

OpenAI Hot Chips 2024_Page_13

GPT-1 trained on a single box for a few weeks. Since then, training has scaled to huge clusters of GPUs.

OpenAI Hot Chips 2024_Page_14

In 2018, the rate of compute growth slowed from 6-7x per year to 4x per year. The idea is that by 2018, a lot of the low-hanging fruit had been tackled. In the future, things like cost and power will become bigger challenges.
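
Even the slower rate compounds dramatically, which is worth a quick back-of-the-envelope check:

```python
# Quick compounding check on the growth rates from the slide.
for label, rate in [("pre-2018 (~7x/yr)", 7.0), ("post-2018 (~4x/yr)", 4.0)]:
    print(f"{label}: {rate ** 4:,.0f}x more compute after 4 years")
# pre-2018 (~7x/yr): 2,401x more compute after 4 years
# post-2018 (~4x/yr): 256x more compute after 4 years
```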

OpenAI Hot Chips 2024_Page_15

On the inference side, demand is driven by intelligence. Most inference compute is being used for top-end models, while smaller models consume far less. Inference GPU demand is growing significantly.

OpenAI Hot Chips 2024_Page_16

Here is the three-bullet bull case for AI compute.

OpenAI Hot Chips 2024_Page_17

The thought is that the world needs more AI infrastructure than it is currently planning for.

OpenAI Hot Chips 2024_Page_18

Here is actual solar demand in black against expert demand forecasts. Even though the actual line kept going up, the expert forecasts repeatedly undershot it.

OpenAI Hot Chips 2024_Page_19

Moore's Law kept going straight up for 50 years or so, longer than many thought was possible.

OpenAI Hot Chips 2024_Page_20

As a result, OpenAI thinks AI needs massive investment, since increases in compute have already yielded benefits across eight orders of magnitude.

OpenAI says we must design for mass deployment. One example is RAS (reliability, availability, and serviceability). Clusters are getting so big that both hard and soft failures occur. Silent data corruption happens and is sometimes not reproducible, even when one can isolate the offending GPU. Cluster failures have a wide blast radius.
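
To make the silent data corruption point concrete, here is a minimal sketch of one classic mitigation, duplicated computation with a consistency check. This is our illustration, not OpenAI's stated approach:

```python
import numpy as np

def checked_reduce(chunks: list[np.ndarray]) -> np.ndarray:
    """Sum gradient chunks twice in different orders and compare results.

    A minimal sketch of one SDC-detection idea (duplicated computation
    plus a consistency check); real systems use cheaper schemes such as
    checksummed collectives or loss-spike monitoring.
    """
    forward = np.sum(np.stack(chunks), axis=0)
    backward = np.sum(np.stack(chunks[::-1]), axis=0)
    if not np.allclose(forward, backward):
        # A mismatch suggests a hardware fault rather than a code bug,
        # since both passes ran the same logic on the same data.
        raise RuntimeError("possible silent data corruption detected")
    return forward
```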

OpenAI Hot Chips 2024_Page_22

OpenAI says that the cost to repair needs to come down. The blast radius also needs to shrink so that when one component fails, fewer others fail with it.

OpenAI Hot Chips 2024_Page_23

One idea is to use graceful degradation. This is very similar to what we do at STH in our hosting clusters so that failures do not require technician time. Validation is also important at scale.
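
Here is a minimal sketch of what graceful degradation can look like at the scheduler level; the health-check map and threshold are hypothetical:

```python
def healthy_workers(workers: dict[str, bool]) -> list[str]:
    """Return workers that passed their last health check."""
    return [name for name, ok in workers.items() if ok]

def schedule_job(workers: dict[str, bool], min_workers: int = 2) -> list[str]:
    """Run on whatever healthy capacity remains instead of failing hard.

    Hypothetical sketch of graceful degradation: a dead node shrinks the
    job rather than killing it, so no technician visit is needed just to
    keep the cluster productive.
    """
    pool = healthy_workers(workers)
    if len(pool) < min_workers:
        raise RuntimeError("degraded below minimum viable capacity")
    return pool

# node-2 failed its health check; the job simply runs on the other three.
print(schedule_job({"node-0": True, "node-1": True, "node-2": False, "node-3": True}))
```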

OpenAI Hot Chips 2024_Page_24

Power is going to be a major challenge, as there is only so much power available. GPUs in a cluster all spin up and down at the same time, which creates large load swings for data centers.
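
One way to soften that kind of load swing is to stagger how workers spin up so a step load becomes a ramp. This is a hypothetical sketch; launch_workload is a stand-in for a real job launch, not a real API:

```python
import time

def launch_workload(gpu: int) -> None:
    print(f"starting work on GPU {gpu}")  # stand-in for a real job launch

def staggered_start(gpu_ids: list[int], ramp_seconds: float = 5.0) -> None:
    """Spread GPU spin-up over a ramp window instead of all at once.

    Hypothetical sketch: thousands of GPUs jumping from idle to full
    power simultaneously hit the facility as a step load; staggering
    the starts turns that step into a gentler ramp.
    """
    delay = ramp_seconds / max(len(gpu_ids), 1)
    for gpu in gpu_ids:
        launch_workload(gpu)
        time.sleep(delay)

staggered_start(list(range(8)))
```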

OpenAI Hot Chips 2024_Page_25

Like our key lessons learned, OpenAI has takeaways. I will let you read those:

OpenAI Hot Chips 2024_Page_26

It is interesting that performance is only one of the four points, even though everyone focuses on performance.

Final Words

The scaling and cluster-level challenges are enormous. Looking at the Top500, today's big AI clusters are roughly similar to the top 3-4 systems on that list combined. It was cool to see a big customer talk about how it sees the need for AI hardware.

1 COMMENT

  1. Honestly those final bullet points left me a bit annoyed.

    “delivering AI will require massive infrastructure buildout.” Not done by the AI companies though. It’s like they expect everyone else to pick up the slack to make up for the fact that they are on the “performance at any power” hype train.

    Sooner or later the pendulum needs to swing back to greater efficiency. We have hit the P4 era once again, it would seem.
