Nvidia’s must-have H100 AI chip made it a multitrillion-dollar company, one that may be worth more than Alphabet and Amazon, and competitors have been fighting to catch up. But perhaps Nvidia is about to extend its lead with the new Blackwell B200 GPU and GB200 “superchip.”
Nvidia says the new B200 GPU offers up to 20 petaflops of FP4 horsepower from its 208 billion transistors, and that a GB200 that combines two of those GPUs with a single Grace CPU can offer 30 times the performance for LLM inference workloads while also potentially being considerably more efficient. It “reduces cost and energy consumption by up to 25x” over an H100, says Nvidia.
Training a 1.8 trillion parameter model would previously have taken 8,000 Hopper GPUs and 15 megawatts of power, Nvidia claims. Today, Nvidia’s CEO says 2,000 Blackwell GPUs can do it while consuming just four megawatts.
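Taken at face value, those two setups are easy to sanity-check. Here’s a back-of-envelope sketch in Python, using only the GPU counts and power figures Nvidia quoted (the per-GPU arithmetic is ours, not Nvidia’s):

```python
# Back-of-envelope check of Nvidia's claimed training setups for a
# 1.8T-parameter model. All input figures are Nvidia's claims.
hopper = {"gpus": 8_000, "megawatts": 15}
blackwell = {"gpus": 2_000, "megawatts": 4}

for name, cfg in (("Hopper", hopper), ("Blackwell", blackwell)):
    kw_per_gpu = cfg["megawatts"] * 1_000 / cfg["gpus"]
    print(f"{name}: {cfg['gpus']:,} GPUs at about {kw_per_gpu:.2f} kW each")

print("GPU count reduction:  ", hopper["gpus"] / blackwell["gpus"])            # 4.0x
print("Total power reduction:", hopper["megawatts"] / blackwell["megawatts"])  # 3.75x
```

Notably, the implied per-GPU draw barely changes (roughly 1.9kW versus 2kW); the claimed savings come from needing a quarter as many chips for the same job.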
On a GPT-3 LLM benchmark with 175 billion parameters, Nvidia says the GB200 offers a somewhat more modest seven times the performance of an H100, and Nvidia says it offers four times the training speed.
Nvidia told journalists one of the key improvements is a second-gen transformer engine that doubles the compute, bandwidth, and model size by using four bits for each neuron instead of eight (thus the 20 petaflops of FP4 I mentioned earlier). A second key difference only comes when you link up huge numbers of these GPUs: a next-gen NVLink switch that lets 576 GPUs talk to one another, with 1.8 terabytes per second of bidirectional bandwidth.
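Nvidia hasn’t detailed its FP4 number format, but the headline effect of halving the bits per weight is simple arithmetic. A minimal sketch, assuming weights are stored at a flat four or eight bits apiece (the function and the simplification are ours for illustration, not Nvidia’s implementation):

```python
# Illustrative only: halving bits per weight halves storage and, all else
# equal, doubles how many weights fit through the same memory bandwidth.
BITS_PER_BYTE = 8

def weight_bytes(params: int, bits_per_param: int) -> float:
    """Raw weight storage only; ignores activations, KV cache, optimizer state."""
    return params * bits_per_param / BITS_PER_BYTE

params = 1_800_000_000_000  # the 1.8T-parameter model from Nvidia's example

fp8_tb = weight_bytes(params, 8) / 1e12  # one byte per weight   -> ~1.8 TB
fp4_tb = weight_bytes(params, 4) / 1e12  # half a byte per weight -> ~0.9 TB

print(f"FP8 weights: {fp8_tb:.1f} TB")
print(f"FP4 weights: {fp4_tb:.1f} TB")
print("Ratio:", fp8_tb / fp4_tb)  # 2.0: same memory and bus carry twice the model
```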
That required Nvidia to build an entirely new network switch chip, one with 50 billion transistors and some onboard compute of its own: 3.6 teraflops of FP8, says Nvidia.
Previously, Nvidia says, a cluster of just 16 GPUs would spend 60 percent of its time communicating with one another and only 40 percent actually computing.
Nvidia is counting on companies to buy large quantities of these GPUs, of course, and is packaging them in larger designs, like the GB200 NVL72, which plugs 36 CPUs and 72 GPUs into a single liquid-cooled rack for a total of 720 petaflops of AI training performance or 1,440 petaflops (aka 1.4 exaflops) of inference. It has nearly two miles of cables inside, with 5,000 individual cables.
Each tray in the rack contains either two GB200 chips or two NVLink switches, with 18 of the former and nine of the latter per rack. In total, Nvidia says one of these racks can support a 27-trillion parameter model. GPT-4 is rumored to be around a 1.7-trillion parameter model.
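Those tray counts line up with the rack totals above. A quick sketch of the arithmetic, built from Nvidia’s stated figures (the variable names and the closing FP8 observation are our reading of the numbers):

```python
# Sanity-checking the GB200 NVL72 rack composition from Nvidia's figures.
compute_trays = 18        # trays holding two GB200 superchips each
gb200_per_tray = 2
gpus_per_gb200 = 2        # one Grace CPU paired with two B200 GPUs
fp4_pflops_per_gpu = 20   # Nvidia's headline B200 FP4 number

superchips = compute_trays * gb200_per_tray  # 36
cpus = superchips                            # one Grace CPU per superchip -> 36
gpus = superchips * gpus_per_gb200           # 72

print(f"{cpus} CPUs and {gpus} GPUs per rack")
print(f"FP4 inference: {gpus * fp4_pflops_per_gpu:,} petaflops")  # 1,440
# The 720-petaflop training figure is exactly half of that, consistent
# with training running in FP8 rather than FP4.
```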
The company says Amazon, Google, Microsoft, and Oracle are all already planning to offer the NVL72 racks in their cloud service offerings, though it’s not clear how many they’re buying.
And of course, Nvidia is happy to offer companies the rest of the solution, too. Here’s the DGX Superpod for DGX GB200, which combines eight systems into one for a total of 288 CPUs, 576 GPUs, 240TB of memory, and 11.5 exaflops of FP4 computing.
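Those Superpod totals are eight of the NVL72-scale systems combined, and the same per-rack arithmetic scales cleanly (again a sketch from Nvidia’s stated figures, not an official breakdown):

```python
# The DGX Superpod totals as eight NVL72-scale systems combined.
systems = 8
cpus_per_system, gpus_per_system = 36, 72  # per-rack counts from above
fp4_pflops_per_gpu = 20

total_gpus = systems * gpus_per_system  # 576
print(f"{systems * cpus_per_system} CPUs, {total_gpus} GPUs")
print(f"{total_gpus * fp4_pflops_per_gpu / 1_000:.1f} exaflops of FP4")  # ~11.5
```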
Nvidia says its systems can scale to tens of thousands of the GB200 superchips, connected together with 800Gbps networking via its new Quantum-X800 InfiniBand (for up to 144 connections) or Spectrum-X800 Ethernet (for up to 64 connections).
We don’t expect to hear anything about new gaming GPUs today, as this news is coming out of Nvidia’s GPU Technology Conference, which is usually almost entirely focused on GPU computing and AI, not gaming. But the Blackwell GPU architecture will likely also power a future RTX 50-series lineup of desktop graphics cards.