Accelerated computing, a capability once confined to high-performance computers in government research labs, has gone mainstream.

Banks, car manufacturers, factories, hospitals, retailers and others are using AI supercomputers to handle the growing mountains of data they need to process and understand.

These powerful, efficient systems are computing superhighways. They carry data and computation along parallel paths on a lightning-fast journey to actionable results.

GPUs and CPUs are the engines on this highway, and fast interconnects are its on-ramps. The gold standard in interconnects for accelerated computing is NVLink.

So what is NVLink?

NVLink is a high-speed connection for GPUs and CPUs, formed by a robust software protocol that typically runs over a few pairs of wires printed on a computer board. It lets processors send and receive data from shared pools of memory at lightning speed.

Diagram showing two NVLink usage scenarios

Now in its fourth generation, NVLink connects the host and accelerated processors at speeds up to 900 gigabytes per second (GB/s).

That's more than 7 times the bandwidth of PCIe Gen 5, the interconnect used in conventional x86 servers. And NVLink is 5 times more energy efficient than PCIe Gen 5, thanks to data transfers that consume just 1.3 picojoules per bit.
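The "more than 7x" figure can be sanity-checked with back-of-the-envelope math. Note that the PCIe Gen 5 numbers below (32 GT/s per lane, a 16-lane slot, 128b/130b encoding) are assumptions drawn from the PCIe specification, not stated in the article:

```python
# Rough check of the NVLink vs. PCIe Gen 5 bandwidth comparison.
# PCIe Gen 5 figures are assumptions: 32 GT/s per lane, x16 slot,
# 128b/130b line encoding; NVLink figure comes from the article.

nvlink_gen4_gbs = 900  # total bidirectional bandwidth per GPU, per the article

pcie5_lane_gts = 32    # GT/s per lane (assumed PCIe Gen 5 spec value)
pcie5_lanes = 16
# One transfer carries one bit; 128b/130b encoding shaves off ~1.5%.
pcie5_per_dir_gbs = pcie5_lane_gts * pcie5_lanes / 8 * (128 / 130)  # ~63 GB/s
pcie5_bidir_gbs = 2 * pcie5_per_dir_gbs                             # ~126 GB/s

ratio = nvlink_gen4_gbs / pcie5_bidir_gbs
print(round(ratio, 1))  # ~7.1, i.e. "more than 7x"
```

Comparing bidirectional totals on both sides keeps the ratio apples-to-apples, since the 900 GB/s NVLink figure is also a bidirectional aggregate.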

History of NVLink

First introduced as a GPU connection with the NVIDIA P100 GPU, NVLink has evolved with each new NVIDIA GPU architecture.

Basic specifications table for NVLink

In 2018, NVLink leapt into the high-performance computing spotlight when it debuted connecting the GPUs and CPUs in two of the world's most powerful supercomputers, Summit and Sierra.

Installed at Lawrence Livermore and Oak Ridge National Laboratories, those systems are expanding the frontiers of science in areas such as drug discovery, natural disaster prediction and more.

Bandwidth doubles and then increases again

In 2020, third-generation NVLink doubled the maximum bandwidth per GPU to 600 GB/s, packing a dozen interconnects in each NVIDIA A100 Tensor Core GPU.

The A100 powers AI supercomputers in enterprise data centers, cloud computing services, and HPC labs around the world.

Today, 18 fourth-generation NVLink interconnects are built into a single NVIDIA H100 Tensor Core GPU. And the technology now plays a new, strategic role, enabling the most advanced CPUs and accelerators on the planet.
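The per-GPU bandwidth figures above fall out of simple per-link math. The ~50 GB/s of bidirectional bandwidth per link is an assumption based on published NVLink specs, not a number stated in the article:

```python
# Sanity-check the article's per-GPU bandwidth figures from per-link math.
# Assumption: each third/fourth-generation NVLink link provides roughly
# 50 GB/s of bidirectional bandwidth (not stated in the article).

gbs_per_link = 50
a100_links = 12  # third generation: a dozen links per A100
h100_links = 18  # fourth generation: 18 links per H100

print(a100_links * gbs_per_link)  # 600 GB/s, matching the A100 figure
print(h100_links * gbs_per_link)  # 900 GB/s, matching the H100 figure
```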

Chip-to-chip communication

NVIDIA NVLink-C2C is a board-level version of the interconnect that joins two processors in a single package, creating a superchip. For example, it connects two CPU chips to deliver the 144 Arm Neoverse V2 cores of the NVIDIA Grace CPU Superchip, a processor built to provide energy-efficient performance for cloud, enterprise and HPC users.

NVIDIA NVLink-C2C also joins a Grace CPU and a Hopper GPU to create the Grace Hopper Superchip, which packs accelerated computing for the world's most demanding HPC and AI jobs into a single superchip.

Alps, an AI supercomputer planned for the Swiss National Supercomputing Centre, will be among the first to use Grace Hopper. When it comes online later this year, the high-performance system will work on big science problems in fields from astrophysics to quantum chemistry.

Diagram of the Grace CPU, which uses NVLink-C2C
The Grace CPU contains 144 Arm Neoverse V2 cores on two chips connected by NVLink-C2C.

Grace and Grace Hopper are also great at saving energy in demanding cloud computing workloads.

For example, Grace Hopper is an ideal processor for recommendation systems. These economic engines of the Internet need fast, efficient access to large amounts of data to deliver trillions of results to billions of users every day.

A diagram showing how Grace Hopper uses NVLink to deliver the best performance in recommender systems
Recommender systems get up to 4x more performance and greater efficiency with Grace Hopper than with Hopper paired with traditional CPUs.

NVLink is also used in a powerful system-on-a-chip for automakers that combines NVIDIA's Hopper, Grace and Ada Lovelace architectures. NVIDIA DRIVE Thor is a car computer that unifies intelligent functions such as the digital instrument cluster, infotainment, automated driving, parking and more into a single architecture.

LEGO Links of Computing

NVLink also acts like the socket stamped into a LEGO brick. It's the foundation for building supersystems to tackle the biggest challenges in high-performance computing and AI.

For example, the NVLink ports on all eight GPUs in an NVIDIA DGX system share fast, direct connections through NVSwitch chips. Together, they form an NVLink network in which every GPU in the server is part of a single system.

For even greater performance, DGX systems can be combined into modular units of 32 servers, creating a powerful and efficient computing cluster.

Image of the DGX family of server products that use NVLink
NVLink is one of the key technologies enabling users to easily scale modular NVIDIA DGX systems to SuperPODs with up to an exaflop of AI performance.

Users can combine a modular block of 32 DGX systems into a single AI supercomputer using a combination of the NVLink network inside the DGX and an NVIDIA Quantum-2 switched InfiniBand fabric between them. For example, an NVIDIA DGX H100 SuperPOD packs 256 H100 GPUs, delivering up to an exaflop of peak AI performance.
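The exaflop claim for a 256-GPU SuperPOD checks out with simple multiplication. The per-GPU figure below (roughly 4 petaflops of peak FP8 compute with sparsity per H100) is an assumption drawn from NVIDIA's published H100 specs, not stated in the article:

```python
# Rough check that 256 H100 GPUs reach about an exaflop of peak AI compute.
# Assumption: ~4 petaflops of peak FP8 (sparse) compute per H100 GPU,
# which is not stated in the article.

h100_fp8_pflops = 4          # approximate peak petaflops per GPU
gpus = 256                   # GPUs in a DGX H100 SuperPOD, per the article

total_eflops = gpus * h100_fp8_pflops / 1000
print(total_eflops)  # ~1.0, i.e. about an exaflop
```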

For even greater performance, users can turn to AI supercomputers in the cloud, such as the one Microsoft Azure is building with tens of thousands of A100 and H100 GPUs. It's a service used by groups like OpenAI to train some of the world's largest generative AI models.

It's one more example of the power of accelerated computing.