2024 Deep Dive: Nvidia Hopper Microarchitecture for Datacenter GPUs
Executive Summary
Nvidia's Hopper microarchitecture is a significant advancement in GPU technology, designed specifically for datacenter applications. Named after computer scientist and United States Navy rear admiral Grace Hopper, the architecture improves upon its predecessors, the Turing and Ampere microarchitectures. Hopper features a new streaming multiprocessor, a faster memory subsystem, and a Transformer Engine for accelerating transformer-based models, making it an attractive option for AI, HPC, and data analytics workloads.
The Hopper microarchitecture was officially revealed in March 2022, after being leaked in November 2019. It is used alongside the Lovelace microarchitecture and delivers higher performance in the SXM5 form factor than on a standard PCIe card. With improved single-precision floating-point (FP32) throughput and new DPX instructions that accelerate dynamic-programming algorithms such as Smith–Waterman, the Hopper architecture is positioned as a strong contender in the datacenter GPU market.
Architecture & Design
The Nvidia Hopper H100 GPU is implemented on the TSMC N4 process with 80 billion transistors. It contains up to 144 streaming multiprocessors (SMs), which build on the SM designs of the Turing and Ampere microarchitectures. The maximum number of concurrent warps per SM is unchanged from Ampere at 64. Hopper adds a Tensor Memory Accelerator (TMA), which supports bidirectional asynchronous memory transfers between shared memory and global memory.
With TMA, applications may transfer tensors of up to five dimensions. When writing from shared memory to global memory, elementwise reduction and bitwise operators may be applied, avoiding registers and SM instructions while enabling warp-specialized code. TMA is exposed through cuda::memcpy_async. The Hopper architecture also improves single-precision floating-point (FP32) throughput, executing twice as many FP32 operations per cycle per SM as its predecessor.
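As a rough sketch of how this asynchronous staging looks in code, the kernel below uses cuda::memcpy_async with a block-scoped cuda::barrier to copy a tile of global memory into shared memory before computing on it. The kernel name, tile layout, and scale factor are illustrative assumptions, not from the source; on Hopper, eligible copies issued this way can be serviced by the TMA hardware.

```cuda
#include <cooperative_groups.h>
#include <cuda/barrier>

namespace cg = cooperative_groups;

// Illustrative kernel: stage one tile of `in` into shared memory with
// cuda::memcpy_async, wait on a block-scoped barrier, then use the tile.
__global__ void stage_and_scale(const float* in, float* out, float k) {
    extern __shared__ float tile[];               // dynamic shared memory
    auto block = cg::this_thread_block();

    __shared__ cuda::barrier<cuda::thread_scope_block> bar;
    if (block.thread_rank() == 0)
        init(&bar, block.size());                 // one arrival per thread
    block.sync();

    size_t base = (size_t)blockIdx.x * blockDim.x;
    // Collective asynchronous copy; may be offloaded to copy hardware.
    cuda::memcpy_async(block, tile, in + base,
                       sizeof(float) * blockDim.x, bar);
    bar.arrive_and_wait();                        // tile is now visible

    out[base + threadIdx.x] = k * tile[threadIdx.x];
}

// Launch sketch (shared-memory size must cover one tile):
// stage_and_scale<<<blocks, threads, threads * sizeof(float)>>>(d_in, d_out, 2.0f);
```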
The Hopper architecture adds new DPX instructions, which accelerate dynamic-programming algorithms such as Smith–Waterman; a scalar sketch of that recurrence follows the table below. Like Ampere, Hopper supports TensorFloat-32 (TF32) arithmetic, with an identical mapping pattern across the two architectures. The Nvidia Hopper H100 supports HBM3 and HBM2e memory up to 80 GB; the HBM3 memory system delivers 3 TB/s of bandwidth, an increase of 50% over the Nvidia Ampere A100's 2 TB/s.
| Component | Specification |
|---|---|
| Process Node | TSMC N4 |
| Transistors | 80 billion |
| Streaming Multiprocessors | Up to 144 |
| Memory | HBM3 and HBM2e up to 80 GB |
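For context on the dynamic-programming pattern that the DPX instructions target, the following sketch scores the Smith–Waterman matrix H(i, j) = max(0, H(i-1, j-1) + s(a_i, b_j), H(i-1, j) - g, H(i, j-1) - g) one anti-diagonal per kernel launch, since all cells on an anti-diagonal are mutually independent. It uses ordinary integer max operations rather than DPX intrinsics, and the scoring constants and sequences are hypothetical illustration values.

```cuda
#include <cstdio>
#include <cstring>

// Hypothetical scoring constants for illustration only.
#define MATCH     3
#define MISMATCH -3
#define GAP       2

__device__ int imax(int a, int b) { return a > b ? a : b; }

// Scores every cell (i, j) with i + j == diag. Cells on one anti-diagonal
// depend only on earlier diagonals, so each can go to its own thread.
__global__ void sw_diagonal(const char* a, int m, const char* b, int n,
                            int* H, int diag) {
    int i = blockIdx.x * blockDim.x + threadIdx.x + 1;  // 1-based index into a
    int j = diag - i;                                   // 1-based index into b
    if (i < 1 || i > m || j < 1 || j > n) return;
    int w = n + 1;                                      // row stride of H
    int s = (a[i - 1] == b[j - 1]) ? MATCH : MISMATCH;
    H[i * w + j] = imax(imax(0, H[(i - 1) * w + (j - 1)] + s),
                        imax(H[(i - 1) * w + j] - GAP,
                             H[i * w + (j - 1)] - GAP));
}

int main() {
    const char *ha = "GGTTGACTA", *hb = "TGTTACGG";     // toy sequences
    int m = (int)strlen(ha), n = (int)strlen(hb);
    char *a, *b; int *H;
    cudaMalloc(&a, m);
    cudaMalloc(&b, n);
    cudaMalloc(&H, (m + 1) * (n + 1) * sizeof(int));
    cudaMemcpy(a, ha, m, cudaMemcpyHostToDevice);
    cudaMemcpy(b, hb, n, cudaMemcpyHostToDevice);
    cudaMemset(H, 0, (m + 1) * (n + 1) * sizeof(int));  // border cells are 0
    for (int diag = 2; diag <= m + n; ++diag)           // wavefront sweep
        sw_diagonal<<<(m + 255) / 256, 256>>>(a, m, b, n, H, diag);
    int cells = (m + 1) * (n + 1), best = 0;
    int* hH = new int[cells];
    cudaMemcpy(hH, H, cells * sizeof(int), cudaMemcpyDeviceToHost);
    for (int k = 0; k < cells; ++k)                     // best local score
        if (hH[k] > best) best = hH[k];
    printf("best local alignment score: %d\n", best);
    return 0;
}
```

A production kernel would fuse many cells per thread and, on Hopper, use the DPX instructions for the max operations where available; the wavefront structure, however, stays the same.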
Performance & Thermal
Nvidia positions the H100 GPU as delivering high performance, scalability, and security across datacenter workloads. With Nvidia's NVLink Switch System, up to 256 H100 GPUs can be connected to accelerate exascale workloads, while the dedicated Transformer Engine targets trillion-parameter language models. Nvidia claims up to 9x faster training and up to 30x faster inference on large language models relative to the A100.
The H100 features fourth-generation Tensor Cores and a Transformer Engine with FP8 precision. The SXM5 module is rated at a TDP of up to 700 W, versus 350 W for the PCIe card, and that larger power and cooling budget is a key reason the SXM5 configuration outperforms the PCIe version. The SXM5 thermal solution is built for this higher power draw, though Nvidia has not published its full details.
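To give a feel for what FP8 precision means numerically, the host-side sketch below round-trips a few floats through the __nv_fp8_e4m3 storage type from CUDA's <cuda_fp8.h> header. This illustrates the number format only and is not a description of the Transformer Engine's internals; the engine's role is to choose scale factors so that tensors survive FP8's narrow range.

```cuda
#include <cstdio>
#include <cuda_fp8.h>   // __nv_fp8_e4m3 storage and conversion type

// Quantize floats to e4m3 FP8 and back. e4m3 keeps only a 3-bit mantissa
// and saturates near its maximum finite value of 448, so both rounding
// error and clipping show up in the output.
int main() {
    const float samples[] = {0.1234f, 1.5f, 3.14159f, 240.0f, 500.0f};
    for (float x : samples) {
        __nv_fp8_e4m3 q(x);        // float -> FP8 (round and saturate)
        float back = float(q);     // FP8 -> float
        printf("%10.5f -> %10.5f\n", x, back);
    }
    return 0;
}
```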
Benchmarks have shown significant performance improvements over the H100's predecessor, the A100. In one early benchmark, the H100 delivered roughly a 3x improvement over the A100 across Tensor Core, FP32, and FP64 operations. These gains extend across AI and HPC workloads, making the H100 an attractive option for datacenter applications.
Market Positioning
The Nvidia Hopper H100 GPU is positioned as a high-end datacenter GPU offering high performance, scalability, and security for AI, HPC, and data analytics workloads. It competes with other high-end datacenter accelerators such as the AMD Instinct MI200 series. Its target buyers are large datacenter operators, such as cloud service providers, that need high-performance GPUs to accelerate their workloads.
In the AI market, the H100's support for trillion-parameter language models, together with its claimed 9x training and 30x inference speedups on large language models, gives it a competitive advantage. Its HPC gains likewise make it attractive to researchers and scientists who need high-performance computing resources.
Specifications
| Specification | Detail |
|---|---|
| Process Node | TSMC N4 |
| Transistors | 80 billion |
| Streaming Multiprocessors | Up to 144 |
| Memory | HBM3 and HBM2e up to 80 GB |
| Tensor Cores | Fourth-generation |
| Transformer Engine | With FP8 precision |
Frequently Asked Questions
What is the Nvidia Hopper microarchitecture?
The Nvidia Hopper microarchitecture is a GPU microarchitecture developed by Nvidia, designed specifically for datacenter applications. It features a new streaming multiprocessor, a faster memory subsystem, and a Transformer Engine for accelerating transformer-based models.
What are the key features of the Nvidia Hopper H100 GPU?
The Nvidia Hopper H100 GPU features up to 144 streaming multiprocessors, HBM3 and HBM2e memory up to 80 GB, fourth-generation Tensor Cores, and a Transformer Engine with FP8 precision.
What are the performance improvements of the Nvidia Hopper H100 GPU?
Nvidia claims the H100 delivers up to 9x faster training and up to 30x faster inference on large language models relative to the A100, along with improved performance across AI and HPC workloads.