Microsoft expands its AI-supercomputer lineup with general availability of the latest 80 GB NVIDIA A100 GPUs in Azure, claims 4 spots on TOP500 supercomputers list | Azure Blog and Updates



Today, Microsoft announced the general availability of a brand-new virtual machine (VM) series in Azure, the NDm A100 v4 Series, featuring NVIDIA A100 Tensor Core 80 GB GPUs. This expands Azure’s leadership-class AI supercomputing scalability in the public cloud, building on our June general availability of the original ND A100 v4 instances, and adds another public cloud first, with the Azure ND A100 v4 VMs claiming four official places in the TOP500 supercomputing list. This milestone is thanks to a class-leading design with NVIDIA Quantum InfiniBand networking, featuring In-Network Computing, 200 GB/s and GPUDirect RDMA for each GPU, and an all-new PCIe Gen 4.0-based architecture.

We live in the era of large-scale AI models, and the demand for large-scale computing keeps growing. The original ND A100 v4 series features NVIDIA A100 Tensor Core GPUs, each equipped with 40 GB of HBM2 memory, which the new NDm A100 v4 series doubles to 80 GB, along with a 30 percent increase in GPU memory bandwidth for today’s most data-intensive workloads. RAM available to the virtual machine has also increased to 1,900 GB per VM, giving customers with large datasets and models a proportional increase in memory capacity to support novel data management techniques, faster checkpointing, and more.
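As a rough illustration of the doubling described above, the following Python sketch totals the GPU memory in one VM and compares it with host RAM. It assumes 8 GPUs per VM, as listed in the size table for Standard_ND96amsr_A100_v4, and assumes the same count for the original ND A100 v4 series. The function name is hypothetical and only for illustration.

```python
GPUS_PER_VM = 8  # from the size table; assumed identical for both series


def gpu_memory_per_vm_gb(memory_per_gpu_gb: int, gpus: int = GPUS_PER_VM) -> int:
    """Return the total GPU memory in one VM, in GB."""
    return memory_per_gpu_gb * gpus


nd_total_gb = gpu_memory_per_vm_gb(40)   # original ND A100 v4
ndm_total_gb = gpu_memory_per_vm_gb(80)  # new NDm A100 v4
host_ram_gb = 1900                       # host RAM per NDm A100 v4 VM

print(nd_total_gb, ndm_total_gb)              # 320 640
print(round(host_ram_gb / ndm_total_gb, 2))   # 2.97 GB of host RAM per GB of GPU memory
```

Under these assumptions, host RAM stays at roughly three times the aggregate GPU memory, which is what leaves headroom for staging data and writing checkpoints.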

The high-memory NDm A100 v4 series brings AI-supercomputer power to the masses by creating opportunities for all businesses to use it as a competitive advantage. Cutting-edge AI customers are using both 40 GB ND A100 v4 VMs and 80 GB NDm A100 v4 VMs at scale for large-scale production AI and machine learning workloads, and are seeing impressive performance and scalability. They include OpenAI for research and products, Meta for their leading AI research, Nuance for their comprehensive AI-powered voice-enabled solution, numerous Microsoft internal teams for large-scale cognitive science model training, and many more.

“Some of our research models can take dozens, or even hundreds of NVIDIA GPUs to train optimally, and Azure’s ND A100 v4 product helps address the growing training demands of large AI models. Modern training techniques require not only powerful accelerators, but also a communication fabric between them, and Azure’s implementation of NVIDIA Quantum InfiniBand 200 GB/s networking with GPUDirect RDMA between each NVIDIA A100 GPU has allowed us to use PyTorch and the communication libraries we’re already accustomed to, without modification.” - Myle Ott, Research Engineer, Meta AI Research

“The pace of innovation in conversational AI is gated in part by experimental throughput and turnaround time. With the ND A100 v4, we are able to not only complete experiments in half the time vs the NDv2 but also benefit from significant per-experiment PAYG cost savings. This is a critical accelerant for the advancement of our Dragon Ambient eXperience technologies.” - Paul Vozila, VP, Central Research at Nuance Communications

“We live in the era of large-scale AI models, like the recently announced MT-NLG 530B. Training state-of-the-art Turing models at this size brought unprecedented challenges to the underlying training infrastructure, and at the same time significantly raised the bar for acceleration, networking, stability, and availability. Similar to the collaborative research effort with NVIDIA Selene supercomputing infrastructure, Azure NDm A100 v4 with 80 GB of high bandwidth memory can remove many existing limits in scaling up models, such as increasing the maximum number of parameters and reducing the number of nodes required. Its performance and agility can provide a serious competitive edge to Azure customers in the race of advancing AI.” - Microsoft Turing
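To make the point about node count concrete, here is a back-of-the-envelope sketch in Python. It assumes roughly 16 bytes of GPU memory per parameter for model states (a common rule of thumb for mixed-precision training with Adam, not a figure from this post), ignores activations and all other overheads, and assumes 8 GPUs per VM. The helper `min_vms_for_model_states` is hypothetical. Treat the result as a loose lower bound rather than a sizing recommendation.

```python
import math

BYTES_PER_PARAM = 16  # assumption: fp16 weights and gradients plus fp32 Adam states
GPUS_PER_VM = 8       # from the size table; assumed for both VM series


def min_vms_for_model_states(num_params: float, gb_per_gpu: int) -> int:
    """Lower bound on VMs needed just to hold model states (1 GB = 1e9 bytes)."""
    total_gb = num_params * BYTES_PER_PARAM / 1e9
    gb_per_vm = gb_per_gpu * GPUS_PER_VM
    return math.ceil(total_gb / gb_per_vm)


params = 530e9  # a 530-billion-parameter model, the size named in the quote

print(min_vms_for_model_states(params, 40))  # 27 VMs with 40 GB GPUs
print(min_vms_for_model_states(params, 80))  # 14 VMs with 80 GB GPUs
```

Under these assumptions, doubling per-GPU memory roughly halves the minimum number of nodes needed to hold the model states.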

The new high-memory NDm A100 v4 for data-intensive GPU compute workloads reaffirms Microsoft’s commitment to rapidly adopting and shipping the latest scale-up and scale-out GPU accelerator technologies to the public cloud.

We can’t wait to see what you’ll build, analyze, and discover with the new Azure NDm A100 v4 platform.

 





Size: Standard_ND96amsr_A100_v4
Physical CPU Cores: 96
Host Memory: 1,900 GB
GPUs: 8 x 80 GB NVIDIA A100
Local NVMe Temporary Disk: 6,400 GB
NVIDIA Quantum InfiniBand Network: 200 GB/s
Azure Network: 40 Gbps
