AWS, NVIDIA Partner on New Supercomputing Infrastructure, Software, and Services for Generative AI
AWS to offer first cloud AI supercomputer with NVIDIA Grace Hopper Superchip and AWS UltraCluster scalability
LAS VEGAS—Amazon Web Services, Inc. (AWS) and NVIDIA have announced an expansion of their ongoing strategic collaboration to deliver the most advanced infrastructure, software, and services to customers working in the area of generative artificial intelligence.
AWS has been working to stand out as a cloud provider for AI-related solutions, while NVIDIA's chips have been widely used in the development of generative AI services.
As part of the expanded effort, the two companies said they will bring together the best of NVIDIA and AWS technologies for training foundation models and building generative AI applications.
This includes work to design the world’s fastest GPU-powered AI supercomputer and to deploy software that speeds the development of generative AI technologies.
Also during its re:Invent conference in Las Vegas, AWS announced its new Trainium2 artificial intelligence chip and the general-purpose Graviton4 processor.
“AWS and NVIDIA have collaborated for more than 13 years, beginning with the world’s first GPU cloud instance,” said Adam Selipsky, CEO of AWS. “Today, we offer the widest range of NVIDIA GPU solutions for workloads including graphics, gaming, high performance computing, machine learning, and now, generative AI. We continue to innovate with NVIDIA to make AWS the best place to run GPUs, combining next-gen NVIDIA Grace Hopper Superchips with AWS’s powerful Elastic Fabric Adapter (EFA) networking, EC2 UltraClusters’ hyper-scale clustering, and Nitro’s advanced virtualization capabilities.”
“Generative AI is transforming cloud workloads and putting accelerated computing at the foundation of diverse content generation,” said Jensen Huang, founder and CEO of NVIDIA. “Driven by a common mission to deliver cost-effective, state-of-the-art generative AI to every customer, NVIDIA and AWS are collaborating across the entire computing stack, spanning AI infrastructure, acceleration libraries, foundation models, to generative AI services.”
More specifically, the two companies described the expanded collaboration as follows:
- AWS will be the first cloud provider to bring NVIDIA GH200 Grace Hopper Superchips with new multi-node NVLink technology to the cloud. The NVIDIA GH200 NVL32 multi-node platform connects 32 Grace Hopper Superchips with NVIDIA NVLink and NVSwitch technologies into one instance. The platform will be available on Amazon Elastic Compute Cloud (Amazon EC2) instances connected with Amazon’s powerful networking (EFA), supported by advanced virtualization (AWS Nitro System) and hyper-scale clustering (Amazon EC2 UltraClusters), enabling joint customers to scale to thousands of GH200 Superchips.
- NVIDIA and AWS will collaborate to host NVIDIA DGX Cloud, NVIDIA’s AI-training-as-a-service, on AWS. It will be the first DGX Cloud featuring GH200 NVL32, providing developers the largest shared memory in a single instance. DGX Cloud on AWS will accelerate training of cutting-edge generative AI and large language models that can reach beyond 1 trillion parameters.
- NVIDIA and AWS are collaborating on Project Ceiba to design the world’s fastest GPU-powered AI supercomputer – an at-scale system with GH200 NVL32 and Amazon EFA interconnect, hosted by AWS for NVIDIA’s own research and development team. This first-of-its-kind supercomputer – featuring 16,384 NVIDIA GH200 Superchips and capable of processing 65 exaflops of AI – will be used by NVIDIA to propel its next wave of generative AI innovation.
- AWS will introduce three additional Amazon EC2 instances: P5e instances, powered by NVIDIA H200 Tensor Core GPUs, for large-scale and cutting-edge generative AI and HPC workloads; and G6 and G6e instances, powered by NVIDIA L4 GPUs and NVIDIA L40S GPUs, respectively, for a wide set of applications such as AI fine-tuning, inference, graphics, and video workloads. G6e instances are particularly suitable for developing 3D workflows, digital twins, and other applications using NVIDIA Omniverse, a platform for connecting and building generative AI-enabled 3D applications.
- In addition, NVIDIA announced software on AWS to boost generative AI development. The NVIDIA NeMo Retriever microservice offers new tools to create highly accurate chatbots and summarization tools using accelerated semantic retrieval. NVIDIA BioNeMo, available now on Amazon SageMaker and planned for NVIDIA DGX Cloud on AWS, enables pharmaceutical companies to speed drug discovery by simplifying and accelerating the training of models using their own data. The companies said that NVIDIA software on AWS is helping Amazon bring new innovations to its services and operations. AWS is using the NVIDIA NeMo framework to train select next-generation Amazon Titan LLMs. Amazon Robotics has begun leveraging NVIDIA Omniverse Isaac to build digital twins for automating, optimizing, and planning its autonomous warehouses in virtual environments before deploying them into the real world.
More information is available at nvidianews.nvidia.com/ and at aws.amazon.com.
George Winslow is the senior content producer for TV Tech. He has written about the television, media and technology industries for nearly 30 years for such publications as Broadcasting & Cable, Multichannel News and TV Tech. Over the years, he has edited a number of magazines, including Multichannel News International and World Screen, and moderated panels at such major industry events as NAB and MIP TV. He has published two books and dozens of encyclopedia articles on such subjects as the media, New York City history and economics.