Google Cloud announced the new A3 supercomputer virtual machine at Google I/O today. It is a purpose-built virtual machine designed to train and run AI models, including the models driving progress in generative AI.
Google Product Management Director Roy Kim and Corporate Product Manager Chris Kleban explained in a co-authored blog post that artificial intelligence and machine learning workloads require massive amounts of computing power from the underlying infrastructure. With the A3 supercomputers, Google Cloud combines Nvidia Corp.'s new H100 graphics processing units with its own networking advances, giving customers access to the most powerful GPUs for AI workloads.
A single A3 supercomputer VM is powered by eight H100 GPUs built on Nvidia's Hopper architecture, delivering three times the compute throughput of the previous-generation A100. It also offers 3.6 terabytes per second of bisectional bandwidth across those GPUs through NVSwitch and NVLink 4.0, and is paired with Intel Corp.'s 4th Gen Xeon Scalable processors to offload administrative tasks, the company said.
A3 instances also use Google's intelligent Jupiter data centre networking fabric, which can scale across 26,000 interconnected GPUs, allowing a deployment to deliver up to 26 exaFLOPS of AI performance. As a result, Google said, the A3 virtual machine will significantly reduce the time and cost of training large machine learning models. Additionally, as organisations move from training to serving their models, the A3 virtual machine can deliver up to 30 times the inference performance of the A2 virtual machine.
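To put those figures in perspective, the short Python sketch below works through the arithmetic they imply. It uses only the numbers quoted above (eight GPUs per VM, 26,000 GPUs, 26 exaFLOPS) and treats them as peak, vendor-stated values; the function names are illustrative and not part of any Google or Nvidia tooling.

```python
# Figures as quoted in the article; all are peak, vendor-stated numbers.
GPUS_PER_VM = 8        # H100 GPUs in a single A3 VM
MAX_GPUS = 26_000      # GPUs the Jupiter fabric is said to scale across
PEAK_EXAFLOPS = 26     # quoted peak AI performance at that scale


def vms_needed(gpus: int, gpus_per_vm: int = GPUS_PER_VM) -> int:
    """Number of VMs required to host `gpus` GPUs, rounding up."""
    return -(-gpus // gpus_per_vm)  # ceiling division on integers


def petaflops_per_gpu(exaflops: float, gpus: int) -> float:
    """Implied peak per-GPU throughput in petaFLOPS (1 exaFLOPS = 1,000 petaFLOPS)."""
    return exaflops * 1000 / gpus


if __name__ == "__main__":
    print(vms_needed(MAX_GPUS))                         # 3250
    print(petaflops_per_gpu(PEAK_EXAFLOPS, MAX_GPUS))   # 1.0
```

In other words, the quoted scale corresponds to roughly 3,250 A3 VMs, with each GPU contributing about one petaFLOPS at peak. Sustained performance on real workloads would be expected to be lower.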
Google will offer A3 in two ways: customers can either run it themselves or have it managed by Google. The do-it-yourself approach is to run A3 VMs on Google Kubernetes Engine (GKE) or Google Compute Engine (GCE), while the managed service runs A3 VMs on Vertex AI, the company's managed machine learning platform.
Customers who want a fully managed infrastructure for high-performance training can deploy A3 virtual machines on Google Cloud's Vertex AI platform to build machine learning models, while those who want to design their own custom software stack can deploy them on Google Compute Engine or Google Kubernetes Engine, the company said.