
Microsoft working on Singularity, a new AI infra with 1 lakh GPUs and 692GB RAM


Imagine an artificial intelligence (AI) infrastructure that interconnects thousands of graphics processing units (GPUs) and AI accelerators, all working together cohesively to reduce wasted effort. Every device within the infrastructure is part of its mainframe, which Microsoft says ensures that devices are utilised to their full potential.

The scenario sounds as if it came straight out of a sci-fi movie such as I, Robot, in which a central AI computer called VIKI (Virtual Interactive Kinetic Intelligence) controls all the devices (robots) in the city.

Microsoft’s Azure and research teams are working together on this AI infrastructure service, codenamed Singularity. The company has also posted job openings for the group, describing the project as “an AI service that will become a major driver for AI, both inside Microsoft and outside.”


The technological singularity, a term unrelated to Microsoft’s new AI infrastructure, refers to a hypothetical point in time at which the growth of technology becomes uncontrollable and irreversible, resulting in unforeseeable changes to human civilisation. In this scenario, the advancement of technology would lead to a powerful superintelligence that surpasses human intelligence.

In a recent paper titled “Singularity: Planet-Scale, Preemptible and Elastic Scheduling of AI Workloads”, researchers at Microsoft describe the infrastructure as a way to reduce the cost of AI when “computing at a global scale”.

According to the research paper, “Singularity is a fully managed, globally distributed infrastructure service for AI workloads at Microsoft, with support for diverse hardware accelerators. Singularity is designed from the ground up to scale across a global fleet of hundreds of thousands of GPUs and other AI accelerators.” 


The company plans to bring down the costs of AI implementation by “maximising the aggregate useful throughput on a given fixed pool of capacity of accelerators at planet scale while providing stringent SLAs (service level agreements) for multiple pricing tiers.” 

The infrastructure will be able to reallocate capacity among workloads by “elastically scaling down or preempting training jobs”. Ordinarily, a great deal of effort is wasted when a system has to restart a job from scratch following a failure. Singularity, by contrast, can resume a job from the point at which it was interrupted. This, Microsoft says, can greatly reduce wasted effort, especially for deep neural network training jobs.
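
To make the idea of resuming a job concrete, here is a minimal Python sketch of checkpoint-and-resume. It is an illustration only and not Microsoft's mechanism: the names `run_job`, `save_checkpoint` and `preempt_at` are hypothetical, the "training step" is a stand-in sum, and the job saves its own state to a JSON file at fixed intervals.

```python
import json
import os
import tempfile


def save_checkpoint(state, path):
    """Write the state to a temp file, then rename it so a crash never leaves a half-written checkpoint."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def run_job(total_steps, checkpoint_path, checkpoint_every=10, preempt_at=None):
    """Run a toy training loop, resuming from checkpoint_path if it exists.

    Returns (state, steps_executed_in_this_run, finished).
    preempt_at is a hypothetical knob that simulates the scheduler
    interrupting the job once it has completed that many steps.
    """
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)          # resume where the last checkpoint left off
    else:
        state = {"step": 0, "value": 0}   # no checkpoint yet: start from scratch

    executed = 0
    while state["step"] < total_steps:
        if preempt_at is not None and state["step"] == preempt_at:
            # Simulated preemption: progress since the last checkpoint is lost.
            return state, executed, False
        state["step"] += 1
        state["value"] += state["step"]   # stand-in for one real training step
        executed += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(state, checkpoint_path)
    return state, executed, True


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "ckpt.json")
    _, first_run, _ = run_job(50, path, preempt_at=25)   # interrupted after step 25
    _, second_run, done = run_job(50, path)              # resumes from step 20
    print(first_run + second_run, done)
```

In this toy run the job is interrupted after 25 steps and resumes from the checkpoint taken at step 20, so it executes 55 steps in total instead of the 75 it would need if the second attempt began from scratch. Only the five steps taken after the last checkpoint are repeated.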