Vinay Chhabra on AceCloud’s push for efficient AI inference in the next wave of AI

AI is no longer just a research experiment—it’s running in production at unprecedented speed. Workloads are shifting from training massive models to applying them in real-world tasks, while enterprises navigate GPU shortages, hybrid clouds, and sovereign data requirements. 

In a conversation with TechCircle, Vinay Chhabra, Co-founder and Managing Director of AceCloud, explains how the company is focusing on efficient inference and flexible GPU utilization as the foundation for the next phase of AI adoption.

Edited Excerpts: 

The pace of change has accelerated, especially around GPUs, as cloud and AI workloads move from experimentation to production. What has surprised you most about how this transition has unfolded?

What stands out to me is the speed at which AI is advancing. We did not expect it to move this fast. Almost every day, new AI and LLM models are being released by companies around the world.

Recently, we saw Gemini 3, followed by Mistral 3 from Europe. We are also seeing several models from China, such as DeepSeek and Qwen. This has led to strong competition across regions.

At the same time, AI workloads are shifting. Most large models have already been trained on large GPU clusters. As a result, infrastructure usage is moving from training to inference, where models are applied to real tasks. By 2030, inference is expected to account for about 90% of total AI workloads. This shift is happening quickly.

Another major change is the rise of agentic AI. The industry is moving from chatbot-style interactions to systems that can plan, reason, and execute tasks. Agentic AI represents a shift from advisory responses to direct task execution. Many organizations are adopting this approach to improve internal operations.

AI workloads were largely experimental for many enterprises. From your perspective, what has changed in how customers are now using AI in production?

When you train a model, the goal is to use it. As a result, workloads shift from training to inference.

Inference workloads include tasks such as video generation. For example, a marketing company may create personalized video messages where the person in the video addresses each customer by name and reflects that customer’s previous interactions with the company. These video generation tasks run on GPUs.

Other workloads include computer vision tasks such as object identification. In photography applications, software can identify a person’s face and find all images containing that person across large photo collections. Many similar computer vision workloads are emerging.
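
As a rough illustration of the kind of computer vision workload described above, the sketch below matches a reference face against a photo library using embedding vectors and cosine similarity. The `embed_faces` helper is a stand-in for whatever GPU-backed face-embedding model a photography application would actually run; it and the similarity threshold are assumptions for illustration, not a specific product.

```python
# Minimal sketch of face search across a photo library, assuming a
# GPU-backed embedding model is available behind embed_faces().
from pathlib import Path
import numpy as np

def embed_faces(image_path: str) -> list[np.ndarray]:
    """Hypothetical helper: detect faces in an image and return one
    embedding vector per face (normally produced by a CV model on GPU)."""
    raise NotImplementedError("plug in a real face-embedding model here")

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def find_matching_photos(reference_photo: str, library_dir: str,
                         threshold: float = 0.8) -> list[str]:
    """Return paths of photos that appear to contain the reference person."""
    reference = embed_faces(reference_photo)[0]       # one known face
    matches = []
    for path in Path(library_dir).glob("*.jpg"):
        for face in embed_faces(str(path)):           # every face in the photo
            if cosine_similarity(reference, face) >= threshold:
                matches.append(str(path))
                break                                 # one match per photo is enough
    return matches
```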

On GPUs such as the NVIDIA H200 and L40S, users also train smaller models. The H200 has around 140 GB of memory, which allows several smaller models to be trained directly on a single GPU.
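
A back-of-the-envelope calculation shows why a roughly 140 GB card can hold several smaller models during training. The 16-bytes-per-parameter figure below is a common rule of thumb for mixed-precision training with Adam (fp16 weights and gradients plus fp32 optimizer states); activations and batch size add more on top, so treat the numbers as indicative only.

```python
# Rough GPU memory estimate for mixed-precision training.
# Assumes ~16 bytes per parameter (fp16 weights + fp16 gradients
# + fp32 Adam moments); activations and batch size add more on top.
BYTES_PER_PARAM = 16
GPU_MEMORY_GB = 140  # approximate H200 capacity

def training_footprint_gb(num_params_billions: float) -> float:
    return num_params_billions * 1e9 * BYTES_PER_PARAM / 1e9

for size_b in (1, 3, 7):
    print(f"{size_b}B params -> ~{training_footprint_gb(size_b):.0f} GB")
# 1B -> ~16 GB, 3B -> ~48 GB, 7B -> ~112 GB: a few small models, or one
# mid-size model, can fit within ~140 GB before counting activations.
```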

Given the ongoing global GPU shortages, how are enterprises prioritizing compute today? How do they balance performance and availability, and what trade-offs are typically being made?

Yes, there is a GPU shortage due to heavy reliance on a single vendor, mainly NVIDIA. Demand is very high, and supply cannot keep up. However, some recent developments may ease the situation.

The Gemini 3 release shows that Google’s TPUs can support generative AI workloads effectively. Reports suggest Google plans to sell around 5 million TPUs by 2027, which could reduce some pressure. Large enterprises are also exploring alternatives for AI workloads. Amazon has introduced Trainium 3, another efficient accelerator.

Overall, more vendors are entering the space, though NVIDIA still dominates. The main impact of the shortage falls on training large LLMs, which requires significant GPU capacity; inference workloads are less affected because they need far fewer GPUs. As a result, training runs are delayed and costs are higher, with companies paying premiums to secure capacity. Training remains the area most affected by the shortage.

Do you think there are areas where enterprises still have hidden advantages? Where are enterprises still delivering strong value, and where are they reaching their limits?

NVIDIA GPUs provide value because they support multiple workloads on a single platform. They handle AI and machine learning tasks and computer vision workloads, as well as graphics workloads such as video games, movie rendering, and virtual desktop infrastructure. They also support ray tracing and a range of compute tasks, such as audio and video encoding and decoding.

Google’s TPUs are primarily designed for AI and machine learning workloads. NVIDIA GPUs, therefore, support a broader set of use cases, while TPUs remain in high demand due to the scale of AI and machine learning workloads.

The industry has changed significantly, with hyperscalers dominating the market. How has AceCloud decided what areas not to compete in, and where does it see opportunities to outperform?

We decided to focus on inferencing and computer vision workloads. AceCloud is not focused on large-scale GPU clusters for model training. About 90% of AI workloads are expected to be inferencing, with around 10% for training.

Our focus includes inferencing and computer vision use cases such as video generation, along with training for computer vision models and smaller language models.

Large GPU clusters for training face demand variability. Customers typically require large numbers of GPUs for short periods and then leave. This leads to irregular usage, high upfront investment, and frequent customer turnover.

Inferencing workloads provide more consistent demand, allowing better planning, scheduling, and utilization of GPUs without the need to continuously secure short-term training customers.

What factors lead your customers to adopt hybrid or multicloud environments rather than sticking with one provider?

Multicloud is now common. Some workloads run better on one provider, while others fit better on another. Costs also vary across providers.

Relying on a single provider creates risk. When a provider has downtime, all dependent workloads are affected. With a multicloud setup, workloads can be shifted to another cloud during an outage.

Cost pressure is another driver. Many organizations are moving some workloads from hyperscalers to on-premises or alternative providers to improve cost efficiency.

Multicloud also emerges organically. In many mid-size and large companies, different teams start using different cloud providers. Over time, their workloads grow, and the organization finds itself operating across multiple clouds. Moving everything to one provider often brings no clear benefit when existing setups are working.

In practice, multicloud is driven by workload needs, cost control, risk reduction, and independent decisions made by teams. It is already widespread and is likely to remain so.

We know sovereign cloud is often framed as a compliance requirement. What operational and economic challenges do customers often underestimate?

Sovereign cloud has become a requirement under the Digital Personal Data Protection (DPDP) Act. Certain types of data must remain within the country and cannot be transferred abroad. As a result, sovereign cloud adoption is no longer optional.

Domestic sovereign cloud providers are comparable in cost to hyperscalers, so customers benefit on both data residency compliance and cost.

In practice, hyperscalers may offer a wider range of services, while sovereign clouds operated by Indian providers offer fewer. However, most customers use only a limited subset of services—roughly 20% of services account for about 80% of usage. A large service catalog does not mean all services are actively used.

AceCloud covers the majority of common workloads. This makes it suitable for organizations prioritizing data sovereignty, cost management, and continuous support. In contrast, direct and immediate support is not always available with larger cloud providers.

Do local sovereign GPUs in India significantly reduce the country’s dependence on foreign AI infrastructure, or is global reliance still largely unavoidable?

There are two related issues to consider. The first is GPUs. India does not currently manufacture GPUs, which are the core hardware required for AI workloads. As a result, any large-scale AI infrastructure in India depends on imported GPUs.

This is where the concept of a sovereign cloud becomes relevant. In a sovereign cloud, GPUs are physically hosted within the country. This ensures that data processed on these systems does not leave national borders and allows organizations to meet government and regulatory compliance requirements related to data residency.

Globally, only a small number of countries produce GPUs. Most production is concentrated in the United States, with China also having manufacturing capabilities. Beyond these, there are few, if any, countries with meaningful GPU production, and any output elsewhere is limited to lower-end hardware.

Because of this global situation, a sovereign GPU cloud in India would still rely on GPUs supplied by US-based companies in the near term. However, with the government’s focus on semiconductor manufacturing and self-reliance, it is possible that domestic GPU development could emerge over time.

Looking ahead, what technologies is AceCloud investing in today that may not pay off immediately but are important for staying relevant in the coming years?

Our current investments focus on building an efficient inference system. This includes hosting multiple models on the same GPU and reallocating capacity when a workload becomes idle so it can be used by others.
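
A minimal sketch of that idea, assuming a PyTorch serving process: several models share one GPU, and a model that has been idle past a timeout is moved off the device so its memory can be reused by other workloads. Production inference servers do this with proper schedulers; the class names and timeout value here are illustrative, not AceCloud's actual system.

```python
# Sketch: co-host several models on one GPU and evict idle ones,
# assuming PyTorch models that fit together in GPU memory.
import time
import torch

class ModelSlot:
    def __init__(self, model: torch.nn.Module):
        self.model = model.eval()
        self.last_used = 0.0
        self.on_gpu = False

class SharedGpuHost:
    def __init__(self, idle_timeout_s: float = 300.0):
        self.slots: dict[str, ModelSlot] = {}
        self.idle_timeout_s = idle_timeout_s

    def register(self, name: str, model: torch.nn.Module) -> None:
        self.slots[name] = ModelSlot(model)

    def infer(self, name: str, batch: torch.Tensor) -> torch.Tensor:
        slot = self.slots[name]
        if not slot.on_gpu:
            slot.model.to("cuda")          # load on demand
            slot.on_gpu = True
        slot.last_used = time.time()
        with torch.no_grad():
            return slot.model(batch.to("cuda"))

    def evict_idle(self) -> None:
        """Move models idle past the timeout back to CPU so the freed
        GPU memory can be reallocated to other workloads."""
        now = time.time()
        for slot in self.slots.values():
            if slot.on_gpu and now - slot.last_used > self.idle_timeout_s:
                slot.model.to("cpu")
                slot.on_gpu = False
        torch.cuda.empty_cache()           # release cached blocks to the driver
```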

We are also focusing on batch workloads. When GPU capacity is available, it can be used for batch processing, which lowers costs by using idle GPU time.
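
One way to picture the batch idea: a small loop that checks current GPU utilization (here via NVIDIA's NVML Python bindings) and only pulls the next batch job off a queue when the card is mostly idle. The utilization threshold, polling interval, and job queue are assumptions for illustration rather than AceCloud's actual scheduler.

```python
# Sketch: run queued batch jobs only when the GPU is mostly idle,
# using NVML to read current utilization.
import time
from queue import Queue, Empty

import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

def gpu_mostly_idle(threshold_pct: int = 20) -> bool:
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)
    return util.gpu < threshold_pct

def run_batch_jobs(jobs: Queue, poll_interval_s: float = 10.0) -> None:
    """Drain the batch queue, but only start work while the GPU is idle."""
    while True:
        if gpu_mostly_idle():
            try:
                job = jobs.get_nowait()
            except Empty:
                break                      # nothing left to do
            job()                          # e.g. offline embedding or video rendering
        else:
            time.sleep(poll_interval_s)    # interactive traffic keeps priority
```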

These are the current priorities for AceCloud: efficient inference as a service, supported by configurations that improve utilization and reduce costs for users.
