Krisp.AI CEO: "The tension between human and AI is artificial, and the future is hybrid"

As enterprises accelerate the deployment of AI-powered voice systems, questions around audio quality, language barriers, and the role of human agents have moved to the centre of the conversation. TechCircle spoke with Davit Baghdasaryan, co-founder and CEO of Krisp.AI, about how the company evolved from a single noise-cancellation feature into a broader voice AI platform, its growing focus on India, and where it stands on the debate between human and automated voice agents.

Edited Excerpts: 

Krisp.AI began with software-only, real-time noise cancellation in 2017, before remote work became mainstream. What did you see then that others didn't?

We started the company in 2017, and the early ideas came together in 2016. I was working at Twilio, a large communications platform, and that naturally had me thinking about how communication could be improved.

There is a personal story behind why we chose this particular problem. I was living in San Francisco but travelling frequently to my homeland, Armenia, especially in the summers. Because of the time zone difference, I was joining morning calls from San Francisco while it was evening in Yerevan. I simply wanted privacy, a button I could press so that the surrounding noise would not be heard by my team. In a sense, that was already remote work. And at the time, it was impossible to do.

That personal need is where the idea came from. My co-founder, Arto, and I started working on it together. Back then, it was almost embarrassing to call it AI; it was machine learning, and the tooling for real-time machine learning on voice simply was not there. It was very difficult. It took us two years to productise the technology.

Once we did, we built an app that could work with Zoom and any other voice or video application. We launched it, and it was a significant success — that was around the 2018–2019 timeframe. Then, somewhat randomly, we noticed that many of our users were actually working remotely. We hadn't anticipated that. We adjusted our website and the application to match that workflow. About six months later, COVID hit, and remote work became universal. We had already repositioned before the wave arrived, which helped us grow considerably during that period.

Krisp.AI has since expanded well beyond noise cancellation into accent conversion, live translation, agent assist, and speech analytics. How do you ensure that this is a platform strategy rather than feature sprawl?

From the early days, people would tell us that noise cancellation is just a feature and ask what our platform story was. To be direct, that has been one of the hardest things for us to work through. Ultimately, it is a challenge every company faces.

The playbook we have developed is this: we identify hard technology problems, such as real-time voice AI, low latency, and small models. These are sophisticated problems. We build the technology, then we place it inside our own products. We have two user-facing products: a consumer product that anyone can use, and a call centre product. We integrate the technology there, collect feedback, and refine it. Then we make it available for others to embed into their own products.

We have hundreds of partners who have embedded our technologies, including Fortune 500 companies and, in some cases, Fortune 50 companies. I would expect that many of the apps a person uses on their phone contain a piece of Krisp inside them. The largest AI labs now come to us for specific technologies we have built.

One example is what we call voice isolation, a derivative of noise cancellation that focuses on amplifying the primary speaker while removing all secondary speakers from the audio. That turns out to be critical for AI voice bots. Without it, practically no one can go to production today. We estimate that we power around 60 percent of that traffic.

So the process is: build hard technology, put it into our own products, refine it, then extend it to others. That cycle gives us deep, real-world expertise in voice. And voice in the real world is genuinely complicated, with background noise, secondary speakers, poor microphones, low-quality audio codecs, thousands of accents, and hundreds of languages. We have become the company that handles real-world voice at that level of complexity. Because of that, companies come to us to solve problems they do not want to deal with themselves.

Accent conversion is one of the more discussed and arguably more sensitive features you offer. You are changing how a person sounds in real time. How do you think about the ethics of that?

We understand the sensitivity, but we hold a somewhat different view. We see this technology as something that improves understanding between people. There are hundreds of millions of people every day who are not understood well because of their accents. You could frame accent as a sub-case of language itself: people speak different languages, and if a technology can help them understand each other, there is real productivity value in that, as well as simply the ability to communicate more clearly.

Accent as a barrier has existed for thousands of years. We value accents; they represent diversity. But in B2B communication specifically, they can create friction. Removing that friction has clear value.

There is also a more direct case: the call centre. Human agents working in call centres face significant stress every day. Customers are often frustrated. And on top of that, agents are expected to modify or mask their accent throughout the working day, which adds cognitive load. When you give an agent a tool that removes that burden at the click of a button, they are more than happy. When we speak with them, they do not describe accent conversion as a sensitive matter. They are simply relieved that the technology can solve the problem. In practice, we have not encountered pushback from our customers or from human agents.

India is clearly a significant focus for Krisp.AI right now. You support 17 Indian languages and have recently appointed a chief growth officer specifically for India. But India's BPM industry is under pressure from automation and clients demanding AI-first delivery. Are you selling to an industry that is about to shrink?

India is the world's largest hub for global customer support, and voice remains a substantial part of that. Over time, though, voice work has moved out of India, partly to the Philippines and other countries, because of the accent barrier, among other things.

What this technology enables, and it is actually a term our customers came up with rather than one we invented, is what they call a "voice renaissance." Voice work is coming back to India because the barrier that existed before is no longer there.

And now consider live voice translation. This is another area where we are seeing strong adoption. I believe 2026 will be a significant year for that technology: it is working, it is in production, and it is already delivering measurable return on investment. For BPOs and BPMs, voice translation means they can take workloads that previously could only be served from within Europe, the US, or Latin America and move them to India. So what Krisp.AI is doing with accent conversion, noise cancellation, and voice translation is, in our view, creating jobs in India and bringing business there. We are not just saying that; we are seeing it happen.

That is why, although we already had customers in India, both directly and through US-based clients, we decided to establish operations there and brought Vibhar Nair on board as chief growth officer. The intent is to grow our presence significantly in the market. When I visited India a couple of months ago, the scale of voice communication there was striking. I do not have the precise numbers, but I sense that in terms of sheer volume of voice traffic, India may be number one in the world.

On the broader question of whether the industry is shrinking or transforming, it is clearly transforming. Indian BPO companies are adjusting to a new reality where AI agents will take on a portion of the workload. But our view, which is shaped by conversations with customers and industry analysts, is that the world will be hybrid. Routine calls will be handled by AI voice agents, and that is a good outcome, because those calls are also the most repetitive for agents. It frees up the human workforce to deal with more complex problems. In India specifically, the workforce is highly trained and technically skilled. Those capabilities will remain valuable. Human agents and voice AI agents will work together on the same platforms.

BFSI and healthcare are your priority verticals in India, both heavily regulated and, frankly, cautious about where their data goes. How does that conversation actually go when you approach a large bank or insurance company and propose letting AI handle their voice layer?

It is counterintuitive, but healthcare and BFSI are actually among the early adopters of these technologies. I do not claim to fully understand all the reasons, but one argument I would make is that these organisations care deeply about customer experience. If you have a banking customer, you want to give them a good experience, and you also want to manage costs. Those pressures push them toward technologies like accent conversion and voice translation.

Voice translation is a particularly clear case. If a bank or insurance company wants to serve customers across multiple regions of the world, building that capability through traditional staffing is expensive and slow. With Krisp, a single human agent who speaks primarily English can effectively communicate across 60 languages. That solves a significant scaling problem for them.

There is also a security dimension that helps our case considerably. For noise cancellation and accent conversion, the audio never leaves the agent's device; it does not pass through our cloud. That makes the security review straightforward. It largely removes the objection before it is raised. A bank or healthcare organisation does not face the same risk profile with on-device processing that they would with a cloud-based solution.

With digital sovereignty becoming a policy issue in India, not just a technical preference, is on-device processing a genuine differentiator for Krisp.AI, or is it becoming a standard expectation?

There are things you can do on the device, and there are things you cannot, at least not yet.

Noise cancellation and accent conversion are technologies where we have invested considerable effort in optimising our models to run on the kind of machines actually used in call centres, which are typically not powerful. We were able to make them work there, and that was a technical achievement that allowed us to scale in that environment.

Voice translation is a different problem. The models required are significantly larger, and at present, it is not feasible to run them on call centre devices. That processing happens in the cloud, which does attract more scrutiny and longer security reviews. There are some large banks and enterprises that are not yet ready to send customer audio to a third-party cloud. I expect that will change over time, partly because the value the technology delivers is substantial, and partly because we have deep experience in security — I was head of product security at Twilio before co-founding Krisp.AI, so this is not unfamiliar territory for us.

To answer the question directly: on-device is a genuine differentiator where it is achievable. It simplifies security significantly. But for certain workloads, particularly on the hardware found in call centres, it is simply not possible today.

On the broader human-versus-AI debate, many companies are racing toward fully autonomous AI voice agents with no human in the loop. Is Krisp.AI building toward a world with fewer human agents, or are you betting the human stays in the conversation?

We are in a fairly unusual position because we have large-scale traffic in both categories, human-to-human conversations and human-to-AI conversations. For AI voice assistants from some of the largest AI labs, we power a significant share of that infrastructure as well. So we are already serving both.

Our strategy and vision are to improve every voice interaction in the world, regardless of whether it is human-to-human or human-to-AI. Human-to-human communication is enormous, somewhere in the range of 30 to 50 trillion minutes per year globally. Human-to-AI voice communication is much smaller right now, perhaps 50 to 100 billion minutes per year, but growing quickly. From our standpoint, both are worth improving, and we have built tools for both.

I think the tension between human and AI agents is somewhat artificial. If you look at it from the perspective of the customer or the perspective of value creation, companies are deploying these technologies because the current experience is often poor and expensive. Technology is there to improve that experience and make it more affordable. When it is more affordable, you can serve more customers better. That is not a case of one side winning at the expense of the other.

We are not betting against human agents. We are a key technology provider to companies that rely on human agents, and we are also a key provider to companies building AI voice agents. The future, as we see it, is both operating together on the same platform.
