Persistent Systems’ Sameer Dixit on why data readiness alone fails in the age of Agentic AI
As enterprises race to adopt generative and agentic AI, many assume their data foundations are already in place. In reality, the gap between being data-ready and being ready for autonomous AI is widening.
In a conversation with TechCircle, Sameer Dixit, Senior Vice President, Engineering – Data/Analytics, AI/ML and Integration at Persistent Systems, explains why traditional notions of data readiness fall short, how ownership and semantics are becoming critical, and where enterprises should be cautious as AI systems move from analysis to action.
Edited Excerpts:
When you look at large enterprises today, what signs indicate that their data is ready for generative or agentic AI, and what signals show that it is not, even when governance requirements appear to be in place?
For years, the idea was simple: AI is only as useful as the data behind it. That remains true. But the way enterprises need to think about data has changed.
Traditionally, data readiness meant scale and hygiene. Data had to be clean, structured, governed, secure, and compliant. It was designed for people—analysts and engineers writing SQL or Python, querying databases, and using reports to make decisions. That foundation is still necessary, but it no longer addresses how modern AI systems operate.
Today, AI systems query data, learn from it, reason over it, and take action. In many cases, this happens with little or no human involvement. These systems decide what to query and what to do next. For that to work, data must be consumable by machines, not just accessible. This requires a semantic layer. AI agents do not understand tables, columns, or IDs. They operate in business terms: a claim, its status, and whether it has been approved.
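A minimal sketch of what one entry in such a semantic layer might look like, mapping business terms such as "claim status" onto physical storage so an agent never has to reason about raw tables and IDs. The table and column names below are illustrative assumptions, not a specific product's model:

```python
# Illustrative semantic-layer entry: maps the business concept "claim" onto
# physical tables and columns so an agent can ask for "claims awaiting
# approval" without knowing the warehouse schema. All names are hypothetical.
from dataclasses import dataclass

@dataclass
class SemanticField:
    business_name: str   # term the agent reasons in
    table: str           # physical location of the data
    column: str
    description: str     # helps the agent pick the right field

CLAIM_MODEL = {
    "claim_id": SemanticField("claim", "ins_claims", "clm_id",
                              "Unique identifier of an insurance claim"),
    "status": SemanticField("claim status", "ins_claims", "clm_stat_cd",
                            "Lifecycle state: filed, under review, approved, denied"),
    "approved": SemanticField("approved", "ins_claims", "clm_stat_cd",
                              "True when clm_stat_cd equals 'APPR'"),
}

def to_sql(business_term: str) -> str:
    """Translate a business term into the table.column the warehouse expects."""
    entry = CLAIM_MODEL[business_term]
    return f"{entry.table}.{entry.column}"

if __name__ == "__main__":
    print(to_sql("status"))  # -> ins_claims.clm_stat_cd
```

In practice this mapping would live in a data catalog or semantic-layer tool rather than in application code, but the translation it performs is the same.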
Even that is not enough. A semantic layer enables agents to interact with data, but not to act on it autonomously. Autonomous action requires a business context, clear ownership rules, and domain-specific constraints. Consider an insurance claim filed after heavy rain. A basic AI system can pull the policy, past claims, and weather data. It can summarize what happened. But it cannot approve the claim without understanding whether the damage qualifies as flood damage or regular rain damage, whether advisories were issued, or whether policy conditions were violated. Those decisions depend on domain rules, not just data.
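To see why the semantic layer alone is not enough, here is an illustrative sketch of the kind of encoded domain rules the claim example implies. The fields, rules, and thresholds are hypothetical, invented only to show where business knowledge enters the picture:

```python
# Hypothetical domain rules an agent would need before acting on the
# rain-damage claim described above. The fields and rules are invented
# for illustration, not an actual insurer's policy logic.
from dataclasses import dataclass

@dataclass
class ClaimContext:
    damage_type: str            # e.g. "flood" or "rain"
    flood_cover_included: bool  # does the policy cover flood damage?
    advisory_issued: bool       # was a weather advisory in effect?
    policy_conditions_met: bool # were the policyholder's obligations met?

def can_auto_approve(ctx: ClaimContext) -> tuple[bool, str]:
    """Apply encoded business rules; return a decision and the reason for it."""
    if ctx.damage_type == "flood" and not ctx.flood_cover_included:
        return False, "Flood damage is not covered by this policy"
    if not ctx.policy_conditions_met:
        return False, "Policy conditions were violated; route to a human adjuster"
    if ctx.damage_type == "flood" and ctx.advisory_issued:
        return False, "Advisory was in effect; requires manual review"
    return True, "Within encoded policy boundaries; safe to auto-approve"

print(can_auto_approve(ClaimContext("rain", False, False, True)))
```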
This reflects a broader shift: from data readiness, to data readiness for AI, and finally to knowledge readiness for AI. Data readiness remains the base. On top of that, data must be interpretable and reliable for AI systems. Beyond that, enterprises must encode business knowledge so agents can act within defined boundaries.
This shift also changes ownership. Data was once largely the responsibility of IT. Now, IT teams work alongside AI teams and domain experts. Domain teams must define and codify the rules that guide autonomous behavior. This is no longer a single-team effort.
Not every data asset needs this level of maturity. Only the data used for autonomous decision-making requires it. Treating all data the same leads to overinvestment, slow progress, and unclear returns. A better approach is to identify where agentic systems will operate, focus on those domains, and mature them accordingly.
The progression is clear: data readiness, AI readiness, and then knowledge readiness. That is the path to making autonomous AI systems work in practice.
Many organizations assume they are AI-ready after modernizing their data or moving to the cloud. Where does that assumption break down once AI agents enter the picture?
A modern data ecosystem is no longer optional. Data readiness now means building an environment that can support AI and knowledge systems end-to-end. That requires more than a traditional data warehouse. It requires governance, elastic cloud infrastructure, and the ability to run multiple workloads in the same environment.
Legacy on-prem systems designed for ETL and reporting cannot meet these needs. They were not built to support dynamic workloads or the tooling required today, including lineage. Lineage itself has changed. It is no longer something defined once in a static diagram. At runtime, data flows evolve continuously, often in ways teams no longer fully understand.
In many modernization efforts, the largest task is simply identifying what exists in the current data ecosystem. Over time, systems change, ownership shifts, and documentation falls behind. Even the people managing the platforms often lack a complete picture.
A modern cloud-based data ecosystem is necessary not only for analytics used by people, where scale, cost, compute, and governance already pose challenges, but also for emerging AI use cases. Fully autonomous agents may not yet be widespread, but AI-driven interaction with data already is.
Users increasingly expect to query data in natural language and receive clear answers and insights in return. Consumer tools have set that expectation, and enterprise systems are now being measured against it. Meeting that expectation requires a modern data foundation.
How does the meaning of data quality change when the primary consumer is no longer a human analyst, but an AI system making real-time decisions?
There are two issues at play. Humans rely on context without thinking about it. Take a simple case: you query a system for a list of employees and get five datasets. A human who works in the organization knows which one is correct. They know who owns it, when it was refreshed, and how it is used. An agent does not have that knowledge.
The agent has to infer it. It must examine metadata, ownership, freshness, and pipeline history. It needs to determine whether the data reflects today or yesterday. That decision depends on context, not just availability.
Data quality is only the starting point. The data must be complete and semantically correct, with accurate names, locations, and attributes. Beyond that, the agent must be able to identify which dataset should be exposed and queried. That requires semantic layers, context, and representations that make the data discoverable and distinguishable from similar datasets.
When the agent cannot proceed, it must explain why. It may find a dataset that is outdated, incomplete, unusable, or missing semantic context. At that point, a human needs to be brought in. The system must know when to escalate, how to escalate, and how to describe the problem clearly to the human.
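An illustrative sketch of that selection-and-escalation logic, assuming hypothetical catalog metadata such as certification flags and refresh timestamps:

```python
# Hypothetical sketch: pick the right "employees" dataset from several
# candidates using catalog metadata (owner, certification, freshness), and
# escalate with a clear reason when nothing qualifies. Fields and the
# 24-hour freshness threshold are invented for illustration.
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class DatasetMeta:
    name: str
    owner: str                  # accountable team
    last_refreshed: datetime
    certified: bool             # marked authoritative in the catalog

def pick_dataset(candidates: list[DatasetMeta], max_age_hours: int = 24):
    now = datetime.now(timezone.utc)
    usable = [d for d in candidates
              if d.certified and now - d.last_refreshed < timedelta(hours=max_age_hours)]
    if usable:
        # Prefer the most recently refreshed certified dataset.
        return max(usable, key=lambda d: d.last_refreshed), None
    # Cannot proceed: explain why, so a human can act on the escalation.
    reason = (f"No certified employee dataset refreshed in the last "
              f"{max_age_hours} hours; found {len(candidates)} candidates.")
    return None, reason

candidates = [
    DatasetMeta("hr.employees_v2", "HR Data",
                datetime.now(timezone.utc) - timedelta(hours=3), True),
    DatasetMeta("legacy.emp_dump", "Unknown",
                datetime.now(timezone.utc) - timedelta(days=90), False),
]
chosen, escalation = pick_dataset(candidates)
print(chosen.name if chosen else escalation)
```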
This shifts responsibility to the data platform itself. It must support this level of reasoning and decision-making, not just storage and access.
Traditional data architectures were built for predictable, linear pipelines. As systems evolve toward adaptive, agentic AI, they introduce uncertainty and nonlinear behavior. Is that a fair characterization?
Agentic systems show early warning signs when behavior becomes inconsistent. Different agents, or the same agent at different times, can reach conflicting conclusions on the same task. This points to unpredictability and signals that the system is not operating as intended.
Another signal is when agents stall, loop, or escalate unnecessarily. Repeated retries, frequent handoffs to humans, or agents getting stuck often indicate missing or inaccessible information needed to make a decision. Teams limit retries or force handoffs to prevent agents from cycling on the same problem, which reflects gaps in system design rather than isolated errors.
Semantic confusion is another source of failure. Terms such as “approved,” “priority,” or “exception” can carry different meanings across systems. Without a strong semantic layer, agents misinterpret context. These systems are designed to always produce an answer, even when they lack the information to do so. The issue is not only whether an answer is correct, but whether it is relevant.
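A small illustrative sketch of that problem: the same term resolved to different canonical meanings per source system, with the agent refusing to guess when no mapping exists. The system names and meanings are invented:

```python
# Hypothetical term mapping: the same word carries different meanings per
# source system, so agents need an explicit canonical vocabulary.
CANONICAL_MEANING = {
    ("claims_system", "approved"): "payment_authorized",
    ("underwriting", "approved"): "risk_accepted",       # same word, different meaning
    ("crm", "priority"): "high_value_customer",
    ("ticketing", "priority"): "sla_breach_imminent",
}

def normalize(source_system: str, term: str) -> str:
    """Resolve a source-specific term to its canonical business meaning."""
    try:
        return CANONICAL_MEANING[(source_system, term)]
    except KeyError:
        # Refusing is better than guessing: a plausible-but-wrong answer is
        # exactly the failure mode described above.
        raise ValueError(f"No canonical meaning defined for {term!r} in {source_system!r}")

print(normalize("claims_system", "approved"))
```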
A further indicator appears when humans routinely bypass agent decisions. Teams override agents not because of policy, but because they do not trust the context behind the recommendation. When this happens, the agent exists without real impact. Humans remain in the loop, and fully autonomous agents are rare.
These signals help assess agent performance. The core risk is not system failure, but incorrect output. Failures are visible. Wrong results are harder to detect.
Which parts of the enterprise data stack most often become bottlenecks when organizations first begin deploying AI agents at scale?
The first challenge is data readiness and ownership. Many organizations lack clear control over their data and have not defined semantic layers. Machines struggle to work directly with tables, schemas, and raw data structures. Semantic layers translate business concepts into data models that systems can understand. A simple request, such as identifying diabetic patients in a hospital, maps to codes and logic that require domain context. Without that semantic mapping, even basic queries become difficult. This is why semantic layers are becoming more common, whether through existing tools or in-house solutions, and why gaps here slow progress.
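As a rough illustration of that mapping, the business question "which patients are diabetic" might expand into diagnosis-code logic along these lines. The table and column names are assumptions, and the ICD-10 prefixes shown (E10 for type 1, E11 for type 2 diabetes) stand in for a code list a clinical team would need to validate:

```python
# Illustrative expansion of a business concept into query logic that only a
# domain expert could specify. Table and column names are hypothetical.
DIABETES_ICD10_PREFIXES = ("E10", "E11")  # type 1 and type 2 diabetes

def diabetic_patient_sql() -> str:
    """Expand the business concept into the SQL an agent could actually run."""
    codes = " OR ".join(f"diagnosis_code LIKE '{p}%'" for p in DIABETES_ICD10_PREFIXES)
    return f"SELECT DISTINCT patient_id FROM diagnoses WHERE {codes}"

print(diabetic_patient_sql())
```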
A second issue is what an agent actually requires. Agentic systems depend on managing context, memory, and state across interactions. These capabilities are not native to most data environments and require new components. Many organizations underestimate the effort needed to support this layer, which limits how well agents function in practice.
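A minimal sketch of the kind of state and memory layer being described, kept deliberately simple and with invented field names:

```python
# Hypothetical working state for a single agent run: the context it carries
# between steps, separate from the data it queries. Field names are invented.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class AgentState:
    task: str
    step: int = 0
    memory: list = field(default_factory=list)   # prior observations
    pending_escalation: Optional[str] = None      # reason, if the agent is stuck

    def observe(self, note: str) -> None:
        """Record what the agent learned so later steps keep the same context."""
        self.memory.append(note)
        self.step += 1

state = AgentState(task="summarize open claims")
state.observe("Found 3 candidate claim tables; chose the certified one.")
print(state)
```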
The third gap is the lack of a platform approach to AI. The AI landscape is changing quickly, and most use cases share common needs such as model deployment, code sharing, reuse, and governance. Without a platform, scaling and reuse are limited. A platform also enables composability, allowing organizations to mix models and tools rather than relying on a single stack. This matters because AI environments are increasingly hybrid, spanning multiple models, providers, and validation frameworks. A modular approach makes it possible to adapt as the ecosystem continues to change.
Based on your experience, which enterprise use cases are least suitable for agentic AI today, despite the hype around it?
The first AI opportunities for most companies are internal. Horizontal functions such as marketing, finance, HR, and talent are the easiest places to start. These use cases involve internal users, which lowers regulatory and litigation risk and keeps AI focused on automation and productivity. A second layer is AI tooling for internal business roles: systems that support daily work such as clinical research, patient analysis, or recruitment by bundling tools and agents to improve how those tasks are done.
The highest risk comes when AI is exposed directly to end customers, especially in regulated industries, and without human oversight. Another area enterprises should prioritize is citizen AI, where employees are given secure, governed environments to build simple AI agents without coding. This allows workers to solve routine problems on their own, surface recurring issues, and escalate proven solutions for wider adoption. Without this approach, many everyday problems remain unaddressed because they never reach the top of the priority list.

