SQL query logs hold the context AI agents need to stop hallucinating joins
DataHub's new Context Intelligence layer mines SQL query history to build a semantic index, giving AI agents the context they need to avoid incorrect answers.

When Miro's data team pointed AI agents directly at its Snowflake environment, the agents got the wrong answer more than 65% of the time. The problem wasn't the model — it was context. With more than 10,000 tables and no semantic layer to guide routing, the agents had no way to know which data assets matched which business questions.
DataHub is releasing a context layer, which it calls Context Intelligence, that mines existing SQL query history to build a semantic index and exposes that index to agents via MCP (Model Context Protocol), LangChain, Google's Agent Development Kit, and CrewAI. It is built on the same query-log infrastructure DataHub already uses for lineage tracking in production deployments worldwide. "For the first time, enterprises can turn years of analyst query history into a living, retrievable knowledge base where agents stop hallucinating joins because they have access to the joins that have worked before, validated by the people who ran them," Shirshanka Das, co-founder and CTO of DataHub, told VentureBeat in an exclusive interview.
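The core idea, mining past queries for the joins analysts actually used, can be illustrated with a small sketch. This is a hypothetical example, not DataHub's implementation or API: the names `mine_joins` and `suggest_join` are invented for illustration, and the regex-based parsing is a simplifying assumption that only handles a single equality condition of the form `ON a.col = b.col`. A real system would need a full SQL parser.

```python
# Hypothetical sketch, not DataHub's API. Regex parsing is a simplifying
# assumption; it only handles "ON a.col = b.col" style equality joins.
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple

# Words that can follow a table name but are not aliases.
_KEYWORDS = r"(?:ON|JOIN|WHERE|LEFT|RIGHT|INNER|FULL|CROSS|GROUP|ORDER|LIMIT)\b"

# "FROM <table> [AS] <alias>" or "JOIN <table> [AS] <alias>"; alias is optional.
TABLE_ALIAS = re.compile(
    r"\b(?:FROM|JOIN)\s+([\w.]+)(?:\s+(?:AS\s+)?(?!" + _KEYWORDS + r")(\w+))?",
    re.IGNORECASE,
)

# Simple equality join condition: "ON <alias>.<col> = <alias>.<col>".
JOIN_COND = re.compile(r"\bON\s+(\w+)\.(\w+)\s*=\s*(\w+)\.(\w+)", re.IGNORECASE)

JoinIndex = Dict[Tuple[str, str], Counter]


def mine_joins(queries: Iterable[str]) -> JoinIndex:
    """Count how often each join condition appears across a query log."""
    index: JoinIndex = defaultdict(Counter)
    for sql in queries:
        # Map aliases (and bare table names) back to the full table name.
        aliases = {}
        for table, alias in TABLE_ALIAS.findall(sql):
            table = table.lower()
            aliases[table.split(".")[-1]] = table
            if alias:
                aliases[alias.lower()] = table
        for l_ref, l_col, r_ref, r_col in JOIN_COND.findall(sql):
            left = (aliases.get(l_ref.lower(), l_ref.lower()), l_col.lower())
            right = (aliases.get(r_ref.lower(), r_ref.lower()), r_col.lower())
            # Sort both sides so "a JOIN b" and "b JOIN a" share one entry.
            left, right = sorted([left, right])
            condition = f"{left[0]}.{left[1]} = {right[0]}.{right[1]}"
            index[(left[0], right[0])][condition] += 1
    return index


def suggest_join(index: JoinIndex, table_a: str, table_b: str) -> Optional[str]:
    """Return the most frequently used join condition between two tables, if any."""
    counts = index.get(tuple(sorted((table_a.lower(), table_b.lower()))))
    return counts.most_common(1)[0][0] if counts else None


if __name__ == "__main__":
    log = [
        "SELECT * FROM orders o JOIN customers c ON o.customer_id = c.id",
        "select c.name from customers c left join orders o on c.id = o.customer_id",
        "SELECT * FROM orders JOIN customers ON orders.customer_id = customers.email",
    ]
    print(suggest_join(mine_joins(log), "orders", "customers"))
    # prints: customers.id = orders.customer_id
```

Ranking by frequency is the simplest stand-in for "joins that have worked before"; on its own it says nothing about whether a join was correct, which is where the validation by the people who ran the queries, as Das describes it, would come in.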
DataHub began as a metadata management project at LinkedIn, built to solve two problems simultaneously: making data easy to find and use across the organization while ensuring it was only used for the right reasons. The primary use case in the years since has been lineage — understanding how data flows from operational systems through streaming infrastructure into warehouses and out to business tools. Miro, the digital collaboration platform, was already using DataHub for lineage tracking and impact analysis when it began testing analytics agents against its Snowflake environment.
Ronald Angel, product manager for Miro's data platform, told VentureBeat that the scale of the data estate became a problem immediately: sending natural-language queries directly to the Snowflake MCP server produced incorrect answers more than 65% of the time.
Data vendors, including Pinecone, Oracle, and Redis, already offer contextual memory capabilities, and on the platform side, Microsoft has built out Fabric IQ as a semantic layer for context. DataHub's argument isn't feature parity. The company is positioning its context layer as platform-neutral, provisioning context into existing endpoints like Snowflake semantic views and Microsoft Fabric IQ rather than replacing them.
Source: VentureBeat