The Data Layer Hierarchy
There’s a hierarchy to data, and it matters for understanding where AI tools create the most value.
At the top: public data. It’s freely available, well-indexed, and anyone can access it. Academic papers, news articles, government datasets, aggregated market statistics. Tools built on public data compete on who can process it fastest and surface the most relevant results. The underlying data is the same for everyone — the differentiation comes from the interface, the synthesis, the workflow.
In the middle: licensed data. Private companies that have done the work of aggregating, cleaning, and standardizing data that would otherwise be fragmented. Property records, financial filings, commercial databases. Valuable, but only within its domain, and available to any competitor willing to pay for access.
At the bottom — deepest, and most valuable: your own private data. Documents your organization has created. Contracts you’ve signed. Analysis you’ve done. Records that exist nowhere else. Data that can’t be licensed because no one else has it.
The hierarchy matters because AI tools are being built at every layer, but tools at different layers are neither equally hard to build nor equally defensible.
A tool built on public data has a commoditization problem. The underlying data is available to everyone. Building a better interface creates temporary advantage — but if the tool is valuable, competitors will build similar interfaces, and the margin compresses toward the data licensing cost.
A tool built on licensed data has a distribution problem. You’re one of many paying for the same data. Your tool’s value depends on doing something with that data that competitors haven’t figured out yet. That window exists, but it closes.
A tool built on private data has a different problem entirely: adoption. You have to convince someone to give you access to their private data, which means building trust, demonstrating security, and proving the output is worth the exposure.
But once you clear the adoption hurdle, the defensibility is fundamentally different. Your tool improves as the user’s data grows. The insights are specific to their situation, not generic to the market. And the switching cost is high — leaving means leaving behind the accumulated context.
The practical implication: tools that work with your data are categorically different from tools that work with the world’s data.
A market intelligence tool pulls from licensed databases and surfaces what’s publicly knowable about a market. Useful, but the insight is available to anyone who uses the tool. You’re not seeing something your competitors can’t also see.
A tool that synthesizes your own documents — your deal history, your portfolio analysis, your due diligence notes — surfaces things that are specifically yours. The insight is actionable in a way that generic market intelligence can’t be, because it’s grounded in the specific context of your situation.
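The contrast can be made concrete. A minimal sketch, assuming a hypothetical private corpus of deal notes (documents, names, and query are all invented): even naive bag-of-words matching over your own records surfaces context no market feed contains, because the corpus itself is the differentiator, not the algorithm.

```python
import math
from collections import Counter

# Hypothetical private corpus: deal notes that exist nowhere else.
docs = {
    "deal_2021": "seller accepted earnout after diligence flagged churn risk",
    "deal_2022": "walked away when churn risk exceeded model threshold",
    "deal_2023": "closed at discount citing customer concentration",
}

def vectorize(text: str) -> Counter:
    """Bag-of-words term counts for a document or query."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    shared = set(a) & set(b)
    num = sum(a[t] * b[t] for t in shared)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def rank(query: str, corpus: dict) -> list:
    """Return document ids ordered by relevance to the query."""
    q = vectorize(query)
    return sorted(corpus, key=lambda d: cosine(q, vectorize(corpus[d])),
                  reverse=True)

print(rank("how did we handle churn risk", docs))
```

The ranking logic here is generic; any competitor could write it. What they cannot replicate is the corpus, which is the point the essay is making.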
This is why private data tools command higher trust requirements and higher prices. They’re not competing on access to information that’s available to everyone. They’re competing on the ability to make your information more useful to you.
There’s an architectural consequence too.
Tools built on public or licensed data can be deployed as SaaS with relatively few complications. The data lives on the vendor’s servers. The processing happens in the cloud. The output is a report or a dashboard.
Tools built on private data have a harder problem. The private data can’t always go to the vendor’s servers — security, compliance, and trust requirements may preclude it. The architecture has to accommodate data staying where it is, with processing happening locally or within the organization’s controlled environment.
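One way to sketch that constraint, under the assumption that only derived aggregates may cross the trust boundary (function names and the summary schema are illustrative, not any real product's API):

```python
import hashlib
from pathlib import Path

# Illustrative sketch: raw documents stay on local disk; only a
# derived, non-reversible summary ever leaves the organization.

def summarize_in_place(doc_dir: str) -> dict:
    """Compute aggregate stats locally; raw text never leaves this function."""
    total_words = 0
    fingerprints = []
    for path in sorted(Path(doc_dir).glob("*.txt")):
        text = path.read_text()
        total_words += len(text.split())
        # A content hash lets a vendor dedupe or sync state
        # without ever seeing the underlying content.
        fingerprints.append(hashlib.sha256(text.encode()).hexdigest()[:12])
    return {
        "doc_count": len(fingerprints),
        "total_words": total_words,
        "fingerprints": fingerprints,
    }

# Only this dict -- not the documents -- would be sent to a vendor service.
```

The design choice is the boundary itself: processing runs where the data lives, and the vendor's service consumes only the summary. That is what makes the architecture harder than ordinary SaaS, and why the essay calls it both a technical and a sales problem.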
This is a harder technical problem. It’s also a harder sales problem. And that’s exactly why the category is less crowded than it should be, given how much value lives there.
The data that matters most is the data no one else has. The tools that help you use it are the ones that are hardest to replace.