The External/Internal Divide

There’s a split running through almost every AI tool category right now, and it’s easy to miss because both types solve what looks like the same problem.

Call it the external/internal divide.

External-data tools pull from public sources: property databases, market reports, web indexes, court records, patent filings. They give you signal about the world. They’re valuable because aggregation and access are genuinely hard.

Internal-data tools work on your documents: the contracts you’ve signed, the emails you’ve sent, the PDFs in your inbox, the notes from last week’s meeting. They give you signal about your situation.

Both are AI tools. Both involve documents and queries and natural language interfaces. But they are solving fundamentally different problems, for different buyers, with different trust requirements.

The external/internal distinction doesn’t map cleanly onto “better” or “worse.” It maps onto who owns the value.

With external-data tools, the value is in the data aggregation. The AI interface is a commodity layer on top of a proprietary data moat. The company wins by having better data, more data, fresher data. The AI is almost incidental.

With internal-data tools, the data moat is yours. The user brings it. The AI has to earn its place by being genuinely useful on that data — not on a curated corpus, not on synthetic examples, but on the messy, inconsistent, incomplete documents real people actually have.

This is harder to build. It’s also harder to copy.

The trust profile is completely different too.

External-data tools require you to trust the source’s coverage and accuracy. You’re leaning on someone else’s index of the world. If their data is stale or incomplete, you don’t know until something goes wrong.

Internal-data tools require you to trust the tool with your data. That’s a much higher bar, especially in professional contexts where the documents are sensitive, deal-critical, or under NDA. The barrier to adoption is higher, but so is the switching cost once you’re in.

There’s a third category emerging that blurs this line: tools that combine both. Ingest your documents, then enrich with external signals. Pull the lease you uploaded and cross-reference it against public market rents.

These are interesting but hard. The data provenance gets complicated. The trust surface expands. The query interface has to be sophisticated enough to know when to use which source.

For now, most successful products pick one side and go deep. The external tools are faster to market and easier to demo. The internal tools are stickier and harder to replicate.

Know which one you’re building before you start explaining it to anyone.