Building an MCP server for document processing is not technically hard. The framework handles the protocol. The SDK handles the schema generation. A working server with two or three tools can be put together in an afternoon.

The hard part isn’t the code. It’s the schema.

A schema for lease abstraction isn’t just a list of fields. It’s a theory of what matters in a lease: which clauses are always present and which are optional, how renewal options interact with base rent, and what “rent escalation” means when a lease has stepped increases versus CPI adjustments versus percentage rent. It has to say how to handle leases where the commencement date is defined by a future event rather than a fixed calendar date. And it has to say what to do when a landlord-favorable clause is absent: is it simply missing from the document, or does its absence mean something?
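To make that concrete, here is a minimal sketch of how a few of those decisions might look once they are written down as types. Every name in it (`LeaseAbstract`, `ClausePresence`, `review_flags`, the individual fields) is hypothetical and chosen for illustration; it is not taken from any real product or SDK, and a real schema would carry far more fields and edge cases. The split between `NOT_FOUND` and `CONFIRMED_ABSENT` is one possible way to model the "missing versus meaningful absence" question, not the only one.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import Optional, Union


class ClausePresence(Enum):
    """Three states, so 'we did not find it' is never confused with 'it is not there'."""
    PRESENT = "present"
    CONFIRMED_ABSENT = "confirmed_absent"  # lease was reviewed; the clause is not in it
    NOT_FOUND = "not_found"                # extraction could not locate the clause


# "Rent escalation" is modeled as one of several shapes, not a single number.
@dataclass(frozen=True)
class SteppedIncrease:
    steps: tuple[tuple[date, float], ...]  # (effective date, new annual base rent)


@dataclass(frozen=True)
class CpiAdjustment:
    index_name: str
    floor_pct: Optional[float] = None
    cap_pct: Optional[float] = None


@dataclass(frozen=True)
class PercentageRent:
    breakpoint_sales: float
    rate_pct: float


RentEscalation = Union[SteppedIncrease, CpiAdjustment, PercentageRent]


# Commencement is either a calendar date or a description of a future event.
@dataclass(frozen=True)
class FixedCommencement:
    on: date


@dataclass(frozen=True)
class EventCommencement:
    trigger: str
    outside_date: Optional[date] = None


Commencement = Union[FixedCommencement, EventCommencement]


@dataclass(frozen=True)
class LeaseAbstract:
    commencement: Commencement
    escalations: tuple[RentEscalation, ...]
    clauses: dict[str, ClausePresence] = field(default_factory=dict)


def review_flags(lease: LeaseAbstract) -> list[str]:
    """Return the items an analyst should look at before trusting the abstract."""
    flags: list[str] = []
    if isinstance(lease.commencement, EventCommencement):
        flags.append("commencement depends on event: " + lease.commencement.trigger)
    for name, presence in lease.clauses.items():
        if presence is ClausePresence.NOT_FOUND:
            flags.append("clause not located: " + name)
    return flags


lease = LeaseAbstract(
    commencement=EventCommencement(trigger="delivery of premises"),
    escalations=(CpiAdjustment(index_name="CPI-U", floor_pct=2.0, cap_pct=5.0),),
    clauses={
        "renewal_option": ClausePresence.PRESENT,
        "landlord_termination_right": ClausePresence.NOT_FOUND,
    },
)
print(review_flags(lease))
```

Nothing in this sketch is technically difficult. The judgment sits in the choices it encodes: that escalation needs three variants rather than one field, that commencement can be an event, and that absence needs more than a boolean.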

These questions don’t have answers in documentation. They have answers in the heads of people who have abstracted thousands of leases. The schema captures that knowledge. The schema is the product, in a meaningful sense — the code that executes against it is almost interchangeable.

This is why domain-specific tools are hard to replicate quickly. A competitor can see your interface and copy your tool structure in a weekend. Reverse-engineering the schema — understanding why you extract these specific fields in this specific way, why some edge cases have dedicated handling and others are collapsed into a general field — takes much longer. The schema embeds accumulated judgment that isn’t visible from the outside.

It also means the schema gets better over time in ways that compound. Every lease with an unusual structure teaches you something. Every analyst complaint about a missing field or a misclassified clause becomes a schema revision. The tool that’s been through more documents has a more refined theory of what documents contain.

The code is the easy part. Build the schema like it’s the product.