A VC said it well to me last week: "Our AI copilot is a bad intern. Polished output, wrong half the time, and I still have to check every page." He is not alone.
EY reports that 38% of private equity firms expect to spend more than half their total budget on AI in 2026. Forty-two percent are already directing more than a quarter of their business unit budgets at it. The money is flowing. The ambition is real. And the implementation reality across most of those firms is that the output is close enough to credible to be dangerous, and far enough from correct to be useless for any decision that matters.
This is what happens when a general-purpose copilot meets a domain that was never structured for general-purpose tools.
Private Markets Data Was Never Built for Generalists
Private markets data does not live in clean schemas. A fund with 40 portfolio companies receives information in 40 different formats. One founder sends a board deck as a PDF. Another sends a Google Sheets link. A third sends a two-paragraph email with the numbers buried in the middle. The cap table comes from Carta for some companies, Pulley for others, and a hand-built spreadsheet for the rest. Side letters arrive as scanned documents. Banking statements come from 15 different institutions with 15 different layouts. The identifiers are inconsistent. The reporting periods overlap or do not. There is no standard because there has never been a business reason for founders to care about one.
A general LLM dropped into this environment does what it was built to do: generate plausible-looking text from ambiguous inputs. That is exactly the wrong behavior. When a copilot cannot tell the difference between a convertible with a most-favored-nation clause and a standard SAFE, it will produce an answer that reads beautifully and is wrong. When it cannot distinguish between a fully diluted share count and a post-money share count, it will confidently pick one and never flag the uncertainty. It is a bad intern. Polished output. Wrong half the time. Zero self-awareness about which half.
The firms running into persistent errors are not using bad models. They are using the wrong shape of solution.
Purpose-Built Agents Are Scoped on Purpose
Purpose-built extraction agents work differently. They are scoped. A single agent does not try to "read the document." It tries to extract one specific field, on one specific document type, with a validation layer that checks the output against known constraints. An extraction agent reading for "liquidation preference multiple" on a stock purchase agreement knows what a liquidation preference looks like, where it tends to appear, what values are plausible, and how to flag a novel structure for human review. It cannot answer a question outside its scope, and that is the feature, not the limitation. The boundary is what makes it trustworthy.
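To make the idea of scope concrete, here is a minimal sketch of a single-field extractor in Python. It is an illustration of the pattern, not GoodStream's implementation: the regex stands in for whatever model a real agent uses, and the names (extract_liquidation_multiple, Extraction) and the plausibility bounds are assumptions chosen for the example.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical plausibility bounds for a liquidation preference multiple.
MIN_MULTIPLE = 0.5
MAX_MULTIPLE = 5.0

# Matches phrasing such as "1x non-participating liquidation preference".
# A regex is a stand-in here; a production agent would likely use a model.
PATTERN = re.compile(
    r"(\d+(?:\.\d+)?)\s*x\b[^.]{0,40}?liquidation preference",
    re.IGNORECASE,
)


@dataclass
class Extraction:
    field: str
    value: Optional[float]
    needs_review: bool
    reason: str


def extract_liquidation_multiple(doc_type: str, text: str) -> Extraction:
    """Extract one field from one document type, or flag for human review."""
    field = "liquidation_preference_multiple"

    # Scope check: refuse anything that is not the expected document type.
    if doc_type != "stock_purchase_agreement":
        return Extraction(field, None, True, "out of scope document type")

    values = {float(m) for m in PATTERN.findall(text)}
    if not values:
        return Extraction(field, None, True, "no recognizable structure")
    if len(values) > 1:
        return Extraction(field, None, True, "conflicting values")

    value = values.pop()
    # Constraint check: an implausible value is flagged, never silently kept.
    if not MIN_MULTIPLE <= value <= MAX_MULTIPLE:
        return Extraction(field, value, True, "implausible value")
    return Extraction(field, value, False, "ok")
```

The point of the sketch is that every path other than a single, plausible, in-scope value ends in a review flag rather than a guess.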
GoodStream runs 171 of these specialized extraction agents across the document types private capital firms actually receive. Each agent is trained for a specific field. Each output is source-traced to the page and paragraph it came from. Each extraction carries an accuracy score. When that score falls below a set threshold, the system flags the field for human review before anything syncs to the portfolio model. Accuracy currently runs above 97% before any human review, and every flagged case feeds back into the next iteration.
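A rough sketch of that gating step, in Python, might look like the following. The record layout, the route function, and the 0.90 threshold are all hypothetical and chosen only for illustration; the article does not state what threshold or schema GoodStream uses.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed review threshold, for illustration only.
REVIEW_THRESHOLD = 0.90


@dataclass(frozen=True)
class TracedExtraction:
    company: str
    field: str
    value: str
    score: float    # per-extraction score between 0 and 1
    page: int       # source trace: page the value came from
    paragraph: int  # source trace: paragraph on that page


def route(
    extractions: List[TracedExtraction],
    threshold: float = REVIEW_THRESHOLD,
) -> Tuple[List[TracedExtraction], List[TracedExtraction]]:
    """Split extractions into (ready to sync, held for human review)."""
    to_sync: List[TracedExtraction] = []
    to_review: List[TracedExtraction] = []
    for item in extractions:
        if item.score >= threshold:
            to_sync.append(item)
        else:
            to_review.append(item)
    return to_sync, to_review


batch = [
    TracedExtraction("ExampleCo", "liquidation_preference_multiple", "1.0", 0.99, 12, 3),
    TracedExtraction("ExampleCo", "fully_diluted_shares", "10482113", 0.71, 4, 1),
]
synced, held = route(batch)
```

In this example the share count is held back with its page and paragraph attached, so a reviewer can go straight to the source rather than re-reading the whole document.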
This architecture addresses the failure mode sitting underneath the EY spending numbers. AI multiplies the operational weakness of the data it sits on. A general LLM on inconsistent fund data multiplies the inconsistency. A domain-bounded agent on the same data reduces the inconsistency by forcing it through a structured extraction with validation and traceability.
Generalists in Specialist Jobs Produce Confident Garbage
A copilot is a generalist. A generalist in a specialist's job produces confident garbage. That is what is actually happening inside the "AI budgets" that are about to grow. Firms buying general copilots and expecting purpose-built output are buying a bad intern and paying a full-time salary for the review cycles.
The firms that are getting value from AI in 2026 are doing two things differently. They are choosing tools built for the document types and workflows of private capital, not repurposed consumer copilots. And they are building review and audit layers into the tooling from day one, because non-determinism without observability is how quiet data errors propagate into LP reports and secondaries pricing.
GoodStream enters General Availability next month. If you have been evaluating copilots and quietly wondering why the accuracy never quite gets where you need it, it is not your prompt engineering. It is the wrong shape of solution. We should talk.
See specialized extraction agents on your own documents. Book a demo.



