You cannot automate a process you haven't first understood.

I had been wanting to build a structured sell-side research pipeline for over a year. The trigger was a simple frustration: seventeen browser tabs open, three broker notes partially read, an earnings call transcript running in the background — and I still couldn't tell you in one sentence what had actually changed for the company. The information was there. The synthesis wasn't.

The question I kept asking was: can Claude do this? The better question, I eventually realized, was: can I describe what "this" is precisely enough that anyone — or anything — could follow the instructions?

That realization is the starting point for everything that follows.

The Process Has to Exist Before It Can Be Automated

There's a principle in software engineering — old enough to be almost cliché — that goes: do one thing well. The Unix pipe philosophy. cat file | grep keyword | sort | uniq. The power isn't in any single command. It's in the composition — each stage does exactly one job, hands clean output to the next, and doesn't try to do more than it should.

When I started building this pipeline, I kept hitting the same wall. I couldn't instruct Claude on Stage 3 until I had understood Stage 2 completely. And Stage 2 kept breaking down until I was honest about what Stage 1 was actually producing.

Building the pipeline forced me to define the process. And defining the process forced me to understand what I actually valued in research.

That turned out to be the most useful thing about the exercise — not the pipeline itself, but the thinking it forced.

The Five Stages

The pipeline, as I now run it, has five stages. Each does one thing.

Stage 1: Intake. Scraping earnings call transcripts, broker notes, IR site announcements, and guidance updates. Claude can do this — but only if you've built a reliable scraper for each source. The first version of my scraper handled three companies. I now have it running on fifteen. Each new company IR site is a small engineering problem in itself.
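
Each per-site scraper boils down to the same shape: fetch, parse, normalize into a common document record. A minimal sketch of that adapter pattern, using only the standard library — the names (`IntakeDoc`, `parse_ir_page`) and the paragraph-only extraction are illustrative assumptions, not the actual scrapers:

```python
from dataclasses import dataclass
from html.parser import HTMLParser

@dataclass
class IntakeDoc:
    """Common record every source-specific scraper normalizes into."""
    company: str
    source: str
    url: str
    text: str

class ParagraphExtractor(HTMLParser):
    """Collects text inside <p> tags — enough for a first-pass scrape."""
    def __init__(self):
        super().__init__()
        self.in_p = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_p = False

    def handle_data(self, data):
        if self.in_p and data.strip():
            self.chunks.append(data.strip())

def parse_ir_page(company: str, url: str, html: str) -> IntakeDoc:
    """Turn one fetched IR page into the common intake record."""
    parser = ParagraphExtractor()
    parser.feed(html)
    return IntakeDoc(company=company, source="ir_site", url=url,
                     text="\n".join(parser.chunks))
```

Each new company then means writing (and maintaining) one more `parse_ir_page`-style adapter — which is exactly why every new IR site is its own small engineering problem.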

Stage 2: Classification. What is this document? Which quarter? Which company? Is this a management transcript or a broker note? A guidance revision or an analyst initiation? Classification sounds trivial. It isn't — especially when a single broker publishes twelve documents on the same company in a month.

Stage 3: Extraction. This is where Claude earns its place. Given a classified document, it pulls the structured information I care about: revenue guidance versus actuals, gross margin commentary, management language around demand signals, any notable changes in tone. The output is a structured JSON — not prose.

Stage 4: Synthesis. This is the hardest stage. Given extractions from Stage 3 across five quarters and three brokers, what has actually changed? What does the broker consensus assume that differs from what management is signaling? Synthesis requires a thesis. Claude can only compare against what you tell it to compare against — it has no prior context unless you give it some. In practice, I've found that writing a one-paragraph "current thesis" for each company and feeding it in with every synthesis request produces dramatically better output.

Stage 5: Flagging. What needs human judgment? A guidance miss of 2% is noise. A CEO who used the word "challenging" three times in a single paragraph when they've never used it before — that needs a human. I've built a simple set of flags for material changes in language and significant deviations from prior guidance. Everything else gets deprioritized.
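
The flag set itself is deliberately simple. A sketch of the two rules described above — the 5% threshold, the three-occurrence cutoff, and the function names are illustrative assumptions:

```python
import re

def flag_guidance(prior: float, actual: float, threshold: float = 0.05) -> bool:
    """Flag only deviations beyond a noise threshold — a 2% miss passes silently."""
    if prior == 0:
        return actual != 0
    return abs(actual - prior) / abs(prior) > threshold

def flag_novel_language(paragraph: str, word: str, history: list) -> bool:
    """Flag a word used repeatedly now that never appeared in prior transcripts."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    count_now = len(pattern.findall(paragraph))
    count_before = sum(len(pattern.findall(h)) for h in history)
    return count_now >= 3 and count_before == 0
```

The point of keeping these as plain deterministic rules rather than model judgments is auditability: when a flag fires, I can say exactly why.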

How It Works, Precisely

Stage 3 (extraction) works because I stopped asking Claude to summarize and started asking it to fill a schema. The distinction sounds minor. It isn't. A summary is prose — variable in structure, inconsistent across documents, impossible to compare programmatically. A schema forces the output into predictable fields: management_revenue_guidance, gross_margin_commentary, demand_signal_language, tone_flags. When Claude fills the same schema against every document, the extractions from five consecutive earnings calls line up column by column. That's what makes Stage 4 possible. JSON output, not prose, is what lets the pipeline actually pipeline.
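
The schema can be sketched as a small dataclass — the field names follow the ones above, but the validation helper and its behavior (unknown keys silently dropped) are my illustrative assumptions, and the Claude API call itself is omitted since the point is only that every document fills the same structure:

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class Extraction:
    """One extraction record — identical shape for every document."""
    management_revenue_guidance: str = ""
    gross_margin_commentary: str = ""
    demand_signal_language: str = ""
    tone_flags: list = field(default_factory=list)

def parse_extraction(raw_json: str) -> Extraction:
    """Validate model output against the schema; unknown keys are dropped."""
    data = json.loads(raw_json)
    allowed = Extraction.__dataclass_fields__.keys()
    return Extraction(**{k: v for k, v in data.items() if k in allowed})

def to_json(e: Extraction) -> str:
    """Serialize back to JSON for storage and Stage 4 input."""
    return json.dumps(asdict(e))
```

Because every record has the same fields, lining up five quarters of extractions is a loop over attributes, not a prose-comparison problem.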

The synthesis prompt (Stage 4) has a specific structure that took several iterations to get right. It opens with the current investment thesis — one paragraph, written by me, updated whenever my view materially changes. Then it feeds in the extraction outputs from the last four to six quarters, followed by the most recent broker note extractions. The instruction to Claude is precise: "What has changed materially against the thesis? Where does the data support it, and where does it challenge it? Flag anything that should change my view." Claude doesn't decide whether the thesis should change — that's my call. It surfaces the comparison. The quality of that comparison is almost entirely a function of how well-specified the thesis paragraph is. A vague thesis produces a vague synthesis.
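
Assembling that prompt is mechanical once the pieces exist. A sketch, with the instruction text quoted from above and the helper name and section labels as my own assumptions:

```python
def build_synthesis_prompt(thesis: str,
                           quarterly_extractions: list,
                           broker_extractions: list) -> str:
    """Assemble the Stage 4 prompt: thesis first, then extractions, then the ask."""
    parts = [
        "CURRENT THESIS:\n" + thesis.strip(),
        "QUARTERLY EXTRACTIONS (oldest first):",
        *quarterly_extractions,
        "RECENT BROKER NOTE EXTRACTIONS:",
        *broker_extractions,
        ("What has changed materially against the thesis? Where does the "
         "data support it, and where does it challenge it? Flag anything "
         "that should change my view."),
    ]
    return "\n\n".join(parts)
```

Putting the thesis first is deliberate: everything Claude reads afterward is read against it.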

The classifier (Stage 2) was the most technically interesting problem to solve. Broker notes don't announce what they are. An initiation of coverage looks like a routine update until you read the first paragraph. A guidance revision looks like a quarterly note until you check the date. I ended up building a two-pass approach: the first pass extracts stated metadata (company name, quarter references, broker name, document date); the second pass runs a semantic check for signals that distinguish document types — language patterns characteristic of initiations versus updates versus thesis revisions. Documents that fall below a confidence threshold on the second pass go into a human review queue rather than continuing downstream. The queue is small now — two or three documents a week at most — but building that fail-safe before I trusted any downstream output was not optional. The whole pipeline's integrity depends on Stage 2 being right.
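
The control flow of that two-pass design can be sketched in a few lines — the metadata regex, the 0.8 confidence threshold, and the stubbed semantic scorer are all illustrative assumptions standing in for the real second pass:

```python
import re

REVIEW_QUEUE = []  # documents held for human review instead of flowing downstream

def pass_one_metadata(text: str) -> dict:
    """First pass: pull stated metadata (here, just a quarter reference)."""
    quarter = re.search(r"\bQ([1-4])\s*(?:FY)?(\d{2,4})\b", text)
    return {"quarter": quarter.group(0) if quarter else None}

def classify(text: str, semantic_score_fn):
    """Second pass: semantic label + confidence; low confidence gets queued."""
    meta = pass_one_metadata(text)
    label, confidence = semantic_score_fn(text)
    if confidence < 0.8:  # assumed threshold
        REVIEW_QUEUE.append({"meta": meta, "text": text})
        return None       # held for human review, nothing continues downstream
    return label
```

The essential property is the `None` path: a low-confidence document stops moving rather than being extracted against the wrong template.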

What I Got Wrong (And What Fixed It)

My first version tried to do all five stages in one prompt. The output was unusable. The mistake was thinking that Claude's capability was the constraint. It wasn't. My problem specification was the constraint.

Stage 4 (synthesis) was the specific failure point. I was asking Claude to synthesize without a thesis. What I got back were accurate summaries — which I could have produced myself by reading more carefully. What I needed were comparisons: to my prior view, to consensus, to what management had said four months ago.

I also underestimated the importance of Stage 2 (classification). If a broker note gets misclassified — labeled as a quarterly update when it's actually an initiation of coverage — Stage 3 will extract numbers against the wrong template. Garbage in, garbage out. I spent three weeks building a robust classifier before I was willing to trust Stage 3's output.

The version I run today is the sixth iteration. The first five were learning.

What the Pipeline Can't Do

It can't catch what you didn't ask for. If there's a subtle regulatory risk buried in paragraph fourteen of a management transcript — and you haven't told the pipeline to look for regulatory language — it won't surface it. The pipeline is only as good as the questions it has been told to ask.

It has no variant perception. It can tell you what the consensus says and what management says. It can't tell you whether either of them is wrong. That's still your job.

And it doesn't know what you know. If you spent forty-five minutes on a call with the CFO last month and came away with a specific concern about working capital discipline — that context lives in your head, not in the pipeline. The synthesis in Stage 4 is only as good as the thesis you bring to it.

Where I've Landed

The pipeline doesn't replace research. It replaces the parts of research that were never really research — the administrative labor of locating, organizing, and summarizing documents that were always going to say roughly the same thing.

What it freed up was time and attention for the parts that actually matter: the call with management, the cross-portfolio pattern you notice after reading three companies in the same vertical, the hunch worth chasing.

The seventeen browser tabs are down to four. That felt like the win.