TL;DR

Two months in, the gap between people who get a lot out of Claude (Cowork) and people who don’t isn’t just about prompts or workflow design. I think it is about giving Claude the things it would have remembered, if it could. Some of the things I’ve incorporated: a working folder it never leaves, a CLAUDE.md file it reads at the start of every session, skills that encode the work I do repeatedly, and a voice DNA file so the writing comes out like mine. Below is the toolkit I’ve ended up with, and the one thing I didn’t see coming: building all of this taught me more about how I think than it taught Claude about how to help me.

Claude is a colleague who forgets everything overnight. Brilliant, fast, broadly skilled. Reads what you put in front of it. Writes a draft, reviews a filing, explains a concept, runs a script. Then closes the laptop, and tomorrow morning has no memory that you exist.

The first month I worked with Claude (Cowork; note: I’ve since been spending time with Claude Code as well, where some of these issues are a bit redundant, but play on...), I was at times highly disappointed, frustrated and almost raging, all while having my mind routinely blown by what Claude actually produced. Jagged intelligence, jagged experience. Outputs that started strong drifted as projects went on. Each new conversation needed a re-introduction to the same problem. The fixes I asked for in one session would quietly come back undone in the next.

Then I realised the problem wasn’t Claude. The problem was that I had no system for remembering things on Claude’s behalf. A human colleague would have notes, a calendar, a workspace (and, of course, a brain that remembers what was explained in the past, hopefully ;)). Claude had none of that, and I was filling none of it in.

The second month was about building the system. A folder. A memory file. A growing set of skills. A voice file so the writing sounds like me. A failure log so the same mistake doesn’t happen twice. Each piece small. Together, the difference between asking Claude things and working with it.

What I didn’t expect was the side effect.

Writing all of this down, in the form a stateless colleague could read at any time, forced me to make explicit a lot of things I’d been holding loosely in my head. What I value in a research report. How I think a good email opens. Which thinkers I reach for when I write. Which mistakes I keep making. I was building infrastructure for Claude, but the artefact I ended up with was a clearer picture of how I actually work.

Below is the toolkit, in three layers. Setup, memory, discipline.

Layer 1: Setup — where Claude works

1. The working folder

The single most useful thing I did in week one was point Claude at one folder on my machine and say: this is where you work. Everything I generate, edit, scrape, or save lives somewhere under C:\Users\tgrka\CLAUDE GLOBAL FOLDER\. Subfolders for skills, scrapers, reports, drafts. Nothing strays.

The reason this matters is mundane and surprisingly load-bearing. When Claude can read and write inside one folder consistently, files persist across sessions, scripts find their inputs, and the same path I gave it last week still resolves today. When Claude is wandering across the filesystem on a per-task basis, things go missing or land in places I never look at again. A research report ends up in Downloads. A draft skill lands in Documents. A scraper writes its output to a temp directory and the next session can’t find it.

Pick the folder. Don’t move it.
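If you write your own helper scripts, the same rule can be enforced in code. What follows is a minimal sketch, not something from my actual setup: the root folder name and the function name are placeholders I made up for illustration. It resolves every path against one root and refuses anything that would land outside it.

```python
from pathlib import Path

# Hypothetical root for illustration; substitute the one folder you picked.
WORK_ROOT = Path("claude_workspace")


def resolve_in_workspace(relative: str, root: Path = WORK_ROOT) -> Path:
    """Return an absolute path under root, or raise if it would escape it."""
    root_abs = root.resolve()
    candidate = (root_abs / relative).resolve()
    # Path.is_relative_to is available from Python 3.9 onwards.
    if not candidate.is_relative_to(root_abs):
        raise ValueError(f"{relative!r} resolves outside {root_abs}")
    return candidate
```

A scraper that writes through resolve_in_workspace("reports/output.csv") cannot quietly drop its output in a temp directory, and a path like "../Downloads/report.docx" raises instead of succeeding.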

2. Pick the right model for the job

Claude isn’t one model; it’s a family. Sonnet for most things. Opus when the work needs more reasoning depth or nuance — investment analysis, voice writing, anything I’d describe as judgment-heavy. Haiku for fast, cheap, narrow tasks. Knowing which one I want before I start is a small habit and it changes the texture of the output more than people realise. Asking Sonnet to do Opus work produces something that looks fine and is subtly wrong; asking Opus to do Sonnet work is just slow and expensive.

I now ask the model selection question before anything non-trivial. Sonnet 4.6 or Opus 4.6? Two seconds of decision, hours of difference downstream.

Layer 2: Memory — what Claude knows about you

3. CLAUDE.md and persistent memory

The single most important file in my Claude setup is one called CLAUDE.md. It lives at C:\Users\tgrka\.claude\CLAUDE.md, and Claude Code reads it automatically at the start of every session. Inside it: who I am, what I do, my hard rules (never fabricate financial data, never use sub-agents for analysis, always ask which model), the paths I work in, the projects I have running, and a changelog of every meaningful behavioural decision I’ve made about how Claude should work for me.

Today the file is around 140 lines. It started at twelve. Each time something goes wrong, or something works and I want to keep it, a line gets added. The fabrication incident with Snowflake’s financials became Hard Rule 9: never spawn sub-agents. The misinterpretation of my "auto mode" instruction became Hard Rule 10: distinguish tool-internal errors from real permission prompts. The Pinegrow rationale for CSS variables, abandoned when I moved off Pinegrow, got rewritten in place.

This is the closest thing Claude has to persistent memory. It is also, more honestly, my own running document of what I’ve learned about working with an AI. The file outlives any individual conversation, and I am the only person who edits it.

4. Skills

A skill is the next layer of memory. CLAUDE.md is what Claude knows about me in general; skills are what Claude knows about specific kinds of work I do.

From Anthropic’s engineering team, here’s the simplest description:

“A skill is a directory that contains a SKILL.md file.”

— Anthropic engineering, "Equipping agents for the real world with Agent Skills"

The directory holds instructions, optional scripts, and reference material. The SKILL.md file describes what the skill does and when Claude should use it. Anthropic describes them as folders Claude can discover and load on demand for specific tasks.[1] Skills work the same way across Claude.ai, Claude Code, and the API.

What this means in practice: I can take a piece of work I do repeatedly — reading a 10-K, drafting an investment memo, writing a blog post in my voice — and turn the way I do it into a folder Claude reads. Next time I trigger that work, the folder loads automatically and Claude follows the instructions without me re-explaining.
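To make the "a skill is a directory with a SKILL.md" idea concrete, here is a small sketch that lists the skills in a folder. It only illustrates the directory convention from the quote above; it is not how Claude itself discovers or loads skills, and the function name is my own invention.

```python
from pathlib import Path


def discover_skills(skills_root: Path) -> dict[str, Path]:
    """Map skill name -> SKILL.md path for each direct subfolder that has one."""
    skills: dict[str, Path] = {}
    if not skills_root.is_dir():
        return skills
    for child in sorted(skills_root.iterdir()):
        manifest = child / "SKILL.md"
        # A folder without SKILL.md is just a folder, not a skill.
        if child.is_dir() and manifest.is_file():
            skills[child.name] = manifest
    return skills
```

Running this over my skills subfolder is a quick way to check that a new skill directory is actually shaped like a skill before I expect it to trigger.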

Over two months I’ve built about a dozen skills. They fall into three categories.

Investment-process skills. The work I do as a portfolio manager, codified.

SellSideInitiation
A five-phase pipeline that produces a closed-book fundamental analysis of a public company — business model, financial history, management delivery, earnings sentiment, competitive ecosystem — sourced entirely from documents I provide. Output is a 25-page Word document. Triggered by one phrase. The skill encodes which financial template applies to which business model type, what counts as a fabricated figure, and how the seven verification gates run.
investment-moats
Four frameworks for analysing competitive advantage — Helmer’s 7 Powers, Greenwald’s Earnings Power Value, Buffett-Munger mental models, and Porter’s Five Forces — written in the same schema. Triggers when I ask about a company’s moat or strategic position. Combines all four into a structured analysis.
Earnings Sentiment
Sixteen quarters of earnings call transcripts, read for tone shifts. Top-five positives, top-five concerns. Reading transcripts at this scale is something Claude is genuinely better at than I am.
Initiation Merge
A deterministic Python script that assembles outputs from the four pipeline stages into one report. Not an LLM — pure code, regex, and assertion checks. The merge step is where LLM creativity becomes a bug.

Voice and identity skills. Skills that capture me, not the work.

voice-dna
My writing voice across four formats (webpage, email, document, Substack) and a 1–10 professionalism dial. Banned phrases, banned structures, sentence rhythm targets, bold-emphasis density bands. Paired with a deterministic Python linter (voice_gate.py) that fails on the worst LLM tells.
user-profile
My identity, role, expertise, default file paths, and formatting preferences. Loaded at the start of every task so outputs are tailored to me without re-asking.

Engineering / dev tooling skills. The narrowest category, useful when I’m building.

Karthik_Coding / gstack
Fast headless browser for QA testing my own website — navigate, interact, screenshot, verify state, diff before-and-after. Hundred milliseconds per command. Used when I want Claude to dogfood a change before I commit it.
ir-scraper
The project skill for the IR scraper described elsewhere on this page. Encodes the eight platform types, the four anti-bot strategies, and the company-specific profiles I’ve built up.

The skills compound. Each new skill I write makes the next session shorter, because I’m no longer re-explaining the work.

5. Voice DNA

The voice-dna skill deserves its own line. It started as a way to get Claude to write blog posts that sounded less like a press release and more like me. It became something more useful than that.

The skill is built on close reading of sixteen of my actual writing samples spanning seven years — Substack essays, Medium pieces, IFC investment memos, board papers, technical posts. Three registers, four formats, a dial for how professional the output should be. It’s the most opinionated file in my whole setup, and the one with the strongest deterministic enforcement.

The enforcement matters. A skill is a prompt, and prompts are suggestions. Earlier this month I wrote about how LLMs are bad at the mechanical parts of work; the same logic applies to writing. Some rules are non-negotiable — certain rhetorical patterns I always strip out, certain layouts I won’t use (no pull-quote boxes containing one-line maxims), certain closings I won’t accept (the kind that announces the takeaway) — and the only honest way to enforce non-negotiable rules is with code that fails the build. So I built voice_gate.py: regex over the output, two tiers, hard violations block delivery. The skill is the spec; the gate is the gate.
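For a sense of what a gate like this looks like, here is a stripped-down sketch of the two-tier idea. The patterns and rule names below are hypothetical stand-ins, not the contents of my actual voice_gate.py: hard rules block delivery, soft rules only warn.

```python
import re

# Hypothetical rules for illustration only; the real patterns are not shown here.
HARD_RULES = {
    "announced takeaway": re.compile(r"\b(in conclusion|the takeaway is)\b", re.IGNORECASE),
}
SOFT_RULES = {
    "filler intensifier": re.compile(r"\b(very|really|truly)\b", re.IGNORECASE),
}


def check_voice(text: str) -> tuple[list[str], list[str]]:
    """Return (hard_violations, soft_warnings) as readable messages."""
    hard: list[str] = []
    soft: list[str] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in HARD_RULES.items():
            if pattern.search(line):
                hard.append(f"line {lineno}: {name}")
        for name, pattern in SOFT_RULES.items():
            if pattern.search(line):
                soft.append(f"line {lineno}: {name}")
    return hard, soft


def exit_code(text: str) -> int:
    """Non-zero when any hard rule fires, so a calling script can block delivery."""
    hard, _soft = check_voice(text)
    return 1 if hard else 0
```

A wrapper script would read the draft from disk and pass exit_code(draft) to sys.exit. The point of the design is that soft warnings never change the exit status, so only the non-negotiable rules can stop a draft.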

Layer 3: Discipline — how Claude does its work

6. The session recap

Towards the end of any non-trivial session, I ask Claude one question: what should we write down so the next session can pick this up? The answer almost always contains things I would have forgotten by next morning.

The recap goes into CLAUDE.md if it’s a behavioural rule, into a project-specific notes file if it’s about the work, or into the skill itself if it’s about how I want a particular kind of task done. The act of recapping is what makes the next session start from a higher baseline rather than from scratch.

7. Failure logs

Each major skill of mine has a file called FAILURE_LOG.md sitting next to its SKILL.md. When something breaks — a fabricated figure, a misformatted table, a section that came back empty — the failure goes into the log with date, root cause, and what was changed to prevent it. The SellSideInitiation skill’s log is twenty-four entries long and growing.

This is the difference between a skill that improves and a skill that gets broken in the same way every other Wednesday. Without a log, every fix is local: this one report, this one company, this one bug. With the log, every fix is structural: a rule added to the skill, a check added to the verification gate, a sentence in CLAUDE.md.
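The log itself needs no tooling, but a tiny helper keeps entries consistent. This is a sketch under my own assumptions: the entry layout (a dated heading plus two bullet lines) and the function name are illustrative, not a format the skill requires.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def log_failure(log_path: Path, what_broke: str, root_cause: str, fix: str,
                when: Optional[date] = None) -> str:
    """Append one dated entry to a failure log and return the text written."""
    when = when or date.today()
    # Assumed layout: one heading per failure, then cause and remedy.
    entry = (
        f"\n## {when.isoformat()}: {what_broke}\n"
        f"- Root cause: {root_cause}\n"
        f"- Change made: {fix}\n"
    )
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(entry)
    return entry
```

Because the file is opened in append mode, earlier entries are never rewritten, which is what lets the log double as a history of how the skill changed.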

8. Verification before delivery, not after

The habit that changed more than anything else over these two months: Claude should not deliver any non-trivial output without running an automated verification step over it. Not a vibe check. A script that exits non-zero when something is wrong.

For financial reports, that’s final_gate.py: cross-checks every cited number, fails on bare placeholder tags, verifies the balance sheet reconciles. For voice writing, it’s voice_gate.py: regex over the prose, hard-fails on the worst LLM tells. For documents merged from multiple stages, it’s the merge script’s assertion checks: heading structure must match, broken bold count must be zero. Each gate is a hundred lines or less. Each takes two seconds to run. Each catches things a careful human read would miss.
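As an illustration of how small such a gate can be, here is a sketch of two of the checks named above: unfilled placeholders and a balance sheet that does not reconcile. The placeholder tag style, the tolerance, and the function signature are all assumptions for the example, not a copy of final_gate.py.

```python
import re

# Assumed placeholder style such as "[TBD]"; adjust to whatever tags you use.
PLACEHOLDER = re.compile(r"\[(TBD|TODO|PLACEHOLDER)\]")


def gate_report(text: str, assets: float, liabilities: float, equity: float,
                tolerance: float = 0.5) -> list[str]:
    """Return a list of failures; an empty list means the report may ship."""
    failures: list[str] = []
    for match in PLACEHOLDER.finditer(text):
        failures.append(
            f"unfilled placeholder {match.group(0)} at offset {match.start()}"
        )
    # Accounting identity: assets = liabilities + equity, within rounding.
    gap = assets - (liabilities + equity)
    if abs(gap) > tolerance:
        failures.append(f"balance sheet does not reconcile: gap of {gap:.2f}")
    return failures
```

A command-line wrapper would print each failure and exit with status 1 if the list is non-empty, which is what turns the check from advice into a gate.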

9. Don’t use sub-agents for analytical work

This one was learned the hard way. For a while I was using sub-agents to parallelise long pipelines — the merge step, the audit step, the data-extraction step. The architecture looked clean. Each sub-agent was given a clear, narrow brief.

What I didn’t see, until I checked the numbers carefully, was that sub-agents lose context. They can’t see CLAUDE.md, the conversation history, the skills. They’re launched fresh each time. And when they don’t have what they need, they fill in plausibly from training data — and the orchestrator stitches the plausible-but-wrong outputs together with no way to tell.

The Snowflake run was the wake-up call. A hundred out of one hundred and twenty-two financial data points in one section came back fabricated by a sub-agent that had been given a perfectly clear instruction. Revenue and gross profit were right because they were the first two fields in the API response. Everything after that was hallucinated, with the right tags attached. The report read fine. It was wrong in every detail.

So now: no sub-agents for analytical work. Everything runs sequentially in one conversation, with full context carried forward. Slower. The only architecture I’ve found that actually delivers what it claims.

What I didn’t expect

I started this thinking I was teaching Claude how to do my work. Two months in, the surprise is that Claude is teaching me how I do my work.

Writing voice-dna meant articulating what makes my writing mine, in a way I’d never had to make explicit. Writing SellSideInitiation meant naming, in order, every analytical step I run on a company — including the steps I do unconsciously. Writing the failure log meant being honest about every mistake I’d been making and quietly not codifying. The artefacts that exist as instructions for a stateless colleague turned out to be the clearest description I’ve ever had of how I work.

That wasn’t the goal. It might be the most useful side effect.

The toolkit isn’t finished. New skills get added, existing ones get rewritten when something breaks, the failure logs keep growing. What I’ll know a year from now is different from what I know now. But the shape is set: a working folder, a memory file, skills for the work I repeat, voice DNA for the writing, recaps and failure logs for the things that broke, verification before anything goes out the door, and one conversation per task.