Why We Build This
The bitter lesson of AI is simple: general methods that leverage computation win. Hand-crafted features lose. Scale wins. Cleverness loses.
The bitter lesson of software engineering is adjacent: the systems that win are the ones that make the right information available at the right time. Not the ones with the best abstractions. Not the ones with the cleanest code. The ones where context flows.
Every repository is two things at once.
The first is a collection of files. Functions, classes, configs, tests. This is what tools see. This is what AI reads.
The second is a record of decisions. Why the auth system was rewritten in Q3 2023. Why the scheduler lives in its own crate. Why nobody touches the migration directory without checking with the platform team first. Why that one file has 47 edge cases and a comment that says "do not refactor."
The first is executable. The second is invisible.
And the second is what actually determines whether you ship or spend three weeks debugging something that was solved in a PR two years ago.
We used to lose context slowly. An engineer leaves. Their mental model goes with them. The team adjusts. Tribal knowledge decays over months.
Now we lose it instantly.
An AI agent opens a pull request against a codebase it has never seen. It reads the files. It does not read the history. It proposes a change to a module that was deliberately frozen after three failed rewrites. The CI passes. The tests pass. The architecture breaks.
This is not an edge case. This is the default behavior of every AI coding tool on the market.
Rich Sutton argued that researchers who build hand-crafted knowledge into AI systems eventually lose to researchers who build systems that scale with computation. Chess engines that encoded grandmaster heuristics lost to engines that searched deeper. NLP systems built on linguistic rules lost to transformers trained on raw text.
The lesson applies to software infrastructure too.
Teams that hand-craft onboarding documents, maintain wikis, and write architecture decision records are doing the right thing. But they are doing it in a way that does not scale. The documents go stale. The wiki diverges from reality. The ADRs stop getting written after the third sprint.
The approach that scales is extraction, not authoring. Let the repository be the source of truth. Let computation do the work of surfacing what matters.
We think about this the way the industry thought about observability ten years ago.
Before Datadog, teams logged to files and grepped. Before Sentry, teams read stack traces in production logs. The infrastructure did not exist to make runtime behavior legible at scale.
Repository context is in that same pre-infrastructure era. Teams grep through git logs. They search Slack for that one thread where someone explained the deployment process. They ask the person who has been on the team the longest. If that person has left, they guess.
This is not a tooling problem. It is an infrastructure gap.
Observability made runtime behavior queryable. Repository Context Infrastructure makes institutional memory queryable.
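To make "queryable" concrete, here is a minimal sketch of the kind of question such a layer could answer before a change lands. Everything in it is hypothetical: the `context_for` function, its signature, and the returned fields are illustrative assumptions, not a real Onboard API, and a real implementation would answer from extracted history rather than hard-coded values.

```python
from typing import TypedDict

class PathContext(TypedDict):
    owners: list[str]          # who reviews changes under this path
    frozen: bool               # was this surface deliberately frozen?
    prior_attempts: list[str]  # earlier rewrites or migrations that touched it
    rationale: str             # why the code looks the way it does

def context_for(path: str) -> PathContext:
    """The question a long-tenured engineer answers from memory today:
    'what should I know before touching this path?'"""
    # Stub: a real implementation would query an extracted context layer.
    return PathContext(
        owners=["platform-team"],
        frozen=True,
        prior_attempts=["2022 rewrite, reverted", "2023 partial migration, abandoned"],
        rationale="three failed rewrites; frozen pending further review",
    )

print(context_for("migrations/"))
```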
AI is not just another developer on the team. It is a contributor that operates without institutional memory. It sees the current state of every file. It does not see the trajectory of any decision.
As AI writes more code, the cost of missing context compounds. Every blind refactor. Every migration that ignores prior attempts. Every PR that touches a frozen surface.
The solution is not to make AI smarter. The solution is to make context available.
General methods that leverage computation win. Give the model the context. Let it reason. Stop trying to encode software engineering judgment into prompts. Encode the repository's history into a structured layer and let the model do what models do.
Onboard is not a documentation tool. It is not a search engine. It is not an embeddings pipeline.
It is infrastructure that converts repository evolution into structured, queryable, persona-aware context. For the new engineer who needs to understand what to read first. For the staff engineer who needs to see architectural fault lines. For the security reviewer who needs to map auth boundaries. For the AI agent that needs machine-readable institutional memory before it opens a pull request.
We ingest the full story: git history, pull requests, issues, code structure, churn patterns, ownership signals. We use extended reasoning to extract architectural inflection points, migration timelines, risk zones, and social ownership clusters. We generate context tailored to the reader.
Humans get clarity. Agents get structure.
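What that structure might look like in practice: below is a minimal sketch of a persona-aware context record, in Python. The names (`RepoContext`, `Decision`, `render`), the fields, and the example data are all assumptions made for illustration, not Onboard's actual schema; the point is only that the same extracted facts can be rendered as hard rules for an agent and as narrative for a human.

```python
from dataclasses import dataclass, field
from typing import Literal, Optional

# Hypothetical sketch: names and fields are illustrative, not Onboard's schema or API.

Persona = Literal["new_engineer", "staff_engineer", "security_reviewer", "agent"]

@dataclass
class Decision:
    """One extracted inflection point in the repository's history."""
    summary: str               # e.g. "auth subsystem rewritten"
    period: str                # e.g. "2023-Q3"
    evidence: list[str]        # commits, PRs, and issues that record the decision
    constraint: Optional[str]  # what contributors must respect as a result, if anything

@dataclass
class RepoContext:
    """Structured institutional memory for a single repository."""
    decisions: list[Decision] = field(default_factory=list)   # architectural inflection points
    frozen_surfaces: list[str] = field(default_factory=list)  # paths not to be touched lightly
    risk_zones: list[str] = field(default_factory=list)       # high-churn, incident-prone areas
    ownership: dict[str, str] = field(default_factory=dict)   # path -> owning team

    def render(self, persona: Persona) -> str:
        """Present the same underlying facts differently for each reader."""
        if persona == "agent":
            # Machine-readable: hard constraints first, stated as rules.
            rules = [f"DO NOT MODIFY {path} without human review" for path in self.frozen_surfaces]
            facts = [f"{d.period}: {d.summary}" for d in self.decisions]
            return "\n".join(rules + facts)
        # Human personas: decisions with their rationale, then risk and ownership.
        lines = [f"{d.period}: {d.summary} (constraint: {d.constraint or 'none'})"
                 for d in self.decisions]
        lines += [f"Risk zone: {path} (owner: {self.ownership.get(path, 'unknown')})"
                  for path in self.risk_zones]
        return "\n".join(lines)

# Illustrative data, not extracted from any real repository.
ctx = RepoContext(
    decisions=[Decision("scheduler extracted into its own crate", "2023-Q3",
                        ["example PR"], "keep scheduling logic out of the monolith")],
    frozen_surfaces=["migrations/"],
    risk_zones=["billing/invoice.py"],
    ownership={"billing/invoice.py": "platform-team"},
)
print(ctx.render("agent"))
```

The design choice the sketch tries to show is the separation between extracted facts and persona-specific rendering: extraction is the part that scales with computation, and rendering to a given reader stays cheap.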
We are betting that the next decade of software will be defined not by who writes the most code, but by who has the best context.
Code is becoming commoditized. Context is not.
The teams that win will be the ones where every contributor, human or machine, understands the repository they operate in. Not just the files. The history. The intent. The architecture. The risk.
That requires infrastructure.
We are building it.