Summary of "Почему AI генерит мусор — и как заставить его писать нормальный код" (roughly: "Why AI Generates Garbage and How to Make It Write Decent Code")
High-level thesis
AI models can generate code quickly but often produce outputs that are lower-quality, more complex, insecure, and harder to maintain unless they are used inside an engineering process. The problem is not only the model itself but the process and context you give it.
Solution proposed: “Contextual engineering” — a repeatable, multi-stage workflow that produces maintainable, reviewable code from agents by controlling context, reducing noise, and adding quality gates.
Key problems and evidence
Empirical findings cited in the video:
- Carnegie Mellon (Nov 25)
- When ~800 GitHub repositories were processed by agent workflows, lines of code and commits spiked initially.
- Static-analyzer warnings rose by ~30%, and code complexity increased by ~40% and stayed high, indicating a temporary speed gain but lasting technical debt.
- December report (referred to in the video as "Cat Rabbit", likely CodeRabbit)
- AI-generated PRs averaged ~10 problems per PR (≈1.7× more issues than human-authored PRs).
- More critical/serious errors; XSS vulnerabilities ~2.74× more frequent; performance issues ~8× more frequent.
- MIT (Jan 26)
- Example of a developer spending 9 hours on agent dead ends; users had difficulty judging when outputs were hallucinated or simply wrong.
- Incident example
- An agent-built social network leaked API keys, emails, and private messages because no engineering controls were applied.
Market context
- Major model-makers (Anthropic, OpenAI, Google) are hiring hundreds to thousands of engineers at high salaries, indicating heavy investment in the people and processes needed to control model outputs.
Core concept: Contextual engineering
Goal: maximize correctness and completeness of the model input while minimizing context-window size and noise.
High-level formula:
- Supply the most correct and complete data needed for the request
- Minimize noise and context size
- Get higher-quality output
Contextual engineering focuses on structuring work, roles, and quality gates so agents act like a team with narrow, well-defined contexts.
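As a concrete reading of that formula, here is a minimal Go sketch (all names hypothetical, not from the video) of assembling a stage prompt from only the artifacts one agent needs, instead of handing it the whole repository:

```go
package main

import (
	"fmt"
	"strings"
)

// TaskContext holds only the artifacts one agent needs for one stage.
// Everything else stays out of the context window to keep noise low.
type TaskContext struct {
	Role        string   // e.g. "designer", "coder", "security reviewer"
	Task        string   // the concrete request for this stage
	Facts       []string // references from the research doc (file:line notes)
	Constraints []string // team standards, naming rules, test-strategy excerpts
}

// BuildPrompt concatenates the minimal context into a single prompt string.
func BuildPrompt(c TaskContext) string {
	var b strings.Builder
	fmt.Fprintf(&b, "Role: %s\nTask: %s\n\nRelevant facts:\n", c.Role, c.Task)
	for _, f := range c.Facts {
		fmt.Fprintf(&b, "- %s\n", f)
	}
	b.WriteString("\nConstraints:\n")
	for _, con := range c.Constraints {
		fmt.Fprintf(&b, "- %s\n", con)
	}
	return b.String()
}

func main() {
	prompt := BuildPrompt(TaskContext{
		Role:        "designer",
		Task:        "Add avatar upload to the user service",
		Facts:       []string{"internal/user/model.go:12 defines User", "internal/storage/s3.go wraps the S3 client"},
		Constraints: []string{"Follow the existing package layout", "Every new endpoint needs contract tests"},
	})
	fmt.Println(prompt)
}
```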
Four-stage workflow
A repeatable process recommended in the video:
1. Research
- Have agents scan the codebase and produce a factual, structured research document listing relevant files, functions, models, and integrations.
- Use multiple specialized subagents in parallel, each with a small, efficient context window, to avoid one large noisy pass (see the fan-out sketch after this workflow).
- Output should contain only factual references and links to files/lines, with no opinions or recommendations, to keep noise low for the later stages.
2. Design
- Produce architecture artifacts: C4 (context/container/component/code), data-flow diagrams, sequence diagrams, ADRs (architectural decision records), and a test strategy.
- Use the research doc so the designer-agent works only with relevant scope.
- Perform human review (pair review recommended) to catch architectural errors early.
3. Plan
- Convert the approved design into a phased implementation plan with small, reviewable phases.
- For each phase, list files to create/modify, acceptance/quality criteria, and test cases.
- Keep phases small so each can be committed, tested, and gated independently.
4. Implementation
- Run a team of specialized agents (lead/coordinator, coder, security reviewer, QA/tester, architecture reviewer); each agent has its own prompt and a narrow context window.
- Enforce per-phase quality gates: build passes, tests pass, linters (including complexity and cognitive-complexity metrics), security scans, and adherence to design/contracts (see the gate-runner sketch after this workflow).
- If a gate fails, reject the output and send it back to the implementer; don’t proceed until gates are satisfied.
- Keep commits local for human review before pushing; include generated docs and ADRs in the repository near the code.
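To make the research fan-out concrete, here is a minimal Go sketch under assumed names (scanArea stands in for a call to a cheap model or a static-analysis pass; no real agent SDK is used) that runs several narrow subagents in parallel and merges their findings into one factual document:

```go
package main

import (
	"fmt"
	"sync"
)

// Finding is one factual reference produced by a research subagent:
// no opinions, just what exists and where.
type Finding struct {
	Area string // e.g. "models", "handlers", "integrations"
	File string
	Line int
	Note string
}

// scanArea stands in for a cheap-model call with a narrow prompt
// limited to one area of the codebase.
func scanArea(area string) []Finding {
	// Placeholder: a real implementation would call an LLM or static analysis here.
	return []Finding{{Area: area, File: "internal/" + area + "/example.go", Line: 1, Note: "stub finding"}}
}

func main() {
	areas := []string{"models", "handlers", "storage", "integrations"}

	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		research []Finding
	)
	for _, area := range areas {
		wg.Add(1)
		go func(a string) { // each subagent gets its own small scope
			defer wg.Done()
			found := scanArea(a)
			mu.Lock()
			research = append(research, found...)
			mu.Unlock()
		}(area)
	}
	wg.Wait()

	// The merged document is handed to the design stage as-is: facts only.
	for _, f := range research {
		fmt.Printf("[%s] %s:%d %s\n", f.Area, f.File, f.Line, f.Note)
	}
}
```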
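A companion sketch for the plan and implementation stages: phases carry the files and acceptance criteria produced by the planning step, and each phase must pass every gate before the pipeline continues. The gate commands shown (go build, go test, go vet) are stand-ins; a real pipeline would add linters, complexity checks, and security scanners.

```go
package main

import (
	"fmt"
	"os/exec"
)

// Phase is one small, independently reviewable unit from the plan stage.
type Phase struct {
	Name     string
	Files    []string // files to create or modify
	Criteria []string // acceptance criteria checked by humans and CI
}

// Gate is one quality check run after a phase is implemented.
type Gate struct {
	Name string
	Cmd  []string // e.g. {"go", "build", "./..."} or a security scanner
}

// runGates returns the name of the first failing gate, or "" if all pass.
func runGates(gates []Gate) string {
	for _, g := range gates {
		cmd := exec.Command(g.Cmd[0], g.Cmd[1:]...)
		if out, err := cmd.CombinedOutput(); err != nil {
			fmt.Printf("gate %q failed:\n%s\n", g.Name, out)
			return g.Name
		}
	}
	return ""
}

func main() {
	phases := []Phase{
		{Name: "storage adapter", Files: []string{"internal/storage/avatar.go"}, Criteria: []string{"unit tests for upload and crop"}},
		{Name: "http endpoint", Files: []string{"internal/http/avatar_handler.go"}, Criteria: []string{"contract matches the design doc"}},
	}
	gates := []Gate{
		{Name: "build", Cmd: []string{"go", "build", "./..."}},
		{Name: "tests", Cmd: []string{"go", "test", "./..."}},
		{Name: "vet", Cmd: []string{"go", "vet", "./..."}},
	}

	for _, p := range phases {
		fmt.Printf("implementing phase %q (%d files)\n", p.Name, len(p.Files))
		// ...an agent writes code for p.Files here...
		if failed := runGates(gates); failed != "" {
			fmt.Printf("phase %q rejected: fix %q and retry before moving on\n", p.Name, failed)
			return // send back to the implementer instead of continuing
		}
		fmt.Printf("phase %q passed all gates; commit locally for human review\n", p.Name)
	}
}
```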
Practical recommendations and patterns
- Don’t rely on a single universal prompt. Use specialized prompts per phase, per task type (feature vs bug), and per stack (backend vs frontend).
- Store team-specific prompts, standards, and examples in a “prompts” folder (architecture rules, testing rules, naming conventions, etc.).
- Use weaker/cheaper models for large-scale scanning subagents and stronger models for design and critical tasks to optimize tokens and cost.
- Treat agents like a team with defined roles and responsibilities; the lead coordinates but doesn’t write code.
- Automate CI to enforce quality (linters, tests, security, contract checks). Consider a policy that forbids AI co-author trailers in commit messages if licensing is a concern.
- Prefer self-hosted or otherwise controlled model deployments if data leakage is a concern.
- Keep architectural docs and ADRs next to code to retain decision history.
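One way to encode the per-role prompts and the cheap-vs-strong model split is a small registry kept in the repository; the prompts/ layout and model identifiers below are illustrative assumptions, not names from the video.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// Role ties an agent to its prompt file and the model tier it should use.
type Role struct {
	Name       string
	PromptFile string // lives in the repo's prompts/ folder next to team standards
	Model      string // hypothetical identifiers: route cheap vs. strong models per role
}

var roles = []Role{
	{Name: "research-scanner", PromptFile: "prompts/research.md", Model: "cheap-scan-model"},
	{Name: "designer", PromptFile: "prompts/design.md", Model: "strong-reasoning-model"},
	{Name: "coder-backend", PromptFile: "prompts/implement-backend.md", Model: "strong-reasoning-model"},
	{Name: "security-reviewer", PromptFile: "prompts/security-review.md", Model: "strong-reasoning-model"},
}

// loadPrompt reads the role's prompt from the repository so standards are versioned with the code.
func loadPrompt(repoRoot string, r Role) (string, error) {
	data, err := os.ReadFile(filepath.Join(repoRoot, r.PromptFile))
	if err != nil {
		return "", fmt.Errorf("prompt for %s: %w", r.Name, err)
	}
	return string(data), nil
}

func main() {
	for _, r := range roles {
		if _, err := loadPrompt(".", r); err != nil {
			fmt.Println("missing prompt:", err) // expected unless a prompts/ folder exists
			continue
		}
		fmt.Printf("%s -> %s via %s\n", r.Name, r.PromptFile, r.Model)
	}
}
```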
Live demo (illustrative example)
Task: add user avatar upload to a Go microservice, storing images on S3 with square crop and validation.
Tools/models used in demo:
- Opus 4.6 (the model used in the demo)
- Agent frameworks supporting multi-agent teams (references to Claude’s team feature, Cursor, Copilot)
Steps executed:
- Research agent: scanned the repo and produced a factual “what/where/how” document listing the user model, controllers, adapters, etc.
- Design agent: produced C4 diagrams, data flow and sequence diagrams, risk analysis, API contracts, testing strategy, and storage model changes.
- Planning agent: split the feature into 8 implementable phases and mapped files to be created/changed.
- Implementation: spawned a multi-agent team (developer, test, security, architecture reviewer) executing phases in parallel; each phase passed quality checks. Code was committed but not pushed for human review.
Demo outcomes:
- Generated architecture and code within ~32 minutes.
- Agents flagged naming and async-processing issues; the author manually suggested adding a CDN and asynchronous resizing.
- Emphasized final human review and CI enforcement.
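For orientation, here is a stripped-down sketch of what the demo's avatar endpoint could look like in Go using only the standard library. The 5 MiB limit, the in-memory BlobStore stand-in for S3, and the center-square crop are assumptions; the real service would plug in an S3 adapter and its own validation rules.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"image"
	"image/jpeg"
	_ "image/png" // register PNG decoding alongside JPEG
	"log"
	"net/http"
)

// BlobStore abstracts the object storage; in the demo this would be an S3 adapter.
type BlobStore interface {
	Put(ctx context.Context, key string, data []byte, contentType string) error
}

// memStore is a stand-in so the sketch runs without AWS credentials.
type memStore struct{ objects map[string][]byte }

func (m *memStore) Put(_ context.Context, key string, data []byte, _ string) error {
	m.objects[key] = data
	return nil
}

const maxUpload = 5 << 20 // 5 MiB limit, an assumed validation rule

// centerSquare returns the largest centered square region of the image.
func centerSquare(img image.Image) image.Image {
	b := img.Bounds()
	side := b.Dx()
	if b.Dy() < side {
		side = b.Dy()
	}
	x0 := b.Min.X + (b.Dx()-side)/2
	y0 := b.Min.Y + (b.Dy()-side)/2
	rect := image.Rect(x0, y0, x0+side, y0+side)
	if sub, ok := img.(interface {
		SubImage(image.Rectangle) image.Image
	}); ok {
		return sub.SubImage(rect)
	}
	return img // fall back to the original if the decoded type has no SubImage
}

func uploadAvatar(store BlobStore) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		r.Body = http.MaxBytesReader(w, r.Body, maxUpload)
		file, _, err := r.FormFile("avatar")
		if err != nil {
			http.Error(w, "missing or oversized avatar file", http.StatusBadRequest)
			return
		}
		defer file.Close()

		img, _, err := image.Decode(file) // validates that the payload is a real image
		if err != nil {
			http.Error(w, "unsupported image format", http.StatusUnsupportedMediaType)
			return
		}

		var buf bytes.Buffer
		if err := jpeg.Encode(&buf, centerSquare(img), &jpeg.Options{Quality: 85}); err != nil {
			http.Error(w, "failed to process image", http.StatusInternalServerError)
			return
		}

		key := "avatars/" + r.URL.Query().Get("user_id") + ".jpg"
		if err := store.Put(r.Context(), key, buf.Bytes(), "image/jpeg"); err != nil {
			http.Error(w, "storage error", http.StatusBadGateway)
			return
		}
		fmt.Fprintln(w, key)
	}
}

func main() {
	store := &memStore{objects: map[string][]byte{}}
	http.HandleFunc("/users/avatar", uploadAvatar(store))
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```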
Why this matters
- Without process: fast but messy, insecure, and prone to high technical debt.
- Companies building models hire engineers not to write all code, but to design and control how models are used: architecture, review, QA, and integration.
- With contextual engineering: hallucinations and noise are reduced, correctness and maintainability increase, and outputs become safe to review and push to production.
Actionable checklist
- Create per-phase prompts and place them in a repo “prompts” folder.
- Implement the 4-stage pipeline: Research → Design → Plan → Implement.
- Use subagents for narrow tasks; choose models per role (trade cost vs capability).
- Produce and review C4/DFD/sequence diagrams and ADRs before coding.
- Break work into small phases with commitable units and CI quality gates.
- Ensure security scans, linters, tests, and contract checks run automatically.
- Keep generated design and docs alongside code; review everything manually before pushing.
Future topics promised
- Building multi-agent pipelines from scratch
- Internals of agent orchestration
- Practical RAG (retrieval-augmented generation) updates
Main speakers and sources cited
- Video narrator/presenter (unnamed in subtitles)
- Research and reports:
- Carnegie Mellon (Nov 25 study)
- December report referred to as "Cat Rabbit" (likely CodeRabbit)
- MIT overview (Jan 26)
- Industry references: Anthropic, OpenAI, Google (hiring and investment context)
- Individuals cited: Boris Cherniy (named in the subtitles as head of "KD Code", likely Boris Cherny of Claude Code); comments on engineers' roles
- Tools/models mentioned: Opus 4.6, Claude (team/agent capabilities), agent frameworks, Cursor, Copilot