Video summary
State of Agentic Coding #7 with Armin and Ben
Main summary
Key takeaways
Technological concepts & agentic coding realities discussed
Agentic coding “loops” vs terminal workflows
- Ben and Armin discuss how better model quality has made agentic coding substantially more capable over the past year.
- Despite this, they’re skeptical that most developers are using true always-on / fully autonomous “looping” systems.
- They reference the idea from Anthropic/OpenAI-adjacent practitioners that the goal shouldn’t be “prompting more,” but architecting systems that run loops and converge over time.
- Even so, they claim many practical workflows still resemble:
- human-in-the-loop terminal usage
- not Minority Report-style UI
- not large-scale autonomous agents
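As a rough illustration of the "run loops and converge" idea above, here is a minimal sketch of a propose-then-verify loop with an iteration budget. The names run_loop, propose_change, and count_failures are hypothetical stand-ins, not any vendor's API; a real system would call a model and a test suite where the toy functions are.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class LoopResult:
    converged: bool
    iterations: int
    history: List[int]  # failing-check count recorded after each iteration


def run_loop(
    propose_change: Callable[[int], None],
    count_failures: Callable[[], int],
    max_iterations: int = 10,
) -> LoopResult:
    """Repeat propose -> verify until no checks fail or the budget runs out."""
    history: List[int] = []
    for i in range(1, max_iterations + 1):
        propose_change(i)            # hypothetical: ask the agent for an edit
        failures = count_failures()  # hypothetical: run tests / linters
        history.append(failures)
        if failures == 0:
            return LoopResult(True, i, history)
    return LoopResult(False, max_iterations, history)


# Toy stand-ins so the sketch runs without a model or a test suite.
state = {"failures": 3}


def fake_agent(iteration: int) -> None:
    # Pretend each iteration fixes exactly one failing check.
    state["failures"] = max(0, state["failures"] - 1)


def fake_checks() -> int:
    return state["failures"]


result = run_loop(fake_agent, fake_checks)
print(result)
```

The recorded history is the part a human reviewer would read, which ties back to the hosts' point that most real workflows still keep a person in the loop.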
Quality, QA, and evaluation problems remain
- Even with bigger context windows, LLM code generation can still silently regress features or UI behavior.
- Comprehension/verification doesn’t automatically scale with code generation output.
- It’s difficult to QA what agents changed and why.
- Armin suggests that in “looping” environments, the bottleneck may shift to:
- human comprehension, not model ability
“Psychosis” / warped reasoning
- They describe “AI psychosis” less as dramatic hallucination and more as subtle perspective warping:
- agents generate plausible but not comprehensive hypotheses
- Example: agents propose many checks but miss the real root cause
Security research + token economics
- They connect AI coding’s rise to more automated security reporting and weaponization:
- Security tooling finds issues at scale, but teams face a “base budget” of security triage/research.
- They claim a feedback loop risk may be emerging:
- some open-source maintainers are getting repos “fixed” by bots/agents
- based on issues submitted by other AI tools
- creating changes and trust risks
Product/platform changes (Claude/Copilot/agent tools) emphasized
Token subsidies are unwinding
- Copilot is shifting from subscription-like usage to usage-based pricing, with bundled usage reportedly reduced.
- “claude -p” (Claude Code’s non-interactive CLI mode, transcribed as “Claude-P”) is highlighted:
- “claude -p” usage is no longer covered by the prior subscription model.
- Instead, Anthropic provides separate “programmatic agent credits” for CLI/SDK-style usage (including Claude via CLI/SDK, and potentially “ultra mode/workflow”).
- They interpret this as economic steering:
- features that drive higher spend are promoted
- they speculate providers value human-in-the-loop traces more than raw machine-to-machine execution, because those traces provide better training signal and product differentiation
Token spend becomes a dominant business metric
- The industry increasingly optimizes for token usage, since token spend correlates strongly with revenue.
- They compare this to past “metrics” (MAU/DAU), but argue tokens are more directly monetizable.
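To make the usage-based pricing discussion concrete, the following sketch computes a monthly bill from token counts. Every number in it (token volumes, per-million-token prices, and the flat subscription price) is a hypothetical placeholder chosen for easy arithmetic, not a figure from the episode or from any provider's price list.

```python
def usage_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
) -> float:
    """Cost of a usage-billed workload, with prices quoted per million tokens."""
    return (
        (input_tokens / 1_000_000) * input_price_per_mtok
        + (output_tokens / 1_000_000) * output_price_per_mtok
    )


# Hypothetical monthly workload and prices (placeholders only).
monthly_cost = usage_cost_usd(
    input_tokens=40_000_000,
    output_tokens=4_000_000,
    input_price_per_mtok=3.0,
    output_price_per_mtok=15.0,
)

flat_subscription_usd = 100.0  # hypothetical flat plan, for comparison
subsidy_usd = monthly_cost - flat_subscription_usd

print(f"usage-based: ${monthly_cost:.2f}, implied subsidy: ${subsidy_usd:.2f}")
```

Under these placeholder numbers the same workload costs more when billed by usage than under the flat plan; that gap is the kind of subsidy the hosts describe as unwinding.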
Reviews / guides / practical experiments mentioned
Local open-weight models progress
- They discuss experimentation progress for running models locally:
- hardware and quantization improvements sometimes let local models handle coding-agent-style workloads at performance approaching that of cloud agents from roughly a year earlier
- Example: Dwarf Star 4 (DS4) (DeepSeek-derived)
- described as an end-to-end packaged local coding model experience
- uses quantization and on-disk caches optimized for agentic coding
- reported performance claims:
- prefill: ~450 tokens/sec
- generation: ~25–26 tokens/sec
- performance reportedly stays relatively stable until a large fraction of context is filled
- Practical constraints:
- users may need high-end machines
- battery/heat can become limiting
- “Flash” vs “Pro” variants:
- flash is easier to run but requires more supervision
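The reported throughput figures can be turned into a rough per-turn latency estimate. The sketch below uses the ~450 tokens/sec prefill figure and the midpoint of the ~25–26 tokens/sec generation figure; the prompt and output sizes are hypothetical, and it assumes constant throughput with no cache hits, which real runs will not match exactly.

```python
from typing import Tuple


def estimate_turn_seconds(
    prompt_tokens: int,
    output_tokens: int,
    prefill_tps: float = 450.0,     # reported prefill rate (approximate)
    generation_tps: float = 25.5,   # midpoint of the reported 25-26 tokens/sec
) -> Tuple[float, float, float]:
    """Return (prefill seconds, generation seconds, total seconds) for one turn."""
    prefill_s = prompt_tokens / prefill_tps
    generation_s = output_tokens / generation_tps
    return prefill_s, generation_s, prefill_s + generation_s


# Hypothetical turn: a 9,000-token prompt and a 510-token reply.
prefill_s, generation_s, total_s = estimate_turn_seconds(
    prompt_tokens=9_000, output_tokens=510
)
print(f"prefill {prefill_s:.1f}s + generation {generation_s:.1f}s = {total_s:.1f}s")
```

With these inputs, prefill and generation each take about 20 seconds. It shows why on-disk caching of already-processed context matters for agentic workloads, where prompts grow with every turn.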
Claude/Coding agent behavior still needs supervision
- They recount attempts where agents refactor aggressively and produce unwanted changes, requiring careful review and rollback.
“Slop fork” / library rewrite discussion (Bun → Rust; performance + strategy)
“Slop forks” and token justification
- The talk revisits the idea that some projects are being “slop-forked”:
- they cite Bun
- a claim that the Bun creator’s company was acquired by Anthropic
- Bun being rewritten via agentic tools (“Robun”) from Zig to Rust
- They note the rewrite isn’t fully released yet, but prebuilt artifacts exist.
Why rewrite at all if it’s already successful?
- They acknowledge uncertainty (performative vs technical reasons), but suggest:
- evidence models can port large codebases while tests protect behavior
- a way to prove feasibility for enterprises doing legacy migrations
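One way to read "tests protect behavior" during a port is differential testing: run the original and the ported implementation on the same inputs and report any divergence. The sketch below is illustrative only; legacy_slugify and ported_slugify are hypothetical toy functions, and nothing here reflects how the Bun rewrite is actually validated.

```python
from typing import Callable, List, Tuple


def legacy_slugify(text: str) -> str:
    """Stand-in for the original implementation (hypothetical)."""
    return "-".join(text.lower().split())  # splits on any whitespace


def ported_slugify(text: str) -> str:
    """Stand-in for the ported implementation (hypothetical).

    It splits on spaces only, a subtle divergence from the original.
    """
    return "-".join(part for part in text.lower().split(" ") if part)


def find_mismatches(
    reference: Callable[[str], str],
    candidate: Callable[[str], str],
    inputs: List[str],
) -> List[Tuple[str, str, str]]:
    """Return (input, expected, actual) for every input where the two differ."""
    mismatches = []
    for value in inputs:
        expected = reference(value)
        actual = candidate(value)
        if expected != actual:
            mismatches.append((value, expected, actual))
    return mismatches


inputs = ["Hello World", "  padded  text ", "tab\tseparated"]
mismatches = find_mismatches(legacy_slugify, ported_slugify, inputs)
for value, expected, actual in mismatches:
    print(f"input={value!r} expected={expected!r} actual={actual!r}")
```

Only the tab-separated input diverges here. A port can pass the obvious cases and still differ on an edge case, so the breadth of the input corpus matters as much as the harness.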
Contrast with typical adoption
- Some forks (e.g., “V-next”-style for Next.js) haven’t gained huge adoption, implying wholesale rewrites differ from incremental forks.
Language debate: Ruby/Zig “human languages” vs AI efficiency
- They argue LLMs tend to select languages that are easier for machines, even if they introduce worse trade-offs for humans.
- Ruby is called “worst for AI” due to:
- non-local state
- heavy runtime metaprogramming
- Zig is framed as “human-friendly,” but its stability is hard to guarantee in fully machine-driven contexts, because of:
- memory management
- a tendency to crash
- Core takeaway: they believe the ecosystem is gradually eliminating languages designed primarily for humans in favor of machine-optimized choices.
Key analysis claims (what’s working / what’s not)
Software volume doesn’t mean user value
- They emphasize that many apps look like “agent harness multiplexers” built on strong UI/primitive libraries (diffs, trees, Ghostty-like terminal views), and do not necessarily represent agentic innovation.
- Their argument: primitives matter more than agent wrappers.
Token factory vs sustainable open-source primitives
- They claim most money goes to token sellers, while primitive tool makers often monetize indirectly or through other products.
Fast fashion / shallow conviction
- Software built with AI may get discarded quickly if it doesn’t gain early traction.
- They encourage longer project commitment.
Named speakers/sources (as stated in the subtitles)
- Ben Vinegar (rendered “Ben Vingar” in subtitles)
- Armin Ronacher (rendered “Armen Aroner” / “Armaroner” in subtitles)
Additional contextual mentions:
- Boris Cherny (Anthropic; rendered “Boris Churnney” in subtitles)
- Peter Steinberger (OpenAI; “OpenClaw” referenced, rendered “OpenCL” in subtitles)
- and others including Jenssen, Copilot/Anthropic, Modem/Diffs, and Dwarf Star authors