Stubsack: weekly thread for sneers not worth an entire post, week ending 27th July 2025

BlueMonday1984@awful.systems · 7 days ago


scruiser@awful.systems · 3 days ago

So this blog post is framed positively towards LLMs and is too generous in accepting many of the claims around them, but even so, its conclusions are pretty harsh on practical LLM agents: https://utkarshkanwat.com/writing/betting-against-agents/

Basically, the author has tried extensively, in multiple projects, to make LLM agents work in various useful ways, but in practice:

The dirty secret of every production agent system is that the AI is doing maybe 30% of the work. The other 70% is tool engineering: designing feedback interfaces, managing context efficiently, handling partial failures, and building recovery mechanisms that the AI can actually understand and use.

The author strips down, simplifies, and sanitizes everything going into the LLMs, and then implements both automated checks and human confirmation on everything they put out. At that point it makes you question what value you are even getting out of the LLM. (The real answer, which the author only indirectly acknowledges, is attracting idiotic VC funding and upper management approval.)
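To make that concrete, here is a rough sketch of the shape being described: sanitize the input, call the model, run automated checks, then ask a human. This is not the author's code; `sanitize`, `passes_checks`, `run_step`, the banned-string list, and the `model` and `confirm` callables are all hypothetical names made up for illustration.

```python
import re
from typing import Callable, Optional


def sanitize(text: str, max_chars: int = 2000) -> str:
    """Replace control characters, collapse whitespace, and truncate the input."""
    text = re.sub(r"[\x00-\x08\x0b-\x1f\x7f]", " ", text)
    text = re.sub(r"\s+", " ", text).strip()
    return text[:max_chars]


def passes_checks(output: str) -> bool:
    """Hypothetical automated check: output is non-empty and has no banned strings."""
    banned = ("rm -rf", "drop table")  # illustrative only, not a real safety list
    lowered = output.lower()
    return bool(output.strip()) and not any(b in lowered for b in banned)


def run_step(
    raw_input: str,
    model: Callable[[str], str],      # stand-in for whatever LLM call is used
    confirm: Callable[[str], bool],   # stand-in for the human approval step
) -> Optional[str]:
    """Return the model output only if it survives both gates, else None."""
    prompt = sanitize(raw_input)
    output = model(prompt)
    if not passes_checks(output):
        return None  # rejected by the automated check
    if not confirm(output):
        return None  # rejected by the human
    return output
```

Note that the model call is a single line here, and everything else is the scaffolding around it.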

As critical as they are, the author doesn’t acknowledge a lot of the bigger problems. The API cost is a major expense and design constraint on the LLM agents they have made, but the author doesn’t acknowledge that prices are likely to rise dramatically once VC subsidization runs out.
