High Capacity · Kyle Chan · 2025-12-05

AI agents and the 90% problem

Source content

Image created with Google DeepMind’s Nano Banana model

Today’s top AI models are mind-blowingly good at certain tasks. They can achieve gold-level performance on the International Math Olympiad. They can beat out all but the best human programmers on coding tests. Math and science benchmarks are quickly getting saturated. And yet, where is the AI revolution we were promised?

Top AI models like DeepSeek-V3.2 perform strongly on difficult math and coding tests. Chart: DeepSeek-V3.2 Technical Report

2025 was supposed to be the year of agents. Autonomous online systems with the ability to book flights, plan your next trip, schedule appointments, and complete mundane paperwork were supposed to finally prove the real-world transformative effects of generative AI. On paper, AI models are getting really good at agentic tasks, with ever more sophisticated tool-calling capabilities over increasingly long time horizons.

But when we look at how AI is really being used, it’s still relatively limited. A recent study (below) revealed that ChatGPT is still mostly being used for very basic stuff, like personal advice and text editing. And much-hyped agentic AI browsers like ChatGPT Atlas and Perplexity’s Comet don’t seem to be catching on, perhaps due to their mediocre performance.

ChatGPT is mostly used for “practical guidance” and “writing,” according to a recent study. Chatterji et al. 2025. How People Use ChatGPT. NBER working paper.

There’s a similar story for business users. When you look at the set of “1,001 AI use cases from Google Cloud customers,” it’s still mostly customer service agen…