Buttdarling: Rise of the Autonomen

Monday, December 29, 2025

Rise of the Autonomen

A new result from the AI evaluation nonprofit METR has pushed the conversation around autonomous AI systems into new territory. According to METR’s latest reporting, Claude Opus 4.5 has achieved the longest “time horizon” the group has ever measured on its autonomy benchmark: a 50 percent time horizon of roughly four hours and forty-nine minutes, meaning the model succeeds about half the time on tasks that take skilled humans that long. On the surface, that sounds like a dramatic leap toward machines that can work independently for long stretches. The reality is more nuanced, but no less significant.

https://www.msn.com/en-in/technology/artificial-intelligence/five-hours-of-expert-level-autonomy-metr-s-claude-opus-4-5-s-crazy-results/ar-AA1SNP9z?ocid=BingNewsVerp
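
To make the "50 percent" part concrete, here is a rough Python sketch of how a time horizon can be read off a success curve. It assumes, as I understand METR's published method, that success probability is modeled as a logistic curve in the logarithm of how long a task takes a human. The function name and the slope value are made up for illustration; only the 4 hour 49 minute figure comes from the report above.

```python
import math

def success_probability(task_minutes, horizon_minutes, slope):
    """Logistic curve in log2(task length).

    Returns exactly 0.5 when task_minutes equals horizon_minutes,
    more for shorter tasks and less for longer ones.
    """
    x = math.log2(task_minutes) - math.log2(horizon_minutes)
    return 1.0 / (1.0 + math.exp(slope * x))

HORIZON_MINUTES = 4 * 60 + 49  # 289 minutes, the figure quoted above
SLOPE = 1.0  # hypothetical steepness, chosen only for illustration

for minutes in (15, 60, 289, 960):
    p = success_probability(minutes, HORIZON_MINUTES, SLOPE)
    print(f"{minutes:>4} min task: {p:.0%} estimated success")
```

The point of the sketch is the shape, not the numbers: under this kind of curve, a five-hour horizon still means failing about half of the five-hour tasks, while much shorter tasks succeed far more often.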

Anthropic has a 2-hour engineering take-home test. It says its new Claude Opus 4.5 model outscored every human who took it.
