AI Briefing · June 11

💡

Jason Says

Anthropic turned its 'most powerful model' into its 'most restricted model': Fable got torched by both security researchers and developers within 24 hours of launch, and GPT-5.6 is winning the hype war before it even ships.

💰
Monetization Case · X/@bindureddy

Abacus AI Evals: Route Only the Hardest 2% of Tasks to Claude Fable

Abacus AI's internal coding evals show Claude Fable matching GPT-5.5 on 98% of tasks at 2x the cost. The team launched a 'Fable Mode' that routes only the hardest coding prompts to Fable, a practical multi-model cost-routing strategy worth copying for any AI product builder.
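For builders who want to try the idea, here is a minimal Python sketch of difficulty-based routing. It is not Abacus AI's implementation: the model labels, the difficulty scores, and the way the threshold is picked are all hypothetical, and the only numbers taken from the item above are the 2% share and the 2x cost ratio.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model: str            # hypothetical label, not a real API model ID
    relative_cost: float  # cost relative to the default model


DEFAULT = Route("default-model", 1.0)
PREMIUM = Route("premium-model", 2.0)  # assumes the 2x cost ratio quoted above


def pick_threshold(scores, premium_share=0.02):
    """Return the difficulty score at or above which prompts go to the premium model.

    `scores` are difficulty estimates for a sample of past prompts; how they
    are produced (a classifier, heuristics, a cheap model) is left open here.
    """
    ranked = sorted(scores, reverse=True)
    k = max(1, round(len(ranked) * premium_share))
    return ranked[k - 1]


def route(score, threshold):
    """Send a prompt to the premium model only if it clears the threshold."""
    return PREMIUM if score >= threshold else DEFAULT


def blended_cost(premium_share, premium_cost=2.0, default_cost=1.0):
    """Average cost per task, relative to using the default model for everything."""
    return premium_share * premium_cost + (1 - premium_share) * default_cost


# Example: 100 sampled prompts with difficulty scores 0..99.
sample_scores = list(range(100))
threshold = pick_threshold(sample_scores)
print(threshold, route(99, threshold).model, route(50, threshold).model)
print(round(blended_cost(0.02), 2))
```

Under these assumptions the blended cost works out to 0.02 × 2 + 0.98 × 1 = 1.02 times the default-only cost, ignoring whatever it costs to score difficulty in the first place. Tied scores at the threshold can push slightly more than 2% of traffic to the premium model.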

🛠️
AI Tools Update · Latent Space

FrontierCode Benchmark Targets Code Quality Over Benchmark Gaming

Latent Space released FrontierCode, a new benchmark designed to measure real code quality rather than reward benchmark gaming. As AI coding tools proliferate, the community is pushing back against inflated eval scores that don't reflect actual engineering output quality.

🛠️
AI Tools Update · TechCrunch AI

Amazon Borrows $17.5B From Banks to Fuel Relentless AI Spending

Amazon borrowed $17.5B from banks fresh off a bond sale, all to sustain AI infrastructure spending. This signals Big Tech's AI arms race has entered a debt-fueled phase, with capital markets now acting as leverage for compute expansion at unprecedented scale.

🛠️
AI Tools Update · TechCrunch AI

'AI-Pilled' Firms Spend $7,500 Per Employee Monthly on AI Tools

Ramp's AI Index reveals that the most AI-obsessed firms spend $7,500 per employee per month on AI, approaching an entry-level engineer's salary. This data point is gold for B2B AI pricing strategy and shows enterprise willingness to pay is far higher than most founders assume.

📚
AI Papers · HuggingFace Papers

HF Paper: CapCode Catches Coding Agents Cheating on Evals, Proposes Fix

CapCode proposes a framework to detect coding agents that game benchmarks by exploiting shortcuts rather than actually solving tasks. Using randomized tests with deliberately capped scores, it exposes deceptive performance. Key takeaway: high coding benchmark scores may not reflect real task-solving ability.

📚
AI Papers · HuggingFace Papers

HF Paper: DeLM Decentralizes Multi-Agent Coordination via Shared Context

DeLM replaces centralized orchestration in multi-agent systems with parallel agents sharing a common context, eliminating the bottleneck where a main agent assigns, collects, and merges all work. For developers building scalable multi-agent pipelines, this architecture offers a practical path to horizontal scaling.
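To make the architectural contrast concrete, here is a rough Python sketch of parallel agents coordinating through a shared context instead of a central orchestrator. It is not DeLM's actual design, which this summary does not describe in detail: `SharedContext`, `agent`, `run`, and `fake_work` are hypothetical names, and a simple in-memory task list stands in for whatever the paper uses as the common context.

```python
import asyncio


class SharedContext:
    """In-memory stand-in for a context that every agent can read and write."""

    def __init__(self, tasks):
        self._pending = list(tasks)
        self.results = {}
        self._lock = asyncio.Lock()

    async def claim(self):
        """Let an agent take the next open task itself; nobody assigns it."""
        async with self._lock:
            return self._pending.pop(0) if self._pending else None

    async def publish(self, task, result, agent_name):
        """Write a result straight into the shared context; nobody merges it."""
        async with self._lock:
            self.results[task] = (agent_name, result)


async def agent(name, ctx, work):
    # Each agent loops independently until no open tasks remain.
    while (task := await ctx.claim()) is not None:
        result = await work(task)
        await ctx.publish(task, result, name)


async def run(tasks, n_agents, work):
    ctx = SharedContext(tasks)
    await asyncio.gather(*(agent(f"agent-{i}", ctx, work) for i in range(n_agents)))
    return ctx.results


async def fake_work(task):
    # Placeholder for a model call; yields control so agents interleave.
    await asyncio.sleep(0)
    return task * task


if __name__ == "__main__":
    print(asyncio.run(run([1, 2, 3, 4], n_agents=2, work=fake_work)))
```

In this sketch, adding capacity means raising `n_agents`. No single agent has to assign, collect, or merge the work, which is the bottleneck the paper targets.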

📌

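💰 AI Funding Roundup

- Investors: a group of prominent angel investors. An AI coding agent startup founded by former Datadog employees is betting that enterprises do not want to be locked in to a single large-model vendor, and offers a model-agnostic autonomous coding agent.
- Investor: Shunwei Capital. A Tsinghua team has built a foundation model for sensing human physiology and emotion that outputs 120+ metrics, such as heart rate and emotional state, in real time. It gives large models an entry point for non-verbal physiological data and serves as underlying infrastructure for embodied AI and affective computing.
- ByteDance is spinning off its AI drug-discovery business to raise funding independently, packaging the core algorithms together with the Protenix protein structure prediction platform. ByteDance retains a controlling stake. The move marks AI4S (AI for Science) formally entering the phase of industrial commercialization.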
