Thinkgap feed

AI signal, minus the noise.

Curated items are read from the processed items table and served as a bilingual feed.

TELEGRAM SOLIDOT Jun 15, 2026

AI Math Problem-Solving Still Falls Short of Human Experts in Rigorous First Proof Test

The First Proof project tested four AI systems on ten original, unpublished research-level math problems that mathematicians wrote specifically for the test. None of the problems had appeared in any model's training data, and anonymous expert reviewers from the relevant fields scored the solutions. The AI responses contained frequent hallucinations and cited no literature at all. The evaluation confirmed that current reasoning models cannot yet match top human mathematicians. It was the first assessment to satisfy three key standards at once: frontier math problems, no training data leakage, and expert human evaluation.