Good morning, AI enthusiasts.
Google DeepMind’s new AI co-mathematician just achieved a record-breaking score on one of the toughest math benchmarks, one designed to challenge AI for years to come. In one remarkable case, a professor even solved a previously open problem using an approach hidden within a proof that the system’s own reviewers had initially dismissed.
In today’s insights:
🧠 Google DeepMind’s AI co-mathematician
⚡ Automate repetitive tasks with Codex
Google DeepMind Just Built an AI Co-Mathematician
DeepMind has published a research paper detailing its AI co-mathematician, an agentic system powered by Gemini 3.1 that’s designed to help mathematicians tackle complex, unsolved problems. The system just set a record on a benchmark focused on advanced research-level mathematics.
The details:
🔹 DeepMind took inspiration from AI coding tools like Claude Code, applying collaborative agent workflows and built-in review systems to mathematical research.
🔹 A central coordinator agent divides problems into multiple parallel research tracks, while specialized sub-agents handle coding, literature review, and proof generation.
🔹 Oxford mathematician Marc Lackenby solved an open problem from the Kourovka Notebook after discovering what he described as a “really, really clever proof strategy” hidden inside one of the AI’s rejected outputs.
🔹 On Epoch AI’s FrontierMath Tier 4 benchmark, the system reached 48% accuracy, more than double Gemini 3.1 Pro’s standalone score of 19%.
Why it matters: AI is already accelerating breakthroughs in mathematics, and agentic systems are pushing those capabilities even further, much as they did for coding. But cases like Lackenby’s highlight the bigger picture: the most powerful future for AI may be one where it amplifies human expertise instead of replacing it.
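For readers who think in code, here is a rough sketch of the coordinator-plus-sub-agents pattern described in the details above. It is not DeepMind’s implementation: every name in it (`coordinate`, `review`, the three agent functions) is hypothetical, the sub-agents are stand-ins for model calls, and the reviewer’s rule is an arbitrary placeholder. The one idea it tries to capture is that rejected outputs are kept rather than thrown away, since that is where Lackenby found his proof strategy.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class TrackResult:
    track: str      # which research track produced this output
    output: str     # the sub-agent's (placeholder) result
    accepted: bool  # verdict from the review step


# Hypothetical sub-agents: each one stands in for a model call.
def coding_agent(problem: str) -> str:
    return f"numerical experiments for: {problem}"


def literature_agent(problem: str) -> str:
    return f"related papers for: {problem}"


def proof_agent(problem: str) -> str:
    return f"candidate proof for: {problem}"


SUB_AGENTS = {
    "coding": coding_agent,
    "literature": literature_agent,
    "proof": proof_agent,
}


def review(output: str) -> bool:
    # Placeholder rule (an assumption for illustration only): reject every
    # candidate proof. A real reviewer would actually check the argument.
    return not output.startswith("candidate proof")


def coordinate(problem: str) -> list[TrackResult]:
    # Launch every research track in parallel, then review each output.
    with ThreadPoolExecutor() as pool:
        futures = {
            name: pool.submit(agent, problem)
            for name, agent in SUB_AGENTS.items()
        }
        outputs = {name: future.result() for name, future in futures.items()}
    # Rejected outputs stay in the list so a human can still read them.
    return [
        TrackResult(track=name, output=text, accepted=review(text))
        for name, text in outputs.items()
    ]


if __name__ == "__main__":
    for result in coordinate("an open problem"):
        status = "accepted" if result.accepted else "rejected (kept for human review)"
        print(f"{result.track}: {status}")
```

Returning the full list, rejections included, is a deliberate choice in this sketch: the human stays in the loop and can dig through what the reviewer dismissed.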
Automate Repetitive Tasks with Codex
In this guide, you’ll learn how to use Codex’s Computer Use feature to handle tedious, repetitive tasks automatically on both Mac and Windows.
Step-by-step:
🔹 Open Codex, head to the Plugins section, enable the Computer Use plugin, and create a new task.
🔹 Open the permissions settings, switch from “Default permissions” to “Full access,” approve the prompts, and give Codex a real workflow to handle.
💡 Example prompt:
“Open Chrome and debug the UI issue on this webpage I’m building: http://localhost:3000/. Navigate through the app, reproduce the bug I described, and explain what you think is causing it. If you’re unsure before making changes, ask first.”
⚡ Pro tip: Computer Use isn’t just for coding. Codex can also automate repetitive tasks across desktop apps, from Photoshop exports and Premiere Pro cleanup to batch file renaming and other workflow-heavy tools.
That’s it for today.
The AI space doesn’t slow down, and neither should your thinking.
See you in the next drop.
