Gemini 3.1 Pro Lands With 77.1% ARC-AGI-2 — Google's Biggest Reasoning Jump Yet

Google announced Gemini 3.1 Pro on February 18, 2026. It is a significant mid-cycle reasoning upgrade and the first time Google has used a .1 version increment rather than the .5 it used for Gemini 2.5. That naming shift is not cosmetic: it signals that this release is a targeted upgrade to reasoning depth, not a broad feature refresh. Google describes Gemini 3.1 Pro as built "for tasks where a simple answer isn't enough," positioning it as the go-to model when Gemini 3 Pro hits its ceiling on hard, multi-step problems. The model rolls out simultaneously to the Gemini app and NotebookLM for Google AI Pro and Ultra subscribers, and to Google AI Studio and Vertex AI for developers and enterprise customers.

The headline number is ARC-AGI-2, a benchmark designed to test genuine abstract reasoning on novel visual puzzles that cannot be solved by recalling training data. Gemini 3 Pro, itself already a strong model, scored in the mid-to-high 30s. Gemini 3.1 Pro scores 77.1%, which Google describes as "more than double the reasoning performance of 3 Pro." To put that in context: ARC-AGI-2 was specifically designed to be hard for current AI systems, and most frontier models score well below 50% on it without specialized scaffolding or code execution. Reaching 77.1% is a standout result that places Gemini 3.1 Pro substantially ahead of the broader field on this particular test of flexible, generalizable reasoning. Even Gemini 3 Deep Think, the compute-intensive parallel reasoning mode, scored 45.1% with code execution at its December 2025 launch, which makes 3.1 Pro's 77.1% a striking jump even against Google's own heaviest reasoning configuration.
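As a rough sanity check on the "more than double" framing, the arithmetic can be sketched in a few lines of Python. The 77.1% and 45.1% figures come from the article; the three baseline values are hypothetical stand-ins for "mid-to-high 30s," since no exact Gemini 3 Pro score is given here, and the function names are illustrative only.

```python
# Scores quoted in the article (percent on ARC-AGI-2).
GEMINI_31_PRO = 77.1
DEEP_THINK_WITH_CODE = 45.1


def improvement_ratio(new_score: float, old_score: float) -> float:
    """Return how many times larger new_score is than old_score."""
    if old_score <= 0:
        raise ValueError("old_score must be positive")
    return new_score / old_score


def max_baseline_for_doubling(new_score: float) -> float:
    """Return the baseline at which new_score is exactly double."""
    return new_score / 2


if __name__ == "__main__":
    # Hypothetical baselines spanning "mid-to-high 30s" (assumed values,
    # not published scores).
    for baseline in (35.0, 37.0, 39.0):
        ratio = improvement_ratio(GEMINI_31_PRO, baseline)
        print(f"baseline {baseline:.1f}% -> {ratio:.2f}x")

    threshold = max_baseline_for_doubling(GEMINI_31_PRO)
    print(f"doubling threshold: {threshold:.2f}%")

    vs_deep_think = improvement_ratio(GEMINI_31_PRO, DEEP_THINK_WITH_CODE)
    print(f"vs Deep Think with code execution: {vs_deep_think:.2f}x")
```

Under these assumptions, the doubling claim holds for any baseline below 38.55%, so it covers most but not all of a "mid-to-high 30s" range. Against the 45.1% Deep Think figure, the ratio is closer to 1.7x.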

Google frames 3.1 Pro's improved intelligence in three practical dimensions: visual explanation of complex topics, data synthesis into unified views, and bringing creative projects to life. The visual explanation angle speaks to one of the model's key strengths — generating clear diagrams, annotated charts, and structured visual outputs that make complex information accessible rather than just dumping text. For data synthesis, 3.1 Pro can take disparate inputs — spreadsheets, documents, research papers, structured databases — and produce unified analyses that find the right connections rather than just summarizing each source in isolation. The creative dimension means the model can sustain coherent, high-quality output across longer generation tasks like writing, design briefs, or multi-phase project planning.

NotebookLM is one of the primary deployment targets for 3.1 Pro, and the pairing makes intuitive sense. NotebookLM is Google's AI-native research and note-taking tool that allows users to upload documents, PDFs, papers, and transcripts and then converse with them, generate summaries, and build structured notebooks from large corpora of material. The limits of earlier NotebookLM experiences were often tied to the reasoning depth of the underlying model — getting shallow summaries instead of genuine synthesis, missing non-obvious connections between sources, or failing to draw correct inferences across many documents simultaneously. With 3.1 Pro's substantially upgraded core reasoning, NotebookLM users can expect meaningfully better analysis, more accurate cross-document connections, and deeper explanations of technical material. For researchers, lawyers, analysts, and students who use NotebookLM as a core research tool, this is one of the most immediately practical upgrades in the product's history.

The .1 naming convention is itself worth unpacking. Google's previous two generations both used a .5 increment; Gemini 2.5, for example, arrived mid-year as a meaningful but incremental update to 2.0. By using .1 for this upgrade, Google is signaling a tighter iteration cycle: rather than waiting six months for a major mid-cycle release, the team is shipping targeted capability improvements as soon as they are ready. This reflects a broader shift across the AI industry in early 2026, in which major labs are shipping significant updates every few weeks rather than every few months. For users, the practical implication is that the model available today may be meaningfully better than what was available last month, and that cadence is likely to continue through the rest of 2026.

Gemini 3.1 Pro is rolling out now across multiple surfaces. In the Gemini app, it is available to Google AI Pro and Ultra subscribers as a selectable model in the standard model dropdown. In NotebookLM, it powers the research and synthesis capabilities for the same Pro and Ultra tiers. Google AI Studio exposes it directly to developers via the API, and Vertex AI gives enterprise customers access through Google Cloud's managed ML infrastructure. This breadth of simultaneous deployment (consumer app, prosumer research tool, developer API, and enterprise cloud all at once) reflects Google's advantage in having AI deeply integrated across its ecosystem rather than offering models through a standalone API alone.

Placed in the context of the February 2026 model race, Gemini 3.1 Pro lands as one of the strongest single-model reasoning announcements of the month. MiniMax M2.5 hit 80.2% on SWE-Bench for coding. GLM-5 broke the 50-point barrier on the Artificial Analysis Intelligence Index. Claude Sonnet 4.6 became Anthropic's default workhorse. But on ARC-AGI-2 — the benchmark that the research community most closely watches as a proxy for genuine generalization beyond training data — Gemini 3.1 Pro's 77.1% is the highest published score by a standard (non-specialized) foundation model as of this writing. For teams whose hardest challenges involve novel reasoning, abstract synthesis, or visual explanation of complex concepts, Gemini 3.1 Pro makes a compelling case for moving to the top of the evaluation list right now.
