Gemini 2.0: Google counters with native multimodality

The game-changing news

Gemini 2.0 is not an incremental update. The three main changes: native multimodality (the model processes text, image, audio and video in a unified way, not through stacking), better tool use (calls functions and uses external tools with much higher precision), and real-time output in voice and image generation.

The family launches in three tiers: Flash (speed, ~$0.10/M tokens), Pro (balance), Ultra (maximum capability, more limited).

1 million tokens: what it really means

Gemini 2.0 maintains the 1M token context window introduced in 1.5 Pro, but Google improved memory and recall: in their internal "needle in a haystack" benchmark, recall improves significantly at long context.

In practice: more reliable for use cases that require effectively "finding" information at the end of the context.

Native multimodality: the real differential

The previous generation of multimodal models (GPT-4 Vision, Claude 3) processed image and text but the model had been trained primarily on text and modalities were stacked. Gemini 2.0 was trained from scratch as multimodal.

Practical impact: cross-modal understanding ("describe what's in this image and link it to this paragraph") is more natural and precise. Image generation as direct output (not just call to DALL-E) is faster and more contextually integrated.

Real benchmarks and their limitations

Gemini 2.0 leads or ties leaders in MMLU, GSM8K, MATH and HumanEval. But the most important: multimodal benchmarks (MMMU, MathVista) where it really pulls ahead.

Limitation of public benchmarks: they don't measure what matters most in real applications — instruction following, consistency, real production capability. Gemini 2.0 here is competitive but doesn't clearly lead vs Claude or GPT.

vs. GPT-4o and Claude 3.5: who wins?

The answer is "it depends on the case": Voice in real time: GPT-4o leads. Long code with reasoning: Claude 3.5 Sonnet leads. Massive multimodality (video + text + audio): Gemini 2.0 leads. Workspace integration: Gemini 2.0 obvious.

For most companies, the choice depends on existing ecosystem: who's already on Google Workspace will gravitate toward Gemini; who's on Microsoft 365 will go with Copilot (OpenAI underneath); who's neutral will choose by specific case.