The game-changing news
Gemini 2.0 is not an incremental update. Three changes stand out: native multimodality (the model processes text, image, audio and video in a unified way rather than through separate components stacked together), better tool use (it calls functions and external tools with much higher precision), and real-time output, including voice and image generation.
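To make "tool use" concrete, here is a minimal sketch of the application side of a function call. It does not use any Gemini SDK: the shape of the call (a name plus an args object), the `get_weather` tool and the `dispatch` helper are all assumptions made up for illustration.

```python
import json

# Hypothetical tool; a real one would query a weather service.
def get_weather(city: str) -> dict:
    return {"city": city, "forecast": "sunny"}

# Registry mapping the tool names advertised to the model to Python callables.
TOOLS = {"get_weather": get_weather}

def dispatch(function_call: dict) -> dict:
    """Run the tool named in a model-issued function call and return its result."""
    name = function_call["name"]
    args = function_call.get("args", {})
    if name not in TOOLS:
        raise KeyError(f"model requested unknown tool: {name}")
    return TOOLS[name](**args)

# A function call as a model might emit it (shape assumed for illustration).
call = json.loads('{"name": "get_weather", "args": {"city": "Madrid"}}')
result = dispatch(call)
print(result)
```

"Higher precision" in this picture means the model picks the right name and fills in valid arguments more often, so fewer calls end in the error branch.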
The family launches in three tiers: Flash-Lite (lowest cost), Flash (speed, ~$0.10 per million input tokens) and Pro (maximum capability, experimental at launch).
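As a rough sense of scale, the Flash price quoted above can be turned into a back-of-the-envelope estimate. This sketch assumes a single flat rate for every token, which is a simplification (providers usually price input and output tokens differently), and the workload figures are hypothetical.

```python
# Approximate Flash price quoted in the article, in USD per million tokens.
FLASH_PRICE_PER_MILLION_USD = 0.10

def estimate_cost_usd(tokens: int, price_per_million: float = FLASH_PRICE_PER_MILLION_USD) -> float:
    """Estimate cost assuming one flat rate for all tokens (a simplification)."""
    return tokens / 1_000_000 * price_per_million

# Filling a 1M-token context window once.
full_context = estimate_cost_usd(1_000_000)

# Hypothetical daily workload: 50,000 requests of 2,000 tokens each.
daily = estimate_cost_usd(50_000 * 2_000)

print(f"One full context: ${full_context:.2f}")
print(f"Daily workload:   ${daily:.2f}")
```

At this rate a full 1M-token prompt costs about ten cents, and the hypothetical 100M-token day comes to about ten dollars.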
1 million tokens: what it really means
Gemini 2.0 keeps the 1M-token context window introduced with 1.5 Pro, but Google says it improved long-context recall: on its internal "needle in a haystack" benchmark, which hides a single fact inside a long document and asks the model to retrieve it, recall at long context lengths improves significantly.
In practice, this makes the model more reliable when the task depends on actually "finding" information located far into the context, even at its very end.
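A "needle in a haystack" test is easy to reproduce on your own data. The sketch below is not Google's benchmark: the needle text, the filler sentence and the `forgetful_model` stand-in (which ignores the second half of its context) are all invented for illustration, and a real run would replace the stand-in with an actual model call.

```python
from typing import Callable

NEEDLE = "The secret code is 7421."
QUESTION = "What is the secret code?"
EXPECTED = "7421"

def build_haystack(n_sentences: int, depth: float) -> str:
    """Place the needle at a relative depth (0.0 = start, 1.0 = end) in filler text."""
    filler = ["The sky was a uniform gray that day."] * n_sentences
    position = round(depth * n_sentences)
    return " ".join(filler[:position] + [NEEDLE] + filler[position:])

def recall_by_depth(
    ask_model: Callable[[str, str], str],
    depths: list[float],
    n_sentences: int = 1000,
) -> dict[float, bool]:
    """Return, for each depth, whether the model's answer contains the expected fact."""
    return {
        d: EXPECTED in ask_model(build_haystack(n_sentences, d), QUESTION)
        for d in depths
    }

# Stand-in for a real model call: it only "reads" the first half of the context.
def forgetful_model(context: str, question: str) -> str:
    visible = context[: len(context) // 2]
    return EXPECTED if NEEDLE in visible else "I don't know."

results = recall_by_depth(forgetful_model, [0.0, 0.25, 0.75, 1.0])
print(results)
```

A model with good long-context recall would answer correctly at every depth; the stand-in fails once the needle moves past the midpoint, which is the failure pattern this kind of test is designed to expose.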
Native multimodality: the real differential
The previous generation of multimodal models (GPT-4 Vision, Claude 3) processed image and text, but those models had been trained primarily on text, with the other modalities stacked on top. Gemini 2.0 was trained from scratch as a multimodal model.
Practical impact: cross-modal understanding ("describe what's in this image and link it to this paragraph") is more natural and precise. Image generation as a direct output of the model (rather than a call out to a separate model such as DALL-E) is faster and better integrated with the context.
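A cross-modal prompt like the one quoted above boils down to sending text and image in the same request. The sketch below only builds the request body and sends nothing; its shape is modeled on the Gemini REST API's `generateContent` payload, but treat the field names as an assumption to check against the current documentation. The `build_cross_modal_request` helper, the placeholder image bytes and the sample paragraph are made up for illustration.

```python
import base64

def build_cross_modal_request(image_bytes: bytes, mime_type: str, paragraph: str) -> dict:
    """Build one request body that carries both a text part and an image part."""
    prompt = (
        "Describe what's in this image and link it to this paragraph:\n" + paragraph
    )
    return {
        "contents": [
            {
                "parts": [
                    {"text": prompt},
                    {
                        "inline_data": {
                            "mime_type": mime_type,
                            # Binary data travels as base64 text inside JSON.
                            "data": base64.b64encode(image_bytes).decode("ascii"),
                        }
                    },
                ]
            }
        ]
    }

# Placeholder bytes stand in for a real image file.
request = build_cross_modal_request(
    b"placeholder image bytes", "image/png", "Quarterly sales rose in the north."
)
print(len(request["contents"][0]["parts"]), "parts in one request")
```

The point is that both modalities arrive in a single turn, so the model can relate them directly instead of receiving a caption produced by a separate vision step.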
Real benchmarks and their limitations
Gemini 2.0 leads or ties the leaders on MMLU, GSM8K, MATH and HumanEval. More importantly, it really pulls ahead on the multimodal benchmarks (MMMU, MathVista).
The limitation of public benchmarks is that they don't measure what matters most in real applications: instruction following, consistency and behavior under real production workloads. On those, Gemini 2.0 is competitive but doesn't clearly lead Claude or GPT.
vs. GPT-4o and Claude 3.5: who wins?
The answer is "it depends on the case". Real-time voice: GPT-4o leads. Long code with reasoning: Claude 3.5 Sonnet leads. Massive multimodality (video + text + audio): Gemini 2.0 leads. Workspace integration: Gemini 2.0 is the obvious choice.
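For teams that route requests across several providers, the verdicts above can be written down as a simple lookup. This is only a sketch of the article's opinion as a table: the use-case keys and the `pick_model` helper are invented names, and the values are display labels rather than real API model identifiers.

```python
# The article's per-case verdicts, keyed by hypothetical use-case labels.
LEADERS = {
    "real_time_voice": "GPT-4o",
    "long_code_with_reasoning": "Claude 3.5 Sonnet",
    "massive_multimodality": "Gemini 2.0",
    "workspace_integration": "Gemini 2.0",
}

def pick_model(use_case: str) -> str:
    """Return the model the article favors for a use case, or fail loudly."""
    try:
        return LEADERS[use_case]
    except KeyError:
        raise ValueError(f"no recommendation for use case: {use_case!r}") from None

choice = pick_model("massive_multimodality")
print(choice)
```

Failing on unknown use cases, rather than falling back to a default model, keeps the "it depends" decision explicit for every new case.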
For most companies, the choice depends on the existing ecosystem: those already on Google Workspace will gravitate toward Gemini; those on Microsoft 365 will go with Copilot (OpenAI underneath); those with no such commitment will choose case by case.