GPT-5.5 vs Gemini 3.5 vs Claude Fable 5: AI Models Compete as Robotaxis Roll Out and AI-Designed Vaccines Enter Human Trials

This week in tech, OpenAI shipped GPT-5.5 and Google launched Gemini 3.5 Flash, while Anthropic brought out Claude Fable 5, intensifying the competition among leading AI labs. At the same time, Rivian promised supervised self-driving this year, Uber and Wayve prepared London robotaxis, and Cambridge researchers advanced an AI-designed universal coronavirus vaccine into human trials. What looks like three separate tracks is actually one story: intelligence is becoming cheaper, more capable, and more embedded in physical systems.

The AI Front: Three Labs, Three Different Bets

If you only read one section of this week in tech, make it this one. The major AI labs are no longer competing on a single dimension; they are racing across autonomy, coding, safety, and scientific reasoning simultaneously. The result is a genuinely crowded field of capable models — and a market that is starting to show which bets are paying off.

OpenAI Ships GPT-5.5 With an Agentic Focus

OpenAI’s April release of GPT-5.5 is perhaps the most significant model update so far this year. The company bills it as a step toward “a new way of getting work done on a computer,” and early benchmarks support that framing. On Terminal-Bench 2.0, a benchmark that tests complex command-line workflows requiring planning, iteration, and tool coordination, GPT-5.5 scores 82.7%, beating GPT-5.4’s 75.1% and Claude Opus 4.7’s 69.4%. On Expert-SWE, OpenAI’s internal eval for long-horizon coding tasks with a median human completion time of twenty hours, GPT-5.5 outperforms GPT-5.4. On BrowseComp, which measures how well a model can follow complex browsing instructions, GPT-5.5 hits 84.4% against Claude Opus 4.7’s 79.3%.

The numbers matter, but the behavior shift matters more. GPT-5.5 is described as better at holding context across large codebases, reasoning through ambiguous failures, checking assumptions with tools, and carrying changes through surrounding code. Dan Shipper, CEO of Every, called it “the first coding model I’ve used that has serious conceptual clarity.” An engineer at early-access partner NVIDIA went further: “Losing access to GPT-5.5 feels like I’ve had a limb amputated.”

OpenAI also emphasized efficiency. GPT-5.5 matches GPT-5.4’s per-token latency while using fewer tokens to complete the same Codex tasks. On Artificial Analysis’s Coding Index, it delivers state-of-the-art intelligence at roughly half the cost of competitive frontier coding models. That combination — better, faster, cheaper — is what turns a model release from a benchmark update into a shift in how software teams actually work.

Google Launches Gemini 3.5 Flash for Agentic Scale

Google’s answer arrived on May 19 with Gemini 3.5 Flash, and the takeaway is speed at frontier scale. Google claims 3.5 Flash delivers four times the output tokens per second of other frontier models while rivaling large flagship models on agentic and coding benchmarks. On Terminal-Bench 2.1 it scores 76.2%, on GDPval-AA it achieves 1656 Elo, and on MCP Atlas it hits 83.6%. It also leads in multimodal understanding with 84.2% on CharXiv Reasoning.

Where Google is trying to differentiate is in deployment at scale. The model is already the default for the Gemini app and AI Mode in Google Search globally. More importantly, Google is positioning 3.5 Flash inside an agent-first platform called Antigravity, where it can deploy collaborative subagents. One example pairs a builder agent with a player agent in rapid self-improvement loops to develop a game in six hours. Shopify is using subagents in parallel to analyze complex data for merchant growth forecasts. Macquarie Bank is piloting 3.5 Flash to accelerate customer onboarding by reasoning over 100+ page documents. Salesforce is integrating it into Agentforce for multi-subagent enterprise automation.

Google’s bet is that the winning model is not just the smartest in isolation, but the one that can orchestrate other models and tools reliably over long horizons. With Antigravity, 3.5 Flash is being tested as the conductor of an orchestra rather than the soloist.

Anthropic’s Claude Fable 5 and the Mythos Class

Anthropic’s June announcement of Claude Fable 5 and Claude Mythos 5 is probably the most strategically interesting move of the three. Fable 5 is described as exceeding the capabilities of any model Anthropic has ever made generally available, with state-of-the-art performance on nearly all tested benchmarks of AI capability. The longer and more complex the task, the larger Fable 5’s lead over Anthropic’s other models.

Safety, however, is the headline. Anthropic is launching Fable 5 with conservative safeguards: in some areas, queries will instead receive a response from the next-most-capable model, Claude Opus 4.8. The safeguards trigger in less than 5% of sessions on average, but the point is that Anthropic is deliberately capping capability in the general-release version. For a small group of cyberdefenders and infrastructure providers, Mythos 5 is available with safeguards lifted in certain areas, offering the strongest cybersecurity capabilities of any model in the world at ten dollars per million input tokens and fifty dollars per million output tokens — less than half the price of Claude Mythos Preview.
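To make the quoted Mythos 5 pricing concrete, here is a minimal Python sketch of the per-request arithmetic. Only the two rates (ten dollars per million input tokens, fifty dollars per million output tokens) come from the announcement as reported above; the function name and the example token counts are hypothetical, chosen purely for illustration.

```python
# Rates as quoted in the article; everything else here is illustrative.
INPUT_USD_PER_MILLION = 10.0
OUTPUT_USD_PER_MILLION = 50.0


def mythos_request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the cost in US dollars of one request at the quoted rates."""
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION
    return input_cost + output_cost


# Hypothetical request: 200,000 tokens in, 20,000 tokens out.
example_cost = mythos_request_cost(200_000, 20_000)
print(f"${example_cost:.2f}")  # $3.00
```

Because output tokens cost five times as much as input tokens at these rates, a long generated answer can dominate the bill even when the prompt is much larger than the response.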

The capabilities Anthropic highlights are striking. Stripe reported that Fable 5 compressed months of engineering into days, performing a codebase-wide migration in a fifty-million-line Ruby codebase in a day that would otherwise have taken a whole team over two months. In vision, Fable 5 beat Pokémon FireRed using only raw game screenshots (no maps, navigation aids, or extra game-state information), completing the game with a minimal, vision-only harness. In memory and long-context work, persistent file-based memory improved Fable 5’s performance in Slay the Spire three times as much as it improved Opus 4.8’s, and Fable 5 reached the game’s final act three times more often.

Perhaps most significant for the long-term arc of AI is Anthropic’s claim that Mythos 5 is the first model to consistently produce novel, compelling scientific hypotheses. In blinded head-to-head comparisons against Opus-class models, Anthropic scientists preferred Mythos’s molecular biology hypotheses roughly eighty percent of the time, and several have been advanced to experimental evaluation. One Mythos hypothesis — a novel mechanism for an E. coli protein — was corroborated in a study from an independent lab working on the same problem. That is not a benchmark win. That is a model contributing to real scientific knowledge.

The Mobility Track: Robotaxis Get Real, Even If Timelines Slip

While AI labs race, the physical-world deployment of autonomy is moving fast enough to make headlines without exaggeration. Three developments this week illustrate how the robotaxi story is shifting from aspiration to regulated reality.

Uber and Wayve Plan London Self-Driving Cabs

Wayve and Uber announced they are preparing to launch self-driving taxis in London. Wayve’s AI-native driving platform, combined with Uber’s network and brand, makes the partnership credible in a way that earlier robotaxi announcements sometimes were not. London’s regulatory environment is also more accommodating than many jurisdictions, with clear paths for testing and limited commercial deployment. The timeline is not specified, but the fact that both companies are talking publicly suggests hardware and software validation are far enough along to justify customer-facing preparation.

Rivian Promises Supervised Point-to-Point Driving This Year

Rivian CEO RJ Scaringe laid out a three-stage autonomy roadmap at the Masters of Scale event in Anaheim: supervised point-to-point driving in 2026, eyes-off unsupervised driving in 2027, and a commercial robotaxi service with Uber in 2028. The first stage, arriving later this year on Gen 2 vehicles and the R2, would be “very similar to Tesla’s FSD.”

Comparisons to Tesla are deliberate. Tesla’s FSD relies exclusively on cameras, while Rivian’s platform integrates ten external cameras, five radar units, twelve ultrasonic sensors, and high-precision GPS. Future R2 models will add roof-mounted LiDAR and a custom 5nm RAP1 processor delivering up to 1,600 trillion operations per second. Rivian’s Autonomy+ package costs $2,500 one-time or $49.99 per month — a sharp undercut of Tesla’s $8,000 or $99 per month FSD pricing.
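The pricing gap is easier to judge as a break-even point: how many months of subscription add up to the one-time price. The sketch below is a rough illustration using only the four prices quoted above; the function name is hypothetical, and taxes, price changes, and resale value are ignored.

```python
def breakeven_months(one_time_usd: float, monthly_usd: float) -> float:
    """Months of subscription that cost as much as the one-time purchase."""
    return one_time_usd / monthly_usd


# Prices as quoted in the article.
rivian = breakeven_months(2_500, 49.99)
tesla = breakeven_months(8_000, 99.00)

print(f"Rivian Autonomy+: {rivian:.1f} months")  # Rivian Autonomy+: 50.0 months
print(f"Tesla FSD: {tesla:.1f} months")  # Tesla FSD: 80.8 months
```

At these prices the Rivian one-time fee is about 31% of Tesla's, and the monthly fee is about half.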

The architecture is worth attention. Rivian’s Large Driving Model is trained end-to-end through reinforcement learning, mapping raw sensor input directly to vehicle trajectory and analyzing multiple driving paths using Group-Relative Policy Optimization. The approach mirrors Tesla’s FSD v12 neural network philosophy, though Rivian’s multi-sensor hardware gives the model wider input data. Eyes-off driving in 2027 is the commercially consequential milestone, and Tesla has repeatedly pushed back its own unsupervised FSD timeline, most recently to Q4 2026 at the earliest. Rivian’s target is Level 3 autonomy by 2028 and Level 4 by 2030, timelines that no autonomous driving company has consistently met.
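For readers unfamiliar with Group-Relative Policy Optimization, the core idea can be sketched in a few lines of Python: sample several candidates for the same situation, score each one, and measure every score against the group's own average instead of a separately learned baseline. This is a generic illustration of that one step, not Rivian's implementation, which the company has not detailed here. The reward values, the function name, and the choice of population standard deviation are all assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Score each candidate relative to the other candidates in its group."""
    baseline = mean(rewards)   # the group average acts as the baseline
    spread = pstdev(rewards)   # normalizes for how varied the group is
    return [(r - baseline) / (spread + eps) for r in rewards]


# Hypothetical rewards for four candidate driving paths in one scene.
candidate_rewards = [0.2, 0.9, 0.5, 0.4]
advantages = group_relative_advantages(candidate_rewards)

# The path with the largest advantage is the one the update would reinforce most.
best = max(range(len(advantages)), key=advantages.__getitem__)
print(best)  # 1
```

Candidates that score above the group average get a positive advantage and those below get a negative one. In a full training loop these advantages would weight the policy-gradient update, which is outside the scope of this sketch.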

Tesla Cybercab Clears Regulatory Hurdle

While Rivian was outlining a multi-year roadmap, Tesla’s Cybercab — the all-electric, steering-wheel-free ride-hailing vehicle — achieved a significant regulatory milestone that readies it for public roads. Tesla has not provided full technical details, but regulatory approval of a purpose-built robotaxi without manual controls is a meaningful indicator that at least one jurisdiction views the underlying safety case as credible. Combined with XPeng’s announcement that its next-generation VLA 2.0 autonomous driving system targets a 2027 global rollout, the autonomous vehicle market is filling with concrete hardware platforms rather than vaporware prototypes.

The Biotech Track: AI Enters the Drug Pipeline at Scale

The most consequential long-term story may be in biology. AI is no longer just helping researchers analyze data; it is designing molecules, hypothesizing mechanisms, and entering clinical trials.

Cambridge’s AI-Designed Vaccine Enters Human Trials

A team at the University of Cambridge reported the first human trials of a vaccine whose key component — a “super-antigen” — was designed entirely by artificial intelligence. The vaccine targets all coronaviruses, including all Covid variants and animal coronaviruses with pandemic potential. In trials of thirty-nine people designed to assess safety, the impact on the immune system was described as “modest,” but researchers are already moving forward with a second study of roughly two hundred people to understand how well the AI-designed antigen trains immunity.

Prof Jonathan Heeney, who led the work, described it as a fundamental shift in pandemic preparation: “We’re always behind. What we’re trying to do is get ahead of the curve, and so far ahead they could protect against new outbreaks or pandemics.” The Cambridge team is already performing animal research on universal seasonal flu vaccines that would not need annual updates, an H5N1 bird flu vaccine, and a vaccine for viral haemorrhagic fevers including Ebola species that currently lack vaccines. Prof Andy Pollard of the Oxford Vaccine Group described the results as “fascinating data” and said AI will be a “game changer” for vaccine research.

CRISPR Therapy Hits Phase III Success

In a separate milestone, Intellia Therapeutics reported additional positive Phase III results for lonvo-z in patients with hereditary angioedema. The therapy uses in vivo CRISPR gene editing, meaning the editing happens inside the patient’s body rather than in extracted cells. The initial Phase III data showed the treatment successfully reduced hereditary angioedema attacks. Broad Institute researchers also reported improvements in nearly every aspect of prime editing (higher efficiency, better delivery, and expanded applicability to more genetic diseases), moving the technology closer to therapeutic reality for conditions that current gene therapies cannot address.

Alnylam and Inceptive Sign $2 Billion AI Collaboration

On the business side, Alnylam Pharmaceuticals and Inceptive Nucleics announced a strategic collaboration valued at up to two billion dollars, with an upfront consideration of thirty million dollars including cash and equity. Alnylam is the leading RNAi therapeutics company with six approved drugs and over twenty years of proprietary siRNA data. Inceptive builds foundation models of life that can learn the patterns underlying biology and generalize across therapeutic modalities without retraining.

The partnership is designed to accelerate siRNA design by modeling target messenger RNAs and jointly exploring sequence space and novel chemical modifications. In joint exploratory work, Inceptive’s model achieved exceptional performance within weeks, uncovering meaningful biological insights from relatively small datasets to characterize siRNA molecules. Jakob Uszkoreit, Inceptive’s CEO, summarized the philosophy: “Most drug design still works through a process of trial and error, testing thousands of molecules and hoping something sticks. Inceptive was built on a different premise: that life follows rules of such complexity that only AI can learn them.”

What It Means

The through-line across all three tracks is efficiency of intelligence. GPT-5.5 delivers more capability per token. Gemini 3.5 Flash delivers more capability per second. Claude Fable 5 delivers more capability per task, at the cost of deliberate safety constraints. Rivian’s Large Driving Model delivers vehicle trajectories directly from raw sensor input. The Cambridge vaccine team aims to deliver broad immune protection from an AI-designed antigen. Alnylam and Inceptive are trying to deliver drug candidates from biological first principles rather than high-throughput screening.

None of these systems is ready to run unattended. GPT-5.5 still makes mistakes in long-running coding tasks. Robotaxis are still supervised. The AI-designed vaccine still needs to prove durability in larger human cohorts. CRISPR therapies are still limited to rare conditions. But the trend is clear: the cost of applying intelligence to a problem is dropping, and the ceiling of what intelligence can accomplish is rising, in software, in vehicles, and in medicine at the same time.