Supplementary: Model Matrix

This page will contain the complete model matrix for all 35 models tested, including:

  • Model specifications: Family, generation, parameter count, release date, API version, and API provider.
  • Family breakdown: Anthropic (5 models, Claude 3 Haiku → Opus 4), OpenAI (6, GPT-3.5 → GPT-5), Google (4, Gemini 2.0 Flash → 3.1 Pro), xAI (3, Grok 3 → Grok 4.1), Qwen (9, 2.5 7B → 3.5 397B), DeepSeek (3, V3 → R1 → V3.2), Meta (5, Llama 3 8B → 4 Maverick).
  • Cost breakdown per model and per family.
  • Key analytical contrasts: Qwen size-vs-generation disentanglement, Meta fixed-size generation comparison, reasoning model sub-analysis (o3, QwQ, R1).
  • Rate limiting and API details for reproducibility.

Full matrix forthcoming.