## Description
The script that generates this table sorts newer/larger
models first. It worked in most cases, but it did not sort
Qwen3.5 before Qwen3 because "." sorts after " " in a plain
string comparison. The script has been fixed.
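The fix itself is not shown in this PR, but the failure mode is easy to reproduce: in ASCII, "." (0x2E) compares after " " (0x20), so a plain string sort never places "Qwen3.5 …" ahead of every "Qwen3 …" entry. A minimal sketch of one possible fix (the `sort_key` helper and the regex are hypothetical, not the actual script's code) is to extract the version number and compare it numerically, newest first:

```python
import re

def sort_key(name: str):
    # Hypothetical key: pull the numeric version (e.g. 3.5 from
    # "Qwen3.5 35B A3B") so versions compare numerically, newest first;
    # fall back to the name itself to break ties.
    m = re.search(r"(\d+(?:\.\d+)?)", name)
    version = float(m.group(1)) if m else 0.0
    return (-version, name)

models = [
    "Qwen3 235B A22B Thinking-2507",
    "Qwen3 30B A3B",
    "Qwen3.5 35B A3B",
]
models.sort(key=sort_key)
# Qwen3.5 now sorts before the Qwen3 entries
```

A naive `models.sort()` would leave the Qwen3.5 entry last here, since " " sorts before "." character-by-character.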
## Testing
- [x] Local build succeeds without errors (`mint dev`)
- [x] Local link check succeeds without errors (`mint broken-links`)
- [x] PR tests succeed
## Files changed
`inference/models.mdx` (1 addition, 1 deletion)
@@ -23,11 +23,11 @@ W&B Inference provides access to several open-source foundation models. Each mod
 | OpenAI GPT OSS 120B |`openai/gpt-oss-120b`| Text | 131k | 5.1B-117B (Active-Total) | Efficient Mixture-of-Experts model designed for high-reasoning, agentic and general-purpose use cases. |
 | OpenAI GPT OSS 20B |`openai/gpt-oss-20b`| Text | 131k | 3.6B-20B (Active-Total) | Lower latency Mixture-of-Experts model trained on OpenAI's Harmony response format with reasoning capabilities. |
 | OpenPipe Qwen3 14B Instruct |`OpenPipe/Qwen3-14B-Instruct`| Text | 32.8k | 14.8B (Total) | An efficient multilingual, dense, instruction-tuned model, optimized by OpenPipe for building agents with finetuning. |
+| Qwen3.5 35B A3B |`Qwen/Qwen3.5-35B-A3B`| Text, Vision | 262k | 3B-35B (Active-Total) | Qwen3.5-35B-A3B is an open-weights multimodal MoE model built for efficient, high-throughput inference across chat, reasoning, and agentic tasks. |
 | Qwen3 235B A22B Thinking-2507 |`Qwen/Qwen3-235B-A22B-Thinking-2507`| Text | 262k | 22B-235B (Active-Total) | High-performance Mixture-of-Experts model optimized for structured reasoning, math, and long-form generation. |
 | Qwen3 30B A3B |`Qwen/Qwen3-30B-A3B-Instruct-2507`| Text | 262k | 3.3B-30.5B (Active-Total) | Qwen3-30B-A3B-Instruct-2507 is a 30.5B MoE instruction-tuned model with enhanced reasoning, coding, and long-context understanding. |
 | Qwen3 Coder 480B A35B |`Qwen/Qwen3-Coder-480B-A35B-Instruct`| Text | 262k | 35B-480B (Active-Total) | Mixture-of-Experts model optimized for agentic coding tasks such as function calling, tool use, and long-context reasoning. |
-| Qwen3.5 35B A3B |`Qwen/Qwen3.5-35B-A3B`| Text, Vision | 262k | 3B-35B (Active-Total) | Qwen3.5-35B-A3B is an open-weights multimodal MoE model built for efficient, high-throughput inference across chat, reasoning, and agentic tasks. |
 | Z.AI GLM 5 |`zai-org/GLM-5-FP8`| Text | 200k | 40B-744B (Active-Total) | Mixture-of-Experts model for long-horizon agentic tasks with strong performance on reasoning and coding. |
 | Meta Llama 4 Scout (deprecated) |`meta-llama/Llama-4-Scout-17B-16E-Instruct`| Text, Vision | 64k | 17B-109B (Active-Total) | Multimodal model integrating text and image understanding, ideal for visual tasks and combined analysis. |