Alibaba has released Qwen3.5, its latest model series, beginning with the open-source release of Qwen3.5-397B-A17B (also called "Qwen3.5-Plus"), a natively multimodal foundation model that delivers strong performance in reasoning, coding, agentic tasks, and multimodal understanding while lowering inference cost for real-world deployment.
The launch advances Alibaba’s push into agentic AI: building models that are not only capable but also efficient enough to scale broadly, enabling developers and enterprises to deploy multimodal applications without proportionally increasing compute budgets.
Multimodal models are moving quickly from demos to deployment—but for most teams, the limiting factor isn’t capability, it’s cost. As AI takes on more agentic, multi-step tasks across text, images and video, inference efficiency becomes the difference between a pilot and a product.
Qwen3.5 is natively multimodal, trained on trillions of vision-language tokens spanning multilingual text, images, videos, STEM, and reasoning data. The model accepts text, image, and video inputs and generates text.
Qwen3.5 supports 201 languages and dialects, up from 119 in the Qwen3 series, including additional low-resource languages such as Hawaiian, Fijian, and Niger-Congo languages. Across a broad set of benchmarks, Qwen3.5-397B-A17B performs strongly in language understanding and reasoning, code generation, agentic workflows, image and video comprehension, and GUI interaction, rivaling leading frontier models in both versatility and capability.

A key focus of this release is inference efficiency. Qwen3.5 combines a linear attention mechanism with a sparse mixture-of-experts (MoE) design to reduce compute requirements while maintaining capability. As the "A17B" suffix indicates, only about 17 billion of the model's 397 billion parameters are activated per token. The result is that Qwen3.5-397B-A17B significantly reduces deployment cost while matching the performance of the much larger Qwen3-Max, which exceeds one trillion parameters.
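To make the efficiency argument concrete, here is a minimal PyTorch sketch of generic top-k sparse MoE routing: every token is scored against all experts, but only the top-k experts actually run, so the active parameter count stays a small fraction of the total. This illustrates the general technique under assumed layer sizes; it is not Qwen3.5's actual architecture, and it omits the linear attention component entirely.

```python
# Illustrative top-k sparse MoE layer (generic technique, not Qwen3.5's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, d_model=1024, d_ff=4096, n_experts=64, top_k=2):
        super().__init__()
        self.top_k = top_k
        # The router scores every expert, but only top_k experts per token run.
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (n_tokens, d_model)
        scores = self.router(x)                         # (n_tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # pick k experts per token
        weights = F.softmax(weights, dim=-1)            # renormalize over the k picks
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            token_ids, slot = (idx == e).nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue  # this expert received no tokens in the batch
            out[token_ids] += weights[token_ids, slot, None] * expert(x[token_ids])
        return out

layer = SparseMoE()
y = layer(torch.randn(8, 1024))  # each of the 8 tokens activates only 2 of 64 experts
```

In a production system the per-expert loop is replaced by batched, load-balanced dispatch, but the cost structure is the same: compute scales with the k active experts per token rather than with all of them.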

Beyond benchmark performance, Qwen3.5 is built as a foundation for real-world multimodal agents: systems that can interpret what they "see" and take action across devices and interfaces.
Qwen3.5 can:
- Act as a visual agent that interacts with smartphones and computers to streamline workflows
- Deliver stronger visual reasoning for scientific problem-solving and other visually grounded tasks
- Support long-form video understanding (up to two hours) for analysis and summarization
- Convert hand-drawn UI sketches into functional front-end code by bridging visual understanding and code generation (illustrated in the example after this list)
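As a concrete example of the sketch-to-code capability above, the snippet below sends a hand-drawn UI image to the model through an OpenAI-compatible chat endpoint. The model identifier, endpoint URL, and environment variable name are assumptions for illustration, not confirmed values from this release.

```python
# Hypothetical sketch: UI drawing in, front-end code out, via an
# OpenAI-compatible API. Model id, endpoint, and key variable are assumptions.
import base64
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DASHSCOPE_API_KEY"],  # assumed key variable
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",  # assumed endpoint
)

with open("ui_sketch.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="qwen3.5-plus",  # assumed model id
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            {"type": "text",
             "text": "Turn this hand-drawn sketch into a working HTML/CSS page."},
        ],
    }],
)
print(resp.choices[0].message.content)
```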
Qwen3.5 reflects a broader shift in open-source AI: leading models are increasingly being engineered for production constraints, focusing on latency, throughput, and total cost of ownership.
That same focus on practical deployment is also shaping Alibaba’s work with major partners. At Milano Cortina 2026, the IOC recently introduced its first LLM-based system built on Qwen, bringing AI into fan experiences and supporting workflows across the Olympic ecosystem.
Global developers can access Qwen3.5-397B-A17B via Hugging Face, GitHub, and ModelScope, and experience it through Qwen Chat. The model is also available via APIs on Alibaba Cloud's model development platform, Model Studio, with additional Qwen3.5 models expected to be open-sourced in the coming weeks.
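For developers starting from the open weights, pulling the checkpoint from Hugging Face might look like the following minimal sketch; the repository id is an assumption based on the announced model name.

```python
# Minimal sketch of fetching the open weights from Hugging Face.
from huggingface_hub import snapshot_download

local_dir = snapshot_download("Qwen/Qwen3.5-397B-A17B")  # assumed repo id
print("weights downloaded to", local_dir)
```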