GLM-5: Zhipu AI’s 744B-Parameter Open-Weights Frontier Model

GLM‑5 is Zhipu AI’s 2026 flagship large language model, released on February 11, 2026 as a frontier‑class system that combines open weights with performance close to the best proprietary models. It marks a major leap over the GLM‑4.x line, scaling from 355 billion to 744 billion total parameters while still remaining practical to deploy through a sparse Mixture‑of‑Experts design. Only about 44 billion parameters are active for any given token, which lets GLM‑5 behave like a huge model from a capability perspective without paying the full cost of a dense 700‑plus‑billion‑parameter network. Zhipu positions the model not as a simple chatbot but as an engine for “agentic engineering” that can break down complex projects, manage long‑running workflows, and work alongside tools with minimal hand‑holding.

Architecturally, GLM‑5 uses 256 experts with 8 selected per token, so each token touches only about 3 percent of the experts. Once the layers shared by every token are counted, roughly six percent of the model’s parameters (about 44 billion of 744 billion) are active at a time. This setup allows the model to route different inputs to different subsets of experts, encouraging specialization and giving it a flexible internal structure that can adapt to coding, analysis, or natural‑language tasks as needed. Zhipu pairs this design with a massive training corpus of about 28.5 trillion tokens, roughly one quarter more data than used for GLM‑4.7. To keep such a large training run stable, the team built a new asynchronous reinforcement‑learning infrastructure nicknamed slime, which separates data collection from model updates so training jobs are not blocked while fresh trajectories are generated. On top of that, a technique called Active Partial Rollouts focuses compute on the hardest long‑tail examples, improving reliability on tricky, real‑world tasks.
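The routing idea can be sketched in a few lines of Python. This is a generic top‑k gating illustration, not Zhipu’s router: the helper names (route_token, expert_fraction, active_param_fraction), the random gate scores, and the softmax over the selected experts are all assumptions made for the example. Only the expert counts and parameter totals come from the figures quoted above.

```python
import math
import random

NUM_EXPERTS = 256       # figure quoted in the article
TOP_K = 8               # figure quoted in the article
TOTAL_PARAMS_B = 744    # total parameters, in billions (article)
ACTIVE_PARAMS_B = 44    # active parameters per token, in billions (article)


def route_token(gate_logits, k=TOP_K):
    """Pick the k highest-scoring experts and softmax-normalize their weights.

    Hypothetical helper showing generic top-k gating; GLM-5's real router
    may score, normalize, or balance experts differently.
    """
    top = sorted(range(len(gate_logits)),
                 key=lambda i: gate_logits[i], reverse=True)[:k]
    peak = max(gate_logits[i] for i in top)          # for numerical stability
    exps = [math.exp(gate_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def expert_fraction():
    """Share of experts consulted for one token."""
    return TOP_K / NUM_EXPERTS


def active_param_fraction():
    """Share of all parameters that are active for one token."""
    return ACTIVE_PARAMS_B / TOTAL_PARAMS_B


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
    chosen = route_token(logits)
    print(len(chosen), "experts selected for this token")
    print(f"experts used per token:      {expert_fraction():.1%}")
    print(f"parameters active per token: {active_param_fraction():.1%}")
```

The two fractions differ because the expert ratio counts experts only, while the parameter ratio also includes whatever parts of the network every token passes through.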

One of GLM‑5’s most important practical features is its long‑context capability. The model can handle around 200,000 tokens of context, equivalent to roughly 300 pages of text, which is enough to fit entire repositories, legal contracts, or multi‑month conversation histories into a single prompt. This is enabled by an attention mechanism called DeepSeek Sparse Attention, a technique introduced by DeepSeek. Instead of computing full attention over every token pair, GLM‑5 uses a two‑stage process: a “lightning indexer” cheaply scores earlier positions, and a token selector prunes the attention pattern down to only the most relevant ones. That brings the cost of the main attention computation close to linear in sequence length and keeps inference fast enough to be viable for enterprise use, even at very long contexts.
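The two‑stage pattern described above can be illustrated with a small pure‑Python sketch. It is a toy version of "score cheaply, keep the top k, attend only to those", not the actual DeepSeek Sparse Attention kernel: the function names, the use of separate low‑cost index vectors, and the plain dot‑product indexer are assumptions for illustration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def index_scores(query_idx, keys_idx):
    """Stage 1: cheap relevance score for every earlier position.

    Hypothetical stand-in for the indexer; here it is a plain dot product
    over (ideally small) index vectors.
    """
    return [dot(query_idx, k) for k in keys_idx]


def select_top_k(scores, k):
    """Stage 2a: keep the positions with the k highest indexer scores."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def attend(query, keys, values, positions):
    """Stage 2b: ordinary softmax attention, restricted to `positions`."""
    scale = 1.0 / math.sqrt(len(query))
    logits = [dot(query, keys[i]) * scale for i in positions]
    peak = max(logits)                               # numerical stability
    weights = [math.exp(x - peak) for x in logits]
    total = sum(weights)
    return [
        sum(w * values[i][d] for w, i in zip(weights, positions)) / total
        for d in range(len(values[0]))
    ]


def sparse_attention(query, keys, values, query_idx, keys_idx, k):
    """Run the indexer, select k positions, then attend only to those."""
    positions = select_top_k(index_scores(query_idx, keys_idx), k)
    return attend(query, keys, values, positions)


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
    values = [[1.0], [2.0], [3.0], [4.0]]
    query = [1.0, 0.0]
    # For simplicity the toy example reuses the full vectors as index vectors.
    print(sparse_attention(query, keys, values, query, keys, k=2))
```

In this sketch the indexer still visits every position, but the softmax attention step only processes k of them, so with a fixed k its cost per query no longer grows with the context length.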

The benchmark numbers for GLM‑5 suggest that this design pays off. On the Artificial Analysis Intelligence Index, which combines several reasoning and agentic metrics, the model scores 50 points and becomes the first open‑weights system to reach that threshold. On SWE‑Bench Verified, a demanding coding benchmark based on real GitHub issues, GLM‑5 reaches 77.8 percent accuracy, beating many proprietary coding models and coming within a few points of the strongest reported systems. It also performs strongly on Terminal‑Bench 2.0, with a score in the mid‑fifties for command‑line task execution, and shows marked gains on MCP Atlas and Browse‑style evaluations that simulate web‑enabled agents. These results place GLM‑5 in the frontier tier for both software engineering and general reasoning workloads.

Reliability and efficiency are recurring themes in GLM‑5’s evaluation. On the Artificial Analysis Omniscience Index, a metric that penalizes models for confidently wrong answers, GLM‑5 improves from its predecessor’s score of around minus thirty‑six to about minus one. This shift is largely due to the model’s greater willingness to say it does not know or to ask for more information when its confidence is low, instead of hallucinating details. At the same time, GLM‑5 completes benchmark suites while emitting about thirty‑five percent fewer output tokens than GLM‑4.7, meaning it tends to answer more directly and spends less time and money on unnecessary verbosity. For production deployments, that combination of higher accuracy, more cautious behavior, and shorter responses adds up to substantially better real‑world usability.

GLM‑5’s deployment story is split between cloud access and local experimentation. In full BF16 precision the model needs roughly 1.65 terabytes of storage and close to 1.5 terabytes of memory for the weights alone (744 billion parameters at two bytes each), which makes on‑premises, full‑precision inference realistic only for organizations with multi‑GPU clusters. However, quantization efforts from the open‑source community have shown that GLM‑5 can be compressed to around 241 gigabytes in two‑bit format while still retaining most of its performance. That puts it within reach of well‑equipped workstations and high‑end desktops, including machines like fully loaded Mac Studio configurations. For many teams, the most practical route will still be Zhipu’s API or cloud partners, but the ability to run a quantized version locally is a major differentiator compared with tightly locked proprietary models.
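The memory figures above follow from simple arithmetic, which the short Python sketch below reproduces. It assumes decimal gigabytes (10^9 bytes) and counts weights only, ignoring the KV cache, activations, and runtime overhead; the function names are made up for the example.

```python
TOTAL_PARAMS = 744e9  # total parameter count quoted in the article


def weights_size_gb(params, bits_per_weight):
    """Size of the raw weights in decimal gigabytes (weights only)."""
    return params * bits_per_weight / 8 / 1e9


def effective_bits(size_gb, params):
    """Average bits per weight implied by a given file size."""
    return size_gb * 1e9 * 8 / params


if __name__ == "__main__":
    print(f"BF16 (16 bits):       {weights_size_gb(TOTAL_PARAMS, 16):,.0f} GB")
    print(f"Uniform 2-bit:        {weights_size_gb(TOTAL_PARAMS, 2):,.0f} GB")
    print(f"241 GB file implies:  {effective_bits(241, TOTAL_PARAMS):.2f} bits/weight")
```

A uniform two‑bit encoding would come to about 186 GB, so the quoted 241 GB works out to roughly 2.6 bits per weight on average. One plausible reading, not confirmed by the article, is that the quantized release keeps some tensors at higher precision.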

Pricing is another area where GLM‑5 is intentionally disruptive. Public API rates are set at about one US dollar per million input tokens and three dollars and twenty cents per million output tokens, dramatically undercutting Western frontier offerings that can charge fifteen dollars per million input tokens and seventy‑five dollars per million output tokens at similar quality levels. In cost terms, that makes GLM‑5 roughly fifteen times cheaper on the input side and about twenty‑three times cheaper on the output side than some well‑known competitors. Because the model is released under the permissive MIT license, companies can also host it themselves, fine‑tune it for internal domains, or embed it in products without the licensing friction that often accompanies closed models.
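To see how those rates translate into a bill, the Python sketch below compares the two price points on a sample workload. The per‑million‑token prices are the ones quoted above; the workload size (2 million input tokens, 500,000 output tokens) and the names GLM5, REFERENCE, and request_cost are illustrative assumptions.

```python
# Prices in US dollars per million tokens, as quoted in the article.
GLM5 = {"input": 1.00, "output": 3.20}
REFERENCE = {"input": 15.00, "output": 75.00}  # unnamed "Western frontier" rate


def request_cost(prices, input_tokens, output_tokens):
    """Cost in dollars for a given number of input and output tokens."""
    return (input_tokens * prices["input"]
            + output_tokens * prices["output"]) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload chosen only for illustration.
    input_tokens, output_tokens = 2_000_000, 500_000
    glm = request_cost(GLM5, input_tokens, output_tokens)
    ref = request_cost(REFERENCE, input_tokens, output_tokens)
    print(f"input price ratio:  {REFERENCE['input'] / GLM5['input']:.1f}x")
    print(f"output price ratio: {REFERENCE['output'] / GLM5['output']:.1f}x")
    print(f"GLM-5: ${glm:.2f}  reference: ${ref:.2f}  ratio: {ref / glm:.1f}x")
```

The blended saving depends on the input‑to‑output mix, so it lands somewhere between the two per‑side ratios for any real workload.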

Strategically, GLM‑5 is significant beyond its raw specs. Zhipu trained the model entirely on Huawei Ascend accelerators, sidestepping reliance on NVIDIA hardware at a time when export controls and supply constraints remain a concern. That achievement signals that China can now build and deploy open‑weights frontier models on domestic infrastructure from end to end. The market reaction has been strong: Zhipu’s shares rallied after the announcement, and the company has already moved to increase prices on some higher‑value plans, particularly for coding‑focused tiers, as demand has surged. For developers globally, GLM‑5 stands out as a rare combination of frontier‑level capability, open licensing, and aggressive pricing, making it a central option to evaluate when planning long‑term AI strategy.
