MiniMax-M1
Scaling Reasoning with Hybrid Attention
The world's first open-weight large-scale hybrid-attention reasoning model, pairing a hybrid MoE architecture with lightning attention and supporting a 1-million-token context for long-form tasks.
A New Architecture for a New Era of AI
MiniMax-M1's innovations deliver state-of-the-art performance with breakthrough efficiency.
Hybrid MoE Architecture
A Mixture-of-Experts (MoE) design activates 45.9B of its 456B total parameters per token, spending compute only where each token needs it.
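To make the active-versus-total parameter distinction concrete, here is a minimal sketch of top-k expert routing in PyTorch. The expert count, top-k value, and layer sizes are illustrative placeholders, not M1's actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Minimal top-k Mixture-of-Experts layer (placeholder sizes)."""
    def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # learned gating network
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):  # x: (n_tokens, d_model)
        gate = F.softmax(self.router(x), dim=-1)
        weights, idx = torch.topk(gate, self.top_k)  # route each token
        out = torch.zeros_like(x)
        # Only top_k of n_experts run per token, so the parameters actually
        # exercised per forward pass ("active") are a small slice of the
        # layer's total parameter count.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

x = torch.randn(4, 64)
print(MoELayer()(x).shape)  # torch.Size([4, 64])
```

Because only top_k experts execute per token, the compute touched per forward pass is a small fraction of the layer's total parameters; at M1's scale, that is the 45.9B-of-456B effect.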
1M Token Context
Natively supports an unprecedented 1 million token context window, 8 times the 128K of DeepSeek R1, for deep document analysis.
Lightning Attention
A novel attention mechanism that dramatically reduces computational cost during inference, making long-context tasks feasible and fast.
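The idea underlying lightning attention is linear attention: a running key-value summary replaces the quadratic attention matrix, so per-token decode cost stays constant as the context grows. The sketch below shows that recurrence in its simplest form; M1's actual kernel is block-wise and interleaved with periodic softmax-attention layers, and the ReLU feature map here is only an illustrative stand-in.

```python
import torch

def linear_attention_decode(q, k, v):
    """Causal linear attention via a running summary S = sum_t k_t v_t^T.

    Each decode step costs O(d^2) regardless of how many tokens precede
    it, whereas softmax attention's step-t cost grows with t.
    """
    seq_len, d = q.shape
    S = torch.zeros(d, d)   # running key-value outer-product state
    z = torch.zeros(d)      # running key normalizer
    outs = []
    for t in range(seq_len):
        phi_q, phi_k = torch.relu(q[t]), torch.relu(k[t])  # stand-in feature map
        S = S + torch.outer(phi_k, v[t])
        z = z + phi_k
        outs.append((phi_q @ S) / (phi_q @ z + 1e-6))
    return torch.stack(outs)

q, k, v = (torch.randn(10, 16) for _ in range(3))
print(linear_attention_decode(q, k, v).shape)  # torch.Size([10, 16])
```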
Hyper-Efficient RL Training
Powered by the CISPO algorithm, M1's RL training finished in just 3 weeks, showcasing a new paradigm in training efficiency.
Leading Performance on Key Benchmarks
MiniMax-M1 demonstrates exceptional capabilities, particularly in complex, real-world tasks that demand deep reasoning and long-context understanding.
56.0%
SWE-bench Verified
Outperforms leading open-weight models in complex, real-world software engineering tasks, showcasing superior problem-solving abilities.
73.4%
OpenAI-MRCR (128k)
Achieves top-tier results on OpenAI's multi-round co-reference resolution benchmark, demonstrating its mastery of long-context information processing.
67.8%
TAU-bench (Retail)
Exhibits advanced agentic tool use, navigating complex multi-turn retail scenarios to complete tasks successfully.
A Revolution in AI Economics & Efficiency
M1 was built not just for performance, but for a new paradigm of cost-effective, scalable AI development, directly challenging industry standards.

Breakthrough Cost-Performance
The entire RL training for M1 was completed in just 3 weeks on 512 H800 GPUs, at a rental cost of only $534,700, an order of magnitude less than originally anticipated. This demonstrates a new level of efficiency in large-model training.
Unmatched Inference Efficiency
M1 is drastically more efficient than its peers: when generating 100K tokens, it consumes roughly 25% of the FLOPs required by DeepSeek-R1, setting a new industry benchmark for large-scale inference.
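For intuition on where such savings come from, the toy calculation below compares per-head attention FLOPs for softmax versus linear attention over a 100K-token generation. The numbers are illustrative only and are not the methodology behind the 25% figure.

```python
d = 128        # per-head dimension (toy value)
n = 100_000    # tokens generated

# Softmax attention: decode step t attends over t cached tokens,
# roughly 2*t*d multiply-adds per head.
softmax_flops = sum(2 * t * d for t in range(1, n + 1))

# Linear attention: a constant ~2*d*d state update per head per step.
linear_flops = n * 2 * d * d

print(f"linear / softmax FLOPs ratio: {linear_flops / softmax_flops:.4f}")
# ~0.0026 for pure linear attention; M1's measured 25% is higher because
# its hybrid stack retains periodic softmax-attention layers.
```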
Innovative CISPO Algorithm
Our novel CISPO reinforcement learning algorithm delivers a 2x speedup over contemporary methods such as DAPO, reaching matched performance in half the training steps and further cementing our lead in efficiency.
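As described in the technical report, CISPO clips the importance-sampling weight itself rather than the token update, so every token keeps a policy gradient. The sketch below captures that structure; the clipping thresholds and tensor shapes are placeholders, not the paper's exact settings.

```python
import torch

def cispo_loss(logp_new, logp_old, advantages, eps_low=0.2, eps_high=0.2):
    """CISPO-style policy loss (sketch; epsilons are placeholders).

    The importance-sampling ratio is clipped and detached, so it acts as
    a fixed per-token weight; the gradient flows through every token's
    log-probability instead of being zeroed out by PPO-style ratio
    clipping.
    """
    ratio = torch.exp(logp_new - logp_old)
    clipped = torch.clamp(ratio, 1.0 - eps_low, 1.0 + eps_high).detach()
    return -(clipped * advantages * logp_new).mean()

logp_new = torch.randn(8, requires_grad=True)          # current-policy log-probs
logp_old = (logp_new + 0.1 * torch.randn(8)).detach()  # behavior-policy log-probs
advantages = torch.randn(8)                            # e.g. group-normalized rewards
cispo_loss(logp_new, logp_old, advantages).backward()
```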
Two Models, Tuned for Your Needs
Whether you need balanced performance or maximum reasoning power, there's an M1 for you.
MiniMax-M1 40K
The 40K thinking-budget model (a 40K-token maximum generation length) represents an intermediate stage of M1's training and offers a powerful, efficient baseline. It excels at a wide range of tasks with remarkable speed.
Download Model
MiniMax-M1 80K
The fully-trained 80K thinking budget model provides maximum reasoning depth. It's built for tackling the most complex challenges in software engineering, tool use, and long-context understanding.
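For readers who want to try a checkpoint, the sketch below shows one plausible load path with Hugging Face transformers. The repo id MiniMaxAI/MiniMax-M1-80k is assumed from MiniMax's Hugging Face organization, so check the model card for exact identifiers, hardware requirements, and the recommended serving stack; the full model is far too large for a single consumer GPU.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id assumed; confirm on the model card. The custom hybrid-attention
# stack needs trust_remote_code, and the 456B-parameter model requires a
# multi-GPU serving setup.
model_id = "MiniMaxAI/MiniMax-M1-80k"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, device_map="auto", torch_dtype="auto"
)

messages = [{"role": "user", "content": "Plan a fix for a failing unit test."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```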
Download Model
Transparent, Open-Source Science
Our breakthroughs are detailed in our technical report. We believe in advancing the field through open collaboration and rigorous research.
Read the Paper on arXiv