Mistral AI, a prominent French artificial intelligence company, was established in Paris in April 2023 by three distinguished French AI researchers: Arthur Mensch, Guillaume Lample, and Timothée Lacroix. The founders brought a wealth of experience from leading AI laboratories; Arthur Mensch previously worked at Google DeepMind, while Guillaume Lample and Timothée Lacroix were key researchers at Meta AI. Their collaboration began during their studies at École Polytechnique. The company's founding was motivated by a desire to counteract the growing trend of closed-door frontier AI research, aiming to re-establish "more openness and information sharing" and challenge the "opaque-box nature of 'big AI'".
At its core, Mistral AI operates on a "disruptive open-core model," strategically releasing powerful open-weight models while monetizing its more advanced, proprietary models and enterprise services 1. The company's mission statement is "to make frontier AI accessible to everyone," empowering individuals and organizations to build with and benefit from transformative AI technology through open-source, efficient, and innovative models, products, and solutions. Mistral AI champions a decentralized and transparent approach to technology, prioritizing compute efficiency, helpfulness, and trustworthiness. CEO Arthur Mensch emphasizes the critical role of open-source models for community scrutiny and preventing a global AI monopoly 1. Furthermore, Mistral AI positions itself as a "European champion," focusing on data sovereignty and GDPR compliance, thereby offering a "privacy-first" approach particularly appealing to regulated industries 1. While many of its models are released under permissive licenses such as Apache 2.0, some of its most powerful models remain proprietary, a strategy that has drawn "open-washing" criticisms 1. The company's long-term vision is pragmatic, viewing AI primarily as a "new programming language" to enhance human productivity rather than pursuing Artificial General Intelligence (AGI) 1.
Mistral AI has rapidly ascended in the AI landscape, securing substantial funding since its inception. The company made headlines in June 2023 by closing a €105 million ($117 million) seed round, which was the largest in European history at that time. By December 2023, its valuation soared to over $2 billion after a €385 million ($428 million) funding round that included investors such as Andreessen Horowitz, BNP Paribas, and Salesforce. In June 2024, a €600 million ($645 million) funding round, led by General Catalyst, further increased its valuation to €5.8 billion ($6.2 billion). Most recently, in September 2025, Mistral AI secured a monumental €1.7 billion Series C round, propelling its valuation to €12 billion ($14 billion). ASML, a leading semiconductor equipment manufacturer, was the lead investor in this round, contributing €1.3 billion and acquiring approximately an 11 percent share in Mistral AI. Other key investors include DST Global, Andreessen Horowitz, Bpifrance, General Catalyst, Index Ventures, Lightspeed, and NVIDIA 2.
Mistral AI Funding Milestones
| Date | Funding Round | Amount | Valuation | Key Investor(s) |
|---|---|---|---|---|
| June 2023 | Seed Round | €105 million ($117 million) | - | - |
| December 2023 | - | €385 million ($428 million) | >$2 billion | Andreessen Horowitz, BNP Paribas, Salesforce |
| June 2024 | - | €600 million ($645 million) | €5.8 billion ($6.2 billion) | General Catalyst |
| September 2025 | Series C | €1.7 billion | €12 billion ($14 billion) | ASML |
As a key player in the global AI ecosystem, Mistral AI has strategically positioned itself as a "critical third force" challenging the US-dominated generative AI market 1. While headquartered in Paris, France, the company is expanding its global footprint with a growing presence in the United States, United Kingdom, and Singapore. It caters to developers and businesses across diverse industries by offering open, portable, and customizable generative AI solutions 2. Mistral AI's competitive advantages stem from its open-source philosophy, strong European backing, significant funding, and highly capital-efficient model architectures achieved through innovations such as Mixture-of-Experts (MoE), Grouped-Query Attention (GQA), and Sliding Window Attention (SWA). This efficiency enables Mistral to rival larger models while incurring lower computational costs 1. As of June 2024, Mistral AI was ranked fourth globally in the AI industry and held the top position outside the San Francisco Bay Area by valuation 3. The company has also forged several strategic partnerships, including a multi-year alliance with Microsoft in February 2024, making Mistral's language models available on Azure and providing access to supercomputing infrastructure. Microsoft also invested $16 million in Mistral AI 3. The September 2025 strategic partnership with ASML involves collaboration on AI model usage across ASML's product portfolio and operations, granting ASML an advisory seat on Mistral AI's Strategic Committee 4. Additionally, an April 2025 partnership with shipping giant CMA CGM involves Mistral AI's technology powering an internal assistant. Mistral AI's models are also integrated into platforms from major companies like Snowflake (Cortex Analyst tool), IBM (watsonx.ai platform), SAP, and Cloudflare (Workers AI platform) 1. This comprehensive introduction sets the stage for a detailed examination of Mistral AI's specific AI models and developer tools.
Mistral AI's flagship models encompass a diverse range of general-purpose and specialized models, offering varied capabilities and performance profiles to meet different user needs. These models are often distinguished by their technical specifications, intended applications, and licensing terms, reflecting Mistral AI's commitment to both cutting-edge performance and open-source principles 5. Many models are accessible via La Plateforme (Mistral's hosted infrastructure) and Azure, with self-deployment options available for sensitive applications 6.
These models are designed for a broad spectrum of tasks, balancing performance with versatility across various applications 7.
Mistral 7B is a dense transformer model and Mistral AI's inaugural offering, recognized for its efficiency despite its relatively small size 8. It is well-suited for general-purpose language understanding and is often deployed in chatbots, code generation, and local LLM solutions 8.
| Feature | Detail |
|---|---|
| Parameters | 7.3 billion (commonly rounded to 7B) 7 8 |
| Architecture | Dense Transformer, utilizing grouped-query attention (GQA) and sliding window attention (SWA) for efficiency 8; an illustrative sliding-window sketch follows this table |
| Context Window | 8K tokens 8 |
| Key Strengths | Strong general-purpose language understanding, fast inference speed, low memory usage 8 |
| Performance Benchmarks | Outperforms larger models like Llama 2 13B and rivals Llama 1 34B on several tasks 8; the Mistral-tiny API endpoint (Mistral 7B Instruct v0.2) achieves 7.6 on MT-Bench 9 |
| Licensing | Apache License 2.0, allowing full commercial use and modification with attribution 8 |
| Use Cases | Developers seeking a robust, open-source LLM for local deployment without high hardware demands 8 |
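The architecture row above credits part of Mistral 7B's efficiency to sliding window attention. As a rough, illustrative sketch (not Mistral's implementation), the snippet below builds the causal sliding-window mask that lets each token attend only to the most recent `window` positions; the function name and sizes are invented for illustration.

```python
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Return a boolean mask: mask[q, k] is True if query position q may attend to key position k."""
    q = np.arange(seq_len)[:, None]  # query positions (rows)
    k = np.arange(seq_len)[None, :]  # key positions (columns)
    # Causal: no attending to future positions; windowed: only the last `window` positions.
    return (k <= q) & (q - k < window)

# Example: with window=4, position 9 attends to positions 6-9 only,
# which keeps per-token attention cost constant as the sequence grows.
print(sliding_window_mask(seq_len=10, window=4).astype(int))
```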
Mixtral 8x7B is a sparse Mixture-of-Experts (MoE) model that has significantly advanced the field of AI 8. It achieves high performance while keeping compute usage low, effectively behaving like a 12–13B dense model in practice 8; an illustrative top-2 routing sketch follows the table below.
| Feature | Detail |
|---|---|
| Parameters | 8 experts per MoE layer, built on a 7B-scale design; 46.7 billion total parameters (the experts share attention and embedding layers, so the total is below 8 × 7B), with only 12.9 billion active per token 8 |
| Architecture | Sparse Mixture of Experts (MoE), where a router network selects 2 out of 8 experts for each layer and token 8 |
| Context Window | Up to 32K tokens 8 |
| Key Strengths | Higher performance than most 13B–30B models with efficient compute, suitable for complex reasoning and long-form tasks 8; excels in code generation and supports multiple languages (English, French, Italian, German, Spanish) 7 |
| Performance Benchmarks | Much stronger performance than Mistral 7B and many 30B+ models 8; outperforms Llama 2 70B in various benchmarks (up to six times faster inference) and matches or outperforms GPT-3.5 on most benchmarks 7; achieves an 8.3 MT-Bench score when fine-tuned 7 |
| Licensing | Apache License 2.0, allowing full commercial use and modification 8 |
| Use Cases | Applications requiring advanced reasoning, large context handling, or better quality outputs like document summarization, intelligent agents, and research tasks 8; also ideal for bulk simple tasks such as classification, customer support, or text generation (used by mistral-small API endpoint) 9 |
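To make the top-2 routing mentioned above concrete, here is a toy, illustrative sketch of a Mixture-of-Experts layer (not Mixtral's actual code): a router scores all eight experts per token, only the two highest-scoring experts run, and their outputs are mixed using normalized gate weights. All names and dimensions are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_experts, top_k, d_model = 8, 2, 16

router_w = rng.normal(size=(d_model, n_experts))                           # router projection
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]  # toy "expert" networks

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Apply a toy top-2 MoE layer to token vectors of shape (tokens, d_model)."""
    logits = x @ router_w                           # (tokens, n_experts) routing scores
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        chosen = np.argsort(logits[t])[-top_k:]     # indices of the two best experts
        gates = np.exp(logits[t, chosen])
        gates /= gates.sum()                        # softmax over the selected experts only
        for gate, e in zip(gates, chosen):
            out[t] += gate * (x[t] @ experts[e])    # only top_k of n_experts run per token
    return out

tokens = rng.normal(size=(4, d_model))
print(moe_layer(tokens).shape)  # (4, 16)
```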
Developed in collaboration with NVIDIA, Mistral Nemo is designed as a direct replacement for Mistral 7B, featuring an enhanced tokenizer and advanced architectural considerations for efficiency 7.
| Feature | Detail |
|---|---|
| Parameters | 12 billion parameters 7 |
| Context Window | 128K tokens 7 |
| Tokenizer | Tekken tokenizer, trained on over 100 languages; compresses text roughly 30% more efficiently than previous Mistral tokenizers for many languages, and is up to 3x more efficient for Korean and Arabic 7 |
| Other Technicalities | Quantization-aware, supporting FP8 inference without performance degradation 7 |
| Key Strengths | Capable in tasks requiring extensive context (complex reasoning, coding, world knowledge); improved instruction following, effective reasoning, multi-turn conversations, and accurate code generation 7; multilingual support for Romance languages, Chinese, Japanese, Korean, Hindi, and Arabic 5 |
| Licensing | Apache 2.0 license, fully open-sourced with pre-trained base and instruction-tuned checkpoints available 7 |
| API Pricing | $0.3 per 1M tokens for input and output; fine-tuning costs $1 per 1M tokens with a $2 monthly storage fee 7 |
Mistral Large 2 is the latest iteration of Mistral AI's flagship models, offering top-tier reasoning capabilities while remaining optimized for efficient execution on a single node 7. It provides extensive multilingual support and native function calling; an illustrative function-calling sketch follows the table below.
| Feature | Detail |
|---|---|
| Parameters | 123 billion parameters 7 |
| Context Window | 128K tokens 7 |
| Key Strengths | High performance in code generation, mathematics, and reasoning; natively fluent in multiple languages (English, French, Spanish, German, Italian, Portuguese, Arabic, Hindi, Russian, Chinese, Japanese, Korean, and over 80 coding languages) with nuanced understanding of grammar and cultural context 6; precise instruction-following and native function calling 6; fine-tuned to reduce incorrect information and better recognize when it lacks sufficient information 7 |
| Performance Benchmarks | Achieves 84.0% on the MMLU benchmark 7 and performs on par with GPT-4o and Llama 3 405B in code generation 7; the original Mistral Large was ranked the world's second-best model generally available through an API (next to GPT-4) 6, scored 81.2% on MMLU (beating Claude 2, Gemini Pro, and Llama-2-70B) with 94.2% accuracy on the ARC Challenge (5-shot) 9, and showed top performance in coding and math benchmarks such as HumanEval, MBPP, MATH maj@4, and GSM8K maj@8 6 |
| Licensing | Mistral Research License for non-commercial use and modification; commercial deployment requires direct contact or access through partners like IBM watsonx™ 5 |
| API Pricing | $3 per 1M tokens for input and $9 per 1M tokens for output; fine-tuning costs $9 per 1M tokens with a $4 monthly storage fee 7 |
| Use Cases | Complex multilingual reasoning tasks, text understanding, transformation, code generation 6, and long-context applications 7 |
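As an illustrative sketch of the native function calling mentioned above, the snippet below assumes the official `mistralai` v1 Python client (`pip install mistralai`) and an API key in the `MISTRAL_API_KEY` environment variable; the `get_weather` tool, its schema, and the response handling are assumptions for illustration rather than an official example.

```python
import json
import os

from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

# A hypothetical tool described with a JSON schema; the model may choose to call it.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.complete(
    model="mistral-large-latest",
    messages=[{"role": "user", "content": "What is the weather in Paris right now?"}],
    tools=tools,
    tool_choice="auto",  # let the model decide whether to call the tool
)

call = response.choices[0].message.tool_calls[0]
print(call.function.name, json.loads(call.function.arguments))
```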
These models are tailored for particular tasks, ensuring efficiency and high performance within their respective domains 7.
Codestral is Mistral AI's first specialized model, designed specifically for code generation and offering developers tools to write, complete, and refine code efficiently 7; an illustrative fill-in-the-middle sketch follows the table below.
| Feature | Detail |
|---|---|
| Parameters | 22 billion parameters 7 |
| Context Window | 32K tokens 7 |
| Architecture | Includes a fill-in-the-middle (FIM) mechanism for completing partial code snippets 7 |
| Key Strengths | Assists developers in writing, completing, and refining code; trained on over 80 programming languages (e.g., Python, Java, C++, JavaScript, Swift, Fortran) 7 |
| Performance Benchmarks | Sets new standards in code generation performance and latency; demonstrates strong performance on HumanEval, MBPP, CruxEval, and RepoBench 7 |
| Licensing | Mistral AI Non-Production License, allowing research and testing; commercial licenses granted on request 5 |
| API Pricing | $1 per 1M tokens for input and $3 per 1M tokens for output; fine-tuning costs $3 per 1M tokens with a $2 monthly storage fee 7 |
| Use Cases | Accelerating coding processes, writing tests, filling in missing code, and improving existing codebases 7 |
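As an illustrative sketch of the fill-in-the-middle mechanism noted in the table above, the snippet below assumes the `mistralai` v1 Python client exposes Codestral's FIM endpoint as `client.fim.complete` and that an API key is set in `MISTRAL_API_KEY`; the prompt/suffix pair is invented for illustration.

```python
import os

from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

# The model sees the code before (`prompt`) and after (`suffix`) a gap
# and generates the code that belongs in between.
response = client.fim.complete(
    model="codestral-latest",
    prompt="def fibonacci(n: int) -> int:\n",
    suffix="\n\nprint(fibonacci(10))",
    max_tokens=128,
)
print(response.choices[0].message.content)
```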
Mistral Embed is a specialized model for generating high-quality text embeddings, which capture the semantic meaning of text as vector representations used across natural language processing (NLP) tasks 7; an illustrative retrieval sketch follows the table below.
| Feature | Detail |
|---|---|
| Key Strengths | Captures the semantic meaning of text in vectorial representations; optimized for English text 7 |
| Performance Benchmarks | Achieves a retrieval score of 55.26 on the Massive Text Embedding Benchmark (MTEB) 7 |
| API Pricing | $0.01 per 1M tokens for both input and output 7 |
| Use Cases | Crucial for NLP tasks like clustering, classification, and retrieval; useful for semantic similarity, information retrieval, question-answering systems, and Retrieval-Augmented Generation (RAG) systems 7 |
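As an illustrative retrieval sketch using Mistral Embed, the snippet below assumes the `mistralai` v1 Python client's `embeddings.create` interface and an API key in `MISTRAL_API_KEY`; the documents, query, and cosine-similarity ranking are invented for illustration.

```python
import os

import numpy as np
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

docs = [
    "Mistral 7B uses sliding window attention for efficient inference.",
    "Mixtral 8x7B routes each token to 2 of its 8 experts.",
]
query = "How does Mixtral choose experts?"

# Embed documents and query in one call, then rank documents by cosine similarity.
resp = client.embeddings.create(model="mistral-embed", inputs=docs + [query])
vectors = np.array([item.embedding for item in resp.data])
doc_vecs, query_vec = vectors[:-1], vectors[-1]

scores = doc_vecs @ query_vec / (
    np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
)
print(docs[int(scores.argmax())])  # expected: the Mixtral routing sentence
```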
Pixtral 12B is an open multimodal model, distinguished by its capability to process both text and images as input (producing text output), extending conversational interfaces beyond text-only LLMs 5; an illustrative image-input sketch follows the table below.
| Feature | Detail |
|---|---|
| Architecture | Combines a 12B multimodal decoder (based on Mistral Nemo) and a 400M parameter vision encoder trained from scratch on image data 5 |
| Key Strengths | Handles conversational interfaces similar to text-only LLMs but with the added ability to upload images and answer questions about them 5 |
| Performance Benchmarks | Achieved highly competitive results on most multimodal benchmarks, outperforming Anthropic's Claude 3 Haiku, Google's Gemini 1.5 Flash 8B, and Microsoft's Phi 3.5 Vision models on MMMU, MathVista, ChartQA, DocQA, and VQAv2 5 |
| Licensing | Apache 2.0 license 5 |
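As an illustrative image-input sketch for Pixtral 12B, the snippet below assumes the `mistralai` v1 Python client, that content blocks of type `image_url` accept a plain URL string, and that the hosted model identifier resembles `pixtral-12b-latest`; the image URL is a placeholder.

```python
import os

from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

response = client.chat.complete(
    model="pixtral-12b-latest",  # assumed identifier for the hosted Pixtral model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe what this chart shows."},
            {"type": "image_url", "image_url": "https://example.com/chart.png"},
        ],
    }],
)
print(response.choices[0].message.content)
```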
These models are developed to advance the field of AI through experimental and cutting-edge algorithms, often released as fully open-source with no commercial restrictions 7.
Codestral Mamba is a language model specifically designed for code generation, built on the Mamba2 architecture to enable efficient handling of extensive and potentially infinite-length sequences 7.
| Feature | Detail |
|---|---|
| Parameters | Over 7 billion parameters 7 |
| Architecture | Mamba2 architecture, allowing for linear time inference 7 |
| Context Window | Tested on in-context retrieval tasks with token lengths up to 256K 7 |
| Key Strengths | Effective for handling sequences of potentially infinite length, ensuring quick responses regardless of input size 7 |
| Performance Benchmarks | Performs on par with leading transformer-based models in code generation and reasoning tasks 7 |
| Licensing | Fully open source 5 |
| Use Cases | Local code assistant 7 |
Mathstral is specialized for mathematical reasoning and scientific discovery, designed to tackle complex, multi-step logical tasks, and built upon the Mistral 7B foundation 7.
| Feature | Detail |
|---|---|
| Parameters | 7 billion parameters 7 |
| Architecture | Built on the foundation of Mistral 7B 7 |
| Context Window | 32K tokens 7 |
| Key Strengths | Designed to tackle complex, multi-step logical reasoning tasks 7 |
| Performance Benchmarks | Achieves 56.6% on the MATH benchmark and 63.47% on the MMLU benchmark; performance can increase to 68.37% on MATH with additional computation resources, and 74.59% with a reward model 7 |
| Licensing | Apache 2.0 license 7, fully open source 5 |
| Use Cases | Advanced mathematical problems and STEM-related applications, especially academic research 7 |
Mistral AI also offers legacy models and API endpoints that have either paved the way for newer advancements or serve specific use cases, showcasing the evolution of their AI offerings.
Mixtral 8x22B is a legacy sparse Mixture-of-Experts (SMoE) model known for delivering output quality comparable to much larger dense models while maintaining faster processing speeds 7.
| Feature | Detail |
|---|---|
| Parameters | 141 billion parameters total, with only 39 billion active per token 7 |
| Context Window | 64K tokens 7 |
| Key Strengths | Delivers output quality comparable to much larger dense models while maintaining faster processing speeds; multilingual (English, French, Italian, German, Spanish); strong in mathematics and coding, with native function calling 7; excels at handling extensive documents and precise information recall 7 |
| Licensing | Apache 2.0 license 7 |
| API Pricing | $2 per 1M tokens for input and $6 per 1M tokens for output 7 |
Mistral Small is positioned as an optimized, intermediary solution. While an earlier, lightweight version had 2.7B parameters, the current enterprise-grade Mistral Small v24.09 is larger and benefits from RAG-enablement and function calling 7.
| Feature | Detail |
|---|---|
| Parameters | Current version: 22 billion parameters 5; Legacy version: 2.7 billion parameters 7 |
| Key Strengths | Optimized for latency and cost; benefits from RAG-enablement and function calling 6 |
| Performance Benchmarks | Outperforms Mixtral 8x7B and has lower latency 6 |
| Licensing | Mistral Research License 5 |
| API Pricing | Input priced at $1 per 1M tokens and output at $3 per 1M tokens (for the legacy version) 7 |
| Use Cases | Resource-constrained environments like mobile devices or edge computing (for earlier, smaller versions) 7 |
Mistral Medium was a mid-sized model that served as a prototype, offering a balanced trade-off between performance and resource efficiency, and excelling across various NLP, code generation, and reasoning tasks 7.
| Feature | Detail |
|---|---|
| Parameters | 13 billion parameters 7 |
| Key Strengths | Offers a balanced trade-off between performance and resource efficiency; strong performance across NLP, code generation, and reasoning tasks 7; masters English, French, Italian, German, and Spanish, and is good at coding 9 |
| Performance Benchmarks | Scores 8.6 on MT-Bench, very close to GPT-4 and beating all other models tested 9 |
| API Pricing | Input cost of $2.75 per 1M tokens and output at $8.1 per 1M tokens 7 |
| Use Cases | Ideal for intermediate tasks requiring moderate reasoning such as data extraction, document summarization, or job/product description writing 9 |
Mistral AI's licensing strategy promotes broad adoption while differentiating between commercial and non-commercial uses.
Mistral AI's commitment to open models fosters broader experimentation and innovation, while also addressing concerns regarding transparency and technological sovereignty, particularly within Europe 8.
Mistral AI provides a comprehensive developer ecosystem designed to facilitate the integration of its advanced AI models into various applications, prioritizing flexibility, control, and performance 10. This ecosystem encompasses a suite of tools, extensive documentation, API access, Software Development Kits (SDKs), cloud integrations, and a transparent pricing structure.
Developers can access Mistral AI's models and capabilities through its API, available via "La Plateforme" 7. The API supports a wide range of functionality, including chat completions, embeddings, and fill-in-the-middle code completion.
The API offers fine-grained control over model behavior through numerous parameters, including frequency_penalty, max_tokens, messages, model ID, n (number of completions), parallel_tool_calls, presence_penalty, prompt_mode, random_seed, response_format (supporting JSON and JSON schema modes), safe_prompt, stop tokens, stream for partial progress, temperature, tool_choice, tools, and top_p 13. Access to the API requires API keys, which are obtained by activating payments on the user's account 7.
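A minimal sketch of a raw chat-completion request exercising a few of these parameters, assuming the `https://api.mistral.ai/v1/chat/completions` endpoint and an API key in `MISTRAL_API_KEY` (not an official example):

```python
import os

import requests

payload = {
    "model": "mistral-large-latest",
    "messages": [{"role": "user", "content": "Return a short JSON object describing Paris."}],
    "temperature": 0.2,
    "max_tokens": 256,
    "random_seed": 42,
    "response_format": {"type": "json_object"},  # JSON mode
    "stream": False,
}

resp = requests.post(
    "https://api.mistral.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
    json=payload,
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```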
Mistral AI provides official SDKs to streamline interaction with its API, including client libraries for Python and TypeScript/JavaScript; a minimal usage sketch follows.
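A minimal sketch with the official Python client, assuming the v1 `mistralai` package (`pip install mistralai`) and its `chat.complete` method; the model choice and prompt are illustrative:

```python
import os

from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

response = client.chat.complete(
    model="mistral-small-latest",
    messages=[{"role": "user", "content": "Summarize sliding window attention in one sentence."}],
    temperature=0.3,
)
print(response.choices[0].message.content)
```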
Comprehensive documentation is a cornerstone of Mistral AI's developer ecosystem.
Mistral AI offers flexible deployment options to meet diverse enterprise requirements.
Mistral AI employs a flexible, usage-based pricing structure for its hosted services, while making its major models freely available for self-hosting 10. Pricing is primarily determined by input and output tokens, with additional costs for fine-tuning and model storage; a worked cost estimate follows the hosted pricing table below.
Hosted Model Pricing (per one million tokens)
| Model | Input Cost | Output Cost |
|---|---|---|
| Mistral Nemo | $0.30 | $0.30 |
| Mistral Large 2 | $3.00 | $9.00 |
| Codestral | $1.00 | $3.00 |
| Mistral Embed | $0.01 | $0.01 |
| Legacy Mistral 7B | $0.25 | $0.25 |
| Legacy Mixtral 8x7B | $0.70 | $0.70 |
| Legacy Mixtral 8x22B | $2.00 | $6.00 |
| Legacy Mistral Small | $1.00 | $3.00 |
| Legacy Mistral Medium | $2.75 | $8.10 |
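To make the per-token pricing concrete, here is a small, illustrative cost estimate using the hosted rates listed above; the token counts are invented and actual billing may differ:

```python
# Example: one Mistral Large 2 request with 12,000 input tokens and 2,000 output tokens,
# at $3.00 per 1M input tokens and $9.00 per 1M output tokens (rates from the table above).
input_tokens, output_tokens = 12_000, 2_000

cost = input_tokens / 1_000_000 * 3.00 + output_tokens / 1_000_000 * 9.00
print(f"${cost:.4f}")  # $0.0540
```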
Fine-Tuning and Hosting
| Model | Fine-tuning Cost (per one million training tokens) | Monthly Storage Fee (per model) |
|---|---|---|
| Mistral Nemo | $1.00 | $2.00 |
| Codestral | $3.00 | $2.00 |
| Mistral Large 2 | $9.00 | $4.00 |
Specialty APIs
These pricing models encourage adoption through cost-effective local deployment while monetizing value-added services available through Mistral AI's platform 10.