The Moxin Organization is building an AI future that is open, efficient, and sovereign, from Edge to Cloud. We are an open community for discovering and exploring AI tools: a welcoming public space to gather projects and resources related to LLMs, Agents, and other AI-related topics.
🌐 Website · GitHub · Hugging Face
Our flagship open-source projects, spanning language models, desktop apps, agent frameworks, and inference engines, all optimized for performance, efficiency, and transparency.
- Moxin-LLM — a family of fully open-source and reproducible language models. The Moxin-7B series delivers SOTA performance in a compact size, with instruction-tuned and reasoning variants.
- Moxin-VLM — a vision-language model (VLM) built on the Moxin-LLM backbone, designed for advanced vision-language understanding and interaction.
- CC-MoE — Collaborative Compression for Large-Scale MoE Deployment on Edge. Extreme quantization that lets 70B+ models (such as DeepSeek and Kimi) run on consumer hardware with minimal accuracy loss; see the footprint sketch after this list.
- Moly — AI Super App. A cross-platform desktop + cloud AI chat application built in pure Rust on the Makepad UI framework and Project Robius platform tools. Works with both local and cloud models.
- MoFA Studio — Agent Development IDE: a graphical interface for creating, managing, and debugging Dataflows and Nodes.
- OminiX Studio — Native multimodal AI desktop app built with Makepad. Chat, image generation, voice cloning, and speech transcription in a single interface. Connects to local or cloud backends.
- MoFA — Modular Framework for Agents. A software framework for building AI agents through composition: agents are constructed from templates and combined in layers to form more powerful Super Agents (see the composition sketch after this list). Built on the DORA-RS runtime for high-performance, low-latency distributed AI computing.
- DORA — Dataflow-Oriented Robotic Architecture. Middleware that streamlines the creation of AI-based robotic applications with low-latency, composable, and distributed dataflows.
- OminiX-MLX — Safe Rust bindings to Apple MLX with 14 model crates. GPU-accelerated inference via Metal for LLMs (Qwen, GLM, Mixtral, Mistral), image generation (FLUX, Z-Image), ASR (Paraformer), and TTS (GPT-SoVITS). 45 tok/s on M3 Max.
- OminiX-API — OpenAI-compatible API server wrapping OminiX-MLX. A drop-in local replacement supporting `/v1/chat`, `/v1/audio`, `/v1/images`, and WebSocket TTS, with dynamic model loading. Pure Rust, zero Python. See the client sketch after this list.
- Data Sovereignty — Your data never leaves your infrastructure. Run fully private AI models on-premise or in your private cloud.
- Extreme Efficiency — Run 70B+ models on consumer hardware. OminiX optimizes inference on Apple Silicon for dramatically lower latency with zero Python dependencies.
- Full Control — Open source from top to bottom. Modify the model, the agent framework, or the inference engine to fit your needs. Dual-licensed under MIT and Apache 2.0.
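
To ground the CC-MoE entry above, here is a back-of-the-envelope sketch of why weight bit-width determines whether a 70B-parameter model fits on consumer hardware. It counts weight memory only (activations and KV cache are ignored), and the numbers are illustrative, not CC-MoE benchmark results.

```rust
// Illustrative only: rough weight-memory footprint of a 70B-parameter
// model at different bit-widths. Weights only; ignores activations and
// KV cache. Not CC-MoE benchmark numbers.
fn weight_footprint_gb(params: f64, bits_per_weight: f64) -> f64 {
    params * bits_per_weight / 8.0 / 1e9 // bits -> bytes -> GB
}

fn main() {
    let params = 70e9;
    for bits in [16.0, 8.0, 4.0, 2.0] {
        println!(
            "70B weights at {:>2} bits ≈ {:>6.1} GB",
            bits,
            weight_footprint_gb(params, bits)
        );
    }
    // 16 bits ≈ 140 GB (server territory); 2 bits ≈ 17.5 GB, within
    // reach of a single high-memory consumer machine.
}
```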
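
The layered-composition idea behind MoFA can be shown with a minimal conceptual sketch. This is not MoFA's actual API: the `Agent` trait, the stage agents, and the `Pipeline` type are hypothetical stand-ins that only illustrate how small agents can be stacked into a "Super Agent".

```rust
// Conceptual sketch of composition-based agent building; NOT MoFA's API.
trait Agent {
    fn run(&self, input: String) -> String;
}

struct Summarizer;
impl Agent for Summarizer {
    fn run(&self, input: String) -> String {
        format!("summary({input})") // stand-in for a real LLM call
    }
}

struct Translator;
impl Agent for Translator {
    fn run(&self, input: String) -> String {
        format!("translate({input})") // stand-in for a real LLM call
    }
}

/// A "Super Agent" formed by piping each agent's output into the next,
/// mirroring the layered-composition idea.
struct Pipeline {
    stages: Vec<Box<dyn Agent>>,
}

impl Agent for Pipeline {
    fn run(&self, input: String) -> String {
        self.stages.iter().fold(input, |acc, stage| stage.run(acc))
    }
}

fn main() {
    let super_agent = Pipeline {
        stages: vec![Box::new(Summarizer), Box::new(Translator)],
    };
    println!("{}", super_agent.run("a long document...".into()));
}
```

Because a `Pipeline` is itself an `Agent`, pipelines can be nested inside other pipelines, which is the essence of building layered Super Agents.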
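
Because OminiX-API is OpenAI-compatible, any OpenAI-style client should work against it. Below is a minimal Rust client sketch using `reqwest` (with the `blocking` and `json` features) and `serde_json`; the port, the `/v1/chat/completions` route, and the model id are assumptions for illustration, not documented values.

```rust
// Minimal sketch of calling a local OpenAI-compatible server.
// Assumptions: server on localhost:8000, standard /v1/chat/completions
// route, and a placeholder model id.
use serde_json::json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = reqwest::blocking::Client::new();
    let resp: serde_json::Value = client
        .post("http://localhost:8000/v1/chat/completions")
        .json(&json!({
            "model": "qwen", // placeholder model id
            "messages": [
                { "role": "user", "content": "Hello from Rust!" }
            ]
        }))
        .send()?
        .json()?;
    // Print the first completion choice, following the OpenAI schema.
    println!(
        "{}",
        resp["choices"][0]["message"]["content"]
            .as_str()
            .unwrap_or("")
    );
    Ok(())
}
```

Any OpenAI SDK pointed at the local base URL should work the same way.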
- GOSIM AI Paris 2025: Towards Fully Open-Source LLM from Pre-training to Reinforcement Learning [YouTube]
Moly (previously named Moxin):
- GOSIM Europe 2024: A Pure Rust Explorer for Open Source LLMs [YouTube] [Slides]
We welcome contributions, ideas, and suggestions from anyone! We're also happy to help you host and maintain your project under the umbrella of the Moxin organization.
