The Moxin Organization is building an AI future that is open, efficient, and sovereign, from Edge to Cloud. We are an open community for discovering and exploring AI tools: a welcoming public space to gather projects and resources related to LLMs, Agents, and other AI-related topics.
🌐 Website · GitHub · Hugging Face
Our flagship open-source projects, spanning language models, desktop apps, agent frameworks, and inference engines, all optimized for performance, efficiency, and transparency.
- Moxin-LLM — a family of fully open-source and reproducible language models. The Moxin-7B series delivers SOTA performance in a compact size, with instruction-tuned and reasoning variants.
- Moxin-VLM — a vision-language model (VLM) built on the Moxin-LLM backbone, designed for advanced vision-language understanding and interaction.
- CC-MoE — Collaborative Compression for Large-Scale MoE Deployment on Edge. Extreme quantization that lets 70B+ models (such as DeepSeek and Kimi) run on consumer hardware with minimal accuracy loss; see the footprint sketch after this list.
- Moly — AI Super App. A cross-platform desktop + cloud AI chat application built in pure Rust on the Makepad UI framework and Project Robius platform tools. Works with both local and cloud models.
- MoFA Studio — Agent Development IDE: a graphical interface for creating, managing, and debugging Dataflows and Nodes.
- OminiX Studio — Native multimodal AI desktop app built with Makepad. Chat, image generation, voice cloning, and speech transcription in a single interface. Connects to local or cloud backends.
- MoFA — Modular Framework for Agents. A software framework for building AI agents through composition: agents are constructed from templates and combined in layers to form more powerful Super Agents (see the composition sketch after this list). Built on the DORA-RS runtime for high-performance, low-latency distributed AI computing.
- DORA — Dataflow-Oriented Robotic Architecture. Middleware that streamlines the creation of AI-based robotic applications with low-latency, composable, and distributed dataflows.
- OminiX-MLX — Safe Rust bindings to Apple MLX with 14 model crates. GPU-accelerated inference via Metal for LLMs (Qwen, GLM, Mixtral, Mistral), image generation (FLUX, Z-Image), ASR (Paraformer), and TTS (GPT-SoVITS). 45 tok/s on M3 Max.
- OminiX-API — OpenAI-compatible API server wrapping OminiX-MLX. A drop-in local replacement supporting `/v1/chat`, `/v1/audio`, `/v1/images`, and WebSocket TTS, with dynamic model loading. Pure Rust, zero Python. See the client sketch after this list.
- Data Sovereignty — Your data never leaves your infrastructure. Run fully private AI models on-premise or in your private cloud.
- Extreme Efficiency — Run 70B+ models on consumer hardware. OminiX optimizes inference on Apple Silicon for dramatically lower latency with zero Python dependencies.
- Full Control — Open source from top to bottom. Modify the model, the agent framework, or the inference engine to fit your needs. Dual-licensed under MIT and Apache 2.0.
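
To ground the CC-MoE entry above, here is a back-of-the-envelope sketch of why weight bit-width determines whether a 70B-parameter model fits on consumer hardware. It counts weight memory only (activations and KV cache are ignored), and the numbers are illustrative, not CC-MoE benchmark results.

```rust
// Illustrative only: rough weight-memory footprint of a 70B-parameter
// model at different bit-widths. Weights only; ignores activations and
// KV cache. Not CC-MoE benchmark numbers.
fn weight_footprint_gb(params: f64, bits_per_weight: f64) -> f64 {
    params * bits_per_weight / 8.0 / 1e9 // bits -> bytes -> GB
}

fn main() {
    let params = 70e9;
    for bits in [16.0, 8.0, 4.0, 2.0] {
        println!(
            "70B weights at {:>2} bits ≈ {:>6.1} GB",
            bits,
            weight_footprint_gb(params, bits)
        );
    }
    // 16 bits ≈ 140 GB (server territory); 2 bits ≈ 17.5 GB, within
    // reach of a single high-memory consumer machine.
}
```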
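
The layered-composition idea behind MoFA can be shown with a minimal conceptual sketch. This is not MoFA's actual API: the `Agent` trait, the stage agents, and the `Pipeline` type are hypothetical stand-ins that only illustrate how small agents can be stacked into a "Super Agent".

```rust
// Conceptual sketch of composition-based agent building; NOT MoFA's API.
trait Agent {
    fn run(&self, input: String) -> String;
}

struct Summarizer;
impl Agent for Summarizer {
    fn run(&self, input: String) -> String {
        format!("summary({input})") // stand-in for a real LLM call
    }
}

struct Translator;
impl Agent for Translator {
    fn run(&self, input: String) -> String {
        format!("translate({input})") // stand-in for a real LLM call
    }
}

/// A "Super Agent" formed by piping each agent's output into the next,
/// mirroring the layered-composition idea.
struct Pipeline {
    stages: Vec<Box<dyn Agent>>,
}

impl Agent for Pipeline {
    fn run(&self, input: String) -> String {
        self.stages.iter().fold(input, |acc, stage| stage.run(acc))
    }
}

fn main() {
    let super_agent = Pipeline {
        stages: vec![Box::new(Summarizer), Box::new(Translator)],
    };
    println!("{}", super_agent.run("a long document...".into()));
}
```

Because a `Pipeline` is itself an `Agent`, pipelines can be nested inside other pipelines, which is the essence of building layered Super Agents.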
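
Because OminiX-API is OpenAI-compatible, any OpenAI-style client should work against it. Below is a minimal Rust client sketch using `reqwest` (with the `blocking` and `json` features) and `serde_json`; the port, the `/v1/chat/completions` route, and the model id are assumptions for illustration, not documented values.

```rust
// Minimal sketch of calling a local OpenAI-compatible server.
// Assumptions: server on localhost:8000, standard /v1/chat/completions
// route, and a placeholder model id.
use serde_json::json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = reqwest::blocking::Client::new();
    let resp: serde_json::Value = client
        .post("http://localhost:8000/v1/chat/completions")
        .json(&json!({
            "model": "qwen", // placeholder model id
            "messages": [
                { "role": "user", "content": "Hello from Rust!" }
            ]
        }))
        .send()?
        .json()?;
    // Print the first completion choice, following the OpenAI schema.
    println!(
        "{}",
        resp["choices"][0]["message"]["content"]
            .as_str()
            .unwrap_or("")
    );
    Ok(())
}
```

Any OpenAI SDK pointed at the local base URL should work the same way.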
- GOSIM AI Paris 2025: Towards Fully Open-Source LLM from Pre-training to Reinforcement Learning [YouTube]
Moly (previously named Moxin):
- GOSIM Europe 2024: A Pure Rust Explorer for Open Source LLMs [YouTube] [Slides]
We welcome contributions, ideas, and suggestions from anyone! We're also happy to help you host and maintain your project under the umbrella of the Moxin organization.
