FUSE AI 🤖 (FUll SEtup AI Agents)

Practical setup guides, reusable templates, and sync workflows for running local AI coding agents with Ollama and other compatible tools.

OpenClaw runs as a personal AI assistant over Telegram, using both local and cloud models.

This repository documents a real developer stack built around:

Aider — primary Windows-native repo editing agent
OpenClaw — personal AI assistant via Telegram with multi-agent and tool calling (replaced Hermes)
Pi — lean customizable coding harness
Ollama — local model runtime (9 models, auto-swap fallback chain)
Continue.dev — editor integration and autocomplete
OpenCode — hosted coding workflow option

It is built for people who want a real local workflow for create, read, update, and delete tasks on actual project files, not just chat demos.

Why This Repository Exists

Most AI-agent setup notes are either:

too generic to reproduce
too personal to reuse
too fragile to maintain

This repository tries to be more useful:

real-world setup patterns
model-role mapping that matches actual hardware limits
public-safe templates
sync workflows for keeping docs aligned with a tested local setup
a contribution-friendly structure for people who want to share their own setups

Who This Repository Is For

Fullstack developers
IT generalists
Local-first AI users
Windows users with low-VRAM GPUs
People who want to back up and version their agent configuration cleanly

Supported Agents

Agent	Runtime	Default Model	Best Use
Aider	Windows native	Jan-code	Primary repo editing, file CRUD
OpenClaw	Windows native (Node.js)	Phi4-mini	Personal AI assistant via Telegram, tool calling, multi-agent
Pi	Windows native	Jan-code	Lean customizable coding harness

OpenClaw replaced Hermes Agent (WSL-based). No WSL dependency required anymore.

Related Tooling Ecosystem

These are not all configured as primary agents in the same way, but they are part of the practical tool ecosystem around this setup:

Tool	Role
Ollama	local model runtime
OpenCode	hosted coding workflow option
Continue.dev	editor integration, autocomplete, and inline assistance
CodeGemma	autocomplete-oriented local model
Zed IDE	modern editor worth testing for AI-native workflows

Useful External Resources

https://context7.com -> useful for fetching fresher framework and library documentation context
https://opencode.ai or your OpenCode access path -> strong hosted coding workflow option, often good for free-tier experimentation depending on account access
https://build.nvidia.com -> worth testing for free hosted model access and experimentation
https://zed.dev -> Zed IDE is worth trying if you want another AI-friendly editor workflow
Another list of free LLM APIs, plus Free AI Tools Compare
Awesome MCP Servers — curated list of MCP servers for extending tool capabilities

Supported Environments

local-gpu-4gb - fully documented from a real working setup
local-cpu-only - recommended setup path using NovaforgeAI models

Current Local GPU 4GB Model Strategy

Local Models (Ollama)

Model	Size	Role	Tool Calling
`phi4-mini:3.8b-q4_K_M`	2.5 GB	OpenClaw primary — tool calling + step-by-step reasoning	Yes
`fredrezones55/Jan-code:Q4_K_M`	2.7 GB	Aider/Pi default — balanced daily coding	No
`aikid123/Qwen3-coder:latest`	1.4 GB	Fast code and chat with thinking	No
`fredrezones55/qwen3.5-opus:4b`	3.4 GB	Complex reasoning and heavier work	No
`softw8/Nanbeige4.1-3B-q4_K_M`	2.5 GB	Backup tool calling	Yes
`relational/orlex:latest`	3.3 GB	Backup reasoning and planning	No
`rardiolata/CodeTito:latest`	2.0 GB	Backup coding	No
`codegemma:2b`	1.6 GB	Autocomplete (editor integration)	No
`nomic-embed-text:latest`	274 MB	Embeddings only, not for chat/coding	No
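
As a rough illustration of how the sizes above relate to a 4 GB card, the sketch below filters the table by a VRAM budget. The sizes are copied from the table; the `models_that_fit` helper and the 0.5 GB headroom for context and runtime overhead are assumptions for illustration, not measured figures from this setup.

```python
# Model sizes in GB, copied from the table above.
MODEL_SIZES_GB = {
    "phi4-mini:3.8b-q4_K_M": 2.5,
    "fredrezones55/Jan-code:Q4_K_M": 2.7,
    "aikid123/Qwen3-coder:latest": 1.4,
    "fredrezones55/qwen3.5-opus:4b": 3.4,
    "softw8/Nanbeige4.1-3B-q4_K_M": 2.5,
    "relational/orlex:latest": 3.3,
    "rardiolata/CodeTito:latest": 2.0,
    "codegemma:2b": 1.6,
    "nomic-embed-text:latest": 0.274,
}


def models_that_fit(vram_gb: float, headroom_gb: float = 0.5) -> list[str]:
    """Return models whose file size fits in VRAM minus an assumed headroom."""
    budget = vram_gb - headroom_gb
    return sorted(name for name, size in MODEL_SIZES_GB.items() if size <= budget)


if __name__ == "__main__":
    for name in models_that_fit(4.0):
        print(name)
```

File size is only a lower bound on memory use, so treat the result as a first filter rather than a guarantee that a model will load.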

Cloud Fallbacks (Optional, Last Resort)

Provider	Model	Purpose
Google	Gemini 2.5 Flash (free tier)	Long context (1M tokens), tool calling
NVIDIA	Llama 3.3 70B (free tier)	High-quality fallback

Cloud models are only used when all local models fail. See OpenClaw Model Guide for the full fallback chain.

OpenClaw Auto-Swap Fallback Chain

Phi4-mini → Qwen3-coder → Jan-code → Qwen3.5-opus → Orlex → CodeTito → Nanbeige → Gemini Flash → NVIDIA Llama
(local)     (local)        (local)    (local)         (local)  (local)    (local)     (cloud free)   (cloud free)
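
The chain above can be read as "try each model in order and stop at the first one that answers". The sketch below shows that pattern in generic form. It is not OpenClaw's actual implementation; `run_with_fallback` and `fake_call` are hypothetical names, and the short model labels are taken from the diagram rather than being exact Ollama tags.

```python
from typing import Callable, Sequence

# Chain order copied from the diagram above (labels, not exact model tags).
FALLBACK_CHAIN = (
    "Phi4-mini", "Qwen3-coder", "Jan-code", "Qwen3.5-opus",
    "Orlex", "CodeTito", "Nanbeige", "Gemini Flash", "NVIDIA Llama",
)


def run_with_fallback(
    prompt: str,
    call_model: Callable[[str, str], str],
    chain: Sequence[str] = FALLBACK_CHAIN,
) -> tuple[str, str]:
    """Try each model in order; return (model, reply) from the first success."""
    errors = []
    for model in chain:
        try:
            return model, call_model(model, prompt)
        except Exception as exc:  # any failure moves on to the next model
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))


if __name__ == "__main__":
    def fake_call(model: str, prompt: str) -> str:
        # Hypothetical stand-in: pretend the first two models are unavailable.
        if model in ("Phi4-mini", "Qwen3-coder"):
            raise TimeoutError("model did not respond")
        return f"[{model}] ok"

    print(run_with_fallback("hello", fake_call))
```

Passing the caller in as a function keeps the ordering logic separate from how any particular runtime or provider is contacted.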

Environment Comparison

Environment	Status	Best Default	Notes
Local GPU 4GB	Tested	`phi4-mini:3.8b-q4_K_M` (OpenClaw) / `Jan-code` (Aider)	Multi-model role split with auto-swap fallback chain
Local CPU Only	Recommended	NovaforgeAI small models or API-backed providers	Good contribution target for future validation

Cost-Saving Strategies: Hybrid Workflow with Local + API Models

There are technically ways to exploit models and quotas through OpenCode, but I will not discuss them here. Instead, this section shares the strategies I have found most effective for maximizing output while minimizing API token costs, which is critical to running AI agents sustainably.

InshaAllah, with this blessed approach, the savings achieved will be more meaningful and sustainable. 🙏

"Halal sustenance brings long-term blessings."

One of the most powerful patterns is combining local models for execution with premium API models (Claude Opus, Gemini Pro, GPT-4+) for architecture and complex reasoning. This dramatically reduces API token costs while maintaining high-quality output.

You can also use Claude Code as an agent with free-tier API keys from OpenRouter (for example, qwen3 models) instead of the paid Claude API. See the Claude Code + OpenRouter Integration Guide for setup instructions.

⚠️ AI Cost Warning

Recommendation: Use Claude Pro ($20/month) with the official Claude Code extension rather than OpenRouter pay-as-you-go API.

In agentic mode (Zed, Claude Code), a single prompt can automatically trigger 20–50+ API calls, each carrying the full context. In my own case, I spent $11.65 in one day on just 2 prompts using Claude Opus via OpenRouter.

If you must use an API, always set a spending limit on your platform if one is available, and use cheaper models (Sonnet / Qwen free) for execution tasks. OpenRouter also offers an optional 1% discount on all LLMs if you consent to OpenRouter using your inputs and outputs to improve the product.
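
To see why agentic mode gets expensive, it helps to multiply the pieces out: every automatic call resends the context, so cost grows with calls times tokens. The sketch below does that arithmetic. The function name and every number in the example call are placeholders chosen for illustration, not real provider pricing and not the figures behind the $11.65 case.

```python
def estimate_prompt_cost(
    api_calls: int,
    input_tokens_per_call: int,
    output_tokens_per_call: int,
    usd_per_million_input: float,
    usd_per_million_output: float,
) -> float:
    """Estimate the USD cost of one agentic prompt that fans out into many calls."""
    input_cost = api_calls * input_tokens_per_call * usd_per_million_input / 1_000_000
    output_cost = api_calls * output_tokens_per_call * usd_per_million_output / 1_000_000
    return input_cost + output_cost


if __name__ == "__main__":
    # Placeholder values: 40 calls, 30k input tokens and 1k output tokens each,
    # with made-up rates per million tokens.
    cost = estimate_prompt_cost(40, 30_000, 1_000, 15.0, 75.0)
    print(f"estimated cost for one prompt: ${cost:.2f}")
```

Plugging in your provider's current rates and a realistic context size gives a quick sanity check before starting a long agentic session.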

Recommended Execution Models

For actual code implementation, these local models provide excellent cost savings:

Fast Execution (Recommended):

qwen2.5-coder:3b - Fast and capable, works best when tasks are broken into multiple focused plan files rather than one long context

Better Quality:

deepseek-coder:6.7b-instruct-q4_0 - Higher quality output, slightly slower but more reliable for complex tasks

Strategy 1: Architect & Builder Workflow

Don't waste expensive API tokens on boilerplate code. Use premium models purely for high-level design.

Step 1: The Architect (Claude Opus / Gemini Pro)

Prompt the premium model with this instruction:

Act as an Expert Software Architect and Principal Engineer. Your task is to design a comprehensive Technical Specification Document for the feature/system requested below.

CRITICAL INSTRUCTION TO SAVE TOKENS:
DO NOT write the actual implementation code. DO NOT write full functions. Your output must ONLY contain high-level architecture, logic flows, file structures, and strict pseudocode.

Please provide the output in clean Markdown format with the following sections:
1. System Overview: A brief summary of how the solution works.
2. File Structure: A tree representation of the files to be created or modified.
3. Tech Stack & Dependencies: Any specific libraries, modules, or APIs needed.
4. Data Flow / State Management: How data moves between components.
5. Step-by-Step Logic (Pseudocode): The exact logical steps for core algorithms, edge cases, and security considerations.
6. Execution Order: A numbered list of which file/component should be built first.

Here is the feature I want to build:
[DESCRIBE YOUR FEATURE HERE]

Step 2: The Builder (Gemini Flash / Minimax 2.5 / Local Model)

Take the Technical Spec from Claude and feed it to a cheaper/free executor:

Act as a Senior Full-Stack Developer. I will provide you with a Technical Specification Document written by an Expert Architect.

Your task is to write the COMPLETE, production-ready, and fully functional code based EXACTLY on this specification.

Rules:
1. Follow the file structure and execution order provided.
2. Write clean, well-commented code.
3. Do not skip any logic mentioned in the pseudocode.
4. Output the code block by block, clearly stating the filename above each code block.

Here is the Technical Specification:
[PASTE CLAUDE'S SPEC HERE]

Use:

Gemini 3 Flash - Blazing fast, huge context window, very cheap for mass execution
Local qwen2.5-coder:3b - Free, fast, works great with focused specs
Local deepseek-coder:6.7b - Free, better quality for complex implementations
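
The two steps can be chained in a few lines once each model is wrapped in a function that takes a prompt and returns text. The sketch below assumes exactly that. `architect_then_build` and the two callables are hypothetical, and the templates are shortened versions of the prompts above rather than the full text.

```python
from typing import Callable

# Shortened versions of the Architect and Builder prompts shown above.
ARCHITECT_TEMPLATE = (
    "Act as an Expert Software Architect and Principal Engineer. "
    "DO NOT write the actual implementation code.\n\n"
    "Here is the feature I want to build:\n{feature}"
)
BUILDER_TEMPLATE = (
    "Act as a Senior Full-Stack Developer. Write the complete code based "
    "exactly on this specification.\n\n"
    "Here is the Technical Specification:\n{spec}"
)


def architect_then_build(
    feature: str,
    architect: Callable[[str], str],
    builder: Callable[[str], str],
) -> str:
    """Ask the premium model for a spec, then hand the spec to the cheap executor."""
    spec = architect(ARCHITECT_TEMPLATE.format(feature=feature))
    return builder(BUILDER_TEMPLATE.format(spec=spec))


if __name__ == "__main__":
    # Hypothetical stand-ins for real model calls.
    fake_architect = lambda prompt: "1. System Overview: ..."
    fake_builder = lambda prompt: "# code generated from spec\n"
    print(architect_then_build("a login form", fake_architect, fake_builder))
```

Only the architect call touches the premium model, so the expensive side of the pipeline is limited to one spec-sized response.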

Strategy 2: Output Diffs Only (Anti-Rewrite)

One of the biggest token drains is an AI rewriting 500 lines when you only asked it to change 3.

Always end your prompts with:

Only provide the code that changed, with comments indicating where to insert it.
NEVER rewrite entire files that haven't changed.

Strategy 3: Provide a "Map", Not the "Territory" (Skeletal Context)

When debugging large projects, don't paste entire files into chat.

Solution:

Generate a directory tree structure (use the tree command or OpenCode)
Provide the "Directory Map" to the AI
Let the AI determine which files it actually needs to see
Only then provide the specific file contents
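
If the tree command is not available, a directory map can be produced with a short script. The sketch below lists names only, never file contents. The `directory_map` helper, the skipped folder names, and the default depth of 3 are assumptions you would adjust for your own project.

```python
from pathlib import Path

# Folders that usually add noise to a map; adjust for your project.
SKIP_DIRS = {".git", "node_modules", "__pycache__", ".venv"}


def directory_map(root: str, max_depth: int = 3) -> str:
    """Return an indented tree of file and folder names (no file contents)."""
    root_path = Path(root).resolve()
    lines = [root_path.name + "/"]

    def walk(folder: Path, depth: int) -> None:
        if depth > max_depth:
            return
        # Directories first, then files, each group sorted by name.
        entries = sorted(folder.iterdir(), key=lambda p: (p.is_file(), p.name.lower()))
        for entry in entries:
            if entry.is_dir() and entry.name in SKIP_DIRS:
                continue
            suffix = "/" if entry.is_dir() else ""
            lines.append("  " * depth + entry.name + suffix)
            if entry.is_dir():
                walk(entry, depth + 1)

    walk(root_path, 1)
    return "\n".join(lines)
```

Calling `print(directory_map("."))` from the project root gives a compact map to paste into the chat before sharing any file contents.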

Strategy 4: Offload Trivial Tasks to Local Models

If you run Ollama with VSCodium (Continue.dev/OpenCode), use the local models as your first line of defense:

Autocomplete (0 tokens):

Let qwen2.5-coder:1.5b-base handle line-by-line autocomplete as you type

Small Refactoring (0 tokens):

Use local deepseek-r1:1.5b in OpenCode terminal for:
- Regex generation
- Code formatting
- Simple unit tests
- Variable renaming

Only escalate to premium APIs (Claude Opus/Gemini Pro) when:

Hitting a dead end (complex Nginx config bugs, intricate state management)
Architectural decisions needed
Complex debugging requiring deep reasoning
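
The escalation rule can be sketched as a small routing function. The task labels, the `pick_tier` name, and the threshold of two failed local attempts (standing in for "hitting a dead end") are all hypothetical choices for illustration.

```python
# Hypothetical task labels based on the lists above.
LOCAL_TASKS = {"autocomplete", "regex", "formatting", "unit-test", "rename"}
PREMIUM_TASKS = {"architecture", "complex-debugging"}


def pick_tier(task_type: str, failed_local_attempts: int = 0,
              max_local_attempts: int = 2) -> str:
    """Return "local" or "premium" for a task, escalating only when needed."""
    if task_type in PREMIUM_TASKS:
        return "premium"
    if failed_local_attempts >= max_local_attempts:
        return "premium"  # dead end: local models have already been tried
    return "local"


if __name__ == "__main__":
    print(pick_tier("regex"))
    print(pick_tier("regex", failed_local_attempts=2))
    print(pick_tier("architecture"))
```

Defaulting unknown task types to the local tier keeps the premium API as the exception rather than the rule.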

Cost Comparison Example

Traditional approach (all premium API):

Architect + Implementation: ~50,000 tokens ($$$)

Hybrid approach:

Architect (Claude Opus): ~5,000 tokens ($)
Implementation (Gemini Flash or local): ~0-1,000 tokens (¢ or free)
Savings: ~90%
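
The savings figure follows from comparing paid tokens in the two approaches. The sketch below reproduces that arithmetic using the token counts from the example; the `savings_percent` function is illustrative, and it compares token counts only, not dollar prices, which differ per model.

```python
def savings_percent(baseline_tokens: int, hybrid_paid_tokens: int) -> float:
    """Percentage of paid tokens avoided by the hybrid approach."""
    return 100 * (1 - hybrid_paid_tokens / baseline_tokens)


if __name__ == "__main__":
    baseline = 50_000          # all-premium approach
    architect = 5_000          # premium tokens spent on the spec
    for implementation in (0, 1_000):  # local (free) vs cheap hosted executor
        pct = savings_percent(baseline, architect + implementation)
        print(f"implementation tokens {implementation}: {pct:.0f}% fewer paid tokens")
```

With a fully local executor the reduction is 90%, and with 1,000 hosted implementation tokens it is 88%, which matches the rounded ~90% above.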

GPT and Llama models are also good at tool calling, so you can use them for tool orchestration, bash, deployment, and similar tasks while letting other models handle the actual code generation.

For complete master prompts and detailed strategies, see:

guides/COST_SAVING_PROMPTS.md - Battle-tested prompts for the Architect & Builder workflow

In OpenCode, DCP plugins also help manage context and token usage. Here is a quick summary of my patterns for maximizing output while minimizing tokens:

1. AGENTS.md → model fallback chain
2. DCP       → 20-40% saving ctx
3. Plan mode → focused & minimal iterations
4. Session/task → keep context clean
5. /dcp sweep {n} → prune irrelevant context
6. /compact       → aggressive compression

1. /dcp context  ← check current state
2. /dcp sweep 10 ← prune least relevant ctx
3. /dcp context  ← confirm pruning results
4. /compact      ← last resort

Quick Start

Read docs/quick-start.md
For the tested setup, start with docs/installation/local-gpu-4gb.md
For the lighter CPU-oriented path, read docs/installation/local-cpu-only.md

Repository Structure

docs/       -> setup guides, operations, troubleshooting, contribution notes
templates/  -> reusable public-safe config templates by environment
guides/     -> quick command references
scripts/    -> sync helpers and utilities (including Ollama auto-start fix)
assets/     -> screenshots and images

Utilities

Ollama Auto-Start Fix

Automated solution for Ollama startup issues after PC restart. Solves the common problem where Ollama processes hang and require manual restart after reboot.

Location: scripts/ollama/

Quick Setup:

cscript scripts\ollama\setup-ollama-startup.vbs

Features:

Automatic startup after PC restart
Kills hung processes automatically
10-second delay for system stability
Runs minimized in background
No admin privileges required

Documentation: scripts/ollama/README.md

Documentation Map

Agents

docs/agents/openclaw-agent.md - OpenClaw setup, multi-agent, Telegram, skills, cloud fallback
docs/agents/aider.md - Aider configuration and usage
docs/agents/pi-agent.md - Pi agent setup
docs/agents/continue-dev.md - Continue.dev editor integration
docs/agents/opencode.md - OpenCode hosted workflow

Guides

guides/OPENCLAW_MODEL_GUIDE.md - Model strategy, Phi4-mini, fallback chain, VRAM tuning
guides/OPENCLAW_PERFORMANCE_GUIDE.md - 4GB VRAM optimization, timeouts, context window, disk management
guides/AGENT_COMMANDS.md - All launcher commands for Aider, OpenClaw, Pi
guides/COST_SAVING_PROMPTS.md - Architect & Builder workflow prompts

Setup

docs/overview.md
docs/quick-start.md
docs/installation/local-gpu-4gb.md
docs/installation/local-cpu-only.md
docs/models/model-strategy.md

Operations

docs/operations/security.md - Security model, pairing, data privacy
docs/operations/backup-and-restore.md
docs/operations/update-workflow.md
docs/operations/contributor-sync-guide.md
docs/operations/troubleshooting.md

Utilities

scripts/ollama/README.md - Ollama auto-start utilities

Why Back Up Agent Configs at All?

If you use Aider, OpenClaw, Pi, Ollama, custom launchers, and local model routing, your setup becomes part of your engineering environment.

Backing it up gives you:

a faster restore path on a new machine
versioned changes to prompts, configs, and launchers
cleaner experimentation with different models
easier sharing of sanitized setups with other people

Public Repo vs Private Repo

This project works best with two repositories:

public repo -> reusable docs, templates, contributor-friendly structure
private repo -> your live working backup and machine-specific copies

Recommended rule:

private repo first
public repo second

Keeping the Repository in Sync With a Local Setup

This repository is designed to be updated from a real local working setup.

Recommended order:

update and validate the local machine setup
sync the private backup repository
refresh the public templates
update the docs if commands or model strategy changed

Useful scripts:

# from the public repo
powershell -ExecutionPolicy Bypass -File .\scripts\sync-all.ps1

# or run the public step only
powershell -ExecutionPolicy Bypass -File .\scripts\refresh-public-templates.ps1

For more detail, read:

docs/operations/update-workflow.md
docs/operations/contributor-sync-guide.md

Share Your Setup

This repository is intentionally structured so other people can contribute their own working AI-agent setups.

Useful contributions include:

cloud VPS without GPU
Linux desktop
macOS
larger local GPU setups
better CPU-only model recommendations
alternative local model families
Continue.dev presets and editor configs
OpenCode or Build NVIDIA hosted usage notes
Zed IDE workflow notes
different launcher strategies

If you maintain your own agent stack, consider contributing:

sanitized config templates
documented model choices
launcher patterns
hardware notes
troubleshooting notes

The goal is simple: make it easier for other people to back up, understand, and share agent setups that actually work.

If your setup differs from this one, that is a feature, not a problem. Different hardware tiers, operating systems, model families, and launcher strategies are all useful contributions when they are documented clearly.

Contributing

Contributions are welcome.

See docs/contribution-guide.md for the preferred direction and docs/operations/contributor-sync-guide.md for the sync/update workflow.

Good contribution examples:

a tested CPU-only setup with measured tradeoffs
a Linux-native workflow
a cloud VPS no-GPU workflow
a cleaner launcher design
better restore or troubleshooting notes

Security Notes

Do not commit real API keys, auth files, or session state
Treat these templates as public-safe examples unless explicitly marked as private backup material
Review all launcher scripts before running them on production machines
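
One way to act on the first rule is to scan templates for key-shaped strings before committing. The sketch below is a heuristic only: the `find_secrets` helper and its three patterns are assumptions based on common key prefixes, they will miss other formats, and they can flag harmless strings.

```python
import re

# Heuristic patterns for common key shapes (assumed formats, not exhaustive).
SECRET_PATTERNS = {
    "sk-style API key": re.compile(r"\bsk-[A-Za-z0-9_-]{20,}"),
    "Google-style API key": re.compile(r"\bAIza[0-9A-Za-z_-]{35}"),
    "Telegram-style bot token": re.compile(r"\b\d{8,10}:[A-Za-z0-9_-]{35}"),
}


def find_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern_label) for every line that looks like a secret."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, label))
    return hits


if __name__ == "__main__":
    sample = "model: phi4-mini\napi_key: sk-" + "a" * 24 + "\n"
    for lineno, label in find_secrets(sample):
        print(f"line {lineno}: possible {label}")
```

A scan like this complements, rather than replaces, keeping real keys in files that are excluded through .gitignore.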

License

MIT
