Agent Cortex is a local, multi-tool AI assistant built with LangChain, local vector retrieval (ChromaDB), and a reasoning-capable language model (Mistral 7B). It can answer questions using a mix of:
- 🔍 Document retrieval (RAG)
- 💬 Short-term memory (chat history)
- 🧠 Long-term memory (fact retention across sessions)
- 🌐 Web search (DuckDuckGo)
- 🐍 Python interpreter tool
- 🧮 Math calculations
All running locally with no paid APIs.
In this terminal session, I tell Agent Cortex my name. It identifies the name as a fact worth keeping and saves it both to short-term memory and to the long-term vector store. When I ask again in the same session, it remembers my name because short-term memory injects it into the context. After shutting the system down and restarting, it can still recall my name by deciding to use the long-term memory tool, which retrieves the stored fact via RAG:
- **Retrieval (RAG)** — "Where are the fireworks on July 3rd?"
  → Retrieved from local documents indexed with Bristol, RI event information.
- **Web Search** — "What is the weather in Bristol, RI tomorrow?"
  → Uses DuckDuckGo to search live internet results and summarizes the forecast.
- **Calculator Tool** — "What is 5 * 7 + 15?"
  → Routes through a custom calculator tool to evaluate the expression.
- **Short-Term Memory** — "My name is Jacob." then later: "What is my name?"
  → The agent recalls the name from the current session's chat history.
- **Long-Term Memory** — The agent determines whether an input is worth storing long-term and, if so, saves it to a local vector database. After the agent is shut down and later restarted, it can still recall the saved facts about the user (see the memory sketch after this list).
- **Python REPL** — `sum([1, 2, 3])`
  → Executes Python code securely using a local interpreter.
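As a rough illustration of the memory mechanics, here is a minimal sketch of how short-term and long-term memory could be wired with the stack listed below (LangChain, ChromaDB, `all-MiniLM-L6-v2`); the paths and helper names are hypothetical, not the project's actual API:

```python
from langchain.memory import ConversationBufferMemory
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma

# Short-term memory: chat history held in RAM, lost when the CLI exits.
short_term = ConversationBufferMemory(memory_key="chat_history")

# Long-term memory: a persisted vector store, which is what survives restarts.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
long_term = Chroma(
    collection_name="long_term_memory",
    embedding_function=embeddings,
    persist_directory="data/memory",  # hypothetical path
)

def remember_fact(fact: str) -> None:
    """Store a fact such as "The user's name is Jacob"."""
    long_term.add_texts([fact])

def recall(query: str, k: int = 3) -> list[str]:
    """Fetch the k stored facts most similar to a query like "What is my name?"."""
    return [d.page_content for d in long_term.similarity_search(query, k=k)]
```

Because the Chroma collection is written to disk, a fresh process can answer "What is my name?" by similarity search alone, which is the cross-session recall shown in the transcript above.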
Each query is interpreted by the ReAct-based agent and routed to the appropriate tool, all executed locally with no API calls to a hosted LLM and no usage billing. The web search does go out to the live internet, but no outside LLMs are used for reasoning.
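To make the routing concrete, here is a hedged sketch of how a custom calculator tool and a ReAct agent could be assembled with LangChain's classic agent initializer; the safe-AST evaluator is an assumption, not necessarily how the project's calculator works:

```python
import ast
import operator

from langchain.agents import AgentType, initialize_agent
from langchain.tools import Tool
from langchain_community.llms import Ollama

# Whitelist of arithmetic operators so we never eval() arbitrary code.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}

def _eval_node(node: ast.AST) -> float:
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval_node(node.left), _eval_node(node.right))
    raise ValueError("unsupported expression")

def calculate(expression: str) -> str:
    """Evaluate arithmetic like '5 * 7 + 15' -> '50'."""
    return str(_eval_node(ast.parse(expression, mode="eval").body))

calculator = Tool(name="calculator", func=calculate,
                  description="Evaluates basic arithmetic expressions.")

# The real agent would register retrieval, search, memory, and REPL tools too.
agent = initialize_agent(
    tools=[calculator],
    llm=Ollama(model="mistral"),  # served locally by Ollama
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
    handle_parsing_errors=True,  # softens the ReAct formatting issue noted under limitations
)

agent.run("What is 5 * 7 + 15?")
```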
- Retrieval-Augmented Generation from `.txt` documents
- Short-term memory using chat history context
- Long-term memory — agent remembers facts across sessions
- Tool-based reasoning using LangChain's ReAct agent
- DuckDuckGo web search integration (see the sketch after this list)
- 🐍 Python REPL tool for executing code
- Calculator for numeric inputs
- Mistral 7B via Ollama (runs locally as CLI)
- Fallback + context injection for vague queries
- CLI-based agent chat interface loop
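Two of the items above, web search and the Python REPL, ship as off-the-shelf LangChain tools; a minimal sketch, assuming the `duckduckgo-search` and `langchain-experimental` packages are installed:

```python
from langchain_community.tools import DuckDuckGoSearchRun
from langchain_experimental.tools import PythonREPLTool

# Live web search with no API key or billing.
search = DuckDuckGoSearchRun()
print(search.run("weather in Bristol, RI tomorrow"))

# Executes Python locally; note this runs real code, so treat input with care.
repl = PythonREPLTool()
print(repl.run("sum([1, 2, 3])"))
```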
- LangChain
- Ollama (local LLM hosting)
- Mistral 7B
- ChromaDB (vector store)
- HuggingFace Embeddings (`all-MiniLM-L6-v2`)
- Python 3.10
- Poetry for dependency management
This project uses Ollama to run the mistral model locally.
```bash
brew install ollama
```

Or download from ollama.com/download and install the desktop app.

```bash
ollama run mistral
```

This will download and launch the Mistral model. Leave it running.

Make sure `curl http://localhost:11434` returns `{"status":"ok"}`.
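Once the server is up, you can also sanity-check it from Python using LangChain's Ollama wrapper; a minimal sketch assuming the default port (11434):

```python
from langchain_community.llms import Ollama

# Connects to the local Ollama server on its default port; no remote API or key.
llm = Ollama(model="mistral")
print(llm.invoke("Reply with the single word: ready"))
```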
```bash
git clone https://github.com/YOUR_USERNAME/agent_cortex_v1.git
cd agent_cortex_v1
poetry install
```

Place `.txt` files under `data/documents`, then build the index:

```bash
PYTHONPATH=. poetry run python scripts/index_documents.py
```

Then start the agent:

```bash
poetry run python main.py
```

You'll be prompted with:

```
You:
```
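For reference, a minimal sketch of what an indexing script like `scripts/index_documents.py` could do; the output path is an assumption, and, matching the limitations below, each file is embedded whole with no chunking:

```python
from pathlib import Path

from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma

# Read each .txt file under data/documents as one document (no chunking).
texts = [p.read_text() for p in Path("data/documents").glob("*.txt")]

# Embed with all-MiniLM-L6-v2 and persist the Chroma index to disk
# so the agent's retriever can load it on startup.
Chroma.from_texts(
    texts,
    embedding=HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2"),
    persist_directory="data/index",  # hypothetical location
)
```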
Try asking:
"What time is the Fourth of July parade?""My name is Jacob" followed by "What is my name?""sum([2, 4, 6])""Who is todays date and the weather look like in Boston?""What is 25 * 4 + 3?"
While Agent Cortex v1 is functional, it's an early prototype with several known limitations:
- Long-term memory is fact-based only: It stores facts like names and locations, not full conversations.
- Short-term memory is session-only: Once you close the CLI, short-term context is reset.
- No agent reflection or self-correction: It does not retry intelligently or summarize thoughts beyond what the base model provides.
- Inconsistent ReAct formatting: The LLM may sometimes fail to produce valid Thought / Action / Action Input format, causing parsing errors or retries.
- Fallbacks are basic: They do not yet include streaming or error correction.
- Only supports .txt files: No PDF, HTML, or Markdown parsing.
- No document metadata or filtering: The retriever does not rank sources by type, date, or confidence.
- No chunking or advanced preprocessing: Raw text is split into single documents without semantic boundaries.
- No multi-vector fusion: Only single-query similarity search; no query rewriting or reranking logic.
- Static index: You must manually re-index documents after any updates.
- No streaming output: The full response is printed only after the agent completes.
- Latency: Mistral via Ollama is slower than hosted APIs, especially on lower-spec machines.
- Ollama dependency: Requires installing and running the Ollama server separately, which some users may find nontrivial.
- No fine-tuning: The Mistral model is used out-of-the-box with no task-specific customization.
- No prompt injection prevention: User input is not sanitized or structured securely for prompt-based attacks.
- No multi-turn tool use: Tools are single-action only — no recursive or multi-step reasoning chains.
MIT
