Date: November 23, 2025
Project: KiloCode Codebase Indexing
Model: qwen3-embedding:8b-fp16
Vector Database: Qdrant (local Docker deployment)
This guide walks you through setting up Qdrant vector database to work with KiloCode's codebase indexing feature using the Qwen3-Embedding-8B model. We'll deploy Qdrant (optionally on the same Docker network as Ollama if using custom networks), configure it for 4096-dimensional embeddings (what Qwen3-8B outputs via Ollama), and integrate it with KiloCode.
Prerequisites:
- Ollama running (Docker or native installation)
- Model qwen3-embedding:8b-fp16 already pulled in Ollama
- Docker and Docker Compose installed
- Ubuntu Desktop with GPU access
Contents:
- Docker Network Configuration (Optional)
- Deploy Qdrant with Docker Compose
- Verify Qdrant is Running
- Create the Collection
- Test End-to-End Integration
- Configure KiloCode
- Monitor Initial Indexing
- Understanding the Data Flow
- Performance Expectations
- Troubleshooting
- Maintenance Commands
If you're using a custom Docker network (like ollama-network in my setup), deploying Qdrant on the same network provides benefits:
- Container-to-container communication: Direct internal communication without host routing
- Name resolution: Containers can reference each other by name (e.g., http://qdrant:6333)
- Security: Internal network traffic doesn't expose ports unnecessarily
- Performance: Slightly faster than localhost routing
However, this is entirely optional. If you're running Ollama without a custom Docker network, simply omit the networks section from the docker-compose.yml below. Qdrant will work perfectly fine using localhost connections.
Note: KiloCode (running on your host) will always use localhost:6333 to connect to Qdrant, regardless of Docker network configuration.
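If you plan to use the custom network but haven't created it yet, the compose file below will fail to find it (it is declared as external). A minimal sketch to create it up front, assuming the network name used in this guide:

```shell
# Create the external bridge network referenced by docker-compose.yml
# (skip if ollama-network already exists)
docker network create ollama-network

# Confirm it exists
docker network ls | grep ollama-network
```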
Create a file called docker-compose.yml in your preferred directory:
services:
  qdrant:
    image: qdrant/qdrant:latest
    container_name: qdrant
    networks:
      - ollama-network
    ports:
      - "6333:6333"   # HTTP API
      - "6334:6334"   # gRPC (optional)
    volumes:
      - qdrant_storage:/qdrant/storage
    restart: unless-stopped
    # Optional: health check (note: the qdrant image may not ship curl;
    # if the check always reports unhealthy, swap in a bash /dev/tcp probe)
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:6333/"]
      interval: 30s
      timeout: 10s
      retries: 3

networks:
  ollama-network:
    external: true  # Use existing network

volumes:
  qdrant_storage:
    driver: local

What each section does:
- image: Latest Qdrant version (~200MB download)
- networks: (Optional) Joins your existing ollama-network if you use one; omit this section entirely if you don't use custom Docker networks
- ports:
  - 6333: HTTP API (for KiloCode and curl commands)
  - 6334: gRPC (optional, for advanced use)
- volumes: Persistent storage for your vector data
- restart: Auto-restart on system reboot
- healthcheck: Monitors Qdrant's health status
Note: If you're not using a custom Docker network like ollama-network, simply remove the networks: section from both the qdrant service definition and the top-level networks declaration. Qdrant will work perfectly with localhost connections.
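Before deploying, you can sanity-check the file: docker compose config parses and validates the compose file without starting anything.

```shell
# Validate docker-compose.yml; --quiet suppresses the resolved output
# and only reports errors (non-zero exit on an invalid file)
docker compose config --quiet && echo "compose file OK"
```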
Deploy the container:
docker compose up -d

Expected output (with custom network):
[+] Running 2/2
✔ Network ollama-network Found
✔ Container qdrant Started
Expected output (without custom network):
[+] Running 1/1
✔ Container qdrant Started
Run these verification commands:
# 1. Check container is running
docker ps | grep qdrant
# Expected output:
# qdrant qdrant/qdrant:latest ... Up ... 0.0.0.0:6333->6333/tcp
# 2. Test the API endpoint
curl http://localhost:6333/
# Expected output:
# {"title":"qdrant - vector search engine","version":"1.x.x"}
# 3. (Optional) If using custom Docker network, verify Qdrant joined it
docker network inspect ollama-network | grep -A 5 qdrant
# Should show qdrant in the containers list (skip if not using custom network)
# 4. Open Qdrant dashboard (optional)
# Browse to: http://localhost:6333/dashboard

If all commands succeed, Qdrant is ready!
Create a collection configured for Qwen3's 4096-dimensional embeddings:
curl -X PUT http://localhost:6333/collections/kilocode_codebase \
-H 'Content-Type: application/json' \
-d '{
"vectors": {
"size": 4096,
"distance": "Cosine"
}
}'

Expected response:
{"result":true,"status":"ok","time":0.001234}

What this creates:
- Collection name: kilocode_codebase
- Vector dimensions: 4096 (matches Qwen3-Embedding-8B-FP16 output via Ollama)
- Distance metric: Cosine (best for embeddings)
Verify collection was created:
curl http://localhost:6333/collections
# Should list kilocode_codebase in the result

View collection details:
curl http://localhost:6333/collections/kilocode_codebase | jq
# Shows: vectors_count, indexed_vectors_count, points_count, status

Test that Ollama and Qdrant can work together:
# Generate a test embedding with Qwen3
curl http://localhost:11434/api/embeddings -d '{
"model": "qwen3-embedding:8b-fp16",
"prompt": "def hello_world(): print(\"Hello, World!\")"
}' > test_embedding.json
# Check the embedding was generated
cat test_embedding.json | head -20
# If you have jq installed, verify dimension:
cat test_embedding.json | jq '.embedding | length'
# Should output: 4096

What this tests:
- Ollama responds and generates embeddings
- Qwen3-Embedding-8B model is working
- Output is 4096 dimensions (verified for this model)
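To close the loop, you can push that test embedding into the collection and search for it with Qdrant's points API. A sketch using jq (the point ID 1 and the smoke-test payload are arbitrary choices for this check, not part of KiloCode's schema):

```shell
# Upsert the test embedding as a single point
VEC=$(jq -c '.embedding' test_embedding.json)
curl -X PUT http://localhost:6333/collections/kilocode_codebase/points \
  -H 'Content-Type: application/json' \
  -d "{\"points\": [{\"id\": 1, \"vector\": $VEC, \"payload\": {\"source\": \"smoke-test\"}}]}"

# Search with the same vector; the smoke-test point should come back
# as the top hit with a cosine score of ~1.0
curl -X POST http://localhost:6333/collections/kilocode_codebase/points/search \
  -H 'Content-Type: application/json' \
  -d "{\"vector\": $VEC, \"limit\": 1}" | jq '.result[0]'

# Clean up so the test point doesn't pollute real search results
curl -X POST http://localhost:6333/collections/kilocode_codebase/points/delete \
  -H 'Content-Type: application/json' \
  -d '{"points": [1]}'
```

Deleting the point afterwards keeps the collection empty before KiloCode starts indexing.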
Open KiloCode settings (⚙️ icon in VS Code) and navigate to Codebase Indexing:
Codebase Indexing:
✅ Enable Codebase Indexing: ON
Embedding Provider:
Provider: Ollama
Base URL: http://localhost:11434
Model: qwen3-embedding:8b-fp16
Dimensions: 4096 (must match model output)
Vector Database:
Provider: Qdrant
URL: http://localhost:6333
API Key: (leave empty for local setup)
Collection Name: kilocode_codebase
Search Settings:
Max Search Results: 50
Min Block Size: 100 chars
Max Block Size: 1000 chars

Important configuration notes:

- Model Name Must Match Exactly: qwen3-embedding:8b-fp16
  - Case-sensitive
  - Must include the :8b-fp16 tag
- Dimensions:
  - Must be set to 4096 to match Qwen3-8B output
  - Verify with:
    curl http://localhost:11434/api/embeddings -d '{"model": "qwen3-embedding:8b-fp16", "prompt": "test"}' | jq '.embedding | length'
- No API Key Needed:
  - Both services run locally without authentication
  - Only set API key if you've secured Qdrant (advanced)
- Collection Name:
  - Must match the collection we created: kilocode_codebase
Click Save to start indexing your codebase.
Watch the indexing process:
# Monitor GPU usage (should spike during indexing)
watch -n 1 nvidia-smi
# Monitor container resources
docker stats ollama qdrant
# Check Qdrant collection size (grows as vectors are added)
watch -n 5 'curl -s http://localhost:6333/collections/kilocode_codebase | jq .result.points_count'

Expected behavior during indexing:
- GPU 0 (RTX 4090): VRAM usage increases to ~15GB
- CPU: Spikes as Tree-sitter parses files
- Qdrant: points_count increases as code blocks are indexed
- KiloCode UI: Shows "Indexing" status (yellow indicator)
When complete:
- KiloCode status shows "Indexed" (green indicator)
- Qdrant points_count matches your codebase size
- GPU usage drops back to idle
Here's how the entire system works:
1. KiloCode scans your project files
↓
2. Tree-sitter parses code into semantic blocks (functions, classes, methods)
↓
3. Each code block → Ollama API (http://localhost:11434)
↓
4. Qwen3-Embedding-8B processes text on GPU
↓
5. Returns 4096-dimensional vector
↓
6. KiloCode stores vector in Qdrant (http://localhost:6333)
↓
7. Qdrant indexes vector for fast similarity search
Search flow:
Your search query
↓
Ollama (Qwen3-Embedding-8B)
↓
4096-dimensional query vector
↓
Qdrant similarity search
↓
Top-K most similar code vectors
↓
KiloCode retrieves corresponding code blocks
↓
Results displayed in KiloCode
Key Points:
- Parsing happens locally (Tree-sitter)
- Embeddings generated locally (Ollama + GPU)
- Vectors stored locally (Qdrant)
- No data leaves your machine
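The search path above can be exercised by hand with two calls — embed a query with Ollama, then feed the vector to Qdrant. A sketch (only useful once KiloCode has indexed some points; the query text is just an example):

```shell
# 1. Embed the query text with Qwen3 via Ollama
QVEC=$(curl -s http://localhost:11434/api/embeddings -d '{
  "model": "qwen3-embedding:8b-fp16",
  "prompt": "user authentication logic"
}' | jq -c '.embedding')

# 2. Ask Qdrant for the 5 nearest code-block vectors, payloads included
curl -s -X POST http://localhost:6333/collections/kilocode_codebase/points/search \
  -H 'Content-Type: application/json' \
  -d "{\"vector\": $QVEC, \"limit\": 5, \"with_payload\": true}" | jq '.result'
```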
Indexing Performance:
- Initial build time: Varies by codebase size (GPU-accelerated)
- Bottleneck: GPU embedding generation (not Qdrant)
- VRAM usage: ~15GB (Qwen3) + minimal for OS
Search Performance:
- Query latency: Fast local search (milliseconds)
- Embedding generation: GPU-accelerated (Qwen3)
- Vector search: Fast similarity matching (Qdrant)
- Post-processing: Minimal overhead (KiloCode)
- Consistent latency (local = no network variance)
Quality Metrics:
- High retrieval accuracy observed
- Top results typically very relevant to query
- Semantic understanding finds conceptually similar code
GPU (RTX 4090):
Idle: ~2GB VRAM (OS)
Indexing: ~15GB VRAM (Qwen3 model)
Searching: ~15GB VRAM (Qwen3 model)
Available: ~9GB VRAM (for other tasks)
Qdrant Memory (typical codebase with 10K code blocks):
Vectors: ~160MB (10K × 4096 dims × 4 bytes)
Qdrant: ~240MB (indexes + overhead)
Total: ~400MB RAM
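The vector-storage figure scales linearly with codebase size; a quick sketch of the arithmetic (raw float32 vectors only, before Qdrant's index overhead):

```shell
# Raw vector storage: blocks × dims × 4 bytes (float32)
blocks=10000
dims=4096
bytes=$((blocks * dims * 4))
echo "$((bytes / 1024 / 1024)) MB raw"   # prints "156 MB raw" (~160MB as quoted above)
```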
Disk Storage:
Qwen3 model: ~15GB (one-time)
Qdrant data: ~100-500MB (depends on codebase size)
Docker images: ~200MB (Qdrant)
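To see the actual on-disk size of the Qdrant volume on your machine, Docker's verbose disk-usage report lists each named volume:

```shell
# Per-volume disk usage; look for qdrant_storage in the volumes section
docker system df -v | grep qdrant_storage
```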
Note: For Ollama-specific issues (model not found, embedding generation errors), see 3_QWEN3_OLLAMA_GUIDE.md. For general KiloCode troubleshooting, see README.md or FAQ.md. This section covers Qdrant-specific issues only.
Symptom: KiloCode can't connect to Qdrant
Diagnosis:
# Check Qdrant is running
docker ps | grep qdrant
# Test API endpoint
curl http://localhost:6333/

Fix:
# Restart Qdrant
docker compose restart qdrant
# Check logs for errors
docker logs qdrant --tail 50

Symptom: Error about vector dimensions not matching
This means: Collection was created with wrong dimensions or model output changed
Fix:
# Delete collection
curl -X DELETE http://localhost:6333/collections/kilocode_codebase
# Recreate with correct dimensions (4096)
curl -X PUT http://localhost:6333/collections/kilocode_codebase \
-H 'Content-Type: application/json' \
-d '{
"vectors": {
"size": 4096,
"distance": "Cosine"
}
}'
# Rebuild index in KiloCode (Settings → Rebuild Index)

Symptom: Search doesn't find expected code
Possible causes:
- Index is stale (code changed but not reindexed)
- Files excluded by .gitignore or .kilocode patterns
- Search query too vague
Fix:
# Check what's indexed
curl http://localhost:6333/collections/kilocode_codebase | jq .result.points_count
# Rebuild index in KiloCode
# Settings → Codebase Indexing → Rebuild Index button
# Try more specific search queries
# Example: "authentication middleware" vs "auth"

# View collection info
curl http://localhost:6333/collections/kilocode_codebase | jq
# Count indexed vectors
curl -s http://localhost:6333/collections/kilocode_codebase | jq .result.points_count
# Check collection health
curl http://localhost:6333/collections/kilocode_codebase | jq .result.status
# Should be "green"

# Create backup of Qdrant storage
docker run --rm \
-v qdrant_storage:/data \
-v $(pwd):/backup \
alpine tar czf /backup/qdrant-backup-$(date +%Y%m%d).tar.gz /data
# Backup will be saved in current directory
ls -lh qdrant-backup-*.tar.gz

# Stop Qdrant
docker compose stop qdrant
# Restore backup
docker run --rm \
-v qdrant_storage:/data \
-v $(pwd):/backup \
alpine tar xzf /backup/qdrant-backup-20251123.tar.gz -C /
# Start Qdrant
docker compose start qdrant

Open in your browser:
http://localhost:6333/dashboard
Dashboard features:
- Collection overview
- Vector count and storage
- Search performance metrics
- Collection configuration
# Restart Qdrant only
docker compose restart qdrant
# Restart both Ollama and Qdrant
docker restart ollama
docker compose restart qdrant
# View logs
docker logs qdrant --tail 100 --follow
docker logs ollama --tail 100 --follow

# Pull latest image
docker compose pull qdrant
# Recreate container with new image
docker compose up -d --force-recreate qdrant
# Verify version
curl http://localhost:6333/ | jq .version

If you want to secure Qdrant with an API key:
1. Update docker-compose.yml:
services:
  qdrant:
    image: qdrant/qdrant:latest
    container_name: qdrant
    networks:
      - ollama-network
    ports:
      - "6333:6333"
      - "6334:6334"
    volumes:
      - qdrant_storage:/qdrant/storage
    restart: unless-stopped
    environment:
      - QDRANT__SERVICE__API_KEY=your-super-secret-key-here  # Add this line

networks:
  ollama-network:
    external: true

volumes:
  qdrant_storage:
    driver: local

2. Recreate container:
docker compose up -d --force-recreate qdrant

3. Update KiloCode settings:
- Add the API key in "Qdrant API Key" field
- Save settings
4. Test with API key:
curl -H "api-key: your-super-secret-key-here" \
http://localhost:6333/collections

- Test search quality: Try natural language queries in KiloCode
  - Example: "user authentication logic"
  - Example: "database connection setup"
  - Example: "error handling patterns"
- Monitor performance: Check the Qdrant dashboard
  - Watch search latency
  - Verify vector count matches expectations
  - Check memory usage
- Adjust settings if needed:
  - Increase "Max Search Results" for more context (20 → 50)
  - Modify "Max Block Size" for larger code blocks (1000 → 1500)
- Set up file watching: KiloCode auto-reindexes changed files
  - Edit a file and save
  - Watch Qdrant points_count update
  - Verify search finds new content
You now have a production-ready local codebase indexing system:
✅ Qwen3-Embedding-8B: State-of-the-art code embeddings (SOTA for consumer GPUs, 80.68 on MTEB Code)
✅ Qdrant: Fast, efficient vector database
✅ 4096 dimensions: Maximum quality (Qwen3-8B output)
✅ Local setup: Complete privacy, no API costs
✅ GPU-accelerated: Fast indexing and search
Your setup delivers:
- Fast local search (milliseconds)
- High retrieval accuracy
- Minimal ongoing electricity costs
- Unlimited searches with no rate limits
Architecture:
KiloCode (VS Code)
↓
Ollama (qwen3-embedding:8b-fp16) → RTX 4090 (15GB VRAM)
↓
Qdrant (kilocode_codebase collection) → ~400MB RAM
Happy coding! 🚀
Document Version: 1.0
Last Updated: November 23, 2025
Author: AI Implementation Guide
Project: KiloCode Codebase Indexing with Qwen3-Embedding-8B + Qdrant