Concourse Architecture

This document provides a comprehensive overview of Concourse's system architecture, explaining how the various components work together to deliver a distributed database warehouse for transactions, search, and analytics across time.

System Overview
Module Structure
Data Model
Storage Engine Architecture
Automatic Indexing
Data Transport
Compaction
Version Control
Concurrency Model
Communication Layer
Plugin System

System Overview

Concourse is architected as a multi-layered system with clear separation between client drivers, the core server, storage engine, and extensible plugin framework.

┌─────────────────────────────────────────────────────────────────────────┐
│                           Client Layer                                   │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐       │
│  │  Java    │ │  Python  │ │   PHP    │ │   Ruby   │ │  CaSH    │       │
│  │  Driver  │ │  Driver  │ │  Driver  │ │  Driver  │ │  Shell   │       │
│  └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘       │
└───────┼────────────┼────────────┼────────────┼────────────┼─────────────┘
        │            │            │            │            │
        └────────────┴────────────┴─────┬──────┴────────────┘
                                        │
┌───────────────────────────────────────┼─────────────────────────────────┐
│                          Concourse Server                                │
│  ┌────────────────────────────────────┼────────────────────────────┐    │
│  │              Communication Layer   │                             │    │
│  │  ┌─────────────────┐  ┌───────────┴───────────┐                 │    │
│  │  │   Thrift RPC    │  │     HTTP/REST API     │                 │    │
│  │  └────────┬────────┘  └───────────┬───────────┘                 │    │
│  └───────────┼───────────────────────┼─────────────────────────────┘    │
│              └───────────┬───────────┘                                   │
│  ┌───────────────────────┼─────────────────────────────────────────┐    │
│  │           ConcourseServer (Core API)                             │    │
│  │  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐  │    │
│  │  │   Operations    │  │  Plugin Manager │  │  HTTP Server    │  │    │
│  │  └────────┬────────┘  └────────┬────────┘  └─────────────────┘  │    │
│  └───────────┼────────────────────┼────────────────────────────────┘    │
│              │                    │                                      │
│  ┌───────────┼────────────────────┼────────────────────────────────┐    │
│  │           │      Storage Engine│                                 │    │
│  │  ┌────────┴────────┐  ┌────────┴────────┐                       │    │
│  │  │     Engine      │  │   Environments  │                       │    │
│  │  └────────┬────────┘  └─────────────────┘                       │    │
│  │  ┌────────┴────────┐  ┌─────────────────┐                       │    │
│  │  │     Buffer      │  │    Database     │                       │    │
│  │  │ (Write-Optimized)│  │(Read-Optimized) │                       │    │
│  │  └─────────────────┘  └─────────────────┘                       │    │
│  └─────────────────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────────────────┘
                                        │
                                        │ IPC
                                        ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                         Plugin JVMs (External)                           │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐          │
│  │    Plugin 1     │  │    Plugin 2     │  │    Plugin N     │          │
│  └─────────────────┘  └─────────────────┘  └─────────────────┘          │
└─────────────────────────────────────────────────────────────────────────┘

Module Structure

Concourse is organized as a multi-module Gradle project. Each module serves a specific purpose:

Core Modules

Module	Description
`concourse-server`	The core database server containing the storage engine, transaction management, plugin system, and server APIs
`concourse-driver-java`	The official Java client driver providing the `Concourse` API for connecting to and interacting with the server
`concourse-plugin-core`	Core framework for developing Concourse plugins, including base classes and communication utilities
`concourse-shell`	The CaSH (Concourse Shell) - an interactive Groovy-based shell for database operations
`concourse-cli`	Command-line interface tools for server management and administration

Data Tools

Module	Description
`concourse-import`	Tools for importing data from various formats (CSV, JSON, etc.) into Concourse
`concourse-export`	Tools for exporting data from Concourse to various formats

Other Language Drivers

Module	Description
`concourse-driver-python`	Python client driver
`concourse-driver-php`	PHP client driver
`concourse-driver-ruby`	Ruby client driver

Testing Modules

Module	Description
`concourse-unit-test-core`	Base classes and utilities for unit testing
`concourse-ete-test-core`	End-to-end testing framework for running tests against real server instances
`concourse-integration-tests`	Integration test suite
`concourse-ete-tests`	End-to-end tests including cross-version and performance tests
`concourse-plugin-core-tests`	Tests for the plugin framework
`concourse-upgrade-tests`	Tests for upgrade scenarios between versions

Automation

Module	Description
`concourse-automation`	Tools for programmatic interaction with the codebase and release artifacts

Data Model

Concourse uses a document-graph data model that is both flexible and powerful:

Core Concepts

Record: A logical grouping of data about a single entity (person, place, thing). Each record has a unique 64-bit identifier and contains a collection of key/value pairs.
Key: A string attribute name that maps to one or more values within a record. Keys are schemaless - they don't need to be declared in advance.
Value: A dynamically-typed quantity associated with a key. Concourse natively supports: boolean, double, float, integer, long, string (UTF-8), and Link.
Link: A special value type that creates a reference from one record to another, enabling graph-like relationships.

Schema-Free Design

Concourse is schemaless, meaning:

No tables or collections to define
No schema migrations required
Keys can be added to any record at any time
Values are dynamically typed
The same key can hold different value types across records

Document-Graph Structure

Record 1                          Record 2
┌─────────────────────────┐      ┌─────────────────────────┐
│ name: "John Doe"        │      │ name: "Acme Corp"       │
│ age: 30                 │      │ industry: "Technology"  │
│ employer: @Link(2) ─────┼──────┼─▶                       │
│ skills: ["Java", "SQL"] │      │ employees: [@Link(1)]   │
└─────────────────────────┘      └─────────────────────────┘

Storage Engine Architecture

The storage engine is the heart of Concourse, designed for both high write throughput and efficient reads.

BufferedStore Pattern

Concourse uses a BufferedStore pattern that combines two complementary storage systems:

                    ┌─────────────────────────────────────┐
                    │              Engine                  │
                    │  ┌─────────┐       ┌─────────────┐  │
   Write ──────────▶│  │ Buffer  │──────▶│  Database   │  │
                    │  │ (Limbo) │       │  (Durable)  │  │
                    │  └─────────┘       └─────────────┘  │
                    │       │                   │         │
   Read ◀───────────│◀──────┴───────────────────┘         │
                    └─────────────────────────────────────┘

Engine

The Engine class is the primary coordinator for all storage operations:

Schedules concurrent CRUD operations
Manages ACID transactions
Versions writes and coordinates indexing
Provides global locking coordination via LockBroker

Buffer

The Buffer is a write-optimized, append-only storage system:

Accepts writes immediately with constant-time performance
Stores data in memory-mapped Pages
Provides durability through append-only log structure
Data is eventually transported to the Database for indexing

Database

The Database is a read-optimized, indexed storage system:

Stores data in immutable Segments
Maintains multiple index structures for efficient queries
Supports background compaction
Handles historical queries via version tracking

Segments

A Segment is an immutable storage unit containing three types of Chunks:

Chunk Type	Purpose	Record Type
`TableChunk`	Primary data storage	`TableRecord` - stores key/value pairs per record
`IndexChunk`	Secondary indexes	`IndexRecord` - inverted index mapping values to records
`CorpusChunk`	Full-text search index	`CorpusRecord` - search corpus with positional data

Segment
┌─────────────────────────────────────────────────────────┐
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────┐  │
│  │ TableChunk  │  │ IndexChunk  │  │  CorpusChunk    │  │
│  │             │  │             │  │                 │  │
│  │ Record Data │  │ Value→Recs  │  │ Term→Positions │  │
│  └─────────────┘  └─────────────┘  └─────────────────┘  │
│                                                         │
│  ┌─────────────┐  ┌─────────────────────────────────┐  │
│  │  Manifest   │  │         Bloom Filters           │  │
│  └─────────────┘  └─────────────────────────────────┘  │
└─────────────────────────────────────────────────────────┘

Automatic Indexing

One of Concourse's key features is automatic indexing - every value is indexed without manual configuration.

Index Types

Table Records: Primary storage mapping (record, key) → values
Index Records: Secondary indexes mapping (key, value) → records
Corpus Records: Full-text search indexes mapping (key, term, position) → records

Constant-Time Writes

Concourse guarantees constant-time writes through:

Append-only Buffer: Writes are immediately appended to the Buffer
Background Indexing: Full indexing happens asynchronously during transport
SearchIndexer: Multi-threaded search indexing in the background

Search Indexing

The SearchIndexer handles asynchronous full-text indexing:

Write Operation
      │
      ▼
┌───────────┐     ┌──────────────┐     ┌──────────────┐
│   Buffer  │────▶│  Transporter │────▶│   Segment    │
└───────────┘     └──────────────┘     └──────┬───────┘
                                              │
                  ┌───────────────────────────┘
                  │
                  ▼
            ┌───────────┐
            │  Search   │──▶ Worker Thread 1
            │  Indexer  │──▶ Worker Thread 2
            │           │──▶ Worker Thread N
            └───────────┘

Data Transport

Data transport moves writes from the Buffer to the Database, where they become fully indexed.

Transport Strategies

Concourse supports two transport strategies:

Streaming Transporter (Legacy)

Processes writes incrementally in small batches
Transport operations compete with reads/writes for resources
Amortizes indexing cost across normal operations
Can cause "stop-the-world" situations under high load

Batch Transporter (Default)

Performs indexing entirely in the background
Reads and writes continue uninterrupted during indexing
Only brief critical section when merging indexed data
Dramatically improves system throughput

Batch Transport Flow
────────────────────

Buffer                    Background                  Database
┌─────────┐              ┌──────────────┐           ┌──────────┐
│ Page 1  │───────────▶  │ Create       │           │          │
│ Page 2  │              │ Segment +    │           │          │
│ Page 3  │              │ Index        │           │          │
└─────────┘              └──────┬───────┘           │          │
                                │                    │          │
                         ┌──────▼───────┐           │          │
                         │ Merge        │──────────▶│ Segments │
                         │ (Critical)   │           │          │
                         └──────────────┘           └──────────┘

Configuration

Transport behavior is configured in concourse.yaml:

# Simple configuration
transporter: batch  # or "streaming"

# Advanced configuration
transporter:
  type: batch
  num_threads: 2

Compaction

Compaction optimizes storage by merging adjacent Segments.

SimilarityCompactor

The default compaction strategy merges Segments with similar characteristics:

Reduces total number of Segments
Eliminates redundant revision data
Improves read performance by reducing I/O

Compaction Modes

Incremental Compaction: Opportunistic, runs frequently but only if no conflicting work
Full Compaction: Aggressive, blocks until it can process all Segments

Compaction Cycle

Segments: [S1] [S2] [S3] [S4] [S5]
              ╲   ╱
         Shift 1: Try compact S2+S3
                    │
                    ▼
Segments: [S1] [S2'] [S4] [S5]    (if successful)
               ╲    ╱
         Shift 2: Try compact S2'+S4
                    │
                    ▼
              ... continue ...

Version Control

Concourse automatically tracks all changes to data, enabling powerful time-travel queries.

Revision-Based Storage

Every write creates a new Revision rather than modifying existing data:

Key "name" in Record 1
────────────────────────
Version 1: ADD "John"     @ timestamp T1
Version 2: REMOVE "John"  @ timestamp T2
Version 3: ADD "Johnny"   @ timestamp T3

Historical Queries

Query data as it existed at any point in time:

// Get current value
get(key="name", record=1)

// Get value as of specific time
get(key="name", record=1, time=time("2023-01-01"))

// Find records matching criteria in the past
find("status = active", time=time("last month"))

Audit Trail

The audit() function provides a complete history of changes:

audit(record=1)
// Returns: Map<Timestamp, String> describing each revision

Revert Capability

Atomically revert data to a previous state:

revert(key="name", record=1, time=time("yesterday"))

Concurrency Model

Concourse uses sophisticated concurrency controls to ensure ACID compliance while maximizing throughput.

Just-in-Time (JIT) Locking

Rather than acquiring all locks upfront, Concourse uses JIT locking:

Locks are acquired only when needed
Reduces contention for non-conflicting operations
Enables higher concurrency

Lock Types

Lock Type	Purpose
Token Locks	Granular locks for specific `(key, record)` pairs
Range Locks	Protect range queries from concurrent modifications
Transport Locks	Coordinate Buffer-to-Database transport
Shared Locks	Record-level coordination

LockBroker

The LockBroker is the central coordinator for all locks:

                    ┌─────────────────┐
                    │   LockBroker    │
                    │                 │
   Operation 1 ────▶│  Token Locks    │◀──── Operation 2
                    │  Range Locks    │
   Operation 3 ────▶│  Shared Locks   │◀──── Operation 4
                    └─────────────────┘

Atomic Operations and Transactions

AtomicOperation: Lightweight atomic operations within a single environment
Transaction: Full ACID transactions that can span multiple operations

// Atomic operation
stage
add("balance", 100, record1)
remove("balance", 50, record1)
commit

// Transaction with isolation
stage({
    // All operations here are isolated and atomic
    set("status", "complete", order)
    add("completed_orders", Link.to(order), customer)
})

Communication Layer

Thrift RPC (Primary)

The primary communication protocol uses Apache Thrift:

Defined in interface/concourse.thrift
Supports all client drivers
Binary protocol for efficiency

HTTP/REST API

RESTful API for web-based access:

JSON request/response format
RESTQL-compatible query syntax
Token-based authentication

Thrift Interface

The Thrift service definition includes:

service ConcourseService {
    void abort(AccessToken creds, TransactionToken transaction, string environment)
    long addKeyValue(string key, TObject value, AccessToken creds, ...)
    Map<Long, Boolean> addKeyValueRecords(string key, TObject value, list<i64> records, ...)
    // ... hundreds of methods
}

Plugin System

Concourse's plugin system enables extending functionality without modifying core code.

Plugin Architecture

Plugins run in separate JVMs and communicate via IPC:

┌─────────────────────────────────────────────────────────┐
│                    Concourse Server                      │
│  ┌─────────────────────────────────────────────────┐    │
│  │               Plugin Manager                     │    │
│  │  ┌──────────────┐  ┌──────────────────────┐     │    │
│  │  │   Registry   │  │  IPC Channels        │     │    │
│  │  └──────────────┘  │  - fromServer        │     │    │
│  │                    │  - fromPlugin        │     │    │
│  │                    └──────────────────────┘     │    │
│  └─────────────────────────────────────────────────┘    │
└───────────────────────────┬─────────────────────────────┘
                            │ IPC (Shared Memory)
        ┌───────────────────┼───────────────────┐
        │                   │                   │
        ▼                   ▼                   ▼
┌───────────────┐   ┌───────────────┐   ┌───────────────┐
│   Plugin A    │   │   Plugin B    │   │   Plugin C    │
│   (JVM 1)     │   │   (JVM 2)     │   │   (JVM 3)     │
└───────────────┘   └───────────────┘   └───────────────┘

Plugin Lifecycle

Install: Bundle is extracted to the plugins directory
Activate: Plugin classes are discovered (internal or external bootstrapping)
Launch: Plugin JVM is started with IPC channels
Run: Plugin processes requests and can call back to server
Stop: Plugin JVM is terminated gracefully

Plugin Types

Standard Plugin: Extends Plugin base class
RealTimePlugin: Receives real-time data streams for live processing

Cross-Version Compatibility

Plugins can be compiled against different Java versions than the server:

External Bootstrapping: Discovers plugins via bytecode scanning without loading classes
Internal Bootstrapping: Uses Reflections for class discovery (requires version compatibility)

Plugin Configuration

plugins:
  bootstrapper: external  # or "internal"

# Per-plugin Java runtime
plugin_name:
  java_binary: /path/to/java

FilesExpand file tree

architecture.md

Latest commit

History