GraphSense Library

A comprehensive Python library for the GraphSense crypto-analytics platform. It provides database access, data ingestion, maintenance tools, and analysis capabilities for cryptocurrency transactions and networks.

Note: This library uses optional dependencies. Use graphsense-lib[all] to install all features.

Quick Start

Installation

# Install with all features
# (quotes keep shells such as zsh from expanding the brackets)
uv add "graphsense-lib[all]"

# Install from source
git clone https://github.com/graphsense/graphsense-lib.git
cd graphsense-lib
make install

Serving the REST API locally

The web API requires two backend connections: a Cassandra cluster (blockchain data) and a TagStore (PostgreSQL). You can configure them via environment variables or a YAML config file.

Option A: Environment variables only

GS_CASSANDRA_ASYNC_NODES='["<cassandra-host>"]' \
GRAPHSENSE_TAGSTORE_READ_URL='postgresql+asyncpg://<user>:<password>@<host>:<port>/tagstore' \
GS_CASSANDRA_ASYNC_CURRENCIES='{"btc":{"raw": "btc_raw", "transformed": "btc_transformed"},"eth":{}}' \
uv run --extra web uvicorn graphsenselib.web.app:create_app --factory --host localhost --port 9000 --reload
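
The two GS_CASSANDRA_ASYNC_* values in the command above are JSON documents passed through the environment, so quoting mistakes are easy to make. A minimal sketch for checking them before starting the server, assuming only what the command shows; the helper name parse_cassandra_env and the host name are hypothetical and not part of graphsense-lib:

```python
import json


def parse_cassandra_env(env):
    """Decode the JSON-encoded Cassandra settings from an environment mapping."""
    nodes = json.loads(env["GS_CASSANDRA_ASYNC_NODES"])
    currencies = json.loads(env.get("GS_CASSANDRA_ASYNC_CURRENCIES", "{}"))
    if not isinstance(nodes, list) or not all(isinstance(n, str) for n in nodes):
        raise ValueError("GS_CASSANDRA_ASYNC_NODES must be a JSON list of host names")
    if not isinstance(currencies, dict):
        raise ValueError("GS_CASSANDRA_ASYNC_CURRENCIES must be a JSON object")
    return nodes, currencies


# Example values shaped like the ones in the command above (host is made up).
example_env = {
    "GS_CASSANDRA_ASYNC_NODES": '["cassandra.example.org"]',
    "GS_CASSANDRA_ASYNC_CURRENCIES": (
        '{"btc": {"raw": "btc_raw", "transformed": "btc_transformed"}, "eth": {}}'
    ),
}
nodes, currencies = parse_cassandra_env(example_env)
print(nodes, sorted(currencies))
```

Passing os.environ instead of example_env checks the values of the current shell.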

Option B: YAML config file

Point CONFIG_FILE to a REST-specific config (see instance/config.yaml for a full example):

CONFIG_FILE=./instance/config.yaml make serve-web

Or without Make:

CONFIG_FILE=./instance/config.yaml \
uv run --extra web uvicorn graphsenselib.web.app:create_app --factory --host localhost --port 9000 --reload

Option C: .graphsense.yaml with a web key

If you already have a .graphsense.yaml (or ~/.graphsense.yaml) for the CLI, you can add a web key containing the REST config. The app will pick it up automatically without setting CONFIG_FILE:

# .graphsense.yaml
environments:
  # ... your existing CLI config ...

web:
  database:
    nodes: ["<cassandra-host>"]
    currencies:
      btc:
      eth:
  gs-tagstore:
    url: "postgresql+asyncpg://<user>:<password>@<host>:<port>/tagstore"

Then start the server:

make serve-web

Config resolution order: explicit config_file param > CONFIG_FILE env var > ./instance/config.yaml > .graphsense.yaml web key > env vars only.
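
The resolution order above can be sketched as a chain of checks. This is an illustration of the documented precedence only, not the library's actual loader; resolve_config_source and its return labels are hypothetical, and the sketch does not open .graphsense.yaml to confirm that it contains a web key:

```python
import os
import tempfile
from pathlib import Path


def resolve_config_source(config_file=None, env=None, cwd=".", home="~"):
    """Return (kind, path) for the first config source that applies."""
    env = os.environ if env is None else env
    cwd, home = Path(cwd), Path(home).expanduser()

    if config_file:                       # 1. explicit config_file param
        return "param", Path(config_file)
    if env.get("CONFIG_FILE"):            # 2. CONFIG_FILE env var
        return "env-file", Path(env["CONFIG_FILE"])
    instance = cwd / "instance" / "config.yaml"
    if instance.is_file():                # 3. ./instance/config.yaml
        return "instance", instance
    for candidate in (cwd / ".graphsense.yaml", home / ".graphsense.yaml"):
        if candidate.is_file():           # 4. .graphsense.yaml (web key assumed)
            return "graphsense-yaml", candidate
    return "env-only", None               # 5. env vars only


# Demo in a throwaway directory that only contains instance/config.yaml.
with tempfile.TemporaryDirectory() as tmp:
    empty_home = Path(tmp) / "home"
    empty_home.mkdir()
    project = Path(tmp) / "project"
    (project / "instance").mkdir(parents=True)
    (project / "instance" / "config.yaml").write_text("database: {}\n")
    kind, path = resolve_config_source(env={}, cwd=project, home=empty_home)
    print(kind, path.name)
```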

Optional REST settings (env vars)

| Variable | Default | Description |
| --- | --- | --- |
| GSREST_DISABLE_AUTH | false | Disable API key authentication |
| GSREST_ENSURE_TAGSTORE_SCHEMA_ON_STARTUP | false | Auto-initialize TagStore tables/views at startup when missing |
| GRAPHSENSE_TAGSTORE_READ_URL | | TagStore database URL (e.g., postgresql://user:password@host:5432/tagstore) |
| GSREST_ALLOWED_ORIGINS | * | CORS allowed origins |
| GSREST_LOGGING_LEVEL | | Logging level (DEBUG, INFO, …) |
| GS_CASSANDRA_ASYNC_PORT | 9042 | Cassandra port |
| GS_CASSANDRA_ASYNC_USERNAME | | Cassandra username |
| GS_CASSANDRA_ASYNC_PASSWORD | | Cassandra password |
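
To make the defaults above concrete, here is a small sketch that reads a few of these variables and falls back to the documented defaults. The RestSettings class, load_rest_settings, and the accepted boolean spellings are assumptions for illustration; the REST app's own parsing may differ:

```python
import os
from dataclasses import dataclass


def _as_bool(value):
    # Assumption: common truthy spellings; the real parser may accept others.
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass
class RestSettings:
    disable_auth: bool
    ensure_tagstore_schema: bool
    allowed_origins: str
    cassandra_port: int


def load_rest_settings(env=None):
    """Read optional REST settings, using the documented defaults when unset."""
    env = os.environ if env is None else env
    return RestSettings(
        disable_auth=_as_bool(env.get("GSREST_DISABLE_AUTH", "false")),
        ensure_tagstore_schema=_as_bool(
            env.get("GSREST_ENSURE_TAGSTORE_SCHEMA_ON_STARTUP", "false")
        ),
        allowed_origins=env.get("GSREST_ALLOWED_ORIGINS", "*"),
        cassandra_port=int(env.get("GS_CASSANDRA_ASYNC_PORT", "9042")),
    )


settings = load_rest_settings({"GSREST_DISABLE_AUTH": "true"})
print(settings)
```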

When enabling GSREST_ENSURE_TAGSTORE_SCHEMA_ON_STARTUP=true, keep in mind:

  • The DB user must have DDL privileges (create tables/views/indexes/extensions/procedures).
  • Startup may be slower because schema checks and potential initialization run before the app serves traffic.
  • In multi-replica deployments, initialize schema once (migration/init job) to avoid startup races.

If TagStore is not configured (gs-tagstore missing) or the TagStore URL is unreachable, the REST app now falls back to a mock TagStore so endpoints still work. In this mode, tag-specific responses (labels, actors, taxonomies, tag counts) are empty.

REST API evolution and deprecation policy

The REST API follows semantic versioning via info.version in the OpenAPI spec and the __api_version__ field in the library:

  • Patch (2.10.x): bug fixes, no schema changes.
  • Minor (2.y.0): additive changes — new endpoints, new fields, new optional parameters. Deprecations may be introduced here but deprecated surfaces continue to work.
  • Major (x.0.0): removal of deprecated surfaces and other breaking changes. Major bumps are rare and announced in advance.
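
A client can apply these rules mechanically when it sees a new info.version. The sketch below classifies the step between two version strings; classify_bump is a hypothetical helper and assumes plain X.Y.Z versions without prerelease suffixes:

```python
def parse_version(version):
    """Split 'X.Y.Z' (optionally prefixed with 'v') into a tuple of ints."""
    major, minor, patch = (int(part) for part in version.lstrip("v").split("."))
    return major, minor, patch


def classify_bump(old, new):
    """Return 'patch', 'minor', or 'major' for an upgrade from old to new."""
    o, n = parse_version(old), parse_version(new)
    if n <= o:
        raise ValueError(f"{new} is not newer than {old}")
    if n[0] != o[0]:
        return "major"   # may remove deprecated surfaces
    if n[1] != o[1]:
        return "minor"   # additive; may introduce deprecations
    return "patch"       # bug fixes only


for old, new in [("2.10.0", "2.10.3"), ("2.10.3", "2.11.0"), ("2.11.0", "3.0.0")]:
    print(old, "->", new, classify_bump(old, new))
```

Under the policy above, only a "major" result should require code changes in a client that avoids deprecated surfaces.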

Deprecated endpoints and fields remain fully functional for at least two minor releases or six months, whichever is longer. Deprecations are announced through three mechanisms, listed from most to least machine-readable:

OpenAPI schema (deprecated: true)

Deprecated paths and response fields carry deprecated: true in /openapi.json, which renders as a strikethrough in Swagger UI (/docs) and is propagated to the generated Python client's docstrings. Check the spec at build time to fail CI if you depend on a deprecated surface.
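
One possible shape for such a build-time check is sketched below. It only inspects operations under paths (not deprecated response fields), the sample spec is hand-written, and the two paths in it are illustrative rather than copied from the real /openapi.json:

```python
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def deprecated_operations(spec):
    """Return sorted 'METHOD /path' strings for operations marked deprecated."""
    found = []
    for path, item in spec.get("paths", {}).items():
        for method, operation in item.items():
            # Path items can also hold non-operation keys such as "parameters".
            if method in HTTP_METHODS and operation.get("deprecated"):
                found.append(f"{method.upper()} {path}")
    return sorted(found)


def deprecated_in_use(spec, used):
    """Return the operations from `used` that the spec marks as deprecated."""
    return sorted(set(used) & set(deprecated_operations(spec)))


# Hypothetical spec fragment and client usage list, for illustration only.
sample_spec = {
    "paths": {
        "/{currency}/entities/{entity}": {"get": {"deprecated": True}},
        "/{currency}/clusters/{cluster}": {"get": {}},
    }
}
used_by_client = [
    "GET /{currency}/entities/{entity}",
    "GET /{currency}/clusters/{cluster}",
]
offending = deprecated_in_use(sample_spec, used_by_client)
print(offending)
```

In CI, the spec would be loaded with json.load from a downloaded copy of /openapi.json, and a non-empty result would end the job with a non-zero exit code.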

HTTP response headers (RFC 9745 / RFC 8594)

Responses from deprecated routes carry:

  • Deprecation: true (RFC 9745). Signals that this specific endpoint is deprecated. Currently emitted as the literal string true; future releases may upgrade it to an @<epoch> timestamp indicating when deprecation took effect.
  • Link: </docs#section/Deprecation-policy>; rel="deprecation"; type="text/html" — points clients at the authoritative policy page.
  • Sunset: <HTTP-date> (RFC 8594). Announces the committed removal date for the endpoint. For the /entities/... endpoints (superseded by /clusters/...), the sunset is set to Sat, 31 Oct 2026 00:00:00 GMT. After that date the deprecated endpoints may be removed without further notice. Other deprecations may be introduced with different sunset dates in future releases.

To detect deprecation in your own client code, inspect the Deprecation and Sunset response headers and log a warning (or fail CI) when you hit a deprecated surface. Example with the generated Python client:

from graphsense import ApiClient, Configuration
from graphsense.api import ClustersApi

cfg = Configuration(host="https://api.iknaio.com", api_key={"api_key": "..."})
with ApiClient(cfg) as api_client:
    clusters = ClustersApi(api_client)
    response = clusters.get_cluster_with_http_info(currency="btc", cluster=264711)
    headers = response.headers
    if headers.get("Deprecation"):
        sunset = headers.get("Sunset", "no sunset date set")
        print(f"WARNING: endpoint deprecated (sunset: {sunset})")
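
The header values themselves can be interpreted with the standard library alone. The following sketch handles both the current literal true and the possible future @<epoch> form of Deprecation, and turns Sunset into a number of remaining days; the helper names are hypothetical and the headers dict stands in for a real response:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_deprecation(value):
    """Return True for the literal 'true', or a UTC datetime for '@<epoch>'."""
    value = value.strip()
    if value.startswith("@"):
        return datetime.fromtimestamp(int(value[1:]), tz=timezone.utc)
    return value.lower() == "true"


def days_until_sunset(sunset_header, now):
    """Whole days from `now` until the Sunset HTTP-date (negative once passed)."""
    sunset = parsedate_to_datetime(sunset_header)
    return (sunset - now).days


# Stand-in for response.headers; the Sunset value is the one quoted above.
headers = {"Deprecation": "true", "Sunset": "Sat, 31 Oct 2026 00:00:00 GMT"}
now = datetime(2026, 10, 1, tzinfo=timezone.utc)
deprecated = parse_deprecation(headers["Deprecation"])
remaining = days_until_sunset(headers["Sunset"], now)
print(deprecated, remaining)
```

A fixed now keeps the example reproducible; in real code, datetime.now(timezone.utc) would be used instead.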

CHANGELOG

Every deprecation is recorded in CHANGELOG.md under the release that introduced it, and every removal is recorded in the major release that applies it. Use the changelog as the audit trail when planning client upgrades.

Basic Usage

Database Access with Configuration File

from graphsenselib.db import DbFactory

# Using GraphSense config file (default: ~/.graphsense.yaml)
with DbFactory().from_config("development", "btc") as db:
    highest_block = db.transformed.get_highest_block()
    print(f"Highest BTC block: {highest_block}")

    # Get block details
    block = db.transformed.get_block(100000)
    print(f"Block 100000: {block.block_hash}")

Direct Database Connection

from graphsenselib.db import DbFactory

# Direct connection without config file
with DbFactory().from_name(
    raw_keyspace_name="eth_raw",
    transformed_keyspace_name="eth_transformed",
    schema_type="account",
    cassandra_nodes=["localhost"],
    currency="eth"
) as db:
    print(f"Highest block: {db.transformed.get_highest_block()}")

Async Database Services

The async services are used internally by the REST API and can also be used standalone. AddressesService depends on several other services:

from graphsenselib.db.asynchronous.services import (
    BlocksService, AddressesService, TagsService,
    EntitiesService, RatesService,
)

# Services are initialized with their dependencies. The names db, config,
# logger, rates_service, tags_service, and entities_service are assumed to
# have been created beforehand.
blocks_service = BlocksService(db, rates_service, config, logger)
addresses_service = AddressesService(
    db, tags_service, entities_service, blocks_service, rates_service, logger
)

# Service methods are coroutines, so call them from inside an async function
address_info = await addresses_service.get_address("btc", "1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa")
txs = await addresses_service.list_address_txs("btc", "1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa")

Command Line Interface

GraphSense-lib exposes a comprehensive CLI tool: graphsense-cli

Basic Commands

# Show help and available commands
graphsense-cli --help

# Check version
graphsense-cli version

# Show current configuration
graphsense-cli config show

# Generate config template
graphsense-cli config template > ~/.graphsense.yaml

# Show config file path
graphsense-cli config path

Modules

Database Management

Query and manage the GraphSense database state.

# Show database management options
graphsense-cli db --help

# Check database state/summary
graphsense-cli db state -e development

# Get block information
graphsense-cli db block info -e development -c btc --height 100000

# Query logs (for Ethereum-based chains)
graphsense-cli db logs -e development -c eth --from-block 1000000 --to-block 1000100

Schema Operations

Create and validate database schemas.

# Show schema options
graphsense-cli schema --help

# Create database schema for a currency
graphsense-cli schema create -e dev -c btc

# Validate existing schema
graphsense-cli schema validate -e dev -c btc

# Show expected schema for currency
graphsense-cli schema show-by-currency btc

# Show schema by type (utxo/account)
graphsense-cli schema show-by-schema-type utxo

Data Ingestion

Ingest raw cryptocurrency data from nodes.

# Show ingestion options
graphsense-cli ingest --help

# Ingest blocks from cryptocurrency node
graphsense-cli ingest from-node \
    -e dev \
    -c btc \
    --start-block 0 \
    --end-block 1000 \
    --create-schema

# Ingest with custom batch size
graphsense-cli ingest from-node \
    -e dev \
    -c eth \
    --start-block 1000000 \
    --end-block 1001000 \
    --batch-size 100

Delta Updates

Update transformed keyspace from raw keyspace.

# Show delta update options
graphsense-cli delta-update --help

# Check update status
graphsense-cli delta-update status -e dev -c btc

# Perform delta update
graphsense-cli delta-update update -e dev -c btc

# Validate delta update consistency
graphsense-cli delta-update validate -e dev -c btc

# Patch exchange rates for specific blocks
graphsense-cli delta-update patch-exchange-rates \
    -e dev \
    -c btc \
    --start-block 100000 \
    --end-block 200000

Exchange Rates

Fetch and ingest exchange rates from various sources.

# Show exchange rate options
graphsense-cli exchange-rates --help

# Fetch from CoinDesk
graphsense-cli exchange-rates coindesk -e dev -c btc

# Fetch from CoinMarketCap (requires API key in config)
graphsense-cli exchange-rates coinmarketcap -e dev -c btc

Monitoring

Monitor GraphSense infrastructure health and state.

# Show monitoring options
graphsense-cli monitoring --help

# Get database summary
graphsense-cli monitoring get-summary -e dev

# Get summary for specific currency
graphsense-cli monitoring get-summary -e dev -c btc

# Send notifications to configured handlers
graphsense-cli monitoring notify \
    --topic "database-update" \
    --message "BTC ingestion completed"

Event Watching (Alpha)

Watch for cryptocurrency events and generate notifications.

# Show watch options
graphsense-cli watch --help

# Watch for money flows on specific addresses
graphsense-cli watch money-flows \
    -e dev \
    -c btc \
    --address 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa \
    --threshold 1000000  # satoshis

File Conversion Tools

Convert between different file formats.

# Show conversion options
graphsense-cli convert --help

Configuration

GraphSense-lib uses a YAML configuration file that defines database connections and environment settings. Default locations: ./.graphsense.yaml, ~/.graphsense.yaml.

Generate Configuration Template

graphsense-cli config template > ~/.graphsense.yaml

Example Configuration Structure

# Optional: default environment to use
default_environment: dev

environments:
  dev:
    # Cassandra cluster configuration
    cassandra_nodes: ["localhost"]
    port: 9042
    # Optional authentication
    # username: "cassandra"
    # password: "cassandra"

    # Currency/keyspace configurations
    keyspaces:
      btc:
        raw_keyspace_name: "btc_raw"
        transformed_keyspace_name: "btc_transformed"
        schema_type: "utxo"

        # Node connection for ingestion
        ingest_config:
          node_reference: "http://localhost:8332"
          # Optional authentication for node
          # username: "rpcuser"
          # password: "rpcpassword"

        # Keyspace setup for schema creation
        keyspace_setup_config:
          raw:
            replication_config: "{'class': 'SimpleStrategy', 'replication_factor': 1}"
          transformed:
            replication_config: "{'class': 'SimpleStrategy', 'replication_factor': 1}"

      eth:
        raw_keyspace_name: "eth_raw"
        transformed_keyspace_name: "eth_transformed"
        schema_type: "account"

        ingest_config:
          node_reference: "http://localhost:8545"

        keyspace_setup_config:
          raw:
            replication_config: "{'class': 'SimpleStrategy', 'replication_factor': 1}"
          transformed:
            replication_config: "{'class': 'SimpleStrategy', 'replication_factor': 1}"

  prod:
    cassandra_nodes: ["cassandra1.prod", "cassandra2.prod", "cassandra3.prod"]
    username: "gs_user"
    password: "secure_password"

    keyspaces:
      btc:
        raw_keyspace_name: "btc_raw"
        transformed_keyspace_name: "btc_transformed"
        schema_type: "utxo"

        ingest_config:
          node_reference: "http://bitcoin-node.internal:8332"

        keyspace_setup_config:
          raw:
            replication_config: "{'class': 'NetworkTopologyStrategy', 'datacenter1': 3}"
          transformed:
            replication_config: "{'class': 'NetworkTopologyStrategy', 'datacenter1': 3}"

# Optional: Slack notification configuration
slack_topics:
  database-update:
    hooks: ["https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK"]

  payment_flow_notifications:
    hooks: ["https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK"]

# Optional: API keys for external services
coingecko_api_key: ""
coinmarketcap_api_key: "YOUR_CMC_API_KEY"

# Optional: cache directory for temporary files
cache_directory: "~/.graphsense/cache"
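
The replication_config values in the example are strings that hold a map literal. The sketch below shows how such a string could be parsed and placed into a CQL statement, assuming (this README does not confirm it) that the value serves as the replication map of CREATE KEYSPACE; create_keyspace_cql is a hypothetical helper, not a graphsense-lib function:

```python
import ast


def create_keyspace_cql(keyspace, replication_config):
    """Build a CREATE KEYSPACE statement from a replication_config string."""
    if not keyspace.isidentifier():
        raise ValueError(f"unsafe keyspace name: {keyspace!r}")
    # literal_eval only accepts Python literals, so no code is executed.
    replication = ast.literal_eval(replication_config)
    if not isinstance(replication, dict) or "class" not in replication:
        raise ValueError("replication_config must be a map with a 'class' entry")
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} "
        f"WITH replication = {replication!r};"
    )


statement = create_keyspace_cql(
    "btc_raw", "{'class': 'SimpleStrategy', 'replication_factor': 1}"
)
print(statement)
```

Parsing the string first catches malformed values (for example a missing quote) before anything is sent to Cassandra.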

Advanced Features

Tagpack Management

GraphSense-lib includes comprehensive tagpack management tools (formerly standalone tagpack-tool). For detailed documentation, see Tagpack README.

# Validate tagpacks
graphsense-cli tagpack-tool tagpack validate /path/to/tagpack

# Insert tagpack into tagstore
graphsense-cli tagpack-tool insert \
    --url "postgresql://user:pass@localhost/tagstore" \
    /path/to/tagpack

# Show quality measures
graphsense-cli tagpack-tool quality show-measures \
    --url "postgresql://user:pass@localhost/tagstore"

Tagstore Operations

# Initialize tagstore database
graphsense-cli tagstore init

# Initialize with custom database URL
graphsense-cli tagstore init --db-url "postgresql://user:pass@localhost/tagstore"

# Get DDL SQL for manual setup
graphsense-cli tagstore get-create-sql

Cross-chain Analysis

# Using an initialized AddressesService (see above for setup)
related = await addresses_service.get_cross_chain_pubkey_related_addresses(
    "1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa"
)

for addr in related:
    print(f"Network: {addr.network}, Address: {addr.address}")

Function Call Parsing

from graphsenselib.utils.function_call_parser import parse_function_call

# Parse Ethereum function calls
function_signatures = {
    "0xa9059cbb": [{
        "name": "transfer",
        "inputs": [
            {"name": "to", "type": "address"},
            {"name": "value", "type": "uint256"}
        ]
    }]
}

parsed = parse_function_call(tx_input_bytes, function_signatures)
if parsed:
    print(f"Function: {parsed['name']}")
    print(f"Parameters: {parsed['parameters']}")
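
For intuition about what such a parser does, the sketch below decodes transfer(address,uint256) calldata by hand with the standard library. It is not the library's implementation: decode_erc20_transfer is hypothetical, it handles only this one selector, and the shape of the returned parameters is an assumption modeled on the keys used above:

```python
TRANSFER_SELECTOR = bytes.fromhex("a9059cbb")  # transfer(address,uint256)


def decode_erc20_transfer(tx_input):
    """Decode transfer(address,uint256) calldata; return None for other input."""
    # Calldata = 4-byte selector followed by two 32-byte ABI words.
    if len(tx_input) != 4 + 2 * 32 or tx_input[:4] != TRANSFER_SELECTOR:
        return None
    to_word, value_word = tx_input[4:36], tx_input[36:68]
    return {
        "name": "transfer",
        "parameters": {
            # An address occupies the low 20 bytes of its left-padded word.
            "to": "0x" + to_word[-20:].hex(),
            # uint256 values are encoded big-endian.
            "value": int.from_bytes(value_word, "big"),
        },
    }


# Synthetic calldata: recipient 0x1111...11, amount 1000.
recipient_word = bytes.fromhex("00" * 12 + "11" * 20)
amount_word = (1000).to_bytes(32, "big")
decoded = decode_erc20_transfer(TRANSFER_SELECTOR + recipient_word + amount_word)
print(decoded)
```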

Development

Important: Requires Python >=3.10, <3.13.

Setup Development Environment

# Initialize development environment (installs deps + pre-commit hooks)
make dev

# Or install dev dependencies only
make install-dev

Code Quality and Testing

Before committing, please format, lint, and test your code:

# Format code
make format

# Lint code
make lint

# Run fast tests
make test

# Or run all steps at once
make pre-commit

For comprehensive testing:

# Run complete test suite (including slow tests)
make test

Podman Notes

If you run the test suite with Podman, make sure your shell points at the Podman socket:

export DOCKER_HOST="unix://${XDG_RUNTIME_DIR}/podman/podman.sock"

The test fixtures automatically disable Ryuk when DOCKER_HOST contains podman.sock and rely on explicit fixture cleanup instead.
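
The detection described above amounts to a substring check on DOCKER_HOST. A minimal sketch, assuming the testcontainers environment variable TESTCONTAINERS_RYUK_DISABLED is the switch being set; the repository's actual fixture code may do this differently, and configure_for_podman is a hypothetical name:

```python
import os


def configure_for_podman(env=None):
    """Disable Ryuk when DOCKER_HOST points at a Podman socket."""
    env = os.environ if env is None else env
    uses_podman = "podman.sock" in env.get("DOCKER_HOST", "")
    if uses_podman:
        # Without Ryuk, containers must be cleaned up explicitly by fixtures.
        env["TESTCONTAINERS_RYUK_DISABLED"] = "true"
    return uses_podman


podman_env = {"DOCKER_HOST": "unix:///run/user/1000/podman/podman.sock"}
docker_env = {"DOCKER_HOST": "unix:///var/run/docker.sock"}
print(configure_for_podman(podman_env), configure_for_podman(docker_env))
```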

Release Process

This repository uses two source-of-truth versions in the root Makefile:

  • Library version: RELEASESEM (released with vX.Y.Z, vX.Y.Z-rc.N, or vX.Y.Z-dev.N tags)
  • OpenAPI/API version: WEBAPISEM (written to src/graphsenselib/web/version.py)

The Python client package version is derived from the API version and should match it.

Library package versioning is dynamic via setuptools_scm (pyproject.toml):

  • Git tag v2.9.8 -> package version 2.9.8
  • Git tag v2.9.8-rc.1 -> package version 2.9.8rc1
  • Git tag v2.9.8-dev.1 -> package version 2.9.8.dev1
  • Commits after a tag append local metadata, for example 2.9.8.dev1+g<sha>.d<date>
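
The tag-to-version mapping listed above can be reproduced with a small helper, which is handy for checking a tag name before pushing it. This is a sketch of the mapping only, not of setuptools_scm itself; tag_to_package_version is hypothetical and does not cover the local metadata appended to untagged commits:

```python
import re

TAG_PATTERN = re.compile(r"^v(\d+\.\d+\.\d+)(?:-(rc|dev)\.(\d+))?$")


def tag_to_package_version(tag):
    """Map a library release tag to the package version it produces."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"not a library release tag: {tag!r}")
    base, kind, number = match.groups()
    if kind is None:
        return base                   # v2.9.8       -> 2.9.8
    if kind == "rc":
        return f"{base}rc{number}"    # v2.9.8-rc.1  -> 2.9.8rc1
    return f"{base}.dev{number}"      # v2.9.8-dev.1 -> 2.9.8.dev1


for tag in ("v2.9.8", "v2.9.8-rc.1", "v2.9.8-dev.1"):
    print(tag, "->", tag_to_package_version(tag))
```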

Use the root Makefile helpers:

# Show all current versions
make show-versions

# Update and validate OpenAPI contract version
make update-api-version WEBAPISEM=v2.10.0
make check-api-version WEBAPISEM=v2.10.0

# Sync client version from API version and validate
make sync-client-version WEBAPISEM=v2.10.0
make check-client-version WEBAPISEM=v2.10.0

# Generate Python client (package version = OpenAPI info.version)
make generate-python-client

# Create both release tags from Makefile versions
make tag-version

Tagging behavior:

  • Library release tag: vX.Y.Z, vX.Y.Z-rc.N, or vX.Y.Z-dev.N (from RELEASESEM)
  • Client release tag: webapi-vA.B.C (from WEBAPISEM)

Recommended library versioning routine:

  1. For development prereleases, set RELEASESEM to vX.Y.Z-dev.N (for example v2.10.0-dev.1)
  2. For release candidates, set RELEASESEM to vX.Y.Z-rc.N
  3. For stable releases, set RELEASESEM to vX.Y.Z
  4. Create tags with make tag-version
  5. Push tags with git push origin --tags

CI trigger background:

  • Stable library tags (vX.Y.Z) trigger:
    • GitHub Release creation
    • Python library package build/publish (graphsense-lib)
    • Docker image build/publish
  • Client tags (webapi-vA.B.C) trigger Python client package build/publish (clients/python)
  • Other library tags (vX.Y.Z-rc.N, vX.Y.Z-dev.N) do not trigger GitHub Release or Python package publish; they only trigger Docker image build/publish

Release checklist:

  1. Update CHANGELOG.md with new features and fixes
  2. Update relevant versions (library/API/client) based on what changed
  3. Sync API/client versions if needed (make update-api-version + make sync-client-version)
  4. Create and push tags:

make tag-version
git push origin --tags
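
The CI trigger rules from this section can be summarized as a lookup from tag name to triggered jobs. The job labels below are descriptive placeholders, not the names of the actual workflow jobs, and ci_jobs_for_tag is a hypothetical helper:

```python
import re

STABLE_TAG = re.compile(r"^v\d+\.\d+\.\d+$")
PRERELEASE_TAG = re.compile(r"^v\d+\.\d+\.\d+-(rc|dev)\.\d+$")
CLIENT_TAG = re.compile(r"^webapi-v\d+\.\d+\.\d+$")


def ci_jobs_for_tag(tag):
    """Return the documented CI effects of pushing the given tag."""
    if STABLE_TAG.match(tag):
        return ["github-release", "library-publish", "docker-image"]
    if PRERELEASE_TAG.match(tag):
        return ["docker-image"]       # no GitHub Release, no package publish
    if CLIENT_TAG.match(tag):
        return ["client-publish"]
    return []                         # any other tag triggers none of these


for tag in ("v2.10.0", "v2.10.0-rc.1", "webapi-v2.10.0"):
    print(tag, ci_jobs_for_tag(tag))
```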

Troubleshooting

OpenSSL Errors

Some components use OpenSSL hash functions that aren't available by default in OpenSSL 3.0+ (e.g., ripemd160). This can cause test suite failures. To fix this, enable legacy providers in your OpenSSL configuration. See the "fix openssl legacy mode" step in .github/workflows/run_tests.yaml for an example.
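
A quick way to find out whether the local Python/OpenSSL combination is affected is to request the hash directly. The helper name below is hypothetical; hashlib.new raises ValueError when the algorithm is unavailable:

```python
import hashlib


def ripemd160_available():
    """Return True if hashlib can construct a ripemd160 hash object."""
    try:
        hashlib.new("ripemd160", b"")
    except ValueError:
        # Typical when the OpenSSL legacy provider is not enabled.
        return False
    return True


print("ripemd160 available:", ripemd160_available())
```

If this prints False, the legacy provider configuration described above is the likely remedy.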

Common Issues

  1. Connection Refused: Verify Cassandra is running and accessible
  2. Schema Validation Errors: Ensure database schema matches expected version
  3. Import Errors: Install with [all] option for complete feature set
  4. Python Version: Requires Python >=3.10, <3.13

Getting Help

License

See LICENSE file for licensing details.

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Run make pre-commit to ensure code quality
  5. Submit a pull request

GraphSense - Open Source Crypto Analytics Platform
Website: https://graphsense.github.io/
