Build observability from dlt pipeline traces.
dlt_tracing is a lightweight utility that extends the native dlt experience by capturing pipeline trace information (last_trace) immediately after a pipeline run. These traces can be stored using a filesystem-backed dlt pipeline and later consumed for observability, BI reporting, or AI-driven anomaly detection.
dlt pipelines already produce rich execution traces, but they are often underutilized.
This package makes it easy to:
- Capture pipeline.last_trace after every run
- Persist traces in a structured, queryable format
- Build observability dashboards (e.g., Power BI)
- Feed traces into AI agents for anomaly detection and insights
- Keep the developer experience native and minimal
No decorators. No monkey-patching. Just a clean wrapper around pipeline.run().
- Drop-in replacement for pipeline.run()
- Automatically collects dlt last_trace
- Stores traces using the dlt filesystem destination
- Packaged as a pip-installable wheel
- Safe for production pipelines
- Fully dlt-native behavior
dlt_tracing/
├── dlt_tracing/
│   ├── __init__.py
│   └── trace_pipeline.py
├── setup.py
└── README.md
- Python 3.9+
- pip
- Virtual environment (recommended)
From the root of the project (where setup.py is located):
# Upgrade build tools
python -m pip install --upgrade pip setuptools wheel
# Build the wheel file
python setup.py bdist_wheel
Install the Wheel
Install from local path
pip install dist/dlt_tracing-0.1.0-py3-none-any.whl
Install from a shared location
pip install /path/to/dlt_tracing-0.1.0-py3-none-any.whl
All required dependencies will be installed automatically:
dlt[filesystem]
enlighten
duckdb
pandas
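To confirm that the wheel and its dependencies landed in the active environment, a quick check with the standard library can help. This is a small sketch; `installed_version` is a hypothetical helper, and the distribution names are taken from the list above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Report the status of the package and the dependencies listed above.
for name in ("dlt_tracing", "dlt", "enlighten", "duckdb", "pandas"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```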
Usage
Basic Example
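import dlt
from dlt_tracing import trace_pipeline

# `source` is whatever dlt source or resource you already pass to pipeline.run().
pipeline = dlt.pipeline(
    pipeline_name="orders_pipeline",
    destination="duckdb",
)

# Wrap the pipeline, then call run() as usual; traces are written under bucket_url.
load_info = trace_pipeline(
    pipeline,
    bucket_url="/logs",
    log_pipeline_name="dlthub_pipeline_logs",
    table_name="breaches_logs",
).run(source)

print(load_info)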
- Your pipeline runs exactly as before
- After completion, pipeline.last_trace is retrieved
- Trace data is written using a filesystem-backed dlt pipeline
- The original load_info is returned unchanged
If trace collection fails, the pipeline run still succeeds.
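The behavior described above can be sketched in a few lines. This is a minimal illustration, not the package's actual implementation: `TracedPipeline` and the `sink` callable are hypothetical names, and the sink stands in for the filesystem-backed dlt pipeline that persists the trace. The only dlt surface it assumes is `pipeline.run()` and `pipeline.last_trace`.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger("trace_wrapper_sketch")


class TracedPipeline:
    """Hypothetical wrapper: runs a pipeline, then hands its last_trace to a sink."""

    def __init__(self, pipeline: Any, sink: Callable[[Any], None]) -> None:
        self._pipeline = pipeline
        self._sink = sink

    def run(self, *args: Any, **kwargs: Any) -> Any:
        # Run the wrapped pipeline exactly as the caller would; errors raised
        # by the run itself propagate unchanged.
        load_info = self._pipeline.run(*args, **kwargs)
        try:
            # Trace capture is best-effort: a failure here is logged, not raised.
            self._sink(self._pipeline.last_trace)
        except Exception:
            logger.exception("trace capture failed; pipeline result is unaffected")
        # The original load_info is returned untouched.
        return load_info
```

Keeping the sink call inside its own `try` block is what separates the two failure modes: a failed pipeline run still raises, while a failed trace write only produces a log entry.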
- Build Power BI dashboards for pipeline monitoring
- Track pipeline health, latency, and failures
- Centralize traces in cloud storage (ADLS, S3, local FS)
- Feed traces into AI agents for:
  - Anomaly detection
  - Trend analysis
  - Root-cause insights
  - Predictive monitoring
This enables a shift from reactive monitoring to proactive, AI-driven observability.
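Before involving an AI agent, a simple statistical rule can already flag unusual runs. The sketch below assumes that run durations (in seconds) have been extracted from the stored traces into a plain list; `flag_slow_runs` and its threshold are illustrative choices, not part of dlt_tracing.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def flag_slow_runs(durations: Sequence[float], threshold: float = 2.0) -> List[int]:
    """Return indices of runs more than `threshold` standard deviations above the mean."""
    if len(durations) < 2:
        return []
    mu = mean(durations)
    sigma = pstdev(durations)
    if sigma == 0:
        # All runs took the same time, so nothing stands out.
        return []
    return [i for i, d in enumerate(durations) if (d - mu) / sigma > threshold]


# Five ordinary runs followed by one that took six times as long.
print(flag_slow_runs([10, 11, 9, 10, 10, 60]))  # [5]
```

A z-score rule like this is sensitive to the outliers it is trying to detect, so with longer histories a median-based rule may be a better fit.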
Design Principles
- Explicit over implicit
- No hidden behavior
- No modification of dlt internals
- Production-safe defaults
- Developer-controlled storage and analysis
License
MIT License