
man4ish/omnibioai-workflow-bundles


OmniBioAI Workflow Bundles

Overview

omnibioai-workflow-bundles is the canonical repository for engine-agnostic, versioned bioinformatics workflow bundles used by the OmniBioAI Workflow Registry Service.

This repository is used for authoring, versioning, and testing workflows, and is not required in deployed OmniBioAI runtime environments.

All workflows in this repository are:

  • Authored and version-controlled in Git
  • Packaged as immutable workflow bundles
  • Uploaded via CLI into OmniBioAI
  • Stored as objects in OmniObjectService
  • Indexed in the Workflow Registry Service

Important: This repository is not accessed at runtime by OmniBioAI services or plugins. Runtime execution always resolves workflows via the Workflow Registry + Object Storage layer, never directly from Git.


What Is a Workflow Bundle?

A workflow bundle is a versioned, self-contained artifact that defines everything required to execute a computational pipeline using a specific workflow engine.

Each bundle may include:

  • Workflow definition files:
    • WDL (Cromwell)
    • Nextflow
    • Snakemake
    • CWL
  • Engine-specific configuration files
  • Container definitions (Docker / Conda / Apptainer)
  • Reference datasets or helper scripts (optional)
  • A strict manifest.json describing metadata and entrypoints

Key Principle: Versioned Immutability

Bundles in Git are mutable during development, but:

Every upload to OmniBioAI produces an immutable runtime artifact

Once registered:

  • Bundles are never modified
  • Each update creates a new version
  • Each version receives a unique object_id

Supported Workflow Engines

This repository supports multiple workflow engines:

  • WDL (Cromwell-compatible)
  • Nextflow
  • Snakemake
  • CWL

Important Rule

Each workflow bundle targets exactly one engine.

Equivalent workflows implemented in different engines are stored as separate bundles.


Repository Structure

This repository is organized by biological domain, with each subfolder containing multiple workflow bundles.


omnibioai-workflow-bundles/
├── wes/
│   └── omnibioai_wes_snakemake_v1/
│       ├── workflow/
│       │   └── Snakefile
│       ├── config/
│       │   └── inputs.json
│       ├── envs/
│       ├── docker/
│       ├── test/
│       └── manifest.json
│
├── rnaseq/
├── chipseq/
├── atacseq/
├── wgs/
├── sv/
├── spatial/
├── cellranger/
└── README.md

Key Rules

  • Each domain directory (e.g., wes/, rnaseq/) can contain multiple workflow bundles
  • Each bundle is self-contained and versioned
  • Directory names are for human readability only
  • Canonical identity is defined by manifest.json, not filesystem paths

Bundle Identity and Versioning

Each workflow bundle is uniquely identified by:


(category, engine, name, version)

Example


category: wes
engine: snakemake
name: omnibioai_wes_snakemake_v1
version: 1.0.0
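Because identity comes from the manifest and not from the directory layout, the tuple can be modeled as an immutable, hashable value. The sketch below is a minimal illustration assuming the four fields are read directly from manifest.json; `BundleKey` and `bundle_key_from_manifest` are hypothetical names, not part of the OmniBioAI codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BundleKey:
    """Hypothetical immutable identity of a workflow bundle."""

    category: str
    engine: str
    name: str
    version: str


def bundle_key_from_manifest(manifest: dict) -> BundleKey:
    """Build the identity tuple from manifest.json fields, never from file paths."""
    return BundleKey(
        category=manifest["category"],
        engine=manifest["engine"],
        name=manifest["name"],
        version=manifest["version"],
    )


key = bundle_key_from_manifest(
    {
        "category": "wes",
        "engine": "snakemake",
        "name": "omnibioai_wes_snakemake_v1",
        "version": "1.0.0",
    }
)
print(key)
```

Using a frozen dataclass means the key cannot be mutated after creation and can serve directly as a dictionary key in an index.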

Versioning Rules

When a new version is uploaded:

  • A new immutable bundle is created
  • A new object_id is generated
  • A new registry entry is inserted
  • Previous versions remain fully accessible and unchanged
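These rules amount to an append-only index keyed by the identity tuple. The following is a small in-memory sketch, not the Workflow Registry Service itself: `AppendOnlyRegistry` and `RegistryEntry` are hypothetical, and object_id values are random UUIDs here only because the README does not specify how real IDs are generated.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class RegistryEntry:
    """Hypothetical registry row: identity fields plus an object_id, no file paths."""

    category: str
    engine: str
    name: str
    version: str
    object_id: str


class AppendOnlyRegistry:
    """In-memory stand-in that enforces the versioning rules."""

    def __init__(self) -> None:
        # Keyed by the identity tuple (category, engine, name, version).
        self._entries: dict = {}

    def register(self, category: str, engine: str, name: str, version: str) -> RegistryEntry:
        key = (category, engine, name, version)
        if key in self._entries:
            # A registered version is never modified; upload a new version instead.
            raise ValueError(f"{key} is already registered")
        # Hypothetical: a fresh random object_id for every new version.
        entry = RegistryEntry(*key, object_id=str(uuid.uuid4()))
        self._entries[key] = entry
        return entry

    def get(self, category: str, engine: str, name: str, version: str) -> RegistryEntry:
        return self._entries[(category, engine, name, version)]

    def versions(self, category: str, engine: str, name: str) -> list:
        """All registered versions of a workflow, in registration order."""
        return [k[3] for k in self._entries if k[:3] == (category, engine, name)]


registry = AppendOnlyRegistry()
v1 = registry.register("wes", "snakemake", "omnibioai_wes_snakemake_v1", "1.0.0")
v2 = registry.register("wes", "snakemake", "omnibioai_wes_snakemake_v1", "1.1.0")
print(registry.versions("wes", "snakemake", "omnibioai_wes_snakemake_v1"))
```

Registering 1.1.0 adds a second entry and leaves the 1.0.0 entry untouched; attempting to register 1.0.0 again raises instead of overwriting.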

Manifest Contract (manifest.json)

Each bundle MUST include a manifest.json file describing its canonical metadata.

Example

{
  "name": "omnibioai_wes_snakemake_v1",
  "display_name": "Whole Exome Sequencing Pipeline (Snakemake)",
  "category": "wes",
  "engine": "snakemake",
  "version": "1.0.0",
  "entrypoint": "workflow/Snakefile",
  "configs": ["config/inputs.json"],
  "description": "End-to-end WES pipeline including QC, trimming, alignment, and variant calling",
  "container_support": {
    "docker": true,
    "conda": true,
    "apptainer": false
  },
  "tools": [
    "trimmomatic",
    "bwa",
    "samtools",
    "gatk"
  ]
}

Contract Rules

  • Manifest is the single source of truth
  • Registry never infers metadata from file paths
  • Entry points must be explicit
  • Tool dependencies should be declared
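A validator for these rules only needs the parsed manifest, never the bundle's directory names. The sketch below is illustrative and makes two labeled assumptions: that name, category, engine, version, and entrypoint are mandatory, and that engines use the lowercase identifiers shown (only "snakemake" appears in the example above). `validate_manifest` is a hypothetical helper, not the CLI's actual validation code.

```python
import json
from pathlib import PurePosixPath

# Assumption: these fields are mandatory (identity tuple plus an explicit entrypoint).
REQUIRED_FIELDS = ("name", "category", "engine", "version", "entrypoint")
# Assumption: lowercase engine identifiers; only "snakemake" is confirmed by the README.
SUPPORTED_ENGINES = {"wdl", "nextflow", "snakemake", "cwl"}


def _is_bundle_relative(path: str) -> bool:
    """True if the path stays inside the bundle (relative, no '..' segments)."""
    p = PurePosixPath(path)
    return not p.is_absolute() and ".." not in p.parts


def validate_manifest(manifest: dict) -> tuple[list[str], list[str]]:
    """Check a parsed manifest.json and return (errors, warnings)."""
    errors = [f"missing required field: {f}" for f in REQUIRED_FIELDS if not manifest.get(f)]
    warnings = []

    engine = manifest.get("engine")
    if engine and engine not in SUPPORTED_ENGINES:
        errors.append(f"unsupported engine: {engine}")

    # Entry points and configs must be explicit, bundle-relative paths.
    paths = [manifest["entrypoint"]] if manifest.get("entrypoint") else []
    paths += manifest.get("configs", [])
    errors += [f"path escapes bundle: {p}" for p in paths if not _is_bundle_relative(p)]

    # Tool dependencies "should" be declared, so a missing list is only a warning.
    if not manifest.get("tools"):
        warnings.append("no tools declared")
    return errors, warnings


manifest = json.loads("""{
  "name": "omnibioai_wes_snakemake_v1",
  "category": "wes",
  "engine": "snakemake",
  "version": "1.0.0",
  "entrypoint": "workflow/Snakefile",
  "configs": ["config/inputs.json"],
  "tools": ["trimmomatic", "bwa", "samtools", "gatk"]
}""")
errors, warnings = validate_manifest(manifest)
print(errors, warnings)
```

Splitting results into errors and warnings mirrors the MUST/should wording of the contract: a missing entrypoint blocks upload, while an empty tool list is only flagged.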

Dependency & Execution Model

Bundles can isolate their tool dependencies using several strategies:

Supported Execution Modes

  • Conda environments (recommended for Snakemake)
  • Docker containers (optional per-rule or engine-level)
  • Apptainer/Singularity (HPC environments)

Key Requirement

Workflows must NOT assume globally installed bioinformatics tools.

Each rule should declare its runtime environment explicitly when possible.
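For Snakemake bundles, this requirement can be checked with a rough lint. The sketch below assumes that a rule declares its environment through Snakemake's `conda:` or `container:` directives and that only rules with a `shell:` or `script:` step need one. `rules_missing_environment` is a hypothetical, regex-based helper rather than a Snakemake API, so it will miss rules generated dynamically or written in unusual layouts.

```python
import re

RULE_HEADER = re.compile(r"^rule\s+(\w+)\s*:")
DIRECTIVE = re.compile(r"^\s+(\w+)\s*:")


def rules_missing_environment(snakefile_text: str) -> list[str]:
    """Return names of rules that run tools but declare no conda/container env."""
    missing = []
    current, directives = None, set()

    def finish():
        # Only rules that execute something need an environment (e.g. not `rule all`).
        if current and directives & {"shell", "script"} and not directives & {"conda", "container"}:
            missing.append(current)

    for line in snakefile_text.splitlines():
        header = RULE_HEADER.match(line)
        if header:
            finish()
            current, directives = header.group(1), set()
        elif line and not line[0].isspace() and not line.startswith("#"):
            # Unindented, non-comment code ends the current rule block.
            finish()
            current = None
        elif current:
            directive = DIRECTIVE.match(line)
            if directive:
                directives.add(directive.group(1))
    finish()
    return missing


SNAKEFILE = '''
rule all:
    input: "results/sample.vcf"

rule align:
    input: "reads.fq"
    output: "sample.bam"
    conda: "envs/bwa.yaml"
    shell: "bwa mem ref.fa {input} | samtools view -b - > {output}"

rule call_variants:
    input: "sample.bam"
    output: "results/sample.vcf"
    shell: "gatk HaplotypeCaller -I {input} -O {output}"
'''
print(rules_missing_environment(SNAKEFILE))
```

Here `call_variants` would be flagged because it shells out to GATK without declaring a Conda environment or container.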


Relationship to the Workflow Registry

The Workflow Registry Service is the authoritative metadata index for all OmniBioAI workflows.

Separation of Responsibilities

Component               Responsibility
Workflow Bundles Repo   Authoring & version control
CLI Upload Tool         Validation & packaging
Workflow Registry       Metadata indexing & discovery
OmniObjectService       Immutable bundle storage
Execution Engine        Workflow materialization & runtime

Registry = metadata
Object Store = immutable artifacts

The registry does not store files or paths — only object_id references.


Workflow Ingestion (CLI-first)

Bundles are uploaded via the OmniBioAI CLI.

Example Staging Structure

input/
  omnibioai_wes_snakemake_v1/
    manifest.json
    workflow/
    config/
    envs/

Upload Command

python manage.py workflow_upload \
  --bundle input/omnibioai_wes_snakemake_v1 \
  --created-by manish \
  --enable

Ingestion Process

  1. Validate bundle structure
  2. Parse manifest.json
  3. Validate entrypoint & configs
  4. Package bundle into archive
  5. Upload to OmniObjectService
  6. Receive object_id
  7. Create immutable registry entry
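The ingestion steps can be approximated in plain Python. This is a minimal sketch, not the actual `workflow_upload` command: `package_bundle`, `ingest`, and `InMemoryObjectStore` are hypothetical; the object store is an in-memory stand-in for OmniObjectService, the registry is a plain dict, object_id values are random UUIDs, and recording a SHA-256 digest alongside the object_id is an added assumption.

```python
import hashlib
import io
import json
import tarfile
import tempfile
import uuid
from pathlib import Path


def package_bundle(bundle_dir: Path) -> tuple[dict, bytes]:
    """Steps 1-4: check structure, parse manifest, verify paths, build a tar.gz."""
    manifest_path = bundle_dir / "manifest.json"
    if not manifest_path.is_file():
        raise ValueError("bundle has no manifest.json")
    manifest = json.loads(manifest_path.read_text())
    entrypoint = manifest.get("entrypoint")
    if not entrypoint:
        raise ValueError("manifest.json has no entrypoint")
    for rel in [entrypoint, *manifest.get("configs", [])]:
        if not (bundle_dir / rel).is_file():
            raise ValueError(f"declared file not found: {rel}")

    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        # Sorted order keeps the member order stable across runs.
        for path in sorted(p for p in bundle_dir.rglob("*") if p.is_file()):
            tar.add(path, arcname=path.relative_to(bundle_dir).as_posix())
    return manifest, buf.getvalue()


class InMemoryObjectStore:
    """Hypothetical stand-in for OmniObjectService (steps 5-6)."""

    def __init__(self) -> None:
        self.objects: dict = {}

    def put(self, data: bytes) -> str:
        object_id = str(uuid.uuid4())
        self.objects[object_id] = data
        return object_id


def ingest(bundle_dir: Path, store: InMemoryObjectStore, registry: dict) -> str:
    """Steps 5-7: upload the archive, receive an object_id, record a registry entry."""
    manifest, archive = package_bundle(bundle_dir)
    key = (manifest["category"], manifest["engine"], manifest["name"], manifest["version"])
    if key in registry:  # checked before upload so no orphan object is stored
        raise ValueError(f"{key} is already registered")
    object_id = store.put(archive)
    registry[key] = {"object_id": object_id, "sha256": hashlib.sha256(archive).hexdigest()}
    return object_id


with tempfile.TemporaryDirectory() as tmp:
    bundle = Path(tmp) / "omnibioai_wes_snakemake_v1"
    (bundle / "workflow").mkdir(parents=True)
    (bundle / "config").mkdir()
    (bundle / "workflow" / "Snakefile").write_text("rule all:\n    input: []\n")
    (bundle / "config" / "inputs.json").write_text("{}")
    (bundle / "manifest.json").write_text(json.dumps({
        "name": "omnibioai_wes_snakemake_v1", "category": "wes", "engine": "snakemake",
        "version": "1.0.0", "entrypoint": "workflow/Snakefile",
        "configs": ["config/inputs.json"],
    }))
    store, registry = InMemoryObjectStore(), {}
    object_id = ingest(bundle, store, registry)
print(object_id, registry)
```

The registry dict ends up holding only identity keys and object_id references; the bundle files themselves live solely in the object store.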

How Workflows Are Used at Runtime

OmniBioAI plugins do not access this repository directly.

Instead:

  1. Plugin queries Workflow Catalog Service

  2. User selects workflow + version

  3. Execution request is submitted

  4. Runtime system:

    • Resolves registry entry
    • Fetches bundle via object_id
    • Materializes workflow in execution environment
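On the runtime side, the same three actions can be sketched without any Git access. This example assumes a registry entry that stores an object_id plus a SHA-256 digest of the archive (the digest is an illustrative assumption, not documented behavior), and `materialize` and the `obj-0001` ID are hypothetical. It checks the digest before unpacking and refuses archive members that would escape the target directory.

```python
import hashlib
import io
import tarfile
import tempfile
from pathlib import Path, PurePosixPath


def materialize(registry: dict, objects: dict, key: tuple, workdir: Path) -> Path:
    """Resolve a registry entry, fetch its archive by object_id, and unpack it."""
    entry = registry[key]                  # resolve registry entry
    archive = objects[entry["object_id"]]  # fetch bundle bytes via object_id
    # Assumption: the registry also records a SHA-256 digest of the archive.
    if hashlib.sha256(archive).hexdigest() != entry["sha256"]:
        raise ValueError(f"checksum mismatch for object {entry['object_id']}")

    target = workdir / entry["object_id"]
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # parent directories are created from file paths below
            rel = PurePosixPath(member.name)
            if rel.is_absolute() or ".." in rel.parts:
                raise ValueError(f"unsafe archive member: {member.name}")
            dest = target.joinpath(*rel.parts)
            dest.parent.mkdir(parents=True, exist_ok=True)
            dest.write_bytes(tar.extractfile(member).read())
    return target


# Demo: a one-file archive registered under a hypothetical object_id.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    data = b"rule all:\n    input: []\n"
    info = tarfile.TarInfo("workflow/Snakefile")
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))
archive = buf.getvalue()

key = ("wes", "snakemake", "omnibioai_wes_snakemake_v1", "1.0.0")
objects = {"obj-0001": archive}
registry = {key: {"object_id": "obj-0001", "sha256": hashlib.sha256(archive).hexdigest()}}
with tempfile.TemporaryDirectory() as tmp:
    root = materialize(registry, objects, key, Path(tmp))
    snakefile = (root / "workflow" / "Snakefile").read_text()
print(snakefile)
```

Materializing into a directory named after the object_id keeps each run tied to one immutable artifact, which is what makes a per-object_id audit trail possible.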

Guarantees

  • No Git dependency at runtime
  • No filesystem-based discovery
  • Fully reproducible execution
  • Complete audit trail per object_id

Design Principles

  • ID-first architecture

  • Immutable workflow artifacts

  • Engine-agnostic bundle specification

  • Metadata-driven discovery

  • CLI-first ingestion workflow

  • Strict separation of:

    • Authoring
    • Registry
    • Storage
    • Execution

Intended Audience

This repository is intended for:

  • Workflow engineers
  • Bioinformatics developers
  • OmniBioAI platform maintainers

End users interact only through the OmniBioAI UI and APIs, not directly with this repository.


What This Repository Is NOT

  • ❌ A runtime execution environment
  • ❌ A plugin system
  • ❌ A database or registry
  • ❌ A production workflow scheduler
  • ❌ A mutable shared execution workspace

One-Sentence Summary

omnibioai-workflow-bundles is a version-controlled authoring repository for engine-specific bioinformatics workflows that are packaged into immutable artifacts and registered in the OmniBioAI Workflow Registry for reproducible, metadata-driven execution across multiple workflow engines.

About

Engine-agnostic, versioned bioinformatics workflow bundles for OmniBioAI, supporting WDL, Nextflow, Snakemake, and CWL. Designed for reproducible, registry-driven pipeline plugins across genomics and multi-omics.
