STiFLeR7/imgshape

🖼️ imgshape

The Data-Centric AI Toolkit for Vision Engineers

Version 4.2.0 PyPI Version Python 3.8+ Downloads


"Automatically analyze any image dataset and get model-ready preprocessing recommendations in one command."


🚀 Live Demo (Web) • 📖 Documentation • 💬 Report Bug / Discuss


✨ What's New in v4.2 "Bento Intelligence"

  • 🍱 Bento Grid UI: A complete UX overhaul using a modular 12-column grid for high-density dataset insights.
  • 🌊 Semantic Drift 2.0: Detect dataset shifts using DINOv2 vision transformer embeddings.
  • 🚀 Atlas Bento Engine: 40% faster fingerprinting via vectorized IO and multi-stage caching.
  • 🧩 Domain Profiles: One-click configurations for Medical, Satellite, and OCR datasets.
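
To make the embedding-based drift idea concrete, here is a minimal sketch, not imgshape's implementation. It assumes the embeddings for each dataset version have already been computed elsewhere (for example by a DINOv2 model) and are available as plain lists of floats. The names `mean_vector` and `cosine_drift` are hypothetical, and centroid cosine distance is just one simple way to score drift.

```python
import math
from typing import List, Sequence


def mean_vector(embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Average a non-empty batch of equal-length embedding vectors."""
    dim = len(embeddings[0])
    return [sum(vec[i] for vec in embeddings) / len(embeddings) for i in range(dim)]


def cosine_drift(reference: Sequence[Sequence[float]],
                 candidate: Sequence[Sequence[float]]) -> float:
    """Return 1 - cosine similarity between the two batch centroids.

    Values near 0 mean the centroids point the same way; larger values
    suggest the candidate dataset has drifted from the reference.
    """
    a, b = mean_vector(reference), mean_vector(candidate)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("centroid has zero norm; cosine drift is undefined")
    return 1.0 - dot / norm


# Toy 2-D vectors standing in for real embeddings (assumption for illustration).
version_1 = [[1.0, 0.0], [0.9, 0.1]]
version_2 = [[0.0, 1.0], [0.1, 0.9]]
print("same data:", abs(round(cosine_drift(version_1, version_1), 6)))
print("shifted data:", round(cosine_drift(version_1, version_2), 3))
```

Comparing centroids is cheap but coarse: it can miss shifts that change the spread of the data without moving its mean, so treat the score as a first-pass signal.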

⚡ 30-Second Start

Don't guess your dataset's health. Audit it immediately with the Atlas engine.

pip install imgshape

from imgshape import Atlas

# 1. Initialize the Atlas Orchestrator
atlas = Atlas()

# 2. Extract deterministic fingerprint
result = atlas.extract_fingerprint("./my_dataset")

# 3. View the verdict
print(result.summary())

System Output:

{
  "fingerprint_id": "fp_8a7d9f2",
  "total_images": 4502,
  "corrupt_files": 12,
  "metrics": {
    "avg_resolution": "1024x768",
    "diversity_score": 0.89,
    "channel_consistency": "FAIL"
  },
  "issues": ["Found 14 grayscale images in RGB dataset"]
}
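
If you want to act on this output in a script, the sketch below shows one way to turn it into pass/fail findings. It assumes the summary is available as JSON text shaped like the sample above; whether `result.summary()` returns a string or a dict is not confirmed here. The helper `audit_failures` and its `max_corrupt` threshold are hypothetical.

```python
import json
from typing import List

# Sample summary copied from the output above.
SUMMARY_JSON = """
{
  "fingerprint_id": "fp_8a7d9f2",
  "total_images": 4502,
  "corrupt_files": 12,
  "metrics": {
    "avg_resolution": "1024x768",
    "diversity_score": 0.89,
    "channel_consistency": "FAIL"
  },
  "issues": ["Found 14 grayscale images in RGB dataset"]
}
"""


def audit_failures(summary: dict, max_corrupt: int = 0) -> List[str]:
    """Collect human-readable reasons why a summary should fail an audit."""
    failures = []
    corrupt = summary.get("corrupt_files", 0)
    if corrupt > max_corrupt:
        failures.append(f"{corrupt} corrupt files (allowed: {max_corrupt})")
    if summary.get("metrics", {}).get("channel_consistency") == "FAIL":
        failures.append("channel consistency check failed")
    # Pass through any issues the tool already reported.
    failures.extend(summary.get("issues", []))
    return failures


failures = audit_failures(json.loads(SUMMARY_JSON))
for reason in failures:
    print("FAIL:", reason)
```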

🔍 The Visual Dashboard (Atlas UI)

Experience imgshape's capabilities visually. The dashboard provides a real-time interface for dataset fingerprinting, augmentation previews, and pipeline configuration using the new Bento Grid layout.


Dashboard v4.2.0 showing Bento Grid layout and semantic drift detection.


🚀 Why imgshape?

Most vision models fail because of garbage data: corrupt files, mixed channels (RGBA vs RGB), or weird aspect ratios. imgshape uses a deterministic rule engine to catch these problems before you train.

| Module | Technical Function |
| --- | --- |
| 🔍 Instant Audit | Multi-threaded + GPU-accelerated scan for entropy, blur, and variance using PyTorch. |
| 🧠 Decision Engine | Heuristic-based suggestion engine with Provenance IDs and Reproducibility Hashes. |
| 📊 Semantic Drift | NEW: DINOv2-powered drift analysis between dataset versions. |
| 🍱 Bento Grid UI | NEW: High-density Modular Dashboard for interactive exploration. |
| 🛠️ Pipeline Export | Generates serialization-safe code for PyTorch, TensorFlow, and Albumentations. |
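
As an illustration of the mixed-channel problem described above, here is a minimal sketch of a channel consistency check. It is not imgshape's code. It assumes you already have each file's color mode as a Pillow-style string such as "RGB", "RGBA", or "L" (grayscale). The function name `find_channel_outliers` and the file names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def find_channel_outliers(modes: Dict[str, str]) -> Tuple[str, Dict[str, List[str]]]:
    """Return the majority color mode and the files that deviate from it.

    `modes` maps a file path to its color mode string.
    """
    counts = Counter(modes.values())
    majority, _ = counts.most_common(1)[0]
    outliers: Dict[str, List[str]] = {}
    for path, mode in sorted(modes.items()):
        if mode != majority:
            outliers.setdefault(mode, []).append(path)
    return majority, outliers


# Hypothetical scan result for a small dataset.
scan = {
    "cat_001.png": "RGB",
    "cat_002.png": "RGB",
    "cat_003.png": "L",
    "cat_004.png": "RGBA",
    "cat_005.png": "RGB",
}
majority, outliers = find_channel_outliers(scan)
for mode, paths in outliers.items():
    print(f"Found {len(paths)} {mode} image(s) in {majority} dataset: {paths}")
```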

📦 Installation Matrix

Choose your deployment flavor.

| Command | Use Case | Size |
| --- | --- | --- |
| pip install imgshape | Core / CI/CD | ~12MB |
| pip install "imgshape[full]" | Research / Power User | ~45MB |
| pip install "imgshape[ui]" | Interactive / Dashboard | ~30MB |

💡 Practical Use Cases

1. The "Sanity Check" (CI/CD Integration)

Block bad data from entering your training bucket. Ideal for GitHub Actions or Jenkins.

# Returns exit code 1 if corrupt files or schema violations are found
imgshape --check ./new_batch_v2 --strict-schema
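
If your pipeline is driven from Python rather than shell, the exit-code contract above can be wrapped as in the sketch below. The helper `run_gate` is hypothetical, and the demo uses stand-in commands so it runs without imgshape installed. In a real job you would pass the command shown above as a list: `["imgshape", "--check", "./new_batch_v2", "--strict-schema"]`.

```python
import subprocess
import sys
from typing import List


def run_gate(command: List[str]) -> bool:
    """Run a check command and return True only if it exits with code 0."""
    completed = subprocess.run(command, capture_output=True, text=True)
    if completed.returncode != 0:
        # Surface the tool's output so the CI log explains the failure.
        print(completed.stdout + completed.stderr, file=sys.stderr)
    return completed.returncode == 0


# Stand-in commands that mimic a passing and a failing check.
passing = [sys.executable, "-c", "import sys; sys.exit(0)"]
failing = [sys.executable, "-c", "import sys; print('corrupt file found'); sys.exit(1)"]

print("passing check accepted:", run_gate(passing))
print("failing check accepted:", run_gate(failing))
```

Passing the command as a list avoids shell quoting problems with dataset paths that contain spaces.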

2. The "Pipeline Builder"

Don't guess augmentation parameters. Let the entropy statistics decide.

# analyze -> recommend -> export PyTorch snippet
imgshape --path ./train_data --analyze --recommend --out transforms.py

3. The "Visual Explorer"

Verify RandomCrop or ColorJitter intensity manually before training.

# Launches local studio with auto-reload
imgshape --web --reload

🏗️ Architecture & Internal Mechanics

imgshape (Aurora Engine) operates on a Fingerprint-Analyze-Decide loop, acting as middleware between raw storage and compute.

graph TD
    subgraph "Data Layer"
    A[Raw Images]
    end

    subgraph "imgshape Core (Atlas Bento)"
    B[Fingerprint Extractor] -->|Hash & Meta| C{Decision Engine}
    C -->|Rules v4.2| D[Recommendation]
    end

    subgraph "Integration Layer"
    D --> E[PyTorch/TF Code]
    D --> F[JSON Artifacts]
    D --> G[HTML/PDF Reports]
    end

    A --> B

Core Components

  • Atlas Bento Orchestrator: The central intent-driven API that manages the lifecycle of an analysis session.
  • Fingerprint Extractor: A stateless module that computes immutable signatures for datasets (distributions, channel counts, hashes).
  • Decision Engine: A rule-based system that maps dataset signatures + User Intent (e.g., "Speed" vs "Accuracy") to concrete preprocessing steps.
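
The two ideas behind these components, a deterministic signature and a rule-based mapping from signature plus intent to steps, can be sketched in a few lines. This is an illustration under stated assumptions, not imgshape's internals. The function names, the `fp_` plus seven hex characters ID format (mirroring the sample output), and the specific rules and resize targets are all hypothetical.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple


def dataset_fingerprint(files: Iterable[Tuple[str, bytes]]) -> str:
    """Compute an order-independent digest over (relative_path, content) pairs."""
    digest = hashlib.sha256()
    # Sorting makes the result independent of directory listing order.
    for path, content in sorted(files):
        digest.update(path.encode("utf-8"))
        digest.update(b"\0")
        digest.update(hashlib.sha256(content).digest())
    return "fp_" + digest.hexdigest()[:7]


def decide(signature: Dict[str, str], intent: str) -> List[str]:
    """Map a dataset signature and a user intent to preprocessing steps."""
    if intent not in ("speed", "accuracy"):
        raise ValueError(f"unknown intent: {intent!r}")
    steps = []
    if signature.get("channel_consistency") == "FAIL":
        steps.append("convert all images to RGB")
    # Hypothetical rule: smaller inputs for speed, larger for accuracy.
    steps.append("resize to 224x224" if intent == "speed" else "resize to 512x512")
    return steps


files = [("b.png", b"\x02\x03"), ("a.png", b"\x00\x01")]
print(dataset_fingerprint(files) == dataset_fingerprint(reversed(files)))
print(decide({"channel_consistency": "FAIL"}, "speed"))
```

Because the same inputs always produce the same fingerprint and the same steps, a recommendation can be traced back to the exact dataset state that produced it.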

🤝 Community & Support

Built by Stifler for the AI Engineering community.

Star on GitHub ⭐ to help more people find clean data.

About

imgshape is a dataset intelligence toolkit for computer vision that generates deterministic dataset fingerprints, explainable decisions, and production-ready artifacts for reproducible ML workflows.
