Example Configuration Files

Cleaning Profiles (YAML)

Safe Profile (safe.yaml)

name: safe
description: Remove sensitive fields while keeping technical metadata
remove_categories:
  - location
  - identity
  - device
  - thumbnail
keep_categories:
  - technical
remove_fields:
  - GPS*
  - Author
  - Creator
  - Artist
  - Copyright
  - Make
  - Model
  - Software
  - HostComputer
  - CameraOwnerName
  - "*SerialNumber"
preserve_fields:
  - Orientation
  - ColorSpace
  - ImageWidth
  - ImageHeight
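
The wildcard entries in remove_fields (GPS*, *SerialNumber) read like shell-style globs. A minimal sketch of how such a profile could be evaluated, assuming glob semantics and assuming preserve_fields takes precedence over remove_fields. The helpers `should_remove` and `apply_profile` are hypothetical and not part of MetaSanitize, and category handling is left out:

```python
from fnmatch import fnmatchcase

# Patterns copied from safe.yaml; how MetaSanitize matches them is an assumption.
REMOVE_FIELDS = [
    "GPS*", "Author", "Creator", "Artist", "Copyright", "Make", "Model",
    "Software", "HostComputer", "CameraOwnerName", "*SerialNumber",
]
PRESERVE_FIELDS = ["Orientation", "ColorSpace", "ImageWidth", "ImageHeight"]


def should_remove(field_name: str) -> bool:
    """Return True if the field matches a remove pattern and is not preserved."""
    # Assumption: an explicit preserve entry always wins over a remove pattern.
    if any(fnmatchcase(field_name, pattern) for pattern in PRESERVE_FIELDS):
        return False
    return any(fnmatchcase(field_name, pattern) for pattern in REMOVE_FIELDS)


def apply_profile(metadata: dict) -> dict:
    """Return a copy of the metadata without the fields the profile removes."""
    return {name: value for name, value in metadata.items()
            if not should_remove(name)}
```

Under these assumptions, `GPSLatitude` and `BodySerialNumber` are dropped, while `Orientation` and any field not listed (for example `ExposureTime`) are kept.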

Minimal Profile (minimal.yaml)

name: minimal
description: Keep only essential technical metadata
keep_categories:
  - technical
keep_fields:
  - ImageWidth
  - ImageHeight
  - Orientation
  - ColorSpace
  - XResolution
  - YResolution
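
Where the safe profile lists what to remove, the minimal profile lists what to keep. Read as an allowlist, it could behave like the sketch below. This assumes exact, case-sensitive name matching and ignores keep_categories; the `keep_only` helper is hypothetical and not part of MetaSanitize:

```python
# Field names copied from minimal.yaml.
KEEP_FIELDS = frozenset({
    "ImageWidth", "ImageHeight", "Orientation",
    "ColorSpace", "XResolution", "YResolution",
})


def keep_only(metadata: dict, keep=KEEP_FIELDS) -> dict:
    """Return only the fields named in the allowlist; everything else is dropped."""
    return {name: value for name, value in metadata.items() if name in keep}
```

An allowlist fails closed: a field the profile author never anticipated is removed by default, whereas a remove list would let it through.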

Complete Profile (complete.yaml)

name: complete
description: Remove all metadata
remove_all: true

Batch Processing Config (batch_config.yaml)

# Batch processing configuration
batch:
  max_workers: 4
  recursive: true
  create_backup: true

  # File filters
  extensions:
    - .jpg
    - .jpeg
    - .png
    - .pdf
    - .docx

  # Exclude patterns
  exclude_patterns:
    - "*/test/*"
    - "*/backup/*"
    - "*/.originals/*"

  # Default profile
  default_profile: safe

  # Progress reporting
  progress:
    enabled: true
    update_interval: 10 # files

# Profile-specific settings
profiles:
  safe:
    watermark: "Metadata sanitized by MetaSanitize"
    preserve_orientation: true

  complete:
    confirm_required: true
    warning: "This will remove ALL metadata"
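
The extensions and exclude_patterns settings together decide which files a batch run touches. A minimal sketch of that selection step, assuming case-insensitive extension matching and glob-style exclude patterns applied to the POSIX form of the path. The `is_selected` helper is hypothetical and not part of MetaSanitize:

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Values copied from batch_config.yaml.
EXTENSIONS = {".jpg", ".jpeg", ".png", ".pdf", ".docx"}
EXCLUDE_PATTERNS = ["*/test/*", "*/backup/*", "*/.originals/*"]


def is_selected(path: str) -> bool:
    """Return True if the file has a listed extension and matches no exclude pattern."""
    candidate = PurePosixPath(path)
    # Assumption: extensions are compared case-insensitively.
    if candidate.suffix.lower() not in EXTENSIONS:
        return False
    # fnmatch's "*" also matches "/", so "*/test/*" catches any depth of nesting.
    return not any(fnmatchcase(candidate.as_posix(), pattern)
                   for pattern in EXCLUDE_PATTERNS)
```

With this reading, a pattern such as "*/test/*" needs at least one path component before test/, so a relative path like test/a.jpg would not be excluded. Whether MetaSanitize behaves the same way is not stated here.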

Automation Rules (rules.yaml)

# Rule-based automation
rules:
  # Rule 1: Clean all images in photos directory
  - name: "Clean photos"
    match:
      path: "*/photos/*"
      extensions: [.jpg, .jpeg, .png]
    action:
      operation: clean
      profile: safe
      backup: true

  # Rule 2: Strip the listed GPS fields from uploads (matches any GPS* tag, removes only these four)
  - name: "Remove GPS from uploads"
    match:
      path: "*/uploads/*"
      has_metadata:
        - GPS*
    action:
      operation: modify
      remove_fields:
        - GPSLatitude
        - GPSLongitude
        - GPSAltitude
        - GPSTimeStamp

  # Rule 3: Anonymize documents
  - name: "Anonymize documents"
    match:
      extensions: [.docx, .pdf]
    action:
      operation: clean
      profile: safe
      replace_author: "Anonymous"
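
One plausible way to evaluate these rules is to AND the conditions inside each match block and apply the first rule that matches. The sketch below illustrates that reading only; the evaluation order, the glob semantics, and the helper names `rule_matches` and `first_matching_rule` are assumptions, not MetaSanitize's documented behavior:

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# The match blocks from rules.yaml, written out as Python data.
RULES = [
    {"name": "Clean photos",
     "match": {"path": "*/photos/*", "extensions": [".jpg", ".jpeg", ".png"]}},
    {"name": "Remove GPS from uploads",
     "match": {"path": "*/uploads/*", "has_metadata": ["GPS*"]}},
    {"name": "Anonymize documents",
     "match": {"extensions": [".docx", ".pdf"]}},
]


def rule_matches(match: dict, path: str, field_names: list) -> bool:
    """Return True only if every condition present in the match block holds."""
    if "path" in match and not fnmatchcase(path, match["path"]):
        return False
    if "extensions" in match:
        if PurePosixPath(path).suffix.lower() not in match["extensions"]:
            return False
    # Each has_metadata pattern must match at least one field in the file.
    for pattern in match.get("has_metadata", []):
        if not any(fnmatchcase(name, pattern) for name in field_names):
            return False
    return True


def first_matching_rule(path: str, field_names: list):
    """Return the name of the first rule that matches, or None."""
    for rule in RULES:
        if rule_matches(rule["match"], path, field_names):
            return rule["name"]
    return None
```

Under a first-match policy, rule order matters: a .pdf stored under a photos directory would skip Rule 1 (wrong extension) and fall through to Rule 3.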

Example Usage

CLI Examples

# Scan single file and view privacy risk
python -m metasanitize scan photo.jpg

# Scan with JSON output
python -m metasanitize scan photo.jpg --json

# Save scan report to file
python -m metasanitize scan photo.jpg -o report.txt

# Clean with safe profile (removes location, identity, device, and thumbnail metadata)
python -m metasanitize clean photo.jpg --profile safe

# Clean with complete profile (removes ALL metadata)
python -m metasanitize clean photo.jpg --profile complete

# Batch process directory recursively
python -m metasanitize clean ./photos -r --profile safe

# Dry run to preview changes
python -m metasanitize clean photo.jpg --dry-run

# Skip backup creation
python -m metasanitize clean photo.jpg --no-backup

# Inject dummy DSLR metadata
python -m metasanitize inject photo.jpg --device dslr

# Inject smartphone metadata with fake EU GPS
python -m metasanitize inject photo.jpg --device smartphone --gps --region EU

# Preview injection without applying
python -m metasanitize inject photo.jpg --device dslr --dry-run

# Compare metadata between two files
python -m metasanitize diff original.jpg cleaned.jpg

# Save diff to file
python -m metasanitize diff before.jpg after.jpg -o diff.txt
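
The same commands can be driven from a script. A minimal sketch using the standard library's subprocess module; it uses only the subcommand and flags shown above, and combining --profile with --dry-run in a single call is an assumption. The helpers `build_clean_command` and `run` are hypothetical:

```python
import subprocess
import sys


def build_clean_command(target, profile="safe", dry_run=True):
    """Build the argument list for `python -m metasanitize clean ...`."""
    cmd = [sys.executable, "-m", "metasanitize", "clean", str(target),
           "--profile", profile]
    if dry_run:
        # Assumption: --dry-run can be combined with --profile.
        cmd.append("--dry-run")
    return cmd


def run(cmd):
    """Run the command, capturing stdout/stderr; the caller checks returncode."""
    return subprocess.run(cmd, capture_output=True, text=True, check=False)
```

Passing the arguments as a list rather than a shell string avoids quoting problems with file names that contain spaces.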

Python API Examples

from pathlib import Path

from metasanitize.core.engine import MetadataEngine
from metasanitize.privacy.analyzer import PrivacyAnalyzer
from metasanitize.profiles.generator import DummyMetadataGenerator

# Extract and analyze metadata
engine = MetadataEngine()
analyzer = PrivacyAnalyzer()

file_path = Path("photo.jpg")
record = engine.extract(file_path)
risk = analyzer.analyze(record)

print(f"Risk Score: {risk.score}/100 - {risk.level.upper()}")
print(f"Location Exposed: {risk.location_exposed}")
print(f"Device Fingerprint: {risk.device_fingerprint}")
print(f"Sensitive fields: {len(record.get_sensitive_fields())}")

# Clean metadata with safe profile
result = engine.clean(file_path, profile="safe", create_backup=True)
print(f"Success: {result.success}")
print(f"Backup: {result.backup_path}")

# Clean with complete profile (removes ALL metadata)
result = engine.clean(file_path, profile="complete")

# Generate and inject dummy metadata
generator = DummyMetadataGenerator()
dummy_data = generator.generate_image_metadata(
    profile='dslr',      # 'smartphone', 'dslr', or 'scanner'
    include_gps=True,
    region='EU'          # 'US', 'EU', or 'Asia'
)

# Apply dummy metadata to file
result = engine.modify(file_path, fields_to_modify=dummy_data)

# Compare two files
record1 = engine.extract(Path("original.jpg"))
record2 = engine.extract(Path("cleaned.jpg"))
diff = analyzer.compare(record1, record2)

print(f"Fields removed: {len(diff.removed_fields)}")
print(f"Risk reduction: {diff.risk_reduction} points")

# Batch processing
from metasanitize.utils.batch import BatchProcessor

processor = BatchProcessor(max_workers=4)
results = processor.process_directory(
    Path("./photos"),
    operation='clean',
    profile='safe',
    recursive=True
)

# Generate report
report = processor.generate_report(results)
print(f"Processed {report['total_files']} files")
print(f"Success rate: {report['success_rate']:.1f}%")
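
The report keys used above (total_files, success_rate) suggest a simple aggregation over per-file results. A sketch of how such a summary could be computed, assuming each result exposes a boolean `success` attribute as in the clean example. The `FileResult` class and `summarize` function are hypothetical stand-ins, not the BatchProcessor implementation:

```python
from dataclasses import dataclass


@dataclass
class FileResult:
    """Hypothetical stand-in for a per-file result object."""
    success: bool


def summarize(results):
    """Return total file count and success rate as a percentage."""
    total = len(results)
    succeeded = sum(1 for result in results if result.success)
    # Guard against division by zero when no files were processed.
    rate = (succeeded / total * 100.0) if total else 0.0
    return {"total_files": total, "success_rate": rate}
```

For four results with one failure, this yields a success_rate of 75.0, which the f-string above would print as 75.0%.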