Goal
Generate synthetic IoT traffic data for controlled testing and validation.
Motivation
Why synthetic data?
- Controlled experiments: Test specific attack patterns
- Class balancing: Generate minority class samples
- Edge case testing: Create rare but important scenarios
- Privacy: Synthetic data can be shared with lower privacy risk than raw captures
- Research contribution: A validated method for generating synthetic IoT traffic is a useful result in its own right
Approach
Phase 1: Tool Selection
Options (2025):
- SDV (Synthetic Data Vault): Best for tabular data
- CTGAN: GAN-based tabular synthesis (also available as a synthesizer inside SDV)
- SMOTE: Classic minority-class oversampling by interpolation (simple baseline for class balancing)
Recommendation: Start with SDV and CTGAN
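To make the SMOTE option concrete, here is a minimal standard-library sketch of the interpolation idea behind it: pick a minority-class row, pick one of its nearest minority-class neighbors, and create a new row somewhere on the line between them. The function name, the feature columns, and the sample values are assumptions made for illustration only. In practice a maintained implementation such as the SMOTE class in imbalanced-learn would likely be preferable.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic rows by interpolating between minority-class neighbors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority rows by Euclidean distance to the base row.
        others = [row for j, row in enumerate(minority) if j != i]
        others.sort(key=lambda row: math.dist(base, row))
        neighbor = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbor.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Hypothetical minority-class flows: [packets_per_second, mean_packet_bytes]
attack_flows = [[900.0, 64.0], [950.0, 60.0], [1000.0, 72.0], [870.0, 66.0]]
new_flows = smote_oversample(attack_flows, n_new=6)
print(len(new_flows))
```

Because every synthetic row is a convex combination of two real rows, the generated values never leave the range of the original minority samples. This is one reason SMOTE is better suited to class balancing than to edge case generation.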
Phase 2: Experiments
Experiment 1: Data Augmentation (synthetic + real)
Experiment 2: Generalization (train synthetic, test real)
Experiment 3: Edge Case Testing
Experiment 4: Privacy-preserving dataset publication
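Experiments 1 and 2 share the same evaluation shape, so a small harness may help fix the protocol before any generator is chosen. The sketch below uses a nearest-centroid classifier as a stand-in model. The classifier, the two feature columns, and all data values are hypothetical placeholders, not results or the project's actual pipeline. The point is only which data the model is fitted on and which data it is scored on.

```python
import math
from collections import defaultdict


def fit_centroids(rows, labels):
    """Compute one mean feature vector per class."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    return {
        label: [sum(col) / len(col) for col in zip(*members)]
        for label, members in grouped.items()
    }


def predict(centroids, row):
    """Return the label whose centroid is closest to the row."""
    return min(centroids, key=lambda label: math.dist(centroids[label], row))


def accuracy(centroids, rows, labels):
    hits = sum(predict(centroids, r) == y for r, y in zip(rows, labels))
    return hits / len(rows)


# Hypothetical toy flows: [packets_per_second, mean_packet_bytes]
synthetic_X = [[10.0, 500.0], [12.0, 480.0], [900.0, 64.0], [950.0, 70.0]]
synthetic_y = ["benign", "benign", "attack", "attack"]
real_train_X = [[11.0, 510.0], [920.0, 66.0]]
real_train_y = ["benign", "attack"]
real_test_X = [[9.0, 495.0], [880.0, 60.0], [14.0, 520.0], [1010.0, 75.0]]
real_test_y = ["benign", "attack", "benign", "attack"]

# Experiment 2: train on synthetic only, test on held-out real data.
tstr_acc = accuracy(fit_centroids(synthetic_X, synthetic_y), real_test_X, real_test_y)

# Experiment 1: train on synthetic + real, test on the same held-out real data.
aug_model = fit_centroids(synthetic_X + real_train_X, synthetic_y + real_train_y)
aug_acc = accuracy(aug_model, real_test_X, real_test_y)

print(tstr_acc, aug_acc)
```

Scoring both experiments on the same held-out real set keeps them comparable. A real-only baseline fitted on `real_train_X` would give the reference point against which augmentation is judged.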
Deliverables
Timeline
6 weeks total (stretch goal)
Priority
STRETCH GOAL - High research value, not critical for core modernization