OverCV/Intel-II


Intelligent Systems II 🤖

University of Caldas course repository for Intelligent Systems II.
Class taken with Prof. Jorge Alberto Jaramillo Garzón

Python Streamlit scikit-learn PyTorch

Live Demo: Bank Marketing ML Analysis


📚 About This Course

Intelligent Systems II - Advanced machine learning and artificial intelligence techniques
Institution: Universidad de Caldas | Computer Engineering
Professor: Jorge Alberto Jaramillo Garzón
Academic Period: 2024-2025

Course Structure

  • 50% Partial Exams (Zero, First, Second, Third)
  • 50% Final Project (Cybersecurity Incident Prediction)

🎯 Topics Covered

Machine Learning · Data Science · Support Vector Machines · Neural Networks
PCA Analysis · Bayesian Inference · Ensemble Methods · Deep Learning
Dimensionality Reduction · Cross-Validation · Hyperparameter Tuning
Kernel Tricks · Gradient Boosting · Graph Neural Networks · Transformers

Technologies: python streamlit scikit-learn pytorch xgboost pandas numpy matplotlib seaborn plotly solara


📂 Repository Structure

.jorge/
├── partials/                 # Partial Exams (50%)
│   ├── zero/                 # Demo exam (practice)
│   ├── first/                # Bayesian & K-NN classifiers
│   ├── second/               # SVM, ANN, PCA ⭐ COMPLETE
│   └── third/                # TBD
│
├── project/                  # Final Project (50%)
│   ├── Cybersecurity Incident Predictor
│   └── Microsoft GUIDE Dataset Analysis
│
└── notebooks/                # Class Materials
    ├── Theory: Perceptron, SVM, Kernels, ANN
    └── Weekly notes

📝 Partial Exams

Demo Exam (Zero) - Practice Exercises

Location: partials/zero/
Topics: Validation, Bayesian classifiers, K-NN
Dataset: Iris

Exercises:

  1. Cross-Validation (10-fold) - Compare Bayesian vs Geometric classifiers on Iris
  2. Bootstrapping K-NN - Investigate performance vs number of neighbors
  3. Classifier Comparison - Contrast assumptions, requirements, dimensionality impact
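Exercise 1 could be sketched roughly as follows. This is not the exam solution: it assumes scikit-learn, uses Gaussian naive Bayes as the "Bayesian" classifier and a 5-nearest-neighbor model as the "geometric" one, and the variable name `cv_results` is illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# 10 stratified folds; shuffling with a fixed seed keeps runs reproducible.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# Assumed stand-ins: Gaussian naive Bayes for "Bayesian", 5-NN for "geometric".
models = {
    "GaussianNB": GaussianNB(),
    "5-NN": KNeighborsClassifier(n_neighbors=5),
}

cv_results = {}
for name, model in models.items():
    # One accuracy value per fold; report mean and spread across folds.
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    cv_results[name] = (scores.mean(), scores.std())
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Using the same `cv` object for both models means they are scored on identical splits, which makes the comparison fairer than drawing new folds for each classifier.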

First Exam - Fundamental ML Concepts

Location: partials/first/
Topics: Data preprocessing, model training, evaluation
Dataset: Iris classification

Completed:

  • ✅ Data preprocessing pipeline
  • ✅ Multiple classifier training
  • ✅ Performance evaluation metrics

Second Exam ⭐ - Interactive ML Dashboard

Location: partials/second/
Status: ✅ COMPLETE (2.5/2.5 points)
Live: intel-ii-exam-ii.streamlit.app

Tasks Completed:

Task 1: SVM Analysis (0.9 pts)

  • 4 kernel types: Linear, RBF, Polynomial, Sigmoid
  • Hyperparameter tuning: C, gamma, degree
  • Cross-validation with K-Fold and Train/Test
  • Experiment history tracking and comparison
  • Confusion matrix visualization
  • Best model auto-identification
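The kernel and hyperparameter search described above might look like the sketch below. It assumes scikit-learn and a synthetic dataset from `make_classification` in place of the real data, and the grid values are illustrative; the dashboard's own code and ranges may differ.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption); not the dataset used by the dashboard.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scaling inside the pipeline avoids leaking test-fold statistics into training.
pipe = make_pipeline(StandardScaler(), SVC())

# One sub-grid per kernel family so only relevant hyperparameters are searched.
param_grid = [
    {"svc__kernel": ["linear"], "svc__C": [0.1, 1, 10]},
    {"svc__kernel": ["rbf", "sigmoid"], "svc__C": [0.1, 1, 10],
     "svc__gamma": ["scale", 0.1]},
    {"svc__kernel": ["poly"], "svc__C": [0.1, 1, 10],
     "svc__degree": [2, 3], "svc__gamma": ["scale"]},
]

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

best_params = search.best_params_          # best kernel and settings found
test_accuracy = search.score(X_test, y_test)
cm = confusion_matrix(y_test, search.predict(X_test))
print(best_params, round(test_accuracy, 3))
print(cm)
```

`search.cv_results_` holds the score of every combination, which is one plausible source for an experiment-history table.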

Task 2: ANN Analysis (0.9 pts)

  • 12 architectures: 1-3 layers (10-100 neurons)
  • 3 activation functions: ReLU, Tanh, Logistic
  • 3 solvers: Adam, SGD, L-BFGS
  • Learning curves visualization
  • Performance comparison charts
  • Best model saver
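A reduced version of this architecture sweep can be sketched with scikit-learn's `MLPClassifier`. The architectures, the Iris stand-in data, and the omission of the SGD solver (left out only to keep the run short) are assumptions of this example, not the dashboard's configuration.

```python
from itertools import product

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Fit the scaler on training data only, then apply it to both splits.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

architectures = [(10,), (50, 25)]           # illustrative hidden-layer sizes
activations = ["relu", "tanh", "logistic"]
solvers = ["adam", "lbfgs"]

ann_results = []
for hidden, activation, solver in product(architectures, activations, solvers):
    clf = MLPClassifier(
        hidden_layer_sizes=hidden,
        activation=activation,
        solver=solver,
        max_iter=2000,
        random_state=0,
    )
    clf.fit(X_train, y_train)
    ann_results.append((clf.score(X_test, y_test), hidden, activation, solver))

# Highest test accuracy across all configurations.
best = max(ann_results, key=lambda r: r[0])
print(best)
```

For the Adam and SGD solvers, a fitted `MLPClassifier` exposes `loss_curve_`, which can be plotted as a training-loss curve.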

Task 3: PCA Analysis (0.7 pts)

  • Feature analysis: correlation, distributions, Q-Q plots
  • Data exploration: 6 plots (3×2D + 3×3D interactive)
  • PCA transformation with variance thresholds
  • Model retraining on PCA data
  • BEFORE vs AFTER comparison
  • Automated insights and recommendations
  • Answer: "What can you conclude for YOUR dataset?"
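The before/after comparison can be illustrated with a small sketch. It assumes scikit-learn, the bundled breast cancer dataset as a stand-in, an SVM as the retrained model, and a 95% variance threshold; none of these choices are taken from the dashboard itself.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# BEFORE: model trained on all standardized features.
before = make_pipeline(StandardScaler(), SVC())

# AFTER: a float n_components keeps the fewest principal components
# whose cumulative explained variance reaches 95%.
after = make_pipeline(StandardScaler(), PCA(n_components=0.95), SVC())

acc_before = cross_val_score(before, X, y, cv=5).mean()
acc_after = cross_val_score(after, X, y, cv=5).mean()

# Refit once on the full data to read how many components were kept.
after.fit(X, y)
n_kept = after.named_steps["pca"].n_components_

print(f"features: {X.shape[1]} -> components: {n_kept}")
print(f"accuracy before: {acc_before:.3f}, after: {acc_after:.3f}")
```

A USE/AVOID style recommendation could then be derived by comparing `acc_after` with `acc_before` against a tolerance, weighed against the reduction from `X.shape[1]` features to `n_kept` components.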

Features:

  • 🎯 Interactive Streamlit dashboard
  • 📊 Real-time experiment tracking
  • 💾 Persistent experiment history
  • 📈 Automated performance insights
  • 🎨 Professional visualizations
  • 💡 Smart recommendations (USE/AVOID PCA)
  • 📥 CSV export functionality

Tech Stack:
Python | Streamlit | scikit-learn | pandas | matplotlib | seaborn | plotly

Documentation: See .jorge/partials/second/README.md


Third Exam - TBD

Location: partials/third/
Status: 🔜 Pending


🚀 Final Project - Cybersecurity Incident Predictor

Location: .jorge/project/

Overview

Advanced ensemble ML platform for predicting cybersecurity incidents before they occur. Transforms reactive cybersecurity into proactive prevention for Security Operations Centers (SOCs).

Innovation

  • Predictive (not reactive): Predicts incidents 1-24 hours in advance
  • Hybrid ensemble: LSTM + GNN + XGBoost + Transformer
  • Meta-learning: Adaptive model weighting by context
  • Production-ready: Professional Solara dashboard

Architecture

Level 0: Specialized Base Models

  1. LSTM/GRU - Temporal pattern recognition

    • Learns incident sequences over time
    • Captures long-term dependencies
  2. Graph Neural Networks - Entity relationship modeling

    • Models risk propagation through network
    • 33 entity types (users, IPs, domains, etc.)
  3. XGBoost - Alert pattern classification

    • Complex decision rules
    • 9,100+ detector patterns
  4. Transformers - Evidence sequence analysis

    • Self-attention over evidence chains
    • MITRE ATT&CK technique mapping

Level 1: Meta-Ensemble

  • Adaptive weight learning by context
  • Organization-specific optimization
  • Online learning for drift adaptation
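One way to picture context-dependent weighting with online updates is the sketch below. It is not the project's meta-learner: it swaps in a simple multiplicative-weights (Hedge-style) update, and the class name `ContextualEnsemble`, the context keys, and the probabilities are all hypothetical.

```python
import math
from collections import defaultdict

MODELS = ("lstm", "gnn", "xgboost", "transformer")


class ContextualEnsemble:
    """Keeps one weight vector per context and updates it online."""

    def __init__(self, learning_rate=0.5):
        self.learning_rate = learning_rate
        # Every new context starts with uniform weights over the base models.
        self.weights = defaultdict(
            lambda: {m: 1.0 / len(MODELS) for m in MODELS}
        )

    def predict(self, context, probs):
        """Weighted average of the base models' incident probabilities."""
        w = self.weights[context]
        return sum(w[m] * probs[m] for m in MODELS)

    def update(self, context, probs, label):
        """Shrink the weight of models with high log loss on this example."""
        w = self.weights[context]
        for m in MODELS:
            p = min(max(probs[m], 1e-6), 1 - 1e-6)  # clip to avoid log(0)
            loss = -math.log(p if label == 1 else 1 - p)
            w[m] *= math.exp(-self.learning_rate * loss)
        total = sum(w.values())
        for m in MODELS:
            w[m] /= total  # renormalize so the weights sum to 1


# Hypothetical usage: one organization where the LSTM is most reliable.
ens = ContextualEnsemble()
probs = {"lstm": 0.9, "gnn": 0.6, "xgboost": 0.2, "transformer": 0.7}
for _ in range(5):
    ens.update("org-A", probs, label=1)
final_weights = ens.weights["org-A"]
print(final_weights)
```

Because weights are stored per context, an organization with different traffic (for example a hypothetical "org-B") starts from uniform weights and adapts independently, and continued updates let the weights shift if a model's accuracy drifts.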

Dataset: Microsoft GUIDE

  • 13M+ evidences from real cybersecurity incidents
  • 1.6M alerts from 9,100+ unique detectors
  • 1M incidents with expert triage labels
  • 6,100+ organizations across industries
  • 441 MITRE ATT&CK techniques mapped
  • 2-week period with temporal resolution

Performance Metrics

Prediction Accuracy (4h):  94.2%
Early Warning Score:       0.89
Cost-Weighted Recall:      0.91
Alert Fatigue Score:       0.85
MTTD Reduction:            4+ hours

Business Impact

  • ✅ Prevents incidents before escalation
  • ✅ Reduces MTTD by 4+ hours
  • ✅ Optimizes analyst workload with intelligent prioritization
  • ✅ Scales across organizations with adaptive learning

Tech Stack

Python 3.10+ | Solara | PyTorch | XGBoost | scikit-learn | Microsoft GUIDE Dataset

Documentation: See .jorge/project/README.md


📓 Class Materials

Theory Notebooks (.jorge/notebooks/docs/)

4-Perceptrón-SVM.ipynb

  • Perceptron fundamentals
  • Linear classification
  • Support Vector Machine theory
  • Margin maximization

5-SVM-Kernel.ipynb

  • Kernel trick explained
  • Mapping φ to higher dimensions
  • RBF, Polynomial, Sigmoid kernels
  • Computational advantages: O(n²) vs O(n²d)
  • Parameter selection (σ, degree)

Key Concepts:

  • Kernel function: K(x,y) = φ(x)ᵀφ(y) computed without explicit mapping
  • RBF kernel: Maps to infinite dimensions
  • Polynomial kernel: (xᵀy + c)ᵈ captures interactions
  • Binomial theorem: Connects products in original/transformed space
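The kernel identity can be checked numerically for a small case. The sketch below is an illustration, not notebook code: it takes the degree-2 polynomial kernel on 2-D inputs, writes out one explicit 6-dimensional feature map obtained by expanding the square, and confirms that both routes give the same number. The function names are illustrative.

```python
import math


def poly_kernel(x, y, c=1.0):
    """Degree-2 polynomial kernel (x.y + c)^2, computed in the input space."""
    return (sum(a * b for a, b in zip(x, y)) + c) ** 2


def phi(x, c=1.0):
    """Explicit degree-2 feature map for 2-D inputs (6 dimensions)."""
    x1, x2 = x
    return (
        x1 * x1,
        x2 * x2,
        math.sqrt(2) * x1 * x2,
        math.sqrt(2 * c) * x1,
        math.sqrt(2 * c) * x2,
        c,
    )


x, y = (1.0, 2.0), (3.0, -1.0)

# Implicit route: one dot product in 2-D, then a square.
k_implicit = poly_kernel(x, y)

# Explicit route: map both points to 6-D, then take the dot product there.
k_explicit = sum(a * b for a, b in zip(phi(x), phi(y)))

print(k_implicit, k_explicit)
```

Here x·y = 1, so the kernel value is (1 + 1)² = 4, and the explicit dot product agrees up to floating-point rounding. The implicit route never builds the 6-dimensional vectors, which is the saving the kernel trick provides as the degree and input dimension grow.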

6-ANN.ipynb

  • Neural network architectures
  • Backpropagation algorithm
  • Activation functions (ReLU, Tanh, Sigmoid)
  • Training strategies

🛠️ Quick Start

Run Second Exam Dashboard

cd .jorge/partials/second

# Install dependencies
pip install streamlit pandas numpy scikit-learn matplotlib seaborn plotly

# Launch app
streamlit run app.py

Visit: http://localhost:8501

Run Cybersecurity Project

cd .jorge/project

# Install with UV
uv sync

# Download Microsoft GUIDE dataset from Kaggle
# Extract to data/microsoft_guide/

# Run dashboard
uv run cybersec-dashboard

Visit: http://localhost:8765

View Theory Notebooks

cd .jorge/notebooks/docs
jupyter notebook

📊 Course Progress

Component | Status | Description | Grade
Demo Exam | ✅ Complete | Bayesian, K-NN, Validation | Practice
First Exam | ✅ Complete | Fundamentals, Iris dataset | TBD
Second Exam | ✅ Complete | SVM + ANN + PCA Dashboard | 2.5/2.5
Third Exam | 🔜 Pending | TBD | -
Final Project | ✅ Complete | Cybersecurity Incident Predictor | TBD

Overall Progress: 80% Complete


🎓 Learning Outcomes

By the end of this course, you will master:

Classical Machine Learning

  • ✅ Support Vector Machines with kernel methods
  • ✅ Bayesian classifiers and probabilistic inference
  • ✅ K-Nearest Neighbors algorithms
  • ✅ Cross-validation and bootstrapping
  • ✅ Hyperparameter optimization

Deep Learning

  • ✅ Artificial Neural Networks (feedforward)
  • ✅ LSTM/GRU for temporal sequences
  • ✅ Graph Neural Networks for relationships
  • ✅ Transformers and attention mechanisms

Dimensionality Reduction

  • ✅ Principal Component Analysis (PCA)
  • ✅ Feature selection and engineering
  • ✅ Variance analysis and scree plots
  • ✅ Component interpretation

Ensemble Methods

  • ✅ Random Forest, XGBoost, LightGBM
  • ✅ Gradient boosting techniques
  • ✅ Meta-ensemble with adaptive weighting
  • ✅ Model stacking strategies

Practical Skills

  • ✅ Deploy ML models in production
  • ✅ Build interactive dashboards (Streamlit, Solara)
  • ✅ Handle imbalanced datasets (SMOTE, SMOTE-ENN)
  • ✅ Evaluate with business-focused metrics
  • ✅ Draw data-driven conclusions
  • ✅ Communicate technical results effectively

🏆 Featured Work

Second Exam - Bank Marketing ML Analysis ⭐

Interactive dashboard comparing SVM, ANN, and PCA on UCI Bank Marketing dataset

Highlights:

  • 3 ML algorithms with comprehensive tuning
  • Automated experiment tracking
  • PCA impact analysis with insights
  • Smart recommendations based on results
  • Professional production deployment

Live Demo: https://intel-ii-exam-ii.streamlit.app/


Final Project - Cybersecurity Incident Predictor

Enterprise ML platform for SOC teams with 4-hour incident predictions

Innovation:

  • Hybrid ensemble (LSTM + GNN + XGBoost + Transformer)
  • Meta-learning with context adaptation
  • Microsoft GUIDE dataset (13M+ evidences)
  • Professional Solara dashboard
  • 94.2% prediction accuracy

Impact: Aims to flag incidents before escalation, reducing potential damages


📚 Documentation Index

Exam Documentation

  • Second Exam: .jorge/partials/second/README.md
    • SVM Guide: .jorge/partials/second/ui/pages/svm/README.md
    • ANN Guide: .jorge/partials/second/ui/pages/ann/README.md
    • PCA Guide: .jorge/partials/second/ui/pages/pca/README.md
    • Deployment: .jorge/partials/second/DEPLOYMENT.md

Project Documentation

  • Overview: .jorge/project/README.md
  • Project Vision: .jorge/project/project_overview.md
  • Architecture: .jorge/project/architecture_design.md
  • Dataset: .jorge/project/dataset_guide.md
  • Metrics: .jorge/project/evaluation_metrics.md

Theory Materials

  • SVM Theory: .jorge/project/clase-03.md
  • Notebooks: .jorge/notebooks/docs/

🌟 Key Achievements

  • ✅ Deployed Production ML App - Streamlit Cloud
  • ✅ Built Professional SOC Dashboard - Solara
  • ✅ Implemented Ensemble Learning - 4 specialized models
  • ✅ Achieved 94%+ Accuracy - Real-world dataset
  • ✅ Created Comprehensive Documentation - Theory + Practice
  • ✅ Applied Advanced ML Techniques - Kernels, PCA, Meta-learning

👨‍💻 Author

Jorge Alberto Jaramillo Garzón
Computer Engineering Student
Universidad de Caldas


🙏 Acknowledgments

  • Professor: Jorge Alberto Jaramillo Garzón
  • Institution: Universidad de Caldas
  • Course: Sistemas Inteligentes II (Intelligent Systems II)
  • Datasets:
    • UCI Machine Learning Repository (Bank Marketing, Iris)
    • Microsoft GUIDE (Cybersecurity Incidents)
    • CIC-IDS2017, UNSW-NB15 (Network intrusion)

📄 License

Academic project for Universidad de Caldas coursework.


Last Updated: October 12, 2025
Status: Active Development | 80% Complete
