diff --git a/ABOUT.md b/ABOUT.md index a739b9d..68ee428 100644 --- a/ABOUT.md +++ b/ABOUT.md @@ -155,7 +155,7 @@ Both exist simultaneously, creating a living curriculum. ### Near-term (2026) - ✅ Launch master repository (this!) - ✅ Complete Foundation Track (5 chapters — all available!) -- 🔄 Release Practitioner Track (2 of 10 chapters available) +- 🔄 Release Practitioner Track (3 of 10 chapters available) - 🔄 Establish community request process - 🔄 Build 100+ community-contributed chapters @@ -178,11 +178,11 @@ Both exist simultaneously, creating a living curriculum. ## 📊 By The Numbers **Current State:** -- 7 chapters available (Foundation complete + Practitioner started) +- 8 chapters available (Foundation complete + Practitioner started) - 21 Jupyter notebooks with interactive content - 21 professional SVG diagrams - 37 exercises with solutions -- 56 hours of learning content available +- 64 hours of learning content available - 5 practice datasets - 25+ total chapters planned - $0 barrier to entry diff --git a/GITHUB_PROFILE_README.md b/GITHUB_PROFILE_README.md index b10f849..66f93ad 100644 --- a/GITHUB_PROFILE_README.md +++ b/GITHUB_PROFILE_README.md @@ -10,7 +10,7 @@ **[Berta AI](https://berta.one)** — AI-powered tools for tomorrow's world -- **[Berta Chapters](https://github.com/luigipascal/berta-chapters)** — Free, open-source AI curriculum. 7 chapters live, 25 planned. Learn Python to production ML through interactive notebooks, exercises, and an online playground. No paywall, no signup. +- **[Berta Chapters](https://github.com/luigipascal/berta-chapters)** — Free, open-source AI curriculum. 8 chapters live, 25 planned. Learn Python to production ML through interactive notebooks, exercises, and an online playground. No paywall, no signup. - **[LLM Cost Optimizer](https://llm.berta.one)** — Cut LLM API costs 80-95% while keeping data private. Local processing, text anonymization, automatic model routing.
- **OrbaOS** — A framework for post-project work. AI handles coordination so teams focus on strategy and creative output. diff --git a/README.md b/README.md index a6b150f..59d88a5 100644 --- a/README.md +++ b/README.md @@ -53,7 +53,7 @@ Apply what you've learned to real-world machine learning and AI problems. |---------|-------|------|--------| | 6 | [Introduction to Machine Learning](./chapters/chapter-06-intro-machine-learning/) | 8h | ✅ Available | | 7 | [Supervised Learning: Regression & Classification](./chapters/chapter-07-supervised-learning/) | 10h | ✅ Available | -| 8 | Unsupervised Learning: Clustering & Dimensionality Reduction | 8h | 🔄 Coming Soon | +| 8 | [Unsupervised Learning: Clustering & Dimensionality Reduction](./chapters/chapter-08-unsupervised-learning/) | 8h | ✅ Available | | 9 | Deep Learning Fundamentals | 12h | 🔄 Coming Soon | | 10 | Natural Language Processing Basics | 10h | 🔄 Coming Soon | | 11 | Large Language Models & Transformers | 10h | 🔄 Coming Soon | @@ -268,7 +268,7 @@ pie title Curriculum Breakdown "Community Requested" : 999 ``` -- **Chapters Available Now**: 7 (56 hours of content) +- **Chapters Available Now**: 8 (64 hours of content) - **Total Planned Chapters**: 25+ - **Jupyter Notebooks**: 21 interactive notebooks - **SVG Diagrams**: 21 professional diagrams diff --git a/ROADMAP.md b/ROADMAP.md index 8f05088..a52b19c 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -8,11 +8,11 @@ Our vision for the future of AI education.
This is a living document—prioritie **Master Repository**: ✅ Live **Foundation Track**: ✅ Complete (5 chapters available) -**Practitioner Track**: 🔄 In progress (2 of 10 chapters available) +**Practitioner Track**: 🔄 In progress (3 of 10 chapters available) **Advanced Track**: 📋 Planned (10 chapters) **Community Requests**: 🚀 Starting (unlimited) **Total Planned**: 25+ chapters, 500+ hours of content -**Currently Available**: 7 chapters, 56 hours of content, 21 SVG diagrams +**Currently Available**: 8 chapters, 64 hours of content, 24 SVG diagrams --- @@ -21,7 +21,7 @@ Our vision for the future of AI education. This is a living document—prioritie ### Objectives - ✅ Establish master repository (DONE) - ✅ Complete Foundation Track (DONE) -- ✅ Begin Practitioner Track (Ch 6-7 available) +- ✅ Begin Practitioner Track (Ch 6-8 available) - 🔄 Establish community request process - 🔄 Build first 100 community chapters - ✅ Create core infrastructure and documentation (DONE) @@ -37,11 +37,11 @@ Our vision for the future of AI education. This is a living document—prioritie - One new chapter released per week - New chapters unlock after reaching **10 newsletter subscribers** - ✅ Foundation Track complete (Chapters 1-5) -- ✅ Practitioner Track started (Chapters 6-7) +- ✅ Practitioner Track started (Chapters 6-8) ### Metrics to Track - Newsletter subscribers (target: 10 to unlock weekly releases) -- Chapters completed: 7 / 25 +- Chapters completed: 8 / 25 - Community requests received - Stars on master repo @@ -59,7 +59,7 @@ Our vision for the future of AI education.
This is a living document—prioritie ### Practitioner Track Chapters - [x] Chapter 6: Introduction to Machine Learning - [x] Chapter 7: Supervised Learning (Regression & Classification) -- [ ] Chapter 8: Unsupervised Learning +- [x] Chapter 8: Unsupervised Learning - [ ] Chapter 9: Deep Learning Fundamentals - [ ] Chapter 10: Natural Language Processing Basics - [ ] Chapter 11: Large Language Models & Transformers diff --git a/SYLLABUS.md b/SYLLABUS.md index 04493d3..0601119 100644 --- a/SYLLABUS.md +++ b/SYLLABUS.md @@ -16,7 +16,7 @@ graph TD CH6["Ch 6: Intro to ML<br/>8h | Available"] CH7["Ch 7: Supervised Learning<br/>10h | Available"] - CH8["Ch 8: Unsupervised Learning<br/>8h | Coming Soon"] + CH8["Ch 8: Unsupervised Learning<br/>8h | Available"] CH9["Ch 9: Deep Learning<br/>12h | Coming Soon"] CH10["Ch 10: NLP Basics<br/>10h | Coming Soon"] CH11["Ch 11: LLMs & Transformers<br/>
10h | Coming Soon"] @@ -56,7 +56,7 @@ graph TD style CH5 fill:#4caf50,color:#fff style CH6 fill:#4caf50,color:#fff style CH7 fill:#4caf50,color:#fff - style CH8 fill:#f3e5f5 + style CH8 fill:#4caf50,color:#fff style CH9 fill:#f3e5f5 style CH10 fill:#f3e5f5 style CH11 fill:#f3e5f5 @@ -66,7 +66,7 @@ graph TD style CH15 fill:#f3e5f5 ``` -**Legend**: Green = Available | Purple = Practitioner (Coming Soon) | Chapters 1-7 fully available with SVG diagrams +**Legend**: Green = Available | Purple = Practitioner (Coming Soon) | Chapters 1-8 fully available with SVG diagrams --- @@ -81,7 +81,7 @@ graph TD | 5 | [Software Design & Best Practices](./chapters/chapter-05-software-design/) | Foundation | 6h | Available | 3 notebooks, scripts, 5 exercises, 3 SVGs | | 6 | [Introduction to Machine Learning](./chapters/chapter-06-intro-machine-learning/) | Practitioner | 8h | Available | 3 notebooks, scripts, 5 exercises, 3 SVGs | | 7 | [Supervised Learning](./chapters/chapter-07-supervised-learning/) | Practitioner | 10h | Available | 3 notebooks, scripts, 5 exercises, 3 SVGs | -| 8 | Unsupervised Learning | Practitioner | 8h | Planned | - | +| 8 | [Unsupervised Learning](./chapters/chapter-08-unsupervised-learning/) | Practitioner | 8h | Available | 3 notebooks, scripts, 5 exercises, 3 SVGs | | 9 | Deep Learning Fundamentals | Practitioner | 12h | Planned | - | | 10 | Natural Language Processing | Practitioner | 10h | Planned | - | | 11 | LLMs & Transformers | Practitioner | 10h | Planned | - | diff --git a/chapters/chapter-08-unsupervised-learning/README.md b/chapters/chapter-08-unsupervised-learning/README.md new file mode 100644 index 0000000..0326ae6 --- /dev/null +++ b/chapters/chapter-08-unsupervised-learning/README.md @@ -0,0 +1,61 @@ +# Chapter 8: Unsupervised Learning + +**Track**: Practitioner | **Time**: 8 hours | **Prerequisites**: Chapters 1-6 + +--- + +## Learning Objectives + +By the end of this chapter, you will be able to: + +- Understand the difference between 
supervised and unsupervised learning +- Implement K-Means clustering from scratch using NumPy +- Apply hierarchical (agglomerative) clustering and interpret dendrograms +- Use DBSCAN for density-based clustering with automatic cluster count detection +- Evaluate clusters with the silhouette score, inertia, and the elbow method +- Apply Principal Component Analysis (PCA) for dimensionality reduction +- Implement t-SNE for 2D visualization of high-dimensional data +- Perform anomaly detection with Isolation Forest and statistical methods +- Build a complete customer segmentation pipeline end-to-end + +--- + +## Chapter Structure + +``` +chapter-08-unsupervised-learning/ +├── README.md +├── requirements.txt +├── notebooks/ +│ ├── 01_introduction.ipynb # K-Means, evaluation metrics, elbow method +│ ├── 02_intermediate.ipynb # Hierarchical, DBSCAN, Gaussian Mixture Models +│ └── 03_advanced.ipynb # PCA, t-SNE, anomaly detection, customer segmentation capstone +├── scripts/ +│ ├── unsupervised_toolkit.py # KMeansScratch, PCA, plotting utilities +│ └── utilities.py # Helper functions +├── exercises/ +│ ├── exercises.py # 5 exercises +│ └── solutions/ +│ └── solutions.py # Complete solutions +├── assets/diagrams/ +│ ├── clustering_algorithms.svg # K-Means, Hierarchical, DBSCAN comparison +│ ├── dimensionality_reduction.svg # PCA and t-SNE visual +│ └── anomaly_detection.svg # Normal vs anomalous points +└── datasets/ + ├── customers.csv # Synthetic customer data (300+ rows) + └── sensors.csv # Synthetic sensor data with anomalies (200+ rows) +``` + +## Time Estimate + +| Section | Time | +|---------|------| +| Notebook 01: Introduction (Clustering Basics) | 2.5 hours | +| Notebook 02: Intermediate (Advanced Clustering) | 2.5 hours | +| Notebook 03: Advanced (Dimensionality Reduction & Capstone) | 3 hours | +| Exercises | Included in notebooks | +| **Total** | **8
hours** | + +--- + +**Generated by Berta AI | Created by Luigi Pascal Rondanini** diff --git a/chapters/chapter-08-unsupervised-learning/assets/diagrams/anomaly_detection.svg b/chapters/chapter-08-unsupervised-learning/assets/diagrams/anomaly_detection.svg new file mode 100644 index 0000000..92452f7 --- /dev/null +++ b/chapters/chapter-08-unsupervised-learning/assets/diagrams/anomaly_detection.svg @@ -0,0 +1,90 @@ + + + + + + Statistical (Z-Score) + + + + + + + + -3 sigma + +3 sigma + + + + + + + + ! + ! + Points beyond threshold + Simple, fast, assumes normal + + + + Isolation Forest + + + + + + + + + + anomaly + + + + Split 1 + Split 2 + + + short path + + + long path + Anomalies isolated quickly + Works with any distribution + + + + Applications + + + $ + Fraud Detection + Unusual transactions + + + ! + Manufacturing QA + Defective products + + + + + + Network Intrusion + Unusual traffic patterns + + + + + Health Monitoring + Abnormal sensor readings + + + IoT + Predictive Maintenance + Equipment failure warnings + diff --git a/chapters/chapter-08-unsupervised-learning/assets/diagrams/clustering_algorithms.svg b/chapters/chapter-08-unsupervised-learning/assets/diagrams/clustering_algorithms.svg new file mode 100644 index 0000000..f17f560 --- /dev/null +++ b/chapters/chapter-08-unsupervised-learning/assets/diagrams/clustering_algorithms.svg @@ -0,0 +1,92 @@ + + + + + + K-Means + + + + + + + + + + + + + + + + + + + + + + + + + + + Spherical clusters, fixed K + Assigns to nearest centroid + + + + Hierarchical + + + + + + + + + + + + + + + + cut + + A + B + C + D + E + Dendrogram, cut to get K + Bottom-up merging + + + + DBSCAN + + + + + + + + + + + + + + noise + noise + + + eps + Arbitrary shapes, auto K + Density-based, detects noise + diff --git a/chapters/chapter-08-unsupervised-learning/assets/diagrams/dimensionality_reduction.svg b/chapters/chapter-08-unsupervised-learning/assets/diagrams/dimensionality_reduction.svg new file mode 100644 index 0000000..7f4b92b 
--- /dev/null +++ b/chapters/chapter-08-unsupervised-learning/assets/diagrams/dimensionality_reduction.svg @@ -0,0 +1,81 @@ + + + + + + High-Dimensional Data + + + f1 f2 f3 f4 ... fN + + 2.1 0.3 1.7 4.2 ... 0.9 + 1.5 2.8 0.4 3.1 ... 1.2 + 3.2 1.1 2.9 0.8 ... 2.4 + ... n rows x d features ... + d = 50, 100, 1000+ + + Curse of dimensionality + Hard to visualize + Noisy, redundant features + Slow computation + + + + Reduce + + + + PCA (Linear) + + + PC1 + PC2 + + + + + + + + + + + + + + + + + PC1: 72% + PC2: 18% + Max variance directions + Global structure preserved + + + + t-SNE (Nonlinear) + + + + + + + + + + + + + + + + + Preserves local neighborhoods + Best for visualization + diff --git a/chapters/chapter-08-unsupervised-learning/datasets/customers.csv b/chapters/chapter-08-unsupervised-learning/datasets/customers.csv new file mode 100644 index 0000000..889bba4 --- /dev/null +++ b/chapters/chapter-08-unsupervised-learning/datasets/customers.csv @@ -0,0 +1,301 @@ +age,income,spending_score,visits,online_ratio +34,54766,39,9,0.54 +49,111074,67,9,0.25 +49,95903,27,1,0.21 +51,121394,41,4,0.38 +18,28010,72,16,1.0 +25,42858,84,13,0.73 +21,33927,54,8,0.96 +52,103491,78,10,0.93 +49,69579,40,0,0.39 +55,73860,68,3,0.23 +27,31448,57,15,0.78 +46,106849,69,9,0.37 +43,66166,93,12,0.67 +23,24789,58,15,0.91 +22,25771,64,16,0.59 +21,34690,48,10,0.8 +37,57274,47,4,0.2 +30,67012,30,6,0.51 +35,22561,64,5,0.59 +29,43617,92,13,0.84 +44,70128,46,5,0.28 +30,50813,85,7,0.77 +18,34524,60,10,0.68 +32,71483,70,5,0.63 +31,53350,61,7,0.26 +53,93115,32,0,0.29 +27,40318,72,9,0.79 +32,70778,36,6,0.46 +51,92763,37,5,0.29 +24,20205,56,8,0.9 +62,85856,30,4,0.3 +19,23195,55,11,0.67 +23,25355,76,10,0.87 +69,158343,80,8,0.62 +58,106604,82,4,0.73 +50,77316,90,7,0.78 +21,56478,28,8,0.43 +25,33556,79,15,0.83 +48,94480,37,3,0.6 +46,75340,49,5,0.33 +30,28927,73,7,0.86 +33,57335,73,3,0.49 +20,23720,54,9,0.82 +42,156212,82,10,0.5 +25,29032,75,15,0.8 +51,98058,41,4,0.46 +24,49848,56,11,0.59 
+43,46690,52,4,0.42 +37,38124,53,6,0.41 +28,27017,65,12,0.81 +54,128594,99,2,0.9 +58,45557,48,4,0.31 +35,120819,80,16,0.84 +54,101037,67,7,0.51 +54,72377,36,3,0.45 +42,96894,70,6,0.61 +26,17794,81,12,0.81 +21,40100,79,15,0.73 +23,25585,54,14,0.62 +31,56380,57,6,0.52 +38,53078,50,9,0.52 +29,31476,67,4,0.97 +29,117914,92,7,0.73 +31,45976,50,4,0.62 +29,46492,86,17,0.89 +28,45447,35,1,0.61 +29,38192,79,13,0.72 +58,121733,77,10,0.42 +20,36226,75,9,0.9 +40,124420,72,7,0.51 +50,66955,42,0,0.31 +28,29973,85,8,0.81 +24,28638,73,18,0.84 +45,153511,72,3,0.73 +39,66490,45,4,0.42 +55,89575,43,3,0.34 +55,75070,57,4,0.21 +23,41820,93,9,0.81 +39,49203,48,5,0.42 +30,28450,92,12,1.0 +37,91936,78,9,0.63 +62,64967,29,1,0.35 +52,51777,42,6,0.39 +25,29026,38,11,0.85 +35,29178,31,11,0.44 +40,46370,16,4,0.2 +46,41564,51,3,0.37 +51,77134,63,1,0.07 +33,146182,66,7,0.64 +38,55922,44,8,0.32 +20,28188,54,10,0.77 +44,76056,42,7,0.23 +59,80336,51,5,0.25 +37,88206,44,5,0.15 +45,62287,39,0,0.04 +34,71931,55,7,0.5 +35,53816,43,6,0.38 +42,55226,50,6,0.23 +24,31127,88,14,0.84 +20,15852,95,18,0.88 +25,32585,78,15,0.84 +32,88877,45,1,0.28 +27,47650,50,1,0.6 +23,59338,55,2,0.41 +61,116713,76,6,0.73 +23,25738,75,6,0.87 +25,54601,38,10,0.55 +45,57987,57,4,0.31 +20,29779,76,13,0.74 +26,44178,76,16,0.85 +26,43290,70,9,0.9 +22,26343,78,2,0.76 +38,107425,71,11,0.62 +32,126587,68,9,0.54 +37,55986,37,2,0.25 +37,114139,82,6,0.83 +47,100405,87,6,0.4 +23,39815,66,6,0.83 +48,97095,84,0,0.81 +42,76303,41,6,0.52 +25,118320,84,9,0.55 +48,102106,71,8,0.73 +53,67006,70,6,0.65 +19,32938,63,9,0.8 +24,37308,72,11,0.81 +34,51999,52,10,0.46 +37,132492,81,8,1.0 +44,131800,100,7,0.69 +27,29829,79,9,0.83 +31,64729,46,5,0.6 +34,64521,48,8,0.66 +27,37072,60,15,0.8 +28,35894,73,9,0.88 +35,35050,53,7,0.5 +43,100917,87,8,0.47 +45,68520,50,0,0.54 +18,41908,52,2,0.45 +38,86995,41,3,0.11 +43,41731,34,5,0.73 +51,179096,88,12,0.37 +34,100350,68,4,0.63 +29,19147,46,12,0.7 +25,70723,41,0,0.47 +29,71748,71,5,0.75 +29,17078,66,14,0.67 
+25,39317,66,17,0.61 +37,110354,45,1,0.3 +43,45329,47,7,0.66 +20,31604,83,17,0.82 +22,39189,60,18,0.76 +52,169812,81,5,0.78 +22,30493,54,7,0.84 +21,33430,62,12,0.9 +39,58361,53,6,0.61 +26,110321,100,5,0.55 +34,27063,50,9,0.82 +48,73561,48,2,0.09 +50,88828,41,0,0.31 +20,21422,78,5,0.74 +33,86932,31,3,0.0 +27,49763,48,1,0.44 +44,59552,39,0,0.32 +41,82845,59,2,0.27 +32,50559,65,1,0.31 +18,34953,78,3,0.9 +42,113005,83,4,0.61 +41,61731,48,3,0.17 +31,24175,74,13,0.89 +18,28536,72,8,0.77 +57,77136,45,4,0.27 +50,90796,41,6,0.25 +25,23606,54,13,0.88 +24,53712,41,2,0.43 +28,22373,43,14,0.88 +57,102261,46,2,0.02 +24,42997,57,9,0.75 +58,75304,37,4,0.62 +46,56715,27,4,0.48 +44,132046,65,6,0.76 +19,17494,60,6,0.69 +31,62656,46,7,0.6 +31,29377,65,12,0.84 +45,116863,90,4,0.57 +31,128196,78,8,0.45 +32,32390,77,10,0.83 +46,79008,42,0,0.32 +38,114255,78,12,0.68 +53,76294,34,2,0.27 +37,31248,44,5,0.39 +33,18493,82,6,0.8 +23,37353,69,4,0.78 +49,112161,62,12,0.63 +45,143045,73,9,0.83 +53,99819,50,5,0.1 +35,24636,27,4,0.67 +37,51567,65,8,0.5 +54,52001,44,5,0.32 +50,91727,56,7,0.38 +60,131840,88,11,0.65 +36,91859,53,1,0.29 +53,50448,45,3,0.17 +28,38239,65,0,0.77 +32,27321,66,8,0.83 +41,112622,78,9,0.43 +40,80214,45,4,0.07 +18,33388,94,11,0.92 +18,46500,100,11,0.82 +58,90294,42,0,0.23 +26,30193,74,12,0.79 +24,41297,47,15,0.72 +30,58229,48,11,0.68 +49,56536,55,5,0.05 +48,104258,46,4,0.39 +28,33426,53,10,0.8 +55,66518,38,0,0.65 +20,37885,57,23,0.64 +30,68512,48,3,0.43 +61,93824,35,2,0.23 +44,78086,35,2,0.33 +26,73001,83,11,0.47 +53,58232,38,3,0.37 +52,79818,34,4,0.18 +22,32707,88,7,0.72 +53,71493,30,8,0.28 +24,31347,59,17,0.76 +19,40540,66,13,0.73 +30,128195,82,7,0.61 +39,57678,53,8,0.62 +44,94652,83,7,0.62 +27,86064,58,2,0.45 +59,98536,41,1,0.26 +34,62742,41,9,0.61 +47,90203,41,4,0.23 +30,56074,28,12,0.6 +28,22005,57,12,0.91 +59,84794,41,6,0.22 +22,36724,78,7,0.67 +22,34373,88,16,0.87 +49,85215,39,4,0.22 +18,27065,76,9,1.0 +31,112508,72,4,0.54 +28,118744,72,1,0.4 +59,185519,62,3,0.65 
+24,50337,40,7,0.21 +27,39737,55,10,0.65 +55,144921,67,8,0.42 +34,45725,61,6,0.3 +63,163965,79,1,0.73 +51,129302,78,4,0.74 +40,129728,76,10,0.55 +18,22770,60,18,0.9 +39,65348,46,6,0.56 +52,147411,84,9,0.59 +42,61157,65,4,0.39 +21,26283,62,15,0.68 +51,76226,46,1,0.2 +45,99999,81,8,0.6 +48,101335,24,2,0.26 +49,106083,36,2,0.27 +32,26200,38,17,0.85 +55,81279,44,4,0.13 +46,121450,77,9,0.74 +32,31011,77,16,0.93 +42,67841,27,1,0.24 +31,104832,78,10,0.39 +34,101023,39,5,0.43 +57,89756,38,3,0.3 +52,57453,36,2,0.43 +41,46897,52,7,0.47 +34,33913,63,8,0.77 +56,104115,67,2,0.94 +52,90258,81,7,0.88 +54,104391,71,10,0.43 +37,115386,85,4,0.84 +36,55014,40,5,0.31 +31,45194,59,4,0.44 +45,88017,45,4,0.5 +65,93314,34,2,0.19 +48,75189,52,1,0.36 +51,106928,31,4,0.29 +49,114962,71,8,0.53 +36,43721,68,0,0.51 +42,64953,58,6,0.42 +28,29275,97,10,0.77 +69,92584,32,7,0.4 +20,31523,80,4,0.78 +35,62625,41,4,0.56 +42,40128,58,0,0.48 +18,34104,59,9,0.82 +59,91684,15,4,0.32 +21,35910,67,15,0.91 +27,34922,69,12,0.82 +34,130244,73,14,0.68 +45,57206,38,2,0.49 +18,25712,68,8,0.75 +31,69867,64,3,0.64 +49,90433,21,6,0.44 +20,44705,84,18,0.63 +52,75590,54,0,0.18 +18,27205,63,12,0.77 diff --git a/chapters/chapter-08-unsupervised-learning/datasets/sensors.csv b/chapters/chapter-08-unsupervised-learning/datasets/sensors.csv new file mode 100644 index 0000000..e641ca3 --- /dev/null +++ b/chapters/chapter-08-unsupervised-learning/datasets/sensors.csv @@ -0,0 +1,201 @@ +temp,pressure,vibration,is_anomaly +67.4,29.4,0.343,0 +62.3,32.1,0.492,0 +63.6,27.0,0.549,0 +68.4,28.0,0.52,0 +68.9,31.1,0.316,0 +67.8,27.4,0.665,0 +76.3,32.1,0.408,0 +60.1,28.5,0.674,0 +97.6,52.9,0.756,1 +67.4,24.7,0.533,0 +77.2,31.6,0.611,0 +76.7,33.9,0.591,0 +73.0,32.0,0.512,0 +65.0,31.5,0.356,0 +68.8,32.5,0.546,0 +76.6,34.7,0.547,0 +69.5,26.7,0.461,0 +100.5,40.7,1.425,1 +62.0,36.4,0.474,0 +73.5,28.6,0.406,0 +59.0,29.0,0.5,0 +76.1,27.3,0.433,0 +67.8,29.0,0.459,0 +65.4,33.1,0.373,0 +72.2,29.9,0.4,0 +62.9,29.2,0.514,0 +66.2,31.5,0.323,0 
+70.9,30.8,0.437,0 +62.8,23.9,0.437,0 +73.1,31.5,0.407,0 +77.0,30.2,0.45,0 +73.4,29.5,0.251,0 +77.3,49.5,0.949,1 +64.8,27.5,0.555,0 +69.4,30.4,0.552,0 +75.0,28.6,0.52,0 +70.4,34.2,0.397,0 +73.5,35.1,0.6,0 +78.9,28.9,0.62,0 +93.7,51.2,1.037,1 +72.5,31.0,0.544,0 +64.3,27.3,0.654,0 +70.4,30.8,0.384,0 +67.1,30.9,0.53,0 +67.4,33.8,0.448,0 +71.5,32.7,0.496,0 +70.6,28.9,0.39,0 +59.6,27.9,0.304,0 +64.8,31.4,0.643,0 +72.2,22.9,0.534,0 +101.6,52.0,1.254,1 +80.7,27.1,0.605,0 +65.9,29.3,0.375,0 +67.2,28.1,0.568,0 +77.1,24.3,0.262,0 +74.6,26.5,0.538,0 +65.4,29.0,0.619,0 +73.6,31.1,0.578,0 +72.6,34.8,0.288,0 +80.3,29.1,0.329,0 +70.6,34.3,0.43,0 +64.6,28.1,0.424,0 +67.5,33.2,0.502,0 +70.4,27.9,0.392,0 +72.5,31.5,0.33,0 +63.2,32.6,0.43,0 +66.8,26.1,0.45,0 +73.2,34.5,0.519,0 +67.0,26.4,0.402,0 +70.3,34.4,0.439,0 +74.0,33.5,0.262,0 +69.6,34.3,0.714,0 +92.9,53.4,1.367,1 +68.8,30.1,0.526,0 +95.1,40.5,1.75,1 +61.5,34.1,0.421,0 +68.9,26.6,0.371,0 +66.4,27.6,0.471,0 +102.8,40.4,0.751,1 +74.8,23.2,0.441,0 +69.1,24.6,0.647,0 +100.5,38.5,1.069,1 +102.0,52.7,0.611,1 +63.3,29.2,0.737,0 +95.5,33.2,0.94,1 +75.6,31.2,0.515,0 +83.2,25.6,0.377,0 +60.3,29.6,0.589,0 +104.6,40.3,1.798,1 +69.9,34.4,0.405,0 +70.0,26.0,0.423,0 +76.9,26.0,0.609,0 +64.0,24.5,0.445,0 +73.7,33.8,0.528,0 +76.4,27.8,0.432,0 +68.8,25.3,0.419,0 +66.9,32.4,0.68,0 +58.5,27.5,0.478,0 +74.5,30.8,0.557,0 +75.5,25.8,0.551,0 +70.3,33.3,0.533,0 +70.2,33.4,0.475,0 +73.8,33.9,0.482,0 +62.8,34.0,0.461,0 +76.4,29.9,0.503,0 +75.5,26.9,0.594,0 +64.0,27.9,0.46,0 +69.4,29.8,0.485,0 +68.9,32.1,0.583,0 +71.7,29.1,0.582,0 +69.4,34.4,0.351,0 +75.1,31.8,0.433,0 +73.4,31.8,0.682,0 +62.8,24.6,0.516,0 +72.5,30.4,0.425,0 +62.5,26.9,0.586,0 +72.7,27.6,0.419,0 +69.1,33.0,0.369,0 +68.1,30.4,0.577,0 +70.4,27.7,0.537,0 +74.4,30.8,0.706,0 +69.3,28.9,0.278,0 +81.6,33.0,0.503,0 +65.9,31.2,0.396,0 +71.8,29.3,0.674,0 +91.6,44.4,1.021,1 +67.9,31.7,0.678,0 +57.6,31.0,0.438,0 +101.6,43.6,1.199,1 +70.1,34.0,0.57,0 +74.8,27.3,0.586,0 +60.3,32.1,0.428,0 
+75.6,29.0,0.229,0 +86.2,43.1,1.148,1 +72.5,25.4,0.667,0 +66.7,28.6,0.537,0 +85.9,48.5,1.196,1 +63.2,34.6,0.643,0 +74.6,29.5,0.415,0 +74.6,30.4,0.542,0 +73.9,29.6,0.519,0 +74.2,30.7,0.449,0 +69.2,25.8,0.537,0 +76.7,30.3,0.541,0 +69.5,28.9,0.445,0 +63.8,32.9,0.537,0 +74.8,28.6,0.646,0 +76.2,33.7,0.468,0 +69.2,32.4,0.36,0 +72.2,29.4,0.497,0 +67.0,32.4,0.435,0 +94.3,47.2,1.333,1 +67.8,30.4,0.613,0 +64.2,30.1,0.646,0 +70.3,30.4,0.496,0 +77.3,29.1,0.466,0 +65.5,30.5,0.306,0 +86.2,44.0,0.773,1 +70.7,21.9,0.387,0 +70.6,28.2,0.535,0 +69.1,30.3,0.524,0 +68.1,31.3,0.572,0 +69.8,31.0,0.418,0 +68.6,29.1,0.547,0 +68.2,29.2,0.53,0 +65.3,27.5,0.506,0 +76.5,25.9,0.495,0 +74.3,25.8,0.479,0 +68.3,29.3,0.489,0 +70.0,26.4,0.527,0 +67.4,33.5,0.41,0 +74.9,37.5,0.408,0 +91.4,37.8,0.975,1 +75.6,22.0,0.609,0 +69.1,33.5,0.412,0 +69.5,26.5,0.503,0 +73.7,32.9,0.546,0 +65.9,30.7,0.48,0 +70.4,27.0,0.461,0 +77.0,26.3,0.324,0 +70.7,27.3,0.295,0 +68.2,29.2,0.343,0 +67.8,27.7,0.476,0 +83.9,31.7,0.607,0 +71.5,26.0,0.72,0 +74.8,28.0,0.633,0 +68.3,31.6,0.477,0 +70.9,26.2,0.68,0 +85.7,42.0,0.568,1 +71.8,29.7,0.581,0 +69.1,29.2,0.561,0 +71.5,29.2,0.528,0 +68.7,30.0,0.668,0 +68.0,25.5,0.424,0 +69.7,31.2,0.539,0 +67.0,28.3,0.494,0 +68.2,22.1,0.596,0 +74.8,29.8,0.435,0 +74.5,32.1,0.794,0 +62.9,29.4,0.552,0 diff --git a/chapters/chapter-08-unsupervised-learning/exercises/exercises.py b/chapters/chapter-08-unsupervised-learning/exercises/exercises.py new file mode 100644 index 0000000..07d8f6c --- /dev/null +++ b/chapters/chapter-08-unsupervised-learning/exercises/exercises.py @@ -0,0 +1,154 @@ +""" +Chapter 8 Exercises: Unsupervised Learning + +Generated by Berta AI | Created by Luigi Pascal Rondanini +""" + +import numpy as np + + +# ============================================================================= +# Exercise 1: Implement K-Means Clustering From Scratch +# ============================================================================= +# Build a KMeans class that: +# - Initializes K centroids 
randomly from the data points +# - Assigns each point to the nearest centroid (Euclidean distance) +# - Recomputes centroids as the mean of assigned points +# - Repeats for max_iters or until convergence (centroids stop moving) +# +# Methods: +# - fit(X): Run the K-Means algorithm +# - predict(X): Assign each row to its nearest centroid +# - fit_predict(X): fit then predict +# +# Attributes after fit: +# - centroids: (K, n_features) array +# - inertia: within-cluster sum of squared distances +# +# Hint: np.linalg.norm(X[:, None] - centroids, axis=2) gives all pairwise distances + +class KMeansClustering: + def __init__(self, n_clusters=3, max_iters=100, random_state=42): + # YOUR CODE HERE + pass + + def fit(self, X): + # YOUR CODE HERE + pass + + def predict(self, X): + # YOUR CODE HERE + pass + + def fit_predict(self, X): + # YOUR CODE HERE + pass + + +# ============================================================================= +# Exercise 2: Implement PCA From Scratch +# ============================================================================= +# Build a PCA class that: +# - Centers the data (subtract mean) +# - Computes the covariance matrix +# - Finds eigenvectors/eigenvalues via np.linalg.eigh +# - Sorts components by descending eigenvalue +# - Projects data onto the top n_components eigenvectors +# +# Methods: +# - fit(X): Compute components +# - transform(X): Project X onto components +# - fit_transform(X): fit then transform +# +# Attributes after fit: +# - components_: (n_components, n_features) array +# - explained_variance_ratio_: fraction of variance per component +# +# Hint: covariance = X_centered.T @ X_centered / (n - 1) + +class PCAFromScratch: + def __init__(self, n_components=2): + # YOUR CODE HERE + pass + + def fit(self, X): + # YOUR CODE HERE + pass + + def transform(self, X): + # YOUR CODE HERE + pass + + def fit_transform(self, X): + # YOUR CODE HERE + pass + + +# 
============================================================================= +# Exercise 3: Implement Silhouette Score From Scratch +# ============================================================================= +# Compute the silhouette score for a clustering result: +# For each point i: +# a(i) = mean distance to all other points in the same cluster +# b(i) = min over other clusters of mean distance to that cluster's points +# s(i) = (b(i) - a(i)) / max(a(i), b(i)) +# Return the mean of s(i) over all points. +# +# Parameters: +# X: (n_samples, n_features) array +# labels: (n_samples,) array of cluster assignments +# +# Return: float in [-1, 1], higher is better +# +# Hint: Use pairwise Euclidean distances. Handle single-point clusters (s=0). + +def silhouette_score_scratch(X, labels): + # YOUR CODE HERE + pass + + +# ============================================================================= +# Exercise 4: Anomaly Detection with Z-Score +# ============================================================================= +# Implement a simple anomaly detector that: +# 1. Computes the Z-score for each feature: z = (x - mean) / std +# 2. Flags a point as anomalous if any feature has |z| > threshold +# +# Parameters: +# X: (n_samples, n_features) array +# threshold: float (default 3.0) +# +# Return: (n_samples,) boolean array, True = anomaly +# +# Hint: np.any(np.abs(z_scores) > threshold, axis=1) + +def detect_anomalies_zscore(X, threshold=3.0): + # YOUR CODE HERE + pass + + +# ============================================================================= +# Exercise 5: End-to-End Customer Segmentation Pipeline +# ============================================================================= +# Build a pipeline that: +# 1. Loads customer data from datasets/customers.csv +# 2. Scales features with StandardScaler +# 3. Applies PCA (keep 95% variance) +# 4. Uses elbow method to find optimal K (test K=2..8) +# 5. Runs K-Means with optimal K +# 6. 
Returns segment profiles (mean of original features per cluster) +# +# Return dict: { +# "n_clusters": int, +# "labels": array, +# "profiles": DataFrame (one row per cluster, columns = original features), +# "inertias": list (for each K tested), +# "silhouette": float +# } +# +# Hint: The "elbow" can be found by looking for the K where the second +# derivative of inertia changes most (or just pick K=4 if uncertain). + +def customer_segmentation_pipeline(csv_path="datasets/customers.csv"): + # YOUR CODE HERE + pass diff --git a/chapters/chapter-08-unsupervised-learning/exercises/solutions/solutions.py b/chapters/chapter-08-unsupervised-learning/exercises/solutions/solutions.py new file mode 100644 index 0000000..1ffad8c --- /dev/null +++ b/chapters/chapter-08-unsupervised-learning/exercises/solutions/solutions.py @@ -0,0 +1,265 @@ +""" +Chapter 8 Solutions: Unsupervised Learning + +Generated by Berta AI | Created by Luigi Pascal Rondanini +""" + +import numpy as np +from pathlib import Path + + +# ============================================================================= +# Exercise 1: K-Means Clustering From Scratch +# ============================================================================= + +class KMeansClustering: + def __init__(self, n_clusters=3, max_iters=100, random_state=42): + self.n_clusters = n_clusters + self.max_iters = max_iters + self.random_state = random_state + self.centroids = None + self.inertia = None + + def fit(self, X): + X = np.asarray(X, dtype=float) + rng = np.random.RandomState(self.random_state) + idx = rng.choice(len(X), size=self.n_clusters, replace=False) + self.centroids = X[idx].copy() + + for _ in range(self.max_iters): + distances = np.linalg.norm(X[:, None] - self.centroids, axis=2) + labels = np.argmin(distances, axis=1) + + new_centroids = np.array([ + X[labels == k].mean(axis=0) if np.any(labels == k) else self.centroids[k] + for k in range(self.n_clusters) + ]) + + if np.allclose(new_centroids, self.centroids): + 
break + self.centroids = new_centroids + + distances = np.linalg.norm(X[:, None] - self.centroids, axis=2) + labels = np.argmin(distances, axis=1) + self.inertia = sum( + np.sum((X[labels == k] - self.centroids[k]) ** 2) + for k in range(self.n_clusters) + ) + self._labels = labels + return self + + def predict(self, X): + X = np.asarray(X, dtype=float) + distances = np.linalg.norm(X[:, None] - self.centroids, axis=2) + return np.argmin(distances, axis=1) + + def fit_predict(self, X): + self.fit(X) + return self._labels + + +# ============================================================================= +# Exercise 2: PCA From Scratch +# ============================================================================= + +class PCAFromScratch: + def __init__(self, n_components=2): + self.n_components = n_components + self.components_ = None + self.explained_variance_ratio_ = None + self._mean = None + + def fit(self, X): + X = np.asarray(X, dtype=float) + self._mean = X.mean(axis=0) + X_centered = X - self._mean + n = X.shape[0] + cov = X_centered.T @ X_centered / (n - 1) + + eigenvalues, eigenvectors = np.linalg.eigh(cov) + idx = np.argsort(eigenvalues)[::-1] + eigenvalues = eigenvalues[idx] + eigenvectors = eigenvectors[:, idx] + + self.components_ = eigenvectors[:, :self.n_components].T + total_var = eigenvalues.sum() + self.explained_variance_ratio_ = eigenvalues[:self.n_components] / total_var + return self + + def transform(self, X): + X = np.asarray(X, dtype=float) + X_centered = X - self._mean + return X_centered @ self.components_.T + + def fit_transform(self, X): + self.fit(X) + return self.transform(X) + + +# ============================================================================= +# Exercise 3: Silhouette Score From Scratch +# ============================================================================= + +def silhouette_score_scratch(X, labels): + X = np.asarray(X, dtype=float) + labels = np.asarray(labels) + n = len(X) + unique_labels = 
np.unique(labels) + + if len(unique_labels) < 2: + return 0.0 + + scores = np.zeros(n) + for i in range(n): + same_mask = labels == labels[i] + same_mask[i] = False + same_cluster = X[same_mask] + + if len(same_cluster) == 0: + scores[i] = 0.0 + continue + + a_i = np.mean(np.linalg.norm(same_cluster - X[i], axis=1)) + + b_i = np.inf + for k in unique_labels: + if k == labels[i]: + continue + other_cluster = X[labels == k] + mean_dist = np.mean(np.linalg.norm(other_cluster - X[i], axis=1)) + b_i = min(b_i, mean_dist) + + denom = max(a_i, b_i) + scores[i] = (b_i - a_i) / denom if denom > 0 else 0.0 + + return float(np.mean(scores)) + + +# ============================================================================= +# Exercise 4: Anomaly Detection with Z-Score +# ============================================================================= + +def detect_anomalies_zscore(X, threshold=3.0): + X = np.asarray(X, dtype=float) + mean = X.mean(axis=0) + std = X.std(axis=0) + std[std == 0] = 1.0 + z_scores = (X - mean) / std + return np.any(np.abs(z_scores) > threshold, axis=1) + + +# ============================================================================= +# Exercise 5: Customer Segmentation Pipeline +# ============================================================================= + +def customer_segmentation_pipeline(csv_path="datasets/customers.csv"): + try: + import pandas as pd + from sklearn.preprocessing import StandardScaler + from sklearn.decomposition import PCA + from sklearn.cluster import KMeans + from sklearn.metrics import silhouette_score + except ImportError: + return {"n_clusters": 0, "labels": None, "profiles": None, + "inertias": [], "silhouette": 0.0} + + base = Path(__file__).parent.parent.parent + path = base / csv_path + if not path.exists(): + return {"n_clusters": 0, "labels": None, "profiles": None, + "inertias": [], "silhouette": 0.0} + + df = pd.read_csv(path) + feature_cols = [c for c in df.columns if c not in ("customer_id", "segment")] + 
X_raw = df[feature_cols].values
+
+    scaler = StandardScaler()
+    X_scaled = scaler.fit_transform(X_raw)
+
+    pca = PCA(n_components=0.95)
+    X_pca = pca.fit_transform(X_scaled)
+
+    K_range = range(2, 9)
+    inertias = []
+    for k in K_range:
+        km = KMeans(n_clusters=k, n_init=10, random_state=42)
+        km.fit(X_pca)
+        inertias.append(km.inertia_)
+
+    # Elbow heuristic: diffs2[i] is the second difference centred on
+    # inertias[i + 1], which corresponds to k = i + 3 since K_range starts at 2
+    diffs = np.diff(inertias)
+    diffs2 = np.diff(diffs)
+    best_k = int(np.argmax(np.abs(diffs2)) + 3)
+    best_k = max(2, min(best_k, 8))
+
+    km_final = KMeans(n_clusters=best_k, n_init=10, random_state=42)
+    labels = km_final.fit_predict(X_pca)
+    sil = silhouette_score(X_pca, labels)
+
+    df["cluster"] = labels
+    profiles = df.groupby("cluster")[feature_cols].mean()
+
+    return {
+        "n_clusters": best_k,
+        "labels": labels,
+        "profiles": profiles,
+        "inertias": list(inertias),
+        "silhouette": float(sil),
+    }
+
+
+if __name__ == "__main__":
+    print("Chapter 8 Solutions - Verification\n")
+
+    np.random.seed(42)
+
+    # Ex 1
+    print("Exercise 1: K-Means Clustering")
+    from sklearn.datasets import make_blobs
+    X, y_true = make_blobs(n_samples=200, centers=3, random_state=42)
+    km = KMeansClustering(n_clusters=3, random_state=42)
+    labels = km.fit_predict(X)
+    assert km.centroids.shape == (3, 2)
+    assert len(labels) == 200
+    assert km.inertia > 0
+    print(f" Inertia = {km.inertia:.2f}")
+    print(f" Centroids shape: {km.centroids.shape}")
+
+    # Ex 2
+    print("\nExercise 2: PCA From Scratch")
+    X_4d = np.random.randn(100, 4)
+    pca = PCAFromScratch(n_components=2)
+    X_2d = pca.fit_transform(X_4d)
+    assert X_2d.shape == (100, 2)
+    assert len(pca.explained_variance_ratio_) == 2
+    # Each ratio is a fraction of the total variance, so their sum lies in (0, 1]
+    assert 0.0 < sum(pca.explained_variance_ratio_) <= 1.0 + 1e-10
+    print(f" Variance explained: {pca.explained_variance_ratio_}")
+    print(f" Projected shape: {X_2d.shape}")
+
+    # Ex 3
+    print("\nExercise 3: Silhouette Score")
+    sil = silhouette_score_scratch(X, y_true)
+    assert -1 <= sil <= 1
+    print(f" Silhouette score = {sil:.4f}")
+
+    # Ex 4
+    
print("\nExercise 4: Anomaly Detection (Z-Score)")
+    X_normal = np.random.randn(100, 3)
+    X_anomalies = np.array([[10, 10, 10], [-8, -8, -8]])
+    X_combined = np.vstack([X_normal, X_anomalies])
+    flags = detect_anomalies_zscore(X_combined, threshold=3.0)
+    assert flags[-1]
+    assert flags[-2]
+    n_detected = flags.sum()
+    print(f" Detected {n_detected} anomalies out of {len(X_combined)} points")
+
+    # Ex 5
+    print("\nExercise 5: Customer Segmentation Pipeline")
+    result = customer_segmentation_pipeline()
+    if result["labels"] is not None:
+        print(f" Optimal K: {result['n_clusters']}")
+        print(f" Silhouette: {result['silhouette']:.4f}")
+        print(f" Segment profiles:\n{result['profiles']}")
+    else:
+        print(" (Dataset not found - run from the chapter root)")
+
+    print("\nAll verifications passed.")
diff --git a/chapters/chapter-08-unsupervised-learning/notebooks/01_introduction.ipynb b/chapters/chapter-08-unsupervised-learning/notebooks/01_introduction.ipynb
new file mode 100644
index 0000000..5bb2233
--- /dev/null
+++ b/chapters/chapter-08-unsupervised-learning/notebooks/01_introduction.ipynb
@@ -0,0 +1,580 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Chapter 8: Unsupervised Learning\n",
+    "## Notebook 01 - Introduction: Clustering Basics\n",
+    "\n",
+    "Unsupervised learning finds hidden patterns in data without labels. We start with the most fundamental algorithm: K-Means clustering.\n",
+    "\n",
+    "**What you'll learn:**\n",
+    "- The difference between supervised and unsupervised learning\n",
+    "- K-Means clustering from scratch using NumPy\n",
+    "- Evaluating clusters with inertia and silhouette score\n",
+    "- The elbow method for choosing K\n",
+    "- Scikit-learn's KMeans interface\n",
+    "\n",
+    "**Time estimate:** 2.5 hours\n",
+    "\n",
+    "---\n",
+    "*Generated by Berta AI | Created by Luigi Pascal Rondanini*"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "---\n",
+    "## 1. 
Supervised vs Unsupervised Learning\n", + "\n", + "In **supervised learning**, every training example comes with a label β€” the \"right answer\" β€” and the model learns a mapping from inputs to outputs. Classification and regression are the classic examples.\n", + "\n", + "In **unsupervised learning**, there are **no labels at all**. The algorithm must discover structure in the data on its own. Common tasks include:\n", + "\n", + "| Task | Goal | Example algorithms |\n", + "|------|------|--------------------|\n", + "| **Clustering** | Group similar points together | K-Means, DBSCAN, Hierarchical |\n", + "| **Dimensionality reduction** | Compress features while preserving structure | PCA, t-SNE, UMAP |\n", + "| **Anomaly detection** | Find unusual observations | Isolation Forest, LOF |\n", + "\n", + "This notebook focuses on **clustering** β€” specifically the **K-Means** algorithm, the most widely-used clustering method.\n", + "\n", + "Let's start by generating some data and seeing what it looks like *without* labels." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.datasets import make_blobs\n", + "\n", + "np.random.seed(42)\n", + "\n", + "X, y_true = make_blobs(\n", + " n_samples=200, centers=3, cluster_std=0.9, random_state=42\n", + ")\n", + "\n", + "fig, axes = plt.subplots(1, 2, figsize=(13, 5))\n", + "\n", + "axes[0].scatter(X[:, 0], X[:, 1], c=\"steelblue\", edgecolors=\"k\", s=50, alpha=0.7)\n", + "axes[0].set_title(\"What we observe (no labels)\", fontsize=14)\n", + "axes[0].set_xlabel(\"Feature 1\")\n", + "axes[0].set_ylabel(\"Feature 2\")\n", + "\n", + "colors = [\"#e74c3c\", \"#2ecc71\", \"#3498db\"]\n", + "for k in range(3):\n", + " mask = y_true == k\n", + " axes[1].scatter(X[mask, 0], X[mask, 1], c=colors[k],\n", + " edgecolors=\"k\", s=50, alpha=0.7, label=f\"Cluster {k}\")\n", + "axes[1].set_title(\"True clusters (hidden from algorithm)\", fontsize=14)\n", + "axes[1].set_xlabel(\"Feature 1\")\n", + "axes[1].set_ylabel(\"Feature 2\")\n", + "axes[1].legend()\n", + "\n", + "plt.tight_layout()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The left panel is what an unsupervised algorithm receives β€” raw coordinates with no color-coding. The right panel reveals the ground truth we want the algorithm to *recover* on its own.\n", + "\n", + "---\n", + "## 2. K-Means Algorithm β€” Theory\n", + "\n", + "K-Means is an iterative algorithm that partitions *n* data points into *K* clusters. It works in three repeating steps:\n", + "\n", + "### Step 1 β€” Initialize\n", + "Pick *K* points as initial **centroids** (cluster centers). 
The simplest approach is to choose *K* data points at random.\n", + "\n", + "### Step 2 β€” Assign\n", + "For every data point, compute the Euclidean distance to each centroid and assign the point to the **nearest** centroid:\n", + "\n", + "$$c_i = \\arg\\min_{k} \\| x_i - \\mu_k \\|^2$$\n", + "\n", + "### Step 3 β€” Update\n", + "Recompute each centroid as the **mean** of all points currently assigned to that cluster:\n", + "\n", + "$$\\mu_k = \\frac{1}{|C_k|} \\sum_{x_i \\in C_k} x_i$$\n", + "\n", + "### Repeat\n", + "Alternate between Steps 2 and 3 until the assignments no longer change (or a maximum number of iterations is reached).\n", + "\n", + "### Important caveats\n", + "- **Random initialization sensitivity:** Different starting centroids can lead to different final clusters. Running the algorithm multiple times with different seeds and keeping the best result is standard practice.\n", + "- **K must be chosen in advance.** We'll learn the *elbow method* later in this notebook.\n", + "- The algorithm minimises **inertia** (within-cluster sum of squares) β€” it always converges, but to a *local* minimum, not necessarily the global one." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "## 3. K-Means From Scratch\n", + "\n", + "Let's implement K-Means using only NumPy so we truly understand every step." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "class KMeansScratch:\n", + " \"\"\"Minimal K-Means implementation using NumPy.\"\"\"\n", + "\n", + " def __init__(self, k=3, max_iters=100, random_state=42):\n", + " self.k = k\n", + " self.max_iters = max_iters\n", + " self.random_state = random_state\n", + " self.centroids = None\n", + " self.labels_ = None\n", + " self.inertia_ = None\n", + " self.inertia_history = []\n", + " self.centroid_history = []\n", + " self.label_history = []\n", + "\n", + " def _euclidean_distances(self, X, centroids):\n", + " \"\"\"Compute distance from every point to every centroid.\"\"\"\n", + " # X: (n, d), centroids: (k, d) -> result: (n, k)\n", + " return np.sqrt(((X[:, np.newaxis] - centroids[np.newaxis]) ** 2).sum(axis=2))\n", + "\n", + " def _compute_inertia(self, X, labels, centroids):\n", + " return sum(\n", + " np.sum((X[labels == k] - centroids[k]) ** 2)\n", + " for k in range(self.k)\n", + " )\n", + "\n", + " def fit(self, X):\n", + " rng = np.random.RandomState(self.random_state)\n", + " n_samples = X.shape[0]\n", + "\n", + " # Step 1: random initialization\n", + " idx = rng.choice(n_samples, self.k, replace=False)\n", + " self.centroids = X[idx].copy()\n", + "\n", + " self.inertia_history = []\n", + " self.centroid_history = [self.centroids.copy()]\n", + " self.label_history = []\n", + "\n", + " for _ in range(self.max_iters):\n", + " # Step 2: assign\n", + " distances = self._euclidean_distances(X, self.centroids)\n", + " labels = np.argmin(distances, axis=1)\n", + " self.label_history.append(labels.copy())\n", + "\n", + " # Step 3: update centroids\n", + " new_centroids = np.array([\n", + " X[labels == k].mean(axis=0) if np.any(labels == k)\n", + " else self.centroids[k]\n", + " for k in range(self.k)\n", + " ])\n", + "\n", + " inertia = self._compute_inertia(X, labels, new_centroids)\n", + " self.inertia_history.append(inertia)\n", + " 
self.centroid_history.append(new_centroids.copy())\n", + "\n", + " if np.allclose(new_centroids, self.centroids):\n", + " break\n", + " self.centroids = new_centroids\n", + "\n", + " self.labels_ = labels\n", + " self.inertia_ = self.inertia_history[-1]\n", + " return self\n", + "\n", + " def predict(self, X):\n", + " distances = self._euclidean_distances(X, self.centroids)\n", + " return np.argmin(distances, axis=1)\n", + "\n", + "\n", + "km_scratch = KMeansScratch(k=3, random_state=42)\n", + "km_scratch.fit(X)\n", + "\n", + "print(f\"Converged in {len(km_scratch.inertia_history)} iterations\")\n", + "print(f\"Final inertia: {km_scratch.inertia_:.2f}\")\n", + "print(f\"Centroids:\\n{km_scratch.centroids}\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "fig, axes = plt.subplots(1, 2, figsize=(13, 5))\n", + "\n", + "colors_map = np.array([\"#e74c3c\", \"#2ecc71\", \"#3498db\"])\n", + "\n", + "for k in range(3):\n", + " mask = y_true == k\n", + " axes[0].scatter(X[mask, 0], X[mask, 1], c=colors[k],\n", + " edgecolors=\"k\", s=50, alpha=0.7, label=f\"True {k}\")\n", + "axes[0].set_title(\"Ground Truth\", fontsize=14)\n", + "axes[0].legend()\n", + "axes[0].set_xlabel(\"Feature 1\")\n", + "axes[0].set_ylabel(\"Feature 2\")\n", + "\n", + "axes[1].scatter(X[:, 0], X[:, 1], c=colors_map[km_scratch.labels_],\n", + " edgecolors=\"k\", s=50, alpha=0.7)\n", + "axes[1].scatter(km_scratch.centroids[:, 0], km_scratch.centroids[:, 1],\n", + " c=colors, marker=\"X\", s=250, edgecolors=\"k\", linewidths=1.5,\n", + " zorder=5, label=\"Centroids\")\n", + "axes[1].set_title(\"K-Means (scratch) result\", fontsize=14)\n", + "axes[1].legend()\n", + "axes[1].set_xlabel(\"Feature 1\")\n", + "axes[1].set_ylabel(\"Feature 2\")\n", + "\n", + "plt.tight_layout()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "## 4. 
Step-by-Step K-Means Visualization\n", + "\n", + "To build intuition for how the algorithm converges, let's watch the first four iterations unfold. Each subplot shows the cluster assignments and centroid positions at a particular iteration." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "fig, axes = plt.subplots(2, 2, figsize=(12, 10))\n", + "axes = axes.ravel()\n", + "\n", + "colors_map = np.array([\"#e74c3c\", \"#2ecc71\", \"#3498db\"])\n", + "\n", + "n_show = min(4, len(km_scratch.label_history))\n", + "\n", + "for i in range(n_show):\n", + " ax = axes[i]\n", + " labels_i = km_scratch.label_history[i]\n", + " centroids_i = km_scratch.centroid_history[i] # centroids *before* this assignment\n", + " centroids_next = km_scratch.centroid_history[i + 1] # centroids *after* update\n", + "\n", + " ax.scatter(X[:, 0], X[:, 1], c=colors_map[labels_i],\n", + " edgecolors=\"k\", s=40, alpha=0.6)\n", + "\n", + " # Old centroids (hollow)\n", + " ax.scatter(centroids_i[:, 0], centroids_i[:, 1],\n", + " facecolors=\"none\", edgecolors=\"k\", marker=\"o\",\n", + " s=200, linewidths=2, label=\"Old centroid\")\n", + "\n", + " # New centroids (filled star)\n", + " ax.scatter(centroids_next[:, 0], centroids_next[:, 1],\n", + " c=colors, marker=\"X\", s=250, edgecolors=\"k\",\n", + " linewidths=1.5, zorder=5, label=\"New centroid\")\n", + "\n", + " # Arrows showing centroid movement\n", + " for k in range(3):\n", + " ax.annotate(\"\",\n", + " xy=centroids_next[k], xytext=centroids_i[k],\n", + " arrowprops=dict(arrowstyle=\"->\", lw=1.5, color=\"black\"))\n", + "\n", + " ax.set_title(f\"Iteration {i + 1} | inertia = {km_scratch.inertia_history[i]:.1f}\",\n", + " fontsize=12)\n", + " if i == 0:\n", + " ax.legend(fontsize=9, loc=\"upper left\")\n", + "\n", + "for j in range(n_show, 4):\n", + " axes[j].axis(\"off\")\n", + "\n", + "plt.suptitle(\"K-Means β€” Iteration-by-Iteration\", fontsize=15, y=1.01)\n", + 
"plt.tight_layout()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Notice how the centroids (stars) migrate toward the cluster centers with each iteration while the assignments stabilize.\n", + "\n", + "---\n", + "## 5. Evaluating Clusters\n", + "\n", + "How do we know if K-Means did a good job? Two common metrics:\n", + "\n", + "### Inertia (Within-Cluster Sum of Squares β€” WCSS)\n", + "$$\\text{Inertia} = \\sum_{k=1}^{K} \\sum_{x_i \\in C_k} \\| x_i - \\mu_k \\|^2$$\n", + "\n", + "Lower is better, but inertia **always decreases** as K increases (at K = n every point is its own cluster with inertia = 0). So inertia alone doesn't tell us the *right* K.\n", + "\n", + "### Silhouette Score\n", + "For each point *i*:\n", + "- **a(i)** = mean distance to other points in the *same* cluster\n", + "- **b(i)** = mean distance to points in the *nearest different* cluster\n", + "\n", + "$$s(i) = \\frac{b(i) - a(i)}{\\max(a(i),\\, b(i))}$$\n", + "\n", + "Values range from βˆ’1 to +1. Higher is better; values near 0 indicate overlapping clusters." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.metrics import silhouette_score, silhouette_samples\n", + "\n", + "sil_avg = silhouette_score(X, km_scratch.labels_)\n", + "sil_vals = silhouette_samples(X, km_scratch.labels_)\n", + "\n", + "print(f\"Inertia: {km_scratch.inertia_:.2f}\")\n", + "print(f\"Silhouette (mean): {sil_avg:.4f}\")\n", + "print(f\"Silhouette (min): {sil_vals.min():.4f}\")\n", + "print(f\"Silhouette (max): {sil_vals.max():.4f}\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "fig, ax = plt.subplots(figsize=(8, 5))\n", + "\n", + "y_lower = 10\n", + "colors_sil = [\"#e74c3c\", \"#2ecc71\", \"#3498db\"]\n", + "\n", + "for k in range(3):\n", + " cluster_sil = np.sort(sil_vals[km_scratch.labels_ == k])\n", + " cluster_size = cluster_sil.shape[0]\n", + " y_upper = y_lower + cluster_size\n", + "\n", + " ax.fill_betweenx(np.arange(y_lower, y_upper), 0, cluster_sil,\n", + " facecolor=colors_sil[k], edgecolor=colors_sil[k], alpha=0.7)\n", + " ax.text(-0.05, y_lower + 0.5 * cluster_size, f\"Cluster {k}\", fontsize=11,\n", + " fontweight=\"bold\", va=\"center\")\n", + " y_lower = y_upper + 10\n", + "\n", + "ax.axvline(x=sil_avg, color=\"k\", linestyle=\"--\", linewidth=1.5,\n", + " label=f\"Mean silhouette = {sil_avg:.3f}\")\n", + "ax.set_xlabel(\"Silhouette coefficient\", fontsize=12)\n", + "ax.set_ylabel(\"Points (sorted within cluster)\", fontsize=12)\n", + "ax.set_title(\"Silhouette Plot β€” K-Means (K=3)\", fontsize=14)\n", + "ax.legend(fontsize=11)\n", + "ax.set_yticks([])\n", + "plt.tight_layout()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A healthy silhouette plot shows clusters of roughly similar width that extend well past the mean line. Thin slivers or clusters that barely cross zero suggest poor separation.\n", + "\n", + "---\n", + "## 6. 
The Elbow Method for Choosing K\n", + "\n", + "Since we must specify *K* before running K-Means, how do we pick a good value?\n", + "\n", + "**The Elbow Method:**\n", + "1. Run K-Means for K = 1, 2, …, K_max.\n", + "2. Plot inertia vs K.\n", + "3. Look for the **\"elbow\"** β€” the point where inertia stops decreasing sharply and begins to level off.\n", + "\n", + "The elbow suggests a natural number of clusters in the data." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "K_range = range(1, 11)\n", + "inertias = []\n", + "silhouettes = []\n", + "\n", + "for k in K_range:\n", + " km = KMeansScratch(k=k, random_state=42)\n", + " km.fit(X)\n", + " inertias.append(km.inertia_)\n", + " if k >= 2:\n", + " silhouettes.append(silhouette_score(X, km.labels_))\n", + " else:\n", + " silhouettes.append(np.nan)\n", + "\n", + "fig, axes = plt.subplots(1, 2, figsize=(14, 5))\n", + "\n", + "axes[0].plot(K_range, inertias, \"o-\", color=\"#2c3e50\", linewidth=2, markersize=8)\n", + "axes[0].set_xlabel(\"Number of clusters (K)\", fontsize=12)\n", + "axes[0].set_ylabel(\"Inertia\", fontsize=12)\n", + "axes[0].set_title(\"Elbow Method\", fontsize=14)\n", + "axes[0].axvline(x=3, color=\"#e74c3c\", linestyle=\"--\", alpha=0.7, label=\"K = 3 (elbow)\")\n", + "axes[0].legend(fontsize=11)\n", + "axes[0].grid(True, alpha=0.3)\n", + "\n", + "sil_values = [s for s in silhouettes if not np.isnan(s)]\n", + "sil_ks = list(range(2, 11))\n", + "axes[1].plot(sil_ks, sil_values, \"s-\", color=\"#27ae60\", linewidth=2, markersize=8)\n", + "axes[1].set_xlabel(\"Number of clusters (K)\", fontsize=12)\n", + "axes[1].set_ylabel(\"Mean Silhouette Score\", fontsize=12)\n", + "axes[1].set_title(\"Silhouette Score vs K\", fontsize=14)\n", + "axes[1].axvline(x=3, color=\"#e74c3c\", linestyle=\"--\", alpha=0.7, label=\"K = 3\")\n", + "axes[1].legend(fontsize=11)\n", + "axes[1].grid(True, alpha=0.3)\n", + "\n", + "plt.tight_layout()\n", + 
"plt.show()\n", + "\n", + "print(\"Silhouette scores by K:\")\n", + "for k, s in zip(sil_ks, sil_values):\n", + " print(f\" K={k:2d} -> {s:.4f}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Both plots agree: **K = 3** is the best choice for this dataset β€” inertia has a clear elbow and the silhouette score peaks at K = 3.\n", + "\n", + "---\n", + "## 7. Scikit-learn's KMeans\n", + "\n", + "In practice you'll use scikit-learn's battle-tested implementation. Let's verify our scratch version gives the same answer." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.cluster import KMeans\n", + "\n", + "km_sklearn = KMeans(n_clusters=3, random_state=42, n_init=10)\n", + "km_sklearn.fit(X)\n", + "\n", + "print(\"=== Scikit-learn KMeans ===\")\n", + "print(f\"Inertia: {km_sklearn.inertia_:.2f}\")\n", + "print(f\"Silhouette score: {silhouette_score(X, km_sklearn.labels_):.4f}\")\n", + "print(f\"Centroids:\\n{km_sklearn.cluster_centers_}\")\n", + "print()\n", + "\n", + "print(\"=== Our scratch KMeans ===\")\n", + "print(f\"Inertia: {km_scratch.inertia_:.2f}\")\n", + "print(f\"Silhouette score: {silhouette_score(X, km_scratch.labels_):.4f}\")\n", + "print(f\"Centroids:\\n{km_scratch.centroids}\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "fig, axes = plt.subplots(1, 2, figsize=(13, 5))\n", + "\n", + "colors_map = np.array([\"#e74c3c\", \"#2ecc71\", \"#3498db\"])\n", + "\n", + "axes[0].scatter(X[:, 0], X[:, 1], c=colors_map[km_scratch.labels_],\n", + " edgecolors=\"k\", s=50, alpha=0.7)\n", + "axes[0].scatter(km_scratch.centroids[:, 0], km_scratch.centroids[:, 1],\n", + " c=\"gold\", marker=\"X\", s=250, edgecolors=\"k\", linewidths=1.5, zorder=5)\n", + "axes[0].set_title(\"Our Scratch Implementation\", fontsize=14)\n", + "axes[0].set_xlabel(\"Feature 1\")\n", + "axes[0].set_ylabel(\"Feature 
2\")\n", + "\n", + "axes[1].scatter(X[:, 0], X[:, 1], c=colors_map[km_sklearn.labels_],\n", + " edgecolors=\"k\", s=50, alpha=0.7)\n", + "axes[1].scatter(km_sklearn.cluster_centers_[:, 0], km_sklearn.cluster_centers_[:, 1],\n", + " c=\"gold\", marker=\"X\", s=250, edgecolors=\"k\", linewidths=1.5, zorder=5)\n", + "axes[1].set_title(\"Scikit-learn KMeans\", fontsize=14)\n", + "axes[1].set_xlabel(\"Feature 1\")\n", + "axes[1].set_ylabel(\"Feature 2\")\n", + "\n", + "plt.suptitle(\"Scratch vs Scikit-learn β€” Side by Side\", fontsize=15, y=1.01)\n", + "plt.tight_layout()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The cluster labels may differ in numbering (label 0 in one could be label 2 in the other), but the **groupings themselves** should be nearly identical. Scikit-learn's version often achieves slightly lower inertia because it uses the smarter **k-means++** initialization by default and runs multiple initializations (`n_init=10`).\n", + "\n", + "---\n", + "## 8. Practical Tips\n", + "\n", + "### Assumptions of K-Means\n", + "K-Means works best when clusters are:\n", + "- **Spherical (isotropic):** roughly the same spread in every direction.\n", + "- **Similar in size:** very uneven cluster sizes can pull centroids away from smaller groups.\n", + "- **Well-separated:** heavily overlapping clusters confuse the algorithm.\n", + "\n", + "### Feature Scaling\n", + "K-Means relies on Euclidean distance. If one feature has a range of 0–1 and another 0–10,000, the second feature will dominate. **Always standardize your features** (e.g., `StandardScaler`) before clustering.\n", + "\n", + "### Multiple Initializations\n", + "Scikit-learn's `n_init` parameter (default 10) runs K-Means 10 times with different random seeds and keeps the result with the lowest inertia. 
This greatly reduces the risk of a poor local minimum.\n", + "\n", + "### When K-Means Fails\n", + "K-Means struggles with:\n", + "- **Non-convex shapes** (e.g., crescent moons, concentric rings) β€” consider DBSCAN or spectral clustering instead.\n", + "- **Clusters with very different densities** β€” HDBSCAN handles this better.\n", + "- **High-dimensional data** β€” distances become less meaningful (curse of dimensionality); apply dimensionality reduction first.\n", + "\n", + "We'll explore some of these alternatives in later notebooks." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "## 9. Summary\n", + "\n", + "### Key Takeaways\n", + "\n", + "1. **Unsupervised learning** discovers structure without labels. Clustering is its flagship task.\n", + "2. **K-Means** iterates between *assigning* points to the nearest centroid and *updating* centroids as cluster means until convergence.\n", + "3. **Inertia** measures within-cluster compactness; **silhouette score** balances compactness and separation.\n", + "4. The **elbow method** plots inertia vs K to find a natural number of clusters.\n", + "5. **Scikit-learn's KMeans** adds smart initialization (k-means++) and multiple restarts for robust results.\n", + "6. 
Always **scale features** before clustering, and remember that K-Means assumes spherical, similarly-sized clusters.\n", + "\n", + "### What's Next\n", + "In the following notebooks we will:\n", + "- Explore **hierarchical clustering** and dendrograms\n", + "- Learn **DBSCAN** for density-based clustering\n", + "- Apply **dimensionality reduction** (PCA, t-SNE) for visualization\n", + "\n", + "---\n", + "*End of Notebook 01 β€” Clustering Basics*" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "name": "python", + "version": "3.9.0" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} \ No newline at end of file diff --git a/chapters/chapter-08-unsupervised-learning/notebooks/02_intermediate.ipynb b/chapters/chapter-08-unsupervised-learning/notebooks/02_intermediate.ipynb new file mode 100644 index 0000000..584626b --- /dev/null +++ b/chapters/chapter-08-unsupervised-learning/notebooks/02_intermediate.ipynb @@ -0,0 +1,721 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Chapter 8: Unsupervised Learning\n", + "## Notebook 02 - Intermediate: Advanced Clustering\n", + "\n", + "Beyond K-Means: hierarchical clustering, density-based methods, and Gaussian mixtures for real-world data shapes.\n", + "\n", + "**What you'll learn:**\n", + "- Hierarchical (agglomerative) clustering and dendrograms\n", + "- DBSCAN for density-based clustering\n", + "- Gaussian Mixture Models (GMMs)\n", + "- Comparing clustering algorithms on different data shapes\n", + "\n", + "**Time estimate:** 2.5 hours\n", + "\n", + "---\n", + "*Generated by Berta AI | Created by Luigi Pascal Rondanini*" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "import matplotlib.cm as cm\n", + "from sklearn.datasets import make_blobs, make_moons, 
make_circles\n", + "from sklearn.cluster import KMeans, AgglomerativeClustering, DBSCAN\n", + "from sklearn.mixture import GaussianMixture\n", + "from sklearn.preprocessing import StandardScaler\n", + "from sklearn.neighbors import NearestNeighbors\n", + "from scipy.cluster.hierarchy import dendrogram, linkage, fcluster\n", + "from scipy.stats import multivariate_normal\n", + "\n", + "np.random.seed(42)\n", + "\n", + "plt.rcParams['figure.figsize'] = (10, 6)\n", + "plt.rcParams['figure.dpi'] = 100\n", + "plt.rcParams['font.size'] = 11\n", + "\n", + "print(\"All imports loaded successfully.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "## 1. Hierarchical (Agglomerative) Clustering\n", + "\n", + "Hierarchical clustering builds a tree of clusters instead of requiring a fixed number of clusters up front.\n", + "\n", + "### How agglomerative clustering works\n", + "\n", + "The **agglomerative (bottom-up)** approach proceeds as follows:\n", + "\n", + "1. **Start** β€” treat every data point as its own single-point cluster.\n", + "2. **Merge** β€” find the two closest clusters and merge them into one.\n", + "3. 
**Repeat** β€” keep merging until only a single cluster remains (or until a stopping criterion is met).\n", + "\n", + "The result is a hierarchy that can be visualised as a **dendrogram** β€” a tree diagram showing the order and distance of each merge.\n", + "\n", + "### Linkage criteria\n", + "\n", + "\"Distance between two clusters\" can be measured in several ways:\n", + "\n", + "| Linkage | Definition | Tendency |\n", + "|---------|-----------|----------|\n", + "| **Single** | Minimum distance between any pair of points across two clusters | Produces elongated, chain-like clusters |\n", + "| **Complete** | Maximum distance between any pair of points across two clusters | Produces compact, roughly equal-sized clusters |\n", + "| **Average** | Mean distance between all pairs of points across two clusters | Compromise between single and complete |\n", + "| **Ward** | Minimises the total within-cluster variance at each merge | Tends to produce equally sized, spherical clusters |\n", + "\n", + "Ward linkage is the most commonly used default and works well when clusters are roughly spherical." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Generate synthetic data with 4 well-separated clusters\n", + "X_hier, y_hier = make_blobs(\n", + " n_samples=200, centers=4, cluster_std=0.8, random_state=42\n", + ")\n", + "\n", + "fig, axes = plt.subplots(1, 2, figsize=(14, 5))\n", + "\n", + "# Left panel β€” raw data\n", + "axes[0].scatter(X_hier[:, 0], X_hier[:, 1], s=30, alpha=0.7, edgecolors='k', linewidths=0.3)\n", + "axes[0].set_title('Raw Data (200 points, 4 clusters)')\n", + "axes[0].set_xlabel('Feature 1')\n", + "axes[0].set_ylabel('Feature 2')\n", + "\n", + "# Right panel β€” dendrogram using Ward linkage\n", + "Z_ward = linkage(X_hier, method='ward')\n", + "dendrogram(\n", + " Z_ward,\n", + " truncate_mode='lastp',\n", + " p=30,\n", + " leaf_rotation=90,\n", + " leaf_font_size=8,\n", + " ax=axes[1],\n", + " color_threshold=12\n", + ")\n", + "axes[1].set_title('Dendrogram (Ward Linkage, truncated to 30 leaves)')\n", + "axes[1].set_xlabel('Cluster (size)')\n", + "axes[1].set_ylabel('Merge Distance')\n", + "axes[1].axhline(y=12, color='r', linestyle='--', label='Cut at distance = 12')\n", + "axes[1].legend()\n", + "\n", + "plt.tight_layout()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The dendrogram shows the full merge history. By drawing a horizontal cut line we decide\n", + "how many clusters to keep β€” each vertical line that crosses the cut corresponds to one cluster.\n", + "\n", + "### Comparing linkage methods\n", + "\n", + "Let's visualise how the four linkage types partition the same dataset." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "linkage_methods = ['single', 'complete', 'average', 'ward']\n", + "fig, axes = plt.subplots(1, 4, figsize=(20, 4.5))\n", + "\n", + "for ax, method in zip(axes, linkage_methods):\n", + " Z = linkage(X_hier, method=method)\n", + " labels = fcluster(Z, t=4, criterion='maxclust')\n", + " scatter = ax.scatter(\n", + " X_hier[:, 0], X_hier[:, 1],\n", + " c=labels, cmap='viridis', s=30, alpha=0.7, edgecolors='k', linewidths=0.3\n", + " )\n", + " ax.set_title(f'{method.capitalize()} linkage')\n", + " ax.set_xlabel('Feature 1')\n", + " ax.set_ylabel('Feature 2')\n", + "\n", + "plt.suptitle('Agglomerative Clustering β€” 4 Linkage Methods (k=4)', fontsize=14, y=1.02)\n", + "plt.tight_layout()\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Scikit-learn's AgglomerativeClustering with Ward linkage\n", + "agg = AgglomerativeClustering(n_clusters=4, linkage='ward')\n", + "agg_labels = agg.fit_predict(X_hier)\n", + "\n", + "fig, axes = plt.subplots(1, 2, figsize=(14, 5))\n", + "\n", + "axes[0].scatter(\n", + " X_hier[:, 0], X_hier[:, 1],\n", + " c=y_hier, cmap='tab10', s=40, alpha=0.7, edgecolors='k', linewidths=0.3\n", + ")\n", + "axes[0].set_title('Ground-Truth Labels')\n", + "axes[0].set_xlabel('Feature 1')\n", + "axes[0].set_ylabel('Feature 2')\n", + "\n", + "axes[1].scatter(\n", + " X_hier[:, 0], X_hier[:, 1],\n", + " c=agg_labels, cmap='tab10', s=40, alpha=0.7, edgecolors='k', linewidths=0.3\n", + ")\n", + "axes[1].set_title('AgglomerativeClustering (Ward, k=4)')\n", + "axes[1].set_xlabel('Feature 1')\n", + "axes[1].set_ylabel('Feature 2')\n", + "\n", + "plt.tight_layout()\n", + "plt.show()\n", + "\n", + "print(f\"Cluster sizes: {np.bincount(agg_labels)}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "## 2. 
DBSCAN β€” Density-Based Spatial Clustering\n", + "\n", + "**DBSCAN** (Density-Based Spatial Clustering of Applications with Noise) takes a fundamentally different\n", + "approach to clustering:\n", + "\n", + "- It does **not** require the number of clusters in advance.\n", + "- It defines clusters as **dense regions** separated by sparse regions.\n", + "- Points that don't belong to any dense region are labelled as **noise** (label = -1).\n", + "\n", + "### Key parameters\n", + "\n", + "| Parameter | Meaning |\n", + "|-----------|--------|\n", + "| `eps` (Ξ΅) | Maximum distance between two points for them to be considered neighbours |\n", + "| `min_samples` | Minimum number of points within Ξ΅-distance to form a dense region |\n", + "\n", + "### Point types\n", + "\n", + "- **Core point** β€” has at least `min_samples` neighbours within Ξ΅.\n", + "- **Border point** β€” within Ξ΅ of a core point but doesn't have enough neighbours itself.\n", + "- **Noise point** β€” neither core nor border; isolated outliers.\n", + "\n", + "### Key advantage\n", + "\n", + "DBSCAN can discover clusters of **arbitrary shape** and naturally identifies outliers β€” something\n", + "centroid-based methods like K-Means cannot do." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Generate two non-convex datasets\n", + "X_moons, y_moons = make_moons(n_samples=500, noise=0.08, random_state=42)\n", + "X_circles, y_circles = make_circles(n_samples=500, noise=0.05, factor=0.5, random_state=42)\n", + "\n", + "fig, axes = plt.subplots(1, 2, figsize=(14, 5))\n", + "\n", + "axes[0].scatter(X_moons[:, 0], X_moons[:, 1], c=y_moons, cmap='coolwarm', s=20, alpha=0.7)\n", + "axes[0].set_title('Two Moons Dataset')\n", + "axes[0].set_xlabel('Feature 1')\n", + "axes[0].set_ylabel('Feature 2')\n", + "\n", + "axes[1].scatter(X_circles[:, 0], X_circles[:, 1], c=y_circles, cmap='coolwarm', s=20, alpha=0.7)\n", + "axes[1].set_title('Two Circles Dataset')\n", + "axes[1].set_xlabel('Feature 1')\n", + "axes[1].set_ylabel('Feature 2')\n", + "\n", + "plt.suptitle('Non-Convex Datasets β€” Ground Truth', fontsize=14, y=1.02)\n", + "plt.tight_layout()\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Apply DBSCAN to both datasets\n", + "db_moons = DBSCAN(eps=0.2, min_samples=5).fit(X_moons)\n", + "db_circles = DBSCAN(eps=0.15, min_samples=5).fit(X_circles)\n", + "\n", + "fig, axes = plt.subplots(1, 2, figsize=(14, 5))\n", + "\n", + "colors_moons = db_moons.labels_\n", + "colors_circles = db_circles.labels_\n", + "\n", + "axes[0].scatter(\n", + " X_moons[:, 0], X_moons[:, 1],\n", + " c=colors_moons, cmap='viridis', s=20, alpha=0.7\n", + ")\n", + "n_noise_moons = (db_moons.labels_ == -1).sum()\n", + "axes[0].set_title(f'DBSCAN on Moons β€” {len(set(colors_moons)) - (1 if -1 in colors_moons else 0)} clusters, {n_noise_moons} noise')\n", + "axes[0].set_xlabel('Feature 1')\n", + "axes[0].set_ylabel('Feature 2')\n", + "\n", + "axes[1].scatter(\n", + " X_circles[:, 0], X_circles[:, 1],\n", + " c=colors_circles, cmap='viridis', s=20, alpha=0.7\n", + ")\n", + "n_noise_circles = 
(db_circles.labels_ == -1).sum()\n", + "axes[1].set_title(f'DBSCAN on Circles β€” {len(set(colors_circles)) - (1 if -1 in colors_circles else 0)} clusters, {n_noise_circles} noise')\n", + "axes[1].set_xlabel('Feature 1')\n", + "axes[1].set_ylabel('Feature 2')\n", + "\n", + "plt.suptitle('DBSCAN Handles Non-Convex Shapes', fontsize=14, y=1.02)\n", + "plt.tight_layout()\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# K-Means vs DBSCAN on the moons dataset\n", + "km_moons = KMeans(n_clusters=2, random_state=42, n_init=10).fit(X_moons)\n", + "\n", + "fig, axes = plt.subplots(1, 3, figsize=(18, 5))\n", + "\n", + "axes[0].scatter(X_moons[:, 0], X_moons[:, 1], c=y_moons, cmap='coolwarm', s=20, alpha=0.7)\n", + "axes[0].set_title('Ground Truth')\n", + "axes[0].set_xlabel('Feature 1')\n", + "axes[0].set_ylabel('Feature 2')\n", + "\n", + "axes[1].scatter(X_moons[:, 0], X_moons[:, 1], c=km_moons.labels_, cmap='coolwarm', s=20, alpha=0.7)\n", + "axes[1].scatter(km_moons.cluster_centers_[:, 0], km_moons.cluster_centers_[:, 1],\n", + " marker='X', s=200, c='black', edgecolors='white', linewidths=1.5)\n", + "axes[1].set_title('K-Means (k=2) β€” Fails on non-convex shapes')\n", + "axes[1].set_xlabel('Feature 1')\n", + "axes[1].set_ylabel('Feature 2')\n", + "\n", + "axes[2].scatter(X_moons[:, 0], X_moons[:, 1], c=db_moons.labels_, cmap='coolwarm', s=20, alpha=0.7)\n", + "axes[2].set_title('DBSCAN (eps=0.2) β€” Correctly separates crescents')\n", + "axes[2].set_xlabel('Feature 1')\n", + "axes[2].set_ylabel('Feature 2')\n", + "\n", + "plt.suptitle('K-Means vs DBSCAN on the Moons Dataset', fontsize=14, y=1.02)\n", + "plt.tight_layout()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "## 3. Choosing DBSCAN Parameters\n", + "\n", + "Picking `eps` and `min_samples` can be tricky. A practical heuristic:\n", + "\n", + "1. 
Set `min_samples` β‰ˆ 2 Γ— number of features (a reasonable default).\n", + "2. For each point compute the distance to its **k-th nearest neighbour** (k = `min_samples`).\n", + "3. Sort these distances and plot them β€” the **k-distance graph**.\n", + "4. Look for the \"elbow\" β€” the point where the curve bends sharply upward. The distance at that elbow is a good candidate for `eps`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# k-distance graph for the moons dataset\n", + "k = 5 # same as min_samples\n", + "nn = NearestNeighbors(n_neighbors=k)\n", + "nn.fit(X_moons)\n", + "distances, _ = nn.kneighbors(X_moons)\n", + "\n", + "k_distances = np.sort(distances[:, k - 1])[::-1]\n", + "\n", + "plt.figure(figsize=(10, 5))\n", + "plt.plot(k_distances, linewidth=1.5)\n", + "plt.axhline(y=0.2, color='r', linestyle='--', label='eps = 0.2 (our choice)')\n", + "plt.title(f'k-Distance Graph (k={k}) β€” Elbow Indicates Good eps')\n", + "plt.xlabel('Points (sorted by descending k-distance)')\n", + "plt.ylabel(f'Distance to {k}-th Nearest Neighbour')\n", + "plt.legend()\n", + "plt.grid(True, alpha=0.3)\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Effect of different eps values on DBSCAN results\n", + "eps_values = [0.05, 0.1, 0.2, 0.3, 0.5]\n", + "fig, axes = plt.subplots(1, len(eps_values), figsize=(22, 4))\n", + "\n", + "for ax, eps in zip(axes, eps_values):\n", + " db = DBSCAN(eps=eps, min_samples=5).fit(X_moons)\n", + " labels = db.labels_\n", + " n_clusters = len(set(labels)) - (1 if -1 in labels else 0)\n", + " n_noise = (labels == -1).sum()\n", + "\n", + " unique_labels = set(labels)\n", + " colors = [plt.cm.Spectral(each) for each in np.linspace(0, 1, len(unique_labels))]\n", + "\n", + " for k_label, col in zip(sorted(unique_labels), colors):\n", + " if k_label == -1:\n", + " col = [0, 0, 0, 1] # black for 
noise\n", + " mask = labels == k_label\n", + " ax.scatter(X_moons[mask, 0], X_moons[mask, 1], c=[col], s=15, alpha=0.7)\n", + "\n", + " ax.set_title(f'eps={eps}\\n{n_clusters} clusters, {n_noise} noise')\n", + " ax.set_xlabel('Feature 1')\n", + "\n", + "axes[0].set_ylabel('Feature 2')\n", + "plt.suptitle('Effect of eps on DBSCAN (min_samples=5)', fontsize=14, y=1.05)\n", + "plt.tight_layout()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Observations:**\n", + "- **eps too small** (0.05) β†’ most points classified as noise; many tiny clusters.\n", + "- **eps just right** (0.2) β†’ two clean crescent clusters with very little noise.\n", + "- **eps too large** (0.5) β†’ everything merges into a single cluster.\n", + "\n", + "The k-distance graph helps you find that sweet spot without trial and error." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "## 4. Gaussian Mixture Models (GMMs)\n", + "\n", + "A **Gaussian Mixture Model** assumes that the data is generated from a mixture of a finite number\n", + "of Gaussian (normal) distributions with unknown parameters.\n", + "\n", + "### GMM vs K-Means\n", + "\n", + "| Aspect | K-Means | GMM |\n", + "|--------|---------|-----|\n", + "| Cluster assignment | **Hard** β€” each point belongs to exactly one cluster | **Soft** β€” each point has a probability for every cluster |\n", + "| Cluster shape | Spherical (Voronoi cells) | Elliptical (full covariance matrices) |\n", + "| Outlier handling | None β€” every point is assigned | Naturally down-weights low-probability points |\n", + "| Output | Cluster label | Probability vector over all clusters |\n", + "\n", + "GMMs are fit using the **Expectation-Maximisation (EM)** algorithm:\n", + "1. **E-step** β€” compute the probability that each point belongs to each Gaussian component.\n", + "2. **M-step** β€” update each component's mean, covariance, and weight to maximise log-likelihood.\n", + "3. 
Repeat until convergence." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Create elongated / elliptical clusters that challenge K-Means\n", + "np.random.seed(42)\n", + "\n", + "n_per_cluster = 200\n", + "cov1 = [[2.0, 1.5], [1.5, 1.5]]\n", + "cov2 = [[1.5, -1.2], [-1.2, 1.5]]\n", + "cov3 = [[0.5, 0.0], [0.0, 2.5]]\n", + "\n", + "cluster1 = np.random.multivariate_normal([0, 0], cov1, n_per_cluster)\n", + "cluster2 = np.random.multivariate_normal([5, 5], cov2, n_per_cluster)\n", + "cluster3 = np.random.multivariate_normal([8, 0], cov3, n_per_cluster)\n", + "\n", + "X_gmm = np.vstack([cluster1, cluster2, cluster3])\n", + "y_gmm_true = np.array([0]*n_per_cluster + [1]*n_per_cluster + [2]*n_per_cluster)\n", + "\n", + "fig, axes = plt.subplots(1, 3, figsize=(18, 5))\n", + "\n", + "# Ground truth\n", + "axes[0].scatter(X_gmm[:, 0], X_gmm[:, 1], c=y_gmm_true, cmap='tab10', s=15, alpha=0.6)\n", + "axes[0].set_title('Ground Truth (Elliptical Clusters)')\n", + "axes[0].set_xlabel('Feature 1')\n", + "axes[0].set_ylabel('Feature 2')\n", + "\n", + "# K-Means\n", + "km_gmm = KMeans(n_clusters=3, random_state=42, n_init=10).fit(X_gmm)\n", + "axes[1].scatter(X_gmm[:, 0], X_gmm[:, 1], c=km_gmm.labels_, cmap='tab10', s=15, alpha=0.6)\n", + "axes[1].scatter(km_gmm.cluster_centers_[:, 0], km_gmm.cluster_centers_[:, 1],\n", + " marker='X', s=200, c='black', edgecolors='white', linewidths=1.5)\n", + "axes[1].set_title('K-Means (k=3) β€” Spherical assumption')\n", + "axes[1].set_xlabel('Feature 1')\n", + "axes[1].set_ylabel('Feature 2')\n", + "\n", + "# GMM\n", + "gmm = GaussianMixture(n_components=3, covariance_type='full', random_state=42)\n", + "gmm.fit(X_gmm)\n", + "gmm_labels = gmm.predict(X_gmm)\n", + "axes[2].scatter(X_gmm[:, 0], X_gmm[:, 1], c=gmm_labels, cmap='tab10', s=15, alpha=0.6)\n", + "axes[2].set_title('GMM (3 components) β€” Elliptical fit')\n", + "axes[2].set_xlabel('Feature 1')\n", + 
"axes[2].set_ylabel('Feature 2')\n", + "\n", + "plt.suptitle('K-Means vs GMM on Elliptical Clusters', fontsize=14, y=1.02)\n", + "plt.tight_layout()\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Visualise GMM probability contours\n", + "x_min, x_max = X_gmm[:, 0].min() - 2, X_gmm[:, 0].max() + 2\n", + "y_min, y_max = X_gmm[:, 1].min() - 2, X_gmm[:, 1].max() + 2\n", + "xx, yy = np.meshgrid(np.linspace(x_min, x_max, 300), np.linspace(y_min, y_max, 300))\n", + "grid_points = np.column_stack([xx.ravel(), yy.ravel()])\n", + "\n", + "log_prob = gmm.score_samples(grid_points)\n", + "log_prob = log_prob.reshape(xx.shape)\n", + "\n", + "fig, ax = plt.subplots(figsize=(10, 7))\n", + "ax.contourf(xx, yy, np.exp(log_prob), levels=30, cmap='YlOrRd', alpha=0.6)\n", + "ax.contour(xx, yy, np.exp(log_prob), levels=10, colors='darkred', linewidths=0.5, alpha=0.5)\n", + "ax.scatter(X_gmm[:, 0], X_gmm[:, 1], c=gmm_labels, cmap='tab10', s=10, alpha=0.7,\n", + " edgecolors='k', linewidths=0.2)\n", + "\n", + "for i in range(gmm.n_components):\n", + " ax.scatter(gmm.means_[i, 0], gmm.means_[i, 1],\n", + " marker='+', s=300, c='black', linewidths=3)\n", + "\n", + "ax.set_title('GMM Probability Density Contours')\n", + "ax.set_xlabel('Feature 1')\n", + "ax.set_ylabel('Feature 2')\n", + "plt.tight_layout()\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Soft cluster probabilities β€” the key advantage of GMM\n", + "probs = gmm.predict_proba(X_gmm)\n", + "\n", + "print(\"Cluster membership probabilities for the first 10 points:\")\n", + "print(f\"{'Point':>5} {'P(C0)':>8} {'P(C1)':>8} {'P(C2)':>8} {'Assigned':>8}\")\n", + "print(\"-\" * 48)\n", + "for i in range(10):\n", + " print(f\"{i:5d} {probs[i, 0]:8.4f} {probs[i, 1]:8.4f} {probs[i, 2]:8.4f} {gmm_labels[i]:8d}\")" + ] + }, + { + "cell_type": "markdown", + 
"metadata": {}, + "source": [ + "### Model selection with BIC and AIC\n", + "\n", + "How many Gaussian components should we use? We can use information criteria:\n", + "\n", + "- **BIC** (Bayesian Information Criterion) β€” penalises model complexity more heavily.\n", + "- **AIC** (Akaike Information Criterion) β€” lighter penalty.\n", + "\n", + "**Lower is better** for both. We fit GMMs with different numbers of components and pick the one with the lowest BIC (or AIC)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "n_components_range = range(1, 10)\n", + "bic_scores = []\n", + "aic_scores = []\n", + "\n", + "for n in n_components_range:\n", + " gmm_test = GaussianMixture(n_components=n, covariance_type='full', random_state=42)\n", + " gmm_test.fit(X_gmm)\n", + " bic_scores.append(gmm_test.bic(X_gmm))\n", + " aic_scores.append(gmm_test.aic(X_gmm))\n", + "\n", + "fig, ax = plt.subplots(figsize=(10, 5))\n", + "ax.plot(list(n_components_range), bic_scores, 'bo-', label='BIC', linewidth=2)\n", + "ax.plot(list(n_components_range), aic_scores, 'rs--', label='AIC', linewidth=2)\n", + "ax.axvline(x=3, color='green', linestyle=':', alpha=0.7, label='True number of components (3)')\n", + "ax.set_xlabel('Number of Components')\n", + "ax.set_ylabel('Score (lower is better)')\n", + "ax.set_title('GMM Model Selection: BIC and AIC')\n", + "ax.legend()\n", + "ax.grid(True, alpha=0.3)\n", + "plt.tight_layout()\n", + "plt.show()\n", + "\n", + "print(f\"Best BIC at n_components = {np.argmin(bic_scores) + 1}\")\n", + "print(f\"Best AIC at n_components = {np.argmin(aic_scores) + 1}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "## 5. Algorithm Comparison on Multiple Datasets\n", + "\n", + "Let's put all four algorithms head-to-head on three different data geometries:\n", + "\n", + "1. **Blobs** β€” well-separated spherical clusters\n", + "2. 
**Moons** β€” two interleaving crescents\n", + "3. **Varied-variance blobs** β€” spherical clusters with very different densities" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "np.random.seed(42)\n", + "\n", + "n_samples = 500\n", + "\n", + "# Dataset 1: standard blobs\n", + "X_blobs, y_blobs = make_blobs(n_samples=n_samples, centers=3, cluster_std=1.0, random_state=42)\n", + "\n", + "# Dataset 2: moons\n", + "X_moons2, y_moons2 = make_moons(n_samples=n_samples, noise=0.07, random_state=42)\n", + "\n", + "# Dataset 3: varied-variance blobs\n", + "X_varied, y_varied = make_blobs(\n", + " n_samples=n_samples, centers=3, cluster_std=[0.5, 2.5, 1.0], random_state=42\n", + ")\n", + "\n", + "datasets = [\n", + " ('Blobs', X_blobs, {'n_clusters': 3, 'eps': 1.0}),\n", + " ('Moons', X_moons2, {'n_clusters': 2, 'eps': 0.2}),\n", + " ('Varied', X_varied, {'n_clusters': 3, 'eps': 1.5}),\n", + "]\n", + "\n", + "fig, axes = plt.subplots(3, 4, figsize=(22, 15))\n", + "\n", + "for row, (name, X, params) in enumerate(datasets):\n", + " X_scaled = StandardScaler().fit_transform(X)\n", + " n_c = params['n_clusters']\n", + " eps = params['eps']\n", + "\n", + " # K-Means\n", + " km = KMeans(n_clusters=n_c, random_state=42, n_init=10).fit(X_scaled)\n", + " # Agglomerative\n", + " agg = AgglomerativeClustering(n_clusters=n_c, linkage='ward').fit(X_scaled)\n", + " # DBSCAN\n", + " db = DBSCAN(eps=eps, min_samples=5).fit(X_scaled)\n", + " # GMM\n", + " gm = GaussianMixture(n_components=n_c, random_state=42).fit(X_scaled)\n", + "\n", + " results = [\n", + " ('K-Means', km.labels_),\n", + " ('Agglomerative', agg.labels_),\n", + " ('DBSCAN', db.labels_),\n", + " ('GMM', gm.predict(X_scaled)),\n", + " ]\n", + "\n", + " for col, (algo_name, labels) in enumerate(results):\n", + " ax = axes[row, col]\n", + " unique_labels = set(labels)\n", + " n_clust = len(unique_labels) - (1 if -1 in unique_labels else 0)\n", + "\n", + " 
noise_mask = labels == -1\n", + " ax.scatter(X_scaled[~noise_mask, 0], X_scaled[~noise_mask, 1],\n", + " c=labels[~noise_mask], cmap='viridis', s=12, alpha=0.7)\n", + " if noise_mask.any():\n", + " ax.scatter(X_scaled[noise_mask, 0], X_scaled[noise_mask, 1],\n", + " c='red', marker='x', s=15, alpha=0.5, label='noise')\n", + " ax.legend(fontsize=8)\n", + "\n", + " if row == 0:\n", + " ax.set_title(algo_name, fontsize=13, fontweight='bold')\n", + " ax.set_ylabel(f'{name}' if col == 0 else '', fontsize=12)\n", + " ax.text(0.02, 0.98, f'{n_clust} cluster(s)',\n", + " transform=ax.transAxes, fontsize=9, va='top',\n", + " bbox=dict(boxstyle='round,pad=0.3', facecolor='white', alpha=0.8))\n", + "\n", + "plt.suptitle('Algorithm Comparison Across Data Geometries', fontsize=16, y=1.01)\n", + "plt.tight_layout()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "## 6. Summary β€” When to Use Each Algorithm\n", + "\n", + "### Quick reference\n", + "\n", + "| Algorithm | Best for | Weaknesses | Must specify k? |\n", + "|-----------|---------|------------|------------------|\n", + "| **K-Means** | Large datasets with spherical clusters | Cannot handle non-convex shapes; sensitive to outliers | Yes |\n", + "| **Agglomerative Clustering** | Small-to-medium datasets; exploring hierarchy | O(nΒ³) time complexity; hard to scale | Yes (or cut dendrogram) |\n", + "| **DBSCAN** | Arbitrary shapes; datasets with noise/outliers | Sensitive to `eps`; struggles with varying densities | No |\n", + "| **Gaussian Mixture Model** | Elliptical clusters; need soft assignments | Assumes Gaussian components; sensitive to initialisation | Yes |\n", + "\n", + "### Rules of thumb\n", + "\n", + "1. **Start simple:** try K-Means first. If results look poor, consider the data geometry.\n", + "2. **Non-convex shapes?** β†’ Use DBSCAN.\n", + "3. **Elliptical or overlapping clusters?** β†’ Use GMM.\n", + "4. 
**Need a hierarchy or dendrogram?** β†’ Use Agglomerative Clustering.\n", + "5. **Noisy data with outliers?** β†’ DBSCAN naturally handles noise.\n", + "6. **Need probability estimates?** β†’ GMM provides soft assignments.\n", + "\n", + "### What's next\n", + "\n", + "In the **advanced notebook** (Notebook 03) we will explore:\n", + "- Dimensionality reduction (PCA, t-SNE, UMAP)\n", + "- Clustering evaluation metrics (Silhouette, Adjusted Rand Index)\n", + "- Pipelines combining reduction + clustering on real-world datasets\n", + "\n", + "---\n", + "*Generated by Berta AI | Created by Luigi Pascal Rondanini*" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "name": "python", + "version": "3.10.0" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/chapters/chapter-08-unsupervised-learning/notebooks/03_advanced.ipynb b/chapters/chapter-08-unsupervised-learning/notebooks/03_advanced.ipynb new file mode 100644 index 0000000..d73ba76 --- /dev/null +++ b/chapters/chapter-08-unsupervised-learning/notebooks/03_advanced.ipynb @@ -0,0 +1,938 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Chapter 8: Unsupervised Learning\n", + "## Notebook 03 - Advanced: Dimensionality Reduction & Capstone\n", + "\n", + "Reduce high-dimensional data for visualization and modeling, detect anomalies, and build a complete customer segmentation system.\n", + "\n", + "**What you'll learn:**\n", + "- Principal Component Analysis (PCA) from scratch\n", + "- t-SNE for 2D visualization\n", + "- Anomaly detection with Isolation Forest\n", + "- Customer segmentation capstone project\n", + "\n", + "**Time estimate:** 3 hours\n", + "\n", + "---\n", + "*Generated by Berta AI | Created by Luigi Pascal Rondanini*" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "## 1. 
Principal Component Analysis (PCA) β€” Theory\n", + "\n", + "### The Core Idea\n", + "\n", + "PCA is a **linear** dimensionality-reduction technique that finds the directions\n", + "(called **principal components**) along which the data varies the most.\n", + "\n", + "Imagine a cloud of 3-D points that is shaped like a flat pancake. Two axes\n", + "capture almost all of the spread; the third adds very little information. PCA\n", + "discovers those two dominant axes automatically.\n", + "\n", + "### Algorithm Steps\n", + "\n", + "1. **Center the data** β€” subtract the mean of each feature so that the cloud is\n", + " centered at the origin.\n", + "2. **Compute the covariance matrix** β€” a $d \\times d$ matrix (where $d$ is the\n", + " number of features) that captures pairwise linear relationships.\n", + "3. **Eigendecomposition** β€” find the eigenvectors and eigenvalues of the\n", + " covariance matrix. Each eigenvector is a principal component direction;\n", + " its eigenvalue tells us how much variance that direction explains.\n", + "4. **Sort & select** β€” rank components by eigenvalue (descending) and keep the\n", + " top $k$ to reduce dimensionality from $d$ to $k$.\n", + "5. **Project** β€” multiply the centered data by the selected eigenvectors to\n", + " obtain the lower-dimensional representation.\n", + "\n", + "### Variance Explained Ratio\n", + "\n", + "$$\\text{variance explained ratio}_i = \\frac{\\lambda_i}{\\sum_{j=1}^{d} \\lambda_j}$$\n", + "\n", + "where $\\lambda_i$ is the $i$-th eigenvalue. The **cumulative** variance explained\n", + "tells us how much total information is retained when we keep the first $k$\n", + "components." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "## 2. PCA From Scratch\n", + "\n", + "We will implement PCA using only NumPy and apply it to the classic **Iris**\n", + "dataset (4 features β†’ 2 components)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.datasets import load_iris\n", + "\n", + "np.random.seed(42)\n", + "\n", + "# Load the Iris dataset (4 features, 150 samples, 3 classes)\n", + "iris = load_iris()\n", + "X = iris.data # shape (150, 4)\n", + "y = iris.target # 0, 1, 2\n", + "feature_names = iris.feature_names\n", + "target_names = iris.target_names\n", + "\n", + "print(f\"Dataset shape: {X.shape}\")\n", + "print(f\"Features: {feature_names}\")\n", + "print(f\"Classes: {list(target_names)}\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def pca_from_scratch(X, n_components=2):\n", + " \"\"\"Implement PCA using NumPy.\"\"\"\n", + " # Step 1: Center the data\n", + " mean = np.mean(X, axis=0)\n", + " X_centered = X - mean\n", + "\n", + " # Step 2: Covariance matrix (features Γ— features)\n", + " cov_matrix = np.cov(X_centered, rowvar=False)\n", + "\n", + " # Step 3: Eigendecomposition\n", + " eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)\n", + "\n", + " # Step 4: Sort by eigenvalue descending\n", + " sorted_idx = np.argsort(eigenvalues)[::-1]\n", + " eigenvalues = eigenvalues[sorted_idx]\n", + " eigenvectors = eigenvectors[:, sorted_idx]\n", + "\n", + " # Variance explained ratio\n", + " variance_ratio = eigenvalues / eigenvalues.sum()\n", + "\n", + " # Step 5: Project onto top-k components\n", + " W = eigenvectors[:, :n_components]\n", + " X_projected = X_centered @ W\n", + "\n", + " return X_projected, eigenvalues, variance_ratio, W\n", + "\n", + "\n", + "X_pca_scratch, eigenvalues, var_ratio, components = pca_from_scratch(X, n_components=2)\n", + "\n", + "print(\"Eigenvalues:\", np.round(eigenvalues, 4))\n", + "print(\"Variance explained ratio:\", np.round(var_ratio, 4))\n", + "print(f\"Total variance retained (2 components): 
{var_ratio[:2].sum():.2%}\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# --- Variance Explained Bar + Cumulative Line ---\n", + "fig, axes = plt.subplots(1, 2, figsize=(13, 5))\n", + "\n", + "# Left: bar chart of individual variance ratios\n", + "axes[0].bar(range(1, len(var_ratio) + 1), var_ratio, color=\"steelblue\", edgecolor=\"black\")\n", + "axes[0].set_xlabel(\"Principal Component\")\n", + "axes[0].set_ylabel(\"Variance Explained Ratio\")\n", + "axes[0].set_title(\"Variance Explained by Each Component\")\n", + "axes[0].set_xticks(range(1, len(var_ratio) + 1))\n", + "\n", + "# Right: cumulative variance explained\n", + "cumulative = np.cumsum(var_ratio)\n", + "axes[1].plot(range(1, len(cumulative) + 1), cumulative, \"o-\", color=\"darkorange\", linewidth=2)\n", + "axes[1].axhline(y=0.95, color=\"red\", linestyle=\"--\", label=\"95% threshold\")\n", + "axes[1].set_xlabel(\"Number of Components\")\n", + "axes[1].set_ylabel(\"Cumulative Variance Explained\")\n", + "axes[1].set_title(\"Cumulative Variance Explained\")\n", + "axes[1].set_xticks(range(1, len(cumulative) + 1))\n", + "axes[1].legend()\n", + "\n", + "plt.tight_layout()\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# --- 2-D scatter plot of the scratch PCA projection ---\n", + "colors = [\"#1f77b4\", \"#ff7f0e\", \"#2ca02c\"]\n", + "\n", + "plt.figure(figsize=(8, 6))\n", + "for i, name in enumerate(target_names):\n", + " mask = y == i\n", + " plt.scatter(X_pca_scratch[mask, 0], X_pca_scratch[mask, 1],\n", + " label=name, alpha=0.7, edgecolors=\"k\", linewidth=0.5,\n", + " color=colors[i], s=60)\n", + "plt.xlabel(f\"PC 1 ({var_ratio[0]:.1%} variance)\")\n", + "plt.ylabel(f\"PC 2 ({var_ratio[1]:.1%} variance)\")\n", + "plt.title(\"PCA From Scratch β€” Iris Dataset (2-D Projection)\")\n", + "plt.legend()\n", + "plt.grid(alpha=0.3)\n", + 
"plt.tight_layout()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "## 3. PCA with Scikit-learn\n", + "\n", + "Now let's verify our scratch implementation against the well-optimized\n", + "`sklearn.decomposition.PCA`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.decomposition import PCA\n", + "\n", + "pca_sk = PCA(n_components=4) # keep all 4 to inspect variance\n", + "X_pca_sk_full = pca_sk.fit_transform(X)\n", + "\n", + "print(\"Sklearn variance explained ratio:\", np.round(pca_sk.explained_variance_ratio_, 4))\n", + "print(\"Scratch variance explained ratio: \", np.round(var_ratio, 4))\n", + "print()\n", + "print(\"Cumulative (sklearn):\", np.round(np.cumsum(pca_sk.explained_variance_ratio_), 4))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "X_pca_sk = X_pca_sk_full[:, :2] # first 2 components\n", + "\n", + "# Sign of eigenvectors can flip β€” align for visual comparison\n", + "for col in range(2):\n", + " if np.corrcoef(X_pca_scratch[:, col], X_pca_sk[:, col])[0, 1] < 0:\n", + " X_pca_scratch[:, col] *= -1\n", + "\n", + "fig, axes = plt.subplots(1, 2, figsize=(14, 5), sharex=True, sharey=True)\n", + "\n", + "for ax, data, title in zip(axes,\n", + " [X_pca_scratch, X_pca_sk],\n", + " [\"PCA (from scratch)\", \"PCA (scikit-learn)\"]):\n", + " for i, name in enumerate(target_names):\n", + " mask = y == i\n", + " ax.scatter(data[mask, 0], data[mask, 1], label=name,\n", + " alpha=0.7, edgecolors=\"k\", linewidth=0.5,\n", + " color=colors[i], s=60)\n", + " ax.set_xlabel(\"PC 1\")\n", + " ax.set_ylabel(\"PC 2\")\n", + " ax.set_title(title)\n", + " ax.legend()\n", + " ax.grid(alpha=0.3)\n", + "\n", + "plt.suptitle(\"Scratch vs Scikit-learn PCA β€” Identical Results\", fontsize=14, y=1.02)\n", + "plt.tight_layout()\n", + "plt.show()" + ] + }, + { + "cell_type": 
"markdown", + "metadata": {}, + "source": [ + "The two plots are virtually identical (eigenvector signs may differ, which is\n", + "cosmetic). This confirms our from-scratch implementation is correct." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "## 4. t-SNE β€” Non-linear Visualization\n", + "\n", + "### What is t-SNE?\n", + "\n", + "**t-distributed Stochastic Neighbor Embedding (t-SNE)** is a non-linear\n", + "dimensionality-reduction technique designed specifically for **visualization**.\n", + "\n", + "Key properties:\n", + "- Preserves **local structure**: points that are close in high-dimensional space\n", + " stay close in the 2-D embedding.\n", + "- Does **not** preserve global distances β€” clusters may move relative to each\n", + " other between runs.\n", + "- Computationally expensive β€” not suitable as a preprocessing step in\n", + " machine-learning pipelines.\n", + "- The **perplexity** parameter (roughly: how many neighbors each point\n", + " considers) strongly influences the result. Typical range: 5–50.\n", + "\n", + "> **Rule of thumb:** Use PCA when you need a general-purpose reduction (for\n", + "> modeling, compression, noise removal). Use t-SNE when your sole goal is to\n", + "> *see* cluster structure in 2-D."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.manifold import TSNE\n", + "\n", + "# Note: n_iter is omitted (its default, 1000, is used); the keyword was renamed\n", + "# to max_iter in recent scikit-learn releases, so omitting it keeps the cell portable.\n", + "tsne = TSNE(n_components=2, perplexity=30, random_state=42)\n", + "X_tsne = tsne.fit_transform(X)\n", + "\n", + "print(f\"t-SNE output shape: {X_tsne.shape}\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# --- Side-by-side: PCA vs t-SNE ---\n", + "fig, axes = plt.subplots(1, 2, figsize=(14, 5))\n", + "\n", + "for ax, data, title in zip(axes,\n", + " [X_pca_sk, X_tsne],\n", + " [\"PCA (linear)\", \"t-SNE (non-linear)\"]):\n", + " for i, name in enumerate(target_names):\n", + " mask = y == i\n", + " ax.scatter(data[mask, 0], data[mask, 1], label=name,\n", + " alpha=0.7, edgecolors=\"k\", linewidth=0.5,\n", + " color=colors[i], s=60)\n", + " ax.set_xlabel(\"Dim 1\")\n", + " ax.set_ylabel(\"Dim 2\")\n", + " ax.set_title(title)\n", + " ax.legend()\n", + " ax.grid(alpha=0.3)\n", + "\n", + "plt.suptitle(\"PCA vs t-SNE β€” Iris Dataset\", fontsize=14, y=1.02)\n", + "plt.tight_layout()\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# --- Effect of perplexity on t-SNE ---\n", + "perplexities = [5, 15, 30, 50]\n", + "fig, axes = plt.subplots(1, 4, figsize=(20, 4))\n", + "\n", + "for ax, perp in zip(axes, perplexities):\n", + " embedding = TSNE(n_components=2, perplexity=perp,\n", + " random_state=42).fit_transform(X)\n", + " for i, name in enumerate(target_names):\n", + " mask = y == i\n", + " ax.scatter(embedding[mask, 0], embedding[mask, 1],\n", + " alpha=0.7, color=colors[i], s=40, edgecolors=\"k\",\n", + " linewidth=0.3, label=name)\n", + " ax.set_title(f\"Perplexity = {perp}\")\n", + " ax.set_xticks([])\n", + " ax.set_yticks([])\n", + "\n", + "axes[0].legend(fontsize=8)\n", + "plt.suptitle(\"t-SNE: Impact of Perplexity\", 
fontsize=14, y=1.04)\n", + "plt.tight_layout()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Observations on perplexity:**\n", + "- Low perplexity (5): focuses on very local neighbors β€” clusters may fragment.\n", + "- High perplexity (50): considers more neighbors β€” clusters become rounder and\n", + " more global structure is visible, but fine local detail may blur.\n", + "- There is no single \"correct\" perplexity; try several and look for consistent\n", + " patterns." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "## 5. Anomaly Detection\n", + "\n", + "### Why Unsupervised Anomaly Detection?\n", + "\n", + "In many real-world scenarios, labeled anomalies are scarce or non-existent:\n", + "\n", + "| Domain | Normal | Anomaly |\n", + "|--------|--------|--------|\n", + "| Banking | Legitimate transactions | Fraud |\n", + "| Manufacturing | Good products | Defects |\n", + "| Cybersecurity | Regular traffic | Intrusions |\n", + "\n", + "Unsupervised methods learn the distribution of *normal* data and flag anything\n", + "that doesn't fit.\n", + "\n", + "### Approach 1 β€” Z-Score\n", + "\n", + "Flag a point as anomalous if any feature has a Z-score $|z| > \\tau$ (e.g.,\n", + "$\\tau = 3$). Simple, but assumes Gaussian features and works only for\n", + "univariate or low-dimensional data.\n", + "\n", + "### Approach 2 β€” Isolation Forest\n", + "\n", + "The **Isolation Forest** algorithm isolates observations by randomly selecting\n", + "a feature and a split value. 
Anomalies are easier to isolate (fewer splits\n", + "needed), so they have shorter average path lengths in the trees.\n", + "\n", + "Advantages:\n", + "- Works well in high dimensions\n", + "- No distribution assumptions\n", + "- Linear time complexity" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.ensemble import IsolationForest\n", + "\n", + "np.random.seed(42)\n", + "\n", + "# Generate normal data: 2 clusters\n", + "normal_a = np.random.randn(150, 2) * 0.8 + np.array([2, 2])\n", + "normal_b = np.random.randn(150, 2) * 0.8 + np.array([-2, -2])\n", + "normal_data = np.vstack([normal_a, normal_b])\n", + "\n", + "# Inject 20 anomalies scattered far from the clusters\n", + "anomalies = np.random.uniform(low=-6, high=6, size=(20, 2))\n", + "\n", + "X_anom = np.vstack([normal_data, anomalies])\n", + "labels_true = np.array([0] * len(normal_data) + [1] * len(anomalies)) # 0=normal, 1=anomaly\n", + "\n", + "print(f\"Total points: {len(X_anom)} (normal: {len(normal_data)}, anomalies: {len(anomalies)})\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# --- Z-Score method ---\n", + "from scipy import stats\n", + "\n", + "z_scores = np.abs(stats.zscore(X_anom))\n", + "z_threshold = 3.0\n", + "z_anomaly_mask = (z_scores > z_threshold).any(axis=1)\n", + "\n", + "print(f\"Z-Score method detected {z_anomaly_mask.sum()} anomalies (threshold={z_threshold})\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# --- Isolation Forest ---\n", + "iso_forest = IsolationForest(n_estimators=200, contamination=0.06,\n", + " random_state=42)\n", + "iso_preds = iso_forest.fit_predict(X_anom) # 1 = normal, -1 = anomaly\n", + "iso_anomaly_mask = iso_preds == -1\n", + "\n", + "print(f\"Isolation Forest detected {iso_anomaly_mask.sum()} anomalies\")" + ] + }, + { + "cell_type": 
"code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "fig, axes = plt.subplots(1, 3, figsize=(18, 5))\n", + "\n", + "# Ground truth\n", + "axes[0].scatter(X_anom[labels_true == 0, 0], X_anom[labels_true == 0, 1],\n", + " c=\"steelblue\", s=30, alpha=0.6, label=\"Normal\")\n", + "axes[0].scatter(X_anom[labels_true == 1, 0], X_anom[labels_true == 1, 1],\n", + " c=\"red\", s=80, marker=\"X\", label=\"True Anomaly\")\n", + "axes[0].set_title(\"Ground Truth\")\n", + "axes[0].legend()\n", + "axes[0].grid(alpha=0.3)\n", + "\n", + "# Z-Score\n", + "axes[1].scatter(X_anom[~z_anomaly_mask, 0], X_anom[~z_anomaly_mask, 1],\n", + " c=\"steelblue\", s=30, alpha=0.6, label=\"Normal\")\n", + "axes[1].scatter(X_anom[z_anomaly_mask, 0], X_anom[z_anomaly_mask, 1],\n", + " c=\"red\", s=80, marker=\"X\", label=\"Detected Anomaly\")\n", + "axes[1].set_title(f\"Z-Score (threshold={z_threshold})\")\n", + "axes[1].legend()\n", + "axes[1].grid(alpha=0.3)\n", + "\n", + "# Isolation Forest\n", + "axes[2].scatter(X_anom[~iso_anomaly_mask, 0], X_anom[~iso_anomaly_mask, 1],\n", + " c=\"steelblue\", s=30, alpha=0.6, label=\"Normal\")\n", + "axes[2].scatter(X_anom[iso_anomaly_mask, 0], X_anom[iso_anomaly_mask, 1],\n", + " c=\"red\", s=80, marker=\"X\", label=\"Detected Anomaly\")\n", + "axes[2].set_title(\"Isolation Forest\")\n", + "axes[2].legend()\n", + "axes[2].grid(alpha=0.3)\n", + "\n", + "plt.suptitle(\"Anomaly Detection Comparison\", fontsize=14, y=1.02)\n", + "plt.tight_layout()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Key takeaway:** The Isolation Forest typically outperforms the Z-Score\n", + "method, especially when the data is multi-modal or the anomalies are not simply\n", + "extreme values along a single axis." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "## 6. 
Capstone Project β€” Customer Segmentation\n", + "\n", + "We will build a complete customer-segmentation pipeline:\n", + "\n", + "1. Generate & save a synthetic customer dataset\n", + "2. Feature scaling\n", + "3. Dimensionality reduction with PCA\n", + "4. Elbow method to choose optimal $K$\n", + "5. K-Means clustering\n", + "6. Segment profiling & visualization\n", + "7. Business recommendations" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 6.1 Generate Synthetic Customer Data\n", + "\n", + "We create five features that mimic a retail scenario:\n", + "\n", + "| Feature | Description |\n", + "|---------|-------------|\n", + "| `age` | Customer age (18–70) |\n", + "| `income` | Annual income in $k (15–150) |\n", + "| `spending_score` | In-store spending score (1–100) |\n", + "| `visits` | Monthly store visits (0–30) |\n", + "| `online_ratio` | Fraction of purchases made online (0–1) |" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "import os\n", + "\n", + "np.random.seed(42)\n", + "n_customers = 500\n", + "\n", + "# Segment 1: Young, moderate income, high online, high spending\n", + "seg1 = {\n", + " \"age\": np.random.normal(25, 4, 130).clip(18, 40),\n", + " \"income\": np.random.normal(45, 12, 130).clip(15, 80),\n", + " \"spending_score\": np.random.normal(75, 10, 130).clip(1, 100),\n", + " \"visits\": np.random.normal(8, 3, 130).clip(0, 30),\n", + " \"online_ratio\": np.random.normal(0.75, 0.1, 130).clip(0, 1),\n", + "}\n", + "\n", + "# Segment 2: Middle-aged, high income, balanced channel, moderate spending\n", + "seg2 = {\n", + " \"age\": np.random.normal(42, 6, 150).clip(28, 60),\n", + " \"income\": np.random.normal(95, 18, 150).clip(50, 150),\n", + " \"spending_score\": np.random.normal(55, 12, 150).clip(1, 100),\n", + " \"visits\": np.random.normal(15, 5, 150).clip(0, 30),\n", + " \"online_ratio\": np.random.normal(0.45, 0.15, 
150).clip(0, 1),\n", + "}\n", + "\n", + "# Segment 3: Older, lower income, low online, low spending\n", + "seg3 = {\n", + " \"age\": np.random.normal(58, 7, 120).clip(40, 70),\n", + " \"income\": np.random.normal(35, 10, 120).clip(15, 70),\n", + " \"spending_score\": np.random.normal(25, 10, 120).clip(1, 100),\n", + " \"visits\": np.random.normal(20, 5, 120).clip(0, 30),\n", + " \"online_ratio\": np.random.normal(0.15, 0.08, 120).clip(0, 1),\n", + "}\n", + "\n", + "# Segment 4: Mixed ages, very high income, high spending, moderate visits\n", + "seg4 = {\n", + " \"age\": np.random.normal(38, 10, 100).clip(18, 70),\n", + " \"income\": np.random.normal(120, 15, 100).clip(80, 150),\n", + " \"spending_score\": np.random.normal(85, 8, 100).clip(1, 100),\n", + " \"visits\": np.random.normal(12, 4, 100).clip(0, 30),\n", + " \"online_ratio\": np.random.normal(0.55, 0.15, 100).clip(0, 1),\n", + "}\n", + "\n", + "frames = []\n", + "for seg in [seg1, seg2, seg3, seg4]:\n", + " frames.append(pd.DataFrame(seg))\n", + "\n", + "df_customers = pd.concat(frames, ignore_index=True)\n", + "df_customers = df_customers.sample(frac=1, random_state=42).reset_index(drop=True)\n", + "\n", + "df_customers[\"age\"] = df_customers[\"age\"].round(0).astype(int)\n", + "df_customers[\"income\"] = df_customers[\"income\"].round(1)\n", + "df_customers[\"spending_score\"] = df_customers[\"spending_score\"].round(0).astype(int)\n", + "df_customers[\"visits\"] = df_customers[\"visits\"].round(0).astype(int)\n", + "df_customers[\"online_ratio\"] = df_customers[\"online_ratio\"].round(2)\n", + "\n", + "# Save to CSV\n", + "dataset_dir = os.path.join(os.path.dirname(os.getcwd()), \"datasets\")\n", + "os.makedirs(dataset_dir, exist_ok=True)\n", + "csv_path = os.path.join(dataset_dir, \"customers.csv\")\n", + "df_customers.to_csv(csv_path, index=False)\n", + "print(f\"Saved {len(df_customers)} rows to {csv_path}\")\n", + "df_customers.head(10)" + ] + }, + { + "cell_type": "code", + "execution_count": null, 
+ "metadata": {}, + "outputs": [], + "source": [ + "df_customers.describe().round(2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 6.2 Feature Scaling" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.preprocessing import StandardScaler\n", + "\n", + "feature_cols = [\"age\", \"income\", \"spending_score\", \"visits\", \"online_ratio\"]\n", + "X_cust = df_customers[feature_cols].values\n", + "\n", + "scaler = StandardScaler()\n", + "X_scaled = scaler.fit_transform(X_cust)\n", + "\n", + "print(\"Scaled means (β‰ˆ0):\", np.round(X_scaled.mean(axis=0), 4))\n", + "print(\"Scaled stds (β‰ˆ1):\", np.round(X_scaled.std(axis=0), 4))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 6.3 PCA for Dimensionality Reduction" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "pca_cust = PCA(n_components=5)\n", + "X_pca_cust = pca_cust.fit_transform(X_scaled)\n", + "\n", + "cum_var = np.cumsum(pca_cust.explained_variance_ratio_)\n", + "\n", + "plt.figure(figsize=(7, 4))\n", + "plt.bar(range(1, 6), pca_cust.explained_variance_ratio_,\n", + " color=\"steelblue\", edgecolor=\"black\", alpha=0.7, label=\"Individual\")\n", + "plt.step(range(1, 6), cum_var, where=\"mid\", color=\"darkorange\",\n", + " linewidth=2, label=\"Cumulative\")\n", + "plt.axhline(0.90, color=\"red\", linestyle=\"--\", alpha=0.7, label=\"90% threshold\")\n", + "plt.xlabel(\"Principal Component\")\n", + "plt.ylabel(\"Variance Explained\")\n", + "plt.title(\"Customer Data β€” PCA Variance Explained\")\n", + "plt.xticks(range(1, 6))\n", + "plt.legend()\n", + "plt.tight_layout()\n", + "plt.show()\n", + "\n", + "n_keep = np.argmax(cum_var >= 0.90) + 1\n", + "print(f\"\\nComponents needed for β‰₯90% variance: {n_keep}\")\n", + "print(f\"Using first 2 components for visualization ({cum_var[1]:.1%} variance).\")" + ] 
+ }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 6.4 K-Means β€” Elbow Method" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.cluster import KMeans\n", + "\n", + "K_range = range(2, 11)\n", + "inertias = []\n", + "\n", + "for k in K_range:\n", + " km = KMeans(n_clusters=k, n_init=10, random_state=42)\n", + " km.fit(X_scaled)\n", + " inertias.append(km.inertia_)\n", + "\n", + "plt.figure(figsize=(8, 4))\n", + "plt.plot(list(K_range), inertias, \"o-\", linewidth=2, color=\"steelblue\")\n", + "plt.xlabel(\"Number of Clusters (K)\")\n", + "plt.ylabel(\"Inertia (within-cluster sum of squares)\")\n", + "plt.title(\"Elbow Method for Optimal K\")\n", + "plt.xticks(list(K_range))\n", + "plt.grid(alpha=0.3)\n", + "plt.tight_layout()\n", + "plt.show()\n", + "\n", + "print(\"Look for the 'elbow' β€” the point where adding more clusters yields\")\n", + "print(\"diminishing returns. Here K=4 appears to be a good choice.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 6.5 Fit K-Means with Optimal K" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "optimal_k = 4\n", + "km_final = KMeans(n_clusters=optimal_k, n_init=20, random_state=42)\n", + "cluster_labels = km_final.fit_predict(X_scaled)\n", + "\n", + "df_customers[\"cluster\"] = cluster_labels\n", + "print(f\"Cluster distribution:\\n{df_customers['cluster'].value_counts().sort_index()}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 6.6 Segment Profiling" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "segment_profile = df_customers.groupby(\"cluster\")[feature_cols].mean().round(2)\n", + "segment_profile[\"count\"] = df_customers.groupby(\"cluster\").size()\n", + "print(\"=== Segment Profiles ===\")\n", + "segment_profile" 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Small-multiples bar charts: mean of each feature by cluster.\n", + "# Features live on very different scales (income in $k vs. online_ratio in 0-1),\n", + "# so each subplot keeps its own y-axis instead of a shared one.\n", + "fig, axes = plt.subplots(1, len(feature_cols), figsize=(18, 4))\n", + "cluster_colors = [\"#1f77b4\", \"#ff7f0e\", \"#2ca02c\", \"#d62728\"]\n", + "\n", + "for idx, feat in enumerate(feature_cols):\n", + " means = df_customers.groupby(\"cluster\")[feat].mean()\n", + " axes[idx].bar(means.index, means.values,\n", + " color=cluster_colors[:optimal_k], edgecolor=\"black\")\n", + " axes[idx].set_title(feat, fontsize=11)\n", + " axes[idx].set_xlabel(\"Cluster\")\n", + " axes[idx].set_xticks(range(optimal_k))\n", + "\n", + "axes[0].set_ylabel(\"Mean Value\")\n", + "plt.suptitle(\"Feature Means by Cluster\", fontsize=14, y=1.02)\n", + "plt.tight_layout()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 6.7 Visualize Segments in 2-D (PCA Projection)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "X_vis = X_pca_cust[:, :2]\n", + "centroids_scaled = km_final.cluster_centers_\n", + "centroids_2d = pca_cust.transform(centroids_scaled)[:, :2] # project centroids\n", + "\n", + "plt.figure(figsize=(9, 7))\n", + "for c in range(optimal_k):\n", + " mask = cluster_labels == c\n", + " plt.scatter(X_vis[mask, 0], X_vis[mask, 1], s=40, alpha=0.6,\n", + " color=cluster_colors[c], edgecolors=\"k\", linewidth=0.3,\n", + " label=f\"Segment {c}\")\n", + "\n", + "plt.scatter(centroids_2d[:, 0], centroids_2d[:, 1], s=250, c=\"black\",\n", + " marker=\"*\", zorder=5, label=\"Centroids\")\n", + "\n", + "plt.xlabel(f\"PC 1 ({pca_cust.explained_variance_ratio_[0]:.1%} var)\")\n", + "plt.ylabel(f\"PC 2 ({pca_cust.explained_variance_ratio_[1]:.1%} var)\")\n", + "plt.title(\"Customer Segments — PCA 2-D Projection\")\n", + "plt.legend()\n", + "plt.grid(alpha=0.3)\n", + "plt.tight_layout()\n", + 
"plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 6.8 Business Recommendations" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "recommendations = {\n", + " 0: {\n", + " \"label\": \"Budget Traditionalists\",\n", + " \"description\": \"Older customers with low income and spending, who shop mostly in-store.\",\n", + " \"actions\": [\n", + " \"Offer loyalty discounts and in-store promotions\",\n", + " \"Simplify the in-store experience\",\n", + " \"Provide personalized coupons at checkout\",\n", + " ],\n", + " },\n", + " 1: {\n", + " \"label\": \"Young Digital Shoppers\",\n", + " \"description\": \"Young customers with moderate income but high online engagement and spending.\",\n", + " \"actions\": [\n", + " \"Invest in mobile app features and social media marketing\",\n", + " \"Offer free shipping and digital-only deals\",\n", + " \"Launch a referral program to leverage their network\",\n", + " ],\n", + " },\n", + " 2: {\n", + " \"label\": \"Premium High-Spenders\",\n", + " \"description\": \"High income, high spending score β€” the most valuable segment.\",\n", + " \"actions\": [\n", + " \"Create a VIP/premium loyalty tier\",\n", + " \"Offer early access to new products\",\n", + " \"Assign dedicated account managers for retention\",\n", + " ],\n", + " },\n", + " 3: {\n", + " \"label\": \"Established Moderates\",\n", + " \"description\": \"Middle-aged, higher income, moderate spending, balanced channel use.\",\n", + " \"actions\": [\n", + " \"Cross-sell higher-margin products\",\n", + " \"Provide omni-channel convenience (buy online, pick up in store)\",\n", + " \"Target with email campaigns for seasonal offers\",\n", + " ],\n", + " },\n", + "}\n", + "\n", + "for seg_id, info in recommendations.items():\n", + " count = (cluster_labels == seg_id).sum()\n", + " print(f\"\\n{'='*60}\")\n", + " print(f\"Segment {seg_id}: {info['label']} (n={count})\")\n", + " 
print(f\"{'='*60}\")\n", + " print(f\" {info['description']}\")\n", + " print(\" Recommended actions:\")\n", + " for action in info[\"actions\"]:\n", + " print(f\" β€’ {action}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "## 7. Summary & Key Takeaways\n", + "\n", + "### What We Covered in This Notebook\n", + "\n", + "| Topic | Key Idea |\n", + "|-------|----------|\n", + "| **PCA** | Linear projection onto directions of maximum variance |\n", + "| **t-SNE** | Non-linear embedding that preserves local neighborhoods β€” for visualization only |\n", + "| **Z-Score Anomaly Detection** | Simple threshold on standardized values |\n", + "| **Isolation Forest** | Tree-based anomaly detector β€” fast, distribution-free |\n", + "| **Customer Segmentation** | End-to-end pipeline: scale β†’ PCA β†’ K-Means β†’ profile β†’ recommend |\n", + "\n", + "### Chapter 8 Recap\n", + "\n", + "Across the three notebooks you have:\n", + "\n", + "1. **Notebook 01 (Introduction):** Learned K-Means, hierarchical clustering, and evaluation metrics.\n", + "2. **Notebook 02 (Intermediate):** Explored DBSCAN, Gaussian Mixture Models, and silhouette analysis.\n", + "3. 
**Notebook 03 (Advanced β€” this one):** Mastered PCA, t-SNE, anomaly detection, and built a full capstone project.\n", + "\n", + "### What's Next\n", + "\n", + "In **Chapter 9: Deep Learning** we'll move from classical ML to neural\n", + "networks β€” starting with perceptrons, backpropagation, and building your first\n", + "deep network with PyTorch/Keras.\n", + "\n", + "---\n", + "*Generated by Berta AI | Created by Luigi Pascal Rondanini*" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "name": "python", + "version": "3.10.0" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} \ No newline at end of file diff --git a/chapters/chapter-08-unsupervised-learning/requirements.txt b/chapters/chapter-08-unsupervised-learning/requirements.txt new file mode 100644 index 0000000..8781803 --- /dev/null +++ b/chapters/chapter-08-unsupervised-learning/requirements.txt @@ -0,0 +1,7 @@ +jupyter +notebook +numpy +pandas +matplotlib +scikit-learn +scipy diff --git a/chapters/chapter-08-unsupervised-learning/scripts/unsupervised_toolkit.py b/chapters/chapter-08-unsupervised-learning/scripts/unsupervised_toolkit.py new file mode 100644 index 0000000..c1b1659 --- /dev/null +++ b/chapters/chapter-08-unsupervised-learning/scripts/unsupervised_toolkit.py @@ -0,0 +1,423 @@ +""" +Unsupervised Learning Toolkit - Core implementations and plotting utilities. +Generated by Berta AI | Created by Luigi Pascal Rondanini +""" + +import numpy as np +import matplotlib.pyplot as plt +from sklearn.metrics import silhouette_samples, silhouette_score +from scipy.cluster.hierarchy import dendrogram, linkage +from sklearn.datasets import make_blobs + + +class KMeansScratch: + """ + K-Means clustering implementation from scratch. + """ + + def __init__(self, n_clusters=3, max_iters=100, random_state=42): + """ + Initialize K-Means. + + Parameters + ---------- + n_clusters : int + Number of clusters. 
+ max_iters : int + Maximum iterations for the algorithm. + random_state : int + Random seed for reproducibility. + """ + self.n_clusters = n_clusters + self.max_iters = max_iters + self.random_state = random_state + self.centroids = None + self.labels_ = None + self.inertia_history = [] + + def fit(self, X): + """ + Fit K-Means to the data. + + Parameters + ---------- + X : array-like of shape (n_samples, n_features) + Training data. + + Returns + ------- + self + """ + X = np.asarray(X, dtype=float) # float dtype so centroid means are not truncated + rng = np.random.RandomState(self.random_state) # local RNG; don't mutate global state + self.inertia_history = [] # reset so repeated fit() calls don't accumulate + n_samples = X.shape[0] + + # Random centroid initialization + idx = rng.choice(n_samples, self.n_clusters, replace=False) + self.centroids = X[idx].copy() + + for _ in range(self.max_iters): + # Assign points to nearest centroid + labels = self._assign_clusters(X) + # Recompute centroids + new_centroids = np.zeros_like(self.centroids) + for k in range(self.n_clusters): + mask = labels == k + if np.any(mask): + new_centroids[k] = X[mask].mean(axis=0) + else: + new_centroids[k] = self.centroids[k] + + inertia = self._compute_inertia(X, labels, new_centroids) + self.inertia_history.append(inertia) + + if np.allclose(self.centroids, new_centroids): + break + self.centroids = new_centroids + + self.labels_ = self._assign_clusters(X) + return self + + def _assign_clusters(self, X): + """Assign each point to the nearest centroid.""" + distances = np.linalg.norm(X[:, np.newaxis] - self.centroids, axis=2) + return np.argmin(distances, axis=1) + + def predict(self, X): + """ + Predict cluster labels for new data. + + Parameters + ---------- + X : array-like of shape (n_samples, n_features) + Data to predict. + + Returns + ------- + labels : ndarray of shape (n_samples,) + Cluster indices. + """ + X = np.asarray(X, dtype=float) + return self._assign_clusters(X) + + def fit_predict(self, X): + """ + Fit and return cluster labels. + + Parameters + ---------- + X : array-like of shape (n_samples, n_features) + Training data. 
+ + Returns + ------- + labels : ndarray of shape (n_samples,) + Cluster indices. + """ + return self.fit(X).labels_ + + def _compute_inertia(self, X, labels, centroids): + """ + Compute within-cluster sum of squares (inertia). + + Parameters + ---------- + X : ndarray + Data points. + labels : ndarray + Cluster labels. + centroids : ndarray + Cluster centroids. + + Returns + ------- + inertia : float + """ + inertia = 0.0 + for k in range(self.n_clusters): + mask = labels == k + if np.any(mask): + inertia += np.sum((X[mask] - centroids[k]) ** 2) + return inertia + + +class PCAScratch: + """ + Principal Component Analysis implementation from scratch. + """ + + def __init__(self, n_components=2): + """ + Initialize PCA. + + Parameters + ---------- + n_components : int + Number of components to keep. + """ + self.n_components = n_components + self.mean_ = None + self.components_ = None + self.explained_variance_ = None + self.total_variance_ = None + + def fit(self, X): + """ + Fit PCA to the data. + + Parameters + ---------- + X : array-like of shape (n_samples, n_features) + Training data. + + Returns + ------- + self + """ + X = np.asarray(X) + self.mean_ = X.mean(axis=0) + X_centered = X - self.mean_ + + # Covariance matrix + cov = np.cov(X_centered.T) + + # Eigendecomposition + eigenvalues, eigenvectors = np.linalg.eigh(cov) + idx = np.argsort(eigenvalues)[::-1] + eigenvalues = eigenvalues[idx] + eigenvectors = eigenvectors[:, idx] + + n = min(self.n_components, len(eigenvalues)) + self.components_ = eigenvectors[:, :n].T + self.explained_variance_ = eigenvalues[:n] + # Keep the variance of ALL components so explained_variance_ratio_ + # is relative to the full data, matching scikit-learn's semantics. + self.total_variance_ = eigenvalues.sum() + return self + + def transform(self, X): + """ + Project data onto principal components. + + Parameters + ---------- + X : array-like of shape (n_samples, n_features) + Data to transform. 
+ + Returns + ------- + X_transformed : ndarray of shape (n_samples, n_components) + """ + X = np.asarray(X) + X_centered = X - self.mean_ + return X_centered @ self.components_.T + + def fit_transform(self, X): + """ + Fit and transform in one step. + + Parameters + ---------- + X : array-like of shape (n_samples, n_features) + Training data. + + Returns + ------- + X_transformed : ndarray of shape (n_samples, n_components) + """ + return self.fit(X).transform(X) + + @property + def explained_variance_ratio_(self): + """Fraction of the total variance explained by each kept component.""" + # Prefer the true total variance (stored by fit); fall back to the sum of + # the kept eigenvalues if it is unavailable. + total = getattr(self, "total_variance_", None) or np.sum(self.explained_variance_) + return self.explained_variance_ / total if total > 0 else self.explained_variance_ + + +def plot_clusters(X, labels, centroids=None, title="Clusters"): + """ + Scatter plot of clustered data with optional centroid markers. + + Parameters + ---------- + X : array-like + Data points (2D). + labels : array-like + Cluster labels. + centroids : array-like, optional + Centroids to plot as markers. + title : str + Plot title. + """ + X = np.asarray(X) + labels = np.asarray(labels) + plt.figure(figsize=(8, 6)) + scatter = plt.scatter(X[:, 0], X[:, 1], c=labels, cmap="viridis", alpha=0.7, edgecolors="k") + if centroids is not None: + centroids = np.asarray(centroids) + plt.scatter(centroids[:, 0], centroids[:, 1], c="red", marker="X", s=200, edgecolors="black") + plt.colorbar(scatter, label="Cluster") + plt.title(title) + plt.xlabel("Feature 1") + plt.ylabel("Feature 2") + plt.tight_layout() + plt.show() + + +def plot_elbow(K_range, inertias, title="Elbow Method"): + """ + Line plot of inertia vs K for elbow method. + + Parameters + ---------- + K_range : array-like + Range of K values. + inertias : array-like + Inertia for each K. + title : str + Plot title. 
+ """ + plt.figure(figsize=(8, 5)) + plt.plot(K_range, inertias, "bo-") + plt.xlabel("Number of clusters (K)") + plt.ylabel("Inertia") + plt.title(title) + plt.grid(True, alpha=0.3) + plt.tight_layout() + plt.show() + + +def plot_silhouette(X, labels, title="Silhouette Analysis"): + """ + Silhouette plot using sklearn.metrics. + + Parameters + ---------- + X : array-like + Data points. + labels : array-like + Cluster labels. + title : str + Plot title. + """ + X = np.asarray(X) + labels = np.asarray(labels) + n_clusters = len(np.unique(labels)) + silhouette_vals = silhouette_samples(X, labels) + score = silhouette_score(X, labels) + + plt.figure(figsize=(10, 6)) + y_lower = 10 + for i in range(n_clusters): + cluster_silhouette = silhouette_vals[labels == i] + cluster_silhouette.sort() + size = cluster_silhouette.shape[0] + y_upper = y_lower + size + plt.fill_betweenx(np.arange(y_lower, y_upper), 0, cluster_silhouette, alpha=0.7) + plt.text(-0.05, y_lower + 0.5 * size, str(i)) + y_lower = y_upper + 10 + + plt.axvline(x=score, color="red", linestyle="--", label=f"Avg: {score:.3f}") + plt.xlabel("Silhouette coefficient") + plt.ylabel("Cluster label") + plt.title(title) + plt.legend() + plt.tight_layout() + plt.show() + + +def plot_dendrogram(X, method="ward", title="Dendrogram"): + """ + Hierarchical clustering dendrogram using scipy. + + Parameters + ---------- + X : array-like + Data points. + method : str + Linkage method ('ward', 'complete', 'average', 'single'). + title : str + Plot title. + """ + X = np.asarray(X) + linkage_matrix = linkage(X, method=method) + plt.figure(figsize=(10, 6)) + dendrogram(linkage_matrix) + plt.title(title) + plt.xlabel("Sample index or (cluster size)") + plt.ylabel("Distance") + plt.tight_layout() + plt.show() + + +def plot_pca_variance(pca, title="PCA Variance Explained"): + """ + Bar chart and cumulative line for PCA variance explained. + + Parameters + ---------- + pca : PCAScratch + Fitted PCA object. + title : str + Plot title. 
+ """ + ratios = pca.explained_variance_ratio_ + cumsum = np.cumsum(ratios) + n = len(ratios) + + fig, ax1 = plt.subplots(figsize=(8, 5)) + x = np.arange(1, n + 1) + ax1.bar(x - 0.2, ratios, 0.4, label="Individual", color="steelblue") + ax1.set_xlabel("Principal Component") + ax1.set_ylabel("Variance explained ratio") + ax1.set_xticks(x) + + ax2 = ax1.twinx() + ax2.plot(x, cumsum, "ro-", label="Cumulative") + ax2.set_ylabel("Cumulative variance") + ax2.set_ylim(0, 1.05) + + plt.title(title) + fig.legend(loc="upper right", bbox_to_anchor=(1, 1), bbox_transform=ax1.transAxes) + plt.tight_layout() + plt.show() + + +def plot_anomalies(X, labels, title="Anomaly Detection"): + """ + Scatter plot for normal vs anomaly points. + + Parameters + ---------- + X : array-like + Data points (2D). + labels : array-like + Binary labels (0=normal, 1=anomaly or similar). + title : str + Plot title. + """ + X = np.asarray(X) + labels = np.asarray(labels) + plt.figure(figsize=(8, 6)) + normal = labels == 0 + anomaly = labels == 1 + plt.scatter(X[normal, 0], X[normal, 1], c="steelblue", alpha=0.7, label="Normal") + plt.scatter(X[anomaly, 0], X[anomaly, 1], c="red", alpha=0.7, label="Anomaly") + plt.xlabel("Feature 1") + plt.ylabel("Feature 2") + plt.title(title) + plt.legend() + plt.tight_layout() + plt.show() + + +if __name__ == "__main__": + # Demo: Generate blobs, run KMeansScratch + X_blobs, _ = make_blobs(n_samples=300, n_features=2, centers=4, random_state=42) + kmeans = KMeansScratch(n_clusters=4, max_iters=100, random_state=42) + kmeans.fit(X_blobs) + print("KMeansScratch inertia:", kmeans.inertia_history[-1] if kmeans.inertia_history else "N/A") + + # Demo: Run PCAScratch on 4D dataset + X_4d, _ = make_blobs(n_samples=200, n_features=4, centers=3, random_state=42) + pca = PCAScratch(n_components=4) + pca.fit(X_4d) + print("PCA variance explained:", pca.explained_variance_ratio_) + + print("Demo complete.") diff --git 
a/chapters/chapter-08-unsupervised-learning/scripts/utilities.py b/chapters/chapter-08-unsupervised-learning/scripts/utilities.py new file mode 100644 index 0000000..bdf4c31 --- /dev/null +++ b/chapters/chapter-08-unsupervised-learning/scripts/utilities.py @@ -0,0 +1,104 @@ +""" +Helper utilities for unsupervised learning. +Generated by Berta AI | Created by Luigi Pascal Rondanini +""" + +import numpy as np +import pandas as pd +from sklearn.preprocessing import StandardScaler + + +def scale_features(X): + """ + Scale features using StandardScaler (zero mean, unit variance). + + Parameters + ---------- + X : array-like of shape (n_samples, n_features) + Data to scale. + + Returns + ------- + X_scaled : ndarray of shape (n_samples, n_features) + Scaled data. + """ + scaler = StandardScaler() + return scaler.fit_transform(X) + + +def generate_synthetic_customers(n=300, seed=42): + """ + Generate synthetic customer data for clustering/segmentation. + + Parameters + ---------- + n : int + Number of customers to generate. + seed : int + Random seed for reproducibility. + + Returns + ------- + df : pandas.DataFrame + DataFrame with columns: age, income, spending_score, visits, online_ratio. + """ + np.random.seed(seed) + age = np.random.randint(18, 70, size=n) + income = np.random.exponential(scale=30000, size=n).astype(int) + 20000 + spending_score = np.random.exponential(scale=50, size=n).astype(int) + 10 + visits = np.random.poisson(lam=5, size=n) + 1 + online_ratio = np.random.beta(2, 2, size=n) + return pd.DataFrame({ + "age": age, + "income": income, + "spending_score": spending_score, + "visits": visits, + "online_ratio": online_ratio, + }) + + +def generate_synthetic_sensors(n=200, anomaly_fraction=0.1, seed=42): + """ + Generate synthetic sensor data with anomalies. + + Parameters + ---------- + n : int + Number of sensor readings. + anomaly_fraction : float + Fraction of readings that are anomalies (0 to 1). + seed : int + Random seed for reproducibility. 
+ + Returns + ------- + df : pandas.DataFrame + DataFrame with columns: temp, pressure, vibration, is_anomaly. + """ + np.random.seed(seed) + n_anomaly = int(n * anomaly_fraction) + n_normal = n - n_anomaly + + # Normal readings + temp_normal = np.random.normal(25, 2, n_normal) + pressure_normal = np.random.normal(100, 5, n_normal) + vibration_normal = np.random.exponential(0.5, n_normal) + + # Anomalous readings (outliers) + temp_anomaly = np.random.uniform(50, 90, n_anomaly) + pressure_anomaly = np.random.uniform(150, 200, n_anomaly) + vibration_anomaly = np.random.exponential(5, n_anomaly) + + temp = np.concatenate([temp_normal, temp_anomaly]) + pressure = np.concatenate([pressure_normal, pressure_anomaly]) + vibration = np.concatenate([vibration_normal, vibration_anomaly]) + is_anomaly = np.concatenate([np.zeros(n_normal, dtype=int), np.ones(n_anomaly, dtype=int)]) + + # Shuffle + idx = np.random.permutation(n) + return pd.DataFrame({ + "temp": temp[idx], + "pressure": pressure[idx], + "vibration": vibration[idx], + "is_anomaly": is_anomaly[idx], + }) diff --git a/docs/chapters/assets/diagrams/anomaly_detection.svg b/docs/chapters/assets/diagrams/anomaly_detection.svg new file mode 100644 index 0000000..92452f7 --- /dev/null +++ b/docs/chapters/assets/diagrams/anomaly_detection.svg @@ -0,0 +1,90 @@ + + + + + + Statistical (Z-Score) + + + + + + + + -3 sigma + +3 sigma + + + + + + + + ! + ! + Points beyond threshold + Simple, fast, assumes normal + + + + Isolation Forest + + + + + + + + + + anomaly + + + + Split 1 + Split 2 + + + short path + + + long path + Anomalies isolated quickly + Works with any distribution + + + + Applications + + + $ + Fraud Detection + Unusual transactions + + + ! 
+ Manufacturing QA + Defective products + + + + + + Network Intrusion + Unusual traffic patterns + + + + + Health Monitoring + Abnormal sensor readings + + + IoT + Predictive Maintenance + Equipment failure warnings + diff --git a/docs/chapters/assets/diagrams/clustering_algorithms.svg b/docs/chapters/assets/diagrams/clustering_algorithms.svg new file mode 100644 index 0000000..f17f560 --- /dev/null +++ b/docs/chapters/assets/diagrams/clustering_algorithms.svg @@ -0,0 +1,92 @@ + + + + + + K-Means + + + + + + + + + + + + + + + + + + + + + + + + + + + Spherical clusters, fixed K + Assigns to nearest centroid + + + + Hierarchical + + + + + + + + + + + + + + + + cut + + A + B + C + D + E + Dendrogram, cut to get K + Bottom-up merging + + + + DBSCAN + + + + + + + + + + + + + + noise + noise + + + eps + Arbitrary shapes, auto K + Density-based, detects noise + diff --git a/docs/chapters/assets/diagrams/dimensionality_reduction.svg b/docs/chapters/assets/diagrams/dimensionality_reduction.svg new file mode 100644 index 0000000..7f4b92b --- /dev/null +++ b/docs/chapters/assets/diagrams/dimensionality_reduction.svg @@ -0,0 +1,81 @@ + + + + + + High-Dimensional Data + + + f1 f2 f3 f4 ... fN + + 2.1 0.3 1.7 4.2 ... 0.9 + 1.5 2.8 0.4 3.1 ... 1.2 + 3.2 1.1 2.9 0.8 ... 2.4 + ... n rows x d features ... 
+ d = 50, 100, 1000+ + + Curse of dimensionality + Hard to visualize + Noisy, redundant features + Slow computation + + + + Reduce + + + + PCA (Linear) + + + PC1 + PC2 + + + + + + + + + + + + + + + + + PC1: 72% + PC2: 18% + Max variance directions + Global structure preserved + + + + t-SNE (Nonlinear) + + + + + + + + + + + + + + + + + Preserves local neighborhoods + Best for visualization + diff --git a/docs/chapters/chapter-08.md b/docs/chapters/chapter-08.md new file mode 100644 index 0000000..1a9eb96 --- /dev/null +++ b/docs/chapters/chapter-08.md @@ -0,0 +1,100 @@ +# Chapter 8: Unsupervised Learning + +Discover hidden patterns in unlabeled dataβ€”clustering, dimensionality reduction, and anomaly detection. + +--- + +## Metadata + +| Field | Value | +|-------|-------| +| **Track** | Practitioner | +| **Time** | 8 hours | +| **Prerequisites** | Chapters 1–6 | + +--- + +## Learning Objectives + +- Implement K-Means clustering from scratch using NumPy +- Apply hierarchical clustering and interpret dendrograms +- Use DBSCAN for density-based clustering with noise detection +- Evaluate clusters with silhouette scores and the elbow method +- Reduce dimensionality with PCA and t-SNE +- Detect anomalies with Isolation Forest and statistical methods +- Build a complete customer segmentation pipeline + +--- + +## What's Included + +### Notebooks + +| Notebook | Description | +|----------|-------------| +| `01_introduction.ipynb` | K-Means from scratch, evaluation, elbow method | +| `02_intermediate.ipynb` | Hierarchical, DBSCAN, Gaussian Mixture Models | +| `03_advanced.ipynb` | PCA, t-SNE, anomaly detection, customer segmentation capstone | + +### Scripts + +- `unsupervised_toolkit.py` β€” Core implementations (KMeansScratch, PCAScratch) and plotting utilities + +### Exercises + +- **5 exercises** with solutions (in `solutions/` branch) + +### SVG Diagrams + +- 3 visual diagrams for clustering algorithms, dimensionality reduction, and anomaly detection + +--- + + + +--- 
+ +## Read Online + +You can read the full chapter content right here on the website: + +- **[08.1 Introduction](content/ch08-01_introduction.md)** -- K-Means from scratch, silhouette scores, elbow method +- **[08.2 Intermediate](content/ch08-02_intermediate.md)** -- Hierarchical clustering, DBSCAN, Gaussian Mixture Models +- **[08.3 Advanced](content/ch08-03_advanced.md)** -- PCA, t-SNE, anomaly detection, customer segmentation capstone + +Or [try the code in the Playground](../playground.md). + +## How to Use This Chapter + +!!! tip "Quick Start" + Follow these steps to get coding in minutes. + +**1. Clone and install dependencies** + +```bash +git clone https://github.com/luigipascal/berta-chapters.git +cd berta-chapters +pip install -r requirements.txt +``` + +**2. Navigate to the chapter** + +```bash +cd chapters/chapter-08-unsupervised-learning +``` + +**3. Launch Jupyter** + +```bash +jupyter notebook notebooks/01_introduction.ipynb +``` + +!!! info "GitHub Folder" + All chapter materials live in: [`chapters/chapter-08-unsupervised-learning/`](https://github.com/luigipascal/berta-chapters/tree/main/chapters/chapter-08-unsupervised-learning/) + +!!! tip "SciPy" + This chapter uses SciPy for hierarchical clustering dendrograms. Ensure it's installed: `pip install scipy` + +--- + +**Created by Luigi Pascal Rondanini | Generated by Berta AI** diff --git a/docs/chapters/content/ch08-01_introduction.md b/docs/chapters/content/ch08-01_introduction.md new file mode 100644 index 0000000..1f2321e --- /dev/null +++ b/docs/chapters/content/ch08-01_introduction.md @@ -0,0 +1,448 @@ +# Ch 8: Unsupervised Learning - Introduction + +**Track**: Practitioner | [Try code in Playground](../../playground.md) | [Back to chapter overview](../chapter-08.md) + + +!!! tip "Read online or run locally" + You can read this content here on the web. 
To run the code interactively, + either use the [Playground](../../playground.md) or clone the repo and open + `chapters/chapter-08-unsupervised-learning/notebooks/01_introduction.ipynb` in Jupyter. + +--- + +# Chapter 8: Unsupervised Learning +## Notebook 01 - Introduction: Clustering Basics + +Unsupervised learning finds hidden patterns in data without labels. We start with the most fundamental algorithm: **K-Means clustering**. + +**What you'll learn:** +- The difference between supervised and unsupervised learning +- K-Means clustering from scratch using NumPy +- Evaluating clusters with inertia and silhouette score +- The elbow method for choosing K +- Scikit-learn's KMeans interface + +**Time estimate:** 2.5 hours + +--- + +## 1. Supervised vs Unsupervised Learning + +In **supervised learning**, every training example comes with a label β€” the "right answer" β€” and the model learns a mapping from inputs to outputs. Classification and regression are the classic examples. + +In **unsupervised learning**, there are **no labels at all**. The algorithm must discover structure in the data on its own. Common tasks include clustering (group similar points), dimensionality reduction (compress features), and anomaly detection (find unusual observations). + +This notebook focuses on **clustering** β€” specifically the **K-Means** algorithm. Let's start by generating some data and seeing what it looks like *without* labels. The left plot shows raw data (all same color); the right reveals the true clusters we want the algorithm to recover on its own. 
+ +```python +import numpy as np +import matplotlib.pyplot as plt +from sklearn.datasets import make_blobs + +np.random.seed(42) + +X, y_true = make_blobs( + n_samples=200, centers=3, cluster_std=0.9, random_state=42 +) + +fig, axes = plt.subplots(1, 2, figsize=(13, 5)) + +axes[0].scatter(X[:, 0], X[:, 1], c="steelblue", edgecolors="k", s=50, alpha=0.7) +axes[0].set_title("What we observe (no labels)", fontsize=14) +axes[0].set_xlabel("Feature 1") +axes[0].set_ylabel("Feature 2") + +colors = ["#e74c3c", "#2ecc71", "#3498db"] +for k in range(3): + mask = y_true == k + axes[1].scatter(X[mask, 0], X[mask, 1], c=colors[k], + edgecolors="k", s=50, alpha=0.7, label=f"Cluster {k}") +axes[1].set_title("True clusters (hidden from algorithm)", fontsize=14) +axes[1].set_xlabel("Feature 1") +axes[1].set_ylabel("Feature 2") +axes[1].legend() + +plt.tight_layout() +plt.show() +``` + +--- + +## 2. K-Means Algorithm + +K-Means is an iterative algorithm that partitions *n* data points into *K* clusters. It works in three repeating steps: + +**Step 1 β€” Initialize:** Pick *K* points as initial **centroids** (cluster centers). The simplest approach is to choose *K* data points at random. + +**Step 2 β€” Assign:** For every data point, compute the Euclidean distance to each centroid and assign the point to the **nearest** centroid. + +**Step 3 β€” Update:** Recompute each centroid as the **mean** of all points currently assigned to that cluster. + +**Repeat** Steps 2 and 3 until the assignments no longer change (or a maximum number of iterations is reached). 
+ +Let's implement K-Means from scratch using only NumPy: + +```python +class KMeansScratch: + """Minimal K-Means implementation using NumPy.""" + + def __init__(self, k=3, max_iters=100, random_state=42): + self.k = k + self.max_iters = max_iters + self.random_state = random_state + self.centroids = None + self.labels_ = None + self.inertia_ = None + self.inertia_history = [] + self.centroid_history = [] + self.label_history = [] + + def _euclidean_distances(self, X, centroids): + """Compute distance from every point to every centroid.""" + return np.sqrt(((X[:, np.newaxis] - centroids[np.newaxis]) ** 2).sum(axis=2)) + + def _compute_inertia(self, X, labels, centroids): + return sum( + np.sum((X[labels == k] - centroids[k]) ** 2) + for k in range(self.k) + ) + + def fit(self, X): + rng = np.random.RandomState(self.random_state) + n_samples = X.shape[0] + + # Step 1: random initialization + idx = rng.choice(n_samples, self.k, replace=False) + self.centroids = X[idx].copy() + + self.inertia_history = [] + self.centroid_history = [self.centroids.copy()] + self.label_history = [] + + for _ in range(self.max_iters): + # Step 2: assign + distances = self._euclidean_distances(X, self.centroids) + labels = np.argmin(distances, axis=1) + self.label_history.append(labels.copy()) + + # Step 3: update centroids + new_centroids = np.array([ + X[labels == k].mean(axis=0) if np.any(labels == k) + else self.centroids[k] + for k in range(self.k) + ]) + + inertia = self._compute_inertia(X, labels, new_centroids) + self.inertia_history.append(inertia) + self.centroid_history.append(new_centroids.copy()) + + if np.allclose(new_centroids, self.centroids): + break + self.centroids = new_centroids + + self.labels_ = labels + self.inertia_ = self.inertia_history[-1] + return self + + def predict(self, X): + distances = self._euclidean_distances(X, self.centroids) + return np.argmin(distances, axis=1) + + +km_scratch = KMeansScratch(k=3, random_state=42) +km_scratch.fit(X) + 
+print(f"Converged in {len(km_scratch.inertia_history)} iterations") +print(f"Final inertia: {km_scratch.inertia_:.2f}") +print(f"Centroids:\n{km_scratch.centroids}") +``` + +Now let's plot the ground truth alongside our K-Means result: + +```python +fig, axes = plt.subplots(1, 2, figsize=(13, 5)) + +colors_map = np.array(["#e74c3c", "#2ecc71", "#3498db"]) + +for k in range(3): + mask = y_true == k + axes[0].scatter(X[mask, 0], X[mask, 1], c=colors[k], + edgecolors="k", s=50, alpha=0.7, label=f"True {k}") +axes[0].set_title("Ground Truth", fontsize=14) +axes[0].legend() +axes[0].set_xlabel("Feature 1") +axes[0].set_ylabel("Feature 2") + +axes[1].scatter(X[:, 0], X[:, 1], c=colors_map[km_scratch.labels_], + edgecolors="k", s=50, alpha=0.7) +axes[1].scatter(km_scratch.centroids[:, 0], km_scratch.centroids[:, 1], + c=colors, marker="X", s=250, edgecolors="k", linewidths=1.5, + zorder=5, label="Centroids") +axes[1].set_title("K-Means (scratch) result", fontsize=14) +axes[1].legend() +axes[1].set_xlabel("Feature 1") +axes[1].set_ylabel("Feature 2") + +plt.tight_layout() +plt.show() +``` + +--- + +## 3. Step-by-Step Visualization + +To build intuition for how the algorithm converges, let's watch the first four iterations unfold. Each subplot shows the cluster assignments and centroid positions at a particular iteration. Notice how the centroids migrate toward the cluster centers with each iteration. 
+ +```python +fig, axes = plt.subplots(2, 2, figsize=(12, 10)) +axes = axes.ravel() + +colors_map = np.array(["#e74c3c", "#2ecc71", "#3498db"]) + +n_show = min(4, len(km_scratch.label_history)) + +for i in range(n_show): + ax = axes[i] + labels_i = km_scratch.label_history[i] + centroids_i = km_scratch.centroid_history[i] + centroids_next = km_scratch.centroid_history[i + 1] + + ax.scatter(X[:, 0], X[:, 1], c=colors_map[labels_i], + edgecolors="k", s=40, alpha=0.6) + + ax.scatter(centroids_i[:, 0], centroids_i[:, 1], + facecolors="none", edgecolors="k", marker="o", + s=200, linewidths=2, label="Old centroid") + + ax.scatter(centroids_next[:, 0], centroids_next[:, 1], + c=colors, marker="X", s=250, edgecolors="k", + linewidths=1.5, zorder=5, label="New centroid") + + for k in range(3): + ax.annotate("", + xy=centroids_next[k], xytext=centroids_i[k], + arrowprops=dict(arrowstyle="->", lw=1.5, color="black")) + + ax.set_title(f"Iteration {i + 1} | inertia = {km_scratch.inertia_history[i]:.1f}", + fontsize=12) + if i == 0: + ax.legend(fontsize=9, loc="upper left") + +for j in range(n_show, 4): + axes[j].axis("off") + +plt.suptitle("K-Means β€” Iteration-by-Iteration", fontsize=15, y=1.01) +plt.tight_layout() +plt.show() +``` + +--- + +## 4. Evaluating Clusters + +How do we know if K-Means did a good job? Two common metrics: + +**Inertia (Within-Cluster Sum of Squares):** The sum of squared distances from each point to its centroid. Lower is better, but inertia *always* decreases as K increases β€” so it alone doesn't tell us the right K. + +**Silhouette Score:** For each point, we compare the mean distance to others in the same cluster (*a*) vs. the mean distance to the nearest other cluster (*b*). The score is *(b - a) / max(a, b)*, ranging from βˆ’1 to +1. Higher is better; values near 0 indicate overlapping clusters. 
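
Before reaching for scikit-learn, the formula can be sanity-checked by hand. Below is a minimal sketch (the helper `silhouette_manual` and the tiny `X_demo` dataset are illustrative, not part of the chapter toolkit) that computes each point's coefficient directly from *(b - a) / max(a, b)*:

```python
import numpy as np

def silhouette_manual(X, labels):
    """Silhouette coefficient of every point, straight from the definition."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distance matrix.
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    scores = np.empty(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        # a: mean distance to the *other* points in the same cluster.
        a = d[i, same].sum() / (same.sum() - 1)
        # b: mean distance to the nearest foreign cluster.
        b = min(d[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores

# Two tight, well-separated pairs -> coefficients close to +1.
X_demo = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
print(silhouette_manual(X_demo, np.array([0, 0, 1, 1])))
```

Note that the division breaks down for single-point clusters (scikit-learn defines the score as 0 in that case), so treat this as a teaching aid rather than a replacement for `silhouette_samples`.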
+ +```python +from sklearn.metrics import silhouette_score, silhouette_samples + +sil_avg = silhouette_score(X, km_scratch.labels_) +sil_vals = silhouette_samples(X, km_scratch.labels_) + +print(f"Inertia: {km_scratch.inertia_:.2f}") +print(f"Silhouette (mean): {sil_avg:.4f}") +print(f"Silhouette (min): {sil_vals.min():.4f}") +print(f"Silhouette (max): {sil_vals.max():.4f}") +``` + +A silhouette plot shows each cluster's distribution of silhouette coefficients. Healthy clusters extend well past the mean line; thin slivers or clusters barely crossing zero suggest poor separation. + +```python +fig, ax = plt.subplots(figsize=(8, 5)) + +y_lower = 10 +colors_sil = ["#e74c3c", "#2ecc71", "#3498db"] + +for k in range(3): + cluster_sil = np.sort(sil_vals[km_scratch.labels_ == k]) + cluster_size = cluster_sil.shape[0] + y_upper = y_lower + cluster_size + + ax.fill_betweenx(np.arange(y_lower, y_upper), 0, cluster_sil, + facecolor=colors_sil[k], edgecolor=colors_sil[k], alpha=0.7) + ax.text(-0.05, y_lower + 0.5 * cluster_size, f"Cluster {k}", fontsize=11, + fontweight="bold", va="center") + y_lower = y_upper + 10 + +ax.axvline(x=sil_avg, color="k", linestyle="--", linewidth=1.5, + label=f"Mean silhouette = {sil_avg:.3f}") +ax.set_xlabel("Silhouette coefficient", fontsize=12) +ax.set_ylabel("Points (sorted within cluster)", fontsize=12) +ax.set_title("Silhouette Plot β€” K-Means (K=3)", fontsize=14) +ax.legend(fontsize=11) +ax.set_yticks([]) +plt.tight_layout() +plt.show() +``` + +--- + +## 5. The Elbow Method + +Since we must specify *K* before running K-Means, how do we pick a good value? + +**The Elbow Method:** +1. Run K-Means for K = 1, 2, …, K_max. +2. Plot inertia vs K. +3. Look for the **"elbow"** β€” the point where inertia stops decreasing sharply and begins to level off. + +We can also plot silhouette score vs K; the best K often maximizes silhouette. Both plots together give a clearer picture. 
+ +```python +K_range = range(1, 11) +inertias = [] +silhouettes = [] + +for k in K_range: + km = KMeansScratch(k=k, random_state=42) + km.fit(X) + inertias.append(km.inertia_) + if k >= 2: + silhouettes.append(silhouette_score(X, km.labels_)) + else: + silhouettes.append(np.nan) + +fig, axes = plt.subplots(1, 2, figsize=(14, 5)) + +axes[0].plot(K_range, inertias, "o-", color="#2c3e50", linewidth=2, markersize=8) +axes[0].set_xlabel("Number of clusters (K)", fontsize=12) +axes[0].set_ylabel("Inertia", fontsize=12) +axes[0].set_title("Elbow Method", fontsize=14) +axes[0].axvline(x=3, color="#e74c3c", linestyle="--", alpha=0.7, label="K = 3 (elbow)") +axes[0].legend(fontsize=11) +axes[0].grid(True, alpha=0.3) + +sil_values = [s for s in silhouettes if not np.isnan(s)] +sil_ks = list(range(2, 11)) +axes[1].plot(sil_ks, sil_values, "s-", color="#27ae60", linewidth=2, markersize=8) +axes[1].set_xlabel("Number of clusters (K)", fontsize=12) +axes[1].set_ylabel("Mean Silhouette Score", fontsize=12) +axes[1].set_title("Silhouette Score vs K", fontsize=14) +axes[1].axvline(x=3, color="#e74c3c", linestyle="--", alpha=0.7, label="K = 3") +axes[1].legend(fontsize=11) +axes[1].grid(True, alpha=0.3) + +plt.tight_layout() +plt.show() + +print("Silhouette scores by K:") +for k, s in zip(sil_ks, sil_values): + print(f" K={k:2d} -> {s:.4f}") +``` + +Both plots agree: **K = 3** is the best choice for this dataset β€” inertia has a clear elbow and the silhouette score peaks at K = 3. + +--- + +## 6. Scikit-learn KMeans + +In practice you'll use scikit-learn's battle-tested implementation. It uses smarter **k-means++** initialization and runs multiple restarts (`n_init`) to avoid poor local minima. 
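
The seeding idea is simple enough to sketch: after a uniform first pick, each subsequent centroid is sampled with probability proportional to its squared distance from the nearest centroid chosen so far. This is a simplified illustration (the function name `kmeanspp_init` and the toy data are mine; scikit-learn's actual implementation adds further refinements such as candidate sampling):

```python
import numpy as np

def kmeanspp_init(X, k, rng):
    """Pick k seeds, each sampled far from the seeds chosen so far."""
    X = np.asarray(X, dtype=float)
    centroids = [X[rng.randint(len(X))]]  # first seed: uniform random point
    for _ in range(k - 1):
        # Squared distance from every point to its nearest existing seed.
        d2 = ((X[:, None] - np.array(centroids)[None]) ** 2).sum(axis=2).min(axis=1)
        # Sample the next seed with probability proportional to d2.
        centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centroids)

rng = np.random.RandomState(42)
X_toy = np.vstack([rng.normal(loc, 0.3, size=(50, 2))
                   for loc in ([0, 0], [5, 5], [0, 5])])
seeds = kmeanspp_init(X_toy, k=3, rng=rng)
print(seeds)
```

Because the seeds start far apart, Lloyd's iterations rarely fall into the poor local minima that plague purely random initialization.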
Let's compare with our scratch version: + +```python +from sklearn.cluster import KMeans + +km_sklearn = KMeans(n_clusters=3, random_state=42, n_init=10) +km_sklearn.fit(X) + +print("=== Scikit-learn KMeans ===") +print(f"Inertia: {km_sklearn.inertia_:.2f}") +print(f"Silhouette score: {silhouette_score(X, km_sklearn.labels_):.4f}") +print(f"Centroids:\n{km_sklearn.cluster_centers_}") +print() + +print("=== Our scratch KMeans ===") +print(f"Inertia: {km_scratch.inertia_:.2f}") +print(f"Silhouette score: {silhouette_score(X, km_scratch.labels_):.4f}") +print(f"Centroids:\n{km_scratch.centroids}") +``` + +The cluster labels may differ in numbering (label 0 in one could be label 2 in the other), but the **groupings themselves** should be nearly identical. + +```python +fig, axes = plt.subplots(1, 2, figsize=(13, 5)) + +colors_map = np.array(["#e74c3c", "#2ecc71", "#3498db"]) + +axes[0].scatter(X[:, 0], X[:, 1], c=colors_map[km_scratch.labels_], + edgecolors="k", s=50, alpha=0.7) +axes[0].scatter(km_scratch.centroids[:, 0], km_scratch.centroids[:, 1], + c="gold", marker="X", s=250, edgecolors="k", linewidths=1.5, zorder=5) +axes[0].set_title("Our Scratch Implementation", fontsize=14) +axes[0].set_xlabel("Feature 1") +axes[0].set_ylabel("Feature 2") + +axes[1].scatter(X[:, 0], X[:, 1], c=colors_map[km_sklearn.labels_], + edgecolors="k", s=50, alpha=0.7) +axes[1].scatter(km_sklearn.cluster_centers_[:, 0], km_sklearn.cluster_centers_[:, 1], + c="gold", marker="X", s=250, edgecolors="k", linewidths=1.5, zorder=5) +axes[1].set_title("Scikit-learn KMeans", fontsize=14) +axes[1].set_xlabel("Feature 1") +axes[1].set_ylabel("Feature 2") + +plt.suptitle("Scratch vs Scikit-learn β€” Side by Side", fontsize=15, y=1.01) +plt.tight_layout() +plt.show() +``` + +--- + +## 7. 
Practical Tips + +### When K-Means Works Well + +K-Means works best when clusters are: +- **Spherical (isotropic):** roughly the same spread in every direction +- **Similar in size:** very uneven cluster sizes can pull centroids away from smaller groups +- **Well-separated:** heavily overlapping clusters confuse the algorithm + +### Feature Scaling + +K-Means relies on Euclidean distance. If one feature has a range of 0–1 and another 0–10,000, the second feature will dominate. **Always standardize your features** (e.g., `StandardScaler`) before clustering. + +### Multiple Initializations + +Scikit-learn's `n_init` parameter (default 10) runs K-Means multiple times with different random seeds and keeps the result with the lowest inertia. This greatly reduces the risk of a poor local minimum. + +### When K-Means Fails + +K-Means struggles with: +- **Non-convex shapes** (e.g., crescent moons, concentric rings) β€” consider DBSCAN or spectral clustering instead +- **Clusters with very different densities** β€” HDBSCAN handles this better +- **High-dimensional data** β€” distances become less meaningful (curse of dimensionality); apply dimensionality reduction first + +--- + +## Summary + +### Key Takeaways + +1. **Unsupervised learning** discovers structure without labels. Clustering is its flagship task. +2. **K-Means** iterates between *assigning* points to the nearest centroid and *updating* centroids as cluster means until convergence. +3. **Inertia** measures within-cluster compactness; **silhouette score** balances compactness and separation. +4. The **elbow method** plots inertia vs K to find a natural number of clusters. +5. **Scikit-learn's KMeans** adds smart initialization (k-means++) and multiple restarts for robust results. +6. Always **scale features** before clustering, and remember that K-Means assumes spherical, similarly-sized clusters. 
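
To make takeaway 6 concrete, here is a small, hypothetical example of the scale-first advice: feature 1 carries the real group structure on a 0-1 scale, while feature 2 is pure noise on a scale of thousands (the dataset and variable names are invented for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
# Feature 1 holds the real structure (two groups around 0.2 and 0.8);
# feature 2 is pure noise on a scale thousands of times larger.
X_raw = np.column_stack([
    np.concatenate([rng.normal(0.2, 0.05, 100), rng.normal(0.8, 0.05, 100)]),
    rng.normal(5000, 1000, 200),
])

unscaled = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_raw)
scaled = make_pipeline(
    StandardScaler(),
    KMeans(n_clusters=2, n_init=10, random_state=0),
).fit_predict(X_raw)

print("unscaled cluster sizes:", np.bincount(unscaled))
print("scaled cluster sizes:  ", np.bincount(scaled))
```

Without scaling, the noisy large-scale feature dictates the partition and mixes the true groups; with `StandardScaler` in front, K-Means should recover the two real groups cleanly.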
+ +### What's Next + +In the following notebooks we will: +- Explore **hierarchical clustering** and dendrograms +- Learn **DBSCAN** for density-based clustering +- Apply **dimensionality reduction** (PCA, t-SNE) for visualization + +--- + +*Generated by Berta AI | Created by Luigi Pascal Rondanini* diff --git a/docs/chapters/content/ch08-02_intermediate.md b/docs/chapters/content/ch08-02_intermediate.md new file mode 100644 index 0000000..66973b2 --- /dev/null +++ b/docs/chapters/content/ch08-02_intermediate.md @@ -0,0 +1,520 @@ +# Ch 8: Unsupervised Learning - Intermediate + +**Track**: Practitioner | [Try code in Playground](../../playground.md) | [Back to chapter overview](../chapter-08.md) + + +!!! tip "Read online or run locally" + You can read this content here on the web. To run the code interactively, + either use the [Playground](../../playground.md) or clone the repo and open + `chapters/chapter-08-unsupervised-learning/notebooks/02_intermediate.ipynb` in Jupyter. + +--- + +# Chapter 8: Unsupervised Learning +## Notebook 02 - Intermediate: Advanced Clustering + +Beyond K-Means: hierarchical clustering, density-based methods, and Gaussian mixtures for real-world data shapes. + +**What you'll learn:** +- Hierarchical (agglomerative) clustering and dendrograms +- DBSCAN for density-based clustering +- Gaussian Mixture Models (GMMs) +- Comparing clustering algorithms on different data shapes + +**Time estimate:** 2.5 hours + +**Try it yourself:** Experiment with different linkage methods (single, complete, average, ward) on the hierarchical clustering example. Change `eps` and `min_samples` in DBSCAN to see how they affect cluster formation. + +**Common mistakes:** Using K-Means on non-convex shapes (e.g., moons), ignoring the k-distance graph when tuning DBSCAN, or assuming spherical clusters when data is elliptical. + +--- + +## 1. 
Hierarchical Clustering + +Hierarchical clustering builds a tree of clusters instead of requiring a fixed number of clusters up front. The **agglomerative (bottom-up)** approach proceeds as follows: + +1. **Start** β€” treat every data point as its own single-point cluster. +2. **Merge** β€” find the two closest clusters and merge them into one. +3. **Repeat** β€” keep merging until only a single cluster remains (or until a stopping criterion is met). + +The result is a hierarchy that can be visualised as a **dendrogram** β€” a tree diagram showing the order and distance of each merge. + +### Linkage criteria + +"Distance between two clusters" can be measured in several ways: + +| Linkage | Definition | Tendency | +|---------|-----------|----------| +| **Single** | Minimum distance between any pair of points across two clusters | Produces elongated, chain-like clusters | +| **Complete** | Maximum distance between any pair of points across two clusters | Produces compact, roughly equal-sized clusters | +| **Average** | Mean distance between all pairs of points across two clusters | Compromise between single and complete | +| **Ward** | Minimises the total within-cluster variance at each merge | Tends to produce equally sized, spherical clusters | + +Ward linkage is the most commonly used default and works well when clusters are roughly spherical. 
+ +```python +import numpy as np +import matplotlib.pyplot as plt +from sklearn.datasets import make_blobs +from sklearn.cluster import AgglomerativeClustering +from scipy.cluster.hierarchy import dendrogram, linkage, fcluster + +np.random.seed(42) + +# Generate synthetic data with 4 well-separated clusters +X_hier, y_hier = make_blobs( + n_samples=200, centers=4, cluster_std=0.8, random_state=42 +) + +fig, axes = plt.subplots(1, 2, figsize=(14, 5)) + +# Left panel β€” raw data +axes[0].scatter(X_hier[:, 0], X_hier[:, 1], s=30, alpha=0.7, edgecolors='k', linewidths=0.3) +axes[0].set_title('Raw Data (200 points, 4 clusters)') +axes[0].set_xlabel('Feature 1') +axes[0].set_ylabel('Feature 2') + +# Right panel β€” dendrogram using Ward linkage +Z_ward = linkage(X_hier, method='ward') +dendrogram( + Z_ward, + truncate_mode='lastp', + p=30, + leaf_rotation=90, + leaf_font_size=8, + ax=axes[1], + color_threshold=12 +) +axes[1].set_title('Dendrogram (Ward Linkage, truncated to 30 leaves)') +axes[1].set_xlabel('Cluster (size)') +axes[1].set_ylabel('Merge Distance') +axes[1].axhline(y=12, color='r', linestyle='--', label='Cut at distance = 12') +axes[1].legend() + +plt.tight_layout() +plt.show() +``` + +The dendrogram shows the full merge history. By drawing a horizontal cut line we decide how many clusters to keep β€” each vertical line that crosses the cut corresponds to one cluster. + +### Comparing linkage methods + +Let's visualise how the four linkage types partition the same dataset. 
+ +```python +linkage_methods = ['single', 'complete', 'average', 'ward'] +fig, axes = plt.subplots(1, 4, figsize=(20, 4.5)) + +for ax, method in zip(axes, linkage_methods): + Z = linkage(X_hier, method=method) + labels = fcluster(Z, t=4, criterion='maxclust') + scatter = ax.scatter( + X_hier[:, 0], X_hier[:, 1], + c=labels, cmap='viridis', s=30, alpha=0.7, edgecolors='k', linewidths=0.3 + ) + ax.set_title(f'{method.capitalize()} linkage') + ax.set_xlabel('Feature 1') + ax.set_ylabel('Feature 2') + +plt.suptitle('Agglomerative Clustering β€” 4 Linkage Methods (k=4)', fontsize=14, y=1.02) +plt.tight_layout() +plt.show() +``` + +```python +# Scikit-learn's AgglomerativeClustering with Ward linkage +agg = AgglomerativeClustering(n_clusters=4, linkage='ward') +agg_labels = agg.fit_predict(X_hier) + +fig, axes = plt.subplots(1, 2, figsize=(14, 5)) + +axes[0].scatter( + X_hier[:, 0], X_hier[:, 1], + c=y_hier, cmap='tab10', s=40, alpha=0.7, edgecolors='k', linewidths=0.3 +) +axes[0].set_title('Ground-Truth Labels') +axes[0].set_xlabel('Feature 1') +axes[0].set_ylabel('Feature 2') + +axes[1].scatter( + X_hier[:, 0], X_hier[:, 1], + c=agg_labels, cmap='tab10', s=40, alpha=0.7, edgecolors='k', linewidths=0.3 +) +axes[1].set_title('AgglomerativeClustering (Ward, k=4)') +axes[1].set_xlabel('Feature 1') +axes[1].set_ylabel('Feature 2') + +plt.tight_layout() +plt.show() + +print(f"Cluster sizes: {np.bincount(agg_labels)}") +``` + +--- + +## 2. DBSCAN + +**DBSCAN** (Density-Based Spatial Clustering of Applications with Noise) takes a fundamentally different approach to clustering: + +- It does **not** require the number of clusters in advance. +- It defines clusters as **dense regions** separated by sparse regions. +- Points that don't belong to any dense region are labelled as **noise** (label = -1). 
+ +### Key parameters + +| Parameter | Meaning | +|-----------|--------| +| `eps` (Ξ΅) | Maximum distance between two points for them to be considered neighbours | +| `min_samples` | Minimum number of points within Ξ΅-distance to form a dense region | + +### Point types + +- **Core point** β€” has at least `min_samples` neighbours within Ξ΅. +- **Border point** β€” within Ξ΅ of a core point but doesn't have enough neighbours itself. +- **Noise point** β€” neither core nor border; isolated outliers. + +DBSCAN can discover clusters of **arbitrary shape** and naturally identifies outliers β€” something centroid-based methods like K-Means cannot do. + +```python +from sklearn.datasets import make_moons +from sklearn.cluster import KMeans, DBSCAN + +np.random.seed(42) + +# Generate two moons (non-convex dataset) +X_moons, y_moons = make_moons(n_samples=500, noise=0.08, random_state=42) + +# Apply DBSCAN and K-Means +db_moons = DBSCAN(eps=0.2, min_samples=5).fit(X_moons) +km_moons = KMeans(n_clusters=2, random_state=42, n_init=10).fit(X_moons) + +fig, axes = plt.subplots(1, 3, figsize=(18, 5)) + +axes[0].scatter(X_moons[:, 0], X_moons[:, 1], c=y_moons, cmap='coolwarm', s=20, alpha=0.7) +axes[0].set_title('Ground Truth') +axes[0].set_xlabel('Feature 1') +axes[0].set_ylabel('Feature 2') + +axes[1].scatter(X_moons[:, 0], X_moons[:, 1], c=km_moons.labels_, cmap='coolwarm', s=20, alpha=0.7) +axes[1].scatter(km_moons.cluster_centers_[:, 0], km_moons.cluster_centers_[:, 1], + marker='X', s=200, c='black', edgecolors='white', linewidths=1.5) +axes[1].set_title('K-Means (k=2) β€” Fails on non-convex shapes') +axes[1].set_xlabel('Feature 1') +axes[1].set_ylabel('Feature 2') + +axes[2].scatter(X_moons[:, 0], X_moons[:, 1], c=db_moons.labels_, cmap='coolwarm', s=20, alpha=0.7) +axes[2].set_title('DBSCAN (eps=0.2) β€” Correctly separates crescents') +axes[2].set_xlabel('Feature 1') +axes[2].set_ylabel('Feature 2') + +plt.suptitle('K-Means vs DBSCAN on the Moons Dataset', fontsize=14, 
y=1.02) +plt.tight_layout() +plt.show() +``` + +--- + +## 3. Choosing DBSCAN Parameters + +Picking `eps` and `min_samples` can be tricky. A practical heuristic: + +1. Set `min_samples` β‰ˆ 2 Γ— number of features (a reasonable default). +2. For each point compute the distance to its **k-th nearest neighbour** (k = `min_samples`). +3. Sort these distances and plot them β€” the **k-distance graph**. +4. Look for the "elbow" β€” the point where the curve bends sharply upward. The distance at that elbow is a good candidate for `eps`. + +```python +from sklearn.neighbors import NearestNeighbors + +k = 5 # same as min_samples +nn = NearestNeighbors(n_neighbors=k) +nn.fit(X_moons) +distances, _ = nn.kneighbors(X_moons) + +k_distances = np.sort(distances[:, k - 1])[::-1] + +plt.figure(figsize=(10, 5)) +plt.plot(k_distances, linewidth=1.5) +plt.axhline(y=0.2, color='r', linestyle='--', label='eps = 0.2 (our choice)') +plt.title(f'k-Distance Graph (k={k}) β€” Elbow Indicates Good eps') +plt.xlabel('Points (sorted by descending k-distance)') +plt.ylabel(f'Distance to {k}-th Nearest Neighbour') +plt.legend() +plt.grid(True, alpha=0.3) +plt.show() +``` + +```python +# Effect of different eps values on DBSCAN results +eps_values = [0.05, 0.1, 0.2, 0.3, 0.5] +fig, axes = plt.subplots(1, len(eps_values), figsize=(22, 4)) + +for ax, eps in zip(axes, eps_values): + db = DBSCAN(eps=eps, min_samples=5).fit(X_moons) + labels = db.labels_ + n_clusters = len(set(labels)) - (1 if -1 in labels else 0) + n_noise = (labels == -1).sum() + + unique_labels = set(labels) + colors = [plt.cm.Spectral(each) for each in np.linspace(0, 1, len(unique_labels))] + + for k_label, col in zip(sorted(unique_labels), colors): + if k_label == -1: + col = [0, 0, 0, 1] # black for noise + mask = labels == k_label + ax.scatter(X_moons[mask, 0], X_moons[mask, 1], c=[col], s=15, alpha=0.7) + + ax.set_title(f'eps={eps}\n{n_clusters} clusters, {n_noise} noise') + ax.set_xlabel('Feature 1') + 
+axes[0].set_ylabel('Feature 2') +plt.suptitle('Effect of eps on DBSCAN (min_samples=5)', fontsize=14, y=1.05) +plt.tight_layout() +plt.show() +``` + +**Observations:** +- **eps too small** (0.05) β†’ most points classified as noise; many tiny clusters. +- **eps just right** (0.2) β†’ two clean crescent clusters with very little noise. +- **eps too large** (0.5) β†’ everything merges into a single cluster. + +The k-distance graph helps you find that sweet spot without trial and error. + +--- + +## 4. Gaussian Mixture Models + +A **Gaussian Mixture Model** assumes that the data is generated from a mixture of a finite number of Gaussian (normal) distributions with unknown parameters. + +### GMM vs K-Means + +| Aspect | K-Means | GMM | +|--------|---------|-----| +| Cluster assignment | **Hard** β€” each point belongs to exactly one cluster | **Soft** β€” each point has a probability for every cluster | +| Cluster shape | Spherical (Voronoi cells) | Elliptical (full covariance matrices) | +| Outlier handling | None β€” every point is assigned | Naturally down-weights low-probability points | +| Output | Cluster label | Probability vector over all clusters | + +GMMs are fit using the **Expectation-Maximisation (EM)** algorithm: +1. **E-step** β€” compute the probability that each point belongs to each Gaussian component. +2. **M-step** β€” update each component's mean, covariance, and weight to maximise log-likelihood. +3. Repeat until convergence. 
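To make the E-step and M-step concrete, here is a minimal NumPy sketch of the loop. It uses its own made-up data (two well-separated 2-D blobs, not the chapter's `X_gmm`), a crude deterministic initialisation, a fixed iteration count instead of a convergence check, and a tiny ridge on each covariance for numerical stability:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
# Two well-separated 2-D Gaussian blobs
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])

k, (n, d) = 2, X.shape
means = X[[0, 100]].astype(float)       # crude init: one point from each half
covs = np.stack([np.eye(d) for _ in range(k)])
weights = np.full(k, 1.0 / k)

for _ in range(50):
    # E-step: responsibility of each component for each point
    dens = np.column_stack([w * multivariate_normal.pdf(X, m, c)
                            for w, m, c in zip(weights, means, covs)])
    resp = dens / dens.sum(axis=1, keepdims=True)

    # M-step: re-estimate weights, means, and covariances
    Nk = resp.sum(axis=0)
    weights = Nk / n
    means = (resp.T @ X) / Nk[:, None]
    for j in range(k):
        diff = X - means[j]
        covs[j] = (resp[:, j, None] * diff).T @ diff / Nk[j] + 1e-6 * np.eye(d)

print(np.round(means, 2))  # component means land near (0, 0) and (5, 5)
```

This is exactly what `GaussianMixture.fit` does under the hood, plus smarter initialisation (k-means by default), convergence checks, and log-space arithmetic to avoid underflow.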
+ +```python +from sklearn.mixture import GaussianMixture + +np.random.seed(42) + +# Create elongated / elliptical clusters that challenge K-Means +n_per_cluster = 200 +cov1 = [[2.0, 1.5], [1.5, 1.5]] +cov2 = [[1.5, -1.2], [-1.2, 1.5]] +cov3 = [[0.5, 0.0], [0.0, 2.5]] + +cluster1 = np.random.multivariate_normal([0, 0], cov1, n_per_cluster) +cluster2 = np.random.multivariate_normal([5, 5], cov2, n_per_cluster) +cluster3 = np.random.multivariate_normal([8, 0], cov3, n_per_cluster) + +X_gmm = np.vstack([cluster1, cluster2, cluster3]) +y_gmm_true = np.array([0]*n_per_cluster + [1]*n_per_cluster + [2]*n_per_cluster) + +fig, axes = plt.subplots(1, 3, figsize=(18, 5)) + +# Ground truth +axes[0].scatter(X_gmm[:, 0], X_gmm[:, 1], c=y_gmm_true, cmap='tab10', s=15, alpha=0.6) +axes[0].set_title('Ground Truth (Elliptical Clusters)') +axes[0].set_xlabel('Feature 1') +axes[0].set_ylabel('Feature 2') + +# K-Means +km_gmm = KMeans(n_clusters=3, random_state=42, n_init=10).fit(X_gmm) +axes[1].scatter(X_gmm[:, 0], X_gmm[:, 1], c=km_gmm.labels_, cmap='tab10', s=15, alpha=0.6) +axes[1].scatter(km_gmm.cluster_centers_[:, 0], km_gmm.cluster_centers_[:, 1], + marker='X', s=200, c='black', edgecolors='white', linewidths=1.5) +axes[1].set_title('K-Means (k=3) β€” Spherical assumption') +axes[1].set_xlabel('Feature 1') +axes[1].set_ylabel('Feature 2') + +# GMM +gmm = GaussianMixture(n_components=3, covariance_type='full', random_state=42) +gmm.fit(X_gmm) +gmm_labels = gmm.predict(X_gmm) +axes[2].scatter(X_gmm[:, 0], X_gmm[:, 1], c=gmm_labels, cmap='tab10', s=15, alpha=0.6) +axes[2].set_title('GMM (3 components) β€” Elliptical fit') +axes[2].set_xlabel('Feature 1') +axes[2].set_ylabel('Feature 2') + +plt.suptitle('K-Means vs GMM on Elliptical Clusters', fontsize=14, y=1.02) +plt.tight_layout() +plt.show() +``` + +```python +# Visualise GMM probability contours +x_min, x_max = X_gmm[:, 0].min() - 2, X_gmm[:, 0].max() + 2 +y_min, y_max = X_gmm[:, 1].min() - 2, X_gmm[:, 1].max() + 2 +xx, yy = 
np.meshgrid(np.linspace(x_min, x_max, 300), np.linspace(y_min, y_max, 300)) +grid_points = np.column_stack([xx.ravel(), yy.ravel()]) + +log_prob = gmm.score_samples(grid_points) +log_prob = log_prob.reshape(xx.shape) + +fig, ax = plt.subplots(figsize=(10, 7)) +ax.contourf(xx, yy, np.exp(log_prob), levels=30, cmap='YlOrRd', alpha=0.6) +ax.contour(xx, yy, np.exp(log_prob), levels=10, colors='darkred', linewidths=0.5, alpha=0.5) +ax.scatter(X_gmm[:, 0], X_gmm[:, 1], c=gmm_labels, cmap='tab10', s=10, alpha=0.7, + edgecolors='k', linewidths=0.2) + +for i in range(gmm.n_components): + ax.scatter(gmm.means_[i, 0], gmm.means_[i, 1], + marker='+', s=300, c='black', linewidths=3) + +ax.set_title('GMM Probability Density Contours') +ax.set_xlabel('Feature 1') +ax.set_ylabel('Feature 2') +plt.tight_layout() +plt.show() +``` + +### Model selection with BIC and AIC + +How many Gaussian components should we use? We can use information criteria: + +- **BIC** (Bayesian Information Criterion) β€” penalises model complexity more heavily. +- **AIC** (Akaike Information Criterion) β€” lighter penalty. + +**Lower is better** for both. We fit GMMs with different numbers of components and pick the one with the lowest BIC (or AIC). 
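To see where these scores come from, here is a small sketch that recomputes BIC and AIC by hand and checks them against scikit-learn. It uses its own synthetic data (not the chapter's `X_gmm`), and the parameter count assumes `covariance_type='full'`: k*d means, k*d*(d+1)/2 covariance entries, and k-1 free mixture weights (they sum to 1).

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 1, (150, 2)), rng.normal(4, 1, (150, 2))])

k = 2
gmm = GaussianMixture(n_components=k, covariance_type='full', random_state=42).fit(X)

n, d = X.shape
# Free parameters: means + full covariances + mixture weights
p = k * d + k * d * (d + 1) // 2 + (k - 1)

log_likelihood = gmm.score(X) * n           # score() returns the *mean* log-likelihood
bic_manual = -2 * log_likelihood + p * np.log(n)
aic_manual = -2 * log_likelihood + 2 * p

print(p)                                    # 11 free parameters
print(np.isclose(bic_manual, gmm.bic(X)))   # True
print(np.isclose(aic_manual, gmm.aic(X)))   # True
```

Because BIC's penalty grows with `ln(n)` while AIC's is a flat `2p`, BIC prefers smaller models as the dataset grows.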
+ +```python +n_components_range = range(1, 10) +bic_scores = [] +aic_scores = [] + +for n in n_components_range: + gmm_test = GaussianMixture(n_components=n, covariance_type='full', random_state=42) + gmm_test.fit(X_gmm) + bic_scores.append(gmm_test.bic(X_gmm)) + aic_scores.append(gmm_test.aic(X_gmm)) + +fig, ax = plt.subplots(figsize=(10, 5)) +ax.plot(list(n_components_range), bic_scores, 'bo-', label='BIC', linewidth=2) +ax.plot(list(n_components_range), aic_scores, 'rs--', label='AIC', linewidth=2) +ax.axvline(x=3, color='green', linestyle=':', alpha=0.7, label='True number of components (3)') +ax.set_xlabel('Number of Components') +ax.set_ylabel('Score (lower is better)') +ax.set_title('GMM Model Selection: BIC and AIC') +ax.legend() +ax.grid(True, alpha=0.3) +plt.tight_layout() +plt.show() + +print(f"Best BIC at n_components = {np.argmin(bic_scores) + 1}") +print(f"Best AIC at n_components = {np.argmin(aic_scores) + 1}") +``` + +--- + +## 5. Algorithm Comparison + +Let's put all four algorithms head-to-head on three different data geometries: + +1. **Blobs** β€” well-separated spherical clusters +2. **Moons** β€” two interleaving crescents +3. 
**Varied-variance blobs** β€” spherical clusters with very different densities + +```python +from sklearn.preprocessing import StandardScaler + +np.random.seed(42) + +n_samples = 500 + +# Dataset 1: standard blobs +X_blobs, y_blobs = make_blobs(n_samples=n_samples, centers=3, cluster_std=1.0, random_state=42) + +# Dataset 2: moons +X_moons2, y_moons2 = make_moons(n_samples=n_samples, noise=0.07, random_state=42) + +# Dataset 3: varied-variance blobs +X_varied, y_varied = make_blobs( + n_samples=n_samples, centers=3, cluster_std=[0.5, 2.5, 1.0], random_state=42 +) + +datasets = [ + ('Blobs', X_blobs, {'n_clusters': 3, 'eps': 1.0}), + ('Moons', X_moons2, {'n_clusters': 2, 'eps': 0.2}), + ('Varied', X_varied, {'n_clusters': 3, 'eps': 1.5}), +] + +fig, axes = plt.subplots(3, 4, figsize=(22, 15)) + +for row, (name, X, params) in enumerate(datasets): + X_scaled = StandardScaler().fit_transform(X) + n_c = params['n_clusters'] + eps = params['eps'] + + # K-Means + km = KMeans(n_clusters=n_c, random_state=42, n_init=10).fit(X_scaled) + # Agglomerative + agg = AgglomerativeClustering(n_clusters=n_c, linkage='ward').fit(X_scaled) + # DBSCAN + db = DBSCAN(eps=eps, min_samples=5).fit(X_scaled) + # GMM + gm = GaussianMixture(n_components=n_c, random_state=42).fit(X_scaled) + + results = [ + ('K-Means', km.labels_), + ('Agglomerative', agg.labels_), + ('DBSCAN', db.labels_), + ('GMM', gm.predict(X_scaled)), + ] + + for col, (algo_name, labels) in enumerate(results): + ax = axes[row, col] + unique_labels = set(labels) + n_clust = len(unique_labels) - (1 if -1 in unique_labels else 0) + + noise_mask = labels == -1 + ax.scatter(X_scaled[~noise_mask, 0], X_scaled[~noise_mask, 1], + c=labels[~noise_mask], cmap='viridis', s=12, alpha=0.7) + if noise_mask.any(): + ax.scatter(X_scaled[noise_mask, 0], X_scaled[noise_mask, 1], + c='red', marker='x', s=15, alpha=0.5, label='noise') + ax.legend(fontsize=8) + + if row == 0: + ax.set_title(algo_name, fontsize=13, fontweight='bold') + 
ax.set_ylabel(f'{name}' if col == 0 else '', fontsize=12) + ax.text(0.02, 0.98, f'{n_clust} cluster(s)', + transform=ax.transAxes, fontsize=9, va='top', + bbox=dict(boxstyle='round,pad=0.3', facecolor='white', alpha=0.8)) + +plt.suptitle('Algorithm Comparison Across Data Geometries', fontsize=16, y=1.01) +plt.tight_layout() +plt.show() +``` + +--- + +## Summary + +### When to use each algorithm + +| Algorithm | Best for | Weaknesses | Must specify k? | +|-----------|----------|------------|-----------------| +| **K-Means** | Large datasets with spherical clusters | Cannot handle non-convex shapes; sensitive to outliers | Yes | +| **Agglomerative Clustering** | Small-to-medium datasets; exploring hierarchy | O(nΒ³) time complexity; hard to scale | Yes (or cut dendrogram) | +| **DBSCAN** | Arbitrary shapes; datasets with noise/outliers | Sensitive to `eps`; struggles with varying densities | No | +| **Gaussian Mixture Model** | Elliptical clusters; need soft assignments | Assumes Gaussian components; sensitive to initialisation | Yes | + +### Rules of thumb + +1. **Start simple:** try K-Means first. If results look poor, consider the data geometry. +2. **Non-convex shapes?** β†’ Use DBSCAN. +3. **Elliptical or overlapping clusters?** β†’ Use GMM. +4. **Need a hierarchy or dendrogram?** β†’ Use Agglomerative Clustering. +5. **Noisy data with outliers?** β†’ DBSCAN naturally handles noise. +6. **Need probability estimates?** β†’ GMM provides soft assignments. + +--- +*Generated by Berta AI | Created by Luigi Pascal Rondanini* diff --git a/docs/chapters/content/ch08-03_advanced.md b/docs/chapters/content/ch08-03_advanced.md new file mode 100644 index 0000000..1e08ea4 --- /dev/null +++ b/docs/chapters/content/ch08-03_advanced.md @@ -0,0 +1,687 @@ +# Ch 8: Unsupervised Learning - Advanced + +**Track**: Practitioner | [Try code in Playground](../../playground.md) | [Back to chapter overview](../chapter-08.md) + + +!!! 
tip "Read online or run locally" + You can read this content here on the web. To run the code interactively, + either use the [Playground](../../playground.md) or clone the repo and open + `chapters/chapter-08-unsupervised-learning/notebooks/03_advanced.ipynb` in Jupyter. + +--- + +# Chapter 8: Unsupervised Learning +## Notebook 03 - Advanced: Dimensionality Reduction & Capstone + +Reduce high-dimensional data for visualization and modeling, detect anomalies, and build a complete customer segmentation system. + +**What you'll learn:** +- Principal Component Analysis (PCA) from scratch +- t-SNE for 2D visualization +- Anomaly detection with Isolation Forest +- Customer segmentation capstone project + +**Time estimate:** 3 hours + +--- + +## 1. PCA Theory + +### The Core Idea + +PCA is a **linear** dimensionality-reduction technique that finds the directions (called **principal components**) along which the data varies the most. + +Imagine a cloud of 3-D points shaped like a flat pancake. Two axes capture almost all of the spread; the third adds very little information. PCA discovers those two dominant axes automatically. + +### Algorithm Steps + +1. **Center the data** β€” subtract the mean of each feature so that the cloud is centered at the origin. +2. **Compute the covariance matrix** β€” a \(d \times d\) matrix (where \(d\) is the number of features) that captures pairwise linear relationships. +3. **Eigendecomposition** β€” find the eigenvectors and eigenvalues of the covariance matrix. Each eigenvector is a principal component direction; its eigenvalue tells us how much variance that direction explains. +4. **Sort & select** β€” rank components by eigenvalue (descending) and keep the top \(k\) to reduce dimensionality from \(d\) to \(k\). +5. **Project** β€” multiply the centered data by the selected eigenvectors to obtain the lower-dimensional representation. 
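As a tiny worked example of steps 2-4: for the covariance matrix \(\Sigma = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}\), the eigenpairs are \(\lambda_1 = 3\) with \(v_1 = \frac{1}{\sqrt{2}}(1, 1)^\top\) and \(\lambda_2 = 1\) with \(v_2 = \frac{1}{\sqrt{2}}(1, -1)^\top\). The data varies three times as much along the diagonal direction as across it, so projecting onto \(v_1\) alone retains \(3 / (3 + 1) = 75\%\) of the total variance.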
+ +### Variance Explained Ratio + +The variance explained ratio for component \(i\) is \(\lambda_i / \sum_{j=1}^{d} \lambda_j\), where \(\lambda_i\) is the \(i\)-th eigenvalue. The **cumulative** variance explained tells us how much total information is retained when we keep the first \(k\) components. + +--- + +## 2. PCA From Scratch + +We implement PCA using only NumPy and apply it to the classic **Iris** dataset (4 features β†’ 2 components). + +```python +import numpy as np +import matplotlib.pyplot as plt +from sklearn.datasets import load_iris + +np.random.seed(42) + +# Load the Iris dataset (4 features, 150 samples, 3 classes) +iris = load_iris() +X = iris.data # shape (150, 4) +y = iris.target # 0, 1, 2 +feature_names = iris.feature_names +target_names = iris.target_names + +print(f"Dataset shape: {X.shape}") +print(f"Features: {feature_names}") +print(f"Classes: {list(target_names)}") +``` + +```python +def pca_from_scratch(X, n_components=2): + """Implement PCA using NumPy.""" + # Step 1: Center the data + mean = np.mean(X, axis=0) + X_centered = X - mean + + # Step 2: Covariance matrix (features Γ— features) + cov_matrix = np.cov(X_centered, rowvar=False) + + # Step 3: Eigendecomposition + eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix) + + # Step 4: Sort by eigenvalue descending + sorted_idx = np.argsort(eigenvalues)[::-1] + eigenvalues = eigenvalues[sorted_idx] + eigenvectors = eigenvectors[:, sorted_idx] + + # Variance explained ratio + variance_ratio = eigenvalues / eigenvalues.sum() + + # Step 5: Project onto top-k components + W = eigenvectors[:, :n_components] + X_projected = X_centered @ W + + return X_projected, eigenvalues, variance_ratio, W + + +X_pca_scratch, eigenvalues, var_ratio, components = pca_from_scratch(X, n_components=2) + +print("Eigenvalues:", np.round(eigenvalues, 4)) +print("Variance explained ratio:", np.round(var_ratio, 4)) +print(f"Total variance retained (2 components): {var_ratio[:2].sum():.2%}") +``` + +```python +# 
Variance Explained Bar + Cumulative Line +fig, axes = plt.subplots(1, 2, figsize=(13, 5)) + +# Left: bar chart of individual variance ratios +axes[0].bar(range(1, len(var_ratio) + 1), var_ratio, color="steelblue", edgecolor="black") +axes[0].set_xlabel("Principal Component") +axes[0].set_ylabel("Variance Explained Ratio") +axes[0].set_title("Variance Explained by Each Component") +axes[0].set_xticks(range(1, len(var_ratio) + 1)) + +# Right: cumulative variance explained +cumulative = np.cumsum(var_ratio) +axes[1].plot(range(1, len(cumulative) + 1), cumulative, "o-", color="darkorange", linewidth=2) +axes[1].axhline(y=0.95, color="red", linestyle="--", label="95% threshold") +axes[1].set_xlabel("Number of Components") +axes[1].set_ylabel("Cumulative Variance Explained") +axes[1].set_title("Cumulative Variance Explained") +axes[1].set_xticks(range(1, len(cumulative) + 1)) +axes[1].legend() + +plt.tight_layout() +plt.show() +``` + +```python +# 2-D scatter plot of the scratch PCA projection +colors = ["#1f77b4", "#ff7f0e", "#2ca02c"] + +plt.figure(figsize=(8, 6)) +for i, name in enumerate(target_names): + mask = y == i + plt.scatter(X_pca_scratch[mask, 0], X_pca_scratch[mask, 1], + label=name, alpha=0.7, edgecolors="k", linewidth=0.5, + color=colors[i], s=60) +plt.xlabel(f"PC 1 ({var_ratio[0]:.1%} variance)") +plt.ylabel(f"PC 2 ({var_ratio[1]:.1%} variance)") +plt.title("PCA From Scratch β€” Iris Dataset (2-D Projection)") +plt.legend() +plt.grid(alpha=0.3) +plt.tight_layout() +plt.show() +``` + +--- + +## 3. PCA with Scikit-learn + +We verify our scratch implementation against the well-optimized `sklearn.decomposition.PCA`. 
+ +```python +from sklearn.decomposition import PCA + +pca_sk = PCA(n_components=4) # keep all 4 to inspect variance +X_pca_sk_full = pca_sk.fit_transform(X) + +print("Sklearn variance explained ratio:", np.round(pca_sk.explained_variance_ratio_, 4)) +print("Scratch variance explained ratio: ", np.round(var_ratio, 4)) +print() +print("Cumulative (sklearn):", np.round(np.cumsum(pca_sk.explained_variance_ratio_), 4)) +``` + +```python +X_pca_sk = X_pca_sk_full[:, :2] # first 2 components + +# Sign of eigenvectors can flip β€” align for visual comparison +for col in range(2): + if np.corrcoef(X_pca_scratch[:, col], X_pca_sk[:, col])[0, 1] < 0: + X_pca_scratch[:, col] *= -1 + +fig, axes = plt.subplots(1, 2, figsize=(14, 5), sharex=True, sharey=True) + +for ax, data, title in zip(axes, + [X_pca_scratch, X_pca_sk], + ["PCA (from scratch)", "PCA (scikit-learn)"]): + for i, name in enumerate(target_names): + mask = y == i + ax.scatter(data[mask, 0], data[mask, 1], label=name, + alpha=0.7, edgecolors="k", linewidth=0.5, + color=colors[i], s=60) + ax.set_xlabel("PC 1") + ax.set_ylabel("PC 2") + ax.set_title(title) + ax.legend() + ax.grid(alpha=0.3) + +plt.suptitle("Scratch vs Scikit-learn PCA β€” Identical Results", fontsize=14, y=1.02) +plt.tight_layout() +plt.show() +``` + +The two plots are virtually identical (eigenvector signs may differ, which is cosmetic). This confirms our from-scratch implementation is correct. + +--- + +## 4. t-SNE + +### What is t-SNE? + +**t-distributed Stochastic Neighbor Embedding (t-SNE)** is a non-linear dimensionality-reduction technique designed specifically for **visualization**. + +Key properties: +- Preserves **local structure**: points that are close in high-dimensional space stay close in the 2-D embedding. +- Does **not** preserve global distances β€” clusters may move relative to each other between runs. +- Computationally expensive β€” not suitable as a preprocessing step in machine-learning pipelines. 
+- The **perplexity** parameter (roughly: how many neighbors each point considers) strongly influences the result. Typical range: 5–50.
+
+**Rule of thumb:** Use PCA when you need a general-purpose reduction (for modeling, compression, noise removal). Use t-SNE when your sole goal is to *see* cluster structure in 2-D.
+
+```python
+from sklearn.manifold import TSNE
+
+# `max_iter` was named `n_iter` before scikit-learn 1.5
+tsne = TSNE(n_components=2, perplexity=30, random_state=42, max_iter=1000)
+X_tsne = tsne.fit_transform(X)
+
+print(f"t-SNE output shape: {X_tsne.shape}")
+```
+
+```python
+# Side-by-side: PCA vs t-SNE
+fig, axes = plt.subplots(1, 2, figsize=(14, 5))
+
+for ax, data, title in zip(axes,
+                           [X_pca_sk, X_tsne],
+                           ["PCA (linear)", "t-SNE (non-linear)"]):
+    for i, name in enumerate(target_names):
+        mask = y == i
+        ax.scatter(data[mask, 0], data[mask, 1], label=name,
+                   alpha=0.7, edgecolors="k", linewidth=0.5,
+                   color=colors[i], s=60)
+    ax.set_xlabel("Dim 1")
+    ax.set_ylabel("Dim 2")
+    ax.set_title(title)
+    ax.legend()
+    ax.grid(alpha=0.3)
+
+plt.suptitle("PCA vs t-SNE β€” Iris Dataset", fontsize=14, y=1.02)
+plt.tight_layout()
+plt.show()
+```
+
+```python
+# Effect of perplexity on t-SNE
+perplexities = [5, 15, 30, 50]
+fig, axes = plt.subplots(1, 4, figsize=(20, 4))
+
+for ax, perp in zip(axes, perplexities):
+    embedding = TSNE(n_components=2, perplexity=perp,
+                     random_state=42, max_iter=1000).fit_transform(X)
+    for i, name in enumerate(target_names):
+        mask = y == i
+        ax.scatter(embedding[mask, 0], embedding[mask, 1],
+                   alpha=0.7, color=colors[i], s=40, edgecolors="k",
+                   linewidth=0.3, label=name)
+    ax.set_title(f"Perplexity = {perp}")
+    ax.set_xticks([])
+    ax.set_yticks([])
+
+axes[0].legend(fontsize=8)
+plt.suptitle("t-SNE: Impact of Perplexity", fontsize=14, y=1.04)
+plt.tight_layout()
+plt.show()
+```
+
+**Observations on perplexity:**
+- Low perplexity (5): focuses on very local neighbors β€” clusters may fragment.
+- High perplexity (50): considers more neighbors β€” clusters become rounder and more global structure is visible, but fine local detail may blur. +- There is no single "correct" perplexity; try several and look for consistent patterns. + +--- + +## 5. Anomaly Detection + +### Why Unsupervised Anomaly Detection? + +In many real-world scenarios, labeled anomalies are scarce or non-existent: + +| Domain | Normal | Anomaly | +|--------|--------|---------| +| Banking | Legitimate transactions | Fraud | +| Manufacturing | Good products | Defects | +| Cybersecurity | Regular traffic | Intrusions | + +Unsupervised methods learn the distribution of *normal* data and flag anything that doesn't fit. + +### Approach 1 β€” Z-Score + +Flag a point as anomalous if any feature has a Z-score \(|z| > \tau\) (e.g., \(\tau = 3\)). Simple, but assumes Gaussian features and works only for univariate or low-dimensional data. + +### Approach 2 β€” Isolation Forest + +The **Isolation Forest** algorithm isolates observations by randomly selecting a feature and a split value. Anomalies are easier to isolate (fewer splits needed), so they have shorter average path lengths in the trees. 
+ +Advantages: +- Works well in high dimensions +- No distribution assumptions +- Linear time complexity + +```python +from sklearn.ensemble import IsolationForest +from scipy import stats + +np.random.seed(42) + +# Generate normal data: 2 clusters +normal_a = np.random.randn(150, 2) * 0.8 + np.array([2, 2]) +normal_b = np.random.randn(150, 2) * 0.8 + np.array([-2, -2]) +normal_data = np.vstack([normal_a, normal_b]) + +# Inject 20 anomalies scattered far from the clusters +anomalies = np.random.uniform(low=-6, high=6, size=(20, 2)) + +X_anom = np.vstack([normal_data, anomalies]) +labels_true = np.array([0] * len(normal_data) + [1] * len(anomalies)) # 0=normal, 1=anomaly + +print(f"Total points: {len(X_anom)} (normal: {len(normal_data)}, anomalies: {len(anomalies)})") +``` + +```python +# Z-Score method +z_scores = np.abs(stats.zscore(X_anom)) +z_threshold = 3.0 +z_anomaly_mask = (z_scores > z_threshold).any(axis=1) + +print(f"Z-Score method detected {z_anomaly_mask.sum()} anomalies (threshold={z_threshold})") +``` + +```python +# Isolation Forest +iso_forest = IsolationForest(n_estimators=200, contamination=0.06, + random_state=42) +iso_preds = iso_forest.fit_predict(X_anom) # 1 = normal, -1 = anomaly +iso_anomaly_mask = iso_preds == -1 + +print(f"Isolation Forest detected {iso_anomaly_mask.sum()} anomalies") +``` + +```python +fig, axes = plt.subplots(1, 3, figsize=(18, 5)) + +# Ground truth +axes[0].scatter(X_anom[labels_true == 0, 0], X_anom[labels_true == 0, 1], + c="steelblue", s=30, alpha=0.6, label="Normal") +axes[0].scatter(X_anom[labels_true == 1, 0], X_anom[labels_true == 1, 1], + c="red", s=80, marker="X", label="True Anomaly") +axes[0].set_title("Ground Truth") +axes[0].legend() +axes[0].grid(alpha=0.3) + +# Z-Score +axes[1].scatter(X_anom[~z_anomaly_mask, 0], X_anom[~z_anomaly_mask, 1], + c="steelblue", s=30, alpha=0.6, label="Normal") +axes[1].scatter(X_anom[z_anomaly_mask, 0], X_anom[z_anomaly_mask, 1], + c="red", s=80, marker="X", label="Detected 
Anomaly") +axes[1].set_title(f"Z-Score (threshold={z_threshold})") +axes[1].legend() +axes[1].grid(alpha=0.3) + +# Isolation Forest +axes[2].scatter(X_anom[~iso_anomaly_mask, 0], X_anom[~iso_anomaly_mask, 1], + c="steelblue", s=30, alpha=0.6, label="Normal") +axes[2].scatter(X_anom[iso_anomaly_mask, 0], X_anom[iso_anomaly_mask, 1], + c="red", s=80, marker="X", label="Detected Anomaly") +axes[2].set_title("Isolation Forest") +axes[2].legend() +axes[2].grid(alpha=0.3) + +plt.suptitle("Anomaly Detection Comparison", fontsize=14, y=1.02) +plt.tight_layout() +plt.show() +``` + +**Key takeaway:** The Isolation Forest typically outperforms the Z-Score method, especially when the data is multi-modal or the anomalies are not simply extreme values along a single axis. + +--- + +## 6. Capstone β€” Customer Segmentation + +We build a complete customer-segmentation pipeline: + +1. Generate & save a synthetic customer dataset +2. Feature scaling +3. Dimensionality reduction with PCA +4. Elbow method to choose optimal \(K\) +5. K-Means clustering +6. Segment profiling & visualization +7. 
Business recommendations + +### 6.1 Generate Synthetic Customer Data + +We create five features that mimic a retail scenario: + +| Feature | Description | +|---------|-------------| +| `age` | Customer age (18–70) | +| `income` | Annual income in $k (15–150) | +| `spending_score` | In-store spending score (1–100) | +| `visits` | Monthly store visits (0–30) | +| `online_ratio` | Fraction of purchases made online (0–1) | + +```python +import pandas as pd +import os + +np.random.seed(42) +n_customers = 500 + +# Segment 1: Young, moderate income, high online, high spending +seg1 = { + "age": np.random.normal(25, 4, 130).clip(18, 40), + "income": np.random.normal(45, 12, 130).clip(15, 80), + "spending_score": np.random.normal(75, 10, 130).clip(1, 100), + "visits": np.random.normal(8, 3, 130).clip(0, 30), + "online_ratio": np.random.normal(0.75, 0.1, 130).clip(0, 1), +} + +# Segment 2: Middle-aged, high income, balanced channel, moderate spending +seg2 = { + "age": np.random.normal(42, 6, 150).clip(28, 60), + "income": np.random.normal(95, 18, 150).clip(50, 150), + "spending_score": np.random.normal(55, 12, 150).clip(1, 100), + "visits": np.random.normal(15, 5, 150).clip(0, 30), + "online_ratio": np.random.normal(0.45, 0.15, 150).clip(0, 1), +} + +# Segment 3: Older, lower income, low online, low spending +seg3 = { + "age": np.random.normal(58, 7, 120).clip(40, 70), + "income": np.random.normal(35, 10, 120).clip(15, 70), + "spending_score": np.random.normal(25, 10, 120).clip(1, 100), + "visits": np.random.normal(20, 5, 120).clip(0, 30), + "online_ratio": np.random.normal(0.15, 0.08, 120).clip(0, 1), +} + +# Segment 4: Mixed ages, very high income, high spending, moderate visits +seg4 = { + "age": np.random.normal(38, 10, 100).clip(18, 70), + "income": np.random.normal(120, 15, 100).clip(80, 150), + "spending_score": np.random.normal(85, 8, 100).clip(1, 100), + "visits": np.random.normal(12, 4, 100).clip(0, 30), + "online_ratio": np.random.normal(0.55, 0.15, 100).clip(0, 
1), +} + +frames = [] +for seg in [seg1, seg2, seg3, seg4]: + frames.append(pd.DataFrame(seg)) + +df_customers = pd.concat(frames, ignore_index=True) +df_customers = df_customers.sample(frac=1, random_state=42).reset_index(drop=True) + +df_customers["age"] = df_customers["age"].round(0).astype(int) +df_customers["income"] = df_customers["income"].round(1) +df_customers["spending_score"] = df_customers["spending_score"].round(0).astype(int) +df_customers["visits"] = df_customers["visits"].round(0).astype(int) +df_customers["online_ratio"] = df_customers["online_ratio"].round(2) + +# Save to CSV (run from chapter folder: chapters/chapter-08-unsupervised-learning/) +dataset_dir = "datasets" +os.makedirs(dataset_dir, exist_ok=True) +csv_path = os.path.join(dataset_dir, "customers.csv") +df_customers.to_csv(csv_path, index=False) +print(f"Saved {len(df_customers)} rows to {csv_path}") +df_customers.head(10) +``` + +### 6.2 Feature Scaling + +```python +from sklearn.preprocessing import StandardScaler + +feature_cols = ["age", "income", "spending_score", "visits", "online_ratio"] +X_cust = df_customers[feature_cols].values + +scaler = StandardScaler() +X_scaled = scaler.fit_transform(X_cust) + +print("Scaled means (β‰ˆ0):", np.round(X_scaled.mean(axis=0), 4)) +print("Scaled stds (β‰ˆ1):", np.round(X_scaled.std(axis=0), 4)) +``` + +### 6.3 PCA for Dimensionality Reduction + +```python +pca_cust = PCA(n_components=5) +X_pca_cust = pca_cust.fit_transform(X_scaled) + +cum_var = np.cumsum(pca_cust.explained_variance_ratio_) + +plt.figure(figsize=(7, 4)) +plt.bar(range(1, 6), pca_cust.explained_variance_ratio_, + color="steelblue", edgecolor="black", alpha=0.7, label="Individual") +plt.step(range(1, 6), cum_var, where="mid", color="darkorange", + linewidth=2, label="Cumulative") +plt.axhline(0.90, color="red", linestyle="--", alpha=0.7, label="90% threshold") +plt.xlabel("Principal Component") +plt.ylabel("Variance Explained") +plt.title("Customer Data β€” PCA Variance 
Explained") +plt.xticks(range(1, 6)) +plt.legend() +plt.tight_layout() +plt.show() + +n_keep = np.argmax(cum_var >= 0.90) + 1 +print(f"\nComponents needed for β‰₯90% variance: {n_keep}") +print(f"Using first 2 components for visualization ({cum_var[1]:.1%} variance).") +``` + +### 6.4 K-Means β€” Elbow Method + +```python +from sklearn.cluster import KMeans + +K_range = range(2, 11) +inertias = [] + +for k in K_range: + km = KMeans(n_clusters=k, n_init=10, random_state=42) + km.fit(X_scaled) + inertias.append(km.inertia_) + +plt.figure(figsize=(8, 4)) +plt.plot(list(K_range), inertias, "o-", linewidth=2, color="steelblue") +plt.xlabel("Number of Clusters (K)") +plt.ylabel("Inertia (within-cluster sum of squares)") +plt.title("Elbow Method for Optimal K") +plt.xticks(list(K_range)) +plt.grid(alpha=0.3) +plt.tight_layout() +plt.show() + +print("Look for the 'elbow' β€” the point where adding more clusters yields") +print("diminishing returns. Here K=4 appears to be a good choice.") +``` + +### 6.5 Fit K-Means with Optimal K + +```python +optimal_k = 4 +km_final = KMeans(n_clusters=optimal_k, n_init=20, random_state=42) +cluster_labels = km_final.fit_predict(X_scaled) + +df_customers["cluster"] = cluster_labels +print(f"Cluster distribution:\n{df_customers['cluster'].value_counts().sort_index()}") +``` + +### 6.6 Segment Profiling + +```python +segment_profile = df_customers.groupby("cluster")[feature_cols].mean().round(2) +segment_profile["count"] = df_customers.groupby("cluster").size() +print("=== Segment Profiles ===") +segment_profile +``` + +```python +# Radar / parallel-coordinates style comparison +fig, axes = plt.subplots(1, len(feature_cols), figsize=(18, 4), sharey=True) +cluster_colors = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728"] + +for idx, feat in enumerate(feature_cols): + means = df_customers.groupby("cluster")[feat].mean() + axes[idx].bar(means.index, means.values, + color=cluster_colors[:optimal_k], edgecolor="black") + axes[idx].set_title(feat, 
fontsize=11) + axes[idx].set_xlabel("Cluster") + axes[idx].set_xticks(range(optimal_k)) + +axes[0].set_ylabel("Mean Value") +plt.suptitle("Feature Means by Cluster", fontsize=14, y=1.02) +plt.tight_layout() +plt.show() +``` + +### 6.7 Visualize Segments in 2-D (PCA Projection) + +```python +X_vis = X_pca_cust[:, :2] +centroids_scaled = km_final.cluster_centers_ +centroids_2d = pca_cust.transform(centroids_scaled)[:, :2] # project centroids + +plt.figure(figsize=(9, 7)) +for c in range(optimal_k): + mask = cluster_labels == c + plt.scatter(X_vis[mask, 0], X_vis[mask, 1], s=40, alpha=0.6, + color=cluster_colors[c], edgecolors="k", linewidth=0.3, + label=f"Segment {c}") + +plt.scatter(centroids_2d[:, 0], centroids_2d[:, 1], s=250, c="black", + marker="*", zorder=5, label="Centroids") + +plt.xlabel(f"PC 1 ({pca_cust.explained_variance_ratio_[0]:.1%} var)") +plt.ylabel(f"PC 2 ({pca_cust.explained_variance_ratio_[1]:.1%} var)") +plt.title("Customer Segments β€” PCA 2-D Projection") +plt.legend() +plt.grid(alpha=0.3) +plt.tight_layout() +plt.show() +``` + +### 6.8 Business Recommendations + +```python +recommendations = { + 0: { + "label": "Budget Traditionalists", + "description": "Older customers with low income and spending, who shop mostly in-store.", + "actions": [ + "Offer loyalty discounts and in-store promotions", + "Simplify the in-store experience", + "Provide personalized coupons at checkout", + ], + }, + 1: { + "label": "Young Digital Shoppers", + "description": "Young customers with moderate income but high online engagement and spending.", + "actions": [ + "Invest in mobile app features and social media marketing", + "Offer free shipping and digital-only deals", + "Launch a referral program to leverage their network", + ], + }, + 2: { + "label": "Premium High-Spenders", + "description": "High income, high spending score β€” the most valuable segment.", + "actions": [ + "Create a VIP/premium loyalty tier", + "Offer early access to new products", + "Assign 
dedicated account managers for retention", + ], + }, + 3: { + "label": "Established Moderates", + "description": "Middle-aged, higher income, moderate spending, balanced channel use.", + "actions": [ + "Cross-sell higher-margin products", + "Provide omni-channel convenience (buy online, pick up in store)", + "Target with email campaigns for seasonal offers", + ], + }, +} + +for seg_id, info in recommendations.items(): + count = (cluster_labels == seg_id).sum() + print(f"\n{'='*60}") + print(f"Segment {seg_id}: {info['label']} (n={count})") + print(f"{'='*60}") + print(f" {info['description']}") + print(" Recommended actions:") + for action in info["actions"]: + print(f" β€’ {action}") +``` + +--- + +## 7. Summary + +### What We Covered in This Notebook + +| Topic | Key Idea | +|-------|----------| +| **PCA** | Linear projection onto directions of maximum variance | +| **t-SNE** | Non-linear embedding that preserves local neighborhoods β€” for visualization only | +| **Z-Score Anomaly Detection** | Simple threshold on standardized values | +| **Isolation Forest** | Tree-based anomaly detector β€” fast, distribution-free | +| **Customer Segmentation** | End-to-end pipeline: scale β†’ PCA β†’ K-Means β†’ profile β†’ recommend | + +### Chapter 8 Recap + +Across the three notebooks you have: + +1. **Notebook 01 (Introduction):** Learned K-Means, hierarchical clustering, and evaluation metrics. +2. **Notebook 02 (Intermediate):** Explored DBSCAN, Gaussian Mixture Models, and silhouette analysis. +3. **Notebook 03 (Advanced β€” this one):** Mastered PCA, t-SNE, anomaly detection, and built a full capstone project. + +### What's Next + +In **Chapter 9: Deep Learning** we'll move from classical ML to neural networks β€” starting with perceptrons, backpropagation, and building your first deep network with PyTorch/Keras. 
+ +--- +*Generated by Berta AI | Created by Luigi Pascal Rondanini* diff --git a/docs/chapters/index.md b/docs/chapters/index.md index cd83b10..ba4b1ea 100644 --- a/docs/chapters/index.md +++ b/docs/chapters/index.md @@ -48,6 +48,10 @@ Apply your knowledge to real-world ML and AI problems. *10h Β· 3 notebooks, 5 exercises, 3 SVGs* Regression, regularization; classification, SVM, ROC; ensembles, tuning, credit-risk +- **Ch 8: [Unsupervised Learning](chapter-08.md)** + *8h Β· 3 notebooks, 5 exercises, 3 SVGs* + K-Means, hierarchical, DBSCAN; PCA, t-SNE; anomaly detection, customer segmentation + --- @@ -63,15 +67,16 @@ Apply your knowledge to real-world ML and AI problems. | [5: Software Design](chapter-05.md) | Foundation | 6h | 3 | 5 | 3 | | [6: Intro to ML](chapter-06.md) | Practitioner | 8h | 3 | 5 | 3 | | [7: Supervised Learning](chapter-07.md) | Practitioner | 10h | 3 | 5 | 3 | +| [8: Unsupervised Learning](chapter-08.md) | Practitioner | 8h | 3 | 5 | 3 | --- ## Coming Soon -!!! info "Chapters 8–25" +!!! info "Chapters 9–25" Additional chapters are planned for the Practitioner and Advanced tracks: - - **Practitioner** (8–15): Unsupervised Learning, Deep Learning, NLP, LLMs, Prompt Engineering, RAG, Fine-tuning, MLOps + - **Practitioner** (9–15): Deep Learning, NLP, LLMs, Prompt Engineering, RAG, Fine-tuning, MLOps - **Advanced** (16–25): Multi-Agent Systems, Advanced RAG, Reinforcement Learning, Model Optimization, Production AI, Finance, Safety, AI Products, Research, Governance & Ethics [Request a custom chapter](../guides/chapter-requests.md) on any AI topic while you wait! diff --git a/docs/guides/curriculum.md b/docs/guides/curriculum.md index 2e997d7..6401d23 100644 --- a/docs/guides/curriculum.md +++ b/docs/guides/curriculum.md @@ -30,7 +30,7 @@ Apply knowledge to real-world ML and AI problems. 
|---|---------|-------|--------|------| | 6 | Introduction to Machine Learning | 8h | Available | [chapter-06.md](../chapters/chapter-06.md) | | 7 | Supervised Learning | 10h | Available | [chapter-07.md](../chapters/chapter-07.md) | -| 8 | Unsupervised Learning | 8h | Coming soon | β€” | +| 8 | Unsupervised Learning | 8h | Available | [chapter-08.md](../chapters/chapter-08.md) | | 9 | Deep Learning Fundamentals | 12h | Coming soon | β€” | | 10 | Natural Language Processing Basics | 10h | Coming soon | β€” | | 11 | Large Language Models & Transformers | 10h | Coming soon | β€” | @@ -39,7 +39,7 @@ Apply knowledge to real-world ML and AI problems. | 14 | Fine-tuning & Adaptation | 8h | Coming soon | β€” | | 15 | MLOps & Deployment | 8h | Coming soon | β€” | -**Total: 88 hours (18h available)** +**Total: 88 hours (26h available)** --- @@ -69,9 +69,9 @@ Master complex topics and specialized domains. | Track | Chapters | Total Hours | Available | |-------|----------|-------------|-----------| | Foundation | 1–5 | 38h | 5/5 | -| Practitioner | 6–15 | 88h | 2/10 | +| Practitioner | 6–15 | 88h | 3/10 | | Advanced | 16–25 | 84h | 0/10 | -| **Total** | **25** | **210h+** | **7** | +| **Total** | **25** | **210h+** | **8** | --- diff --git a/docs/guides/roadmap.md b/docs/guides/roadmap.md index d9161ef..528ba8f 100644 --- a/docs/guides/roadmap.md +++ b/docs/guides/roadmap.md @@ -10,10 +10,10 @@ Our vision for the future of AI education. Priorities evolve based on community |-----------|--------| | Master Repository | Live | | Foundation Track | Complete (5 chapters) | -| Practitioner Track | In progress (2 of 10 chapters) | +| Practitioner Track | In progress (3 of 10 chapters) | | Advanced Track | Planned (10 chapters) | | Community Requests | Starting | -| **Available Now** | 7 chapters, 56 hours, 21 SVGs | +| **Available Now** | 8 chapters, 64 hours, 24 SVGs | --- @@ -28,13 +28,13 @@ Our vision for the future of AI education. 
Priorities evolve based on community ## Phase 1: Foundation & Launch β€” Complete !!! success "Complete" - Foundation Track complete. Chapters 6-7 available. Core infrastructure done. + Foundation Track complete. Chapters 6-8 available. Core infrastructure done. ### Objectives - [x] Establish master repository - [x] Complete Foundation Track (Chapters 1-5) -- [x] Begin Practitioner Track (Ch 6-7) +- [x] Begin Practitioner Track (Ch 6-8) - [ ] Establish community request process - [ ] Build first 100 community chapters @@ -63,7 +63,7 @@ Our vision for the future of AI education. Priorities evolve based on community |---|---------|--------| | 6 | Introduction to ML | Done | | 7 | Supervised Learning | Done | -| 8 | Unsupervised Learning | Next | +| 8 | Unsupervised Learning | Done | | 9 | Deep Learning Fundamentals | Planned | | 10 | NLP Basics | Planned | | 11 | LLMs & Transformers | Planned | diff --git a/docs/guides/syllabus.md b/docs/guides/syllabus.md index ed94091..c8f174e 100644 --- a/docs/guides/syllabus.md +++ b/docs/guides/syllabus.md @@ -16,7 +16,7 @@ graph TD CH6["Ch 6: Intro to ML
8h | Available"]
        CH7["Ch 7: Supervised Learning<br/>10h | Available"]
-       CH8["Ch 8: Unsupervised Learning<br/>8h | Coming Soon"]
+       CH8["Ch 8: Unsupervised Learning<br/>8h | Available"]
        CH9["Ch 9: Deep Learning<br/>12h | Coming Soon"]
        CH10["Ch 10: NLP Basics<br/>10h | Coming Soon"]
        CH11["Ch 11: LLMs & Transformers
10h | Coming Soon"] @@ -56,7 +56,7 @@ graph TD style CH5 fill:#4caf50,color:#fff style CH6 fill:#4caf50,color:#fff style CH7 fill:#4caf50,color:#fff - style CH8 fill:#f3e5f5 + style CH8 fill:#4caf50,color:#fff style CH9 fill:#f3e5f5 style CH10 fill:#f3e5f5 style CH11 fill:#f3e5f5 @@ -89,7 +89,7 @@ graph TD | 5 | Software Design | Foundation | 6h | Available | | 6 | Introduction to ML | Practitioner | 8h | Available | | 7 | Supervised Learning | Practitioner | 10h | Available | -| 8 | Unsupervised Learning | Practitioner | 8h | Coming soon | +| 8 | Unsupervised Learning | Practitioner | 8h | Available | | 9 | Deep Learning Fundamentals | Practitioner | 12h | Coming soon | | 10 | NLP Basics | Practitioner | 10h | Coming soon | | 11 | LLMs & Transformers | Practitioner | 10h | Coming soon | diff --git a/docs/index.md b/docs/index.md index e7c82c5..a3fb797 100644 --- a/docs/index.md +++ b/docs/index.md @@ -28,27 +28,27 @@ Free. Open-source. Community-driven. Generated by [Berta AI](https://berta.one).
-    7
+    8
     Chapters
-    21
+    24
     Notebooks
-    21
+    24
     Diagrams
-    56h
+    64h
     Content
-    37
+    42
     Exercises
@@ -79,7 +79,8 @@ Free. Open-source. Community-driven. Generated by [Berta AI](https://berta.one). |---|---------|-------|----------| | 6 | [Introduction to Machine Learning](chapters/chapter-06.md) | 8h | 3 notebooks, 5 exercises, 3 diagrams | | 7 | [Supervised Learning](chapters/chapter-07.md) | 10h | 3 notebooks, 5 exercises, 3 diagrams | -| 8–25 | Coming soon | | [View roadmap](guides/roadmap.md) | +| 8 | [Unsupervised Learning](chapters/chapter-08.md) | 8h | 3 notebooks, 5 exercises, 3 diagrams | +| 9–25 | Coming soon | | [View roadmap](guides/roadmap.md) | --- diff --git a/mkdocs.yml b/mkdocs.yml index 97b5818..591ae44 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -133,6 +133,10 @@ nav: - "7.1 Introduction": chapters/content/ch07-01_introduction.md - "7.2 Intermediate": chapters/content/ch07-02_intermediate.md - "7.3 Advanced": chapters/content/ch07-03_advanced.md + - "Ch 8: Unsupervised Learning": chapters/chapter-08.md + - "8.1 Introduction": chapters/content/ch08-01_introduction.md + - "8.2 Intermediate": chapters/content/ch08-02_intermediate.md + - "8.3 Advanced": chapters/content/ch08-03_advanced.md - Playground: playground.md - Community: - Contributing: guides/contributing.md diff --git a/netlify.toml b/netlify.toml index 75b89e4..7319110 100644 --- a/netlify.toml +++ b/netlify.toml @@ -1,10 +1,23 @@ [build] command = "pip install mkdocs-material mkdocs-minify-plugin && mkdocs build" publish = "site" + functions = "netlify/functions" [build.environment] PYTHON_VERSION = "3.11" +# Newsletter: auto-sends on deploy when chapter_notification.json changes. +# One-time setup in Netlify Dashboard > Site settings > Environment variables: +# RESEND_API_KEY = your Resend API key +# CONFIRM_FROM_EMAIL = verified sender email +# SITE_URL = https://chapters.berta.one +# NETLIFY_API_TOKEN = personal access token (app.netlify.com/user/applications) +# NETLIFY_SITE_ID = site ID (Site settings > General) +# +# To notify subscribers about a new chapter: +# 1. 
Edit netlify/functions/chapter_notification.json +# 2. Commit and deploy β€” done. No dashboard changes needed. + [[redirects]] from = "/chapters" to = "/chapters/" diff --git a/netlify/functions/chapter_notification.json b/netlify/functions/chapter_notification.json new file mode 100644 index 0000000..b422030 --- /dev/null +++ b/netlify/functions/chapter_notification.json @@ -0,0 +1,5 @@ +{ + "chapter_number": 8, + "chapter_title": "Unsupervised Learning", + "chapter_description": "K-Means clustering, hierarchical clustering, DBSCAN, PCA, t-SNE, anomaly detection, and a customer segmentation capstone project." +} diff --git a/netlify/functions/deploy-succeeded.js b/netlify/functions/deploy-succeeded.js new file mode 100644 index 0000000..afb4f91 --- /dev/null +++ b/netlify/functions/deploy-succeeded.js @@ -0,0 +1,225 @@ +/** + * Netlify Function: deploy-succeeded (Background Event) + * + * Automatically triggered after every successful Netlify deploy. + * Reads chapter_notification.json to determine what to notify about, + * then checks LAST_NOTIFIED_CHAPTER to avoid re-sending. + * + * How to send a newsletter for a new chapter: + * 1. Update chapter_notification.json with the new chapter details. + * 2. Commit, push, and deploy. That's it. + * + * The function compares the chapter number in the JSON file against + * LAST_NOTIFIED_CHAPTER (stored as a Netlify env var). If they differ, + * it fetches all subscribers from Netlify Forms and emails them via + * Resend, then updates LAST_NOTIFIED_CHAPTER so subsequent deploys + * won't re-send. + * + * One-time setup (Netlify Dashboard > Site settings > Environment variables): + * RESEND_API_KEY - Resend API key + * CONFIRM_FROM_EMAIL - Verified sender email in Resend + * SITE_URL - https://chapters.berta.one + * NETLIFY_API_TOKEN - Personal access token (app.netlify.com/user/applications) + * NETLIFY_SITE_ID - Site ID (Site settings > General) + * + * No per-release changes needed in the dashboard. 
+ */ + +var config = require("./chapter_notification.json"); + +exports.handler = async function () { + var chapterNumber = String(config.chapter_number); + var chapterTitle = config.chapter_title; + var chapterDescription = config.chapter_description; + + var lastNotified = process.env.LAST_NOTIFIED_CHAPTER || ""; + if (lastNotified === chapterNumber) { + console.log("Chapter " + chapterNumber + " already notified β€” skipping"); + return { statusCode: 200, body: "Already notified for chapter " + chapterNumber }; + } + + var apiKey = process.env.RESEND_API_KEY; + if (!apiKey) { + console.log("RESEND_API_KEY not set β€” cannot send emails"); + return { statusCode: 200, body: "No email service configured" }; + } + + var netlifyToken = process.env.NETLIFY_API_TOKEN; + var siteId = process.env.NETLIFY_SITE_ID; + if (!netlifyToken || !siteId) { + console.log("NETLIFY_API_TOKEN or NETLIFY_SITE_ID not set"); + return { statusCode: 200, body: "Netlify API not configured" }; + } + + var fromEmail = process.env.CONFIRM_FROM_EMAIL || "onboarding@resend.dev"; + var siteUrl = process.env.SITE_URL || "https://chapters.berta.one"; + + // ── Fetch subscribers from Netlify Forms ── + var subscribers = []; + try { + var formsRes = await fetch( + "https://api.netlify.com/api/v1/sites/" + siteId + "/forms", + { headers: { "Authorization": "Bearer " + netlifyToken } } + ); + if (!formsRes.ok) { + console.log("Failed to fetch forms: HTTP " + formsRes.status); + return { statusCode: 200, body: "Failed to fetch forms" }; + } + var forms = await formsRes.json(); + var newsletterForm = forms.find(function (f) { return f.name === "newsletter"; }); + + if (!newsletterForm) { + console.log("Newsletter form not found"); + return { statusCode: 200, body: "Newsletter form not found" }; + } + + var page = 1; + var perPage = 100; + while (true) { + var subsRes = await fetch( + "https://api.netlify.com/api/v1/forms/" + newsletterForm.id + + "/submissions?per_page=" + perPage + "&page=" + page, + { 
headers: { "Authorization": "Bearer " + netlifyToken } } + ); + if (!subsRes.ok) break; + var subs = await subsRes.json(); + if (!subs || subs.length === 0) break; + for (var i = 0; i < subs.length; i++) { + var email = subs[i].data && subs[i].data.email; + if (email && subscribers.indexOf(email) === -1) { + subscribers.push(email); + } + } + if (subs.length < perPage) break; + page++; + } + } catch (err) { + console.log("Error fetching subscribers: " + err.message); + return { statusCode: 500, body: "Failed to fetch subscribers" }; + } + + if (subscribers.length === 0) { + console.log("No subscribers found"); + return { statusCode: 200, body: "No subscribers" }; + } + + console.log("Sending Chapter " + chapterNumber + ": " + chapterTitle + + " notification to " + subscribers.length + " subscriber(s)"); + + // ── Build email ── + var chapterUrl = siteUrl + "/chapters/chapter-" + + (parseInt(chapterNumber) < 10 ? "0" : "") + chapterNumber + "/"; + + var htmlBody = [ + "
<div>",
+    "  <h1>New Chapter Published!</h1>",
+    "  <h2>Chapter " + chapterNumber + ": " + chapterTitle + "</h2>",
+    "  <p>" + chapterDescription + "</p>",
+    "  <p>",
+    "    <a href=\"" + chapterUrl + "\">Read Chapter " + chapterNumber + " Now</a>",
+    "  </p>",
+    "  <p><strong>What's included:</strong></p>",
+    "  <p>",
+    "    <a href=\"" + siteUrl + "/chapters/\">Browse all chapters</a> | ",
+    "    <a href=\"" + siteUrl + "/playground/\">Try the Playground</a>",
+    "  </p>",
+    "  <hr/>",
+    "  <p>",
+    "    You're receiving this because you subscribed at " + siteUrl + "<br/>",
+    "    To unsubscribe, reply to this email with 'unsubscribe'.",
+    "  </p>",
+    "  <p>",
+    "    Created by Luigi Pascal Rondanini | ",
+    "    Powered by <a href=\"https://berta.one\">Berta AI</a>",
+    "  </p>",
+    "</div>
", + ].join("\n"); + + // ── Send emails ── + var sent = 0; + var failed = 0; + + for (var j = 0; j < subscribers.length; j++) { + try { + var response = await fetch("https://api.resend.com/emails", { + method: "POST", + headers: { + "Authorization": "Bearer " + apiKey, + "Content-Type": "application/json", + }, + body: JSON.stringify({ + from: "Berta Chapters <" + fromEmail + ">", + to: [subscribers[j]], + subject: "New Chapter: " + chapterTitle + " (Chapter " + chapterNumber + ")", + html: htmlBody, + }), + }); + + if (response.ok) { + sent++; + console.log("Sent to " + subscribers[j]); + } else { + failed++; + var errorText = await response.text(); + console.log("Failed: " + subscribers[j] + " β€” " + errorText); + } + } catch (err) { + failed++; + console.log("Error: " + subscribers[j] + " β€” " + err.message); + } + } + + // ── Update LAST_NOTIFIED_CHAPTER via Netlify API ── + if (sent > 0) { + try { + // Fetch existing env vars for this account (scoped to site) + var envRes = await fetch( + "https://api.netlify.com/api/v1/accounts/me/env?site_id=" + siteId, + { headers: { "Authorization": "Bearer " + netlifyToken } } + ); + var envVars = await envRes.json(); + var existing = Array.isArray(envVars) + ? 
envVars.find(function (v) { return v.key === "LAST_NOTIFIED_CHAPTER"; }) + : null; + + if (existing) { + // Delete then recreate (Netlify API pattern for updating) + await fetch( + "https://api.netlify.com/api/v1/accounts/me/env/LAST_NOTIFIED_CHAPTER?site_id=" + siteId, + { method: "DELETE", headers: { "Authorization": "Bearer " + netlifyToken } } + ); + } + await fetch( + "https://api.netlify.com/api/v1/accounts/me/env?site_id=" + siteId, + { + method: "POST", + headers: { + "Authorization": "Bearer " + netlifyToken, + "Content-Type": "application/json", + }, + body: JSON.stringify([{ + key: "LAST_NOTIFIED_CHAPTER", + scopes: ["functions"], + values: [{ value: chapterNumber, context: "all" }], + }]), + } + ); + console.log("Updated LAST_NOTIFIED_CHAPTER to " + chapterNumber); + } catch (err) { + console.log("Could not update LAST_NOTIFIED_CHAPTER: " + err.message); + } + } + + var summary = "Chapter " + chapterNumber + " β€” Sent: " + sent + + ", Failed: " + failed + ", Total: " + subscribers.length; + console.log(summary); + return { statusCode: 200, body: summary }; +}; diff --git a/netlify/functions/submission-created.js b/netlify/functions/submission-created.js index 9a272f4..8aefae5 100644 --- a/netlify/functions/submission-created.js +++ b/netlify/functions/submission-created.js @@ -41,9 +41,12 @@ exports.handler = async function (event) { "

<p>Thank you for subscribing to updates.</p>",
     "<p>You'll receive an email when new chapters are published. At most one email per week.</p>",
+    "<p><strong>Latest chapter:</strong></p>",
+    "<p><strong>Chapter 8: Unsupervised Learning</strong> β€” ",
+    "  K-Means, hierarchical clustering, DBSCAN, PCA, t-SNE, anomaly detection, and a customer segmentation capstone.</p>",
     "<p><strong>Start learning now:</strong>
", " ", diff --git a/wiki/Home.md b/wiki/Home.md index 83c9da6..3a61a8c 100644 --- a/wiki/Home.md +++ b/wiki/Home.md @@ -47,7 +47,7 @@ A full journey from basics to advanced AI: - **Practitioner (Ch 6–15)** β€” ML intro, supervised learning, deep learning, NLP, RAG, MLOps - **Advanced (Ch 16–25+)** β€” Multi-agent systems, reinforcement learning, production AI, ethics -**7 chapters available now** (56 hours of content). More unlock as the community grows. +**8 chapters available now** (64 hours of content). More unlock as the community grows. ### 2. Community-Requested Chapters β€” Learn What You Need @@ -121,7 +121,7 @@ Same format everywhere. Learn once, then move fast. | **Advanced** | 16–25+ | πŸ“‹ Planned | | **Community** | Your requests | πŸš€ Unlimited | -**21 notebooks**, **37 exercises with solutions**, **5 datasets**, **21 diagrams**. All open and ready to run. +**24 notebooks**, **42 exercises with solutions**, **7 datasets**, **24 diagrams**. All open and ready to run. ---