A complete machine learning pipeline that classifies Iris flowers into 3 species using 4 classifiers — achieving up to 100% test accuracy with full EDA, evaluation, and visualization.
📓 View Notebook • 📊 Results • 🚀 How to Run • 📁 Project Structure
This project is Task 1 of the CodeAlpha Data Science Internship. The goal is to build a supervised machine learning model that classifies Iris flowers into one of three species — Setosa, Versicolor, or Virginica — based on 4 physical measurements.
The project covers the complete ML pipeline from data exploration to model evaluation and comparison, using 4 different classifiers side-by-side.
- ✅ Perform thorough Exploratory Data Analysis (EDA) with visualizations
- ✅ Preprocess data with StandardScaler for optimal model performance
- ✅ Train and compare 4 machine learning classifiers
- ✅ Evaluate models using accuracy, cross-validation, and confusion matrices
- ✅ Identify the best model and extract feature importance insights
| Property | Detail |
|---|---|
| Source | UCI Machine Learning Repository (via sklearn.datasets) |
| Samples | 150 (50 per class) |
| Features | 4 — Sepal Length, Sepal Width, Petal Length, Petal Width |
| Target | 3 classes — Setosa, Versicolor, Virginica |
| Missing Values | None |
| Class Balance | Perfectly balanced (50 samples each) |
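The dataset facts above can be reproduced directly from scikit-learn's bundled copy of Iris. A minimal loading sketch (the `df` / `species` names are illustrative, not necessarily the notebook's):

```python
import pandas as pd
from sklearn.datasets import load_iris

# Load the Iris dataset bundled with scikit-learn (originally from UCI)
iris = load_iris(as_frame=True)
df = iris.frame  # 4 feature columns + numeric 'target' column

# Map the numeric target to species names for readability
df["species"] = df["target"].map(dict(enumerate(iris.target_names)))

print(df.shape)                      # (150, 6)
print(df["species"].value_counts())  # 50 per class
print(df.isna().sum().sum())         # 0 missing values
```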
📋 Feature Description
| Feature | Unit | Description |
|---|---|---|
| `sepal_length` | cm | Length of the flower's sepal |
| `sepal_width` | cm | Width of the flower's sepal |
| `petal_length` | cm | Length of the flower's petal |
| `petal_width` | cm | Width of the flower's petal |
| `species` | — | Target: Setosa / Versicolor / Virginica |
| Tool | Version | Purpose |
|---|---|---|
| Python | 3.10+ | Core language |
| Pandas | 2.0+ | Data manipulation |
| NumPy | 1.24+ | Numerical operations |
| Scikit-learn | 1.3+ | ML models & evaluation |
| Matplotlib | 3.7+ | Visualizations |
| Seaborn | 0.12+ | Statistical plots |
| Jupyter Notebook | — | Development environment |
| # | Model | Key Hyperparameters |
|---|---|---|
| 1 | Logistic Regression | max_iter=300 |
| 2 | Decision Tree | max_depth=4 |
| 3 | Random Forest | n_estimators=100, max_depth=5 |
| 4 | K-Nearest Neighbors (KNN) | n_neighbors=5, metric='euclidean' |
All models evaluated with:
- 80/20 Train-Test Split (stratified)
- 5-Fold Stratified Cross-Validation
- StandardScaler normalization applied
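This setup can be sketched end to end with scikit-learn, wrapping `StandardScaler` and each classifier in a pipeline so scaling statistics never leak from the test fold; `random_state=42` is an assumed seed here, not necessarily the notebook's:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Stratified 80/20 split keeps the 50/50/50 class balance in both halves
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=300),
    "Decision Tree": DecisionTreeClassifier(max_depth=4),
    "Random Forest": RandomForestClassifier(n_estimators=100, max_depth=5),
    "KNN": KNeighborsClassifier(n_neighbors=5, metric="euclidean"),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
results = {}
for name, model in models.items():
    # Scaler inside the pipeline: fitted on train folds only, no leakage
    pipe = make_pipeline(StandardScaler(), model)
    pipe.fit(X_train, y_train)
    scores = cross_val_score(pipe, X_train, y_train, cv=cv)
    results[name] = (pipe.score(X_test, y_test), scores.mean(), scores.std())
    print(f"{name:22s} test={results[name][0]:.3f} "
          f"cv={scores.mean():.3f} (+/- {scores.std():.3f})")
```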
| Rank | Model | Test Accuracy | CV Mean | CV Std |
|---|---|---|---|---|
| 🥇 | Random Forest | ~97–100% | ~96% | ±2% |
| 🥈 | KNN | ~96–100% | ~95% | ±3% |
| 🥉 | Logistic Regression | ~96–97% | ~95% | ±3% |
| 4 | Decision Tree | ~93–97% | ~93% | ±4% |
💡 Exact values depend on the random seed. Run the notebook to see your results.
The project generates 8 professional plots, all saved as .png files:
| Plot | File | Description |
|---|---|---|
| 🔷 Feature Distributions | feature_distributions.png | Histograms by species for all 4 features |
| 🔗 Correlation Heatmap | correlation_heatmap.png | Feature correlation matrix |
| 🌐 Pairplot | pairplot.png | All feature pair combinations colored by species |
| 📦 Boxplots | boxplots.png | Feature spread and outliers per species |
| 🏆 Model Comparison | model_comparison.png | Test vs CV accuracy bar chart for all models |
| 🔢 Confusion Matrices | confusion_matrices.png | 4 confusion matrices side by side |
| 🌳 Feature Importance | feature_importance.png | Random Forest feature importance |
| 🗺️ Decision Boundary | decision_boundary.png | RF decision boundary (petal features) |
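As one example of how these plots are produced, here is a sketch for the correlation heatmap; figure size, colormap, and dpi are illustrative choices, not necessarily the notebook's exact settings:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; works without a display
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
from sklearn.datasets import load_iris

# Correlate the four measurement columns only (drop the numeric target)
features = load_iris(as_frame=True).frame.drop(columns="target")

plt.figure(figsize=(6, 5))
sns.heatmap(features.corr(), annot=True, cmap="coolwarm", fmt=".2f")
plt.title("Iris Feature Correlation")
plt.tight_layout()
plt.savefig("correlation_heatmap.png", dpi=150)
plt.close()

saved = Path("correlation_heatmap.png").exists()
print("saved:", saved)
```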
- Petal features dominate — `petal_length` and `petal_width` together explain ~95%+ of the class separability
- Setosa is trivially separable — linearly separable from the other species in petal space
- Versicolor–Virginica boundary is the challenge — slight overlap in feature space
- Random Forest handles noise best — ensemble averaging reduces boundary errors
- No overfitting detected — consistent train/CV scores across all models
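The petal-dominance insight can be checked by reading `feature_importances_` off a fitted Random Forest. A short sketch (`random_state=42` is an assumption for reproducibility):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
rf = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=42)
rf.fit(iris.data, iris.target)

# Impurity-based importances, normalized to sum to 1
importances = dict(zip(iris.feature_names, rf.feature_importances_))
for name, score in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{name:20s} {score:.3f}")

# Combined share attributed to the two petal features
petal_share = importances["petal length (cm)"] + importances["petal width (cm)"]
print(f"petal share: {petal_share:.2f}")
```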
CodeAlpha_IrisFlowerClassification/
│
├── 📓 iris_classification.ipynb ← Main Jupyter Notebook (full pipeline)
├── 📄 README.md ← This file
├── 📋 requirements.txt ← Python dependencies
│
└── 📊 Generated Plots/
├── feature_distributions.png
├── correlation_heatmap.png
├── pairplot.png
├── boxplots.png
├── model_comparison.png
├── confusion_matrices.png
├── feature_importance.png
└── decision_boundary.png
# 1. Clone the repository
git clone https://github.com/MOHAMMED-ABUZAR317/CodeAlpha_IrisFlowerClassification.git
cd CodeAlpha_IrisFlowerClassification
# 2. Install dependencies
pip install -r requirements.txt
# 3. Launch Jupyter Notebook
jupyter notebook iris_classification.ipynb

Click the badge below to open directly in Google Colab:
pandas>=2.0.0
numpy>=1.24.0
scikit-learn>=1.3.0
matplotlib>=3.7.0
seaborn>=0.12.0
jupyter>=1.0.0

Install all at once:

pip install -r requirements.txt

- How to perform structured EDA with multiple visualization techniques
- Understanding class separability through pairplots and correlation analysis
- Importance of feature scaling (StandardScaler) for KNN and Logistic Regression
- Comparing models objectively using cross-validation vs test accuracy
- How ensemble methods (Random Forest) outperform single models on real datasets
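The scaling point above can be illustrated by comparing KNN cross-validation scores with and without `StandardScaler`. On Iris all four features share the same unit (cm), so the gap is small here, but the habit matters on mixed-unit data; the 5-fold setup and seed below are assumptions:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# KNN on raw centimetre features vs. standardized features
raw = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=cv).mean()
scaled = cross_val_score(
    make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)), X, y, cv=cv
).mean()

print(f"KNN without scaling: {raw:.3f}")
print(f"KNN with scaling:    {scaled:.3f}")
```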
| Platform | Link |
|---|---|
| 💼 LinkedIn | [Profile](https://www.linkedin.com/in/mohammed-abuzar-9061a1375/) |
| 🐙 GitHub | [MOHAMMED-ABUZAR317](https://github.com/MOHAMMED-ABUZAR317) |
| 🏢 Internship | CodeAlpha |
🌸 Made with ❤️ during the CodeAlpha Data Science Internship
If you found this project helpful, please give it a ⭐ on GitHub!