This project uses machine learning to classify six dermatological diseases. The dataset was explored through exploratory data analysis (EDA) and data visualization to better understand feature distributions and relationships.
Two models were built and compared:
- Decision Tree
- Random Forest
The dataset is located in the data folder:
- dermatology_database_1.csv
It contains clinical and histopathological features used to predict six dermatological disease classes.
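As a rough illustration of how the file might be loaded, the sketch below reads the CSV with pandas and separates the features from the label. The label column name `class`, the use of `?` as a missing-value marker, and the helper `load_features_and_target` are all assumptions made for this example; check them against the actual file before relying on them.

```python
from pathlib import Path

import pandas as pd

# Path taken from the repository layout; adjust if the notebook runs elsewhere.
DATA_PATH = Path("data") / "dermatology_database_1.csv"


def load_features_and_target(csv_path, target_col="class"):
    """Read the CSV and split it into a feature table X and a label series y.

    `target_col` is an assumed column name, not confirmed by this README.
    """
    # Assumption: missing values, if any, are written as "?".
    df = pd.read_csv(csv_path, na_values="?")
    if target_col not in df.columns:
        raise KeyError(f"expected a label column named {target_col!r}")
    X = df.drop(columns=[target_col])
    y = df[target_col]
    return X, y
```

From the repository root, a call such as `X, y = load_features_and_target(DATA_PATH)` would then give the inputs for the steps that follow.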
The following steps were performed:
- Data cleaning and preprocessing
- Exploratory Data Analysis (EDA)
- Data visualization of features
- Model training using Decision Tree and Random Forest
- Model evaluation using classification reports
- Visualization of feature importances
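The training and evaluation steps above can be sketched roughly as follows. This is not the notebook's code: it uses synthetic data from `make_classification` as a stand-in for the real dataset, and the split ratio, feature count, and hyperparameters are arbitrary choices made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real data: six classes, 20 features (assumed sizes).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=12,
    n_classes=6, n_clusters_per_class=1, random_state=0,
)

# Hold out 20% of the rows, keeping class proportions equal in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    accuracies[name] = accuracy_score(y_test, y_pred)
    print(f"=== {name} ===")
    print(classification_report(y_test, y_pred))
```

Fixing `random_state` keeps the comparison between the two models repeatable across runs.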
Both models were evaluated using classification reports, which include:
- Precision
- Recall
- F1-score
- Accuracy
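To make the metrics listed above concrete, here is a small worked example that derives them by hand from a confusion matrix. The labels are made up for illustration and have nothing to do with the project's results.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy labels for three classes (illustrative only).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

cm = confusion_matrix(y_true, y_pred)  # rows = true class, columns = predicted
tp = np.diag(cm)                       # correctly predicted, per class
fp = cm.sum(axis=0) - tp               # predicted as the class but wrong
fn = cm.sum(axis=1) - tp               # belonged to the class but missed

precision = tp / (tp + fp)             # of the predictions for a class, how many were right
recall = tp / (tp + fn)                # of the true members of a class, how many were found
f1 = 2 * precision * recall / (precision + recall)
accuracy = tp.sum() / cm.sum()         # overall fraction of correct predictions

print("precision:", precision.round(3))  # [0.5   0.667 1.   ]
print("recall:   ", recall.round(3))     # [0.5 1.  0.5]
print("f1:       ", f1.round(3))         # [0.5   0.8   0.667]
print("accuracy: ", round(accuracy, 3))  # 0.667
```

Note that this hand computation divides by zero for any class that is never predicted; `classification_report` handles that case itself.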
Feature importance plots, mainly from the Random Forest model, were generated to identify the most influential features.
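A plot of this kind could be produced along the following lines. The sketch again uses synthetic data and placeholder feature names (`feature_0`, `feature_1`, ...), so the ranking it shows says nothing about the real dataset. It relies on scikit-learn's impurity-based `feature_importances_` attribute; whether the notebook uses that or another importance measure is an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the real data, with placeholder feature names.
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=12,
    n_classes=6, n_clusters_per_class=1, random_state=0,
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based importances, one value per feature.
importances = forest.feature_importances_
order = np.argsort(importances)[::-1][:10]  # indices of the ten largest

# Horizontal bars, reversed so the most important feature is drawn at the top.
fig, ax = plt.subplots(figsize=(6, 4))
ax.barh([feature_names[i] for i in order][::-1], importances[order][::-1])
ax.set_xlabel("Mean decrease in impurity")
ax.set_title("Top 10 Random Forest feature importances (synthetic data)")
fig.tight_layout()
```

In a Jupyter notebook the `matplotlib.use("Agg")` line can be dropped so the figure renders inline.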
The repository is organized as follows:
.
├── data
│ └── dermatology_database_1.csv
├── models
│ └── derma_classification_final.ipynb
├── old
│ └── Derma_Classification.ipynb
└── README.md
Tools and libraries used:
- Python
- Pandas
- NumPy
- Matplotlib / Seaborn
- Scikit-learn
- Jupyter Notebook
- The models folder contains the final notebook.
- The old folder contains an earlier version of the notebook kept for reference.
Author: Patrick McElroy
This project is intended for educational and research purposes.