📊 Study Hour Performance Predictor
Project Overview
This project implements a Simple Linear Regression model using Python and NumPy to predict a student's final marks based on the number of hours they spend studying.
This model serves as a practical introduction to the fundamental concepts of machine learning, including data standardization, implementing the Gradient Descent algorithm from scratch, and real-time prediction using the final trained weights.
Key Features
Custom Gradient Descent Implementation: The model's weights (slope and intercept) are learned using the Gradient Descent optimization algorithm, executed over 50 epochs.
Feature Scaling: The input data (StudyHours) is standardized to zero mean and unit variance with scikit-learn's StandardScaler so that Gradient Descent converges efficiently.
Training Visualization: Includes a real-time animation (using matplotlib.animation) to visualize the regression line converging on the data and the corresponding reduction in Mean Squared Error (MSE) loss during training.
Interactive Prediction: After training, the script enters an interactive mode, allowing the user to input new study hour values and receive an immediate prediction of the potential marks.
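The scaling and training steps listed above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the sample arrays, the learning rate of 0.1, the zero initialization, and the function name train are assumptions. Only the 50 epochs, StandardScaler, the MSE loss, and the history_loss name come from this README.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical sample data; the real values come from Studyhours.csv.
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
marks = np.array([35.0, 41.0, 50.0, 55.0, 62.0, 70.0, 74.0, 83.0])

# Standardize the single feature (StandardScaler expects a 2-D array).
scaler = StandardScaler()
x = scaler.fit_transform(hours.reshape(-1, 1)).ravel()


def train(x, y, epochs=50, lr=0.1):
    """Fit y ~ w * x + b by batch gradient descent on the MSE loss."""
    w, b = 0.0, 0.0  # assumed starting point
    history_loss = []
    for _ in range(epochs):
        error = (w * x + b) - y
        # Record the loss for the weights used in this epoch.
        history_loss.append(float(np.mean(error ** 2)))
        # Gradients of the MSE with respect to w and b.
        grad_w = 2.0 * np.mean(error * x)
        grad_b = 2.0 * np.mean(error)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, history_loss


w, b, history_loss = train(x, marks)
```

Because the standardized feature has zero mean, the intercept b settles near the mean of Marks, and the slope w can be sanity-checked against np.polyfit(x, marks, 1).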
Results
After 50 epochs of training, the model reports the following final loss and prediction equation:
Final Mean Squared Error (MSE) Loss: (Insert your final loss value from the end of the history_loss array here, e.g., 2.54)
Final Weights (
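Because the weights are learned on standardized inputs, this equation must be converted back to the original scale of StudyHours before it can be applied to raw hour values. Once converted, it approximates the least-squares line through the data.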
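One way to do that conversion is sketched below, assuming the model has the form marks = w * (hours - mean) / std + b. The function name to_original_scale and the numeric values are hypothetical and are not results from this project. With a fitted StandardScaler, the mean and standard deviation would come from scaler.mean_[0] and scaler.scale_[0].

```python
import math


def to_original_scale(w_scaled, b_scaled, mean, std):
    """Convert weights learned on (x - mean) / std into weights on raw x.

    w * (x - mean) / std + b  ==  (w / std) * x + (b - w * mean / std)
    """
    slope = w_scaled / std
    intercept = b_scaled - w_scaled * mean / std
    return slope, intercept


# Hypothetical values for illustration only.
w_scaled, b_scaled = 10.0, 60.0
mean, std = 5.0, 2.0
slope, intercept = to_original_scale(w_scaled, b_scaled, mean, std)

# Both forms of the equation give the same prediction for raw hours.
hours = 6.0
scaled_pred = w_scaled * (hours - mean) / std + b_scaled
raw_pred = slope * hours + intercept
assert math.isclose(scaled_pred, raw_pred)
```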
Data
The model is trained on the synthetic dataset Studyhours.csv.
| Feature | Description |
| --- | --- |
| StudyHours | The input feature (X) |
| Marks | The target variable (y) |
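Loading the two columns might look like the sketch below. It assumes the CSV header uses exactly the names StudyHours and Marks. The in-memory rows are made-up stand-ins so the snippet runs without the file; in the notebook the call would presumably be pd.read_csv("Studyhours.csv").

```python
import io

import pandas as pd

# Hypothetical rows standing in for Studyhours.csv; real values will differ.
sample_csv = io.StringIO("StudyHours,Marks\n2.0,40\n4.5,58\n7.0,76\n")

df = pd.read_csv(sample_csv)

# Keep X two-dimensional, since StandardScaler expects shape (n_samples, n_features).
X = df[["StudyHours"]].to_numpy(dtype=float)
y = df["Marks"].to_numpy(dtype=float)
```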
Dependencies
This project requires the following Python libraries:
numpy (for numerical operations and matrix math)
pandas (for data loading and handling)
matplotlib (for plotting and animation)
scikit-learn (imported as sklearn; specifically StandardScaler for preprocessing)
You can install these dependencies using pip:
pip install numpy pandas matplotlib scikit-learn
How to Run
Environment: Run the code in a Jupyter environment like Google Colab.
Upload Data: Ensure both the Colab notebook (.ipynb file) and the data file (Studyhours.csv) are in the same directory or uploaded to the Colab environment.
Execute Cells: Run all cells in the notebook.
Interactive Mode: After the Gradient Descent training loop completes, the script will prompt you for input:
Enter the number of study hours to predict marks (or type 'quit' to exit):
Predict: Enter a number (e.g., 6.0) to see the predicted marks. Type quit to leave interactive mode and continue to the animated plots.
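The interactive step could be structured along these lines. This is a sketch under assumptions, not the notebook's code: the names predict_marks and interactive_loop, the output wording, and the handling of invalid input are all hypothetical, and slope and intercept are taken to be weights already converted to the original scale. The read and write parameters exist only so the loop can be exercised without a keyboard.

```python
def predict_marks(hours, slope, intercept):
    """Apply the fitted line to a raw study-hours value."""
    return slope * hours + intercept


def interactive_loop(slope, intercept, read=input, write=print):
    """Prompt for study hours until the user types 'quit'."""
    prompt = (
        "Enter the number of study hours to predict marks "
        "(or type 'quit' to exit): "
    )
    while True:
        text = read(prompt).strip()
        if text.lower() == "quit":
            break
        try:
            hours = float(text)
        except ValueError:
            # Assumed behavior: reject non-numeric input and ask again.
            write("Please enter a number or 'quit'.")
            continue
        marks = predict_marks(hours, slope, intercept)
        write(f"Predicted marks: {marks:.2f}")
```

In a notebook cell this would be started with interactive_loop(slope, intercept), which falls through to whatever code follows once quit is entered.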