| 1 | +--- |
| 2 | +title: Final Project |
| 3 | +parent: Archive |
| 4 | +permalink: /archive/final_project/ |
| 5 | +nav_order: 3 |
| 6 | +layout: default |
| 7 | +--- |
| 8 | + |
| 9 | +[← Archive home]({{ site.baseurl }}/archive/) |
| 10 | + |
| 11 | +# <span style="color: #397DFF; font-weight: 350">CRUP Fall 2025 Final Project</span> |
| 12 | + |
| 13 | +## <span style="color: #397DFF">Project Overview</span> |
| 14 | +You will design and execute a complete Machine Learning project that answers a real-world question and present your findings through an interactive website. This project integrates both machine learning and software engineering skills. |
| 15 | + |
| 16 | +## <span style="color: #397DFF">Core Requirements</span> |
| 17 | +- Formulate a specific, real-world problem that can be solved with **Machine Learning** |
| 18 | +- Build and train an **ML model** using a real dataset |
| 19 | +- Create an **interactive website** that explains your process and allows users to interact with your model |
| 20 | + |
| 21 | +--- |
| 22 | + |
| 23 | +# <span style="color: #397DFF; font-weight: 350">Part A: Machine Learning Component</span> |
| 24 | + |
| 25 | +## <span style="color: #397DFF">1. Problem Formulation & Dataset Selection</span> |
| 26 | +Your task is to identify a clear, answerable question that requires machine learning to solve. |
| 27 | + |
| 28 | +### Dataset Requirements |
| 29 | +- Use a real dataset (e.g., from **Kaggle** or **Hugging Face**) |
| 30 | +- Your project must be one of the following: |
| 31 | + - **Classification** (e.g., "Will this customer churn?") |
| 32 | + - **Regression** (e.g., "What will this house sell for?") |
| 33 | + |
| 34 | +--- |
| 35 | + |
| 36 | +## <span style="color: #397DFF">2. Complete ML Pipeline Implementation</span> |
| 37 | +You must implement all stages of a professional ML workflow. |
| 38 | + |
| 39 | +### a) Data Acquisition and Cleaning |
| 40 | +- Download and load your dataset |
| 41 | +- Perform **Exploratory Data Analysis (EDA)** with visualizations and summary statistics |
| 42 | +- Handle missing values (e.g., drop the affected rows or impute replacements) |
| 43 | +- Encode categorical variables (e.g., one-hot encoding or label encoding) |
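
As a rough illustration of the cleaning steps above, here is a minimal sketch of median imputation followed by one-hot encoding, written in plain Python so it runs without extra libraries. The column names (`age`, `plan`) and the helper functions are hypothetical; in a real project a data library would typically handle this.

```python
from statistics import median

def impute_median(rows, column):
    """Replace None in `column` with the median of the observed values."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = median(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]

def one_hot(rows, column):
    """Replace a categorical column with one 0/1 column per category."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        new_row = {k: v for k, v in r.items() if k != column}
        for cat in categories:
            new_row[f"{column}_{cat}"] = 1 if r[column] == cat else 0
        encoded.append(new_row)
    return encoded

# Hypothetical toy data: one missing numeric value, one categorical column.
rows = [
    {"age": 34, "plan": "basic"},
    {"age": None, "plan": "premium"},
    {"age": 50, "plan": "basic"},
]
clean = one_hot(impute_median(rows, "age"), "plan")
```

Computing the fill value and the category list from the training split only, then reusing them on validation and test data, helps avoid leaking information between splits.
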
| 44 | + |
| 45 | +### b) Model Training and Hyperparameter Tuning |
| 46 | +- Try multiple models appropriate for your task |
| 47 | +- Train each model |
| 48 | +- Tune hyperparameters using **Grid Search**, **Random Search**, etc. |
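
To make the idea of Grid Search concrete, the sketch below tries every combination of hyperparameter values and keeps the best-scoring one. The `validation_score` function is a hypothetical stand-in for "train on the training split, score on the validation split", and the parameter names are assumptions chosen for illustration.

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Try every combination in param_grid; return (best_params, best_score)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)  # higher is better
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical scoring function; a real one would train and validate a model.
def validation_score(max_depth, learning_rate):
    return -((max_depth - 4) ** 2) - (learning_rate - 0.1) ** 2

best_params, best_score = grid_search(
    {"max_depth": [2, 4, 8], "learning_rate": [0.01, 0.1, 1.0]},
    validation_score,
)
```

Random Search follows the same loop but samples a fixed number of combinations instead of enumerating all of them. Libraries such as scikit-learn provide both strategies with cross-validation built in (`GridSearchCV`, `RandomizedSearchCV`).
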
| 49 | + |
| 50 | +### c) Rigorous Model Evaluation |
| 51 | + |
| 52 | +#### For Classification |
| 53 | +- **F1 Score** |
| 54 | +- **AUC** |
| 55 | +- **Accuracy** (use carefully: it can be misleading when classes are imbalanced) |
| 56 | +- **Confusion Matrix** |
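
The following sketch computes confusion-matrix counts, accuracy, and F1 by hand for a binary task, mainly to show why accuracy alone can mislead. The labels are made-up toy data, and AUC is left out because it needs predicted scores rather than hard labels.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn

def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)

def f1_score(y_true, y_pred):
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Toy imbalanced data: 9 negatives, 1 positive; the "model" always predicts 0.
y_true = [0] * 9 + [1]
y_pred = [0] * 10

acc = accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
```

Here accuracy is high even though the model never finds the positive case, while F1 exposes the problem.
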
| 57 | + |
| 58 | +#### For Regression |
| 59 | +- **R-squared (R²)** |
| 60 | +- **MAE** (mean absolute error) |
| 61 | +- **RMSE** (root mean squared error) |
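
For reference, the three regression metrics can be written out directly from their definitions. This is a minimal sketch on made-up numbers; a metrics library would normally be used in the actual project.

```python
from math import sqrt

def mae(y_true, y_pred):
    """Mean absolute error: average size of the errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large errors more than MAE."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r_squared(y_true, y_pred):
    """Fraction of variance explained, relative to always predicting the mean."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Toy targets and predictions, for illustration only.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]

scores = {
    "mae": mae(y_true, y_pred),
    "rmse": rmse(y_true, y_pred),
    "r2": r_squared(y_true, y_pred),
}
```

MAE and RMSE are in the same units as the target, which makes them easier to explain to a non-ML audience than R².
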
| 62 | + |
| 63 | +### d) Artifact Preservation |
| 64 | +- Save your trained model (e.g., using `pickle` or `torch.save`) |
| 65 | +- You will load this model into your website |
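
A save-and-reload round trip with `pickle` might look like the sketch below. The "model" here is a hypothetical dictionary of learned parameters and the file name `model.pkl` is an assumption; a trained estimator object would be saved the same way.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical "trained model": the learned parameters of a linear model.
model = {"weights": [0.5, -1.2], "bias": 3.0}

def predict(model, features):
    """Apply the linear model to one feature vector."""
    return model["bias"] + sum(w * x for w, x in zip(model["weights"], features))

# Save the model to disk (a temporary directory keeps this example tidy).
model_path = Path(tempfile.mkdtemp()) / "model.pkl"
with open(model_path, "wb") as f:
    pickle.dump(model, f)

# Later, e.g. in the code that serves the website: load it back.
with open(model_path, "rb") as f:
    restored = pickle.load(f)
```

Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.
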
| 66 | + |
| 67 | +--- |
| 68 | + |
| 69 | +# <span style="color: #397DFF; font-weight: 350">Part B: Software Engineering Component</span> |
| 70 | + |
| 71 | +## <span style="color: #397DFF">1. Public-Facing Website</span> |
| 72 | +- Built using **React** |
| 73 | +- Serves as documentation + interactive demonstration |
| 74 | + |
| 75 | +## <span style="color: #397DFF">2. Required Website Content (Documentation)</span> |
| 76 | + |
| 77 | +### a) Central Problem & Real-World Impact |
| 78 | +Explain: |
| 79 | +- What question you're answering |
| 80 | +- Why it matters |
| 81 | +- Who benefits |
| 82 | +- What real-world decisions your model could influence |
| 83 | + |
| 84 | +### b) Data Source & Nature |
| 85 | +Include: |
| 86 | +- Dataset link |
| 87 | +- What each row represents |
| 88 | +- Features included |
| 89 | +- Number of examples |
| 90 | +- Any limitations or biases |
| 91 | + |
| 92 | +### c) ML Methodology |
| 93 | +Clarify: |
| 94 | +- Which algorithms you tried |
| 95 | +- Which you chose |
| 96 | +- Why you chose it |
| 97 | +- What hyperparameters you tuned |
| 98 | + |
| 99 | +### d) Final Performance Metrics |
| 100 | +Report: |
| 101 | +- Your final evaluation metrics |
| 102 | +- A direct answer to your core question |
| 103 | +- Limitations + failure modes |
| 104 | + |
| 105 | +--- |
| 106 | + |
| 107 | +## <span style="color: #397DFF">3. Interactive Component (MANDATORY)</span> |
| 108 | + |
| 109 | +Your website must include at least **one interactive ML-powered element**. |
| 110 | + |
| 111 | +### Acceptable Options |
| 112 | +- **Prediction Form** (user enters input → model predicts) |
| 113 | +- **Slider-Based Dynamic Prediction** |
| 114 | +- **Interactive Visualizations** |
| 115 | +- **Comparative Predictions (What-if Analysis)** |
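
As one possible shape for the What-if Analysis option, the sketch below compares a prediction for a baseline input against the same input with a few fields changed. The `predict_price` function and its coefficients are hypothetical placeholders for a loaded model; the website front end would call something like `what_if` and display the result.

```python
def predict_price(features):
    """Hypothetical stand-in for a trained regression model."""
    return 50_000 + 150 * features["sqft"] + 10_000 * features["bedrooms"]

def what_if(predict, baseline, changes):
    """Compare the prediction for `baseline` with one where `changes` are applied."""
    modified = {**baseline, **changes}
    before = predict(baseline)
    after = predict(modified)
    return {"baseline": before, "modified": after, "delta": after - before}

# Example: what happens to the predicted price if the house had one more bedroom?
result = what_if(
    predict_price,
    baseline={"sqft": 1000, "bedrooms": 2},
    changes={"bedrooms": 3},
)
```

The same function can back a slider-based interface: each slider movement becomes a new `changes` dictionary.
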
| 116 | + |
| 117 | +--- |
| 118 | + |
| 119 | +# <span style="color: #397DFF; font-weight: 350">Part C: Deadlines & Deliverables</span> |
| 120 | + |
| 121 | +## <span style="color: #397DFF">📅 Deadlines</span> |
| 122 | +- **Research Proposal** — *Due: End of Thanksgiving Break* |
| 123 | + - One paragraph |
| 124 | + - Includes central question, dataset, and approach |
| 125 | +- **Final Project** — *Due: Before Banquet* |
| 126 | + - Full ML pipeline |
| 127 | + - Fully functional website |
| 128 | + - Complete documentation |
| 129 | + |
| 130 | +## <span style="color: #397DFF">✅ Deliverables Checklist</span> |
| 131 | +- [ ] **Research Proposal** |
| 132 | +- [ ] **ML Solution** |
| 133 | + - Dataset acquired & cleaned |
| 134 | + - Multiple models compared |
| 135 | + - Best model selected |
| 136 | + - Model evaluated |
| 137 | + - Model saved |
| 138 | +- [ ] **Website Component** |
| 139 | + - React website (public-facing) |
| 140 | + - Full documentation |
| 141 | + - At least one interactive component |
| 142 | + - Accessible, clear design |
| 143 | + |
| 144 | +--- |
| 145 | + |
| 146 | +# <span style="color: #397DFF; font-weight: 350">🌟 Exemplary Projects for Inspiration</span> |
| 147 | +- https://llm-attacks.org |
| 148 | +- https://thinkingmachines.ai/blog/modular-manifolds |
| 149 | +- Distill-style explorations: |
| 150 | + - Feature Visualization |
| 151 | + - Activation Atlas |
| 152 | + - Handwriting with Neural Networks |
| 153 | + - Building Blocks of Interpretability |
| 154 | +- https://distill.pub |
| 155 | + |
| 156 | +--- |
| 157 | + |
| 158 | +# <span style="color: #397DFF; font-weight: 350">Tips for Success</span> |
| 159 | +- Choose a **focused** question |
| 160 | +- Select a **manageable dataset** |
| 161 | +- Document continuously |
| 162 | +- Build the interactive component early |
| 163 | +- Make explanations accessible to non-ML audiences |
| 164 | + |
| 165 | +Good luck! 🚀 |