Welcome to the CodeBoxx AI Toolkit repository! This is your comprehensive resource for mastering data science skills across the CodeBoxx Academy's Applied AI curriculum (AAI-101 through AAI-104). The repository is structured module-by-module to guide you seamlessly through increasingly advanced concepts.
By working through these materials, you'll build skills in data manipulation, analysis, machine learning, and more using Jupyter notebooks and Python libraries like Pandas, Matplotlib, spaCy, and others.
This repository is organized by AAI modules (101-104), with common datasets and tools centralized for easy access:
.
├── AAI-101/ # Introduction to AI Tools for Developers
│ ├── air_quality_datasets/ # Air quality data files
│ ├── data/ # Module-specific datasets
│ ├── notebooks/ # Tutorial & practice notebooks
│ │ ├── tutorials/ # Step-by-step Pandas tutorials
│ │ └── worksheets/ # Practice problems & challenges
│ ├── slides/ # Course reference materials
│ └── visualizations/ # Generated charts & graphs
│
├── AAI-102/ # Machine Learning
│ ├── data/ # Module-specific datasets
│ ├── notebooks/ # ML tutorial & practice notebooks
│ ├── slides/ # Course reference materials
│ └── visualizations/ # Generated charts & graphs
│
├── AAI-103/ # Data Science
│ ├── data/ # Module-specific datasets
│ ├── notebooks/ # Data science notebooks
│ ├── slides/ # Course reference materials
│ └── visualizations/ # Generated charts & graphs
│
├── AAI-104/ # Feature Engineering
│ ├── data/ # Module-specific datasets
│ ├── notebooks/ # Feature engineering notebooks
│ ├── slides/ # Course reference materials
│ └── visualizations/ # Generated charts & graphs
│
├── common_data/ # Datasets shared across modules
│ └── elevator_datasets/ # For AAI-102/103 assignments
│
├── common_tools/ # Reusable utilities & scripts
│ └── helper_scripts/ # Conversion & utility scripts
│
├── misc/ # Supplementary materials
│ ├── my_spacy_model/ # Trained spaCy model files
│ ├── spacy_examples/ # NLP examples
│ └── uk_house_prices/ # Additional practice datasets
│
└── visualizations/ # Global visualization resources
Each module follows a consistent structure, making it easy to navigate and find relevant materials as you progress through the curriculum.
This module introduces foundational tools for data manipulation and analysis:
-
Pandas Tutorials (
AAI-101/notebooks/tutorials/):- 10 Sequential Notebooks covering core Pandas operations:
1_pandas_intro.ipynb- Basic DataFrame operations2_read_and_write_tabular_data.ipynb- Importing and exporting data3_selecting_a_subset_of_df.ipynb- Data filtering and selection4_creating_plots.ipynb- Basic data visualization5_new_cols_from_existing_cols.ipynb- Data transformation6_calculating_statistics.ipynb- Statistical operations7_table_reshaping.ipynb- Reshaping data (pivot, melt)8_combining_dataframes.ipynb- Merging and joining datasets9_handling_time_series_data.ipynb- Time series analysis10_manipulating_textual_data.ipynb- Text manipulation
- 10 Sequential Notebooks covering core Pandas operations:
-
Practice Worksheets (
AAI-101/notebooks/worksheets/):challenges_101.ipynb- Practical challenges using student enrollment datafurther_challenges_101.ipynb- Additional practice problems using a dataset of your choice!
-
Datasets (
AAI-101/data/andAAI-101/air_quality_datasets/):- Student and enrollment data (
students.csv,enrollments.json) - Titanic passenger data (
titanic.csv) - Air quality datasets (various CSV files)
- Student and enrollment data (
-
Visualizations (
AAI-101/visualizations/):- Example plots and charts created during tutorials
This module covers machine learning fundamentals:
- Notebooks will guide you through:
- Classification and regression tasks
- Model evaluation techniques
- Working with the Rocket Elevators dataset
- Data preparation for ML
This module focuses on advanced data analysis and visualization:
- Notebooks will explore:
- Exploratory Data Analysis (EDA)
- Statistical analysis techniques
- Data reporting and visualization
- Building data science workflows with CookieCutter structure
This module introduces advanced techniques for preparing data for ML models:
- Notebooks will cover:
- Feature selection and creation
- Dimensionality reduction
- Handling categorical variables
- Advanced preprocessing techniques
-
Set Up Your Environment:
- Install Anaconda (recommended) or Python with pip
- Ensure Jupyter Notebook/Lab is installed
-
Learn Pandas Step-by-Step:
- Start with the AAI-101 tutorial notebooks in numerical order
- Run each cell to see code examples in action
- Experiment by modifying parameters to deepen understanding
-
Practice with Challenges:
- After completing tutorials, move to
challenges_101.ipynb - These exercises mirror real-world data tasks
- Use the provided datasets in the
datadirectory
- After completing tutorials, move to
-
Reference Materials:
- Review slide notes for conceptual explanations
As you progress through the curriculum:
- Each module builds on previous knowledge
- Continue working through notebooks sequentially within each module
- Common datasets and tools in the centralized folders will be used across modules
- Explore additional topics like NLP with spaCy when you're ready
- Clone the repository to your local machine
- Navigate to the folder in your terminal
- Launch Jupyter:
jupyter notebookorjupyter lab - Navigate to the appropriate module folder and open the notebook you wish to work with
- Run cells sequentially (Shift+Enter) to execute code
- Run cells in order: Many notebooks build on previous cells' results
- Don't skip tutorials: Each one introduces important concepts
- Experiment: Modify examples to test your understanding
- Use documentation: Reference the official Pandas docs when needed
- Visualize your data: Use plots to build intuition about your datasets
Each module has specific deliverables aligned with your CodeBoxx Academy assignments:
- Complete the worksheets in the AAI-101 notebooks directory
- Create your own notebook demonstrating key Pandas operations
- Submit according to course instructions
- Work with elevator dataset analyses
- Perform data cleaning and preprocessing tasks
- Merge and analyze datasets for insights
- Analyze incident data
- Create visualizations and reports
- Implement the CookieCutter data science structure
- Advanced feature engineering techniques
- Prepare data for production ML systems
- Dependency Issues: Install missing packages with
pip installorconda install - Path Errors: Ensure you're referencing file paths correctly relative to the notebook location
- Data Not Found: Check that you're in the correct directory and datasets are where expected
-
Official Documentation:
-
Additional Learning:
If you find issues or want to contribute improvements:
- Open an issue describing the problem or enhancement
- For code contributions, fork the repository and submit a pull request
- Follow consistent naming conventions and document your code/notebooks
- For specific questions about assignments, contact your CodeBoxx instructor
- For technical issues with the repository, open a GitHub issue
- For general Pandas questions, search Stack Overflow or the official documentation
Happy learning, and welcome to your journey through AI tools and techniques!