# HelloPython-AI-ML — PyTorch, Data Mining & Colab-ready

This document demonstrates how to contribute to HelloPython-AI-ML while following our 10 universal coding principles. It includes PyTorch examples, data-mining practices, and Colab-ready code designed for scalability, reproducibility, and enterprise-grade maintainability.
### 1. Keep code readable and explicit

```python
# Colab-ready installation
# !pip install torch torchvision
import torch
import torch.nn as nn
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Explicit dataset preparation
transform = transforms.ToTensor()
train_dataset = datasets.MNIST(root="./data", train=True, download=True, transform=transform)
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
```

### 2. Always show data preprocessing steps
```python
import pandas as pd

# Load CSV dataset explicitly
df = pd.read_csv("https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv")

# Encode categorical labels
df["species_encoded"] = df["species"].astype("category").cat.codes
print(df.head())
```

### 3. Code must run on CPU, GPU, Colab, and local machines
```python
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using device:", device)
```

### 4. Explain why each step is done, not just what is done
```python
# Normalize MNIST images:
# centered inputs improve training stability and convergence
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])
```

### 5. Validate outputs and shapes
```python
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(28 * 28, 10)

    def forward(self, x):
        return self.fc(x.view(-1, 28 * 28))

model = Net()
x = torch.randn(1, 1, 28, 28)
out = model(x)
assert out.shape == (1, 10), "Model output shape mismatch!"
```

### 6. Use consistent naming and repo structure
```
/data       # Datasets
/models     # Model checkpoints
/examples   # Example scripts
/notebooks  # Jupyter/Colab notebooks
```

- Keep variable and function names uniform across files.
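A minimal sketch of what "uniform naming" can mean in practice. The conventions below (`build_*` factories, `train_*` helpers, `*_loader` iterables) are illustrative assumptions, not names mandated by this repo:

```python
import torch.nn as nn

# One naming convention, applied identically in every file (illustrative):
# factory functions start with `build_`, per-epoch helpers with `train_`,
# and any batch iterable is named `*_loader`.

def build_model(input_dim: int, num_classes: int) -> nn.Module:
    """Factory functions are always named `build_*`."""
    return nn.Linear(input_dim, num_classes)

def train_one_epoch(model: nn.Module, train_loader) -> int:
    """Per-epoch helpers are always named `train_*`."""
    num_batches = 0
    for _batch in train_loader:
        num_batches += 1  # forward/backward pass would go here
    return num_batches

model = build_model(input_dim=28 * 28, num_classes=10)
```

Because the same prefixes appear in every file, a contributor can guess what `build_optimizer` or `eval_one_epoch` does before reading it.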
### 7. Save models and experiment configs for later use

```python
# Save model checkpoint
torch.save(model.state_dict(), "mnist_model.pth")

# Load later (Colab or local)
model.load_state_dict(torch.load("mnist_model.pth", map_location=device))
```

### 8. Design code to scale from small to large datasets, CPU or GPU
```python
# Define the loss and optimizer before the loop
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

model.to(device)
for epoch in range(3):
    for data, target in train_loader:
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        loss = criterion(model(data), target)
        loss.backward()
        optimizer.step()
```

### 9. Always fix random seeds to make results consistent
```python
import numpy as np
import random

seed = 42
torch.manual_seed(seed)
np.random.seed(seed)
random.seed(seed)
```

### 10. Provide notebooks and example scripts for easy onboarding
[Open in Colab](https://colab.research.google.com/github/YOUR_USER/HelloPython-AI-ML/blob/main/notebooks/mnist_baseline.ipynb)
### Data Quality First

- Validate and clean datasets before training.

```python
# Check for missing values
print(df.isnull().sum())
```
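The null check above can be extended into a small cleaning pass. A hedged sketch using standard pandas calls (`dropna`, `drop_duplicates`); the tiny inline DataFrame stands in for a real CSV so the snippet runs anywhere:

```python
import pandas as pd

# Tiny inline dataset standing in for a real CSV (illustrative only)
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, None, 5.1],
    "species": ["setosa", "setosa", "virginica", "setosa"],
})

# Inspect first, then clean: drop rows with missing values,
# then drop exact duplicate rows, then reset the index
print(df.isnull().sum())
df_clean = df.dropna().drop_duplicates().reset_index(drop=True)
print(len(df_clean))
```

Doing the inspection and cleaning as an explicit, logged step (rather than silently inside a loader) keeps the preprocessing reproducible.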
### Reproducibility Always

- Log dataset version, preprocessing steps, and experiment seed.
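One lightweight way to log this is a JSON "run config" written next to each checkpoint. The filename and field names below are illustrative assumptions, not a repo standard:

```python
import json

# Illustrative run metadata; field names are an assumption, not a repo standard
run_config = {
    "dataset": "MNIST",
    "dataset_version": "torchvision-builtin",
    "preprocessing": ["ToTensor", "Normalize(mean=0.5, std=0.5)"],
    "seed": 42,
    "batch_size": 64,
}

# Write the config alongside the checkpoint
with open("run_config.json", "w") as f:
    json.dump(run_config, f, indent=2)

# Reload later to reproduce the exact run
with open("run_config.json") as f:
    restored = json.load(f)
print(restored["seed"])
```

Committing the config (but not the checkpoint) lets anyone rerun the experiment with the same seed and preprocessing.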
### Scalable Pipelines

- Use `DataLoader` with batching and multiprocessing.

```python
train_loader = DataLoader(train_dataset, batch_size=128, shuffle=True, num_workers=4, pin_memory=True)
```

✅ This CONTRIBUTING_EXAMPLES.md ensures that:
- Contributors can run and fine-tune models in Colab immediately.
- Code is enterprise-grade, maintainable, and future-proof for local hardware or servers.
- All 10 principles + data mining rules are followed consistently.