500+ pages • 13 chapters • 50+ runnable notebooks • Zero setup required
By Dr. J. Paul Liu
Updated regularly based on student and reader feedback
| The Revolution | The Impact |
|---|---|
| AI-designed drugs | 80-90% Phase I success rate vs. 40-65% for traditional candidates |
| AlphaFold protein prediction | 2024 Nobel Prize in Chemistry |
| GenCast weather AI | Outperforms the leading operational ensemble forecast (ECMWF ENS) on about 97% of evaluated targets |
| Neural surrogates | Simulations 1000x faster than traditional methods |
This book teaches you how to get started and build educational versions of similar systems yourself.
| In 30 Minutes | You'll Create | Using |
|---|---|---|
| Drug Discovery | Design molecules with target properties | GNNs + Diffusion |
| Protein Engineering | Predict 3D structure from sequence | ESMFold |
| Climate Science | Fast weather/climate emulators | Neural Surrogates |
| Physics Simulation | Solve PDEs with neural networks | PINNs |
| Literature Mining | Extract insights from papers | RAG + LLMs |
Generative AI for Science is a comprehensive, hands-on guide for researchers, students, and practitioners who want to apply cutting-edge AI techniques to scientific discovery. This book bridges the gap between AI/ML expertise and domain science, providing practical implementations across chemistry, biology, physics, geoscience, and beyond.
"Generative AI does not replace the scientific method; it enhances it. It expands the space of hypotheses we can explore, sharpens experimental design, and reveals patterns hidden in complexity."
| Feature | Description |
|---|---|
| Theory Meets Practice | Every concept is paired with ready-to-run code |
| Interactive Learning | All examples are provided as Google Colab notebooks, so no installation is required |
| Real Scientific Problems | Examples from authentic research across multiple domains |
| Accessible Yet Rigorous | Suitable both for domain scientists exploring AI and for ML experts moving into scientific applications |
| You Are... | You'll Get... |
|---|---|
| Domain Scientist | AI skills to accelerate your research |
| ML Engineer | Scientific applications for your expertise |
| Graduate Student | Complete curriculum with hands-on projects |
| Industry Practitioner | Production-ready code and best practices |
By the end of this book, you will:
- Understand key AI architectures: Transformers, Diffusion Models, VAEs, and GNNs
- Represent scientific data types effectively for AI models
- Apply generative models to problems in climate science, drug discovery, genomics, materials science, and more
- Follow best practices around ethics, reproducibility, and deployment
- Stay current with emerging methods and future directions
- Develop the intuition to know when and how to apply AI to scientific research
| Chapter | Title | Topics |
|---|---|---|
| 1 | Generative AI: A New Frontier for Scientific Discovery | AI revolution in science, core technologies, cross-cutting capabilities |
| 2 | Generative AI Fundamentals | Transformers, LLMs, Diffusion Models, VAEs, GANs, attention mechanisms |
| 3 | Scientific Data & Workflows | Data challenges, FAIR principles, data preparation, workflow automation |
| 4 | Text, Code & Knowledge Generation | Literature synthesis, RAG, hypothesis generation, code generation, scientific writing |
| 5 | Data-to-Data Models | Missing data imputation, synthetic data with GANs, VAEs, Gaussian processes, time series |
| 6 | Physics-Informed AI and Simulation | PINNs, neural surrogates, code optimization, automated testing |
| 7 | Domain Applications | Chemistry & Materials, Biology & Biomedicine, Physics & Engineering, Geoscience & Climate |
Chapter 7 Detailed Breakdown
Part I: Chemistry & Materials Science
- Molecular Graph Learning (GNNs)
- Molecular Generation with Diffusion Models
- Crystal Structure Prediction
- Reaction Outcome Prediction with Transformers
Part II: Biology & Biomedicine
- Protein Structure Prediction (ESMFold, AlphaFold2)
- Protein Sequence Generation (ProteinMPNN, RFDiffusion)
- Genomic Variant Analysis
- Clinical Trial Optimization
Part III: Physics & Engineering
- Particle Physics Applications
- Quantum Systems
- Materials Characterization
Part IV: Geoscience & Climate
- Ocean Forecasting
- Hurricane Prediction
- Climate Modeling
- Weather AI (GenCast, Aurora)
Part V: Cross-Cutting Applications
- Transfer Learning
- Multi-task Learning
- Foundation Models
| Chapter | Title | Topics |
|---|---|---|
| 8 | Fine-Tuning & Domain Adaptation | LoRA, PEFT, domain-specific training, evaluation strategies |
| 9 | Multimodal Generative AI | Vision-language models, graph-text models, multimodal fusion |
| 10 | Evaluation, Validation & Benchmarking | Metrics, validation strategies, uncertainty quantification, robustness testing |
| 11 | Ethics & Responsible AI | Reproducibility, bias & fairness, environmental impact, dual-use, data privacy |
| 12 | Deployment & MLOps | Experiment tracking, data versioning, model lifecycle, continuous training |
| 13 | Future Directions & Conclusion | Emerging architectures, foundation models, AI reasoning, open challenges |
- ✅ Basic Python (functions, loops, data structures)
- ✅ Undergraduate statistics (helpful but not required)
- ✅ A web browser + curiosity
- ❌ No prior deep learning experience needed
- Get the 500-page book
- Pick a chapter
  - Read the chapter and open that chapter's Colab notebook
- Open any notebook in Google Colab
  - Click the "Open in Colab" badge in each notebook
  - Or upload directly to colab.research.google.com
  - GPU runtime recommended for deep learning examples
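Besides the badge and manual upload, Colab can open a notebook straight from a public GitHub repository through a URL of the form `https://colab.research.google.com/github/<owner>/<repo>/blob/<branch>/<path>`. The sketch below builds such a link. The helper name `colab_url`, the owner `example-user`, and the branch `main` are illustrative assumptions, not this repository's actual values, and the repository name is assumed to match the top-level folder name.

```python
from urllib.parse import quote

COLAB_GITHUB_PREFIX = "https://colab.research.google.com/github"


def colab_url(owner: str, repo: str, path: str, branch: str = "main") -> str:
    """Build a Colab link for a notebook stored in a public GitHub repository."""
    # Percent-encode the path but keep "/" so the folder structure survives.
    safe_path = quote(path.strip("/"), safe="/")
    return f"{COLAB_GITHUB_PREFIX}/{owner}/{repo}/blob/{branch}/{safe_path}"


if __name__ == "__main__":
    # "example-user" is a placeholder; substitute the real repository owner.
    print(
        colab_url(
            "example-user",
            "Generative_AI_For_Science",
            "Chapter06_Physics_Informed/Ch06_PINNs.ipynb",
        )
    )
```

Pasting the printed link into a browser should open the notebook in Colab, provided the owner, repository, and branch match a real public repository.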
```
Generative_AI_For_Science/
├── Chapter01_Introduction/
│   └── Ch01_AI_Scientific_Discovery.ipynb
├── Chapter02_Fundamentals/
│   ├── Ch02_Transformers.ipynb
│   ├── Ch02_Diffusion_Models.ipynb
│   └── Ch02_VAEs_GANs.ipynb
├── Chapter03_Data_Workflows/
│   └── Ch03_Scientific_Data.ipynb
├── Chapter04_Text_Code_Knowledge/
│   ├── Ch04_RAG_Literature.ipynb
│   └── Ch04_Code_Generation.ipynb
├── Chapter05_Data_to_Data/
│   ├── Ch05_Autoencoders.ipynb
│   ├── Ch05_GANs.ipynb
│   ├── Ch05_VAEs.ipynb
│   └── Ch05_Time_Series.ipynb
├── Chapter06_Physics_Informed/
│   ├── Ch06_PINNs.ipynb
│   └── Ch06_Neural_Surrogates.ipynb
├── Chapter07_Domain_Applications/
│   ├── Ch07_Chemistry_GNNs.ipynb
│   ├── Ch07_Molecular_Diffusion.ipynb
│   ├── Ch07_Protein_Structure.ipynb
│   ├── Ch07_Genomics.ipynb
│   └── Ch07_Climate_AI.ipynb
├── Chapter08_FineTuning/
│   └── Ch08_LoRA_PEFT.ipynb
├── Chapter09_Multimodal/
│   └── Ch09_Vision_Language.ipynb
├── Chapter10_Evaluation/
│   └── Ch10_Metrics_Validation.ipynb
├── Chapter11_Ethics/
│   └── Ch11_Responsible_AI.ipynb
├── Chapter12_Deployment/
│   └── Ch12_MLOps.ipynb
├── slides/
│   └── PowerPoint slides for each chapter
├── assets/
│   └── Figures and images
└── README.md
```
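With a local copy of the repository, the layout above can be indexed programmatically, for example to check which notebooks belong to each chapter. This is a minimal sketch that assumes the `ChapterNN_*` folder naming shown in the tree; `notebooks_by_chapter` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path


def notebooks_by_chapter(repo_root: str) -> dict[str, list[str]]:
    """Map each Chapter* folder to the sorted notebook filenames inside it."""
    root = Path(repo_root)
    index: dict[str, list[str]] = {}
    for chapter_dir in sorted(root.glob("Chapter*")):
        if chapter_dir.is_dir():
            # Only .ipynb files count; slides and assets live elsewhere.
            index[chapter_dir.name] = sorted(
                p.name for p in chapter_dir.glob("*.ipynb")
            )
    return index


if __name__ == "__main__":
    # Assumes the script runs from the directory that contains the clone.
    listing = notebooks_by_chapter("Generative_AI_For_Science")
    for chapter, notebooks in listing.items():
        print(f"{chapter}: {', '.join(notebooks) or '(no notebooks found)'}")
```

Folders such as `slides/` and `assets/` are skipped because they do not match the `Chapter*` pattern.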
| Use Case | Recommendation |
|---|---|
| As a course text | Follow chapters sequentially for a structured introduction |
| As a reference | Jump directly to sections relevant to your research domain |
| As a hands-on guide | Open Colab notebooks alongside each chapter, run and modify code |
| As a research launchpad | Use provided implementations as starting points for your projects |
- Molecular Property Prediction with Graph Neural Networks
- Drug Design with Diffusion Models
- Crystal Structure Prediction with AI
- Reaction Prediction with Transformers
- Protein Structure Prediction (ESMFold, AlphaFold2)
- Protein Design (ProteinMPNN, RFDiffusion)
- Variant Effect Prediction for genomics
- Clinical Trial Optimization
- Weather Forecasting with GenCast
- Ocean Dynamics modeling
- Climate Projection with surrogates
- Extreme Event Prediction
- Physics-Informed Neural Networks (PINNs)
- Neural Network Surrogates for simulations
- Uncertainty Quantification
| Architecture | Use Cases | Scientific Applications |
|---|---|---|
| Transformers & LLMs | Text, code, sequences | Literature synthesis, protein sequences, code generation |
| Diffusion Models | Structured outputs, images | Molecular structures, protein folding, climate data |
| VAEs & GANs | Latent space learning | Synthetic data, anomaly detection, compression |
| Graph Neural Networks | Molecular graphs | Property prediction, reaction prediction |
| Physics-Informed NNs | PDEs, conservation laws | Fluid dynamics, heat transfer, wave propagation |
While all notebooks run in Google Colab, you can also set up locally:
```bash
# Create virtual environment
python -m venv genai-science
source genai-science/bin/activate  # Linux/Mac
# or: genai-science\Scripts\activate  # Windows

# Install dependencies
pip install -r requirements.txt
```

Core dependencies (requirements.txt):

```text
torch>=2.0
transformers>=4.30
rdkit
numpy
pandas
matplotlib
scikit-learn
```
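After a local install, it can help to confirm that the packages listed above are actually present in the active environment. The sketch below uses the standard-library `importlib.metadata` module to report installed versions. It only reports versions and does not enforce minimums such as `torch>=2.0`; `installed_versions` is a hypothetical helper name.

```python
from importlib import metadata

# Distribution names taken from the dependency list above.
REQUIRED = [
    "torch",
    "transformers",
    "rdkit",
    "numpy",
    "pandas",
    "matplotlib",
    "scikit-learn",
]


def installed_versions(packages):
    """Return {package: version string, or None when it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED).items():
        print(f"{name:<14} {version or 'MISSING'}")
```

Any package reported as `MISSING` can be installed by re-running the `pip install` step inside the activated environment.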
| Model Type | GPU Memory | Recommended Platform |
|---|---|---|
| Small models (GNNs, VAEs) | < 4 GB | Colab Free Tier |
| Medium models (Diffusion) | 4-8 GB | Colab Pro |
| Large models (LLMs, ESMFold) | 16+ GB | Colab Pro+, A100 |
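To relate the table to the hardware actually available, the following sketch reads the total memory of the first visible GPU through PyTorch and lists which rows fit. The thresholds are a conservative reading of the table (the upper end of each range, with "16+" treated as a 16 GB minimum), and both helper names are hypothetical.

```python
# Upper end of each memory range in the table above, in GB. "16+" is treated
# as a minimum of 16 GB, so large models may still need more than this.
MEMORY_NEEDED_GB = {
    "Small models (GNNs, VAEs)": 4,
    "Medium models (Diffusion)": 8,
    "Large models (LLMs, ESMFold)": 16,
}


def runnable_model_types(available_gb: float) -> list[str]:
    """Return the table rows whose memory need fits in available_gb."""
    return [
        name for name, needed in MEMORY_NEEDED_GB.items() if available_gb >= needed
    ]


def visible_gpu_memory_gb():
    """Total memory of GPU 0 in GB, or None without PyTorch or a CUDA device."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory / 1024**3


if __name__ == "__main__":
    memory = visible_gpu_memory_gb()
    if memory is None:
        print("No CUDA GPU visible (or PyTorch is not installed).")
    else:
        fits = runnable_model_types(memory)
        print(f"{memory:.1f} GB visible:", fits or "below every row in the table")
```

Because the check is conservative, a GPU that reports slightly under a threshold may still run the smaller models in that row.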
We welcome contributions! Please see our Contributing Guidelines for details.
- Report bugs or issues
- Suggest new examples or applications
- Improve documentation
- Submit code improvements
- Translate content
If you use this book or code in your research, please cite:
```bibtex
@book{liu2026generativeai,
  title = {Generative AI for Science},
  author = {Liu, J. Paul},
  year = {2026},
  publisher = {Leanpub},
  url = {https://leanpub.com/generativeaiforscience}
}
```

Or simply:
J. Paul Liu, 2026. Generative AI for Science. Leanpub, https://leanpub.com/generativeaiforscience
| Platform | Link |
|---|---|
| Email | Contact through Leanpub |
| Twitter / X | @jpliu168 (follow for updates) |
| LinkedIn | Paul Liu (connect for professional updates) |
| Discussions | Use GitHub Discussions for Q&A |
| Issues | Report bugs via GitHub Issues |
This book was developed through:
- Graduate courses at the Data Science and AI Academy
- Bioinformatics Research Center workshops
- Cross-campus AI for Research training programs
- Research Triangle AI Society LLM intensive bootcamps
- Collaborations in oceanography, materials science, protein engineering, and literature mining
Special thanks to all students and colleagues who provided feedback and helped refine the material.
This project is licensed under the MIT License - see the LICENSE file for details.
The book content is © 2026 J. Paul Liu. Code examples are provided under MIT License for educational use.
If you find this resource helpful:
- Star this repository to help others discover it
- Share on Twitter/LinkedIn to spread the word
- Get the book to support continued development
- Leave feedback to help improve future editions
Ready to accelerate your scientific discovery with AI?
"Combine human creativity with machine assistance, and new discoveries become possible."
Dr. J. Paul Liu
