Web scraping with Python conference teaching guide

Thank you for volunteering to teach this one-hour session at our conference! This teaching guide explains our setup and the material we try to cover.

The exercises live in the "Web scraping with Python" Jupyter notebook. The other notebook has a refresher on some basic Python syntax for folks who are new to Python or could use a reference.

Take it for a spin

At the conference, this repository will already be on the classroom computers, with the virtual environment created and the dependencies (jupyterlab, requests, bs4) installed and tested.

If you're in a "BYO laptop" room, you might check out Google Colab or a similar cloud environment that can load git repos.

Run locally

You'll need uv installed. (Or use your own dependency management software.)

  1. Clone or download/unzip this repo onto your computer
  2. cd into the folder
  3. uv sync
  4. uv run jupyter lab

Session description

This session will show you how to use the Python programming language to scrape data from simple websites.

This session is good for: People with some experience working with data. Experience with Python and/or HTML is a plus but not necessary.

Session goals

Some of the ground you'll want to cover:

  • How to write and run Python code in a Jupyter notebook
  • Browser tools for inspecting the source code of a web page
  • How to use the requests library to fetch the HTML code for a web page
  • How to use the beautifulsoup4 library to parse the HTML
  • Using beautifulsoup4's find() and find_all() methods to target and extract information
  • Writing the results of a scrape to a CSV (if time)
  • Where to find instructions for installing Python on their own machines (or tell them about JupyterLab Desktop, or direct them to your install guide of choice)
  • How to find help when they get stuck
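
The whole pipeline these goals describe -- fetch, parse, target, write to CSV -- can be sketched in a few lines. The HTML below is an inline stand-in for a page you'd fetch with `requests.get(url).text`; the table layout and column names are hypothetical, not from the class notebook:

```python
import csv

from bs4 import BeautifulSoup

# Stand-in for HTML you'd fetch live with requests.get(url).text
html = """
<table>
  <tr><th>Company</th><th>Date</th></tr>
  <tr><td>Acme Corp</td><td>2024-01-05</td></tr>
  <tr><td>Globex</td><td>2024-02-17</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Target the table, then pull the text out of each row's <td> cells
rows = []
for tr in soup.find("table").find_all("tr"):
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if cells:  # the header row has <th> cells, so its list is empty -- skip it
        rows.append(cells)

# Write the results to a CSV file
with open("results.csv", "w", newline="") as outfile:
    writer = csv.writer(outfile)
    writer.writerow(["company", "date"])
    writer.writerows(rows)
```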

General approach

I Do, We Do, You Do. Demonstrate a concept, work through it together, then give them plenty of time to experiment on their own while you and your coach walk around and answer questions (see sections marked ✍️ Try it yourself).

The pace will be slower than you think, and that's OK! It's not the end of the world if you don't get through everything. Many people who come to this class will have zero experience with programming.

Class setup

We'll have the latest version of Python 3 installed. We're using uv to manage the virtual environment and project dependencies (jupyterlab, bs4 and requests), which will already have been installed and tested prior to your session.

Class outline

Start up the notebook server

Begin the class by walking everyone through the process of activating their virtual environments and launching JupyterLab. Or, if you prefer a different tool, such as the Jupyter extension for VS Code, get them set up with that instead.

  1. Open the command-line interface
  2. cd into your class directory (or, on a Mac, you could have them right-click the class folder and select Services > New Terminal at Folder)
  3. uv run jupyter lab

It will take everyone a few minutes to get going. You'll also probably get some questions about what you're doing at this step. Try to avoid a lengthy digression into virtual environments -- it's beyond the scope of this hourlong session, so maybe offer to talk to them after class, or send 'em our way: training@ire.org.

Once everyone is good to go, toggle back to the terminal and show them what's going on: A Jupyter server is running in the background, so don't close the terminal window.

Go over some notebook basics: Adding cells, writing code and running cells, etc. A common gotcha: Writing code that later cells depend on but forgetting to run that cell first, so the names it defines don't exist yet.
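
A two-cell toy example (the variable name is arbitrary) makes the gotcha concrete:

```python
# Cell 1 -- defines a variable that a later cell depends on
url = "https://example.com"

# Cell 2 -- raises NameError if Cell 1 was never run,
# because `url` doesn't exist in the notebook's namespace yet
print(url)
```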

Main course content

Start working your way through the notebook: Practice inspecting a web page, fetch a web page, parse the HTML, target and extract the data, write to CSV (if time). Pause frequently to ask if anyone has questions. There's a bunch of text at the beginning of the notebook that's mostly for them to read and reference, not necessarily a list of things to cover.

Any time you see ✍️ Try it yourself, hit the brakes and give everyone a little time to play around with whatever concept you're discussing.

In our experience, you'll want to budget more time than you'd think for showing how to parse data out of the BeautifulSoup object.
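
A small example you could riff on live, showing the difference between find() (first match or None) and find_all() (a list), plus text and attribute extraction -- the HTML snippet and class name here are made up for illustration:

```python
from bs4 import BeautifulSoup

html = """
<div class="listing">
  <h2>First notice</h2>
  <a href="/notices/1">Details</a>
</div>
<div class="listing">
  <h2>Second notice</h2>
  <a href="/notices/2">Details</a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag (or None if nothing matches)
first = soup.find("div", class_="listing")
print(first.h2.get_text())  # the text inside the first <h2>

# find_all() returns a list of every match
for div in soup.find_all("div", class_="listing"):
    print(div.a["href"])  # the href attribute of each listing's link
```

Note the trailing underscore in `class_` -- plain `class` is a reserved word in Python, a detail that trips up nearly every class.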

If you have Internet problems, you can pivot and work on the HTML file saved in this directory, sd-warn.html -- there's a cell with some commented-out code that folks can run to read in the HTML.
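
Reading a saved file into BeautifulSoup works the same as parsing a fetched page; the notebook's commented-out cell may differ in detail. This sketch writes a tiny sample file first so it's self-contained -- in class you'd open the bundled sd-warn.html directly:

```python
from pathlib import Path

from bs4 import BeautifulSoup

# Create a tiny stand-in file for illustration; in class,
# you'd open the sd-warn.html saved in this directory instead
Path("sample.html").write_text("<h1>WARN notices</h1>", encoding="utf-8")

# Read the saved HTML from disk instead of fetching over the network
html = Path("sample.html").read_text(encoding="utf-8")
soup = BeautifulSoup(html, "html.parser")
print(soup.h1.get_text())
```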

Debugging

If you can, find an opportunity when someone has gotten an error and take a few minutes to walk through basic debugging strategy: Reading the traceback error from bottom to top, strategic Googling, etc.

If you have extra time at the end

You can set them on the extra credit problems at the end of the notebook or oversee some unstructured lab time -- they can practice scraping other web pages or look up additional methods for navigating the souped HTML, etc.

Ending the session

  1. Have everyone close out of their notebook tabs
  2. In terminal, Ctrl+C to kill the server process
  3. Close the terminal window

About

Teaching guide for a one-hour hands-on session at an IRE/NICAR conference on scraping web data using Python.
