1 change: 1 addition & 0 deletions .gitignore
@@ -4,6 +4,7 @@
/site_libs/
/_freeze/
/*/*_files/
/*/*.html
/*/*.ipynb/
TODO.md
Manifest.toml
11 changes: 8 additions & 3 deletions EDA/bivariate-julia.qmd
@@ -88,7 +88,7 @@ Putting the categorical variable first presents a graphic (@fig-grouped-dotplot



Regardless of how the graphic is produced, there appears to be a difference in the centers based on the species, as would be expected -- different species have different sizes.
Regardless of how the graphic is produced, there appears to be a difference in the centers based on the species, as would be expected---different species have different sizes.



@@ -207,7 +207,7 @@ x_{1}, & x_{2}, \dots, x_{n}\\
y_{1}, & y_{2}, \dots, y_{n}
\end{align*}

Or -- to emphasize how the data is paired off -- as $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$.
Or---to emphasize how the data is paired off---as $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$.
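In `Julia`, `zip` pairs two vectors off case by case in exactly this way (a minimal sketch with made-up measurement values):

```julia
# Two measurements recorded on the same n cases (hypothetical values)
x = [1.4, 1.3, 4.7, 4.5]
y = [0.2, 0.2, 1.4, 1.5]

# zip pairs the data off as (x1, y1), (x2, y2), ..., (xn, yn)
pairs_xy = collect(zip(x, y))
```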

### Numeric summaries

@@ -448,12 +448,17 @@ lm(@formula(PetalWidth ~ PetalLength), d)

The output has more detail to be explained later. For now, we only need to know that the method `coef` will extract the coefficients (in the first column) as a vector of length 2, which we assign to the values `bhat0` and `bhat1` below:

::: {#fig-regression-jitter}

```{julia}
scatter(jitter(l), jitter(w); legend=false) # spread out values
bhat0, bhat1 = coef(res) # the coefficients
plot!(x -> bhat0 + bhat1 * x) # `predict` does this generically
```

Scatter plot with computed regression line
:::

::: {.callout-note}
##### A constant model

Expand Down Expand Up @@ -797,7 +802,7 @@ First, suppose we simply adjust the fitted lines up or down for each cluster. Th
m2 = lm(@formula(PetalLength ~ PetalWidth + Species), iris)
```

The second row in the output of `m2` has an identical interpretation as for `m1` -- it is the slope of the regression line. The first line of the output in `m1` is the $x$-intercept, which moves the line up or down. Whereas the first of `m2` is the $x$ intercept for a line that describes *just one* of the species, in this case `setosa`. (A coding for the regression model with a categorical variable chooses one reference level, in this case "setosa."). The 3rd and 4th lines are the slopes for the other two species.
The second row in the output of `m2` has an identical interpretation as for `m1`---it is the slope of the regression line. The first line of the output of `m1` is the intercept, which moves the line up or down. The first line of `m2`, by contrast, is the intercept for a line that describes *just one* of the species, in this case `setosa`. (A coding for the regression model with a categorical variable chooses one reference level, in this case "setosa.") The 3rd and 4th lines are the intercept adjustments for the other two species.

We can plot these individually, one-by-one, in a similar manner as before, however when we call `predict` we include a level for `:Species`. The result is the middle figure in @fig-iris-scatterplot-regression.
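To make the arithmetic concrete, here is a sketch using hypothetical coefficient values (not the ones `coef(m2)` actually returns): in this additive model the species coefficients shift each line up or down, while the slope is shared.

```julia
# Hypothetical coefficients, in the order the model output lists them:
b0  = 1.08   # intercept for the reference level, setosa
b1  = 1.03   # common slope for PetalWidth
b2v = 0.45   # intercept adjustment for versicolor
b2g = 0.61   # intercept adjustment for virginica

# Each species gets its own line with the same slope:
setosa(x)     = b0 + b1 * x
versicolor(x) = (b0 + b2v) + b1 * x   # shifted up by b2v
virginica(x)  = (b0 + b2g) + b1 * x   # shifted up by b2g

# The vertical gap between two lines is constant in x:
versicolor(1.5) - setosa(1.5)
```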

2 changes: 1 addition & 1 deletion EDA/categorical-data-julia.qmd
@@ -294,7 +294,7 @@ plot(p1, p2, layout = (@layout [a b]))

As seen in the left graphic of @fig-grouped-barchart, there are groups of bars for each level of the first variable (`:Sex`); the groups represent the variable passed to the `group` keyword argument. The values are looked up in the data frame with the computed column that was named `:value` through the `combine` function.

The same graphic on the left -- without the labeling -- is also made more directly with `groupedbar(freqtable(survey, :Sex, :Smoke))`
The same graphic on the left---without the labeling---is also made more directly with `groupedbar(freqtable(survey, :Sex, :Smoke))`


#### Andrews plot
12 changes: 10 additions & 2 deletions EDA/makie.qmd
@@ -72,7 +72,7 @@ Both the `mapping` and `visual` calls can be used to set attributes:

The attributes are those for the underlying plotting function. For `visual(BoxPlot)`, these can be seen at the help page for `boxplot`, displayed with the command `?boxplot`.

The `mapping` calls shows two uses of the mini language for data manipulation. The basic form is `source => function => target` and works very much like the DataFrames mini language does for `select` or `transform`, but unlike those, the function is *always* applied by row. This makes some transformations, such as $z$-scores not possible within this call -- transformations requiring the entire column need to be done within the values passed to `data`. The abbreviated forms are just `source`, as used with the `color=:species` argument; `source => function`; and `source => target`, such as `:bill_length_mm => "bill length (mm)"` used to rename the variable for labeling purposes. When the source involves more than one column selector, tuples should be used to group them.
The `mapping` call shows two uses of the mini language for data manipulation. The basic form is `source => function => target` and works very much like the DataFrames mini language does for `select` or `transform`, but unlike those, the function is *always* applied by row. This makes some transformations, such as $z$-scores, not possible within this call---transformations requiring the entire column need to be done within the values passed to `data`. The abbreviated forms are just `source`, as used with the `color=:species` argument; `source => function`; and `source => target`, such as `:bill_length_mm => "bill length (mm)"` used to rename the variable for labeling purposes. When the source involves more than one column selector, tuples should be used to group them.
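The `source => function => target` form is nothing more than nested `Pair`s built by Julia's right-associative `=>` operator, as this small illustration (using an assumed penguins column name) shows:

```julia
# => is right-associative, so this parses as :bill_length_mm => (log => "…")
spec = :bill_length_mm => log => "log bill length"

spec.first           # the source column
spec.second.first    # the row-wise function
spec.second.second   # the target label
```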

A few functions are provided to bypass the usual mapping of the data. (For example, `color` maps levels of a factor to a color ramp behind the scenes.) Among these are `nonnumeric` to pass a numeric variable to a value expecting a categorical variable and `verbatim` to avoid this mapping. The latter, `=> verbatim`, will be necessary to add when annotating a figure.

@@ -337,7 +337,9 @@ Quantile-quantile plots. The left graphic shows `QQPlot` used to compare the dis

A scatter plot shows $x$ and $y$ pairs as points; a line plot connects these points. There are numerous ways to draw lines with `AlgebraOfGraphics`, including: `visual(Lines)`, for connect-the-dots lines; `visual(LinesFill)`, for shading; `visual(HLines)` and `visual(VLines)`, for horizontal and vertical lines; and `visual(Rangebars)`, to draw vertical or horizontal line segments.

The graph of a function can be drawn using `Lines`, as in this example, where we add in different range bars to emphasize the role that the two parameters play in this function's graph:
The graph of a function can be drawn using `Lines`, as in the example shown in @fig-line-plot, where we add in different range bars to emphasize the role that the two parameters play in the function's graph.

::: {#fig-line-plot}

```{julia}
ϕ(x; μ=0, σ=1) = 1/sqrt(2*pi*σ^2) * exp(-(1/(2σ^2)) * (x - μ)^2)
@@ -358,6 +360,9 @@ c += data((x=[1/10, 1/2], y=[0, ϕ(1)], label=["μ", "σ"])) *
draw(c)
```

Density of standard normal distribution with annotations
:::

The `Rangebars` visual has a `direction` argument, used above to make a horizontal range bar.

The annotation has two subtleties: the qualification of `Makie.Text` is needed, as there is a `Text` type in base `Julia`. More idiosyncratically, the use of `verbatim` in `mapping` is needed to avoid an attempt to map the labels to a glyph, such as a pre-defined marker.
@@ -416,13 +421,16 @@ f

A corner plot, as produced by the `PairPlots` package through its `pairplot` function, is a quick plot to show pair-wise relations amongst multiple numeric values. The graphic uses the lower part of a grid to show paired scatterplots with, by default, contour lines highlighting the relationship. On the diagonal are univariate density plots.

::: {#fig-pairplot}
```{julia}
using PairPlots
nms = names(penguins, 3:5)
p = select(penguins, nms .=> replace.(nms, "_mm" => "", "_" => " ")) # adjust names
pairplot(p)
```

Corner plot produced by the `PairPlots` package
:::

### 3D scatterplots

16 changes: 9 additions & 7 deletions EDA/tabular-data-julia.qmd
@@ -33,7 +33,7 @@ There are different ways to construct a data frame.

Consider the task of the Wirecutter in trying to select the best [carry on travel bag](https://www.nytimes.com/wirecutter/reviews/best-carry-on-travel-bags/#how-we-picked-and-tested). After compiling a list of possible models by scouring travel blogs etc., they select some criteria (capacity, compartment design, aesthetics, comfort, ...) and compile data, similar to what one person collected in a
[spreadsheet](https://docs.google.com/spreadsheets/d/1fSt_sO1s7moXPHbxBCD3JIKPa8QIZxtKWYUjD6ElZ-c/edit#gid=744941088).
Here we create a much simplified spreadsheet for 3 listed bags with measurements of volume, price, laptop compatibility, loading style, and a last-checked date -- as this market improves constantly.
Here we create a much simplified spreadsheet for 3 listed bags with measurements of volume, price, laptop compatibility, loading style, and a last-checked date---as this market improves constantly.

```
product v p l loads checked
@@ -140,7 +140,7 @@ push!(d, Dict(:b => "Genius", :v => 25, :p => 228, :lap => "Y",
:load => "clamshell", :d => Date("2022-10-01")))
```

(A dictionary is a `key => value` container like a named tuple, but keys may be arbitrary `Julia` objects -- not always symbols -- so we explicitly use symbols in the above command.)
(A dictionary is a `key => value` container like a named tuple, but keys may be arbitrary `Julia` objects---not always symbols---so we explicitly use symbols in the above command.)
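A quick illustration of the difference (with made-up keys and values):

```julia
# Dict keys may be arbitrary Julia objects -- strings, symbols, numbers, ...
d1 = Dict("name" => "Genius", :volume => 25)
d1["name"]     # looked up by string key
d1[:volume]    # looked up by symbol key

# A named tuple, by contrast, always uses symbols as its keys:
nt = (name = "Genius", volume = 25)
keys(nt)
```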

::: {.callout-note}
##### The `Tables` interface
@@ -185,9 +185,10 @@ The filename may be more general. For example, it could be `download(url)` for

::: {.callout-note}
##### Read and write
The methods `read` and `write` are qualified in the above usage with the `CSV` module. In the `Julia` ecosystem, the `FileIO` package provides a common framework for reading and writing files; it uses the verbs `load` and `save`. This can also be used with `DataFrames`, though it works through the `CSVFiles` package -- and not `CSV`, as illustrated above. The read command would look like `DataFrame(load(fname))` and the write command like `save(fname, df)`. Here `fname` would have a ".csv" extension so that the type of file could be detected.
The methods `read` and `write` are qualified in the above usage with the `CSV` module. In the `Julia` ecosystem, the `FileIO` package provides a common framework for reading and writing files; it uses the verbs `load` and `save`. This can also be used with `DataFrames`, though it works through the `CSVFiles` package---and not `CSV`, as illustrated above. The read command would look like `DataFrame(load(fname))` and the write command like `save(fname, df)`. Here `fname` would have a ".csv" extension so that the type of file could be detected.
:::


| Command | Description |
|---------|-------------|
| `CSV.read(file_name, DataFrame)` | Read csv file from file with given name |
@@ -197,7 +198,8 @@
| `DataFrame(load(file_name))` | Read csv file from file with given name using `CSVFiles` |
| `save(file_name, df)` | Write data frame `df` to a csv file using `CSVFiles` |

: Basic usage to read/write `.csv` file into a data frame.
: Basic usage to read/write `.csv` file into a data frame. {#tbl-read-write-data-frame}


#### TableScraper

@@ -266,7 +268,7 @@
match "name" somewhere in the string; `r"^name"` and `r"name$"` will
match "name" at the beginning and ending of a string. Using a regular
expression will return a data frame row (when a row index is
specified) -- not a value -- as it is possible to return 0, 1 or more
specified)---not a value---as it is possible to return 0, 1 or more
columns in the selection.
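The anchoring behavior can be checked directly with `occursin`, here against a few made-up column names:

```julia
cols = ["name", "nickname", "surname", "age"]

# r"name" matches "name" anywhere in the string
anywhere = filter(c -> occursin(r"name", c), cols)

# r"^name" matches only at the beginning; r"name$" only at the end
at_start = filter(c -> occursin(r"^name", c), cols)
at_end   = filter(c -> occursin(r"name$", c), cols)
```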


@@ -395,7 +397,7 @@ For the `cars` data set, the latter can be used to extract the Volkswagen models
cars[cars.Manufacturer .== "Volkswagen", :]
```

This approach lends itself to the description "find all rows matching some value" then "extract the identified rows," -- written as two steps to emphasize there are two passes through the data. Another mental model would be loop over the rows, and keep those that match the query. This is done generically by the `filter` function for collections in `Julia` or by the `subset` function of `DataFrames`.
This approach lends itself to the description "find all rows matching some value," then "extract the identified rows"---written as two steps to emphasize there are two passes through the data. Another mental model would be to loop over the rows and keep those that match the query. This is done generically by the `filter` function for collections in `Julia` or by the `subset` function of `DataFrames`.

The `filter(predicate, collection)` function is used to identify just the values in the collection for which the predicate function returns `true`. When a data frame is used with `filter`, the iteration is over the rows, so the wrapping `eachrow` iterator is not needed. We need a predicate function to replace the `.==` above. One follows. It doesn't need `.==`, as `r` is a data frame row and access produces a value not a vector:
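The row-wise predicate idea can be sketched with plain named tuples standing in for data frame rows (made-up data, not the actual `cars` set):

```julia
rows = [(Manufacturer = "Volkswagen", Model = "Golf"),
        (Manufacturer = "Toyota",     Model = "Corolla"),
        (Manufacturer = "Volkswagen", Model = "Passat")]

# The predicate sees one row at a time, so == (not .==) compares values:
vws = filter(r -> r.Manufacturer == "Volkswagen", rows)
```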

@@ -796,7 +798,7 @@ legos.youngest_age = categorical(legos.youngest_age, ordered=true)
first(legos[:,r"age"], 2)
```

With that ordering, an expected pattern becomes clear -- kits for older users have on average more pieces -- though there are unexpected exceptions:
With that ordering, an expected pattern becomes clear---kits for older users have on average more pieces---though there are unexpected exceptions:

```{julia}
@chain legos begin