<h3 id="Binomial-distribution">Binomial distribution<a class="anchor-link" href="#Binomial-distribution">¶</a></h3><p>... whose PMF is $P(k|q,n) = {n \choose k} q^k (1-q)^{n-k}$ for $k$ successes. ${n \choose k}$ is the combinatoric "choose" function, $\frac{n!}{k!(n-k)!}$.</p>
<p>The Binomial distribution is additive, in that the sum of two Binomial random variables with the same $q$, respectively with $n_1$ and $n_2$ trials, also follows the Binomial distribution, with parameters $q$ and $n_1+n_2$.</p>
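The additivity claim is easy to verify numerically. The sketch below (an illustration, not part of the original notes, assuming <code>scipy</code> and <code>numpy</code> are available) convolves two Binomial PMFs and compares the result with the Binomial PMF for the combined number of trials.

```python
import numpy as np
from scipy.stats import binom

# If X1 ~ Binom(n1, q) and X2 ~ Binom(n2, q) independently,
# then X1 + X2 should follow Binom(n1 + n2, q).
q, n1, n2 = 0.3, 5, 7  # arbitrary example values
k = np.arange(n1 + n2 + 1)

# PMF of the sum of two independent discrete variables = convolution of PMFs
pmf_sum = np.convolve(binom.pmf(np.arange(n1 + 1), n1, q),
                      binom.pmf(np.arange(n2 + 1), n2, q))

print(np.allclose(pmf_sum, binom.pmf(k, n1 + n2, q)))  # True
```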
<p>This distribution might be useful for inferring the fraction of objects belonging to one of two classes ($q$), based on typing of an unbiased but finite number of them ($n$). (The multinomial generalization would work for more than two classes.)</p>
<p>Or, consider the case that each Bernoulli trial is whether a photon hits our detector in a given, tiny time interval. For an integration whose total length is much longer than any such time interval we could plausibly distinguish, we might take the limit where $n$ becomes huge, thus reaching the...</p>
<h3 id="Poisson-distribution">Poisson distribution<a class="anchor-link" href="#Poisson-distribution">¶</a></h3><p>... which describes the number of successes when the number of trials is in principle infinite, but $q$ is correspondingly vanishingly small. It has a single parameter, $\mu$, which corresponds to the product $qn$ when interpreted as a limit of the Binomial distribution. The PMF is</p>
<p>$P(k|\mu) = \frac{\mu^k e^{-\mu}}{k!}$.</p>
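The limiting relationship can be checked numerically (a sketch, not from the original notes, assuming <code>scipy</code> is available): holding $\mu = qn$ fixed while increasing $n$, the Binomial PMF approaches the Poisson PMF.

```python
import numpy as np
from scipy.stats import binom, poisson

# Binom(n, mu/n) -> Poisson(mu) as n grows with mu = q*n held fixed
mu = 3.0  # arbitrary example value
k = np.arange(15)
for n in [10, 100, 10000]:
    err = np.abs(binom.pmf(k, n, mu / n) - poisson.pmf(k, mu)).max()
    print(n, err)  # the discrepancy shrinks as n increases
```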
<p>Like the Binomial distribution, the Poisson distribution is additive. It also has the following (probably familiar) properties:</p>
<ul>
<li>Expectation value (mean) $\langle k\rangle = \mu$.</li>
</ul>
<h3 id="Normal-or-Gaussian-distribution">Normal or Gaussian distribution<a class="anchor-link" href="#Normal-or-Gaussian-distribution">¶</a></h3><p>... which is, more generally, defined over the real line as</p>
<p>$p(x|\mu,\sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]$,</p>
<p>and has mean $\mu$ and variance $\sigma^2$. With $\mu=0$ and $\sigma=1$, this is known as the standard normal distribution.</p>
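As a sanity check, the density written out by hand can be compared against <code>scipy.stats.norm</code> (a sketch with arbitrary example parameters, assuming <code>scipy</code> is available):

```python
import numpy as np
from scipy.stats import norm

# The Normal PDF written out explicitly
def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma)**2) / (sigma * np.sqrt(2.0 * np.pi))

x = np.linspace(-5.0, 5.0, 101)
# scipy parameterizes the Normal by loc (mean) and scale (standard deviation)
print(np.allclose(normal_pdf(x, 1.0, 2.0), norm.pdf(x, loc=1.0, scale=2.0)))  # True
```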
<p>Because physicists love Gauss more than normality, you can expect to see "Gaussian" more often in the (astro)physics literature. Statisticians normally say "normal".</p>
<p>The limiting relationship between the Poisson and Normal distributions is one example of the <strong>central limit theorem</strong>, which says that, under quite general conditions, the average of a large number of random variables tends towards normal, even if the individual variables are not normally distributed. Whether and how well it applies, and just what "large number" means, in any given situation is a practical question more than anything else, at least in this course. But the bottom line is that this distribution shows up quite often, and is normally a reasonable guess in those situations where we have to make a choice and have no better ideas.</p>
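A minimal illustration of the theorem (a sketch with arbitrarily chosen numbers, not part of the original notes): averages of draws from a decidedly non-normal distribution cluster around the true mean with the expected $\sigma/\sqrt{n}$ scatter.

```python
import numpy as np

# Average n exponential (highly skewed, not normal) draws, many times over.
# The central limit theorem says the averages should be approximately
# Normal with mean 1 and standard deviation 1/sqrt(n).
rng = np.random.default_rng(42)
n, trials = 100, 20000
means = rng.exponential(scale=1.0, size=(trials, n)).mean(axis=1)

print(means.mean(), means.std())  # close to 1.0 and 0.1, respectively
```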
<p>Note that intentionally binning up data, when doing so throws out potentially useful information, in the hopes of satisfying the central limit theorem, is generally frowned upon in this class.</p>
<p>Finally, the sum of squares of $\nu$ standard normal variables follows the...</p>
<li><p>If this bothers you, consider playing <a href="https://en.wikipedia.org/wiki/Frogger">Frogger</a> while blindfolded. The game is executing a fully deterministic program, so in reality there is no uncertainty about whether the frog will be smooshed if it hops into the road at a given time. The player is missing this key information, but can still, in principle, make predictions about how likely the frog is to be smooshed at any given time using probabilistic modeling. With repeated experimentation, they could even refine this model to make it more accurate. As far as the player is concerned, whether there is an obstacle in the road is <em>functionally</em> random, despite the underlying mechanics being non-random.</p>
</li>
<li><p>We'll follow the convention of using capital $P$ for the probability of a discrete outcome, e.g. $P(N)$ where $N$ is integer-valued, or for a generic argument like "data". Probability <em>densities</em> over real-valued variables will get a lower-case $p$. The distinction between capital-$P$ Probability (aka "probability mass") and probability density doesn't matter too often, but occasionally can lead to confusion, so we'll try to keep them straight. (See the <a href="essential_probability.html">probability notes</a>.)</p></li>
notes/essential_probability.ipynb (+4 -4)
@@ -269,7 +269,7 @@
 "source": [
 "### Binomial distribution\n",
 "\n",
-"... whose PMF is $p(k|q,n) = {n \\choose k} q^k (1-q)^{n-k}$ for $k$ successes. ${n \\choose k}$ is the combinatoric \"choose\" function, $\\frac{n!}{k!(n-k)!}$.\n",
+"... whose PMF is $P(k|q,n) = {n \\choose k} q^k (1-q)^{n-k}$ for $k$ successes. ${n \\choose k}$ is the combinatoric \"choose\" function, $\\frac{n!}{k!(n-k)!}$.\n",
 "\n",
 "The Binomial distribution is additive, in that the sum of two Binomial random variables with the same $q$, respectively with $n_1$ and $n_2$ trials, also follows the Binomial distribution, with parameters $q$ and $n_1+n_2$.\n",
 "\n",
@@ -286,7 +286,7 @@
 "\n",
 "... which describes the number of successes when the number of trials is in principle infinite, but $q$ is correspondingly vanishingly small. It has a single parameter, $\\mu$, which corresponds to the product $qn$ when interpretted as a limit of the Binomial distribution. The PMF is\n",
 "\n",
-"$p(k|\\mu) = \\frac{\\mu^k e^{-\\mu}}{k!}$.\n",
+"$P(k|\\mu) = \\frac{\\mu^k e^{-\\mu}}{k!}$.\n",
 "\n",
 "Like the Binomial distribution, the Poisson distribution is additive. It also has the following (probably familiar) properties:\n",
 "* Expectation value (mean) $\\langle k\\rangle = \\mu$,\n",
@@ -303,13 +303,13 @@
 "\n",
 "... which is, more generally, defined over the real line as\n",
 "and has mean $\\mu$ and variance $\\sigma^2$. With $\\mu=0$ and $\\sigma=1$, this is known as the standard normal distribution.\n",
 "\n",
 "Because physicists love Gauss more than normality, you can expect to see \"Gaussian\" more often in the (astro)physics literature. Statisticians normally say \"normal\".\n",
 "\n",
-"The limiting relationship between the Poisson and Normal distributions is one example of the **central limit theorem**, which says that, under quite general conditions, the average of a large number of random variables tends towards normal, even if the individual variables are not normally distributed. Whether and how well it applies, and just what \"large number means\", in any given situation is a practical question more than anything else, at least in this course. But the bottom line is that this distribution shows up quite often, and is normally a reasonable guess in those situations where we have to make a choice and have no better ideas.\n",
+"The limiting relationship between the Poisson and Normal distributions is one example of the **central limit theorem**, which says that, under quite general conditions, the average of a large number of random variables tends towards normal, even if the individual variables are not normally distributed. Whether and how well it applies, and just what \"large number\" means, in any given situation is a practical question more than anything else, at least in this course. But the bottom line is that this distribution shows up quite often, and is normally a reasonable guess in those situations where we have to make a choice and have no better ideas.\n",
 "\n",
 "Note that intentionally binning up data, when doing so throws out potentially useful information, in the hopes of satisfying the central limit theorem, is generally frowned upon in this class.\n",
notes/generative_models.ipynb (+2 -2)
@@ -8,7 +8,7 @@
 "\n",
 "Containing:\n",
 "* An introduction to probabilistic models, why we need them, and how to work with them\n",
-"* Hopefully, some resolution of the frequently confusing key ideas from the overview: (i) data are random, and (2) data are constants"
+"* Hopefully, some resolution of the frequently confusing key ideas from the overview: (1) data are random, and (2) data are constants"
 ]
 },
 {
@@ -261,7 +261,7 @@
 "\n",
 "1. If this bothers you, consider playing [Frogger](https://en.wikipedia.org/wiki/Frogger) while blindfolded. The game is executing a fully deterministic program, so in reality there is no uncertainty about whether the frog will be smooshed if it hops into the road at a given time. The player is missing this key information, but can still, in principle, make predictions about how likely the frog is to be smooshed at any given time using probabilistic modeling. With repeated experimentation, they could even refine this model to make it more accurate. As far as the player is concerned, whether there is an obstacle in the road is _functionally_ random, despite the underlying mechanics being non-random.\n",
 "\n",
-"2. We'll follow the convention of using capital $P$ for the probability of a discrete outcome, e.g. $P(N)$ where $N$ is integer-values, or for a generic argument like \"data\". Probability _densities_ over real-valued variables will get a lower-case $p$. The distinction between capital-$P$ Probability (aka \"probability mass\") and probability density doesn't matter too often, but occasionally can lead to confusion, so we'll try to keep them straight. (See **probability notes**.)"
+"2. We'll follow the convention of using capital $P$ for the probability of a discrete outcome, e.g. $P(N)$ where $N$ is integer-values, or for a generic argument like \"data\". Probability _densities_ over real-valued variables will get a lower-case $p$. The distinction between capital-$P$ Probability (aka \"probability mass\") and probability density doesn't matter too often, but occasionally can lead to confusion, so we'll try to keep them straight. (See the [probability notes](essential_probability.ipynb).)"
notes/mcmc_diagnostics.ipynb (+1 -1)
@@ -51,7 +51,7 @@
 "\n",
 "Convergence does **not** mean\n",
 "* that parameters are \"well constrained\" by the data. If you're not satisfied with the constraints you're getting, that's a function of the data and the model, not necessarily a failing of the MCMC sampler.\n",
-"* that the autocorrelation length is small. A chain can be converged _and_ still highly correlated (though this is not desirable.\n",
+"* that the autocorrelation length is small. A chain can be converged _and_ still highly correlated (though this is not desirable).\n",
 "* that there are not occasional excursions beyond a locus in parameter space. If the PDF has tails, the chain will need to occasionally find its way to them; otherwise it isn't doing its job."
tutorials/gaussians.ipynb (+1 -1)
@@ -300,7 +300,7 @@
 "source": [
 "Now, compute the parameters of the posterior for $\\beta$ based on $\\mu_\\beta$ and $\\Sigma_\\beta$ (parameters that appear in the sampling distribution) and the parameters of the conjugate prior. Set the prior parameters to be equivalent to the uniform distribution for the check below (you can put in something different to see how it looks later).\n",
 "\n",
-"Transform `post_mean` to a shape (3,) numpy array for convenience (as opposed to, say, a 3x1 matrix)."
+"Transform `post_mean` to a shape (2,) numpy array for convenience (as opposed to, say, a 2x1 matrix)."
tutorials/probability_transformations.ipynb (+1 -1)
@@ -33,7 +33,7 @@
 "source": [
 "## 1. Solve it\n",
 "\n",
-"Given the PDF of $\\theta$ and the function $b(\\theta)$, what is the PDF of $b$, $p(b)$? Fill in an equation below, and also define it as a function. (As simple as it is, you might want to explciitly write down $p(\\theta)$ first.)"
+"Given the PDF of $\\theta$ and the function $b(\\theta)$, what is the PDF of $b$, $p(b)$? Fill in an equation below, and also define it as a function. (As simple as it is, you might want to explicitly write down $p(\\theta)$ first.)"