Commit e66ea40

link to Regexper
1 parent d9d0c75 commit e66ea40

1 file changed

Lines changed: 46 additions & 43 deletions

lecture_2.ipynb

@@ -32,12 +32,12 @@
 "\n",
 "- Taking notes in the lecture notebooks\n",
 "- Using [another Python/pandas learning resource](https://python-public-policy.afeld.me/en/{{school_slug}}/resources.html)\n",
-" - Hear things explained another way\n",
-" - Ask in [Ed Discussion]({{discussions_url}}) if others have recommendations\n",
+" - Hear things explained another way\n",
+" - Ask in [Ed Discussion]({{discussions_url}}) if others have recommendations\n",
 "- [Comment-driven development](https://www.sitepoint.com/comment-driven-development/)\n",
-" - Otherwise, trying to do two steps in your head:\n",
-" 1. Figuring out the logic\n",
-" 1. Figuring out the syntax"
+" - Otherwise, trying to do two steps in your head:\n",
+" 1. Figuring out the logic\n",
+" 1. Figuring out the syntax\n"
 ]
 },
 {
@@ -57,7 +57,7 @@
 "```python\n",
 "# find valid ZIP codes\n",
 "# filter the DataFrame to only invalid ZIP codes\n",
-"```"
+"```\n"
 ]
 },
 {
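The cell in this hunk outlines the comment-driven approach: write the steps as comments first, then fill in the syntax. Below is a minimal sketch of filling in those two comments with pandas. The `requests` DataFrame and its values are hypothetical stand-ins for the 311 data, and the rule for a valid ZIP code (five digits, optionally followed by a hyphen and four more digits) is an assumption.

```python
import pandas as pd

# Hypothetical stand-in for the 311 data's "Incident Zip" column
requests = pd.DataFrame(
    {"Incident Zip": ["10001", "11201-1234", "N/A", "1000", "10458"]}
)

# find valid ZIP codes
# (assumed rule: 5 digits, optionally followed by "-" and 4 more digits;
# na=False treats any missing value as invalid)
valid = requests["Incident Zip"].str.fullmatch(r"\d{5}(?:-\d{4})?", na=False)

# filter the DataFrame to only invalid ZIP codes
invalid = requests[~valid]
print(invalid)
```

Each comment maps to one line of code, so the logic can be checked before worrying about the exact pandas syntax.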
@@ -70,7 +70,7 @@
 "tags": []
 },
 "source": [
-"## [Boolean indexing](https://pandas.pydata.org/docs/user_guide/10min.html#boolean-indexing)"
+"## [Boolean indexing](https://pandas.pydata.org/docs/user_guide/10min.html#boolean-indexing)\n"
 ]
 },
 {
@@ -225,7 +225,7 @@
 "tags": []
 },
 "source": [
-"When we compare single values (like `x > 6`), we get a single boolean back. Here, we are checking a _bunch_ of values, so we're going to get multiple booleans, returned as a Series."
+"When we compare single values (like `x > 6`), we get a single boolean back. Here, we are checking a _bunch_ of values, so we're going to get multiple booleans, returned as a Series.\n"
 ]
 },
 {
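The sentence in this hunk contrasts a single comparison with a comparison across a whole Series. A small sketch of both, using a made-up Series named `values`:

```python
import pandas as pd

x = 7
single = x > 6  # one comparison, one boolean
print(single)

# Hypothetical Series: the comparison is applied to every element
values = pd.Series([3, 8, 6, 10])
mask = values > 6  # a Series of booleans, one per element
print(mask)
```

The resulting boolean Series (often called a "mask") has the same index as `values`, which is what lets it be used to select rows.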
@@ -365,7 +365,7 @@
 "\n",
 "```python\n",
 "people[people[\"age\"] > 40]\n",
-"```"
+"```\n"
 ]
 },
 {
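The cell above filters with `people[people["age"] > 40]`. The notebook's `people` data isn't part of this diff, so this sketch uses a hypothetical four-row DataFrame to show the mask on its own, the filtered result, and two conditions combined with `&`:

```python
import pandas as pd

# Hypothetical DataFrame for illustration
people = pd.DataFrame(
    {"name": ["Ana", "Ben", "Chen", "Dee"], "age": [25, 47, 38, 61]}
)

over_40 = people["age"] > 40  # boolean Series, one value per row
older = people[over_40]  # keep only the rows where the mask is True
print(older)

# Combine conditions with & (and) or | (or); each comparison needs parentheses
middle = people[(people["age"] > 30) & (people["age"] < 50)]
print(middle)
```

The parentheses matter because `&` binds more tightly than `>` and `<` in Python.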
@@ -382,7 +382,7 @@
 "\n",
 "> Data Cleansing is a process of removing or fixing incorrect, malformed, incomplete, duplicate, or corrupted data\n",
 "\n",
-"https://hevodata.com/learn/data-cleansing-a-simplified-guide/"
+"https://hevodata.com/learn/data-cleansing-a-simplified-guide/\n"
 ]
 },
 {
@@ -395,7 +395,7 @@
 "tags": []
 },
 "source": [
-"When have you needed to clean data?"
+"When have you needed to clean data?\n"
 ]
 },
 {
@@ -408,7 +408,7 @@
 "tags": []
 },
 "source": [
-"What are continuous values?"
+"What are continuous values?\n"
 ]
 },
 {
@@ -421,7 +421,7 @@
 "tags": []
 },
 "source": [
-"What are categorical values?"
+"What are categorical values?\n"
 ]
 },
 {
@@ -439,16 +439,16 @@
 "From [my workshop on data cleaning](https://github.com/afeld/data-cleaning):\n",
 "\n",
 "- Missing data\n",
-" - Empty values\n",
+" - Empty values\n",
 "- Bad (junk) values\n",
-" - Duplicates\n",
-" - Mismatched types/formatting\n",
+" - Duplicates\n",
+" - Mismatched types/formatting\n",
 "- Categorical values\n",
-" - Uniqueness (cardinality)\n",
-" - Value counts\n",
+" - Uniqueness (cardinality)\n",
+" - Value counts\n",
 "- Continuous values\n",
-" - Ranges\n",
-" - Spread (distribution)"
+" - Ranges\n",
+" - Spread (distribution)\n"
 ]
 },
 {
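The workshop checklist in this hunk groups cleaning problems into missing data, bad values, categorical values, and continuous values. One possible mapping of each item to a pandas method is sketched below. The DataFrame and its values are hypothetical, and the choice of methods is an illustration rather than the workshop's own code.

```python
import numpy as np
import pandas as pd

# Hypothetical messy data: a missing borough, a repeated row,
# and junk values ("Unspecified", -1.0) that are not technically missing
df = pd.DataFrame(
    {
        "borough": ["BROOKLYN", "QUEENS", "BROOKLYN", None, "Unspecified"],
        "response_hours": [2.5, 48.0, 2.5, np.nan, -1.0],
    }
)

# Missing data: count empty values per column
print(df.isna().sum())

# Duplicates: rows that exactly repeat an earlier row
print(df[df.duplicated()])

# Categorical values: uniqueness (cardinality) and value counts
print(df["borough"].nunique())
print(df["borough"].value_counts())

# Continuous values: ranges and spread
stats = df["response_hours"].describe()
print(stats)
```

Note that `isna()` does not flag `"Unspecified"` or the negative duration; spotting junk values like those usually comes from looking at the value counts and ranges with knowledge of the data.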
@@ -464,7 +464,7 @@
 "Notes:\n",
 "\n",
 "- \"Values\" in this case can be a single cell (in the spreadsheet sense) or a whole row\n",
-"- \"Missing\" or \"duplicates\" can be columns (Series), tables (DataFrames), rows, or cells"
+"- \"Missing\" or \"duplicates\" can be columns (Series), tables (DataFrames), rows, or cells\n"
 ]
 },
 {
@@ -482,7 +482,7 @@
 "- Empty\n",
 "- Bad\n",
 "- Unique\n",
-"- Spread"
+"- Spread\n"
 ]
 },
 {
@@ -496,7 +496,7 @@
 "tags": []
 },
 "source": [
-"## Setup"
+"## Setup\n"
 ]
 },
 {
@@ -528,7 +528,7 @@
 "tags": []
 },
 "source": [
-"### Read our cleaned 311 Service Requests dataset"
+"### Read our cleaned 311 Service Requests dataset\n"
 ]
 },
 {
@@ -571,7 +571,7 @@
 "\n",
 "More data cleaning!\n",
 "\n",
-"![Minion character vacuuming](https://impulsecreative.com/hs-fs/hubfs/cleaning-minion-gif.gif?width=490&name=cleaning-minion-gif.gif)"
+"![Minion character vacuuming](https://impulsecreative.com/hs-fs/hubfs/cleaning-minion-gif.gif?width=490&name=cleaning-minion-gif.gif)\n"
 ]
 },
 {
@@ -586,7 +586,7 @@
 "source": [
 "```\n",
 "DtypeWarning: Columns (8,20,31,34) have mixed types.\n",
-"```"
+"```\n"
 ]
 },
 {
@@ -1273,7 +1273,7 @@
 "tags": []
 },
 "source": [
-"ZIP codes _look_ numeric, but aren't really."
+"ZIP codes _look_ numeric, but aren't really.\n"
 ]
 },
 {
@@ -1286,7 +1286,7 @@
 "tags": []
 },
 "source": [
-"[Read the ZIP codes in as strings.](https://pandas.pydata.org/pandas-docs/stable/user_guide/text.html#text-data-types)"
+"[Read the ZIP codes in as strings.](https://pandas.pydata.org/pandas-docs/stable/user_guide/text.html#text-data-types)\n"
 ]
 },
 {
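This hunk links to reading the ZIP codes in as strings. A sketch of the difference, using the `dtype` parameter of `pd.read_csv()` on a tiny inline CSV (the CSV content is hypothetical; the real 311 file has many more columns):

```python
import io

import pandas as pd

# Hypothetical CSV snippet
csv_text = """Unique Key,Incident Zip
1,02134
2,10001
3,N/A
"""

# Default type inference: the column is parsed as numbers
default = pd.read_csv(io.StringIO(csv_text))
print(default["Incident Zip"].tolist())

# Telling pandas the column is text keeps values exactly as written
as_str = pd.read_csv(io.StringIO(csv_text), dtype={"Incident Zip": str})
print(as_str["Incident Zip"].tolist())
```

With the default, the leading zero of `02134` is lost, and because `N/A` is in pandas' default list of missing-value markers it becomes `NaN`, which makes the column float. With `dtype={"Incident Zip": str}`, the ZIP codes stay as text while `N/A` is still treated as missing. Specifying the type up front also means pandas doesn't have to guess the column's type in chunks, which is the usual source of a `DtypeWarning` about mixed types.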
@@ -1323,7 +1323,7 @@
 "tags": []
 },
 "source": [
-"We fixed the dtype warning for column 8 (`Incident Zip`)."
+"We fixed the dtype warning for column 8 (`Incident Zip`).\n"
 ]
 },
 {
@@ -1728,7 +1728,10 @@
 "└─ start of string\n",
 "```\n",
 "\n",
-"[regex101](https://regex101.com/) is useful for testing them."
+"Helpful tools:\n",
+"\n",
+"- [Regexper](https://regexper.com/#%5E%5Cd%7B5%7D%28%3F%3A-%5Cd%7B4%7D%29%3F%24)\n",
+"- [regex101](https://regex101.com/)\n"
 ]
 },
 {
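The Regexper link added in this commit encodes the pattern `^\d{5}(?:-\d{4})?$`: five digits, then an optional non-capturing group of a hyphen and four digits, anchored at both ends. Besides the visual tools, a quick way to check the pattern against a few sample strings is Python's built-in `re` module (the sample values are made up):

```python
import re

# Pattern decoded from the Regexper link in this commit
ZIP_PATTERN = re.compile(r"^\d{5}(?:-\d{4})?$")

candidates = ["10001", "11201-1234", "1000", "10001-12", "N/A"]

# match() starts at the beginning; the ^ and $ anchors require the whole string to fit
results = {zip_code: bool(ZIP_PATTERN.match(zip_code)) for zip_code in candidates}
print(results)
```

In pandas, the same pattern can be applied to a whole column of strings with `Series.str.match()` or `Series.str.fullmatch()`.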
@@ -1911,7 +1914,7 @@
 "tags": []
 },
 "source": [
-"[Clear](https://pandas.pydata.org/pandas-docs/stable/user_guide/missing_data.html#inserting-missing-data) any invalid ZIP codes:"
+"[Clear](https://pandas.pydata.org/pandas-docs/stable/user_guide/missing_data.html#inserting-missing-data) any invalid ZIP codes:\n"
 ]
 },
 {
@@ -1939,7 +1942,7 @@
 "tags": []
 },
 "source": [
-"[`.loc[]`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html) is used for overwriting a subset of values."
+"[`.loc[]`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html) is used for overwriting a subset of values.\n"
 ]
 },
 {
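The cells around this hunk describe clearing invalid ZIP codes and using `.loc[]` to overwrite a subset of values. A minimal sketch of the two together, with hypothetical rows and the same assumed validity rule as the Regexper pattern (five digits, optionally a hyphen and four more):

```python
import numpy as np
import pandas as pd

# Hypothetical 311-style rows
requests = pd.DataFrame(
    {
        "Incident Zip": ["10001", "N/A", "11201-1234", "1000"],
        "Borough": ["MANHATTAN", "BROOKLYN", "BROOKLYN", "QUEENS"],
    }
)

# Assumed validity rule; na=False treats missing values as invalid
valid = requests["Incident Zip"].str.fullmatch(r"\d{5}(?:-\d{4})?", na=False)

# .loc[rows, column] = value overwrites only the selected cells of the original DataFrame
requests.loc[~valid, "Incident Zip"] = np.nan
print(requests)
```

Doing the selection and the assignment in one `.loc[]` call matters: chained indexing such as `requests[~valid]["Incident Zip"] = np.nan` may write to a temporary copy and leave `requests` unchanged.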
@@ -1956,7 +1959,7 @@
 "\n",
 "- Hard part is finding what needs to be done\n",
 "- Will be specific to your use case\n",
-"- Document what you did, since it will affect your results"
+"- Document what you did, since it will affect your results\n"
 ]
 },
 {
@@ -1969,7 +1972,7 @@
 "tags": []
 },
 "source": [
-"## [In-class exercise](https://python-public-policy.afeld.me/en/{{school_slug}}/lecture_2_exercise.html)"
+"## [In-class exercise](https://python-public-policy.afeld.me/en/{{school_slug}}/lecture_2_exercise.html)\n"
 ]
 },
 {
@@ -1984,7 +1987,7 @@
 ]
 },
 "source": [
-"## [Concatenation](https://pandas.pydata.org/docs/user_guide/merging.html#concat)"
+"## [Concatenation](https://pandas.pydata.org/docs/user_guide/merging.html#concat)\n"
 ]
 },
 {
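The heading in this hunk links to concatenation. A minimal sketch of `pd.concat()` stacking two DataFrames that share the same columns; the split by year and the values are hypothetical:

```python
import pandas as pd

# Hypothetical: the same kind of records, split across two years
requests_2022 = pd.DataFrame(
    {"Unique Key": [1, 2], "Complaint Type": ["Noise", "Heat"]}
)
requests_2023 = pd.DataFrame({"Unique Key": [3], "Complaint Type": ["Rodent"]})

# Stack the rows; ignore_index=True renumbers the index 0..n-1
# instead of keeping the original labels (0, 1, 0)
combined = pd.concat([requests_2022, requests_2023], ignore_index=True)
print(combined)
```

Concatenation lines tables up by column name and appends rows; it doesn't match rows on a key, which is what a merge does.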
@@ -2250,7 +2253,7 @@
 "tags": []
 },
 "source": [
-"## Simple [merge](https://pandas.pydata.org/docs/user_guide/merging.html#merge)"
+"## Simple [merge](https://pandas.pydata.org/docs/user_guide/merging.html#merge)\n"
 ]
 },
 {
@@ -2263,7 +2266,7 @@
 "tags": []
 },
 "source": [
-"_I had [Copilot](https://code.visualstudio.com/docs/copilot/overview) generate the DataFrames, so no idea if the numbers are real._"
+"_I had [Copilot](https://code.visualstudio.com/docs/copilot/overview) generate the DataFrames, so no idea if the numbers are real._\n"
 ]
 },
 {
@@ -2445,7 +2448,7 @@
 "tags": []
 },
 "source": [
-"How should we combine them?"
+"How should we combine them?\n"
 ]
 },
 {
@@ -2617,7 +2620,7 @@
 "source": [
 "To join DataFrames together, we will use the [pandas `.merge()` function](https://pandas.pydata.org/pandas-docs/stable/getting_started/intro_tutorials/08_combine_dataframes.html#join-tables-using-a-common-identifier).\n",
 "\n",
-"![merge diagram](https://pandas.pydata.org/pandas-docs/stable/_images/08_merge_left.svg)"
+"![merge diagram](https://pandas.pydata.org/pandas-docs/stable/_images/08_merge_left.svg)\n"
 ]
 },
 {
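The cell in this hunk introduces `.merge()` for joining DataFrames on a common identifier. A sketch joining two hypothetical DataFrames on a shared `country` column (the numbers are placeholders, not real data):

```python
import pandas as pd

# Hypothetical tables; the numbers are placeholders, not real statistics
populations = pd.DataFrame(
    {"country": ["Canada", "Mexico", "Peru"], "population": [100, 200, 300]}
)
migrants = pd.DataFrame(
    {"country": ["Canada", "Mexico", "Chile"], "migrants": [10, 5, 8]}
)

# Match rows that share the same value in the "country" column
merged = populations.merge(migrants, on="country")
print(merged)
```

With no `how` argument, `.merge()` does an inner join, so Peru and Chile, which each appear in only one table, are dropped from the result.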
@@ -2635,7 +2638,7 @@
 "- [SQL `JOIN`](https://pandas.pydata.org/pandas-docs/stable/getting_started/comparison/comparison_with_sql.html#join)\n",
 "- [Spreadsheet `VLOOKUP`](https://pandas.pydata.org/pandas-docs/stable/getting_started/comparison/comparison_with_spreadsheets.html#merging)\n",
 "\n",
-"In general, called [\"record linkage\" or \"entity resolution\"](https://en.wikipedia.org/wiki/Record_linkage)."
+"In general, called [\"record linkage\" or \"entity resolution\"](https://en.wikipedia.org/wiki/Record_linkage).\n"
 ]
 },
 {
@@ -2815,7 +2818,7 @@
 "tags": []
 },
 "source": [
-"[Different types of merges](https://www.geeksforgeeks.org/different-types-of-joins-in-pandas/)"
+"[Different types of merges](https://www.geeksforgeeks.org/different-types-of-joins-in-pandas/)\n"
 ]
 },
 {
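This hunk links to the different types of merges. To compare them, the sketch below runs the same join on two hypothetical DataFrames with each `how` option, then uses `indicator=True` to label where each row came from:

```python
import pandas as pd

# Hypothetical tables with partially overlapping keys
left = pd.DataFrame(
    {"country": ["Canada", "Mexico", "Peru"], "population": [100, 200, 300]}
)
right = pd.DataFrame(
    {"country": ["Canada", "Mexico", "Chile"], "migrants": [10, 5, 8]}
)

# Which keys survive under each type of merge
results = {}
for how in ["inner", "left", "right", "outer"]:
    merged = left.merge(right, on="country", how=how)
    results[how] = merged["country"].tolist()
    print(how, results[how])

# indicator=True adds a "_merge" column: "both", "left_only", or "right_only"
outer = left.merge(right, on="country", how="outer", indicator=True)
print(outer[["country", "_merge"]])
```

Checking `_merge` after an outer join is a quick way to spot keys that didn't match, for example a country name spelled differently in the two sources.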
@@ -2832,7 +2835,7 @@
 "source": [
 "## In-class exercise 2\n",
 "\n",
-"Compute the migrant population as a percent of total by country using [UN data](https://data.un.org/). You're welcome to talk with your neighbors."
+"Compute the migrant population as a percent of total by country using [UN data](https://data.un.org/). You're welcome to talk with your neighbors.\n"
 ]
 },
 {
@@ -2845,7 +2848,7 @@
 "tags": []
 },
 "source": [
-"## [Homework 2](https://python-public-policy.afeld.me/en/{{school_slug}}/hw_2.html)"
+"## [Homework 2](https://python-public-policy.afeld.me/en/{{school_slug}}/hw_2.html)\n"
 ]
 }
 ],