diff --git a/docs/data-science/basics/setup.md b/docs/data-science/basics/setup.md
index bf758dd2..421433cb 100644
--- a/docs/data-science/basics/setup.md
+++ b/docs/data-science/basics/setup.md
@@ -143,6 +143,18 @@ uv init --vcs none # (1)!
     By default `--vcs git` is set, which initializes a git repository. Since git
     is not within the scope of this project, we set `--vcs` to none.
 
+???+ warning "Restart VS Code if command fails"
+
+    If the command returns an error saying `uv` was not found, close and reopen
+    VS Code. The restarted terminal then picks up the newly installed `uv`
+    executable. Run the command again afterwards.
+
+???+ tip "Navigate command history"
+
+    There's no need to retype old commands. Press ++arrow-up++ /
+    ++arrow-down++ to cycle through previously executed commands in the
+    terminal.
+
 This initializes the project. `uv` creates a few files in your folder. Your
 workspace should look like this:
 
diff --git a/docs/data-science/data/basics.md b/docs/data-science/data/basics.md
index 9274e0b8..a934a7b9 100644
--- a/docs/data-science/data/basics.md
+++ b/docs/data-science/data/basics.md
@@ -71,6 +71,26 @@ Examples include number of students (5) or age (22).
 A simple rule of thumb: If you can meaningfully have fractional values, it's
 continuous. If counting whole units makes more sense, it's discrete.
 
+???+ warning "Numbers aren't always numerical data"
+
+    Just because data is stored as numbers doesn't make it numerical.
+    Consider ZIP codes: computing their mean is mathematically possible but
+    conceptually meaningless.
+
+    ```python
+    zip_codes = pd.Series([6020, 1050, 6011, 1010])
+    print(f"Average ZIP code: {zip_codes.mean()}")  # Makes no sense!
+    ```
+
+    ```title=">>> Output"
+    Average ZIP code: 3522.75
+    ```
+
+    If you can't meaningfully add, subtract or average the values, it's
+    categorical data in disguise.
+
+    Other examples are customer IDs or phone numbers.
+
 ### Categorical (Qualitative)
 
 Categorical data represents qualities or characteristics that place
@@ -160,8 +180,8 @@ How many rows and columns has the `penguin` dataset?
 - [ ] 5 rows and 8 columns
 - [x] 344 rows and 7 columns
 
-The data set has 344 rows (penguins) and 7 columns (features). Use `data.shape`
-to quickly get the datasets dimensions.
+The dataset has 344 rows (penguins) and 7 columns (features). Use
+`penguins.shape` to quickly get the dataset's dimensions.
 
 ???+ question "Identify attribute types"
 
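The `basics.md` hunk argues that ZIP codes are categorical data in disguise. Below is a minimal sketch of acting on that in pandas, assuming pandas is installed. Storing the codes as strings with the `category` dtype is an illustrative choice, not something the patch itself prescribes:

```python
import pandas as pd

# Assumption: the same four ZIP codes as in the admonition, stored as strings
# with the "category" dtype so pandas treats them as labels, not quantities.
zip_codes = pd.Series(["6020", "1050", "6011", "1010"], dtype="category")

# Counting how often each label occurs is meaningful for categorical data.
counts = zip_codes.value_counts()
print(counts)

# Averaging is not: pandas raises a TypeError for mean() on a categorical Series.
try:
    zip_codes.mean()
except TypeError as err:
    print(f"mean() rejected: {err}")
```

Keeping identifiers as strings also preserves leading zeros (for example, a US ZIP code such as 02134), which an integer column would silently drop.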