Merged
12 changes: 12 additions & 0 deletions docs/data-science/basics/setup.md
@@ -143,6 +143,18 @@ uv init --vcs none # (1)!
By default `--vcs git` is set, which initializes a git repository. Since
git is not within the scope of this project, we set `--vcs` to `none`.

???+ warning "Restart VS Code if command fails"

    If the command returns an error saying `uv` was not found, close and reopen
    VS Code. Restarting lets the integrated terminal pick up the updated `PATH`,
    so it can find the newly installed `uv` executable. Then run the command
    again.

???+ tip "Navigate command history"

    There's no need to retype old commands. Press ++arrow-up++ /
    ++arrow-down++ to cycle through previously executed commands in the
    terminal.

This initializes the project. `uv` creates a few files in your folder. Your
workspace should look like this:

24 changes: 22 additions & 2 deletions docs/data-science/data/basics.md
@@ -71,6 +71,26 @@ Examples include number of students (5) or age (22).
A simple rule of thumb: If you can meaningfully have fractional values, it's
continuous. If counting whole units makes more sense, it's discrete.

???+ warning "Numbers aren't always numerical data"

Just because data is stored as numbers doesn't make it numerical.
Consider ZIP codes, their mean is mathematically possible but conceptually
meaningless.

    ```python
    import pandas as pd

    zip_codes = pd.Series([6020, 1050, 6011, 1010])
    print(f"Average ZIP code: {zip_codes.mean()}")  # Makes no sense!
    ```

    ```title=">>> Output"
    Average ZIP code: 3522.75
    ```

    If you can't meaningfully add, subtract, or average the values, it's
    categorical data in disguise.

    Other examples are customer IDs or phone numbers.
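
One way to act on this, sketched here as an option rather than the course's prescribed approach, is to store such columns with pandas' `category` dtype (the variable name `zip_categories` is illustrative). The summary from `describe()` then reports counts and the most frequent label, and arithmetic reductions such as `mean()` raise a `TypeError` instead of returning a misleading number.

```python
import pandas as pd

zip_codes = pd.Series([6020, 1050, 6011, 1010])

# Store the ZIP codes as labels instead of quantities
zip_categories = zip_codes.astype("category")
print(zip_categories.dtype)

# describe() now reports count, unique, top and freq instead of a mean
print(zip_categories.describe())

# Arithmetic reductions are rejected for categorical data
try:
    zip_categories.mean()
except TypeError as error:
    print(f"mean() is not supported: {error}")
```

Converting with `astype(str)` would also stop the averaging; the `category` dtype additionally signals intent and uses less memory when labels repeat.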

### Categorical (Qualitative)

Categorical data represents qualities or characteristics that place
@@ -160,8 +180,8 @@ How many rows and columns has the `penguin` dataset?
- [ ] 5 rows and 8 columns
- [x] 344 rows and 7 columns

The dataset has 344 rows (penguins) and 7 columns (features). Use
`penguins.shape` to quickly get the dataset's dimensions.
</quiz>
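
To see `.shape` in isolation, here is a minimal sketch on a small hand-made DataFrame. The name `penguins_sample` and its values are hypothetical and not taken from the real dataset. Note that `.shape` is an attribute, not a method, and returns a `(rows, columns)` tuple that can be unpacked directly.

```python
import pandas as pd

# Hypothetical mini dataset: three penguins, two features (values made up)
penguins_sample = pd.DataFrame(
    {
        "species": ["Adelie", "Gentoo", "Chinstrap"],
        "body_mass_g": [3750, 5000, 3500],
    }
)

# shape is a (rows, columns) tuple, so it can be unpacked into two names
rows, columns = penguins_sample.shape
print(f"{rows} rows and {columns} columns")
```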

???+ question "Identify attribute types"