applydp is a slimmed-down version of apply with only Datapusher+ relevant subcommands/operations (qsvdp binary variant only).
Table of Contents | Source: src/cmd/applydp.rs | 📇🚀🔣👆 
Description | Examples | Usage | Arguments | Applydp Options | Operations Options | Common Options
Description ↩
applydp is a slimmed-down version of apply specifically created for Datapusher+. It "applies" a series of transformation functions to given CSV column/s. This can be used to perform typical data-wrangling tasks and/or to harmonize some values, etc.
It has three subcommands:
- operations* - 18 string, format & regex operators.
- emptyreplace* - replace empty cells with <--replacement> string.
- dynfmt - Dynamically constructs a new column from other columns using the <--formatstr> template.
An asterisk (*) marks a subcommand that is multi-column capable.
OPERATIONS (multi-column capable)
Multiple operations can be applied, with the comma-delimited operation series applied in order:
trim => Trim the cell
trim,upper => Trim the cell, then transform to uppercase
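The left-to-right ordering can be pictured with a minimal Python sketch. It is not qsv's implementation: the `OPERATIONS` table and `apply_operations` helper are hypothetical, and only four of the operations are modeled with rough Python equivalents.

```python
import re

# Hypothetical stand-ins for four operations; Python's whitespace rules
# are assumed to be close enough to Rust's for illustration purposes.
OPERATIONS = {
    "trim": str.strip,
    "lower": str.lower,
    "upper": str.upper,
    "squeeze": lambda s: re.sub(r"\s+", " ", s),
}


def apply_operations(series: str, cell: str) -> str:
    """Apply a comma-delimited operation series to one cell, left to right."""
    for name in series.split(","):
        cell = OPERATIONS[name](cell)
    return cell


print(apply_operations("trim", "  de la Cruz "))        # de la Cruz
print(apply_operations("trim,upper", "  de la Cruz "))  # DE LA CRUZ
```

Because each operation receives the output of the previous one, the order of the series can change the result.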
Operations support multi-column transformations. Just make sure the number of names passed to the --rename option matches the number of transformed columns, e.g.:
$ qsv applydp operations trim,upper col1,col2,col3 -r newcol1,newcol2,newcol3 file.csv
It has 18 supported operations:
- len: Return string length
- lower: Transform to lowercase
- upper: Transform to uppercase
- squeeze: Compress consecutive whitespaces
- squeeze0: Remove whitespace
- trim: Trim (drop whitespace left & right of the string)
- ltrim: Left trim whitespace
- rtrim: Right trim whitespace
- mtrim: Trims --comparand matches left & right of the string (Rust trim_matches)
- mltrim: Left trim --comparand matches (Rust trim_start_matches)
- mrtrim: Right trim --comparand matches (Rust trim_end_matches)
- strip_prefix: Remove the prefix specified in --comparand
- strip_suffix: Remove the suffix specified in --comparand
- escape: Escape (Rust escape_default)
- replace: Replace all matches of a pattern (using --comparand) with a string (using --replacement) (Rust replace)
- regex_replace: Replace all regex matches in --comparand w/ --replacement. Specify <NULL> as --replacement to remove matches.
- round: Round numeric values to the specified number of decimal places using Midpoint Nearest Even Rounding Strategy AKA "Bankers Rounding." Specify the number of decimal places with --formatstr (default: 3).
- copy: Mark a column for copying
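Two groups of operations in the list above are easy to misread, so here is a rough Python sketch of their semantics. It assumes, based on the mtrim example with `'()<>'` in the Examples section, that the match-trim family treats --comparand as a set of characters, whereas strip_prefix removes the whole comparand once. The function names are hypothetical, and the output formatting of round (for example, whether trailing zeros are kept) is not modeled.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def mtrim(cell: str, comparand: str) -> str:
    # Assumption: every character in the comparand is trimmed from both ends.
    return cell.strip(comparand)


def mltrim(cell: str, comparand: str) -> str:
    # Same idea, left side only.
    return cell.lstrip(comparand)


def strip_prefix(cell: str, comparand: str) -> str:
    # The whole comparand is removed once, and only if the cell starts with it.
    return cell[len(comparand):] if cell.startswith(comparand) else cell


def round_half_even(value: str, places: int = 3) -> str:
    # Midpoint Nearest Even ("Bankers Rounding"): ties go to the even digit.
    quantum = Decimal(10) ** -places
    return str(Decimal(value).quantize(quantum, rounding=ROUND_HALF_EVEN))


print(mtrim("(<note>)", "()<>"))     # note
print(mltrim("xxabc", "x"))          # abc
print(strip_prefix("xxabc", "x"))    # xabc
print(round_half_even("2.5", 0))     # 2
print(round_half_even("3.5", 0))     # 4
print(round_half_even("1.0015"))     # 1.002
```

The rounding sketch works on decimal strings rather than binary floats, so a tie such as 2.5 is represented exactly before it is rounded.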
EMPTYREPLACE (multi-column capable)
Replace empty cells with <--replacement> string.
Non-empty cells are not modified. See the fill command for more complex empty field operations.
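As a rough illustration of the behavior described above, the following Python sketch replaces empty cells in selected columns and leaves everything else untouched. The `emptyreplace` function and the sample data are made up for this example, and the sketch treats only zero-length cells as empty; how qsv handles whitespace-only cells is not stated here.

```python
import csv
import io


def emptyreplace(rows, columns, replacement):
    """Yield rows with empty cells in the given columns replaced."""
    for row in rows:
        yield {
            name: (replacement if name in columns and value == "" else value)
            for name, value in row.items()
        }


# Made-up sample data for illustration only.
data = "id,Measurement\n1,\n2,4.2\n"
reader = csv.DictReader(io.StringIO(data))
result = list(emptyreplace(reader, {"Measurement"}, "None"))
print(result)
```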
DYNFMT
Dynamically constructs a new column from other columns using the <--formatstr> template. The template can contain arbitrary characters. To insert a column value, enclose the column name in curly braces, replacing all non-alphanumeric characters with underscores.
If you need to dynamically construct a column with more complex formatting requirements and computed values, check out the py command to take advantage of Python's f-string formatting.
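The column-name rule for dynfmt templates can be sketched in Python as follows. This is an approximation that uses Python's own `str.format_map` rather than qsv's template engine; `safe_name`, `dynfmt`, and the sample row are hypothetical, and "alphanumeric" is assumed to mean ASCII letters and digits.

```python
import re


def safe_name(column: str) -> str:
    # Replace every non-alphanumeric character with an underscore,
    # e.g. "house number" -> "house_number", "zip-code" -> "zip_code".
    return re.sub(r"[^A-Za-z0-9]", "_", column)


def dynfmt(template: str, row: dict) -> str:
    # Look up placeholders by their sanitized column names.
    values = {safe_name(name): value for name, value in row.items()}
    return template.format_map(values)


# Made-up row for illustration only.
row = {
    "house number": "12",
    "street": "Main St",
    "city": "Springfield",
    "zip-code": "12345",
}
print(dynfmt("{house_number} {street}, {city} {zip_code} USA", row))
# 12 Main St, Springfield 12345 USA
```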
Examples ↩
Trim, then transform to uppercase the surname field.
qsv applydp operations trim,upper surname file.csv
Trim, then transform to uppercase the surname field and rename the column uppercase_clean_surname.
qsv applydp operations trim,upper surname -r uppercase_clean_surname file.csv
Trim, then transform to uppercase the surname field and save it to a new column named uppercase_clean_surname.
qsv applydp operations trim,upper surname -c uppercase_clean_surname file.csv
Trim, squeeze, then transform to uppercase in place ALL fields that end with "_name".
qsv applydp operations trim,squeeze,upper '/_name$/' file.csv
Trim, then transform to uppercase the firstname and surname fields and rename the columns ufirstname and usurname.
qsv applydp operations trim,upper firstname,surname -r ufirstname,usurname file.csv
Trim parentheses & brackets from the description field.
qsv applydp operations mtrim description --comparand '()<>' file.csv
Replace ' and ' with ' & ' in the description field.
qsv applydp operations replace description --comparand ' and ' --replacement ' & ' file.csv
You can also use this subcommand to make a copy of a column:
qsv applydp operations copy col_to_copy -c col_copy file.csv
Replace empty cells in file.csv Measurement column with 'None'.
qsv applydp emptyreplace Measurement --replacement None file.csv
Replace empty cells in file.csv Measurement column with 'Unknown Measurement'.
qsv applydp emptyreplace Measurement --replacement 'Unknown Measurement' file.csv
Replace empty cells in file.csv M1, M2 and M3 columns with 'None'.
qsv applydp emptyreplace M1,M2,M3 --replacement None file.csv
Replace all empty cells in file.csv for columns that start with 'Measurement' with 'None'.
qsv applydp emptyreplace '/^Measurement/' --replacement None file.csv
Replace all empty cells in file.csv for columns that start with 'observation' (case-insensitive) with 'None'.
qsv applydp emptyreplace --replacement None '/(?i)^observation/' file.csv
Create a new column 'mailing address' from 'house number', 'street', 'city' and 'zip-code' columns:
qsv applydp dynfmt --formatstr '{house_number} {street}, {city} {zip_code} USA' -c 'mailing address' file.csv
Create a new column 'FullName' from 'FirstName', 'MI', and 'LastName' columns:
qsv applydp dynfmt --formatstr 'Sir/Madam {FirstName} {MI}. {LastName}' -c FullName file.csv
For more examples, see tests.
Usage ↩
qsv applydp operations <operations> [options] <column> [<input>]
qsv applydp emptyreplace --replacement=<string> [options] <column> [<input>]
qsv applydp dynfmt --formatstr=<string> [options] --new-column=<name> [<input>]
qsv applydp --help
Arguments ↩
| Argument | Description |
|---|---|
| <column> | The column/s to apply the transformation to. Note that the argument supports multiple columns for the operations & emptyreplace subcommands. See 'qsv select --help' for the format details. |
| <operations> | The operation/s to apply. |
| <column> | The column/s to apply the operations to. |
| <column> | The column/s to check for emptiness. |
| <input> | The input file to read from. If not specified, reads from stdin. |
Applydp Options ↩
| Option | Type | Description | Default |
|---|---|---|---|
| -c,--new-column | string | Put the transformed values in a new column instead. | |
| -r,--rename | string | New name for the transformed column. | |
| -C,--comparand=<string> | string | The string to compare against for replace, strip, match-trim (mtrim/mltrim/mrtrim) & regex_replace operations. | |
| -R,--replacement=<string> | string | The string to use for the replace & emptyreplace operations. | |
| -f,--formatstr=<string> | string | This option is used by several subcommands: in operations, round reads the number of decimal places from it (default: 3); in dynfmt, it is the template used to construct the new column. | |
Operations Options ↩
| Option | Type | Description | Default |
|---|---|---|---|
| -j,--jobs | string | The number of jobs to run in parallel. When not set, the number of jobs is set to the number of CPUs detected. | |
| -b,--batch | string | The number of rows per batch to load into memory, before running in parallel. Set to 0 to load all rows in one batch. | 50000 |
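The batching rule in the table above can be pictured with a short Python sketch. It only models how rows might be grouped before processing, not the parallel execution itself; the `batches` helper is hypothetical and says nothing about how qsv implements --batch internally.

```python
from itertools import islice


def batches(rows, batch_size=50000):
    """Group rows into lists of batch_size; 0 means one batch with all rows."""
    iterator = iter(rows)
    if batch_size == 0:
        yield list(iterator)
        return
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


print(list(batches(range(5), 2)))  # [[0, 1], [2, 3], [4]]
print(list(batches(range(5), 0)))  # [[0, 1, 2, 3, 4]]
```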
Common Options ↩
| Option | Type | Description | Default |
|---|---|---|---|
| -h,--help | flag | Display this message | |
| -o,--output | string | Write output to <file> instead of stdout. | |
| -n,--no-headers | flag | When set, the first row will not be interpreted as headers. | |
| -d,--delimiter | string | The field delimiter for reading CSV data. Must be a single character. | , |