applydp

applydp is a slimmed-down version of apply with only Datapusher+ relevant subcommands/operations (qsvdp binary variant only).

Table of Contents | Source: src/cmd/applydp.rs | 📇🚀🔣👆

Description ↩

applydp is a slimmed-down version of apply specifically created for Datapusher+. It "applies" a series of transformation functions to given CSV column/s. This can be used to perform typical data-wrangling tasks and/or to harmonize some values, etc.

It has three subcommands:

operations* - 18 string, format & regex operators.
emptyreplace* - replace empty cells with <--replacement> string.
dynfmt - Dynamically constructs a new column from other columns using the <--formatstr> template.

subcommand is multi-column capable.

OPERATIONS (multi-column capable) Multiple operations can be applied, with the comma-delimited operation series applied in order:

trim => Trim the cell trim,upper => Trim the cell, then transform to uppercase

Operations support multi-column transformations. Just make sure the number of transformed columns with the --rename option is the same. e.g.:

$ qsv applydp operations trim,upper col1,col2,col3 -r newcol1,newcol2,newcol3 file.csv

It has 18 supported operations:

len: Return string length
lower: Transform to lowercase
upper: Transform to uppercase
squeeze: Compress consecutive whitespaces
squeeze0: Remove whitespace
trim: Trim (drop whitespace left & right of the string)
ltrim: Left trim whitespace
rtrim: Right trim whitespace
mtrim: Trims --comparand matches left & right of the string (Rust trim_matches)
mltrim: Left trim --comparand matches (Rust trim_start_matches)
mrtrim: Right trim --comparand matches (Rust trim_end_matches)
strip_prefix: Removes specified prefix in --comparand
strip_suffix: Remove specified suffix in --comparand
escape - escape (Rust escape_default)
replace: Replace all matches of a pattern (using --comparand) with a string (using --replacement) (Rust replace)
regex_replace: Replace all regex matches in --comparand w/ --replacement. Specify as --replacement to remove matches.
round: Round numeric values to the specified number of decimal places using Midpoint Nearest Even Rounding Strategy AKA "Bankers Rounding." Specify the number of decimal places with --formatstr (default: 3).
copy: Mark a column for copying

EMPTYREPLACE (multi-column capable) Replace empty cells with <--replacement> string. Non-empty cells are not modified. See the fill command for more complex empty field operations.

Dynfmt

Dynamically constructs a new column from other columns using the <--formatstr> template. The template can contain arbitrary characters. To insert a column value, enclose the column name in curly braces, replacing all non-alphanumeric characters with underscores.

If you need to dynamically construct a column with more complex formatting requirements and computed values, check out the py command to take advantage of Python's f-string formatting.

Examples ↩

OPERATIONS

Trim, then transform to uppercase the surname field.

qsv applydp operations trim,upper surname file.csv

Trim, then transform to uppercase the surname field and rename the column uppercase_clean_surname.

qsv applydp operations trim,upper surname -r uppercase_clean_surname file.csv

Trim, then transform to uppercase the surname field and save it to a new column named uppercase_clean_surname.

qsv applydp operations trim,upper surname -c uppercase_clean_surname file.csv

Trim, squeeze, then transform to uppercase in place ALL fields that end with "_name"

qsv applydp operations trim,squeeze,upper '/_name$/' file.csv

Trim, then transform to uppercase the firstname and surname fields and rename the columns ufirstname and usurname.

qsv applydp operations trim,upper firstname,surname -r ufirstname,usurname file.csv

Trim parentheses & brackets from the description field.

qsv applydp operations mtrim description --comparand '()<>' file.csv

Replace ' and ' with ' & ' in the description field.

qsv applydp operations replace description --comparand ' and ' --replacement ' & ' file.csv

You can also use this subcommand command to make a copy of a column:

qsv applydp operations copy col_to_copy -c col_copy file.csv

EMPTYREPLACE

Replace empty cells in file.csv Measurement column with 'None'.

qsv applydp emptyreplace Measurement --replacement None file.csv

Replace empty cells in file.csv Measurement column with 'Unknown Measurement'.

qsv applydp emptyreplace Measurement --replacement 'Unknown Measurement' file.csv

Replace empty cells in file.csv M1,M2 and M3 columns with 'None'.

qsv applydp emptyreplace M1,M2,M3 --replacement None file.csv

Replace all empty cells in file.csv for columns that start with 'Measurement' with 'None'.

qsv applydp emptyreplace '/^Measurement/' --replacement None file.csv

Replace all empty cells in file.csv for columns that start with 'observation' case insensitive with 'None'.

qsv applydp emptyreplace --replacement None '/(?i)^observation/' file.csv

DYNFMT

Create a new column 'mailing address' from 'house number', 'street', 'city' and 'zip-code' columns:

qsv applydp dynfmt --formatstr '{house_number} {street}, {city} {zip_code} USA' -c 'mailing address' file.csv

Create a new column 'FullName' from 'FirstName', 'MI', and 'LastName' columns:

qsv applydp dynfmt --formatstr 'Sir/Madam {FirstName} {MI}. {LastName}' -c FullName file.csv

For more examples, see tests.

Usage ↩

qsv applydp operations <operations> [options] <column> [<input>]
qsv applydp emptyreplace --replacement=<string> [options] <column> [<input>]
qsv applydp dynfmt --formatstr=<string> [options] --new-column=<name> [<input>]
qsv applydp --help

Arguments ↩

Argument	Description
`<column>`	The column/s to apply the transformation to. Note that the argument supports multiple columns for the operations & emptyreplace subcommands. See 'qsv select --help' for the format details.
`<operations>`	The operation/s to apply.
`<column>`	The column/s to apply the operations to.
`<column>`	The column/s to check for emptiness.
`<input>`	The input file to read from. If not specified, reads from stdin.

Applydp Options ↩

Option	Type	Description
`‑c,` `‑‑new‑column`	string	Put the transformed values in a new column instead.
`‑r,` `‑‑rename`	string	New name for the transformed column.
`‑C,` `‑‑comparand=<string>`	string	The string to compare against for replace, strip, match-trim (mtrim/mltrim/mrtrim) & regex_replace operations.
`‑R,` `‑‑replacement=<string>`	string	The string to use for the replace & emptyreplace operations.
`‑f,` `‑‑formatstr=<string>`	string	This option is used by several subcommands:

Operations Options ↩

Option	Type	Description	Default
`‑j,` `‑‑jobs`	string	The number of jobs to run in parallel. When not set, the number of jobs is set to the number of CPUs detected.
`‑b,` `‑‑batch`	string	The number of rows per batch to load into memory, before running in parallel. Set to 0 to load all rows in one batch.	`50000`

Common Options ↩

Option	Type	Description
`‑h,` `‑‑help`	flag	Display this message
`‑o,` `‑‑output`	string	Write output to instead of stdout.
`‑n,` `‑‑no‑headers`	flag	When set, the first row will not be interpreted as headers.
`‑d,` `‑‑delimiter`	string	The field delimiter for reading CSV data. Must be a single character. (default: ,)

Source: src/cmd/applydp.rs | Table of Contents | README

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

applydp

Description ↩

Dynfmt

Examples ↩

OPERATIONS

EMPTYREPLACE

DYNFMT

Usage ↩

Arguments ↩

Applydp Options ↩

Operations Options ↩

Common Options ↩

FilesExpand file tree

applydp.md

Latest commit

History

applydp.md

File metadata and controls

applydp

Description ↩

Dynfmt

Examples ↩

OPERATIONS

EMPTYREPLACE

DYNFMT

Usage ↩

Arguments ↩

Applydp Options ↩

Operations Options ↩

Common Options ↩