Skip to content

Latest commit

 

History

History
242 lines (170 loc) · 9.68 KB

File metadata and controls

242 lines (170 loc) · 9.68 KB

applydp

applydp is a slimmed-down version of apply with only Datapusher+ relevant subcommands/operations (qsvdp binary variant only).

Table of Contents | Source: src/cmd/applydp.rs | 📇🚀🔣👆 CKAN

Description | Examples | Usage | Arguments | Applydp Options | Operations Options | Common Options

Description

applydp is a slimmed-down version of apply specifically created for Datapusher+. It "applies" a series of transformation functions to given CSV column/s. This can be used to perform typical data-wrangling tasks and/or to harmonize some values, etc.

It has three subcommands:

  1. operations* - 18 string, format & regex operators.
  2. emptyreplace* - replace empty cells with <--replacement> string.
  3. dynfmt - Dynamically constructs a new column from other columns using the <--formatstr> template.
  • subcommand is multi-column capable.

OPERATIONS (multi-column capable) Multiple operations can be applied, with the comma-delimited operation series applied in order:

trim => Trim the cell trim,upper => Trim the cell, then transform to uppercase

Operations support multi-column transformations. Just make sure the number of transformed columns with the --rename option is the same. e.g.:

$ qsv applydp operations trim,upper col1,col2,col3 -r newcol1,newcol2,newcol3 file.csv

It has 18 supported operations:

  • len: Return string length
  • lower: Transform to lowercase
  • upper: Transform to uppercase
  • squeeze: Compress consecutive whitespaces
  • squeeze0: Remove whitespace
  • trim: Trim (drop whitespace left & right of the string)
  • ltrim: Left trim whitespace
  • rtrim: Right trim whitespace
  • mtrim: Trims --comparand matches left & right of the string (Rust trim_matches)
  • mltrim: Left trim --comparand matches (Rust trim_start_matches)
  • mrtrim: Right trim --comparand matches (Rust trim_end_matches)
  • strip_prefix: Removes specified prefix in --comparand
  • strip_suffix: Remove specified suffix in --comparand
  • escape - escape (Rust escape_default)
  • replace: Replace all matches of a pattern (using --comparand) with a string (using --replacement) (Rust replace)
  • regex_replace: Replace all regex matches in --comparand w/ --replacement. Specify as --replacement to remove matches.
  • round: Round numeric values to the specified number of decimal places using Midpoint Nearest Even Rounding Strategy AKA "Bankers Rounding." Specify the number of decimal places with --formatstr (default: 3).
  • copy: Mark a column for copying

EMPTYREPLACE (multi-column capable) Replace empty cells with <--replacement> string. Non-empty cells are not modified. See the fill command for more complex empty field operations.

Dynfmt

Dynamically constructs a new column from other columns using the <--formatstr> template. The template can contain arbitrary characters. To insert a column value, enclose the column name in curly braces, replacing all non-alphanumeric characters with underscores.

If you need to dynamically construct a column with more complex formatting requirements and computed values, check out the py command to take advantage of Python's f-string formatting.

Examples

OPERATIONS

Trim, then transform to uppercase the surname field.

qsv applydp operations trim,upper surname file.csv

Trim, then transform to uppercase the surname field and rename the column uppercase_clean_surname.

qsv applydp operations trim,upper surname -r uppercase_clean_surname file.csv

Trim, then transform to uppercase the surname field and save it to a new column named uppercase_clean_surname.

qsv applydp operations trim,upper surname -c uppercase_clean_surname file.csv

Trim, squeeze, then transform to uppercase in place ALL fields that end with "_name"

qsv applydp operations trim,squeeze,upper '/_name$/' file.csv

Trim, then transform to uppercase the firstname and surname fields and rename the columns ufirstname and usurname.

qsv applydp operations trim,upper firstname,surname -r ufirstname,usurname file.csv

Trim parentheses & brackets from the description field.

qsv applydp operations mtrim description --comparand '()<>' file.csv

Replace ' and ' with ' & ' in the description field.

qsv applydp operations replace description --comparand ' and ' --replacement ' & ' file.csv

You can also use this subcommand command to make a copy of a column:

qsv applydp operations copy col_to_copy -c col_copy file.csv

EMPTYREPLACE

Replace empty cells in file.csv Measurement column with 'None'.

qsv applydp emptyreplace Measurement --replacement None file.csv

Replace empty cells in file.csv Measurement column with 'Unknown Measurement'.

qsv applydp emptyreplace Measurement --replacement 'Unknown Measurement' file.csv

Replace empty cells in file.csv M1,M2 and M3 columns with 'None'.

qsv applydp emptyreplace M1,M2,M3 --replacement None file.csv

Replace all empty cells in file.csv for columns that start with 'Measurement' with 'None'.

qsv applydp emptyreplace '/^Measurement/' --replacement None file.csv

Replace all empty cells in file.csv for columns that start with 'observation' case insensitive with 'None'.

qsv applydp emptyreplace --replacement None '/(?i)^observation/' file.csv

DYNFMT

Create a new column 'mailing address' from 'house number', 'street', 'city' and 'zip-code' columns:

qsv applydp dynfmt --formatstr '{house_number} {street}, {city} {zip_code} USA' -c 'mailing address' file.csv

Create a new column 'FullName' from 'FirstName', 'MI', and 'LastName' columns:

qsv applydp dynfmt --formatstr 'Sir/Madam {FirstName} {MI}. {LastName}' -c FullName file.csv

For more examples, see tests.

Usage

qsv applydp operations <operations> [options] <column> [<input>]
qsv applydp emptyreplace --replacement=<string> [options] <column> [<input>]
qsv applydp dynfmt --formatstr=<string> [options] --new-column=<name> [<input>]
qsv applydp --help

Arguments

   Argument    Description
 <column>  The column/s to apply the transformation to. Note that the argument supports multiple columns for the operations & emptyreplace subcommands. See 'qsv select --help' for the format details.
 <operations>  The operation/s to apply.
 <column>  The column/s to apply the operations to.
 <column>  The column/s to check for emptiness.
 <input>  The input file to read from. If not specified, reads from stdin.

Applydp Options

         Option          Type Description Default
 ‑c,
‑‑new‑column 
string Put the transformed values in a new column instead.
 ‑r,
‑‑rename 
string New name for the transformed column.
 ‑C,
‑‑comparand=<string> 
string The string to compare against for replace, strip, match-trim (mtrim/mltrim/mrtrim) & regex_replace operations.
 ‑R,
‑‑replacement=<string> 
string The string to use for the replace & emptyreplace operations.
 ‑f,
‑‑formatstr=<string> 
string This option is used by several subcommands:

Operations Options

     Option      Type Description Default
 ‑j,
‑‑jobs 
string The number of jobs to run in parallel. When not set, the number of jobs is set to the number of CPUs detected.
 ‑b,
‑‑batch 
string The number of rows per batch to load into memory, before running in parallel. Set to 0 to load all rows in one batch. 50000

Common Options

     Option      Type Description Default
 ‑h,
‑‑help 
flag Display this message
 ‑o,
‑‑output 
string Write output to instead of stdout.
 ‑n,
‑‑no‑headers 
flag When set, the first row will not be interpreted as headers.
 ‑d,
‑‑delimiter 
string The field delimiter for reading CSV data. Must be a single character. (default: ,)

Source: src/cmd/applydp.rs | Table of Contents | README