RemesPath is a JSON query language inspired by JMESPath, with such useful features as
- indexing in objects with both dot syntax and square bracket syntax
- boolean indexing
- vectorized arithmetic
- many built-in functions, both vectorized and not
- regular expression functions
- recursive search for keys
- SQL-like group_by capabilities (see the `group_by` function in the non-vectorized functions)
- reshaping and summarization of JSON using projections
- editing of JSON
- `@` selects the entirety of an object or array.
- Python-style slices and indices can be used to select parts of an array. For example:
    - `@[1]` selects the second element.
    - `@[-4]` selects the fourth-to-last element.
    - `@[:3]` selects the first, second and third elements.
    - `@[5::2]` selects every other element of the array, starting with the sixth (index 5).
- You can select multiple slices and indices in the same square brackets!
    - `@[1, 5:8, -1]` selects the second, sixth, seventh, eighth, and last elements.
- Dot syntax and square brackets are both valid ways to select keys of an object.
    - `@.bar` and `@[bar]` both select the value corresponding to a single key `bar`.
- You can select multiple keys from a single object by enclosing all of them in square brackets. You cannot follow a dot with square brackets.
    - So `@[foo, bar, baz]` gets the values associated with keys `foo`, `bar`, and `baz`.
- Backticks (`` ` ``) can be used to enquote strings. Thus `` @.`bar` `` and `` @[`bar`] `` are equivalent to `@.bar` and `@[bar]`.
    - The literal backtick character can be rendered by an escaped backtick (`` \` ``) inside a backtick-enclosed string.
- A string may be left unquoted only if it begins with an underscore or an ASCII letter and contains only underscores, ASCII letters, and digits. Any other string must be enclosed in backticks.
    - So `@.a12`, `@._`, and `@.a_1` are all valid, but the key `1_a` must be written as `` @[`1_a`] ``.
- Each time you select an index or key, the next indexer is applied to the corresponding value(s).
    - Consider the array `[[1, 2], [3, 4], [5, 6]]`.
        - `@[:2][1]` selects the second element of each of the first and second elements of that array. So `@[:2][1]` will return `[2, 4]`.
    - Consider the array of objects `[{"a": 1, "b": ["_"]}, {"a": 2, "b": ["?"]}]`.
        - `@[:].b[0]` will return the first value of the array child of key `b` in each object, so `["_", "?"]`.
        - `@[0][b, a]` will return `{"a": 1, "b": ["_"]}`.
        - Note that the order of keys in the index is not preserved because objects are inherently unordered.
- If every indexer in a chain of indexers returns only one index/key, the query will not return an array or object containing the result; it will only return the result.
    - Consider again the array `[[1, 2], [3, 4], [5, 6]]`.
        - Query `@[0][1]` returns `2`.
    - Consider again the array of objects `[{"a": 1, "b": ["_"]}, {"a": 2, "b": ["?"]}]`.
        - Query `@[1].b` returns `["?"]`.
        - Query `@[0].b[0]` returns `"_"`.
- An out-of-bounds index on an array will return an empty array; indexing an object with a key it does not have returns an empty object.
    - Consider the array `[1, 2, 3]`.
        - Queries `@[4]` and `@[-8]` will both return `[]`.
        - NOTE: prior to v5.5.0, `@[-n]` on an array with fewer than `n` elements would cause an error to be thrown instead.
    - Consider the object `{"a": 1, "b": 2}`.
        - Queries `@.z` and `@[x, j]` will both return `{}`.
Suppose you want to match every key in an object except c and d, or every element in an array except the 3rd. RemesPath has always offered ways to do this (often roundabout), but beginning in v5.7, this is much easier with ! (exclamation point) before any of the key-selecting or index-selecting indexers described above:
- To select every key except `c` and `d`, use the query `@![c, d]`.
    - The query `@![c, d]` on JSON `{"a": 1, "b": 2, "c": 3, "d": 4}` returns `{"a": 1, "b": 2}`.
- To select every key that does not match the regex `[a-c]`, use the query `` @!.g`[a-c]` ``.
    - The query `` @!.g`[a-c]` `` on JSON `{"a": 1, "b": 2, "c": 3, "d": 4}` returns `{"d": 4}`.
- To select every value of an array except the 3rd and the last three, use the query `@![2, -3:]`.
    - The query `@![2, -3:]` on JSON `[1, 2, 3, 4, 5, 6, 7, 8]` returns `[1, 2, 4, 5]`.
Negated indexing does not work with recursive key selection. For example @!..a will raise an error.
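The negated indexers above can be mimicked in plain Python. This is only a sketch of the described behavior, not JsonTools code: the helper names `negate_keys`, `negate_regex`, and `negate_indices` are hypothetical, and the regex helper assumes the pattern may match anywhere in the key.

```python
import re

def negate_keys(obj, excluded):
    """Keep every key of obj that is not listed in excluded (like @![c, d])."""
    return {k: v for k, v in obj.items() if k not in excluded}

def negate_regex(obj, pattern):
    """Keep every key of obj that does not match pattern.
    ASSUMPTION: a match anywhere in the key counts as a match."""
    return {k: v for k, v in obj.items() if not re.search(pattern, k)}

def negate_indices(arr, indexers):
    """Keep every element whose position is selected by none of the
    ints or slices in indexers (like @![2, -3:])."""
    positions = range(len(arr))
    excluded = set()
    for ix in indexers:
        if isinstance(ix, slice):
            excluded.update(positions[ix])
        elif -len(arr) <= ix < len(arr):
            # out-of-bounds indices select nothing, so they exclude nothing
            excluded.add(positions[ix])
    return [v for i, v in enumerate(arr) if i not in excluded]

obj = {"a": 1, "b": 2, "c": 3, "d": 4}
print(negate_keys(obj, ["c", "d"]))
print(negate_regex(obj, "[a-c]"))
print(negate_indices([1, 2, 3, 4, 5, 6, 7, 8], [2, slice(-3, None)]))
```

The three calls reproduce the three documented results for the sample inputs.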
- Many operations are vectorized in RemesPath. That is, they are applied to every element in an iterable.
    - Consider the array `[1, 2, 3]`.
        - `2 * @` returns `[2, 4, 6]`.
        - `str(@)` returns `["1", "2", "3"]` because `str` is a vectorized function for converting things to their string representations.
        - `@ + @ / 2` returns `[1.5, 3.0, 4.5]`.
        - `@ > @[1]` returns `[false, false, true]`.
    - Consider the object `{"a": 1, "b": 2, "c": 3}`.
        - `@ ** @` returns `{"a": 1.0, "b": 4.0, "c": 27.0}`.
        - `@ & 1` returns `{"a": 1, "b": 0, "c": 1}`.
        - `@ > @.a` returns `{"a": false, "b": true, "c": true}`.
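To make the idea of vectorization concrete, here is a rough Python model of how a binary operator could be applied across arrays, objects, and scalars. The function `vectorize` is a hypothetical illustration of the examples above, not the actual RemesPath implementation, and it does not attempt to reproduce RemesPath's type checking.

```python
import operator

def vectorize(op, left, right):
    """Apply a binary operator elementwise, broadcasting scalars."""
    if isinstance(left, dict) and isinstance(right, dict):
        if left.keys() != right.keys():
            raise ValueError("objects must have exactly the same keys")
        return {k: op(left[k], right[k]) for k in left}
    if isinstance(left, dict):
        return {k: op(v, right) for k, v in left.items()}
    if isinstance(right, dict):
        return {k: op(left, v) for k, v in right.items()}
    if isinstance(left, list) and isinstance(right, list):
        if len(left) != len(right):
            raise ValueError("arrays must have the same length")
        return [op(a, b) for a, b in zip(left, right)]
    if isinstance(left, list):
        return [op(a, right) for a in left]
    if isinstance(right, list):
        return [op(left, b) for b in right]
    return op(left, right)

arr = [1, 2, 3]
obj = {"a": 1, "b": 2, "c": 3}
print(vectorize(operator.mul, 2, arr))                                  # 2 * @
print(vectorize(operator.add, arr, vectorize(operator.truediv, arr, 2)))  # @ + @ / 2
print(vectorize(operator.gt, arr, arr[1]))                              # @ > @[1]
print(vectorize(operator.and_, obj, 1))                                 # @ & 1
```

Note that Python's `**` on two ints yields an int, whereas the documentation shows RemesPath returning floats for exponentiation.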
All binary operators in RemesPath are vectorized over iterables.
The binary operators in RemesPath are as follows:
| Symbol | Operator | Precedence | Return type |
|---|---|---|---|
| `&` | bitwise/logical AND | 0 | int/bool |
| `\|` | bitwise/logical OR | 0 | int/bool |
| `^` | bitwise/logical XOR | 0 | int/bool |
| `=~` | string matches regex | 1 | bool |
| `==`, `!=`, `<`, `>`, `<=`, `>=` | the usual comparison operators | 1 | bool |
| `+` | addition of numbers, concatenation of strings | 2 | int/float/string |
| `-` | subtraction | 2 | int/float |
| `//` | floor division | 3 | int |
| `%` | modulo | 3 | int/float |
| `*` | multiplication | 3 | int/float/string |
| `/` | division | 3 | float |
| `**` | exponentiation | 5 | float |
All binary operators are left-associative (evaluated left-to-right when precedence is tied), except exponentiation (**), which is right-associative.
In general, binary operators should raise an exception when two objects of unequal type are compared. The only exception is that numbers (including booleans) may be freely compared to other numbers, and ints and floats can freely interoperate.
Starting in v5.4.0, all arithmetic operations can accept a boolean as one or both of the arguments. For example, prior to 5.4.0, true * 3 - (false / 2.5) was a type error, but since then it is valid.
Beginning in v5.1.0, the `*` operator supports multiplication of strings by integers (but not integers by strings). For example, `["a", "b", "c"] * [1,2,3]` will return `["a", "bb", "ccc"]`. Starting in 5.4.0, multiplication of a string by a boolean or a negative integer is valid.
If you find that a binary operator can operate on a number and a non-number without raising an exception, this is a bug in my implementation.
As in normal math, the unary minus operator (e.g., -5) has lower precedence than exponentiation and higher precedence than everything else.
Starting in 5.4.0, the unary + operator has the same precedence as the unary minus operator. Unary + is a no-op on floats and ints, but it converts true and false to 1 and 0 respectively.
The not operator introduced in 5.4.0 (which replaced the older function of the same name) is very similar to the Python operator of the same name, in that not x returns False if x is "truthy" (see below), and True if x is "falsy".
As in JavaScript and Python, RemesPath has the concept of "truthiness" (and its opposite, "falsiness"), where in some cases a non-boolean is treated as a boolean.
The rules are as follows:
- `true` is "truthy", `false` is "falsy".
- `0` and `0.0` are "falsy", and any nonzero numbers are "truthy".
- `""` (the empty string) is "falsy", and any non-empty strings are "truthy".
- `[]` and `{}` (empty arrays and objects) are "falsy", and non-empty arrays and objects are "truthy".
- `null` is "falsy".
- Anything that is not covered by the above cases is "falsy". In practice this should never happen.
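The truthiness rules can be written down as a small Python function operating on parsed JSON values. The name `is_truthy` is hypothetical, and this is only a restatement of the rules listed above.

```python
def is_truthy(x):
    """Return True if the JSON value x is "truthy" under the rules above."""
    if isinstance(x, bool):           # true / false
        return x
    if isinstance(x, (int, float)):   # nonzero numbers are truthy
        return x != 0
    if isinstance(x, str):            # non-empty strings are truthy
        return len(x) > 0
    if isinstance(x, (list, dict)):   # non-empty arrays/objects are truthy
        return len(x) > 0
    return False                      # null (None) and anything else

for value in [True, 0, 0.0, 3, "", "a", [], [0], {}, {"a": None}, None]:
    print(repr(value), is_truthy(value))
```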
A regular expression can be created in a RemesPath expression by prefixing a backtick-enclosed string with the character `g`. So `` g`\\s+` `` is the regular expression `\s+`, i.e., at least one whitespace character.
JsonTools uses .NET regular expressions instead of the Boost library used by the Notepad++ find/replace form.
There are numerous differences between JsonTools regular expressions and Notepad++ find/replace form regexes, but the main differences are as follows:
- Prior to v7.0, `^` and `$` would only match at the beginning and end of the string, respectively (except as noted below for `s_sub` and `s_fa`).
    - Note that `\A` can still be used to match the start of the string, and `\z` can be used to match the end of the string.
- Even after v7.0, `^` and `$` only treat `\n` as the end of the line. That means that `\r` is not matched at all, and `\r\n` is matched, but regexes must use `\r?$` instead of `$` to handle `\r\n` correctly.
- Matching is case-sensitive by default, whereas Notepad++ is case-insensitive by default. The `(?i)` flag can be added at the beginning of any regex to make it case-insensitive.
A JSON literal can be created inside a RemesPath expression by prefixing a backtick-enclosed string with the character `j`. So `` j`[1, 2, 3]` `` creates the JSON array `[1, 2, 3]`.
You can select all the keys of an object that match a regular expression by using a regular expression with the dot or square bracket syntax.
Examples:
- Consider the object `{"foo": 1, "bar": 2, "baz": 3}`.
    - `` @.g`^b` `` returns `{"bar": 2, "baz": 3}`.
    - `` @[g`r$`, foo] `` returns `{"bar": 2, "foo": 1}`.
You can select all elements in an iterable that satisfy a condition by applying a boolean index.
A boolean index can be one of the following:
- A single boolean. If it's `false`, an empty array is returned (prior to v5.5.0, this was always an array, even for one-element boolean indices on objects). If it's `true`, the whole iterable is returned.
    - Consider the array `[1, 2, 3]`.
        - e.g., `@[in(2, @)]` will return `[1, 2, 3]` because `in(2, @)` is `true`.
        - `@[in(4, @)]` will return `[]` because `in(4, @)` is `false`.
    - Starting in 5.5.0, boolean indices with a single boolean can be applied to non-iterables (e.g. `@[@ > 2]` returns `[]` for `1` and `3` for `3`).
        - With the input `[1, 2, 3]`, `@[:][@ < 3]` returns `[1, 2]` in v5.5.0+, but prior to that it would just raise an error.
    - Starting in 5.5.0, a one-boolean boolean index on an object returns the original object, allowing the user to query the result of the boolean index.
        - With the input `[{"a": 1, "b": "a"}, {"a": 2, "b": "b"}]`, `@[@.a < 2].b` returns `["a"]` in v5.5.0+, but prior to that it would just raise an error.
- If the iterable is an array, an array of booleans of the same length as the iterable. An array with all the values for which the boolean index was `true` will be returned.
    - Consider the array `[1, 2, 3]`.
        - `@[@ > @[0]]` will return `[2, 3]`.
        - `@[@ ** 2 < 1]` will return `[]`.
        - `@[@[:2] > 0]` will throw a VectorizedArithmeticException, because the boolean index has length 2 and the array has length 3.
- If the iterable is an object, an object of booleans with exactly the same keys as the iterable. An object will be returned with all the pairs `k: v` for which the boolean index's value corresponding to `k` was `true`.
    - Consider the object `{"a": 1, "b": 2, "c": 3}`.
        - `@[@ > @.a]` returns `{"b": 2, "c": 3}`.
        - `@[@[a,b] > 1]` will throw a VectorizedArithmeticException, because the boolean index is `{"a": false, "b": true}`, which does not have exactly the same keys as the object.
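The three kinds of boolean index can be sketched in Python as follows. `boolean_index` is a hypothetical helper that mirrors the rules above for arrays and objects; it raises a plain `ValueError` where the documentation mentions a VectorizedArithmeticException, and it does not model the v5.5.0 behavior on non-iterables.

```python
def boolean_index(itbl, mask):
    """Filter an array or object by a boolean, or by a same-shaped
    array/object of booleans."""
    if isinstance(mask, bool):
        # a single boolean keeps everything or nothing
        return itbl if mask else []
    if isinstance(itbl, list):
        if not isinstance(mask, list) or len(mask) != len(itbl):
            raise ValueError("boolean index must have the same length as the array")
        return [v for v, keep in zip(itbl, mask) if keep]
    if isinstance(itbl, dict):
        if not isinstance(mask, dict) or mask.keys() != itbl.keys():
            raise ValueError("boolean index must have the same keys as the object")
        return {k: v for k, v in itbl.items() if mask[k]}
    raise TypeError("expected an array or object")

arr = [1, 2, 3]
obj = {"a": 1, "b": 2, "c": 3}
print(boolean_index(arr, 2 in arr))                                # @[in(2, @)]
print(boolean_index(arr, [v > arr[0] for v in arr]))               # @[@ > @[0]]
print(boolean_index(obj, {k: v > obj["a"] for k, v in obj.items()}))  # @[@ > @.a]
```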
Grouping parentheses work exactly the way you expect them to with arithmetic expressions.
- `2 ** 3 / (4 - 5)` evaluates to `8 / -1` and thus returns `-8.0`.
Grouping parentheses can also be used to make the query parser treat a single expression as atomic.
- Consider the array `[{"a": [1, 2, 3]}]`.
- The query `@[:].a[@ > @[0]]` returns `[[2, 3]]`. In pseudo-code, this would be:

      make an array arr
      for each object obj in this
          make a subarray subarr
          for each element in obj[a]
              if element > obj[a][0], add element to subarr
          add subarr to arr
      return arr

- However, we can't select the first element of each array by just making the query `@[:].a[@ > @[0]][0]`. This will throw an error.
- That's because the query has already descended to the level of individual elements, and we can't index on the individual elements.
- Instead, we enclose the original query in grouping parentheses: `(@[:].a[@ > @[0]])`.
- Now we can select the first element of each array as follows: `(@[:].a[@ > @[0]])[:][0]`.
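For readers who prefer runnable code to pseudo-code, the same two steps can be written in Python. The function names `filter_each` and `first_of_each` are hypothetical; the first corresponds to the parenthesized query and the second to the trailing `[:][0]`.

```python
def filter_each(objs):
    """For each object, keep the elements of obj["a"] greater than its first element."""
    arr = []
    for obj in objs:
        values = obj["a"]
        subarr = [element for element in values if element > values[0]]
        arr.append(subarr)
    return arr

def first_of_each(arrays):
    """Select the first element of each sub-array."""
    return [subarr[0] for subarr in arrays]

data = [{"a": [1, 2, 3]}]
filtered = filter_each(data)      # the grouped query, treated as one value
print(filtered)
print(first_of_each(filtered))    # indexing applied to the grouped result
```

Treating the filtered result as a single value before indexing it again is exactly what the grouping parentheses achieve.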
- Suppose you have really deep JSON, but all you really want is a certain key in an object.
- For example, consider the JSON `[[[{"a": 1, "b": 2}], [{"a": 3, "b": 4}]]]`.
- You can recursively search for the key "a" in this JSON with double-dot syntax `@..a`. This will return `[1, 3]`.
- You can also recursively search for the keys "b" and "a" with the query `@..[b, a]`. This will return `[2, 1, 4, 3]`.
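A recursive key search of this kind can be approximated with a short depth-first traversal in Python. `recursive_search` is a hypothetical name, and the sketch assumes the search keeps descending into every child, including values that were themselves matched; the documentation above does not specify that detail.

```python
def recursive_search(json_value, keys):
    """Collect the values of the given keys at any depth, depth-first."""
    results = []
    if isinstance(json_value, dict):
        # keys are checked in the order they were requested
        for key in keys:
            if key in json_value:
                results.append(json_value[key])
        # ASSUMPTION: keep descending into all children of the object
        for child in json_value.values():
            results.extend(recursive_search(child, keys))
    elif isinstance(json_value, list):
        for child in json_value:
            results.extend(recursive_search(child, keys))
    return results

data = [[[{"a": 1, "b": 2}], [{"a": 3, "b": 4}]]]
print(recursive_search(data, ["a"]))
print(recursive_search(data, ["b", "a"]))
```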
Added in v3.7.0
`@..*` will return a single array containing all the scalar descendants of the current JSON, no matter their depth.
It will not return indices or parents, only the child nodes.
For example, the `@..*` query on JSON `{"a": [true, 2, [3]], "b": {"c": ["d", "e"], "f": null}}` will return `[true, 2, 3, "d", "e", null]`.

RemesPath supports a variety of functions, some of which are vectorized and some of which are not.
We'll present the non-vectorized functions separately from the vectorized ones to avoid confusion.
Each subset will be organized in alphabetical order.
add_items(obj: object, k1: string, v1: anything, ...: string, anything (alternating)) -> object
Takes 3+ arguments. As shown, every even-numbered argument must be a string (new keys).
Returns a new object with the key-value pair(s) k_i, v_i added.
Does not mutate the original object.
EXAMPLES
- add_items({}, "a", 1, "b", 2, "c", 3, "d", 4) -> {"a": 1, "b": 2, "c": 3, "d": 4}
all(x: array[bool]) -> bool
Returns true if all of the values in x (which must contain all booleans) are true, else false.
and(x: anything, y: anything, ...: anything) -> bool
Returns true if and only if all of the arguments are "truthy".
Unlike the & binary operator above, this function uses conditional execution.
This means that for example, if the input is "abc", and(is_num(@), @ < 3) will return false, because @ < 3 will only be evaluated if is_num(@) evaluates to true.
any(x: array[bool]) -> bool
Returns true if any of the values in x (which must contain all booleans) are true, else false.
append(x: array, ...: anything) -> array
Takes an array and any number of things (any JSON) and returns a new array with the other things added to the end of the first array.
Does not mutate the original array.
The other things are added in the order that they were passed as arguments.
EXAMPLES
append([], 1, false, "a", [4]) -> [1, false, "a", [4]]
at(x: array | object, inds: array | int | str) -> anything
If x is an array, inds must be an integer or an array of integers. If x is an object, inds must be a string or an array of strings.
- If `inds` is an array: returns an array containing `x[k]` for each key/index `k` in `inds`.
- Otherwise: returns `x[inds]`.
EXAMPLES
- at([1, 2, 3], 0) -> 1
- at(["foo", "bar", "baz"], [-1, 0]) -> ["baz", "foo"]
- at({"foo": 1, "bar": 2}, "bar") -> 2
- at({"foo": 1, "bar": 2}, ["bar", "foo"]) -> [2, 1]
avg(x: array) -> float
Finds the arithmetic mean of an array of numbers. mean is an alias for this function.
concat(x: array | object, ...: array | object) -> array | object
Takes 2+ arguments, either all arrays or all objects.
If all args are arrays, returns an array that contains all elements of every array passed in, in the order they were passed.
If all args are objects, returns an object that contains all key-value pairs in all the objects passed in.
If multiple objects have the same keys, objects later in the arguments take precedence.
EXAMPLES
- `concat([1, 2], [3, 4], [5])` -> `[1, 2, 3, 4, 5]`
- `concat({"a": 1, "b": 2}, {"c": 3}, {"a": 4})` -> `{"b": 2, "c": 3, "a": 4}`
- `concat([1, 2], {"a": 2})` raises an exception because you can't concatenate arrays with objects.
- `concat(1, [1, 2])` raises an exception because you can't concatenate anything with non-iterables.
csv_regex(nColumns: int, delim: string=",", newline: string="\r\n", quote_char: string="\"")
Returns the regex that s_csv uses to match a single row of a CSV file (formatted according to RFC 4180) with delimiter delim, nColumns columns, quote character quote_char, and newline newline.
dict(x: array) -> object
If x is an array of 2-element subarrays where the first element in each subarray is a string, return an object where each subarray is converted to a key-value pair.
Example:
- `dict([["a", 1], ["b", 2]])` returns `{"a": 1, "b": 2}`.
enumerate(x: array) -> array
For each index in the array, returns a subarray containing that index and the element at that index. Added in v5.2
Example:
- `enumerate(["a", "b", "c"])` returns `[[0, "a"], [1, "b"], [2, "c"]]`
flatten(x: array, depth: int = 1) -> array
Recursively searches in x down to a depth of depth, pulling each element of every sub-array at that depth into the final array.
It's easier to understand with some examples:
- `flatten([[1, 2], [3, 4]])` returns `[1, 2, 3, 4]`.
- `flatten([1, 2, 3])` returns `[1, 2, 3]`.
- `flatten([1, [2, [3]]])` returns `[1, 2, [3]]`.
- `flatten([1, [2, [3, [4]]]], 3)` returns `[1, 2, 3, 4]`.
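The depth-limited flattening shown in these examples can be reproduced with a short recursive Python function. This is a sketch written to match the four examples above, not the JsonTools source.

```python
def flatten(x, depth=1):
    """Pull elements of nested sub-arrays up into one array, down to `depth` levels."""
    if depth < 1:
        return list(x)
    out = []
    for elt in x:
        if isinstance(elt, list):
            # unwrap one level, then keep flattening with one less level to go
            out.extend(flatten(elt, depth - 1))
        else:
            out.append(elt)
    return out

print(flatten([[1, 2], [3, 4]]))
print(flatten([1, [2, [3]]]))
print(flatten([1, [2, [3, [4]]]], 3))
```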
group_by(x: array, k: int | str | array) -> object
- If `k` is an array (and the JsonTools version is v5.7 or greater):
    - Returns a new object where each value `v` associated with `k[n]` is mapped to all the children `child` of `x` where `child[k[n]] == v`, recursively grouped by `[k[n + 1], k[n + 2], ...]`.
- If `x` is an array of arrays:
    - If `k` is not an int, throw an error.
    - Return an object where key `str(v)` has an array of sub-arrays `subarr` such that `subarr[k] == v` is `true`.
    - Note that `subarr[k]` might not be a string in these sub-arrays. However, keys in JSON objects must be strings, so the key is the string representation of `subarr[k]` rather than `subarr[k]` itself.
    - Prior to v5.5.0, Python-style negative indices were not allowed for the `k` argument.
- If `x` is an array of objects:
    - If `k` is not a string, throw an error.
    - Return an object where key `str(v)` has an array of sub-objects `subobj` such that `subobj[k] == v` is `true`.
    - Note that `subobj[k]` might not be a string in these sub-objects.
- If `x` is an array of anything else, or it has a mixture of arrays and objects, throw an error.
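The grouping rules above, including recursive grouping by an array of keys, can be modeled in Python roughly as follows. The helper `to_key` and the use of `json.dumps` for the string representation of non-string values are assumptions made for this sketch; the real function may format keys differently.

```python
import json

def to_key(value):
    """String representation used as an object key (ASSUMPTION: JSON text
    for non-strings, the string itself for strings)."""
    return value if isinstance(value, str) else json.dumps(value)

def group_by(x, k):
    if isinstance(k, list):
        # group by the first key, then recursively by the remaining keys
        groups = group_by(x, k[0])
        rest = k[1:]
        if not rest:
            return groups
        return {key: group_by(children, rest) for key, children in groups.items()}
    groups = {}
    for child in x:
        is_array_case = isinstance(child, list) and isinstance(k, int)
        is_object_case = isinstance(child, dict) and isinstance(k, str)
        if not (is_array_case or is_object_case):
            raise TypeError("k must be an int for sub-arrays or a string for sub-objects")
        groups.setdefault(to_key(child[k]), []).append(child)
    return groups

rows = [[1, "x", -0.5], [1, "y", 0.0], [2, "x", 0.5]]
print(group_by(rows, [1, 0]))
```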
Examples:
- `group_by([{"foo": 1, "bar": "a"}, {"foo": 2, "bar": "b"}, {"foo": 3, "bar": "a"}], "bar")` returns `{"a": [{"foo": 1, "bar": "a"}, {"foo": 3, "bar": "a"}], "b": [{"foo": 2, "bar": "b"}]}`
- `group_by([[1, "a"], [2, "b"], [2, "c"], [3, "d"]], 0)` returns `{"1": [[1, "a"]], "2": [[2, "b"], [2, "c"]], "3": [[3, "d"]]}`
- `group_by([{"a": 1, "b": "x", "c": -0.5}, {"a": 1, "b": "y", "c": 0.0}, {"a": 2, "b": "x", "c": 0.5}], ["a", "b"])` returns `{"1": {"x": [{"a": 1, "b": "x", "c": -0.5}], "y": [{"a": 1, "b": "y", "c": 0.0}]}, "2": {"x": [{"a": 2, "b": "x", "c": 0.5}]}}`
- `group_by([[1, "x", -0.5], [1, "y", 0.0], [2, "x", 0.5]], [1, 0])` returns `{"x": {"1": [[1, "x", -0.5]], "2": [[2, "x", 0.5]]}, "y": {"1": [[1, "y", 0.0]]}}`
- `group_by([[1, 2, 2, 0.0], [1, 2, 3, -1.0], [1, 3, 3, -2.0], [1, 3, 4, -3.0], [2, 2, 2, -4.0]], [0, 1, 2])` returns `{"1": {"2": {"2": [[1, 2, 2, 0.0]], "3": [[1, 2, 3, -1.0]]}, "3": {"3": [[1, 3, 3, -2.0]], "4": [[1, 3, 4, -3.0]]}}, "2": {"2": {"2": [[2, 2, 2, -4.0]]}}}`

in(elt: anything, itbl: object | array) -> bool
- If `itbl` is an array:
    - If `elt` has a type that is not comparable with an element of `itbl`, throws an error.
    - Returns `true` if `elt` is equal to any element.
- If `itbl` is an object:
    - If `elt` is not a string, throws an error.
    - If `elt` is one of the keys in `itbl`, returns `true`.
index(x: array, elt: anything, reverse: bool = false) -> int
- If `reverse` is `false` (default): Returns the index of the first element in `x` that is equal to `elt`.
- If `reverse` is `true`: Returns the index of the last element in `x` that is equal to `elt`.
- If no elements in `x` are equal to `elt`, throws an error.
items(x: object) -> array
Returns an array of 2-item subarrays (the key-value pairs of x).
Because objects are not inherently ordered, you may need to sort the key-value pairs by their key or value to get the same result every time.
iterable(x: anything) -> bool
Returns whether x is an iterable (object or array). Added in v5.2
Because this function is not vectorized, use this instead of is_expr if you want a single bool returned for an entire iterable.
keys(x: object) -> array
Returns an array of the keys in x.
len(x: object | array) -> int
Returns the number of key-value pairs in x (if an object) or the number of elements in x (if an array).
max(x: array) -> float
Returns a floating-point number equal to the maximum value in an array.
max_by(x: array, k: int | str | function) -> anything
- If `k` is a function:
    - Return the child `maxchild` in `x` such that `k(maxchild) >= k(child2)` for every other child `child2` in `x`.
- If `x` is an array of arrays:
    - If `k` is not an int or if `k >= len(x) or k < -len(x)`, throw an error.
    - Return the subarray `maxarr` such that `maxarr[k] >= subarr[k]` for all other sub-arrays `subarr` in `x`.
    - NOTE: prior to v5.5.0, Python-style negative indices were not allowed at all.
- If `x` is an array of objects:
    - If `k` is not a string, throw an error.
    - Return the subobject `maxobj` such that `maxobj[k] >= subobj[k]` for all other sub-objects `subobj` in `x`.
Examples:
- With `[[1, 2], [2, 0], [3, -1]]` as input, `max_by(@, 0)` returns `[3, -1]` because that is the subarray with the largest first element.
- With `[{"a": 1, "b": 3}, {"a": 2, "b": 2}, {"a": 3, "b": 1}]` as input, `max_by(@, b)` returns `{"a": 1, "b": 3}` because that is the subobject with the largest value associated with key `b`.
- With `["a", "bbb", "cc"]` as input, `max_by(@, s_len(@))` returns `"bbb"`, because that is the child with the greatest length (recall that `s_len` returns the length of a string).
min(x: array) -> float
Returns a floating-point number equal to the minimum value in an array.
min_by(x: array, k: int | str) -> array | object
See max_by, but minimizing instead of maximizing.
or(x: anything, y: anything, ...: anything) -> bool
Returns true if and only if any of the arguments are "truthy".
Unlike the | binary operator above, this function uses conditional execution.
This means that for example, if the input is 3, or(is_num(@), s_len(@) < 3) will return true, because s_len(@) < 3 will only be evaluated if is_num(@) evaluates to false.
pivot(x: array[object | array], by: str | int, val_col: str | int, ...: str | int) -> object[str, array]
There must be at least 3 arguments to this function.
The first argument should be an array whose sub-iterables have a repeating cycle of values for one column (by), and the only other column that varies within a given cycle is the values column (val_col).
The result is an object where each distinct value of the by column is mapped to an array of the corresponding values in the val_col column. Additionally, you may include any number of other columns.
Examples:
- With
[
["foo", 2, 3, true],
["bar", 3, 3, true],
["foo", 4, 4, false],
["bar", 5, 4, false]
]
as input, `pivot(@, 0, 1, 2, 3)` (use 0 as pivot, 1 as values, 2 and 3 as other columns) returns
{
"foo": [2, 4],
"bar": [3, 5],
"2": [3, 4],
"3": [true, false]
}
- With
[
{"a": "foo", "b": 2, "c": 3},
{"a": "bar", "b": 3, "c": 3},
{"a": "foo", "b": 4, "c": 4},
{"a": "bar", "b": 5, "c": 4}
]
as input, `pivot(@, a, b, c)` returns
{
"foo": [2, 4],
"bar": [3, 5],
"c": [3, 4]
}

quantile(x: array, q: float) -> float
x must contain only numbers.
q must be between 0 and 1, exclusive.
Returns the q^th quantile of x, as a floating-point number.
So quantile(x, 0.5) returns the median, quantile(x, 0.75) returns the 75th percentile, and so on.
Uses linear interpolation if the index found is not an integer.
For example, suppose that the 60th percentile is at index 6.6, and elements 6 and 7 are 8 and 10.
Then the returned value is 0.6*10 + 0.4*8, or 9.2.
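The interpolation step can be illustrated with a small Python function. How the fractional index is derived from `q` is not stated above, so this sketch assumes `index = q * (len(x) - 1)` on the sorted values; only the linear-interpolation arithmetic is taken from the text.

```python
import math

def quantile(x, q):
    """q-th quantile of a list of numbers, with linear interpolation."""
    if not 0 < q < 1:
        raise ValueError("q must be between 0 and 1, exclusive")
    data = sorted(x)
    index = q * (len(data) - 1)   # ASSUMPTION: index convention not given in the docs
    lower = math.floor(index)
    frac = index - lower
    if frac == 0:
        return float(data[lower])
    # weight the upper neighbor by the fractional part of the index
    return data[lower] * (1 - frac) + data[lower + 1] * frac

# 12 values, so q = 0.6 lands at index 6.6, between 8 (index 6) and 10 (index 7)
values = [0, 1, 2, 3, 4, 5, 8, 10, 11, 12, 13, 14]
print(quantile(values, 0.6))
print(quantile([1, 2, 3, 4], 0.5))
```

With the sample list, the result is approximately 9.2, matching the `0.6*10 + 0.4*8` calculation in the text (up to floating-point rounding).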
rand() -> float
Random number between 0 (inclusive) and 1 (exclusive). Added in v5.2
randint(start: int, end: int=null) -> int
Added in v6.0
Returns a random integer greater than or equal to start and less than end.
If end is not specified, instead return a random integer greater than or equal to 0 and less than start.
range(start: int, end: int = null, step: int = 1) -> array[int]
Returns an array of integers.
- If `end` and `step` are not supplied, return all the integers from 0 to `start`, excluding `start`.
    - So `range(3)` returns `[0, 1, 2]`.
    - `range(-1)` returns `[]` because -1 is less than 0.
- If `step` is not supplied, return all the integers from `start` to `end`, excluding `end`.
    - `range(3, 5)` returns `[3, 4]`.
    - `range(3, 1)` returns `[]` because 1 is less than 3.
- If all arguments are supplied, return all the integers from `start` to `end`, incrementing by `step` each time.
    - `range(3, 1, -1)` returns `[3, 2]`.
    - `range(0, 6, 3)` returns `[0, 3]`.
s_cat(x: anything, ...: anything) -> string
Added in v6.1
Concatenates the string representation (or the value, for a string) of every argument. Arrays and objects are incorporated using the Python-style compact representation, with a single space after item-separating commas and key-value separating colons.
Example:
- With input `[[1, 2], 3, {"a": 4}]`, `` s_cat(@[0], foo, ` bar `, @[1] * 3, @[2]) `` will return `"[1, 2]foo bar 9{\"a\": 4}"`
s_join(sep: string, x: array) -> string
Every element of x must be a string.
Returns x string-joined with sep (i.e., returns a string that begins with x[0] and has sep between x[i - 1] and x[i] for 1 <= i < len(x))
set(x: array) -> object
Added in v6.0
Returns an object mapping each unique string representation of an element in x to null. This may be preferable to unique because of the O(1) average-case lookup performance in an object.
Example: set(j`["a", "b", "a", 1, 2.0, null, 1, null]`) returns {"a": null, "b": null, "1": null, "2.0": null, "null": null}
One issue with this function that may make the unique function preferable: two different elements may have the same string representation for the purposes of this function (e.g., null and "null", 2.0 and "2.0")
sort_by(x: array, k: string | int | function, descending: bool = false)
x must be:
- an array of arrays (if `k` is an integer)
- an array of objects (if `k` is a string)
- any array (if `k` is a function)

Returns:
- a new array of subarrays/subobjects `subitbl` such that `subitbl[k]` is sorted (if `k` is an integer or string)
- a new array of children `child` such that `k(child)` is sorted (if `k` is a function)

Analogous to SQL ORDER BY.
By default, these sub-iterables are sorted ascending. If descending is true, they will instead be sorted descending.
Prior to v5.5.0, Python-style negative indices were not allowed for the k argument.
Examples:
- With `[[1, 2], [2, 0], [3, -1]]` as input, `sort_by(@, 1)` returns `[[3,-1],[2,0],[1,2]]` because it sorts ascending by the second element.
- With `[{"a": 1, "b": 3}, {"a": 2, "b": 2}, {"a": 3, "b": 1}]` as input, `sort_by(@, a, true)` returns `[{"a":3,"b":1},{"a":2,"b":2},{"a":1,"b":3}]` because it sorts descending by key `a`.
- With `["a", "bbb", "cc"]` as input, `sort_by(@, s_len(@))` returns `["a", "cc", "bbb"]`, because the children are sorted ascending by string length.
sorted(x: array, descending: bool = false)
x must be an array of all strings or all numbers. Either is fine so long as all elements are comparable.
Returns a new array where the elements are sorted ascending. If descending is true, they're instead sorted descending.
See the general notes on string sorting for notes on how strings are sorted.
sum(x: array) -> float
Returns the sum of the elements in x.
x must contain only numbers. Booleans are fine.
stringify(elt: anything, print_style: string=m, sort_keys: bool=true, indent: int | str=4) -> str
Returns the string representation (compressed, minimal whitespace, sort keys) of x.
When called with one argument, stringify differs from str in two regards:
- `stringify` is not vectorized.
- If `x` is a string, `str` returns a copy of `x`, but `stringify` returns the string representation of `x`.
    - For example, `str(abc)` returns `"abc"`, but `stringify(abc)` returns `"\"abc\""`.

Added in v5.5.0.
The optional arguments did not exist before v7.0. Since that version, they work as follows:
If the third argument (sort_keys, default true) is false, object keys are not sorted.
If the fourth argument (indent, default 4) is an integer, the indent for pretty-print options is that integer. If it is `\t` (the tab character), tabs are used for indentation.
- If `print_style` (the second argument) is `m` (the default), return the minimal-whitespace compact representation.
- If `print_style` is `c`, return the Python-style compact representation (one space after ',' or ':').
- If `print_style` is `g`, return the Google-style pretty-printed representation.
- If `print_style` is `w`, return the Whitesmith-style pretty-printed representation.
- If `print_style` is `p`, return the PPrint-style pretty-printed representation.
to_csv(x: array, delimiter: string=",", newline: string="\r\n", quote_char: string="\"") -> string
Added in v6.0
Returns x formatted as a CSV (RFC 4180 rules as normal), according to the following rules:
- if x is an array of non-iterables, each child is converted to a string on a separate line
- if x is an array of arrays, each subarray is converted to a row
- if x is an array of objects, the keys of the first subobject are converted to a header row, and the values of every subobject become their own row.
See json-to-csv.md for information on how JSON values are represented in CSVs.
to_records(x: iterable, [strategy: str]) -> array[object]
Converts some iterable to an array of objects, using one of the strategies used to make a CSV in the JSON-to-CSV form. The resulting JSON is just the JSON equivalent of the CSV that would be generated with x as input and the same strategy (each object has the same column types and same column names as the corresponding row of the CSV).
The strategy argument must be one of the following strings:
- 'd': default
- 'r': full recursive
- 'n': no recursion
- 's': stringify iterables
type(x: anything) -> str
Returns the JSON Schema type name for x. Added in v5.5.0.
unique(x: array, sorted: bool = false)
Returns an array of all the unique elements in x.
If sorted is true, sorts the array ascending. This will raise an error if not all of x's elements are comparable.
value_counts(x: array, sort_by_count: bool = false) -> array
Returns an array of two-element subarrays [k: anything, count: int] where count is the number of elements in x equal to k.
The order of the sub-arrays is unreliable.
As of 5.3.0, there is a second optional argument (default false). If true, the subarrays are sorted by count descending.
Example:
- `value_counts(["a", "b", "c", "c", "c", "b"], true)` returns `[["c", 3], ["b", 2], ["a", 1]]`
zip(x1: array, ...: array) -> array
There must be at least two arguments to this function, all arrays.
Returns a new array in which each i^th element is an array containing the i^th elements of each argument, in the order in which they were passed.
All the argument arrays must have the same length.
In other words, it's like the Python zip function, except it returns an array, not a lazy iterator.
Example:
- `zip(["a", "b", "c"], [1, 2, 3])` returns `[["a", 1], ["b", 2], ["c", 3]]`.
All of these functions are vectorized across their first argument, meaning that when one of these functions is called on an array or object, any functions in the second and subsequent arguments reference the entire array/object, but the first argument is set to one element at a time.
For example, consider the vectorized function s_mul(s: string, n: int) -> string. This function concatenates n instances of string s.
- With array input `["a", "cd", "b"]`, `s_mul(@, len(@))` returns `["aaa", "cdcdcd", "bbb"]`.
    - The first argument references each element of the array separately.
    - The second argument `len(@)` references the entire array, and is thus `3`, because the array has three elements.
    - Because the first element of the first argument is `"a"`, the first element of the output is `s_mul(a, 3)`, or `"aaa"`.
    - Because the second element of the first argument is `"cd"`, the second element of the output is `s_mul(cd, 3)`, or `"cdcdcd"`.
- With object input `{"foo": "a", "bar": "cd"}`, `s_mul(@, len(@))` returns `{"foo": "aa", "bar": "cdcd"}` (NOTE: this example will fail on JsonTools earlier than v7.0).
    - The first argument references each element of the object separately.
    - The second argument `len(@)` references the entire object, and is thus `2`, because the object has two children.
    - Because the child of key `foo` of the first argument is `"a"`, the child of key `foo` of the output is `s_mul(a, 2)`, or `"aa"`.
    - Because the child of key `bar` of the first argument is `"cd"`, the child of key `bar` of the output is `s_mul(cd, 2)`, or `"cdcd"`.
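The "vectorized across the first argument" rule can be imitated in Python with a small wrapper. `vectorize_first` is a hypothetical helper and `s_mul` here is a stand-in written for the sketch; the point is that the remaining arguments are evaluated once against the whole input, while the first argument is visited element by element.

```python
def s_mul(s, n):
    """Concatenate n copies of the string s."""
    return s * n

def vectorize_first(func, itbl, *other_args):
    """Call func once per element of itbl, passing other_args unchanged."""
    if isinstance(itbl, list):
        return [func(elt, *other_args) for elt in itbl]
    if isinstance(itbl, dict):
        return {key: func(val, *other_args) for key, val in itbl.items()}
    return func(itbl, *other_args)

arr = ["a", "cd", "b"]
obj = {"foo": "a", "bar": "cd"}
# len(...) is computed on the entire array/object, like len(@) in the query
print(vectorize_first(s_mul, arr, len(arr)))
print(vectorize_first(s_mul, obj, len(obj)))
```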
All the vectorized string functions have names beginning with s_.
abs(x: number) -> number
Returns the absolute value of x.
bool(x: anything) -> bool
True if x is "truthy".
float(x: number | string) -> number
- If x is a boolean, integer, or float: Returns a 64-bit floating-point number equal to x.
- If x is a decimal string representation of a floating-point number: returns the 64-bit floating point number that is represented.
ifelse(cond: anything, if_true: anything, if_false: anything) -> anything
Returns if_true if cond is "truthy", otherwise returns if_false.
Note:
- Beginning in v7.0, this function's execution is conditional, meaning that only the chosen branch is executed.
    - For example, consider the input ["foo", 1, "a", null].
    - Prior to v7.0, the query @[:]->ifelse(is_str(@), s_len(@), -1) would raise an error on that input, because it would call s_len on non-strings (illegal arguments).
    - As of v7.0, the expected [3, -1, 1, -1] is returned, because the s_len function is only called when is_str returns true (i.e., on strings).
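The difference between eager and conditional execution can be illustrated in Python, where an ordinary function call evaluates every argument up front but a conditional expression evaluates only the chosen branch. This is an analogy only; eager_ifelse is a hypothetical helper, not part of RemesPath.

```python
def eager_ifelse(cond, if_true, if_false):
    # Both branches were already evaluated by the caller before this runs.
    return if_true if cond else if_false


data = ["foo", 1, "a", None]

# Conditional evaluation: len() only runs on strings.
lazy_result = [len(x) if isinstance(x, str) else -1 for x in data]

# Eager evaluation: len(1) is attempted before the branch is chosen.
try:
    eager_result = [eager_ifelse(isinstance(x, str), len(x), -1) for x in data]
except TypeError as err:
    eager_result = f"error: {err}"

print(lazy_result)
print(eager_result)
```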
int(x: number | string) -> int
- If x is a boolean or integer: returns a 64-bit integer equal to x.
- If x is a float: returns the closest 64-bit integer to x.
    - Note that this is NOT the same as the Python int function, because if x is halfway between two integers, the nearest even integer is returned.
- If x is a decimal string representation of an integer: returns the integer that is represented. This means hex numbers can't be parsed by this function, and you should use num below instead for that.
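To make the rounding note concrete: Python's built-in round also rounds halves to the nearest even integer, while Python's int truncates toward zero. The sketch below only contrasts those two Python behaviors; that RemesPath's int matches round on every input is an assumption based on the description above.

```python
values = [0.5, 1.5, 2.5, -0.5, 2.7]

# Round-half-to-even, the rule described for RemesPath's int.
half_to_even = [round(v) for v in values]

# Truncation toward zero, which is what Python's int does.
truncated = [int(v) for v in values]

print(half_to_even)
print(truncated)
```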
is_expr(x: anything) -> bool
Returns true iff x is an array or object.
is_num(x: anything) -> bool
Returns true iff x is a number.
is_str(x: anything) -> bool
Returns true iff x is a string.
isna(x: number) -> bool
Returns true iff x is the floating-point Not-A-Number (represented in some JSON by NaN).
Recall that NaN is NOT in the original JSON specification.
isnull(x: anything) -> bool
Returns true iff x is null, else false.
log(x: number, base: number = e) -> number
Returns the log base base of x. If base is not specified, returns the natural logarithm (base e) of x.
log2(x: number) -> number
Returns the log base 2 of x.
num(x: anything) -> float
Added in v6.0
As float above, but also handles hex integers preceded by 0x (and optional + or - sign).
This is the only function that is guaranteed to be able to parse anything captured by the (NUMBER) capture group in the s_fa and s_sub functions.
EXAMPLES:
- With ["+0xff", "-0xa", "10", "-5e3", 1, true, false, -3e4, "0xbC"] as input, returns [255.0, -10.0, 10.0, -5000.0, 1.0, 1.0, 0.0, -30000.0, 188.0]
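A minimal Python sketch of the conversion rules described for num, assuming only what is stated above (numbers and booleans become floats, decimal strings are parsed as floats, and hex integers start with 0x after an optional sign). It is an illustration, not the JsonTools parser.

```python
def num(x):
    # Booleans, ints, and floats are simply converted to float.
    if isinstance(x, (bool, int, float)):
        return float(x)
    body = x.strip()
    sign = 1.0
    # Peel off an optional sign so the 0x prefix can be detected.
    if body and body[0] in "+-":
        sign = -1.0 if body[0] == "-" else 1.0
        body = body[1:]
    if body.startswith("0x"):
        return sign * int(body[2:], 16)
    return sign * float(body)


inputs = ["+0xff", "-0xa", "10", "-5e3", 1, True, False, -3e4, "0xbC"]
outputs = [num(v) for v in inputs]
print(outputs)
```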
not(x: bool) -> bool
Logical NOT. Replaced with a unary operator of the same name in 5.4.0.
parse(x: str) -> anything
Attempts to parse x as JSON according to the most permissive parser settings. Added in v5.5.0.
- If x is not a string or there is a fatal error while parsing, returns {"error": "the exception raised as a string"}
- If x is parsed successfully, returns {"result": x parsed as JSON}
EXAMPLE: Consider the input
[
"[1,2,3]",
"u"
]
The query parse(@) will return
[
{"result": [1, 2, 3]},
{"error": "No valid literal possible at position 0 (char 'u')"}
]
round(x: number, sigfigs: int = 0) -> float | int
x must be an integer or a floating-point number, not a boolean.
- If sigfigs is 0: Returns the closest 64-bit integer to x.
- If sigfigs > 0: Returns the closest 64-bit floating-point number to x rounded to sigfigs decimal places.
s_count(x: string, sub: regex | string) -> int
Returns the number of times substring/regex sub occurs in x.
s_csv(csvText: string, nColumns: int, delimiter: string=",", newline: string="\r\n", quote: string="\"", header_handling: string="n", ...: int) -> array[(array[string | number] | object[string | number])]
Arguments:
- csvText (1st arg): the text of a CSV file encoded as a JSON string
- nColumns (2nd arg): the number of columns
- delimiter (3rd arg, default ,): the column separator
- newline (4th arg, default \r\n): the newline. Must be one of (``)
- quote (5th arg, default "): the character used to wrap columns that contain newline, quote, or delimiter.
- header_handling (6th arg, default n): how the header row is treated. Must be one of n, d, or h. Each of these options is explained in the list below.
    - n: skip header row (this is the default). This would parse the CSV file "foo,bar\n1,2" as [["1", "2"]]
    - h: include header row. This would parse the CSV file "foo,bar\n1,2" as [["foo", "bar"], ["1", "2"]]
    - d: return an array of objects, using the header row as keys. This would parse the CSV file "foo,bar\n1,2" as [{"foo": "1", "bar": "2"}]
- ... (7th and subsequent args): the numbers of columns to attempt to parse as numbers. Any valid number within the JSON5 specification can be parsed. You can pass a negative number here to get the nth-to-last column rather than the nth column.
Return value:
- if nColumns is 1, returns an array of strings
- otherwise, returns an array of arrays of strings, where each sub-array is a row that has exactly nColumns columns.
Notes:
- Any row that does not have exactly nColumns columns will be ignored completely.
- See RFC 4180 for the accepted format of CSV files. A brief synopsis is below.
- Any column that starts and ends with a quote character is assumed to be a quoted string. In a quoted string, anything is fine, but a literal quote character in a quoted column must be escaped with itself.
    - For example, """quoted"",string,in,quoted column" is a valid column in a file with , delimiter and " quote character.
    - On the other hand, " " " is not a valid column if " is the quote character because it contains an unescaped " in a quoted column.
    - Finally, a,b would be treated as two columns in a CSV file with " quote character, but "a,b" is a single column because a comma is not treated as a column separator in a quoted column.
- Columns containing literal quote characters or the newline characters \r and \n must be wrapped in quotes.
- When s_csv parses a file, quoted values are parsed without the enclosing quotes and with any internal doubled quote characters replaced with a single instance of the quote character. Thus the valid value (for " quote character) "foo""bar" would be parsed as the JSON string "foo\"bar"
- You can pass in null for the 3rd, 4th, and 5th args. Any instance of null in those args will be replaced with the default value.
- To improve performance, this function and s_fa use a shared cache that maps (input, function argument) pairs to the return value of the function. Up to 8 return values can be cached, only documents between 100KB and (5MB if 32bit, else 10MB) use the cache, and the cache is disabled for mutating queries (to avoid mutating values in the cache).
- Prior to v6.1, this function did not work properly if the delimiter was a regex metacharacter like |.
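The RFC 4180 quoting rules summarized in these notes can be tried out with Python's standard csv module, which follows the same doubled-quote convention. The sketch below is an analogy rather than a description of how s_csv works internally; the sample text and the choice of 3 columns are made up for illustration.

```python
import csv
import io

n_columns = 3  # hypothetical column count for this sample
text = '"foo""bar","a,b",c\r\nshort,row\r\n"line1\nline2",x,y\r\n'

# newline="" keeps embedded newlines inside quoted columns intact.
reader = csv.reader(io.StringIO(text, newline=""), delimiter=",", quotechar='"')

# Mirror the note above: rows without exactly n_columns columns are dropped.
rows = [row for row in reader if len(row) == n_columns]
print(rows)
```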
Example:
Suppose you have the JSON string "nums,names,cities,date,zone,subzone,contaminated\nnan,Bluds,BUS,,1,'',TRUE\n0.5,dfsd,FUDG,12/13/2020 0:00,2,c,TRUE\n,qere,GOLAR,,3,f,\n1.2,qere,'GOL''AR',,3,h,TRUE\n'',flodt,'q,tun',,4,q,FALSE\n4.6,Kjond,,,,w,''\n4.6,'Kj\nond',YUNOB,10/17/2014 0:00,5,z,FALSE"
which represents this CSV file (7 columns, comma delimiter, LF newline, ' quote character):
nums,names,cities,date,zone,subzone,contaminated
nan,Bluds,BUS,,1,'',TRUE
0.5,dfsd,FUDG,12/13/2020 0:00,2,c,TRUE
,qere,GOLAR,,3,f,
1.2,qere,'GOL''AR',,3,h,TRUE
'',flodt,'q,tun',,4,q,FALSE
4.6,Kjond,,,,w,''
4.6,'Kj
ond',YUNOB,10/17/2014 0:00,5,z,FALSE
Notice that the 8th row of this CSV file has a newline in the middle of the second column, and this is fine, because as discussed above, this column is quoted and newlines are allowed within a quoted column.
The query s_csv(@, 7, `,`, `\n`, `'`) will correctly parse this as an array of seven 7-string subarrays (omitting the header), shown below:
[
["nan", "Bluds", "BUS", "", "1", "", "TRUE"],
["0.5", "dfsd", "FUDG", "12/13/2020 0:00", "2", "c", "TRUE"],
["", "qere", "GOLAR", "", "3", "f", ""],
["1.2", "qere", "GOL'AR", "", "3", "h", "TRUE"],
["", "flodt", "q,tun", "", "4", "q", "FALSE"],
["4.6", "Kjond", "", "", "", "w", ""],
["4.6", "Kj\nond", "YUNOB", "10/17/2014 0:00", "5", "z", "FALSE"]
]
The query s_csv(@, 7, `,`, `\n`, `'`, h, 0, -3) will correctly parse this as an array of eight 7-item subarrays (including the header) with the 1st and 3rd-to-last (i.e. 5th) columns parsed as numbers where possible, shown below:
[
["nums", "names", "cities", "date", "zone", "subzone", "contaminated"],
["nan", "Bluds", "BUS", "", 1, "", "TRUE"],
[0.5, "dfsd", "FUDG", "12/13/2020 0:00", 2, "c", "TRUE"],
["", "qere", "GOLAR", "", 3, "f", ""],
[1.2, "qere", "GOL'AR", "", 3, "h", "TRUE"],
["", "flodt", "q,tun", "", 4, "q", "FALSE"],
[4.6, "Kjond", "", "", "", "w", ""],
[4.6, "Kj\nond", "YUNOB", "10/17/2014 0:00", 5, "z", "FALSE"]
]
s_fa(x: string, pat: regex | string, includeFullMatchAsFirstItem: bool = false, ...: int) -> array[string | number] | array[array[string | number]]
Added in v6.0.
- If the third argument, includeFullMatchAsFirstItem, is set to false (the default):
    - If pat is a regex with no capture groups or one capture group, returns an array of the substrings of x that match pat.
    - If pat has multiple capture groups, returns an array of subarrays of substrings, where each subarray has a number of elements equal to the number of capture groups.
- otherwise:
    - If pat is a regex with no capture groups, returns an array of the substrings of x that match pat.
    - If pat has at least one capture group, returns an array of subarrays of substrings, where each subarray has a number of elements equal to the number of capture groups + 1, and the first element of each subarray is the entire text of the match (including the uncaptured text).
The fourth argument and any subsequent arguments must each be the number of a capture group to attempt to parse as a number (0 matches the match value if there were no capture groups). Any valid number within the JSON5 specification can be parsed. If a capture group cannot be parsed as a number, the capture group is returned as a string. As with s_csv above, you can use a negative number to parse the nth-to-last capture group as a number instead of the nth.
SPECIAL NOTES FOR s_fa:
- s_fa treats ^ as the beginning of a line and $ as the end of a line, but elsewhere in JsonTools (prior to v7.0) ^ matches only the beginning of the string and $ matches only the end of the string.
- Every instance of (INT) in pat will be replaced by a regex that captures a decimal number or (a hex integer preceded by 0x), optionally preceded by a + or -. A noncapturing regex that matches the same thing is available through (?:INT).
- Every instance of (NUMBER) in pat will be replaced by a regex that captures a decimal floating point number or (a hex integer preceded by 0x). A noncapturing regex that matches the same thing is available through (?:NUMBER). Neither (NUMBER) nor (?:NUMBER) matches NaN or Infinity, but those can be parsed if desired.
- s_fa may be very slow if pat is a function of input, because the above described regex transformations need to be applied every time the function is called instead of just once at compile time.
Examples:
- s_fa(`1 -1 +2 -0xF +0x1a 0x2B`, `(INT)`) will return ["1", "-1", "+2", "-0xF", "+0x1a", "0x2B"]
- s_fa(`1 -1 +2 -0xF +0x1a 0x2B 0x10000000000000000`, `(?:INT)`, false, 0) will return [1, -1, 2, -15, 26, 43, "0x10000000000000000"] because passing 0 as the fourth arg caused all the match results to be parsed as integers, except 0x10000000000000000, which stayed as a string because its numeric value was too big for the 64-bit integers used in JsonTools.
- s_fa(`a 1.5 1\r\nb -3e4 2\r\nc -.2 6`, `^(\w+) (NUMBER) (INT)\r?$`, false, 1) will return [["a",1.5,"1"],["b",-30000.0,"2"],["c",-0.2,"6"]]. Note that the second column but not the third will be parsed as a number, because only 1 was passed in as the number of a capture group to parse as a number.
- s_fa(`a 1.5 1\r\nb -3e4 2\r\nc -.2 6`, `^(\w+) (NUMBER) (INT)\r?$`, false, -2, 2) will return [["a",1.5,1],["b",-30000.0,2],["c",-0.2,6]]. This time the same input is parsed with numbers in the second-to-last and third columns because -2 and 2 were passed as optional args.
- s_fa(`a 1.5 1\r\nb -3e4 2\r\nc -.2 6`, `^(\w+) (?:NUMBER) (INT)\r?$`, false, 1) will return [["a",1],["b",2],["c",6]]. This time the same input is parsed with only two columns, because we used a noncapturing version of the number-matching regex.
- s_fa(`a1 b+2 c-0xF d+0x1a`, `[a-z](INT)`, true, 1) will return [["a1",1],["b+2",2],["c-0xF",-15],["d+0x1a",26]] because the third argument is true and there is one capture group, meaning that the matches will be represented as two-element subarrays, with the first element being the full text of the match, and the second element being the captured integer parsed as a number.
- s_fa(`a1 b+2 c-0xF d+0x1a`, `[a-z](?:INT)`, true) will return ["a1","b+2","c-0xF","d+0x1a"] because the third argument is true but there are no capture groups, so an array of strings is returned instead of 1-element subarrays.
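The way the result shape depends on the number of capture groups is close to what Python's re.findall does, so a Python sketch may help build intuition. This is an analogy under stated assumptions: plain \d+ stands in for the (INT) shorthand, and re.MULTILINE is used to get the line-based ^ and $ that s_fa uses.

```python
import re

text = "a 15 1\nb 30 2\nc 2 6"

# No capture groups: a flat list of matched substrings.
no_groups = re.findall(r"\d+", text)

# One capture group: still a flat list (of the captured text).
one_group = re.findall(r"^(\w+) ", text, flags=re.MULTILINE)

# Several capture groups: one sub-list per match.
multi_groups = [
    list(m) for m in re.findall(r"^(\w+) (\d+) (\d+)$", text, flags=re.MULTILINE)
]

# Analogue of includeFullMatchAsFirstItem=true: full match first, then groups.
with_full_match = [
    [m.group(0), *m.groups()]
    for m in re.finditer(r"^(\w+) (\d+)", text, flags=re.MULTILINE)
]
print(no_groups, one_group, multi_groups, with_full_match)
```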
s_find(x: string, sub: regex | string) -> array[string]
Returns an array of all the substrings in x that match sub.
As of v6.0, this function is DEPRECATED in favor of s_fa. However, it can still be useful if you always want the result to be a single string rather than an array of capture groups.
s_format(s: str, print_style: string=m, sort_keys: bool=true, indent: int | str=4, remember_comments: bool=false) -> str
If s is not valid JSON (according to the most permissive parsing rules, same as used by the parse() function),
return a copy of s.
Otherwise, let elt be the JSON returned by parsing s.
If the third argument (sort_keys, default true) is false, object keys are not sorted.
If the fourth argument (indent, default 4) is an integer, the indent for pretty-print options is that integer. If it is `\t` (the tab character), tabs are used for indentation.
If not remember_comments (the fifth argument), return elt formatted as follows:
- if print_style (the second argument) is m (the default), return the minimal-whitespace compact representation.
- if print_style is c, return the Python-style compact representation (one space after ',' or ':')
- if print_style is g, return the Google-style pretty-printed representation
- if print_style is w, return the Whitesmith-style pretty-printed representation
- if print_style is p, return the PPrint-style pretty-printed representation
If remember_comments, any comments in s will be remembered as described in the remember_comments setting, and return elt formatted as follows:
- if print_style is m or c (the default), compressed.
- if print_style is g or w, pretty-printed Google-style.
- if print_style is p, pretty-printed PPrint-style.
s_len(x: string) -> int
The length of string x, when encoded in UTF-16. In brief, this means that most characters count for 1, but some characters like 😀 count for 2 or more.
Note that the character count in the Notepad++ status bar indicates the number of bytes in the UTF-8 representation of text, and this will be greater than the value returned by s_len for any text that contains non-ASCII characters.
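The difference between UTF-16 length and UTF-8 byte count can be checked in Python by encoding the string both ways. The helper names below are hypothetical and only illustrate the two counting rules described above.

```python
def utf16_len(s):
    # Each UTF-16 code unit is 2 bytes; characters outside the BMP take 2 units.
    return len(s.encode("utf-16-le")) // 2


def utf8_len(s):
    # Number of bytes in the UTF-8 encoding.
    return len(s.encode("utf-8"))


samples = ["abc", "\u00e9", "\U0001F600"]  # ASCII, e-acute, grinning-face emoji
lengths = [(utf16_len(s), utf8_len(s)) for s in samples]
print(lengths)
```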
s_lower(x: string) -> string
The lower-case form of x.
s_lines(x: string) -> array[string]
Added in v6.1
Returns an array of all the lines (including an empty string at the end if there's a trailing newline) in x.
This function treats \r, \n, and \r\n all as valid newlines. Use s_split below if you want to only accept one or two of those.
s_lpad(x: string, padWith: string, padToLen: int) -> string
Added in v6.1
Returns a string that contains x padded on the left with enough repetitions of padWith
to make a composite string with length at least padToLen.
EXAMPLES:
- s_lpad(foo, e, 5) returns "eefoo"
- s_lpad(ab, `01`, 5) returns "0101ab"
- s_lpad(abc, `01`, 5) returns "01abc"
s_mul(x: string, reps: int) -> string
A string containing x repeated reps times. E.g., s_mul(`abc`, 3) returns "abcabcabc".
This is equivalent to x * reps in Python.
As of v5.1, this function is unnecessary because the binary operator * has the same capability in RemesPath: x * reps returns the same thing as s_mul(x, reps).
s_rpad(x: string, padWith: string, padToLen: int) -> string
Added in v6.1
Returns a string that contains x padded on the right with enough repetitions of padWith
to make a composite string with length at least padToLen.
EXAMPLES:
- s_rpad(foo, e, 5) returns "fooee"
- s_rpad(ab, `01`, 5) returns "ab0101"
- s_rpad(abc, `01`, 5) returns "abc01"
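The "length at least padToLen" rule for s_lpad and s_rpad means the padding is added in whole repetitions of padWith, so the result can overshoot. A minimal Python sketch, assuming padWith is non-empty and measuring length with Python's len rather than UTF-16 code units:

```python
import math


def pad_reps(x, pad_with, pad_to_len):
    # Number of whole repetitions of pad_with needed to reach pad_to_len.
    missing = max(0, pad_to_len - len(x))
    return math.ceil(missing / len(pad_with))


def s_lpad(x, pad_with, pad_to_len):
    return pad_with * pad_reps(x, pad_with, pad_to_len) + x


def s_rpad(x, pad_with, pad_to_len):
    return x + pad_with * pad_reps(x, pad_with, pad_to_len)


left = [s_lpad("foo", "e", 5), s_lpad("ab", "01", 5), s_lpad("abc", "01", 5)]
right = [s_rpad("foo", "e", 5), s_rpad("ab", "01", 5), s_rpad("abc", "01", 5)]
print(left, right)
```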
s_slice(x: string, sli: slice | int) -> string
sli can be an integer or slice with the same Python slice syntax used to index arrays (see above).
Returns the appropriate slice/index of x.
Prior to v5.5.0, Python-style negative indices were not allowed for the sli argument.
s_split(x: string, sep: regex | string=g`\s+`) -> array[string]
If sep is not specified (the function is called with one argument):
- Returns x split by whitespace.
    - E.g., s_split(`a b c\n d `) returns ["a", "b", "c", "d", ""] (the last empty string is because x ends with whitespace)
    - The 1-argument option was added in v6.0.
If sep is a string (which is treated as a regex) or regex:
- Returns an array containing substrings of x where the parts that match sep are missing.
    - E.g., s_split(`a big bad man`, g`\\s+`) returns ["a", "big", "bad", "man"].
- However, if sep contains any capture groups, the capture groups are included in the array.
    - s_split(`a big bad man`, g`(\\s+)`) returns ["a", " ", "big", " ", "bad", " ", "man"].
    - s_split(`bob num: 111-222-3333, carol num: 123-456-7890`, g`(\\d{3})-(\\d{3}-\\d{4})`) returns ["bob num: ", "111", "222-3333", ", carol num: ", "123", "456-7890", ""]
- See the docs for C# Regex.Split for more info.
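Python's re.split follows the same convention of including capture groups in the output, so the examples above can be reproduced with it. This is offered as an analogy; C# Regex.Split may differ from Python in edge cases not shown here.

```python
import re

# Splitting on whitespace leaves a trailing empty string.
whitespace = re.split(r"\s+", "a b c\n d ")

# Without capture groups, the separators are dropped.
plain = re.split(r"\s+", "a big bad man")

# With a capture group, each separator is kept in the result.
captured = re.split(r"(\s+)", "a big bad man")

# With two capture groups, both groups appear for every match.
phones = re.split(
    r"(\d{3})-(\d{3}-\d{4})",
    "bob num: 111-222-3333, carol num: 123-456-7890",
)
print(whitespace, plain, captured, phones)
```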
s_strip(x: string) -> string
Strips the whitespace off both ends of x.
s_sub(x: string, to_replace: regex | string, replacement: string | function) -> string
Replaces all instances of string/regex to_replace in x with replacement.
- If to_replace is a string, replaces all instances of to_replace with replacement. NOTE: This is a new behavior in JsonTools 4.10.1. Prior to that, this function treated to_replace as a regex no matter what.
- If to_replace is a regex:
    - If replacement is a string, replaces every instance of to_replace with the replacement string according to C# regex substitution syntax.
    - If replacement is a function (which must take an array as input and return a string), replaces every instance of to_replace with that function called on the array of strings captured by the regex. New in v6.0.
        - Within the callback function, you can reference loop(), a no-argument function that returns 1 + the number of replacements made so far.
Examples:
- s_sub(abbbbbcb, g`b+`, z) returns azcz.
- s_sub(abbbbbc, `b+`, z) returns abbbbbc, because b+ is not being matched as a regex. Prior to version 4.10.1, this would return the same thing as s_sub(abbbbbc, g`b+`, z).
- s_sub(abbbbbc, b, z) returns azzzzzc, because every instance of b is replaced by z.
Consider as input the JSON string version of the following:
1. Frank Foomeister
2. Bob Barheim
3. Bill Bazenstein
The regex-replace s_sub(@, g`^(\d+)\. (\w+)`, @[2] + str(int(@[1]) * loop())) would return
Frank1 Foomeister
Bob4 Barheim
Bill9 Bazenstein
Let's unpack how that worked:
- the regex we're searching for, g`^(\d+)\. (\w+)`, matches an integer ((\d+), the first capture group) at the start of a line, then ., then a space, then a word ((\w+), the second capture group).
- every time the regex is matched, the callback function @[2] + str(int(@[1]) * loop()) is invoked on an array containing [the captured string, the first capture group, the second capture group].
- This concatenates the second capture group to the integer value of the first capture group multiplied by loop(), which is 1 + the number of replacements made so far.
- Thus the callback function returns Frank + str(1 * 1) (that is, Frank1) when called on the line 1. Frank Foomeister because the match array is ["1. Frank", "1", "Frank"].
- On the second match, loop() returns 2, so the callback function returns Bob + str(2 * 2) (that is, Bob4) when invoked on 2. Bob Barheim.
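The walkthrough above can be mirrored in Python with re.sub and a callback. In this sketch an itertools.count stands in for loop(), and re.MULTILINE supplies the line-based ^ described for s_sub; neither is a claim about how JsonTools implements the feature.

```python
import re
from itertools import count

text = "1. Frank Foomeister\n2. Bob Barheim\n3. Bill Bazenstein"

# Stand-in for loop(): yields 1, 2, 3, ... on successive replacements.
loop = count(1)


def callback(match):
    # group(1) is the number, group(2) is the first word after it.
    return match.group(2) + str(int(match.group(1)) * next(loop))


result = re.sub(r"^(\d+)\. (\w+)", callback, text, flags=re.MULTILINE)
print(result)
```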
Notes on regular expressions in s_sub:
- Like the function s_fa, s_sub uses ^ and $ to match the start and end of lines, rather than the start and end of a string. Before v7.0, elsewhere in RemesPath, ^ and $ would match only at the start and end of a string.
- (INT) and (NUMBER) match integers and floating point decimals, respectively, just as in s_fa above.
- (?:INT) and (?:NUMBER) are non-capturing versions of the same regular expressions.
s_upper(x: string) -> string
Returns the upper-case form of x.
str(x: anything) -> string
Returns the string representation of x, unless x is a string, in which case it returns a copy of x.
zfill(x: anything, padToLen: int) -> string
Added in v6.1
Returns a string that contains x (or the string representation of x, if not a string)
padded on the left with enough repetitions of the 0 character to make a composite string with length padToLen.
EXAMPLES:
- zfill(10, 5) returns "00010"
- zfill(ab, 4) returns "00ab"
A projection (a concept from JMESpath) is a subquery that is somehow based on the current JSON. Projections can be used to reshape JSON, capture summaries, and much more.
In RemesPath, a projection can be created by following a valid RemesPath query with:
- a comma-separated list of elements enclosed by {} (curly braces) produces an array
- a comma-separated list of key-value pairs enclosed by {} produces an object
- -> followed by any valid RemesPath expression (with some restrictions; try wrapping it in parentheses if it can't be parsed) can produce any type (introduced in v5.6)
For example, suppose you have an array of arrays of numbers.
[
[1, 2, 3, 4],
[5, 6],
[7, 8, 9]
]
You might be interested in getting a list of the length of the array, the length of the first element of the array, and the length of the last element.
@{
len(@),
len(@[0]),
len(@[-1])
}
returns
[3, 4, 3]
Or maybe you want to know the sum and average of each subarray.
@[:]{sum(@), avg(@)} returns
[[10.0, 2.5],
[11.0, 5.5],
[24.0, 8.0]]
Or maybe you prefer to get that information as an object, so that the reader can more easily figure out what each row is.
@[:]{row_sum: sum(@), row_avg: avg(@)} returns
[
{"row_avg": 2.5, "row_sum": 10.0},
{"row_avg": 5.5, "row_sum": 11.0},
{"row_avg": 8.0, "row_sum": 24.0}
]
These projections can themselves be queried, allowing you to perform some SQL-like analyses of your data.
Suppose you want to know the average and length of the two rows that have the highest average.
sort_by(
@[:]
{`len`: len(@), `avg`: avg(@)},
`avg`,
true
)[:2]
returns
[
{"avg": 8.0, "len": 3},
{"avg": 5.5, "len": 2}
]
Note that in this example, we're using quotes around the key names avg and len to indicate that they're being used as strings and not function names. Otherwise the parser will get confused.
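For readers more familiar with Python, the array and object projections above correspond roughly to list and dict comprehensions. The sketch below reproduces the same numbers under that analogy; it says nothing about how RemesPath evaluates projections internally.

```python
data = [[1, 2, 3, 4], [5, 6], [7, 8, 9]]

# Analogue of @{len(@), len(@[0]), len(@[-1])}
lengths = [len(data), len(data[0]), len(data[-1])]

# Analogue of @[:]{sum(@), avg(@)}
sums_avgs = [[float(sum(row)), sum(row) / len(row)] for row in data]

# Analogue of sort_by(@[:]{`len`: len(@), `avg`: avg(@)}, `avg`, true)[:2]
rows = [{"len": len(row), "avg": sum(row) / len(row)} for row in data]
top_two = sorted(rows, key=lambda row: row["avg"], reverse=True)[:2]

print(lengths, sums_avgs, top_two)
```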
Finally, we have the aforementioned -> projections introduced in v5.6.
Projections with -> are quite simple: a -> b returns b(a) if b is a function, and b otherwise.
Thus, still considering the JSON [[1,2,3,4],[5,6],[7,8,9]], the query @[:]->len(@)->(str(@)*@) returns
["4444","22","333"]
Beginning in v6.1, RemesPath supports f-strings, quoted strings preceded by the f character that can contain complex expressions inside of curly braces.
These work similarly to f-strings in Python and $-strings in C#.
Because curly braces are used to wrap expressions in the f-string, you need to use }} to get a single literal } character, and {{ to get a single literal { character in an f-string.
For example, consider the input
[
{"a": "foo", "b": -5.5},
{"a": "bar", "b": 7},
{"c": ["y", -1, null]}
]
Examples:
- The query f`first a = {@[0][a]}. Is first b less than second b? {@[0].b < @[1].b}! Show me third c: {@[2].c}` will return "first a = foo. Is first b less than second b? true! Show me third c: [\"y\", -1, null]"
- The query f`sum of b's, wrapped in curlybraces = {{ {sum(@[:].b)} }}` will return "sum of b's, wrapped in curlybraces = { 1.5 }" because we needed to use double curlybraces to get literal curlybrace characters.
Notes:
- f-strings use the s_cat function under the hood to concatenate all the parts of the f-string together. This means that it may be possible to get an error message that references the s_cat function in an expression that uses f-strings but does not explicitly call s_cat.
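Since RemesPath f-strings are described as similar to Python's, the curly-brace escaping rule can be checked directly in Python. Note one visible difference in this sketch: Python renders booleans as True/False, whereas the RemesPath example above shows true.

```python
data = [
    {"a": "foo", "b": -5.5},
    {"a": "bar", "b": 7},
    {"c": ["y", -1, None]},
]

# Sum the b values of the items that have a b key.
total = sum(item["b"] for item in data if "b" in item)

# {{ and }} produce literal braces; {total} is an interpolated expression.
message = f"sum of b's, wrapped in curlybraces = {{ {total} }}"
comparison = f"Is first b less than second b? {data[0]['b'] < data[1]['b']}!"
print(message)
print(comparison)
```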
Added in version v2.0.0
A RemesPath query can contain at most one = separating two valid expressions. This is the assignment operator.
The LHS of the assignment expression is typically a query that selects items from a document (e.g., @.foo[@ > 0]).
The RHS is typically a scalar (if you want to give everything queried the same value) or a function like @ + 1.
If the RHS is a function and the LHS is an iterable, the RHS is applied separately to each element of the iterable.
- Until further notice, you cannot mutate an object or array, other than to change its scalar elements
    - For example, the query @ = len(@) on JSON [[1, 2, 3]] will fail, because this ends up trying to mutate the subarray [1, 2, 3].
- You also cannot mutate a non-array or non-object into an array or object. For example, the query @[0] = j`[1]` on the input [0] will fail because you're trying to convert a scalar (the integer 0) to an array ([1]).
An assignment expression mutates the input and then returns the input.
In these examples, we'll use the input
{
"foo": [-1, 2, 3],
"bar": "abc",
"baz": "de"
}
Some examples:
- The query @.foo[@ < 0] = @ + 1 will yield {"foo": [0, 2, 3], "bar": "abc", "baz": "de"}
- The query @.bar = s_slice(@, :2) will yield {"foo": [-1, 2, 3], "bar": "ab", "baz": "de"}
- The query @.g`b` = s_len(@) will yield {"foo": [-1, 2, 3], "bar": 3, "baz": 2}
Beginning in v5.7, it is possible to run queries with multiple statements. Each statement in the query must be terminated by a semicolon (;), except the final statement.
In addition, you can assign variables using the syntax var <name> = <statement>
Let's see how this works in practice with the following multi-statement query.
var a = 1;
var b = @[0];
var c = a + 2;
b = @ * c;
@[:][1]
If we run this query on the JSON
[
[-1, 1],
[1, 2]
]
here's what will happen:
- The variable a will be set to value a = 1.
- The variable b will be set to @[0], i.e., the first element of the input array. So b = [-1, 1]
- The variable c will be set to value a + 2, which is just c = 3.
- The variable b will be mutated by multiplying every element by c, because a statement of the form <LHS> = <RHS> is an assignment expression as described above unless it is preceded by the var keyword.
    - Since this is an assignment expression, b has been changed to b = [-3, 3], and the input JSON has also been changed!
- Finally, we run the query @[:][1] on the input JSON. This gets the second element of every sub-array in the input JSON.
    - Since the previous statement b = @ * c changed the input to [[-3, 3], [1, 2]], the final result is:
[3, 2]
While this toy example doesn't fully showcase the utility of variable assignment, it shows how variables and assignment expressions combine to make RemesPath considerably more expressive.
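The key point of this walkthrough is that b refers to the first sub-array of the input rather than to a copy of it. Python lists behave the same way, so the steps can be traced with the sketch below, which is an analogy and not a description of RemesPath internals.

```python
data = [[-1, 1], [1, 2]]

a = 1
b = data[0]          # b is a reference to the first sub-array, not a copy
c = a + 2

# In-place update, analogous to the assignment expression b = @ * c
b[:] = [x * c for x in b]

# Analogous to @[:][1]: the second element of every sub-array
result = [row[1] for row in data]
print(data, result)
```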
Some notes:
- All variables are always passed by reference in RemesPath, not by value.
    - For example, the statement var x = 1; var y = x; y = @ + 1; x will return 2 because the statement var y = x; turns y into a reference to x, and the mutation y = @ + 1 will also change x.
    - However, defining a variable as a non-identity function of another variable copies the first variable. Thus, var x = 1; var y = x + 0; y = @ + 1 will not affect x because var y = x + 0; creates a copy that is the result of adding 0 to x.
- Variables can have the same name as functions, because an unquoted string is only interpreted as the name of a function if it is immediately followed by an open parenthesis.
    - For example, the query var ifelse = blah; var s_len = s_len(ifelse); ifelse(s_len < 3, foo, bar) is actually perfectly legal, even though it declares two variables that have the same name as functions that are also used in it.
    - Obviously this behavior will no longer be sustainable if it becomes possible to pass functions as arguments to other functions in RemesPath, but that may never happen.
- The tree view (as well as the "query result" JSON that can be converted to a CSV or pasted in a new document)
- To redefine the value of a variable named a, just use var a = <whatever>. This is acceptable because, as noted above, the statement a = <whatever> is already reserved for assignment expressions.
- For example, given the input {"bar": "bar", "baz": "baz"}, the query
var bar = @.bar;
var baz = @.baz;
var barbaz = bar + baz;
var baz = @{bar, baz, barbaz};
baz
will return
["bar", "baz", "barbaz"]
because when baz is redefined, it just uses the value of baz that was previously defined, and no weird infinite loops of self-reference will happen.
Beginning in v6.0, you can loop over an array by assigning a variable to the array with the for keyword rather than the var keyword.
When you assign a variable x to an array with the for keyword, here is what happens:
toLoopOver = x
start of loop = statement after assignment of x
end of loop = end of query OR next instance of "end for;" statement
for each value of toLoopOver:
x = value
execute each statement between start of loop and end of loop
- If the last statement of a query is end for;, or if a for loop is not closed, the value returned by the query (and thus the value that the tree view will be populated with) is the array that was looped over. For example, the return value of the query for x = j`[1, 2, 3]`; x = @ + 1; end for; is [2, 3, 4], since we added 1 to every value in the array.
- As in Python, a loop variable persists after a for loop is finished. Thus for x = j`[1, 2]`; end for; x returns 2, since that was the last value in the array [1, 2] that was looped through.
Let's see an example of loop variables on this JSON:
{
"a" : [1, 2, 3],
"b": ["a", "bb", "c"]
}
and this query:
var a = @.a;
var b = @.b;
var b_maxlen = ``;
for i = range(len(a));
var bval = at(b, i);
bval = @ * at(a, i);
var b_maxlen = ifelse(s_len(bval) > s_len(b_maxlen), bval, b_maxlen);
end for;
b_maxlen;
The query will return "bbbb"
and will mutate the JSON to
{
"a": [1, 2, 3],
"b": ["a", "bbbb", "ccc"]
}
Here's how the query is executed:
- Set the variable a to [1, 2, 3] (statement var a = @.a;)
- Set the variable b to ["a", "bb", "c"] (statement var b = @.b;)
- Set the variable b_maxlen to an initial value of "" (statement var b_maxlen = ``;)
- For each index i of array a (statement for i = range(len(a));):
    - First, mutate the current element of b by multiplying it by the corresponding element of a (statements var bval = at(b, i); and bval = @ * at(a, i);)
    - Now check if the current element of b is longer than b_maxlen. If it is, reassign b_maxlen to the current element of b (statement var b_maxlen = ifelse(s_len(bval) > s_len(b_maxlen), bval, b_maxlen);)
- Return the current value of b_maxlen, which is "bbbb" because that's the longest string in the JSON after the transformation.
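The same loop can be written in Python to confirm both the returned value and the mutated document. This is a sketch of the logic only; the variable names mirror the query, and the explicit write back to b[i] stands in for RemesPath's by-reference mutation of bval.

```python
doc = {"a": [1, 2, 3], "b": ["a", "bb", "c"]}

a = doc["a"]
b = doc["b"]
b_maxlen = ""

for i in range(len(a)):
    # Repeat the i-th string a[i] times and store it back into the document.
    b[i] = b[i] * a[i]
    bval = b[i]
    # Keep track of the longest string seen so far.
    if len(bval) > len(b_maxlen):
        b_maxlen = bval

print(b_maxlen, doc)
```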
Beginning in v5.8, the * spread operator has been added that allows the user to pass in an array to stand for multiple arguments, which RemesPath will attempt to get from the elements of that array.
Examples:
- zip(*j`[[1, 2], ["a", "b"], [true, false]]`) returns [[1, "a", true], [2, "b", false]]
- Consider the input
{
"a": [[[1, 0], [0, 1]], 1],
"b": [[[1, 0], [0, 1]], 0]
}
The query @.*->max_by(*@) returns {"a": [0, 1], "b": [1, 0]} because when working with key a, we max by the second element of each subarray, and when working with key b, we max by the first element.
Notes:
- Only the final argument to a function can be spread. For example, zip(*j`[1, 2]`, j`[3, 4]`) is not a legal query because a non-final argument was spread.
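Python's * argument unpacking works much like the spread operator described here, except that Python allows unpacking in any argument position. The sketch below reproduces both examples; max_by is a hypothetical stand-in written for this illustration, not the RemesPath function itself.

```python
def max_by(arr, key_index):
    # Return the sub-array whose element at key_index is largest.
    return max(arr, key=lambda item: item[key_index])


rows = [[1, 2], ["a", "b"], [True, False]]
zipped = [list(group) for group in zip(*rows)]

doc = {
    "a": [[[1, 0], [0, 1]], 1],
    "b": [[[1, 0], [0, 1]], 0],
}
# Each value is a two-element list that is spread into max_by's arguments.
result = {key: max_by(*args) for key, args in doc.items()}
print(zipped, result)
```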
Omitting optional function arguments before the final argument (added in v6.0)
Beginning in v6.0, if a function has multiple optional arguments, you can leave any number of optional arguments (including the last) empty, rather than writing null.
For example, if the function foo has two optional arguments:
- foo(1, , 2) would be equivalent to foo(1, null, 2)
- foo(1, 2, ) would be equivalent to foo(1, 2, null) or foo(1, 2).
Comments (added in v7.0)
Beginning in v7.0, queries can include any number of Python-style single-line comments.
Thus the query
foo # comment1
+ #comment2
# comment3
bar #comment4
would simply be parsed as foo + bar