Conversation
lauryntalbot
commented
Mar 5, 2026
- make_gridfiles in ncprocess: QC grids using nanmax rather than mean
- gappy_vertical_fill in utils: doesn't fill gaps >50m
Updated gappy_fill_vertical to fill small NaN runs based on max_gap parameter.
pyglider/utils.py
Outdated
| Applied column-wise. | ||
|
|
||
| data = gappy_fill_vertical(data) | ||
| Fill ONLY small NaN runs (<= max_gap bins) inside each vertical column. |
There was a problem hiding this comment.
I am pretty sure xarray has a gappy interp. Should we wswitch to that rather than this bespoke version?
There was a problem hiding this comment.
Yeah , interpolate_na has a max_gap or "limit" parameters. Or does this do something different?
There was a problem hiding this comment.
replace: dsout[k].values = utils.gappy_fill_vertical(dsout[k].values)
with :
dsout[k] = dsout[k].interpolate_na( dim="depth",method="linear",max_gap=50)
Then we don't need that other function
pyglider/ncprocess.py
Outdated
| ds[k].attrs['processing'] += ( | ||
| ' Using geometric mean implementation ' 'scipy.stats.gmean' | ||
| ) | ||
| elif 'QC_protocol' in ds[k].attrs: |
There was a problem hiding this comment.
OK, so does this work? And how do you specify nanmax in the ds['k'].attrs['average_method']?
There was a problem hiding this comment.
It worked. I added it to the attributes when I defined _QC:
ts['conductivity_QC'] = xr.where(np.isfinite(ts.conductivityClean), 1, 4)
ts['conductivity_QC'].attrs['method'] = 'QC_protocol'
There was a problem hiding this comment.
I think it should be
ts['conductivity_QC'].attrs['average_method'] = 'QC_protocol'
But I think this test tests if 'QC_protocol' is a key to attrs (eg if attrs['QC_protocol']) not if QC_protocol is the value of one of the elements.
There was a problem hiding this comment.
You are right. I am changing this to: elif 'QC_protocol' in ds[k].attrs.values():
There was a problem hiding this comment.
But why not check the proper key?
pyglider/ncprocess.py
Outdated
| dz=1, | ||
| starttime='1970-01-01', | ||
| ): | ||
| inname, outdir, deploymentyaml, *, fnamesuffix='', dz=1, starttime='1970-01-01', maskfunction=CPROOF_mask): |
There was a problem hiding this comment.
Shoot. I forgot to add that. It is masking over QC4 (the bad data)
def CPROOF_mask(ds):
"""Mask QC4 samples in data variables (set to NaN) so gridding ignores them.
Does NOT shorten arrays or overwrite QC variables.
"""
_log = logging.getLogger(name)
ds = ds.copy()
for k in list(ds.data_vars):
# skip QC variables themselves
if k.endswith("_QC"):
continue
qc_name = f"{k}_QC"
if qc_name not in ds:
continue
# mask data where QC == 4, preserving dims/coords
ds[k] = ds[k].where(ds[qc_name] != 4)
ds[qc_name] = ds[qc_name].where(ds[qc_name] != 4)
return ds
pyglider/ncprocess.py
Outdated
|
|
||
| ds = xr.open_dataset(inname, decode_times=True) | ||
| ds = ds.where(ds.time > np.datetime64(starttime), drop=True) | ||
| ds0 = xr.open_dataset(inname, decode_times=True) |
There was a problem hiding this comment.
You don't use ds0 after you make ds so is there a reason to not just name this ds?
There was a problem hiding this comment.
Ya, I could just mask at the very top then do everything with ds
| 'depth' and 'profile', so each variable is gridded in depth bins and by | ||
| profile number. Each profile has a time, latitude, and longitude. | ||
| The depth values are the bin centers | ||
| profile number. Each profile has a time, latitude, and longitude. |
There was a problem hiding this comment.
Please describe maskfunction and also describe the behaviour of QC_protocol fallback
There was a problem hiding this comment.
Also let's add a description of "average_method" in here in general. It should be part of this documentation.
Add CPROOF_mask function to mask QC4 samples in data variables, and update make_gridfiles to apply this mask function before gridding.
Added original function back
pyglider/ncprocess.py
Outdated
|
|
||
| # Fill only continuous variables, not QC flags | ||
| if 'QC_protocol' not in ds[k].attrs : | ||
| dsout[k] = dsout[k].interpolate_na(dim="depth", method="linear", max_gap=50) |
There was a problem hiding this comment.
Does this work OK?
Also not sure if we should hard code this versus making it a parameter. Thus ends up being 50 m, right?
There was a problem hiding this comment.
Made into parameter - Yes, I chose 50m since it is relatively small. The spikes the original code was making was interpolating over >50m
|
|
||
|
|
||
|
|
||
|
|
| data[:, j][ind[0] : ind[-1]] = np.interp(int, ind, data[ind, j]) | ||
| return data | ||
|
|
||
|
|
pyglider/ncprocess.py
Outdated
| continue | ||
| if 'average_method' in ds[k].attrs: | ||
| average_method = ds[k].attrs['average_method'] | ||
| # variables are treated as d continuous data. |
There was a problem hiding this comment.
| # variables are treated as d continuous data. | |
| # variables are treated as continuous data. |
pyglider/ncprocess.py
Outdated
| Vertical grid spacing in meters. | ||
|
|
||
| maskfunction : callable or None, optional | ||
| Function applied to the dataset before gridding. |
There was a problem hiding this comment.
| Function applied to the dataset before gridding. | |
| Function applied to the dataset before gridding, usually to choose what data will be set to NaN based on quality flags. |
Added vertical interpolation for QC and continuous variables in ncprocess.py.