Commit ea21613

Merge pull request #56 from TaskarCenterAtUW/feature-3246
[0.3.5 BUG 3246] Fix OSW filename-based schema selection
2 parents ff9f7b0 + a91704d commit ea21613

7 files changed

Lines changed: 158 additions & 95 deletions

CHANGELOG.md

Lines changed: 6 additions & 0 deletions
@@ -1,5 +1,11 @@
 # Change log

+### 0.3.5 - 2026-03-16
+- Fixed filename-based schema selection to use exact dataset suffixes such as `.nodes.geojson`, `.edges.geojson`, and the legacy `.nodes.OSW.geojson` form instead of loose substring matching.
+- Prevented false schema selection for filenames with misleading prefixes such as `gs_metaline_falls_uga.nodes.geojson` and `gs_yarrow_point.edges.geojson`.
+- Updated extracted dataset filename validation to enforce the same suffix-based rules and reject glued names such as `roadEdges.geojson`.
+- Added regression coverage for suffix-based filename matching and refreshed the README to document current ZIP input, validation output, and supported filename patterns.
+
 ### 0.3.4 - 2026-02-04
 - Update leaf_cycle enums in 0.3 lines/points/polygons schemas and add coverage for the new allowed values.
 - Add unit tests that validate acceptance/rejection of leaf_cycle values with the 0.3 schemas.

README.md

Lines changed: 83 additions & 79 deletions
@@ -1,20 +1,22 @@
 # TDEI python lib OSW validation package

-This package validates the OSW geojson file. Package requires a OSW zip file path
+This package validates OSW GeoJSON datasets packaged as a ZIP file.

 ## System requirements

 | Software | Version |
 |----------|---------|
-| Python | 3.10.x |
+| Python | >= 3.10 |

 ## What this package does?

-- It unzip the provided zip files
-- Check for the required nodes and edges geojson files inside the unzipped folder
-- Validate each file (edges, lines, nodes, points, polygons and zones) against the matching schema (0.3 defaults live in `src/python_osw_validation/schema`)
-- Return true or false according to validation
-- you can check the error if it returned false.
+- Extracts the provided ZIP file
+- Finds supported OSW dataset files inside the extracted directory
+- Validates each file (`edges`, `lines`, `nodes`, `points`, `polygons`, and `zones`) against the matching schema
+- Runs cross-file integrity checks such as duplicate `_id` detection and edge or zone references back to nodes
+- Returns a `ValidationResult` object with `is_valid`, `errors`, and `issues`
+
+Any subset of the six supported dataset files may be present. By default, no individual dataset file is required.

 ## Starting a new project with template

@@ -28,103 +30,105 @@ This package validates the OSW geojson file. Package requires a OSW zip file pat
 from python_osw_validation import OSWValidation

 validator = OSWValidation(zipfile_path='<Zip file path>')
-result = validator.validate()
+result = validator.validate()
 print(result.is_valid)
-print(result.errors) # will return first 20 errors by default if there are errors
+print(result.errors) # returns up to the first 20 high-level errors by default
+print(result.issues) # per-file or per-feature issues

-result = validator.validate(max_errors=10)
+result = validator.validate(max_errors=10)
 print(result.is_valid)
-print(result.errors) # will return first 10 errors depending on the max_errors parameter
+print(result.errors) # returns up to the first 10 high-level errors
41+
```
+
+You can also override schemas:
+
+```python
+from python_osw_validation import OSWValidation

+validator = OSWValidation(
+    zipfile_path='<Zip file path>',
+    schema_paths={
+        'nodes': 'path/to/opensidewalks.nodes.schema-0.3.json',
+        'edges': 'path/to/opensidewalks.edges.schema-0.3.json',
+    },
+)
3955
```
4056

+## Supported filenames
+
+The validator accepts dataset files whose names end with one of these exact suffixes:
+
+- `.edges.geojson`
+- `.lines.geojson`
+- `.nodes.geojson`
+- `.points.geojson`
+- `.polygons.geojson`
+- `.zones.geojson`
+
+It also accepts the legacy form:
+
+- `.edges.OSW.geojson`
+- `.lines.OSW.geojson`
+- `.nodes.OSW.geojson`
+- `.points.OSW.geojson`
+- `.polygons.OSW.geojson`
+- `.zones.OSW.geojson`
+
+Examples:
+
+- `gs_metaline_falls_uga.nodes.geojson` is valid
+- `gs_yarrow_point.edges.geojson` is valid
+- `roadEdges.geojson` is invalid
+
+If a dataset uses canonical OSW 0.3 names that start with `opensidewalks.`, then only these exact names are allowed:
+
+- `opensidewalks.edges.geojson`
+- `opensidewalks.lines.geojson`
+- `opensidewalks.nodes.geojson`
+- `opensidewalks.points.geojson`
+- `opensidewalks.polygons.geojson`
+- `opensidewalks.zones.geojson`
+
 ### Testing

-The project is configured with `python` to figure out the coverage of the unit tests. All the tests are in `tests`
-folder.
+All unit tests are under `tests/unit_tests`.

-- To execute the tests, please follow the commands:
+- To execute the tests:

 `pip install -r requirements.txt`

 `python -m unittest discover -v tests/unit_tests`

-- To execute the code coverage, please follow the commands:
+- To execute code coverage:

 `coverage run --source=src/python_osw_validation -m unittest discover -v tests/unit_tests`

-`coverage html` // Can be run after 1st command
-
-`coverage report` // Can be run after 1st command
-
-- After the commands are run, you can check the coverage report in `htmlcov/index.html`. Open the file in any browser,
-and it shows complete coverage details
-- The terminal will show the output of coverage like this
-
-```shell
-
-> coverage run --source=src/python_osw_validation -m unittest discover -v tests/unit_tests
-test_duplicate_files (test_extracted_data_validator.TestExtractedDataValidator) ... ok
-test_empty_directory (test_extracted_data_validator.TestExtractedDataValidator) ... ok
-test_invalid_directory (test_extracted_data_validator.TestExtractedDataValidator) ... ok
-test_missing_optional_file (test_extracted_data_validator.TestExtractedDataValidator) ... ok
-test_no_geojson_files (test_extracted_data_validator.TestExtractedDataValidator) ... ok
-test_valid_data_at_root (test_extracted_data_validator.TestExtractedDataValidator) ... ok
-test_valid_data_inside_folder (test_extracted_data_validator.TestExtractedDataValidator) ... ok
-test_edges_invalid_zipfile (test_osw_validation.TestOSWValidation) ... ok
-test_edges_invalid_zipfile_with_invalid_schema (test_osw_validation.TestOSWValidation) ... ok
-test_edges_invalid_zipfile_with_schema (test_osw_validation.TestOSWValidation) ... ok
-test_external_extension_file_inside_zipfile (test_osw_validation.TestOSWValidation) ... ok
-test_external_extension_file_inside_zipfile_with_invalid_schema (test_osw_validation.TestOSWValidation) ... ok
-test_external_extension_file_inside_zipfile_with_schema (test_osw_validation.TestOSWValidation) ... ok
-test_extra_field_zipfile (test_osw_validation.TestOSWValidation) ... ok
-test_id_missing_zipfile (test_osw_validation.TestOSWValidation) ... ok
-test_invalid_geometry_zipfile (test_osw_validation.TestOSWValidation) ... ok
-test_invalid_zipfile (test_osw_validation.TestOSWValidation) ... ok
-test_invalid_zipfile_with_invalid_schema (test_osw_validation.TestOSWValidation) ... ok
-test_invalid_zipfile_with_schema (test_osw_validation.TestOSWValidation) ... ok
-test_minimal_zipfile (test_osw_validation.TestOSWValidation) ... ok
-test_minimal_zipfile_with_invalid_schema (test_osw_validation.TestOSWValidation) ... ok
-test_minimal_zipfile_with_schema (test_osw_validation.TestOSWValidation) ... ok
-test_missing_identifier_zipfile (test_osw_validation.TestOSWValidation) ... ok
-test_no_entity_zipfile (test_osw_validation.TestOSWValidation) ... ok
-test_nodes_invalid_zipfile (test_osw_validation.TestOSWValidation) ... ok
-test_nodes_invalid_zipfile_with_invalid_schema (test_osw_validation.TestOSWValidation) ... ok
-test_nodes_invalid_zipfile_with_schema (test_osw_validation.TestOSWValidation) ... ok
-test_points_invalid_zipfile (test_osw_validation.TestOSWValidation) ... ok
-test_points_invalid_zipfile_with_invalid_schema (test_osw_validation.TestOSWValidation) ... ok
-test_points_invalid_zipfile_with_schema (test_osw_validation.TestOSWValidation) ... ok
-test_valid_zipfile (test_osw_validation.TestOSWValidation) ... ok
-test_valid_zipfile_with_invalid_schema (test_osw_validation.TestOSWValidation) ... ok
-test_valid_zipfile_with_schema (test_osw_validation.TestOSWValidation) ... ok
-test_wrong_datatypes_zipfile (test_osw_validation.TestOSWValidation) ... ok
-test_extract_invalid_zip (test_zipfile_handler.TestZipFileHandler) ... ok
-test_extract_valid_zip (test_zipfile_handler.TestZipFileHandler) ... ok
-test_remove_extracted_files (test_zipfile_handler.TestZipFileHandler) ... ok
-
------------------------------------------------------------------------
-Ran 37 tests in 1284.068s
-
-OK
-```
+`coverage html`

-## Use locally:
+`coverage report`
+
+After running coverage, open `htmlcov/index.html` to inspect the report in a browser.
+
+## Use locally
 To use the library locally, use the [example.py](./src/example.py) code

-## Deployment:
+## Deployment
+
+- The library can be pushed to [TestPyPI](https://test.pypi.org/project/python-osw-validation/) or [PyPI](https://pypi.org/project/python-osw-validation/)

-- The library can be pushed to [TestPy](https://test.pypi.org/project/python-osw-validation/) or [PYPI](https://pypi.org/project/python-osw-validation/)
-### Deploy to TestPy
-- On every push to `dev` branch, a workflow is triggered which publishes the updated version to TestPy
+### Deploy to TestPyPI
+
+- On every push to `dev` branch, a workflow is triggered which publishes the updated version to TestPyPI

 ### Deploy to PyPI
-- This happens whenever a tag/release is created with `*.*.*` notation (eg. 0.0.8)
-- To change the version, change the version at [version.py](./src/python_osw_validation/version.py)
+
+- This happens whenever a tag or release is created with `*.*.*` notation, for example `0.0.8`
+- To change the version, update [version.py](./src/python_osw_validation/version.py)
 - To release a new version:
-- Go to Github link of this repository
+- Go to the GitHub repository
 - Under [releases](https://github.com/TaskarCenterAtUW/TDEI-python-lib-osw-validation/releases), click on `Draft a new release`
-- Under `choose a new tag`, add a new tag `v*.*.*` , Generate Release notes
+- Under `choose a new tag`, add a new tag `v*.*.*`, then generate release notes
 - Choose `main` branch for release
 - Publish the release.
-- This release triggers a workflow to generate the new version of the Package.
+- This release triggers a workflow to generate the new package version.
 - The new package will be available at https://pypi.org/project/python-osw-validation/
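
To make the README's filename rules concrete, here is a minimal sketch (not part of the commit) that packages GeoJSON datasets into a ZIP using the `<prefix>.<dataset>.geojson` pattern and then passes the archive to the validator. The helper names `build_osw_zip` and `validate_zip`, the `prefix` parameter, and the `SUPPORTED_KEYS` tuple are hypothetical and exist only for illustration. Only `OSWValidation`, `validate(max_errors=...)`, `is_valid`, `errors`, and `issues` are taken from the README text in this diff.

```python
import json
import zipfile

# The six dataset kinds named in the README's "Supported filenames" section.
SUPPORTED_KEYS = ("edges", "lines", "nodes", "points", "polygons", "zones")


def build_osw_zip(zip_path, prefix, datasets):
    """Write each dataset as '<prefix>.<key>.geojson' into a ZIP (hypothetical helper)."""
    names = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for key, feature_collection in datasets.items():
            if key not in SUPPORTED_KEYS:
                raise ValueError(f"unsupported dataset key: {key}")
            # The dotted suffix is what the 0.3.5 matcher keys on.
            name = f"{prefix}.{key}.geojson"
            archive.writestr(name, json.dumps(feature_collection))
            names.append(name)
    return names


def validate_zip(zip_path, max_errors=20):
    """Run the validator as the README shows; needs python-osw-validation installed."""
    # Imported lazily so build_osw_zip works without the package.
    from python_osw_validation import OSWValidation

    result = OSWValidation(zipfile_path=str(zip_path)).validate(max_errors=max_errors)
    return result.is_valid, result.errors, result.issues
```

Whether a given archive passes depends on its contents and the schemas in use, so the sketch makes no claim about the validation outcome.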

src/python_osw_validation/__init__.py

Lines changed: 12 additions & 12 deletions
@@ -112,21 +112,21 @@ def _get_colset(self, gdf: Optional[gpd.GeoDataFrame], col: str, filekey: str) -
         return set()

     def _schema_key_from_text(self, text: Optional[str]) -> Optional[str]:
-        """Return dataset key (edges/nodes/points/lines/polygons/zones) if mentioned in text."""
+        """Return dataset key from exact filename suffixes only."""
         if not text:
             return None
-        lower = text.lower()
-        aliases = {
-            "edges": ("edge", "edges"),
-            "lines": ("line", "lines", "linestring"),
-            "nodes": ("node", "nodes"),
-            "points": ("point", "points"),
-            "polygons": ("polygon", "polygons", "area"),
-            "zones": ("zone", "zones"),
-        }
-        for key, variants in aliases.items():
-            if any(alias in lower for alias in variants):
+
+        basename = os.path.basename(text).lower()
+        stem, _ = os.path.splitext(basename)
+        for key in self.dataset_schema_paths:
+            if (
+                stem == key
+                or stem == f"{key}.osw"
+                or stem.endswith(f".{key}")
+                or stem.endswith(f".{key}.osw")
+            ):
                 return key
+
         return None

     def _contains_disallowed_features_for_02(self, geojson_data: Dict[str, Any]) -> set:
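
The behavioral difference in this hunk can be reproduced outside the class. The following is a standalone sketch, not the library's code: `substring_key` mirrors the removed alias loop, `suffix_key` mirrors the added suffix check, and both function names plus the `DATASET_KEYS` tuple are hypothetical stand-ins (the commit iterates over `self.dataset_schema_paths`, whose key order is assumed here).

```python
import os
from typing import Optional

# Assumed key order for this sketch; the commit iterates self.dataset_schema_paths.
DATASET_KEYS = ("edges", "lines", "nodes", "points", "polygons", "zones")

# Alias table copied from the lines removed in this commit.
OLD_ALIASES = {
    "edges": ("edge", "edges"),
    "lines": ("line", "lines", "linestring"),
    "nodes": ("node", "nodes"),
    "points": ("point", "points"),
    "polygons": ("polygon", "polygons", "area"),
    "zones": ("zone", "zones"),
}


def substring_key(text: str) -> Optional[str]:
    """Old behavior: the first alias found anywhere in the text wins."""
    lower = text.lower()
    for key, variants in OLD_ALIASES.items():
        if any(alias in lower for alias in variants):
            return key
    return None


def suffix_key(text: str) -> Optional[str]:
    """New behavior: match only on the dotted suffix of the file stem."""
    basename = os.path.basename(text).lower()
    stem, _ = os.path.splitext(basename)
    for key in DATASET_KEYS:
        if (
            stem == key
            or stem == f"{key}.osw"
            or stem.endswith(f".{key}")
            or stem.endswith(f".{key}.osw")
        ):
            return key
    return None


name = "/tmp/gs_metaline_falls_uga.nodes.geojson"
print(substring_key(name))  # lines ("line" occurs inside "metaline")
print(suffix_key(name))     # nodes
print(suffix_key("/tmp/roadEdges.geojson"))  # None
```

With substring matching, `metaline` is enough to select the lines schema for a nodes file. With suffix matching, a glued name such as `roadEdges.geojson` yields no key at all, which is consistent with the regression test in this commit expecting `pick_schema_for_file` to fall back to `line_schema_path` for that name.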

src/python_osw_validation/extracted_data_validator.py

Lines changed: 17 additions & 3 deletions
@@ -48,6 +48,20 @@
 }


+def _matches_dataset_filename(basename: str, dataset_key: str) -> bool:
+    lower_name = basename.lower()
+    if not lower_name.endswith(".geojson"):
+        return False
+
+    stem = lower_name[:-len(".geojson")]
+    return (
+        stem == dataset_key
+        or stem == f"{dataset_key}.osw"
+        or stem.endswith(f".{dataset_key}")
+        or stem.endswith(f".{dataset_key}.osw")
+    )
+
+
 class ExtractedDataValidator:
     def __init__(self, extracted_dir: str):
         self.extracted_dir = extracted_dir
@@ -100,7 +114,7 @@ def is_valid(self) -> bool:

         allowed_keys = tuple(OSW_DATASET_FILES.keys())
         unsupported_files = sorted(
-            {bn for bn in basenames if not any(key in bn for key in allowed_keys)}
+            {bn for bn in basenames if not any(_matches_dataset_filename(bn, key) for key in allowed_keys)}
         )
         if unsupported_files:
             allowed_fmt = ", ".join(allowed_keys)
@@ -121,7 +135,7 @@ def is_valid(self) -> bool:
             file_count = 0
             for filename in geojson_files:
                 base_name = os.path.basename(filename)
-                if required_file in base_name and base_name.endswith('.geojson'):
+                if _matches_dataset_filename(base_name, required_file):
                     file_count += 1
                     save_filename = filename
             if file_count == 0:
@@ -138,7 +152,7 @@ def is_valid(self) -> bool:
             file_count = 0
             for filename in geojson_files:
                 base_name = os.path.basename(filename)
-                if optional_file in base_name and base_name.endswith('.geojson'):
+                if _matches_dataset_filename(base_name, optional_file):
                     file_count += 1
                     save_filename = filename
             if file_count == 1:
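
As a rough illustration of how the new helper partitions a directory listing, the sketch below copies the matching logic into a standalone function and applies it to filenames taken from this commit's tests. `matches_dataset_filename` (without the leading underscore), `ALLOWED_KEYS`, and `split_supported` are hypothetical names for this example. The key order is assumed from the error message in the tests, and the real `is_valid` method does more than this partitioning step.

```python
from typing import Dict, Iterable, List, Tuple

# Assumed order, taken from the error message asserted in the unit tests.
ALLOWED_KEYS = ("edges", "nodes", "points", "lines", "zones", "polygons")


def matches_dataset_filename(basename: str, dataset_key: str) -> bool:
    """Standalone copy of the helper added in this hunk."""
    lower_name = basename.lower()
    if not lower_name.endswith(".geojson"):
        return False
    stem = lower_name[: -len(".geojson")]
    return (
        stem == dataset_key
        or stem == f"{dataset_key}.osw"
        or stem.endswith(f".{dataset_key}")
        or stem.endswith(f".{dataset_key}.osw")
    )


def split_supported(basenames: Iterable[str]) -> Tuple[Dict[str, str], List[str]]:
    """Map each supported file to its dataset key and collect the rest."""
    supported: Dict[str, str] = {}
    unsupported: List[str] = []
    for bn in sorted(basenames):
        keys = [k for k in ALLOWED_KEYS if matches_dataset_filename(bn, k)]
        if keys:
            supported[bn] = keys[0]
        else:
            unsupported.append(bn)
    return supported, unsupported


supported, unsupported = split_supported(
    [
        "wa.microsoft.graph.nodes.OSW.geojson",
        "gs_yarrow_point.edges.geojson",
        "roadedges.geojson",
    ]
)
print(supported)
print(unsupported)  # ['roadedges.geojson']
```

One detail visible in the diff: this helper requires the `.geojson` extension explicitly, whereas `_schema_key_from_text` in `__init__.py` strips whatever extension `os.path.splitext` finds before comparing suffixes.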
Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-__version__ = '0.3.4'
+__version__ = '0.3.5'

tests/unit_tests/test_extracted_data_validator.py

Lines changed: 16 additions & 0 deletions
@@ -69,6 +69,12 @@ def test_valid_subset_of_allowed_files(self):
         self.assertTrue(validator.is_valid())
         self.assertEqual(len(validator.files), 1)

+    def test_valid_legacy_osw_suffix_files(self):
+        validator = ExtractedDataValidator(self.test_dir)
+        self.create_files(['wa.microsoft.graph.nodes.OSW.geojson', 'wa.microsoft.graph.edges.OSW.geojson'])
+        self.assertTrue(validator.is_valid())
+        self.assertEqual(len(validator.files), 2)
+
     def test_non_standard_filenames_raise_error(self):
         validator = ExtractedDataValidator(self.test_dir)
         self.create_files(['custom.nodes.geojson', 'opensidewalks.nodes.geojson'])
@@ -109,6 +115,16 @@ def test_unsupported_files_are_rejected(self):
             'Allowed file names are *.{edges, nodes, points, lines, zones, polygons}.geojson'
         )

+    def test_glued_dataset_names_are_rejected(self):
+        validator = ExtractedDataValidator(self.test_dir)
+        self.create_files(['roadedges.geojson', 'roadnodes.geojson'])
+        self.assertFalse(validator.is_valid())
+        self.assertEqual(
+            validator.error,
+            'Unsupported .geojson files present: roadedges.geojson, roadnodes.geojson. '
+            'Allowed file names are *.{edges, nodes, points, lines, zones, polygons}.geojson'
+        )
+

 if __name__ == '__main__':
     unittest.main()

tests/unit_tests/test_osw_validation_extras.py

Lines changed: 23 additions & 0 deletions
@@ -867,6 +867,29 @@ def test_pick_schema_filename_fallback(self):
             v.line_schema_path,
         )

+    def test_pick_schema_ignores_prefix_substrings(self):
+        v = OSWValidation(zipfile_path="dummy.zip")
+        self.assertEqual(
+            v.pick_schema_for_file("/tmp/gs_metaline_falls_uga.nodes.geojson", {"features": []}),
+            v.dataset_schema_paths["nodes"],
+        )
+        self.assertEqual(
+            v.pick_schema_for_file("/tmp/gs_yarrow_point.edges.geojson", {"features": []}),
+            v.dataset_schema_paths["edges"],
+        )
+        self.assertEqual(
+            v.pick_schema_for_file("/tmp/wa.microsoft.graph.nodes.OSW.geojson", {"features": []}),
+            v.dataset_schema_paths["nodes"],
+        )
+        self.assertEqual(
+            v.pick_schema_for_file("/tmp/baseline.nodes.geojson", {"features": []}),
+            v.dataset_schema_paths["nodes"],
+        )
+        self.assertEqual(
+            v.pick_schema_for_file("/tmp/roadEdges.geojson", {"features": []}),
+            v.line_schema_path,
+        )
+
     def test_pick_schema_force_single_schema_override(self):
         force = "/forced/opensidewalks.schema-0.3.json"
         v = OSWValidation(zipfile_path="dummy.zip", schema_file_path=force)
