Evaluation script for expected run length (ERL)
- Efficient implementation: more than 10x faster than the funlib implementation
- Refined ERL definition (backward compatible):
  - `no-merge`: specify non-merging regions (for example, different semantic classes) with a mask, so that false merges into forbidden regions are counted correctly.
  - `--merge-threshold`: specify a merge tolerance (merge percentage / voxel-count threshold) that allows minor false-merge contacts instead of immediately forcing a skeleton score to 0.
- Reproduced FFN's result on the j0126 dataset (Januszewski et al., 2018): scripts/README.md
- Python >=3.10 is required.
conda create -n erl-eval python=3.10 -y
conda activate erl-eval
# if only h5 files
pip install -e ".[h5]"
Figure panels (images not shown): Ground Truth (2 seg) | Prediction (1 false merge and 1 false split) | No-merge Mask
- GT: Each of the two axons is around 4μm and the total ERL is 4.275μm.
- Prediction: One predicted axon is falsely merged with a dendrite. The other two predicted segments are falsely split (around 2μm each).
- Naive ERL evaluation (defined in the FFN paper)
  `python scripts/volume_eval.py -p tests/data/vol_pred.h5 -g tests/data/gt_graph.npz -r 30,30,30`
  - ❌ The falsely merged segment is considered correct (ERL=4μm), because the ground truth skeletons carry no information about other segments (here, the dendrite).
  - ✅ The ground truth axon matched with two falsely split segments has ERL=2μm, which agrees with the intuition of one split error per 2μm.
  - The total ERL=3.054μm is overestimated.
- Improved ERL evaluation with `no-merge` mask
  `python scripts/volume_eval.py -p tests/data/vol_pred.h5 -g tests/data/gt_graph.npz -r 30,30,30 -m tests/data/vol_no-mask.h5`
  - ✅ The falsely merged segment is considered wrong, as it merges with the `no-merge` mask. The corresponding GT segment has ERL=0μm.
  - ✅ The falsely split segment is scored the same as above.
  - The total ERL=1.176μm is reasonable.
- Merge-tolerant false-merge handling (`--merge-threshold`)
  `python scripts/volume_eval.py -p <pred.h5> -g <gt_graph.npz> -r 30,30,30 -m <no_merge_mask.h5> -t 30`
  - Small accidental merge contacts below the threshold are tolerated, instead of harshly zeroing the skeleton score for tiny false merges.
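
The three evaluation modes above can be illustrated with a minimal sketch. It assumes the FFN-style definition of ERL: each skeleton scores the sum of its squared valid run lengths divided by its total length, and the total ERL is the length-weighted mean over skeletons. The `Run` class, the `merge_voxels` field, and the `<=` comparison against the threshold are hypothetical names and rules chosen for illustration; they are not taken from `volume_eval.py`, whose actual threshold semantics may differ. The toy lengths are rounded to 4μm and 2μm, so the totals only approximate the 3.054μm and 1.176μm reported above.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """Part of one ground-truth skeleton covered by a single predicted segment."""
    length_um: float       # path length of this run, in micrometers
    merge_voxels: int = 0  # hypothetical: size of the detected false-merge contact


def skeleton_erl(runs, merge_threshold=0):
    """ERL of one skeleton: sum of squared valid run lengths / skeleton length."""
    total = sum(r.length_um for r in runs)
    # A run is valid only if its merge contact does not exceed the threshold.
    valid = sum(r.length_um ** 2 for r in runs if r.merge_voxels <= merge_threshold)
    return valid / total


def total_erl(skeletons, merge_threshold=0):
    """Length-weighted mean of the per-skeleton ERLs."""
    lengths = [sum(r.length_um for r in runs) for runs in skeletons]
    weighted = sum(
        length * skeleton_erl(runs, merge_threshold)
        for length, runs in zip(lengths, skeletons)
    )
    return weighted / sum(lengths)


# Two toy axons of 4 um each; the second is split into two 2 um pieces.
naive = [[Run(4.0)], [Run(2.0), Run(2.0)]]                     # merge goes unnoticed
masked = [[Run(4.0, merge_voxels=500)], [Run(2.0), Run(2.0)]]  # merge caught by the mask
tiny = [[Run(4.0, merge_voxels=10)], [Run(2.0), Run(2.0)]]     # small accidental contact

print(total_erl(naive))                     # 3.0
print(total_erl(masked))                    # 1.0
print(total_erl(tiny, merge_threshold=30))  # 3.0
```

With a zero threshold any detected merge zeroes the affected run, while a threshold of 30 lets the 10-voxel contact pass, which mirrors the intent of the `-t 30` option under the assumptions stated above.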
- Funkelab: Original implementation with `networkx`, which requires a lot of memory.
- jasonkena: Designed a `networkx-lite` class that only keeps relevant information but is still costly to compute.
- current: Uses flat-array `ERLGraph` storage (node arrays + edge arrays + `edge_ptr`), precomputed edge lengths, and detailed error statistics.
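
As a rough illustration of the flat-array idea, the sketch below stores two toy skeletons back to back in NumPy arrays and uses an `edge_ptr` offset array to locate each skeleton's edges, similar to a CSR layout. The array names, the coordinate order, and the interpretation of `-r 30,30,30` as nanometers per voxel are assumptions made for this example; the real `ERLGraph` class may be organized differently.

```python
import numpy as np

# Node coordinates in voxels for all skeletons, concatenated.
# Skeleton 0 has three nodes, skeleton 1 has two.
nodes = np.array(
    [[0, 0, 0], [0, 0, 10], [0, 0, 20], [5, 5, 5], [5, 8, 9]],
    dtype=np.float64,
)

# Edges as pairs of indices into `nodes`, grouped by skeleton.
edges = np.array([[0, 1], [1, 2], [3, 4]], dtype=np.int64)

# Edges of skeleton i are edges[edge_ptr[i]:edge_ptr[i + 1]].
edge_ptr = np.array([0, 2, 3], dtype=np.int64)

# Assumed voxel size in nm, matching the `-r 30,30,30` argument.
resolution = np.array([30.0, 30.0, 30.0])

# Precompute all edge lengths in one vectorized pass.
diff = (nodes[edges[:, 0]] - nodes[edges[:, 1]]) * resolution
edge_len = np.linalg.norm(diff, axis=1)

# Per-skeleton path length: sum each contiguous block of edges.
# Note: reduceat assumes every skeleton has at least one edge.
skel_len = np.add.reduceat(edge_len, edge_ptr[:-1])

print(edge_len)  # [300. 300. 150.]
print(skel_len)  # [600. 150.]
```

Keeping everything in contiguous arrays avoids per-node Python objects, which is the usual reason such a layout uses less memory than a general-purpose graph library.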


