Description
There is a significant performance bottleneck when processing large medical datasets. The get_dicom_header() function in pyaslreport/main.py currently attempts to parse every single file in a target directory to build an array of valid DICOMs, before finally returning the header of the first file. For clinical datasets containing thousands of DICOM slices, this causes massive, unnecessary disk I/O and memory overhead.
Steps to Reproduce
- Trigger a report generation using a directory containing a large number of DICOM files (e.g., 3,000+ slices).
- Monitor the execution time and memory usage.
- Observe the massive delay caused by pydicom.dcmread being called on every file.
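To put numbers on the second step, a small timing helper along these lines could be used. This is only a sketch: `measure` is a hypothetical helper, not part of pyaslreport, and the commented call to get_dicom_header assumes it takes a directory path, which this issue does not confirm. Note that `tracemalloc` only tracks memory allocated through Python, so the peak it reports is a lower bound on real usage.

```python
import time
import tracemalloc
from typing import Any, Callable, Tuple


def measure(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float, int]:
    """Run fn once and return (result, elapsed_seconds, peak_python_bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = fn(*args, **kwargs)
        elapsed = time.perf_counter() - start
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, elapsed, peak


# Hypothetical usage (call signature of get_dicom_header is assumed):
# header, seconds, peak = measure(get_dicom_header, "/path/to/dicom_dir")
# print(f"{seconds:.2f} s, peak {peak / 1e6:.1f} MB")
```

Running this before and after the fix on the same directory should make the difference visible without any external profiler.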
Expected Behavior
The function should return the dcm_header as soon as the first valid DICOM file has been parsed. In the best case, where the first file in the directory is valid, this reduces the work from O(N) file parses to O(1).
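The expected behavior can be sketched roughly as follows. This is not the code from the linked PR or the current pyaslreport implementation; `first_valid_header` and its `read_header` parameter are hypothetical names used for illustration, and the reader is passed in so the early-return logic stays independent of pydicom.

```python
import os
from typing import Any, Callable, Optional


def first_valid_header(directory: str, read_header: Callable[[str], Any]) -> Optional[Any]:
    """Return the header of the first file read_header can parse, or None."""
    # Sort for a deterministic order; os.listdir order is arbitrary.
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        try:
            # Return immediately on the first success: no list of all
            # valid files is built, and later files are never opened.
            return read_header(path)
        except Exception:
            # Not parseable by this reader; try the next file.
            continue
    return None


# Hypothetical usage with pydicom (stop_before_pixels skips pixel data):
# import pydicom
# header = first_valid_header(
#     dicom_dir, lambda p: pydicom.dcmread(p, stop_before_pixels=True)
# )
```

The worst case is still O(N) when no file in the directory is a valid DICOM, since every file has to be tried before returning None.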
Actual Behavior
The loop iterates through and parses every single file in the directory before returning.
Environment
- OS: Ubuntu 24.04.4 LTS
- Python Version: 3.11
- Package: pyaslreport
Additional Context
I have already written an optimized fix for this that returns the header immediately and stops the loop. I will link a PR shortly. #29