Split Dataset XML generation #157

Open
seanmcculloch wants to merge 6 commits into main from split_dataset_r2r
Conversation

@seanmcculloch
Collaborator

Description of the Changes You Made

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Instructions for Testing

Additional Info

@seanmcculloch
Collaborator Author

Verified that the XML output of split_dataset matches the XML from BigStitcher's Virtual Split command.
https://www.diffchecker.com/Jp8Z1JrM/
(Original is BigStitcher, changed is Rhapso.) The only difference is the tile size, which is irrelevant for this fix.

timepoint = il.get("timepoint")
file_path = il.find("path").text if il.find("path") is not None else None
channel = file_path.split("_ch_", 1)[1].split(".ome.zarr", 1)[0]
file_path = il.get("path")
Collaborator Author

Update parse_image_loader_zarr() to properly handle BigStitcher XML conventions, and to fall back to a default channel when the channel is not in the filename.
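
A minimal sketch of the parsing behavior described above, not the actual rhapso code. The helper name `parse_zgroup`, the `default_channel` value of `"0"`, and the sample element are assumptions made for illustration; only the attribute names and the `_ch_` / `.ome.zarr` split come from the diff.

```python
import xml.etree.ElementTree as ET


def parse_zgroup(il, default_channel="0"):
    """Hypothetical helper: read timepoint, path and channel from one loader element."""
    # Accept either attribute name for the timepoint.
    timepoint = il.get("tp") or il.get("timepoint")

    # Prefer the `path` attribute; fall back to a <path> child element if present.
    file_path = il.get("path")
    if file_path is None:
        node = il.find("path")
        file_path = node.text if node is not None else None

    # Only parse the channel when the filename actually carries one.
    channel = default_channel  # assumed default, not confirmed by the PR
    if file_path and "_ch_" in file_path:
        channel = file_path.split("_ch_", 1)[1].split(".ome.zarr", 1)[0]

    return timepoint, file_path, channel


with_channel = ET.fromstring('<zgroup tp="0" path="tile_x_0000_ch_488.ome.zarr"/>')
without_channel = ET.fromstring('<zgroup timepoint="3"><path>tile.zarr</path></zgroup>')
print(parse_zgroup(with_channel))
print(parse_zgroup(without_channel))
```

Guarding on `"_ch_" in file_path` avoids the `IndexError` that an unconditional `split("_ch_", 1)[1]` raises when the marker is missing.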

if vid == view_id:
to_process_interval = (lb, ub)

ub_inclusive = (ub[0]+1, ub[1]+1, ub[2]+1)
Collaborator Author

The interval needs to use inclusive coordinates for the upper bound so that it is compatible with block_interval coordinates when calling self.contain().
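
To illustrate why the two intervals must share one upper-bound convention, here is a small sketch of a componentwise containment check. The function `contains` and all sample coordinates are hypothetical and are not taken from rhapso; the sketch only assumes that the block's upper bound is one larger than the stored view upper bound, which is what the `+1` adjustment in the diff suggests.

```python
def contains(outer, inner):
    """Return True if `inner` lies within `outer`, comparing bounds per dimension."""
    (o_lb, o_ub), (i_lb, i_ub) = outer, inner
    return all(
        ol <= il and iu <= ou
        for ol, ou, il, iu in zip(o_lb, o_ub, i_lb, i_ub)
    )


# Hypothetical view interval and block interval.
lb, ub = (0, 0, 0), (99, 99, 99)
ub_inclusive = (ub[0] + 1, ub[1] + 1, ub[2] + 1)
block_interval = ((64, 64, 64), (100, 100, 100))

print(contains((lb, ub), block_interval))            # bounds use different conventions
print(contains((lb, ub_inclusive), block_interval))  # bounds use the same convention
```

With mismatched conventions the last block in each dimension is rejected by one pixel, even though it belongs to the view.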

return int(a + b - (a % b))

def find_min_step_size(self):
def find_min_step_size(self, lowest_resolution=(1.0, 1.0, 1.0)):
Collaborator Author

Update this to allow blocks to be created at 1.0 pixel resolution (previously it was limited to block sizes of powers of 64).

outer_timepoints = ET.Element('Timepoints', {'type': 'pattern'})
ip = ET.SubElement(outer_timepoints, 'integerpattern')
ip.text = "0"
tps = sorted({int(v['old_view'][0]) for v in self.self_definition if v['old_view'][0] is not None})
Collaborator Author

Determine the timepoints integer range from the pre-split views.
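
A sketch of how the collected timepoints could be written into the `Timepoints` element. The function name `build_timepoints`, the text format (`first-last` for a contiguous run, comma-separated otherwise), and the fallback to `"0"` are assumptions; the PR only shows how `tps` is collected.

```python
import xml.etree.ElementTree as ET


def build_timepoints(self_definition):
    """Hypothetical helper: build <Timepoints> from the pre-split view ids."""
    # Same set comprehension as in the diff: unique, sorted integer timepoints.
    tps = sorted(
        {int(v["old_view"][0]) for v in self_definition if v["old_view"][0] is not None}
    )

    outer = ET.Element("Timepoints", {"type": "pattern"})
    ip = ET.SubElement(outer, "integerpattern")

    if not tps:
        ip.text = "0"  # assumed fallback, mirrors the previously hard-coded value
    elif len(tps) > 1 and tps == list(range(tps[0], tps[-1] + 1)):
        ip.text = f"{tps[0]}-{tps[-1]}"  # contiguous run (assumed format)
    else:
        ip.text = ",".join(str(t) for t in tps)  # single or non-contiguous values
    return outer


views = [
    {"old_view": ("0", 1)},
    {"old_view": ("2", 0)},
    {"old_view": ("1", 3)},
    {"old_view": (None, 0)},
]
print(ET.tostring(build_timepoints(views), encoding="unicode"))
```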


class SplitImages:
def __init__(self, target_image_size, target_overlap, min_step_size, data_gloabl, n5_path, point_density, min_points, max_points,
def __init__(self, target_image_size, target_overlap, min_step_size, data_global, n5_path, point_density, min_points, max_points,
Collaborator Author

Fix typo in data_global (was data_gloabl).

if size < 0:
size = l + size
return size
def last_image_size(self, L, S, O):
Collaborator Author

Rewrote this function to properly determine the size of the last tile; the previous logic was incorrect. Perfect uniform tiling has been verified.
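
For intuition, the size of the last tile can be worked out from the dimension length, tile size, and overlap. The sketch below is one way to do that arithmetic and is not the rewritten `last_image_size` from this PR; the name `last_tile_size` and the assumption that tiles advance by `S - O` pixels are mine.

```python
def last_tile_size(L, S, O):
    """Illustrative only: size of the final tile along one dimension.

    L: dimension length, S: target tile size, O: overlap between neighbours.
    Assumes tiles start at 0 and advance by a fixed step of S - O.
    """
    step = S - O
    if step <= 0:
        raise ValueError("overlap must be smaller than the tile size")
    if L <= S:
        return L  # a single tile covers the whole dimension

    # Number of steps needed until a tile of size S reaches the end (ceiling division).
    n_steps = -(-(L - S) // step)
    last_start = n_steps * step
    return L - last_start


print(last_tile_size(100, 40, 10))  # tiles start at 0, 30, 60
print(last_tile_size(110, 40, 10))  # tiles start at 0, 30, 60, 90
```

When the result equals `S`, every tile has the same size, which is the uniform tiling case mentioned above.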


if length <= self.target_image_size[i]:
pass
dim_intervals.append((0, length - 1))
Collaborator Author

If the entire dataset is smaller than the block size, still make one block, using the size of the dataset in this dimension.
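
The single-block fallback can be sketched for one dimension as follows. The function `split_dim` and its stepping rule are hypothetical; only the `(0, length - 1)` interval for the small-dataset case comes from the diff.

```python
def split_dim(length, target_size, overlap):
    """Illustrative only: inclusive (start, end) intervals along one dimension."""
    if length <= target_size:
        # Dataset is smaller than one block: emit a single block of the dataset's size.
        return [(0, length - 1)]

    step = target_size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than the target size")

    intervals = []
    start = 0
    while True:
        end = min(start + target_size, length) - 1  # clamp the last block to the data
        intervals.append((start, end))
        if end == length - 1:
            break
        start += step
    return intervals


print(split_dim(30, 40, 10))
print(split_dim(100, 40, 10))
```

Without the first branch, a dimension shorter than the target size would produce no interval at all, which is the case the `pass` statement used to leave unhandled.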

for j in range(i):
other_interval = intervals[j]
intersection = self.intersect(interval, other_interval)
new_v_ip_l = []
Collaborator Author

The large diff starting here changes the handling of interest point (IP) detections when splitting:

  • Adds support for not creating any fake interest points when splitting.
  • Adds support for virtual splitting of tiles when no IPs have been detected.

file_path = il.find("path").text if il.find("path") is not None else None

channel = file_path.split("_ch_", 1)[1].split(".ome.zarr", 1)[0]
timepoint = il.get("tp") or il.get("timepoint")
Collaborator Author

Update xml_to_dataframe_split.parse_image_loader_zarr() to match changes to data_prep.xml_to_dataframe.parse_image_loader_zarr()

seanmcculloch marked this pull request as ready for review February 11, 2026 23:41
@seanmcculloch
Collaborator Author

Test fusion by running split, then calling fusion to verify that fusion works.

Run the rhapso pipeline all the way through:

  • Run on exaspim through to split affine.
  • Then run that output on the bigstitcher capsule for fusion.
