Modify `ThermoDigitalTwin` to support reading preacquired datasets

## Motivation
 
1. We want to run analysis methods on pre-acquired microscopy datasets without needing a live instrument connection. Use cases include:
   - Segmentation of atoms/particles
   - Bayesian optimisation on spectrum images
   - Inpainting methods
   - Running LLMs on pre-acquired data via MCP

2. Ideally, users should not need to set up any servers. The server can be hosted centrally (e.g., on the Gatan PC), while users interact through a client-side notebook that handles data loading and exposes simple commands for displaying their data. They can then build and test different methods on it.
   - Open question: since the server runs on the Gatan PC, does the user have to transfer the file there? Is there a better way to handle this?

3. This will be handy for the next Mic-hackathon.


 
## Supported Dataset Types
 
| Format | Data Type | Example |
|--------|-----------|---------|
| `.emd` | HAADF | [example] |
| `.emd` | SI-EDX | [example](https://github.com/pycroscopy/pyTEMlib/blob/main/example_data/SI%20HAADF%201402.emd) |
| `.emd` | Single EDX spectrum | [example](https://github.com/pycroscopy/pyTEMlib/blob/main/example_data/EDS-STO.emd) |
| `.dm4` | HAADF | — |
| `.dm4` | SI-EELS | — |
| `.dm4` | Single EELS spectrum | — |
| `.mrc` | 4D-STEM | — |
 
> **Scope for this issue:** Start with all `.emd` cases.
 
## Proposed API
 
### Client-side behaviour
 
1. User downloads a `.emd` file locally
2. User instantiates `ThermoDigitalTwin`, passing the file path as a `device_attribute`
3. User calls:
 
```python
mic_proxy.get_preacquired_data(
    file_type=".emd",          # one of ".emd", ".dm4", ".mrc"
    data_type="HAADF",         # one of "HAADF", "SI-EDX", "SI-EELS", "Spectrum"
    file_path="path/to/file",  # local path to the dataset
)
```
 
**If `data_type="HAADF"`:**
- Single frame → returns image array + metadata (e.g. `pixel_size`)
- Multiple frames → user can choose:
  - Get the *i*-th frame
  - Get the mean of all frames
  - Get all frames as a stack
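
The three multi-frame options could be sketched as below. This is only an illustration on a plain numpy array: the helper name `select_frames`, its `mode` and `index` parameters, and the assumed `(n_frames, y, x)` layout are hypothetical and not part of the proposed API.

```python
from typing import Literal, Optional

import numpy as np


def select_frames(
    frames: np.ndarray,
    mode: Literal["index", "mean", "stack"] = "stack",
    index: Optional[int] = None,
) -> np.ndarray:
    """Return one frame, the mean frame, or the full stack.

    Hypothetical helper; assumes frames are laid out as (n_frames, y, x).
    """
    if frames.ndim == 2:
        # Single frame: promote to a one-frame stack so every mode works uniformly.
        frames = frames[np.newaxis, ...]
    if mode == "index":
        if index is None:
            raise ValueError("mode='index' requires an index")
        return frames[index]
    if mode == "mean":
        return frames.mean(axis=0)
    if mode == "stack":
        return frames
    raise ValueError(f"unknown mode: {mode!r}")


# Synthetic two-frame stack standing in for a multi-frame HAADF acquisition.
stack = np.arange(2 * 3 * 4, dtype=np.float32).reshape(2, 3, 4)
second = select_frames(stack, mode="index", index=1)
mean = select_frames(stack, mode="mean")
```

Whether this choice is exposed as an argument of `get_preacquired_data` or as separate commands is left open here.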
 
**If `data_type="SI-EDX"`:**
- Returns the corresponding HAADF image + metadata (e.g. `pixel_size`)
- User can then place the beam and acquire a spectrum:
 
```python
mic_proxy.place_beam((x, y))         # place beam at pixel coordinate (x, y)
spectrum = mic_proxy.get_spectrum()  # returns spectrum array + metadata (e.g. energy_offset, dispersion)
```
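
To make the intended behaviour concrete, here is a minimal sketch of a stateful variant, where `place_beam` stores the position and `get_spectrum` reads from it. The class `SpectrumImageSketch`, the `(y, x, energy_channel)` cube layout, and the bounds check are all assumptions for illustration, not the actual `ThermoDigitalTwin` implementation.

```python
import numpy as np


class SpectrumImageSketch:
    """Hypothetical stand-in for the SI-EDX part of the proxy."""

    def __init__(self, cube: np.ndarray, energy_offset: float, dispersion: float):
        self._cube = cube  # assumed layout: (y, x, energy_channel)
        self._metadata = {"energy_offset": energy_offset, "dispersion": dispersion}
        self._beam = (0, 0)  # current beam position as (x, y) in pixels

    def place_beam(self, coordinate: tuple) -> None:
        """Store the beam position after checking it lies inside the scan."""
        x, y = coordinate
        height, width = self._cube.shape[:2]
        if not (0 <= x < width and 0 <= y < height):
            raise ValueError(f"beam position {coordinate} is outside the {width}x{height} scan")
        self._beam = (x, y)

    def get_spectrum(self):
        """Return the spectrum at the stored beam position plus a copy of the metadata."""
        x, y = self._beam
        return self._cube[y, x, :], dict(self._metadata)


# Synthetic 4x5 scan with 8 energy channels; only pixel (x=3, y=2) has signal.
cube = np.zeros((4, 5, 8))
cube[2, 3, :] = 1.0
mic = SpectrumImageSketch(cube, energy_offset=0.0, dispersion=10.0)
mic.place_beam((3, 2))
spectrum, metadata = mic.get_spectrum()
```

A stateless alternative would pass the coordinate directly to `get_spectrum`, which avoids shared state if several clients talk to one server.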
 
### Server-side changes
 
Extend `ThermoDigitalTwin` with:
 
**`device_attributes`**
- `file_path: str` — path to the pre-acquired dataset
 
**New commands**
```python
def get_preacquired_data(
    file_type: Literal[".emd", ".dm4", ".mrc"],
    data_type: Literal["HAADF", "SI-EDX", "SI-EELS", "Spectrum"],
) -> ...:  # return type is still open (see Open Questions)
    ...
```
 
**Internal helpers**
```python
def _load_data(
    file_type: Literal[".emd", ".dm4", ".mrc"],
    data_type: Literal["HAADF", "SI-EDX", "SI-EELS", "Spectrum"],
) -> ...:  # return type is still open (see Open Questions)
    ...
```
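
One possible way to structure `_load_data` is a small registry keyed on `(file_type, data_type)`, so that each format can be added without touching the dispatch logic. The sketch below is an assumption about the design, not an agreed implementation: `register_loader`, `load_data`, and `_load_emd_haadf` are hypothetical names, and the loader body is a placeholder rather than a real reader call.

```python
from typing import Any, Callable, Dict, Tuple

LoaderKey = Tuple[str, str]
_LOADERS: Dict[LoaderKey, Callable[[str], Any]] = {}


def register_loader(file_type: str, data_type: str):
    """Decorator that registers a loader for one (file_type, data_type) pair."""
    def decorator(func: Callable[[str], Any]) -> Callable[[str], Any]:
        _LOADERS[(file_type, data_type)] = func
        return func
    return decorator


@register_loader(".emd", "HAADF")
def _load_emd_haadf(file_path: str) -> dict:
    # Placeholder: a real loader would call the chosen reader here.
    return {"source": file_path, "data_type": "HAADF"}


def load_data(file_path: str, file_type: str, data_type: str) -> Any:
    """Look up the registered loader and fail clearly on unsupported combinations."""
    try:
        loader = _LOADERS[(file_type, data_type)]
    except KeyError:
        supported = sorted(_LOADERS)
        raise ValueError(
            f"unsupported combination {(file_type, data_type)}; supported: {supported}"
        ) from None
    return loader(file_path)


result = load_data("path/to/file.emd", ".emd", "HAADF")
```

With this layout, the `.dm4` and `.mrc` cases can be added later as further registered loaders.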
 
## Testing
 
- Upload a small representative file for each format to [SciFiDatasets](https://github.com/pycroscopy/SciFiDatasets)
- Write tests asserting:
  - Correct array **shape** and **dtype** per data type
  - Expected **metadata fields** are present and correctly typed (e.g. `pixel_size`, `energy_offset`, `dispersion`)
  - Correct **reader behaviour** — using either `pyTEMlib` reader utils or `scifireader` directly
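
A shape/dtype/metadata test could look roughly like the sketch below. It runs on a synthetic array rather than a real file, and the `REQUIRED_METADATA` contract and the helper `check_image_result` are assumptions for illustration; the actual field list and types still need to be agreed.

```python
import numbers

import numpy as np

# Assumed metadata contract; the real field list is still to be decided.
REQUIRED_METADATA = {"pixel_size": numbers.Real}


def check_image_result(image: np.ndarray, metadata: dict, expected_shape: tuple) -> None:
    """Assert shape, numeric dtype, and presence and type of required metadata."""
    assert image.shape == expected_shape, f"unexpected shape {image.shape}"
    assert np.issubdtype(image.dtype, np.number), f"non-numeric dtype {image.dtype}"
    for field, expected_type in REQUIRED_METADATA.items():
        assert field in metadata, f"missing metadata field {field!r}"
        assert isinstance(metadata[field], expected_type), f"{field!r} has the wrong type"


def test_haadf_contract():
    # Synthetic stand-in for a small representative file.
    image = np.zeros((16, 16), dtype=np.float32)
    check_image_result(image, {"pixel_size": 0.05}, expected_shape=(16, 16))
```

In the real test suite the synthetic array would be replaced by the output of `get_preacquired_data` on a small file from SciFiDatasets.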

## Open Questions
- What should be the standard return type? (numpy vs sidpy)
- Should beam placement be stateful or stateless?
- How to handle large datasets (lazy loading vs full load)?
