Problem Description
I am using ecoli_engine_process.py to run a colony simulation under baseline conditions. Saving snapshots of the colony state works, but I keep hitting errors when trying to emit Parquet files for single-cell data. Has anyone done this successfully without an online emitter?
The errors and my debugging attempts are shown below. The existing antibiotic simulation uses an online database as its emitter, so I have included observations about that in the last section.
There may be a different underlying reason why my simulation failed, so I would appreciate any insights or suggestions!
Environment
- vEcoli commit: (current main branch)
- Python 3.12
- Config: spatial.json inheritance with "emitter": "parquet"
Goal
Resume a colony simulation from a saved JSON state file (baseline_2gen_seed_0_colony_t6000.json) and output to Parquet files.
My Configuration File
configs/colony_baseline_test2.json:
{
"inherit_from": ["spatial.json"],
"description": "Test2: read from trial1, run for another generation, save parquet per cell, save final state",
"initial_colony_file": "baseline_2gen_seed_0_colony_t6000",
"seed": 0,
"sim_data_path": "out/all_media_conditions1/parca/kb/simData.cPickle",
"emitter": "parquet",
"emitter_arg": {
"out_dir": "out/colony_runs/baseline_3rd_gen_seed_0"
},
"emit_config": false,
"max_duration": 9000,
"save": true,
"save_times": [9000],
"colony_save_prefix": "baseline_3rd_gen",
"parallel": false,
"engine_process_reports": [
["boundary"],
["bulk"],
["listeners"],
["environment", "exchange"]
]
}
Issue 1: NumpyRandomStateSerializer Serialization/Deserialization Mismatch
Command:
python ecoli/experiments/ecoli_engine_process.py --config configs/colony_baseline_test2.json
Error:
Traceback (most recent call last):
File "ecoli/experiments/ecoli_engine_process.py", line 526, in <module>
run_simulation(config)
File "ecoli/experiments/ecoli_engine_process.py", line 389, in run_simulation
initial_state = get_state_from_file(...)
File "ecoli/library/json_state.py", line 168, in get_state_from_file
return json.loads(f.read(), object_hook=custom_decoder)
File "ecoli/library/serialize.py", line 83, in deserialize
data = orjson.loads(data)
orjson.JSONDecodeError: unexpected character: line 1 column 1 (char 0)
Possible Root Cause:
serialize() at lines 70-72 appears to output Python tuple format: ('MT19937', [...])
deserialize() at line 82 uses orjson.loads(), which expects JSON array format: ["MT19937", [...]]
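The suspected mismatch can be reproduced in isolation. This is a minimal sketch that uses the standard-library json module as a stand-in for orjson and a shortened, made-up state tuple, so it only illustrates the format problem and is not the actual serializer code:

```python
import ast
import json

# Made-up, shortened stand-in for a serialized RandomState in Python-repr form.
serialized = "('MT19937', [1, 2, 3], 624, 0, 0.0)"

# A JSON parser rejects the string at the very first character, because
# "(" and single-quoted strings are not valid JSON.
try:
    json.loads(serialized)
    parsed_as_json = True
except json.JSONDecodeError:
    parsed_as_json = False

# ast.literal_eval accepts Python literals, including tuples.
state = ast.literal_eval(serialized)

# Writing the state with json.dumps instead would make both sides agree on
# the format; the tuple comes back as a list and must be converted again.
round_tripped = tuple(json.loads(json.dumps(list(state))))

print(parsed_as_json)
print(state == round_tripped)
```

If this is what is happening, the failure at "line 1 column 1 (char 0)" in the traceback would be consistent with the parser stopping at the opening parenthesis.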
File: ecoli/library/serialize.py
Original code (lines 78-85):
def deserialize(self, data):
matched_regex = self.regex_for_serialized.fullmatch(data)
if matched_regex:
data = matched_regex.group(1)
data = orjson.loads(data)
rng = np.random.RandomState()
rng.set_state(data)
return rng
Attempted Fix: Replace orjson.loads() with ast.literal_eval() to handle Python tuple format:
def deserialize(self, data):
import ast
matched_regex = self.regex_for_serialized.fullmatch(data)
if matched_regex:
data = matched_regex.group(1)
if data.startswith("("):
data = ast.literal_eval(data)
else:
data = orjson.loads(data)
rng = np.random.RandomState()
rng.set_state(tuple(data))
return rng
Issue 2: Parquet Emitter Cannot Handle pint.Quantity in Config Metadata
Error (after attempting to fix Issue 1):
Traceback (most recent call last):
File "ecoli/library/parquet_emitter.py", line 963, in emit
v = np.asarray(v, dtype=np_dtype(v, k))
File "ecoli/library/parquet_emitter.py", line 706, in np_dtype
raise ValueError(f"{field_name} has unsupported type {type(val)}.")
ValueError: spatial_environment_config__multibody__bounds has unsupported type <class 'pint.Quantity'>.
During handling of the above exception, another exception occurred:
File "ecoli/library/parquet_emitter.py", line 967, in emit
v = pl.Series([v])
TypeError: not yet implemented: Nested object types
Possible Root Cause:
spatial.json contains pint.Quantity values (e.g., "!units[50 micrometer]")
np_dtype() doesn't appear to handle pint.Quantity, and falls back to Polars
pl.Series([v]) also seems unable to handle pint.Quantity objects
File: ecoli/library/parquet_emitter.py
Original code (line 967, as it appears in the traceback above):
v = pl.Series([v])
Attempted Fix: Convert unsupported types to string:
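A minimal sketch of this kind of string fallback, written with the standard library only. The names to_emittable and FakeQuantity are hypothetical and are not part of parquet_emitter.py; FakeQuantity merely imitates how a pint.Quantity prints, and the recursive handling of lists and dicts is an assumption that goes beyond a plain str(v):

```python
class FakeQuantity:
    """Hypothetical stand-in for pint.Quantity (avoids a third-party import)."""

    def __init__(self, magnitude, units):
        self.magnitude = magnitude
        self.units = units

    def __str__(self):
        return f"{self.magnitude} {self.units}"


# Types assumed here to be safe to hand to the emitter unchanged.
SUPPORTED = (bool, int, float, str, type(None))


def to_emittable(value):
    """Return value unchanged if supported; otherwise fall back to str(value)."""
    if isinstance(value, SUPPORTED):
        return value
    if isinstance(value, (list, tuple)):
        return [to_emittable(v) for v in value]
    if isinstance(value, dict):
        return {str(k): to_emittable(v) for k, v in value.items()}
    # Anything else (e.g. a unit-bearing quantity) becomes its string form.
    return str(value)


config = {
    "bounds": [FakeQuantity(50, "micrometer"), FakeQuantity(50, "micrometer")],
    "seed": 0,
}
flat = to_emittable(config)
print(flat)
```

The drawback of this approach is that units end up as plain strings in the output, so anything reading the Parquet metadata would have to parse them back.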
Issue 3: emit_config Setting Not Passed to Engine
Note: After the str(v) fix, I also tried to disable config emission by setting "emit_config": false in the JSON config, but it appeared to have no effect.
Possible Root Cause:
ecoli_engine_process.py does not seem to pass the emit_config parameter to the Engine constructor.
File: ecoli/experiments/ecoli_engine_process.py
Original code (around line 465):
engine = Engine(
processes=composite.processes,
topology=composite.topology,
initial_state=initial_state,
experiment_id=experiment_id,
emitter=emitter_config,
progress_bar=config["progress_bar"],
metadata=metadata,
profile=config["profile"],
initial_global_time=config.get("start_time", 0.0),
)
Attempted Fix: Add emit_config parameter:
engine = Engine(
...
initial_global_time=config.get("start_time", 0.0),
emit_config=config.get("emit_config", False),
)
Issue 4: Parquet Emitter Assumes agents Key in Data Structure
Error (after attempting to fix Issues 1-3):
Traceback (most recent call last):
File "ecoli/experiments/ecoli_engine_process.py", line 485, in run_simulation
colony_save_states(engine, config)
File "ecoli/experiments/ecoli_engine_process.py", line 255, in colony_save_states
engine.update(time_to_next_save)
...
File "ecoli/processes/engine_process.py", line 505, in next_update
self.emitter.emit(emit_config)
File "ecoli/library/parquet_emitter.py", line 1007, in emit
if len(data["data"]["agents"]) > 1:
KeyError: 'agents'
Possible Root Cause:
- ParquetEmitter.emit() appears to expect a data["data"]["agents"] structure (outer simulation)
- EngineProcess inner emitter seems to send cell data directly without the agents wrapper
- The inner emitter is configured via inner_emitter in the EngineProcess config
File: ecoli/library/parquet_emitter.py (line 1007)
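For illustration, the check at line 1007 could be made tolerant of a missing agents key along these lines. This is only a sketch under the assumption that emits are plain nested dicts; has_multiple_agents is a hypothetical helper, the example payloads are made up, and I do not know whether the rest of ParquetEmitter.emit() would then work with inner-emitter data:

```python
def has_multiple_agents(emit):
    """Mirror `len(data["data"]["agents"]) > 1` without raising KeyError.

    A missing "agents" key (as with the inner emitter, which seems to send
    single-cell data directly) is treated as "no multiple agents".
    """
    agents = emit.get("data", {}).get("agents", {})
    return len(agents) > 1


# Made-up payloads: an outer-simulation emit with two agents, and an
# inner-emitter emit that carries cell data without the wrapper.
outer_emit = {"data": {"agents": {"0": {"bulk": []}, "1": {"bulk": []}}}}
inner_emit = {"data": {"bulk": [], "listeners": {}}}

print(has_multiple_agents(outer_emit))
print(has_multiple_agents(inner_emit))
```

This would only avoid the KeyError itself; whether single-cell emits should be wrapped in an agents dict or handled by a separate code path is the part I am unsure about.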
Observations on Implemented Antibiotic Simulation
The tet_amp_sim.py uses a different configuration that appears to avoid these issues:
# From configs/cloud.json (inherited by antibiotics.json)
{
"emitter": "database", # MongoDB, not Parquet
"emitter_arg": {"host": "10.138.0.75:27017", "emit_limit": 4100000}
}
- MongoDB can handle arbitrary Python objects including pint.Quantity
- No JSON serialization issues with NumpyRandomState
- No agents key structure assumptions
Summary Table
| Issue | File | Line | Status |
|---|---|---|---|
| 1. RandomState serialize/deserialize mismatch | serialize.py | 78-85 | Attempted fix with ast.literal_eval() |
| 2. pint.Quantity not supported | parquet_emitter.py | 967 | Attempted fix with str(v) |
| 3. emit_config not passed to Engine | ecoli_engine_process.py | ~470 | Attempted fix by adding parameter |
| 4. Missing agents key handling | parquet_emitter.py | 1007 | UNRESOLVED |