SAFEHR-data · jeremyestein · Feb 20, 2026 · Jan 30, 2026 · Jan 30, 2026 · Feb 16, 2026
diff --git a/README.md b/README.md
@@ -46,10 +46,10 @@ separate to the Emap project root.
 
 ### Expected top-level dir structure
 ```
-├── PIXL
-├── config
-├── waveform-controller
-└── waveform-export
+├── PIXL (repo root of the PIXL repo)
+├── config (config files for the waveform project)
+├── waveform-controller (repo root for this repo)
+└── waveform-export (bind mounted by the containers, this is the main working directory for the waveform project)
 ```
 
 ### Instructions for achieving this structure
@@ -59,6 +59,11 @@ separate to the Emap project root.
 Clone this repo (`waveform-controller`) and [PIXL](https://github.com/SAFEHR-data/PIXL),
 both inside your root directory.
 
+If on a system that has access to sensitive data, disable push remotes on all cloned repos as follows:
+```
+git remote set-url --push origin no_push.example.com
+```
+
 #### make config files
 Set up the config files as follows:
 ```
@@ -112,6 +117,9 @@ docker compose build
 docker compose up -d
 ```
 
+For more complex deployment scenarios, such as where there is existing data you need to preserve,
+see the more advanced [deployment doc](docs/deployment.md).
+
 ## 3 Check if it's working
 
 Running the controller will save (to `../waveform-export`) waveform messages

diff --git a/docs/azure_hashing.md b/docs/azure_hashing.md
@@ -26,6 +26,13 @@ There is a one-off (per key vault) step that needs to be performed manually.
 
 First, install the Azure CLI tools in the usual way for your OS.
 
+On the GAE you can run the Azure CLI in a container like so:
+```
+docker run --rm -e HTTPS_PROXY=$HTTPS_PROXY -it mcr.microsoft.com/azure-cli:azurelinux3.0
+```
+as per https://learn.microsoft.com/en-us/cli/azure/run-azure-cli-docker?view=azure-cli-latest
+
+
 Log in using the service principal.
 Do not include password on command line; let it prompt you and then paste it in.
 ```

diff --git a/docs/deployment.md b/docs/deployment.md
@@ -0,0 +1,142 @@
+# Notes on deployment into production
+
+## About
+
+This document is intended for deployers of the Waveform export pipeline,
+who likely overlap with its developers.
+
+It describes how to deploy the system into production, and especially how to
+"upgrade", i.e. re-deploy in an environment where some processing has already taken place.
+
+Because this project required changes to Emap (mainly the waveform-reader),
+it also covers when that might need to be upgraded.
+
+## Background
+We currently run an instance
+of Emap on `star_dev` that is independent of the "live" versions
+on `star_a` and `star_b`. The waveform export pipeline
+depends on software changes to Emap, and we don't have time
+to deploy those changes into Emap main on our schedule,
+because rebuilding the database now takes ~12 weeks.
+
+Although the waveform controller queries `star_[ab]`, not `star_dev`, we still keep
+a full Emap system running on `star_dev` so that the streamlit visualisation
+can run.
+
+This document should be read in conjunction with the
+[pipeline diagram](https://github.com/SAFEHR-data/emap/blob/develop/docs/technical_overview/waveforms/pipeline.md).
+
+## How to rebuild the system
+
+It depends on what you have changed! You could take the
+sledgehammer approach, which is rather similar to
+[the initial setup in the main README](../README.md):
+* Emap: `emap docker down --volumes` to take down the containers and delete the rabbitmq data
+* Delete all Emap tables in `star_dev` as per Emap deployment instructions.
+* Waveform: `docker compose down` to bring everything down
+* git pull and rebuild containers for the two repos.
+* Change config if necessary
+* Bring it all up again
+
+This will mostly be unnecessary though, because e.g. the
+Emap ADT processing is unlikely to have changed.
+
+Let's go for a more granular approach. Each step is potentially
+optional, so read carefully.
+
+### Stop the Emap waveform-reader
+> [!TIP]
+> Refer to the Emap deployment guide at
+> https://github.com/SAFEHR-data/emap/blob/main/docs/SOP/release_procedure.md
+
+If you have made changes to the way we receive waveform HL7
+messages, you should stop this container with `emap docker stop waveform-reader`.
+
+This can take a while, because it will flush out any HL7
+data in memory to disk.
+
+Once stopped, nothing is listening on port 7777. In the absence of buffering
+on the Smartlinx server, waveform data sent during this time is lost forever, so
+try to minimise the amount of time it's in this state.
+See https://github.com/SAFEHR-data/emap/issues/135 re buffering.
+
+Check out the code you wish to deploy with e.g. `(cd emap && git pull)`.
+
+Build the new version of the waveform-reader image with
+`emap docker build waveform-reader`.
+
+Does any config need updating? See if any config params
+have been added/removed
+from the Emap global config, and re-run `emap setup -g` as appropriate.
+
+### Drain the rabbitmq queues
+Observe the `waveform_emap` and `waveform_export` queues in rabbitmq.
+They are consumed by Emap core and waveform-controller respectively.
+
+We stopped incoming messages in the previous step, but the queues
+probably still contain messages that were generated with the old version of
+waveform-reader, so we must decide what to do with them.
+
+One option is to wait for those consumers to finish their jobs and empty the queues.
+
+If for some reason the consumers are not running or are malfunctioning (perhaps
+they are rejecting and requeueing the messages), then another option is to purge one
+or both queues in the rabbitmq admin console.
+
+If the rabbitmq topology has changed, you might consider bringing down the entire
+rabbitmq container and deleting its data volume.
+
+
+### Emap DB and core processor
+Less likely, you may have changed the Emap core processor or the
+Emap star database.
+
+If so, you will want to stop and rebuild the `core` service:
+```
+emap docker stop core
+emap docker build core
+```
+(we will bring it back up later)
+
+We don't have a framework for doing migrations when the database schema has changed, so
+any migrations would have to be done on an ad hoc basis.
+That's why we tend to delete the entire database and rebuild it.
+
+However, because no other tables depend on the `waveform` table
+(i.e. it is a "leaf" of the database schema),
+it would be relatively easy to delete only that table and let Hibernate rebuild it,
+thus avoiding a full rebuild.
+When the core service comes back up, it would continue to update the non-waveform data.
+
+### Waveform controller/exporter (i.e. this repo)
+
+You may need to delete files in the host directory `waveform-export`, which
+is bind mounted by the `waveform-controller` and `waveform-exporter` containers.
+
+Snakemake won't regenerate files if the timestamps of upstream
+files suggest they don't need updating. Therefore, if you have made
+a change that would affect the contents of those files and wish to
+force a re-processing, you will need to manually delete those files.
+
+To force a re-upload only, delete files in `ftps-logs`.
+
+To force a reconversion from CSV to parquet (which includes pseudonymisation),
+delete files in `pseudonymised` and `hash-lookups`.
+
+Files in `original-csv` are produced by the waveform-controller.
+If you need to regenerate those,
+you will need to replay HL7 messages (see later section).
+
+### Bring it all back up
+It shouldn't matter what order things are brought back up in, so let's use the same order
+they were brought down in.
+
+Bring up any Emap services that we brought down:
+Emap repo: `emap docker up -d`
+
+Bring up the waveform controller/export if you brought them down.
+Waveform repo: `docker compose up -d`
+
+### Replay old HL7 data
+
+Not yet supported, see https://github.com/SAFEHR-data/emap/issues/139
diff --git a/docs/develop.md b/docs/develop.md
@@ -70,6 +70,19 @@ git commit -m "Making pre-commit pass."
 git push
 ```
 
+## Dev tips
+
+`waveform-exporter` normally runs via cron once per 24 hours. This is not very convenient for dev!
+You can either set the cron frequency to every minute (`* * * * *`) and bring up the service in
+the normal way, or manually run the one-shot command below when required.
+
+```
+# make sure hasher is up first
+docker compose up -d waveform-hasher
+# run
+docker compose run --build --entrypoint /app/exporter-scripts/scheduled-script.sh waveform-exporter
+```
+
 ## Testing
 
 Even though we are largely running in docker, you may wish to let your IDE have access to a venv for running tests in.