diff --git a/README.md b/README.md
index 2b5427a..cb9763f 100644
--- a/README.md
+++ b/README.md
@@ -46,10 +46,10 @@ separate to the Emap project root.
 ### Expected top-level dir structure
 ```
-├── PIXL
-├── config
-├── waveform-controller
-└── waveform-export
+├── PIXL (repo root of the PIXL repo)
+├── config (config files for the waveform project)
+├── waveform-controller (repo root for this repo)
+└── waveform-export (bind mounted by the containers; this is the main working directory for the waveform project)
 ```
 
 ### Instructions for achieving this structure
@@ -59,6 +59,11 @@ separate to the Emap project root.
 Clone this repo (`waveform-controller`) and [PIXL](https://github.com/SAFEHR-data/PIXL),
 both inside your root directory.
 
+If you are on a system that has access to sensitive data, disable push remotes on all cloned repos as follows:
+```
+git remote set-url --push origin no_push.example.com
+```
+
 #### make config files
 Set up the config files as follows:
 ```
@@ -112,6 +117,9 @@ docker compose build
 docker compose up -d
 ```
 
+For more complex deployment scenarios, such as where there is existing data you need to preserve,
+see the more advanced [deployment doc](docs/deployment.md).
+
 ## 3 Check if it's working
 
 Running the controller will save (to `../waveform-export`) waveform messages
diff --git a/docs/azure_hashing.md b/docs/azure_hashing.md
index 5db9928..d3917cf 100644
--- a/docs/azure_hashing.md
+++ b/docs/azure_hashing.md
@@ -26,6 +26,13 @@ There is a one-off (per key vault) step that needs to be performed manually.
 
 First, install the Azure CLI tools in the usual way for your OS.
 
+On the GAE you can run the AZ CLI in a container like so:
+```
+docker run --rm -e HTTPS_PROXY=$HTTPS_PROXY -it mcr.microsoft.com/azure-cli:azurelinux3.0
+```
+as per https://learn.microsoft.com/en-us/cli/azure/run-azure-cli-docker?view=azure-cli-latest
+
+
 Log in using the service principal. Do not include password on command line;
 let it prompt you and then paste it in.
```
diff --git a/docs/deployment.md b/docs/deployment.md
new file mode 100644
index 0000000..1c8940a
--- /dev/null
+++ b/docs/deployment.md
@@ -0,0 +1,142 @@
+# Notes on deployment into production
+
+## About
+
+This document is intended for deployers of the Waveform export pipeline,
+who likely overlap with its developers.
+
+It describes how to deploy the system into production, and especially how to
+"upgrade", ie. re-deploy in an environment where some processing has already taken place.
+
+Because this project required changes to Emap (mainly the waveform-reader),
+it also covers when Emap itself might need to be upgraded.
+
+## Background
+The current situation is that we are running an instance
+of Emap on `star_dev` that is independent of the "live" versions
+on `star_a` and `star_b`, because the waveform export pipeline
+is dependent on software changes to Emap and we don't have time
+to deploy those changes into Emap main on our schedule
+(rebuilding the database takes ~12 weeks now).
+
+Although the waveform controller queries `star_[ab]`, not `star_dev`, we still keep
+a full Emap system running on `star_dev` so that the streamlit visualisation
+can run.
+
+This document should be read in conjunction with the
+[pipeline diagram](https://github.com/SAFEHR-data/emap/blob/develop/docs/technical_overview/waveforms/pipeline.md).
+
+## How to rebuild the system
+
+It depends on what you have changed! You could take the
+sledgehammer approach, which is rather similar to
+[the initial setup in the main README](../README.md):
+* Emap: `emap docker down --volumes` to take down the containers and delete the rabbitmq data
+* Delete all Emap tables in `star_dev` as per the Emap deployment instructions.
+* Waveform: `docker compose down` to bring everything down
+* git pull and rebuild containers for the two repos.
+* Change config if necessary
+* Bring it all up again
+
+This is mostly going to be unnecessary though, because eg.
the
+Emap ADT processing is unlikely to have changed.
+
+Let's go for a more granular approach. Each step is potentially
+optional, so read carefully.
+
+### Stop the Emap waveform-reader
+> [!TIP]
+> Refer to the Emap deployment guide at
+> https://github.com/SAFEHR-data/emap/blob/main/docs/SOP/release_procedure.md
+
+If you have made changes to the way we receive waveform HL7
+messages, you should stop this container with `emap docker stop waveform-reader`.
+
+This can take a while, because it will flush any in-memory HL7
+data to disk.
+
+The reader will then stop listening on port 7777, and in the absence of buffering
+on the Smartlinx server, we are now losing waveform data forever, so
+try to minimise the amount of time it's in this state.
+See https://github.com/SAFEHR-data/emap/issues/135 re buffering.
+
+Check out the code you wish to deploy with eg. `(cd emap && git pull)`.
+
+Build the new version of the waveform-reader image with
+`emap docker build waveform-reader`.
+
+Does any config need updating? See if any config params
+have been added/removed
+from the Emap global config, and re-run `emap setup -g` as appropriate.
+
+### Drain the rabbitmq queues
+Observe the `waveform_emap` and `waveform_export` queues in rabbitmq.
+They are consumed by Emap core and waveform-controller respectively.
+
+We stopped incoming messages in the previous step, but the queues
+probably still contain messages that were generated with the old version of
+waveform-reader, so we must decide what to do with them.
+
+One option is to wait for those consumers to finish their jobs and empty the queues.
+
+If for some reason the consumers are not running or are malfunctioning (perhaps
+they are rejecting and requeueing the messages), then another option is to purge one
+or both queues in the rabbitmq admin console.
+
+If the rabbitmq topology has changed, you might consider bringing down the entire
+rabbitmq container and deleting its data volume.
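The drain-or-purge decision above can also be made from the command line rather than the admin console. This is a hedged sketch using `rabbitmqctl` inside the rabbitmq container; the container name `rabbitmq` is an assumption, so substitute whatever your compose setup actually calls it.

```shell
# Check queue depths and consumer counts before deciding whether
# to wait for the consumers to drain the queues or to purge them.
# ASSUMPTION: the rabbitmq container is named "rabbitmq".
docker exec rabbitmq rabbitmqctl list_queues name messages consumers

# If the stale messages should be discarded, purge the queues.
# This is irreversible, so only do it once you are sure the
# old-format messages are not needed.
docker exec rabbitmq rabbitmqctl purge_queue waveform_emap
docker exec rabbitmq rabbitmqctl purge_queue waveform_export
```

If the queues live on a non-default vhost, add `-p <vhost>` to each `rabbitmqctl` call.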
+
+
+### Emap DB and core processor
+Less likely, you may have changed the Emap core processor or the
+Emap star database.
+
+If so, you will want to stop and rebuild the `core` service:
+```
+emap docker stop core
+emap docker build core
+```
+(we will bring it back up later)
+
+We don't have a framework for doing migrations when the database schema has changed, so
+any migrations would have to be done on an ad hoc basis.
+That's why we tend to delete the entire database and rebuild it.
+
+However, because no other tables depend on the `waveform` table
+(ie. it is a "leaf" of the database schema),
+it would be relatively easy to delete only that table and let hibernate rebuild it,
+thus avoiding a full rebuild.
+When the core service comes back up, it would continue to update the non-waveform data.
+
+### Waveform controller/exporter (ie. this repo)
+
+You may need to delete files in the host directory `waveform-export`, which
+is bind mounted by the `waveform-controller` and `waveform-exporter` containers.
+
+Snakemake won't regenerate files if the timestamps of upstream
+files suggest they don't need updating. Therefore, if you have made
+a change that would affect the contents of those files and wish to
+force a re-processing, you will need to delete them manually.
+
+To force a re-upload only, delete files in `ftps-logs`.
+
+To force a reconversion from CSV to parquet (which includes pseudonymisation),
+delete files in `pseudonymised` and `hash-lookups`.
+
+Files in `original-csv` are produced by the waveform-controller.
+If you need to regenerate those,
+you will need to replay HL7 messages (see the later section).
+
+### Bring it all back up
+It shouldn't matter what order things are brought back up in, so let's do it in the same order
+they were brought down.
+
+Bring up any Emap services that we brought down.
+Emap repo: `emap docker up -d`
+
+Bring up the waveform controller/export if you brought them down.
+Waveform repo: `docker compose up -d`
+
+### Replay old HL7 data
+
+Not yet supported; see https://github.com/SAFEHR-data/emap/issues/139
diff --git a/docs/develop.md b/docs/develop.md
index e970eac..8ee4c50 100644
--- a/docs/develop.md
+++ b/docs/develop.md
@@ -70,6 +70,19 @@ git commit -m "Making pre-commit pass."
 git push
 ```
 
+## Dev tips
+
+`waveform-exporter` normally runs via cron once per 24 hours. This is not very convenient for dev!
+You can either set the cron frequency to every minute (`* * * * *`) and bring up the service in
+the normal way, or manually run the one-shot command below when required.
+
+```
+# make sure hasher is up first
+docker compose up -d waveform-hasher
+# run
+docker compose run --build --entrypoint /app/exporter-scripts/scheduled-script.sh waveform-exporter
+```
+
 ## Testing
 
 Even though we are largely running in docker, you may wish to let your IDE
 have access to a venv for running tests in.
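The local venv mentioned in the Testing section can be set up roughly as below. This is a sketch: how this repo actually declares its test dependencies (requirements file vs. package extras) is an assumption, so the install line is left as a commented placeholder.

```shell
# Create a venv that your IDE (and local pytest runs) can use,
# in the repo root so the IDE picks it up automatically.
python3 -m venv .venv
. .venv/bin/activate

# ASSUMPTION: install the test dependencies however this repo
# actually specifies them, eg. one of:
#   pip install -r requirements.txt
#   pip install -e ".[test]"
# then run the tests with:
#   pytest
```

Point your IDE's interpreter at `.venv/bin/python` so tests run against the same environment.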