rw251 (Contributor) commented on Nov 28, 2025
- APCS report now gets the primary diagnosis counts in addition to the ones from the "all diagnoses" field.
- New output that does a similar thing for ONS deaths: one count for the primary cause of death, and another for all the supplementary cause of death codes.
- A validation script that reports any financial years or ICD10 codes that don't match our expected regex. These might be OK, but the report should be a quick way to spot anything unusual, and it eliminates the possibility of outputting erroneous patient identifiers from the large all diagnoses field.
- Now gives separate counts for occurrences in primary diagnosis and the all diagnoses field
Query to count the occurrences of ICD10 codes in the ONS death data. Counts the primary diagnosis, and the contributing factors separately. Rounds counts to 10, suppresses values <15, and excludes type 1 opt outs.
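The rounding and suppression rule described above can be sketched in Python for illustration. This is not the query in this PR; the function name `disclose`, the order of operations (suppress on the raw count, then round) and the round-half-up tie-breaking are all assumptions.

```python
def disclose(count, threshold=15, base=10):
    """Return None for a suppressed count, else the count rounded to the nearest `base`.

    Assumptions (not confirmed by the PR): suppression is applied to the raw
    count before rounding, and ties round up (so 15 becomes 20).
    """
    if count < threshold:
        return None  # suppressed: too small to release
    # Integer arithmetic avoids the banker's rounding used by Python's round()
    return ((count + base // 2) // base) * base


raw_counts = {"I21": 1234, "J45": 14, "E11": 15}
released = {code: disclose(n) for code, n in raw_counts.items()}
```

Here `raw_counts` is made-up data; only codes with a raw count of 15 or more get a non-`None` value in `released`.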
It occurs to me that we might get unexpected data. Validating this would be hard for the output checkers, so let's add a report to help us:
- The financial_year column in apcs might contain unexpected data, as I don't think anyone has used it before. It should be in the format "202425" or maybe "2024-25", so if it isn't, we report it.
- The ICD10 codes in ONS have, I think, already been validated, but I'm not 100% sure, and the ones in the all_diagnoses field probably haven't. So we check each one against a regex and report on those that don't match. Some of these may be valid, in which case we can update the regex. But this removes the risk that patient-identifiable data somehow appears in that field.
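A minimal sketch of the financial year check, assuming the two formats mentioned above ("202425" and "2024-25"). The names `FINANCIAL_YEAR_PATTERN` and `is_expected_financial_year` are hypothetical, and the extra check that the two-digit suffix follows the start year is an assumption that goes beyond what the commit message describes.

```python
import re

# Four-digit start year, optional hyphen, two-digit end year (e.g. 202425 or 2024-25)
FINANCIAL_YEAR_PATTERN = re.compile(r"([0-9]{4})-?([0-9]{2})")


def is_expected_financial_year(value):
    """Return True if `value` looks like a financial year such as 202425 or 2024-25."""
    match = FINANCIAL_YEAR_PATTERN.fullmatch(value.strip())
    if match is None:
        return False
    start_year, end_suffix = match.groups()
    # Assumption: the suffix must be the year immediately after the start year
    return (int(start_year) + 1) % 100 == int(end_suffix)


values = ["202425", "2024-25", "2024/25", "202426", "NHS1234567"]
unexpected = [v for v in values if not is_expected_financial_year(v)]
```

Anything left in `unexpected` is what a report like this would surface for a human to look at.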
Jongmassey approved these changes on Dec 2, 2025 and left a comment:
Couple of minor nits.
Also, having re-read the docs, there's a secondary diagnosis as well - we should probably take the opportunity to include this for completeness. The information as to how often primary/secondary appear in "all" or not is very useful on its own!
# - A000 or A00X (4 chars without dot)
# - A00.00 or A00.0X (6 chars with dot)
# - A0000 or A000X (5 chars without dot)
ICD10_PATTERN = re.compile(r"^[A-Z][0-9]{2}\.?[0-9X]?[0-9]?$")
This seems to correctly match A00X0 and A00.X0 so I think the comment is wrong?
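A quick way to check this is to run the pattern from the diff against the examples in the code comment. The pattern below is copied from the diff; the `candidates` list is just an illustrative selection.

```python
import re

# Pattern copied verbatim from the diff under review
ICD10_PATTERN = re.compile(r"^[A-Z][0-9]{2}\.?[0-9X]?[0-9]?$")

# Examples from the code comment, plus the X-in-fourth-position variants
candidates = ["A000", "A00X", "A0000", "A000X", "A00X0", "A00.00", "A00.0X", "A00.X0"]

results = {code: bool(ICD10_PATTERN.match(code)) for code in candidates}
for code, matched in results.items():
    print(f"{code:8} {'match' if matched else 'no match'}")
```

With this pattern the X is only accepted in the fourth position, so A00X0 and A00.X0 match while A000X and A00.0X do not, which is the opposite of what the comment lists.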
Der_Financial_Year,
apcs.APCS_Ident,
apcs.Der_Financial_Year,
LTRIM(RTRIM(der.Spell_Primary_Diagnosis)) as primary_diagnosis,
der.Spell_Secondary_Diagnosis too for good measure?
Code indicating secondary diagnosis. This is a single code giving the first listed secondary diagnosis, but there may be other secondary diagnoses listed in the all_diagnoses field below.