Linkage of Maternity Hospital Episode Statistics data to birth registration and notification records for births in England 2005–2014: Quality assurance of linkage of routine data for singleton and multiple births
Objectives: To quality assure a Trusted Third Party linked dataset to prepare it for analysis. Setting: Birth registration and notification records from the Office for National Statistics for all births in England 2005–2014, linked to Maternity Hospital Episode Statistics (HES) delivery records by NHS Digital using mothers’ identifiers. Participants: All 6 676 912 births that occurred in England from 1 January 2005 to 31 December 2014. Primary and secondary outcome measures: Every link between a registered birth and an HES delivery record for the study period was categorised as the same baby, a different baby to the same mother, or a wrong link, by comparing common baby data items and valid values in key fields using stepwise deterministic rules. Rates of preserved and discarded links were calculated, and the features more common in each group were assessed. Results: Ninety-eight per cent of births originally linked to HES were left with one preserved link. The majority of discarded links were due to duplicate HES delivery records. Of the 4854 discarded links categorised as wrong links, clerical checks found that 85% were false-positive links, 13% were quality assurance false negatives and 2% were undeterminable. Births linked using a less reliable stage of the linkage algorithm, births at home, births in the London region, and births with birth weight or gestational age values missing in HES were more likely to have all links discarded. Conclusions: Linkage error, data quality issues, and false negatives in the quality assurance procedure were uncovered. The procedure could be improved by allowing for transposition in date fields and by discriminating better between missing and differing values. The availability of identifiers in the datasets supported clerical checking. Other researchers using Trusted Third Party linkage should not assume that the linked dataset is error-free or optimised for their analysis, and should allow sufficient resources for quality assurance.
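The abstract does not list the deterministic rules themselves, so the following Python sketch is only an illustration of what a stepwise categorisation of a single link might look like. The field names (`date_of_birth`, `sex`, `birth_weight_g`), the rule order, and the category labels are all assumptions made for this example, not the study's actual procedure. The sketch does separate missing values from differing ones, in line with the improvement suggested in the conclusions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class BabyRecord:
    # Hypothetical subset of baby data items common to both sources.
    date_of_birth: Optional[date]
    sex: Optional[str]
    birth_weight_g: Optional[int]


def compare(a, b) -> str:
    """Return 'agree', 'differ', or 'missing' for one pair of field values."""
    if a is None or b is None:
        return "missing"
    return "agree" if a == b else "differ"


def categorise_link(registered: BabyRecord, hes: BabyRecord) -> str:
    """Apply illustrative stepwise rules to one registration-to-HES link."""
    dob = compare(registered.date_of_birth, hes.date_of_birth)
    others = (
        compare(registered.sex, hes.sex),
        compare(registered.birth_weight_g, hes.birth_weight_g),
    )

    # Step 1 (assumed): date of birth agrees and no other field conflicts.
    if dob == "agree" and "differ" not in others:
        return "same_baby"

    # Step 2 (assumed): date of birth agrees but another field conflicts,
    # which may indicate another baby of the same mother, such as a co-twin.
    if dob == "agree":
        return "different_baby_same_mother"

    # Step 3 (assumed): date of birth is missing or differs.
    return "wrong_link"
```

Under these assumed rules, a link whose HES record lacks a birth weight is still preserved if the remaining fields agree, whereas a rule that treated missing as differing would discard it.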
- Population Health