Show simple item record

dc.contributor.author: Carrasco-Zanini, J
dc.contributor.author: Pietzner, M
dc.contributor.author: Koprulu, M
dc.contributor.author: Wheeler, E
dc.contributor.author: Kerrison, ND
dc.contributor.author: Wareham, NJ
dc.contributor.author: Langenberg, C
dc.date.accessioned: 2024-07-17T09:57:26Z
dc.date.available: 2024-04-19
dc.date.available: 2024-07-17T09:57:26Z
dc.date.issued: 2024-06-19
dc.identifier.citation: Julia Carrasco-Zanini, Maik Pietzner, Mine Koprulu, Eleanor Wheeler, Nicola D Kerrison, Nicholas J Wareham, Claudia Langenberg, Proteomic prediction of diverse incident diseases: a machine learning-guided biomarker discovery study using data from a prospective cohort study, The Lancet Digital Health, Volume 6, Issue 7, 2024, Pages e470-e479, ISSN 2589-7500, https://doi.org/10.1016/S2589-7500(24)00087-6. (https://www.sciencedirect.com/science/article/pii/S2589750024000876) Abstract: Summary Background Broad-capture proteomic technologies have the potential to improve disease prediction, enabling targeted prevention and management, but studies have so far been limited to very few selected diseases and have not evaluated predictive performance across multiple conditions. We aimed to evaluate the potential of serum proteins to improve risk prediction over and above health-derived information and polygenic risk scores across a diverse set of 24 outcomes. Methods We designed multiple case-cohorts nested in the EPIC-Norfolk prospective study, from participants with available serum samples and genome-wide genotype data, with more than 32 974 person-years of follow-up. Participants were middle-aged individuals (aged 40–79 years at baseline) of European ancestry who were recruited from the general population of Norfolk, England, between March, 1993 and December, 1997. We selected participants who developed one of ten less common diseases within 10 years of follow-up; we also subsampled a randomly drawn control subcohort, which also served to investigate 14 more common outcomes (n>70), including all-cause premature mortality (death before the age of 75 years; case numbers 71–437; controls 608–1556). Individuals were excluded from the current study owing to failed genotyping or proteomic quality control, relatedness, or missing information on age, sex, BMI, or smoking status.
We used a machine learning framework to derive sparse predictive protein models for the onset of the 23 individual diseases and all-cause premature mortality, and to derive a single common sparse multimorbidity signature that was predictive across multiple diseases from 2923 serum proteins. Findings Participants who developed one of ten less common diseases within 10 years of follow-up included 482 women and 507 men, with a mean age at baseline of 64·56 years (8·08). The random subcohort included 990 women and 769 men, with a mean age of 58·79 years (9·31). As few as five proteins alone outperformed polygenic risk scores for 17 of 23 outcomes (median difference in concordance index [C-index] 0·13 [0·10–0·17]) and improved predictive performance when added over basic patient-derived information models for seven outcomes, achieving a median C-index of 0·82 (IQR 0·77–0·82). This included diseases with poor prognosis such as lung cancer (C-index 0·85 [+/− cross-validation error 0·83–0·87]), for which we identified unreported biomarkers such as C-X-C motif chemokine ligand 17. A sparse multimorbidity signature of ten proteins improved prediction across seven outcomes over patient-derived information models, achieving performances (median C-index 0·81 [IQR 0·80–0·82]) similar to those of disease-specific signatures. Interpretation We show the value of broad-capture proteomic biomarker discovery studies across multiple diseases of diverse causes, pointing to those that might benefit the most from proteomic approaches, and the potential to derive common sparse biomarker panels for prediction of multiple diseases at once. This framework could enable follow-up studies to explore the generalisability of proteomic models and to benchmark these against clinical assays, which are required to understand the translational potential of these findings.
Funding Medical Research Council, Health Data Research UK, UK Research and Innovation–National Institute for Health and Care Research, Cancer Research UK, and Wellcome Trust. (en_US)
dc.identifier.uri: https://qmro.qmul.ac.uk/xmlui/handle/123456789/98189
dc.description.abstract: BACKGROUND: Broad-capture proteomic technologies have the potential to improve disease prediction, enabling targeted prevention and management, but studies have so far been limited to very few selected diseases and have not evaluated predictive performance across multiple conditions. We aimed to evaluate the potential of serum proteins to improve risk prediction over and above health-derived information and polygenic risk scores across a diverse set of 24 outcomes. METHODS: We designed multiple case-cohorts nested in the EPIC-Norfolk prospective study, from participants with available serum samples and genome-wide genotype data, with more than 32 974 person-years of follow-up. Participants were middle-aged individuals (aged 40-79 years at baseline) of European ancestry who were recruited from the general population of Norfolk, England, between March, 1993 and December, 1997. We selected participants who developed one of ten less common diseases within 10 years of follow-up; we also subsampled a randomly drawn control subcohort, which also served to investigate 14 more common outcomes (n>70), including all-cause premature mortality (death before the age of 75 years; case numbers 71-437; controls 608-1556). Individuals were excluded from the current study owing to failed genotyping or proteomic quality control, relatedness, or missing information on age, sex, BMI, or smoking status. We used a machine learning framework to derive sparse predictive protein models for the onset of the 23 individual diseases and all-cause premature mortality, and to derive a single common sparse multimorbidity signature that was predictive across multiple diseases from 2923 serum proteins. FINDINGS: Participants who developed one of ten less common diseases within 10 years of follow-up included 482 women and 507 men, with a mean age at baseline of 64·56 years (8·08). The random subcohort included 990 women and 769 men, with a mean age of 58·79 years (9·31).
As few as five proteins alone outperformed polygenic risk scores for 17 of 23 outcomes (median difference in concordance index [C-index] 0·13 [0·10-0·17]) and improved predictive performance when added over basic patient-derived information models for seven outcomes, achieving a median C-index of 0·82 (IQR 0·77-0·82). This included diseases with poor prognosis such as lung cancer (C-index 0·85 [+/- cross-validation error 0·83-0·87]), for which we identified unreported biomarkers such as C-X-C motif chemokine ligand 17. A sparse multimorbidity signature of ten proteins improved prediction across seven outcomes over patient-derived information models, achieving performances (median C-index 0·81 [IQR 0·80-0·82]) similar to those of disease-specific signatures. INTERPRETATION: We show the value of broad-capture proteomic biomarker discovery studies across multiple diseases of diverse causes, pointing to those that might benefit the most from proteomic approaches, and the potential to derive common sparse biomarker panels for prediction of multiple diseases at once. This framework could enable follow-up studies to explore the generalisability of proteomic models and to benchmark these against clinical assays, which are required to understand the translational potential of these findings. FUNDING: Medical Research Council, Health Data Research UK, UK Research and Innovation-National Institute for Health and Care Research, Cancer Research UK, and Wellcome Trust. (en_US)
dc.format.extent: e470 - e479
dc.language: eng
dc.publisher: Elsevier (en_US)
dc.relation.ispartof: Lancet Digit Health
dc.rights: Published by Elsevier Ltd. This is an Open Access article under the CC BY 4.0 license.
dc.subject: Humans (en_US)
dc.subject: Middle Aged (en_US)
dc.subject: Machine Learning (en_US)
dc.subject: Male (en_US)
dc.subject: Female (en_US)
dc.subject: Prospective Studies (en_US)
dc.subject: Biomarkers (en_US)
dc.subject: Proteomics (en_US)
dc.subject: Aged (en_US)
dc.subject: Adult (en_US)
dc.subject: England (en_US)
dc.subject: Risk Assessment (en_US)
dc.subject: Risk Factors (en_US)
dc.title: Proteomic prediction of diverse incident diseases: a machine learning-guided biomarker discovery study using data from a prospective cohort study. (en_US)
dc.type: Article (en_US)
dc.rights.holder: © 2024 The Author(s).
dc.identifier.doi: 10.1016/S2589-7500(24)00087-6
pubs.author-url: https://www.ncbi.nlm.nih.gov/pubmed/38906612 (en_US)
pubs.issue: 7 (en_US)
pubs.notes: Not known (en_US)
pubs.publication-status: Published (en_US)
pubs.volume: 6 (en_US)
dcterms.dateAccepted: 2024-04-19
rioxxterms.funder: Default funder (en_US)
rioxxterms.identifier.project: Default project (en_US)
qmul.funder: Multimorbidity Mechanism and Therapeutics Research Collaborative::UK Research and Innovation (en_US)
rioxxterms.funder.project: b215eee3-195d-4c4f-a85d-169a4331c138 (en_US)