Leveraging real-world biomarker data: Statistical methods for investigating missingness and longitudinal information for patient risk assessment

dc.contributor.advisor: Ickstadt, Katja
dc.contributor.author: Hunsdieck, Berit
dc.contributor.referee: Rahnenführer, Jörg
dc.date.accepted: 2025-07-07
dc.date.accessioned: 2025-07-18T05:46:05Z
dc.date.available: 2025-07-18T05:46:05Z
dc.date.issued: 2025
dc.description.abstract: The increasing reliance on big data within the pharmaceutical industry raises significant challenges related to noise and missingness, which can adversely affect data quality and the interpretation of patient outcomes. Noise arises from different sources depending on the data type: genomic and transcriptomic data are influenced by genetic and environmental factors, while proteomic and metabolomic data are affected by a wider array of variables, which complicates the extraction of meaningful insights. Missing data, whether missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR), further complicate analyses, particularly in longitudinal studies. To address these challenges, this thesis emphasises the importance of replicating real-world conditions, which enables researchers to better understand data behaviour and to develop robust statistical methodologies. The thesis comprises three papers that tackle these challenges from multiple perspectives. The first paper focusses on missingness in metabolomics data and proposes a novel clustering method that integrates missingness information, rather than relying solely on imputed values, to improve clustering accuracy. This two-step clustering procedure aims to improve patient clustering, particularly when data are MNAR, and, according to external validation measures, outperforms standard methods. The second paper addresses the complexities of using longitudinal Electronic Health Record (EHR) data for health risk assessment by employing joint models that combine longitudinal and survival data. It simulates realistic longitudinal EHR data, varying the noise level, sample size, and cohort homogeneity, to analyse how these data quality characteristics affect model performance. The findings reveal the conditions under which joint models outperform traditional Cox models in risk prediction.
The third paper explores the modelling and prediction of blood pressure trajectories in hypertensive patients using data from wearable devices. To this end, it develops a framework for simulating realistic blood pressure trajectories and uses this framework to evaluate the performance of novel statistical approaches for predicting the treatment effects of antihypertensive drugs. Together, these studies deepen the understanding of the challenges posed by noise and missingness in pharmaceutical big data and offer innovative methodologies to enhance data analysis and, ultimately, patient outcomes. [en]
dc.identifier.uri: http://hdl.handle.net/2003/43811
dc.identifier.uri: http://dx.doi.org/10.17877/DE290R-25585
dc.language.iso: en
dc.subject.ddc: 310
dc.subject.rswk: Medizinische Statistik [de] (medical statistics)
dc.subject.rswk: Ereignisdatenanalyse [de] (event data analysis)
dc.subject.rswk: Biomarker [de] (biomarkers)
dc.subject.rswk: Simulation [de]
dc.subject.rswk: Robuste Statistik [de] (robust statistics)
dc.title: Leveraging real-world biomarker data: Statistical methods for investigating missingness and longitudinal information for patient risk assessment [en]
dc.type: Text
dc.type.publicationtype: PhDThesis
dcterms.accessRights: open access
eldorado.secondarypublication: false

Files

Original bundle
Name: Dissertation_Hunsdieck.pdf
Size: 5.66 MB
Format: Adobe Portable Document Format
Description: DNB
License bundle
Name: license.txt
Size: 4.82 KB
Format: Item-specific license agreed upon to submission