Kathryn A McGurk1,2,3, Arianna Dagliati4, Davide Chiasserini2, Dave Lee2, Darren Plant5, Ivona Baricevic-Jones2, Janet Kelsall2, Rachael Eineman2, Rachel Reed2, Bethany Geary2, Richard D Unwin1,2, Anna Nicolaou3, Bernard D Keavney1, Anne Barton5,6, Anthony D Whetton2, Nophar Geifman4.
Abstract
MOTIVATION: Data-independent acquisition mass spectrometry allows for more comprehensive peptide detection and relative quantification than standard data-dependent approaches. Although it is less prone to missing values, they still occur. Current approaches for handling this so-called missingness have challenges. We hypothesized that non-random missingness is a useful biological measure and demonstrate the importance of analysing missingness for proteomic discovery within a longitudinal study of disease activity.
Year: 2020 PMID: 31790148 PMCID: PMC7141869 DOI: 10.1093/bioinformatics/btz898
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Summary statistics of the study participants
| Trait, mean (range) | High DA | 2° High DA | Low DA |
|---|---|---|---|
| n | 12 | 20 | 32 |
| Age (years) | 63 (21–82) | 66 (37–81) | 59 (28–81) |
| Gender | 8% male | 20% male | 19% male |
| BMI | 32.29 (22.27–50.32) | 26.61 (17.04–35.80) | 29.43 (19.71–43.70) |
Note: The mean and range for each trait that describes the 64 participants included in the proteomic analyses are shown.
BMI, body mass index; DA, disease activity; n, sample size.
BMI is measured at baseline.
Fig. 1. The assessment of missingness for each protein over collection time points and by disease activity status. (A) The correlation between protein missingness over three time points showed a strong relationship; the magnitude of missingness for each protein measured at baseline correlated with that measured at 3 months and with that at 6 months to R = 0.95 (CI = 0.94–0.96). The protein missingness measured at 3 months correlated with that measured at 6 months to R = 0.97 (CI = 0.96–0.97). (B) The correlation between protein missingness separated by response status to treatment. The magnitude of missingness for each protein identified in the high disease activity group correlated with that measured in the secondary high disease activity group to R = 0.84 (CI = 0.82–0.86), and with that measured in the low disease activity group to R = 0.82 (CI = 0.79–0.84). The protein missingness measured in the secondary high disease activity group correlated with that of the low disease activity group to R = 0.94 (CI = 0.93–0.95).
Fig. 2. Identification of an example outlier protein (in the square) as a predictor of disease activity from the assessment of missing values. The protein is identified as an outlier because of its increased missingness count in low disease activity participants when compared to both types of high disease activity groups. The protein’s missingness does not separate high disease activity participants from secondary high disease activity participants. The shaded area is a band running parallel to the linear regression line, expanded in size.
Missingness summary statistics of an outlier protein
| Time | n | HD | 2°HD | LD | n(miss) | %(miss) in LD |
|---|---|---|---|---|---|---|
| Baseline | 58 | 5 (42%) | 9 (47%) | 21 (77%) | 35 | 60% |
| 3 months | 47 | 0 (0%) | 4 (27%) | 18 (78%) | 22 | 82% |
| 6 months | 44 | 4 (57%) | 2 (13%) | 15 (71%) | 21 | 71% |
| Total | 149 | 9 (32%) | 15 (30%) | 54 (76%) | 78 | 69% |
Note: High disease participants (HD), secondary high disease participants (2°HD), and low disease participants (LD) are described. HD and 2°HD had low levels of missingness at each time point of collection, while LD showed >2-fold levels of missingness at all time points. n = count of participants at each time point; % indicates missing values of the total participants with that outcome at that time point; n(miss) = the count of total missing values at each time point; and %(miss) in LD is the % of total missingness for the outlier protein found in LD at each collection time point. The total contains a sum of the counts and the mean % in brackets.
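The n(miss) and %(miss) in LD columns follow directly from the per-group missing counts in the table. As a minimal sketch (not the authors' code; the names `missing_counts` and `summarise` are hypothetical, and the counts are copied from the table above), the arithmetic can be reproduced as follows:

```python
# Missing-value counts for the outlier protein, copied from the table above.
# Outer keys: collection time point; inner keys: disease activity group.
# The variable and function names are illustrative assumptions.
missing_counts = {
    "Baseline": {"HD": 5, "2°HD": 9, "LD": 21},
    "3 months": {"HD": 0, "2°HD": 4, "LD": 18},
    "6 months": {"HD": 4, "2°HD": 2, "LD": 15},
}


def summarise(counts):
    """Return (n_miss, pct_miss_in_ld) for one set of per-group counts."""
    n_miss = sum(counts.values())                 # total missing values
    pct_ld = round(100 * counts["LD"] / n_miss)   # share found in LD, in %
    return n_miss, pct_ld


summary = {time: summarise(counts) for time, counts in missing_counts.items()}

# Total row: sum the counts over time points, then recompute the LD share.
totals = {
    group: sum(counts[group] for counts in missing_counts.values())
    for group in ("HD", "2°HD", "LD")
}
summary["Total"] = summarise(totals)

for time, (n_miss, pct_ld) in summary.items():
    print(f"{time}: n(miss) = {n_miss}, %(miss) in LD = {pct_ld}%")
```

The sketch covers only the two count-derived columns. The bracketed percentages within each group depend on per-group participant counts at each time point, which the table does not list, so they are not recomputed here.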