Chirayu Goswami1, Michael Sheldon1, Christian Bixby1, Mehdi Keddache2, Alexander Bogdanowicz1, Yihe Wang1, Jonathan Schultz1, Jessica McDevitt1, James LaPorta1, Elaine Kwon1, Steven Buyske3, Dana Garbolino1, Glenys Biloholowski1, Alex Pastuszak4, Mary Storella1, Amit Bhalla1, Florence Charlier-Rodriguez1, Russ Hager1, Robin Grimwood1, Shareef A Nahas5.
Abstract
BACKGROUND: The Centers for Disease Control and Prevention contracted with laboratories to sequence the SARS-CoV-2 genome from positive samples across the United States to enable public health officials to investigate the impact of variants on disease severity as well as the effectiveness of vaccines and treatment. Herein we present the initial results correlating RT-PCR quality control metrics with sample collection and sequencing methods from full SARS-CoV-2 viral genomic sequencing of 24,441 positive patient samples between April and June 2021.
Keywords: COVID-19; Centers for Disease Control; Cycle threshold; Lineage; Next generation sequencing; Reverse transcription polymerase chain reaction; SARS-CoV-2; Variant
Year: 2022 PMID: 35468749 PMCID: PMC9035976 DOI: 10.1186/s12879-022-07374-7
Source DB: PubMed Journal: BMC Infect Dis ISSN: 1471-2334 Impact factor: 3.667
Fig. 1 CDC tracking of emerging variants through the pipeline for genomic surveillance (https://www.cdc.gov/coronavirus/2019-ncov/variants/cdc-role-surveillance.html). As part of the CDC National SARS-CoV-2 Strain Surveillance (NS3) System, contracted laboratories select for sequencing a set of deidentified specimens that were previously subjected to SARS-CoV-2 RT-PCR testing and determined to be positive. Generally, the samples then undergo a three-step process for generating sequence data. Specimen preparation and sequencing: SARS-CoV-2 RNA is extracted, converted to complementary DNA, enriched, and loaded into the next-generation sequencers. Sequence reads are aligned to the SARS-CoV-2 reference strain using the k-mer detection method. Aligned reads are then used to generate the consensus sequence, call variants, and determine lineage. This information, along with sequencing quality control statistics, is transferred to the CDC repository electronically. Published data are made available to scientists around the world through public repositories
Totals and percentages of samples sequenced and characteristics
| Characteristic | Value |
|---|---|
| Total samples | 24,441 |
| Sample type | |
| SA | 24,237 |
| NP | 131 |
| OP | 73 |
| Geographic location | |
| MN | 16,485 |
| NJ | 1684 |
| CA | 386 |
| Other U.S. states and territories | 5886 |
| Avg age years (range) | |
| Males | 32.35 (0–94) |
| Females | 32.17 (0–99) |
| Undisclosed | 34.23 (6–74) |
| Mean | 32.29 (0–99) |
| Sex (% samples sequenced) | |
| Male | 11,418 (46.72) |
| Female | 9450 (38.66) |
| Not identified | 3573 (14.62) |
| Ethnicity (% samples sequenced) | |
| Hispanic or Latino | 2518 (10.30) |
| Non-Hispanic or Latino | 12,112 (49.56) |
| Other/unknown/not disclosed | 9811 (40.14) |
| Vaccination status (% yes/no/unknown) | |
| Yes | 1757 (7.18) |
| No | 21,572 (88.26) |
| Unknown | 1113 (4.55) |
Fig. 2 Genome coverage and ambiguity rates between sequencing instruments. a Distribution of average coverage over genome in samples sequenced using NextSeq550 and NovaSeq6000. b Distribution of percent ambiguous nucleotides (masked) in consensus sequence. c Fraction of consensus sequence masked due to nucleotide ambiguity in samples sequenced on NextSeq550 and NovaSeq6000 instruments
Fig. 3 Sequence data quality between sample collection methods. a Average percentage of consensus sequence masked as ambiguous in the samples coming from three collection types (NP = nasopharyngeal collection, OP = oropharyngeal collection, S = saliva collection). b Mean coverage (depth) of sequencing attained for samples coming from three collection types (NP = nasopharyngeal collection, OP = oropharyngeal collection, S = saliva collection)
Fig. 4 Association between sequencing-based detection of SARS-CoV-2 virus and baseline RT-PCR Ct values. Correlation of detecting SARS-CoV-2 viral content in each sample tested with N gene RT-PCR Ct values. Samples with a lower N gene Ct value, as quantified by RT-PCR, were associated with better detection of SARS-CoV-2 than samples with a higher N gene Ct value
Mean Ct value ranges for three SARS-CoV-2 genes in detected and not detected samples
| Detection status | # Samples | Mean N gene Ct | Mean S gene Ct | Mean ORF1ab Ct |
|---|---|---|---|---|
| Not detected | 385 | 24.56 | 24.75 | 24.79 |
| Detected | 24,056 | 22.57 | 22.47 | 22.64 |
| Total | 24,441 | 22.60 | 22.54 | 22.68 |
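The "Total" row can be read as a sample-size-weighted (pooled) mean of the two group means. The minimal sketch below checks this for the N gene column only. It assumes every sample contributes an N gene Ct value; `pooled_mean` is a hypothetical helper name, not something taken from the study. Other columns may not reproduce to the last digit if some samples lack a Ct value for that gene or because the group means are themselves rounded.

```python
def pooled_mean(groups):
    """Sample-size-weighted mean of group means.

    `groups` is an iterable of (n_samples, group_mean) pairs.
    """
    groups = list(groups)
    total_n = sum(n for n, _ in groups)
    return sum(n * mean for n, mean in groups) / total_n


# (number of samples, mean N gene Ct) for the "Not detected" and
# "Detected" rows of the table above.
n_gene_groups = [(385, 24.56), (24056, 22.57)]

overall_n_gene = pooled_mean(n_gene_groups)
print(f"Pooled mean N gene Ct: {overall_n_gene:.2f}")
```

The pooled value stays close to the "Detected" mean because that group makes up the large majority of samples.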
Fig. 5 Association between Ct values and genome coverage. a Mean genome coverage of samples run on the two sequencing platforms is associated with baseline N gene Ct values of the samples. b Distribution of mean genome coverage achieved among samples from various Ct groups. c Ct values of 26.2 for NextSeq (red line) and 27.9 for NovaSeq (blue line) samples were identified as selection cutoffs resulting in high-quality sequences (mean coverage > 100×, ambiguous nucleotide fraction in consensus sequence < 10%)
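The quality criteria and platform-specific Ct cutoffs quoted in the Fig. 5 caption can be expressed as two simple checks. This is only an illustrative sketch: the `Sample` fields, the function names, and the example values are hypothetical, and treating the cutoff as inclusive (Ct at or below the threshold) is an assumption, since the caption does not say how boundary values were handled.

```python
from dataclasses import dataclass

# N gene Ct cutoffs quoted in the Fig. 5c caption, keyed by platform.
CT_CUTOFF = {"NextSeq": 26.2, "NovaSeq": 27.9}


@dataclass
class Sample:
    platform: str              # "NextSeq" or "NovaSeq"
    n_gene_ct: float           # baseline N gene RT-PCR Ct value
    mean_coverage: float       # mean genome coverage (x)
    ambiguous_fraction: float  # masked fraction of consensus, 0 to 1


def is_high_quality(sample: Sample) -> bool:
    """Caption criteria: mean coverage > 100x and ambiguous fraction < 10%."""
    return sample.mean_coverage > 100 and sample.ambiguous_fraction < 0.10


def passes_ct_cutoff(sample: Sample) -> bool:
    """Assumption: a sample is selected when its Ct is at or below the cutoff."""
    return sample.n_gene_ct <= CT_CUTOFF[sample.platform]


# Hypothetical samples, for illustration only.
good = Sample("NextSeq", n_gene_ct=22.5, mean_coverage=850.0, ambiguous_fraction=0.01)
weak = Sample("NovaSeq", n_gene_ct=30.1, mean_coverage=40.0, ambiguous_fraction=0.35)

for s in (good, weak):
    print(s.platform, passes_ct_cutoff(s), is_high_quality(s))
```

Keeping the Ct pre-selection separate from the post-sequencing quality check mirrors the caption, where the Ct value is the selection criterion and coverage and ambiguity define the outcome.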
Incidence of CDC SARS-CoV-2 variants of concern in population of samples sequenced
| Lineage | # Samples | % of total sequenced |
|---|---|---|
| B.1.1.7 | 15,806 | 65.86 |
| B.1.526 | 1330 | 5.54 |
| B.1.429 | 1015 | 4.23 |
| B.1.2 | 950 | 3.96 |
| B.1.1.519 | 730 | 3.04 |
| B.1.526.2 | 488 | 2.03 |
| P.1 | 435 | 1.81 |
| B.1.427 | 411 | 1.71 |
| B.1 | 396 | 1.65 |
Fig. 6 Trends in evolution of CDC-designated variants of concern in our dataset from April 2021 to February 2022. a A total of 161 lineages of the SARS-CoV-2 variants were detected in the study from April 2021 to June 2021. The prevalence, as a percentage of total samples tested, is indicated by each data point for the month through February 2022. The top 3 variants of concern by percentage of total samples processed are depicted above. b Phylogenetic analysis shows the prevalence of variants as a percentage of positive samples sequenced from January 2022 to March 2022. Omicron lineage BA.16 in January 2022 accounted for ~ 61% of positive samples sequenced, whereas in March 2022 99% of all variants detected were lineages of Omicron (BA.1, BA.1.1, and BA.2)