Genomic characterisation reveals a dominant lineage of SARS-CoV-2 in Papua New Guinea.

Theresa Palou, Mathilda Wilmot¹, Sebastian Duchene², Ashleigh Porter², Janlyn Kemoi³, Dagwin Suarkia⁴, Patiyan Andersson¹, Anne Watt¹, Norelle Sherry¹, Torsten Seemann¹, Michelle Sait¹, Charlie Turharus⁵, Son Nguyen⁶, Sanmarié Schlebusch⁶, Craig Thompson⁶, Jamie McMahon⁶, Stefanie Vaccher⁷, Chantel Lin¹, Danoi Esoram⁸, Benjamin P Howden¹, Melinda Susapu⁸.

Abstract

The coronavirus disease pandemic has highlighted the utility of pathogen genomics as a key part of a comprehensive public health response to emerging infectious disease threats; however, the ability to generate, analyse, and respond to pathogen genomic data varies around the world. Papua New Guinea (PNG), which has limited in-country capacity for genomics, has experienced significant outbreaks of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), with initial genomics data indicating that a large proportion of cases were from lineages that are not well defined within the current nomenclature. Through a partnership between in-country public health agencies and academic organisations, industry, and a public health genomics reference laboratory in Australia, a system for routine SARS-CoV-2 genomics from PNG was established. Here we aim to characterise and describe the genomics of PNG's second wave and examine the sudden expansion of a lineage that is not well defined but very prevalent in the Western Pacific region. We generated 1,797 sequences from cases in PNG and performed phylogenetic and phylodynamic analyses to examine the outbreak and characterise the circulating lineages and clusters present. Our results reveal the rapid expansion of the B.1.466.2 and related lineages within PNG, from multiple introductions into the country. We also highlight the difficulties that unstable lineage assignment causes when using genomics to assist with rapid cluster definitions.

Keywords: AU.1; AU.3; B.1.459; B.1.466.2; PNG Covid-19; PNG SARS-CoV-2; PNG genomic sequencing; PNG lineage; Pacific Islands SARS-CoV-2; Pacific lineage; Papua New Guinea SARS-CoV-2; genomic sequencing capacity

Year: 2022 PMID: 35875697 PMCID: PMC9278129 DOI: 10.1093/ve/veac033

Source DB: PubMed Journal: Virus Evol ISSN: 2057-1577

Introduction

Coronavirus disease (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has resulted in 2.7 million infections and over 40,000 deaths across the Western Pacific region (World Health Organisation 2021b). Papua New Guinea (PNG) was one of the first countries in the region to report a COVID-19 case, in March 2020, and had 21,896 reported cases and 243 deaths as of 6 October 2021 (World Health Organisation 2021b). PNG experienced its first wave of infection and community transmission from April 2020, with the PNG Government moving rapidly to implement a range of public health measures, resulting in successful reduction and control of the first wave by August 2020 (The World Bank 2021). Despite this, a rapid increase in COVID-19 cases was detected in PNG in early 2021, resulting in a second wave of infection that saw confirmed cases rise from 1,583 at the start of March 2021 to 17,774 by the end of July, even with renewed public health control measures.

PNG has a population of approximately 8.8 million people living across 22 provinces on the mainland and islands, with 87 per cent of Papua New Guineans living in rural areas. The geographical spread of the population creates significant logistical challenges for diagnostic testing and for epidemiological investigation to monitor the introduction and transmission of lineages and to survey disease trends over time. Access to diagnostic testing has been variable across the country and hampered by staffing and logistical issues (Smaghi et al. 2021), limiting the ability to monitor disease spread and rapidly implement public health measures to contain it. Cases detected and samples collected for sequencing by PNG’s health system therefore come predominantly from the National Capital District, which encompasses the capital, Port Moresby, and from the most populous province, Morobe, where the country’s second-largest city, Lae, is located (Fig. 1, Table 1). The majority of cases reported in the country, however, are identified through private testing carried out by Ok Tedi Mining Ltd, based in the Western Province. As such, the distribution of cases in PNG is heavily biased towards the Western Province, even though it is home to only 2.8 per cent of the population.

Figure 1.

Map of PNG showing administrative provinces and the proportion of samples originating from each, in this dataset.

Table 1.

PNG samples sent to Australia for sequencing by province of collection, with the proportion of the population that resides in each province for comparison.

Region	Samples sent for sequencing, n (%)	Population (% of PNG total)^a
Highlands Provinces
Eastern Highlands	1 (0.03%)	8.00%
Enga	6 (0.2%)	5.90%
Hela	27 (0.9%)	3.40%
Jiwaka	6 (0.3%)	4.70%
Simbu (Chimbu)	38 (1.3%)	5.20%
Southern Highlands	41 (1.4%)	7.00%
Western Highlands	41 (1.4%)	5.00%
Momase Region
East Sepik	1 (0.03%)	6.20%
Madang	1 (0.03%)	6.80%
Morobe	137 (4.6%)	9.30%
Sandaun (West Sepik)	0	3.40%
Southern Region
Central	36 (1.2%)	3.70%
Gulf	15 (0.5%)	2.20%
Milne Bay	0	3.80%
National Capital District	496 (16.6%)	5.00%
Northern Province (Oro)	1 (0.03%)	2.60%
Western Province	1812 (60.8%)	2.80%
Island Regions
Bougainville (Autonomous Region)	9 (0.3%)	3.40%
East New Britain	95 (3.2%)	4.50%
Manus	3 (0.1%)	0.80%
New Ireland	16 (0.5%)	2.70%
West New Britain	18 (0.6%)	3.60%

^a Based on 2011 census data (National Statistical Office of Papua New Guinea 2011).

The current typing nomenclature for SARS-CoV-2 involves the assignment of lineages that reflect evolutionary relationships and are hierarchically organised following the phylogenetic tree structure. This nomenclature system describes major lineages with letters of the alphabet (e.g. A, B, etc.), with sub- and sub-sub-lineages being numbered and separated by dots (‘.’). Thus, sub-lineage B.1.466.2 is contained within sub-lineage B.1.466, which is itself part of lineage B.1, whose direct parent is lineage B. For readability, only three sub-levels are recorded under this nomenclature system, and sub-lineages beyond this level are shortened with aliases that use the next available alphabetical prefix. For instance, B.1.466.2.1 has been assigned the alias AU.1.

A PANGO lineage of SARS-CoV-2 may be designated as a variant of concern (VOC) if there is evidence for epidemiological, pathological, or immunological features of concern (Public Health England 2021). VOCs may be designated by international bodies, or observed and designated locally. Currently, the WHO classifies four lineages as VOCs: B.1.1.7, B.1.351 (and sub-lineages), P.1, and B.1.617.2 (World Health Organisation 2021a). All four variants display an unusually high number of mutations, including a number of variations in the genomic region encoding the spike protein that are thought to have the potential to increase transmissibility or confer immune evasion properties. Emerging VOCs and rapid virus evolution require access to genomic surveillance to support the control and management of the pandemic.
Genomic sequencing of SARS-CoV-2 allows for detection and identification of new and emerging lineages and VOCs, assists with the identification of outbreaks and transmission events to contribute to public health interventions, and allows for an estimate of trends and expansion of disease spread. Here, we aim to characterise the circulating SARS-CoV-2 lineages in PNG and describe the dynamics of a genomic dataset that is unique in the region.
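The alias scheme described above can be illustrated with a minimal Python sketch. The only alias taken from the text is AU (AU.1 = B.1.466.2.1); the `ALIASES` table and the function names `expand` and `ancestors` are hypothetical names chosen for illustration, and a real alias table would contain many more prefixes.

```python
from typing import List

# Hypothetical alias table. Only the AU entry comes from the text
# (AU.1 is the alias of B.1.466.2.1); a real table holds many more prefixes.
ALIASES = {"AU": "B.1.466.2"}


def expand(lineage: str) -> str:
    """Return the full dotted name of a lineage, resolving a known alias prefix."""
    prefix, _, rest = lineage.partition(".")
    if prefix in ALIASES:
        full_prefix = ALIASES[prefix]
        return f"{full_prefix}.{rest}" if rest else full_prefix
    return lineage


def ancestors(lineage: str) -> List[str]:
    """List parent lineages, from the immediate parent up to the root letter."""
    parts = expand(lineage).split(".")
    return [".".join(parts[:i]) for i in range(len(parts) - 1, 0, -1)]


print(expand("AU.1"))          # B.1.466.2.1
print(ancestors("B.1.466.2"))  # ['B.1.466', 'B.1', 'B']
```

Expanding an alias before comparing lineages makes the hierarchy explicit: AU.1 and AU.3 both resolve to children of B.1.466.2.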

Methods

Genomic and epidemiological data

Positive SARS-CoV-2 samples from cases in PNG were submitted from the PNG Central Public Health Laboratory (CPHL) and Ok Tedi Mining Limited (OTML) to the Microbiological Diagnostic Unit Public Health Laboratory (MDU PHL), at the Doherty Institute, Melbourne, for genome sequencing, analysis, and integrated reporting. OTML operates predominantly in the Western Province of PNG, a remote, sparsely populated area bordering Indonesia. While only 2.8 per cent of PNG’s population resides in the Western Province, OTML routinely transports workers in and out of the mining sites and sends samples collected as part of its workplace testing programme to Australia for diagnostic testing. All positive samples were referred to MDU PHL for sequencing. Samples referred from CPHL represent a subset of available samples, selected on the basis of temporal and geographic diversity and sample quality, and were sent directly to MDU PHL. Forensic Scientific Services (FSS) at Queensland Health also performed sequencing on additional PNG samples submitted by CPHL. These sequences were shared with MDU PHL as part of a collaborative analysis agreement under the governance of the PNG NCC.

Limited epidemiological data were provided alongside the samples by the PNG NCC and by OTML. COVID-19 data collection and recording in PNG currently face a number of challenges, and the epidemiological data are affected by incomplete or manually transcribed records. OTML provided information on case nationality and on whether a case was tested on arrival at the mining site (inbound) or whilst working on-site (outbound and monitoring). For non-OTML cases, the PNG NCC provided data on the geographical location of a case, including province, region within a province, and town/village, as well as information on symptoms, case contact (where known), and occupation. A case was assigned to a geographical province within PNG based on the data provided by the PNG NCC or, where that was unavailable, from the data provided by OTML.

Detailed genomics methods are described in Seemann et al. (2020) and Lane et al. (2021). Briefly, RNA extracted from SARS-CoV-2 reverse-transcriptase polymerase chain reaction (RT-PCR) positive samples underwent tiled amplicon PCR using either ARTIC version 1 or 3 primers (‘ARTIC-Ncov2019/Primer_schemes/NCoV-2019/V3 at Master ARTIC-Network/ARTIC-Ncov2019 GitHub’ n.d.), following published protocols (‘NCoV-2019 Sequencing Protocol’ n.d.). Reads were aligned to the reference genome (Wuhan Hu-1; GenBank MN908947.3) and consensus sequences were generated. Quality control (QC) metrics on consensus sequences included requiring ≥50 per cent genome recovered (≥95 per cent in the FSS pipeline setting), ≤50 single nucleotide polymorphisms from the reference genome, and ≤50 ambiguous or missing bases. Genomic clusters were defined as two or more related sequences using a complete-linkage hierarchical clustering algorithm of pairwise genetic distances derived from a maximum likelihood phylogenetic tree. SARS-CoV-2 genomic lineages were defined using the PANGO lineage nomenclature (Rambaut et al. 2020; SARS-CoV-2 Lineages).
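The consensus QC criteria stated above can be expressed as a simple filter. The following is a minimal sketch rather than the pipeline actually used: the record type `ConsensusQC`, its field names, and the example samples are hypothetical, and only the threshold values are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class ConsensusQC:
    """Hypothetical per-sample QC summary (field names are illustrative)."""
    sample_id: str
    genome_fraction: float  # fraction of the reference genome recovered, 0 to 1
    snps: int               # SNPs relative to the reference genome
    ambiguous_bases: int    # ambiguous or missing bases in the consensus


def passes_qc(rec, min_fraction=0.50, max_snps=50, max_ambiguous=50):
    """Apply the three thresholds quoted in the text to one consensus sequence."""
    return (
        rec.genome_fraction >= min_fraction
        and rec.snps <= max_snps
        and rec.ambiguous_bases <= max_ambiguous
    )


# Made-up records, for illustration only.
good = ConsensusQC("example-1", genome_fraction=0.99, snps=21, ambiguous_bases=12)
poor = ConsensusQC("example-2", genome_fraction=0.62, snps=18, ambiguous_bases=40)

print(passes_qc(good))                     # True
print(passes_qc(poor))                     # True under the 50 per cent setting
print(passes_qc(poor, min_fraction=0.95))  # False under the stricter FSS setting
```

Keeping the thresholds as parameters lets the same check cover both the default and the FSS coverage settings.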

Genomic epidemiology and phylodynamics

To quantify the dynamics of introductions, we used a set of 1,587 genome samples from PNG (Supplementary Appendix B). This dataset included the genomes generated in this study that had sufficient sequence quality and an associated date of collection, together with a sample of global genomic diversity focussed on Oceania, taken from the latest NextStrain Oceania build, which included 489 genomes from other countries (as of 20 March 2021). We aligned the sequences using MAFFT v7 (Katoh and Standley 2013). We used a previously described approach (Duchene et al. 2020b) to obtain a time-scaled phylogenetic tree (To et al. 2016; Duchene et al. 2020a).

We defined ‘genomic importation clusters’ as monophyletic groups of at least two genomes sampled from PNG, whereas a ‘singleton’ is a genome sampled from PNG that sits within a group of genomes sampled elsewhere. An importation cluster, therefore, corresponds to a putative introduction event that led to ongoing transmission, whereas a singleton represents a situation where there is no evidence of ongoing transmission (du Plessis et al. 2021). Importantly, whether an importation cluster corresponds to a single importation event is contingent on the data at hand. If the geographic area of interest is sampled at a much higher intensity than other areas, as is the case here, the number of importation clusters will tend to underestimate the number of importation events that gave rise to the data, and should therefore be considered a lower bound.

We calculated a range of genomic importation cluster statistics from the time-scaled tree. We focussed on the number of importation lineages, their detection date, first introduction, putative importation date, and the detection lag (the time from the origin of the importation cluster to the date of collection of the first genome). For the largest four genomic importation clusters, we fit a coalescent exponential model in a Bayesian framework in BEAST 2.5 (Bouckaert et al. 2019) to infer their exponential growth rate, sampling proportion, and doubling time. The xml file, dated tree, and GISAID accession numbers are available at https://github.com/sebastianduchene/png_sars_cov_2_analyses.
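The definitions of importation clusters and singletons can be made concrete with a toy example. This is a minimal sketch under simplifying assumptions, not the analysis code used in the study: the tree is a hypothetical nested-list structure in which each tip is a `(genome_id, country)` tuple, branch lengths and dates are ignored, and all genome identifiers are invented.

```python
def tips(node):
    """Return all (genome_id, country) tips below a node."""
    if isinstance(node, tuple):  # a tip
        return [node]
    found = []
    for child in node:           # an internal node is a list of children
        found.extend(tips(child))
    return found


def importation_groups(tree, focal="PNG"):
    """Split focal-country genomes into importation clusters and singletons.

    A cluster is a maximal clade whose tips are all from the focal country
    and that has at least two tips; a singleton is a focal tip whose
    neighbouring tips include genomes sampled elsewhere.
    """
    clusters, singletons = [], []

    def visit(node):
        leaves = tips(node)
        if all(country == focal for _, country in leaves):
            ids = [genome_id for genome_id, _ in leaves]
            if len(ids) >= 2:
                clusters.append(ids)
            else:
                singletons.append(ids[0])
            return               # do not split a focal-only clade further
        if isinstance(node, tuple):
            return               # tip sampled outside the focal country
        for child in node:
            visit(child)

    visit(tree)
    return clusters, singletons


# Toy tree with invented genome identifiers.
toy_tree = [
    [("AUS-1", "Australia"), [("PNG-1", "PNG"), ("PNG-2", "PNG")]],
    [("IDN-1", "Indonesia"), ("PNG-3", "PNG")],
]

clusters, singletons = importation_groups(toy_tree)
print(clusters)    # [['PNG-1', 'PNG-2']]
print(singletons)  # ['PNG-3']
```

Because the traversal stops at the first clade made up entirely of focal genomes, each cluster is counted once, which is why the number of clusters acts as a lower bound on the number of introductions.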

Results

We sequenced 2,981 positive samples at MDU PHL and FSS, collected up to 13 July 2021, yielding 1,797 sequences that met internal QC measures. Sequences used in this study are listed in Supplementary Appendix A. In total, 1,184 samples failed internal QC and were not included in the phylogenetic analyses. Of the 1,797 samples that passed QC, 1,672 were successfully linked to the epidemiological metadata provided by OTML and the PNG NCC. Of the samples with available epidemiological data, 63 per cent (1,053 of 1,672 samples) were from the Western Province in PNG (the location of OTML operations), 15 per cent (259 of 1,672 samples) from the National Capital District, 7 per cent (113 of 1,672 samples) from Morobe, and 4 per cent (69 of 1,672 samples) from East New Britain (Fig. 1). The remaining samples (178 of 1,672 samples) span 16 other provinces (Supplementary Appendix A).

Lineages

PANGO lineage assignment on the 1,797 samples from PNG was found to be highly unstable, with constant shifts in the assignment of large numbers of samples across Pangolin versions, particularly among three highly related lineage groups: B.1.466, B.1.459, and AU (Table 2). Samples were frequently reassigned across and within the lineage groups, regardless of genome coverage or sequence quality.
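One way to quantify this kind of instability is to compare the lineage assigned to each sample across successive runs. The sketch below is illustrative only: the run labels, sample identifiers, and assignments are invented, and the function name `unstable_samples` is hypothetical.

```python
def unstable_samples(calls_by_run):
    """Return samples whose assigned lineage differs between runs.

    calls_by_run maps a run label (for example, a Pangolin version) to a
    dict of {sample_id: lineage}. Runs are read in insertion order.
    """
    history = {}
    for run_label, calls in calls_by_run.items():
        for sample_id, lineage in calls.items():
            history.setdefault(sample_id, []).append(lineage)
    return {
        sample_id: lineages
        for sample_id, lineages in history.items()
        if len(set(lineages)) > 1
    }


# Invented assignments for two samples across three hypothetical runs.
calls = {
    "run-1": {"sample-A": "B.1.459", "sample-B": "AU.3"},
    "run-2": {"sample-A": "AU.1", "sample-B": "AU.3"},
    "run-3": {"sample-A": "B.1.459", "sample-B": "AU.3"},
}

print(unstable_samples(calls))
# {'sample-A': ['B.1.459', 'AU.1', 'B.1.459']}
```

Tracking the full assignment history, rather than only the latest call, shows which samples switch back and forth between related lineages.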

Table 2.

Number of samples and mutational profile of lineages in PNG dataset.

Lineage	Samples (n)	Characteristic mutations^a: gene	Characteristic mutations^a: amino acid change
AU.1	507	N	T205I
		ORF1a	A776V
		ORF1a	P804L
		ORF3a	Q57H
		ORF8	S84L
		S	D614G
		ORF1b	P314L
		ORF1a	P1640L
		ORF1a	T1168I
		ORF10	P10S
		ORF1b	R2308C
		ORF1a	A690V
AU.3	444	N	T205I
		ORF1a	T2615I
		ORF1b	P314L
		ORF8	S84L
		S	P681R
		N	D348H
		ORF1a	S944L
		ORF1a	P1640L
		ORF1b	S1182L
		S	D614G
		ORF3a	Q57H
		ORF1a	L3644F
		ORF1a	T1168I
		ORF1b	T2040I
		S	N439K
B	20	ORF8	S84L
B.1	23	ORF8	S84L
		S	D614G
		ORF1b	P314L
B.1.459	532	ORF8	S84L
		S	D614G
		ORF1b	P314L
		ORF1a	P1640L
		ORF3a	Q57H
B.1.466.2	148	N	T205I
		S	D614G
		S	N439K
		ORF1b	P314L
		ORF8	S84L
		ORF3a	Q57H
		ORF1a	T1168I
		ORF1a	P1640L
		ORF1b	S1182L
		S	P681R
		ORF1a	S944L
		ORF1a	L3644F
B.6	95	ORF8	S84L
		ORF1b	A88V
		ORF1a	T2016K
		N	P13L
		ORF1a	L3606F
B.6.8	2	N	P13L
		ORF1a	T2016K
		ORF1b	A88V
		ORF8	L95F
		ORF8	S84L

^a Data from GISAID and Outbreak.info (Mullen et al. 2022).

Eighty-eight per cent (1,580/1,797) of PNG sequences were identified as either AU.1, AU.3, B.1.466.2, or B.1.459 (Table 2). These highly related lineage groups are associated with the Pacific and Southeast Asian region, particularly Indonesia, Malaysia, PNG, and Australia, with the B.1.466.2 clade first proposed for definition by FSS and Queensland Health after a rise in cases in returned travellers from PNG (16). AU.1/AU.2/AU.3 are all aliases of the B.1.466.2 sub-lineages, whilst B.1.459 appears highly related to B.1.466.2/AU on phylogeny. Additionally, 2.4 per cent (43/1,797) of sequences typed as B, the first major haplotype to be discovered, or B.1 (Table 2), a large European lineage linked to the Northern Italian outbreak in 2020 (17). The assignment of these recent samples to an early lineage is likely the result of limited analysis and sample representation in this area of the global tree and not the true persistence of such early versions of the virus. Five per cent (94/1,797) of sequences typed as B.6/B.6.8, early lineages that were predominantly seen in India (B.6) and PNG (B.6.8). Despite the surge in cases seen in PNG during this period and the large ongoing outbreak, only one sample had been identified as a VOC (Delta, B.1.617.2) by 29 July 2021. Lineages for the remaining sequences are available in Appendix B.

Phylogenetic clusters

We performed a phylogenetic analysis and included publicly available sequences from the Solomon Islands, the Philippines, Guam, Timor-Leste, Australia, and Indonesia as well as publicly available PNG sequences, for context (Supplementary Appendix C, Fig. 2). Five broad clusters were identified (Fig. 3), containing a mix of lineages including intermingling of the AU, B.1.466.2, B.1.459, and B.1 samples within clusters, and closely related samples typing as different PANGO lineages (Fig. 4).

Figure 2.

Phylogenetic tree showing PNG samples in the context of publicly available international sequences from the Solomon Islands, the Philippines, Guam, Timor-Leste, Australia, and Indonesia. PNG sequences generated at MDU PHL and FSS are shown by the circle tips.

Figure 3.

PNG province of sequence origin, by phylogenetic cluster and date of collection. The described phylogenetic clusters are represented by different colours, with the size of the circle proportional to the number of samples collected in each province on that day. Note: WP = Western Province; WNB = West New Britain; WHP = Western Highlands Province; SHP = Southern Highlands Province; NOP = Northern (Oro) Province; NIP = New Ireland Province; NCD = National Capital District; MOR = Morobe; Man = Manus; MAD = Madang; JIW = Jiwaka; HLP = Hela Province; GF = Gulf; ESP = East Sepik; ENB = East New Britain; CHI = Chimbu (Simbu); CEP = Central Province; AROB = Autonomous Region of Bougainville.

Figure 4.

Timeline of each of the described phylogenetic clusters identified in the PNG sequence dataset as of 29 July 2021. The different lineages identified in each cluster are represented by colour, while the size of the circle is proportional to the number of samples in each cluster collected on that day.

Analysis of the temporal distribution of the phylogenetic clusters and PANGO lineages shows a shift from the B.6/B.6.8 lineages in mid-2020 to the described B.1 and AU/B.1.466.2/B.1.459 lineages in early 2021 (Fig. 4). All B.6 and B.6.8 sequences identified in this data set cluster together (‘cluster 1’, Fig. 2) and were collected between 17 June 2020 and 24 March 2021 (Fig. 4). The majority (51 per cent) of samples within this cluster with a recorded collection date were collected prior to 21 December 2020. No other lineages were found in 2020 samples, either in the data described in this paper or in the publicly available PNG sequences.
Despite the majority of samples in the data set originating in the Western Province or National Capital District, the phylogenetic clusters identified in this analysis were geographically diverse, with each of the clusters appearing concentrated in different areas of PNG (Fig. 3). The largest cluster, ‘cluster 2’ (Fig. 2), appears to be connected to the OTML mine sites and the Western Province, whilst the smaller clusters appear linked to the National Capital District and larger surrounding provinces (‘cluster 3’), the island of New Britain (‘cluster 5’) or spread from the highland provinces across to New Britain (‘cluster 4’).

Phylogenetic analysis of putative introductions

We estimate that there have been at least 55 introduction events into PNG based on the available genomic data (Supplementary Table S1; Fig. 5). Only three of these introductions consisted of a single case, with no evidence of ongoing transmission. Importantly, the importation clusters were largely consistent with the broad genomic clusters identified above. We found that 24 genome importation clusters had at least five sequences, with the largest having 926 sequences included.

Figure 5.

Phylogenetic analyses of importation clusters from maximum-likelihood dated trees. Top panel: bars correspond to importation clusters, with the y-axis denoting the number of genomes and the x-axis their time span. Blue dots correspond to the first genome collected and green dots to the last genome from each cluster. Bottom panel: importation dynamics over time. The grey bars denote the number of importation events per month, while the orange bars show the detection lag, that is, the number of days from the first inferred transmission event to the first collected genome.

The first genomic importation cluster with at least five genomes was detected on 19 July 2020, while the last was detected on 9 March 2021. These estimated dates are likely to be later than the actual importation events, because the genomic signal lags behind actual introductions (du Plessis et al. 2021). Under this framework, we estimate that genome importation clusters with at least five genomes were introduced between February 2020 and March 2021. Their respective detection lags had a mean of 18 days (range from 1 day to 3 months). The largest cluster, with 926 sequences included, was probably introduced around mid-December 2020 and was detected on 1 January 2021, with a detection lag of 20 days. The detection lag was shortest at the peak of the second wave in April and May 2021, with a mean of 1 day. We also estimate that most importation events occurred around March 2021. The largest genomic importation cluster mostly consisted of PANGO lineages B.1.466.2.1 (AU.1), B.1.459, and B.1.466.2.3 (AU.3), with 387, 256, and 198 genomes respectively, such that these three lineages represented over 90 per cent of all the genomes in the cluster.
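The figures quoted for the largest importation cluster can be checked with a few lines of arithmetic. This is a minimal sketch using only the dates and counts stated above; the variable names are illustrative.

```python
from datetime import date, timedelta

# Largest importation cluster: first genome collected 1 January 2021,
# with a detection lag of 20 days (values from the text).
first_collected = date(2021, 1, 1)
detection_lag = timedelta(days=20)
inferred_origin = first_collected - detection_lag
print(inferred_origin)  # 2020-12-12, i.e. mid-December 2020

# Share of the cluster made up of the three dominant lineages.
lineage_counts = {"AU.1": 387, "B.1.459": 256, "AU.3": 198}
cluster_size = 926
dominant_share = sum(lineage_counts.values()) / cluster_size
print(f"{dominant_share:.1%}")  # 90.8%
```

Subtracting the detection lag from the first collection date recovers the mid-December origin, and the three lineages together account for 841 of the 926 genomes.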

Phylodynamic analyses of genomic importation clusters

We used a coalescent framework to infer population dynamic parameters for the four largest genome importation clusters. Our estimates of the coalescent growth rate were very similar among clusters at around 28 per year, which roughly corresponds to a reproductive number, R, of 2.5. The 95 per cent credible intervals of all four clusters excluded 0, such that they all show evidence of epidemic growth. The corresponding doubling times overlapped for all genome importation clusters. The largest importation cluster, A, had the longest doubling time, at 9 days (95 per cent credible interval: 8–11), while the smallest cluster, B, had the shortest, at 8 days (95 per cent credible interval: 7–10). We also estimated the sampling intensity, which is the number of genomes divided by the inferred infected population size when the last sample was collected. These estimates were very uncertain and below 0.02 (2 per cent), with cluster A having the highest sampling intensity, at 0.011 (95 per cent credible interval: 0.003–0.03). Although these estimates are very uncertain, probably due to the low genetic diversity, they suggest that genome sampling represents a very small proportion of the outbreak associated with each importation cluster (Fig. 6).
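Under exponential growth, the doubling time follows directly from the growth rate as ln(2)/r. The sketch below applies that standard relationship to the values quoted above; the function names and the choice of 365.25 days per year are assumptions made for illustration.

```python
import math

DAYS_PER_YEAR = 365.25  # assumed conversion factor


def doubling_time_days(growth_rate_per_year):
    """Doubling time in days for an exponential growth rate given per year."""
    return math.log(2) / growth_rate_per_year * DAYS_PER_YEAR


def growth_rate_per_year(doubling_days):
    """Exponential growth rate per year implied by a doubling time in days."""
    return math.log(2) / doubling_days * DAYS_PER_YEAR


print(round(doubling_time_days(28), 1))   # 9.0
print(round(growth_rate_per_year(8), 1))  # 31.6
```

A growth rate of about 28 per year therefore corresponds to a doubling time of roughly 9 days, in line with the estimate reported for the largest cluster.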

Figure 6.

Epidemiological estimates from top four importation clusters. Violin plots denote Bayesian posterior distributions of key parameters, the growth rate, epidemic doubling time, and the sampling intensity (number of genomes per infected case). In the first panel (growth rate) the dashed lines denote the corresponding values for reproductive numbers (Re) of 1.5 and 2.5 assuming a duration of infection of 10 days.

Discussion

In total, 1,797 sequences generated by MDU PHL and FSS from PNG SARS-CoV-2 cases underwent PANGO lineage assignment and phylogenetic analysis to characterise the lineage distribution and genomic relatedness of SARS-CoV-2 in PNG. Analysis of the lineages within this data set found only one VOC sample. However, the lineages that were identified are not well characterised by the PANGO nomenclature: multiple lineages intermingle in the phylogenetic tree, and closely related samples and clusters carry numerous different assignments. Phylogenetic analysis of clusters and importations in the data generated at MDU PHL shows a marked shift in the lineage distribution and identified 55 importation clusters, the majority of which resulted in multiple cases. Due to sampling biases in our data, the actual number of viral introductions is likely much higher. These importation clusters are consistent with the broad clusters we have described and the substructure within each of these. The results suggest that while the first introduction in July 2020 resulted in a large B.6/B.6.8 cluster (‘cluster 1’), this was rapidly replaced in 2021 by four distinct clusters made up of B.1 and AU sub-lineages, likely from multiple introductions. However, phylodynamic analysis of the data suggests that the sequences presented here represent a very small proportion of the likely cases associated with each cluster. This is consistent with the known testing and sampling challenges within PNG and with the reported epidemiology of the COVID-19 outbreak, where a peak and then drop in case numbers in mid-to-late 2020 was followed by a sudden increase in early 2021, leading to the large-scale outbreak from which these sequences were predominantly sampled (World Health Organisation 2021b).
These data suggest that there has been rapid expansion and geographical spread of lineages in PNG (B.1.459, B.1.466.2, and AU) that are not recognised as VOCs or variants of interest (VOIs), and that there was an effective replacement of B.6/B.6.8 with the currently circulating PANGO lineages. Publicly available sequences suggest that the lineages identified in PNG are also commonly observed in other countries in the region, particularly Indonesia (Cahyani et al. 2022; Zainulabid et al. 2021), which may explain why the B.1.466.2 and AU lineages are persisting and present in all unrelated clusters, despite multiple introductions into the country. The presence of only one VOC sample in this dataset suggests that at the end of July 2021, the burden of disease in PNG was still predominantly caused by the B.1.466.2, B.1.459, and AU lineages. However, the sampling issues described above mean this is possibly an under-representation of the level of Delta present within the community at this time.

The characterisation of lineage distribution in PNG is made difficult by the described issues in lineage assignment and stability in this area of the tree. Large numbers of the PNG sequences type as early lineages (‘B.1’), and lineage assignment changes frequently, including for a large proportion of samples that routinely switch between AU/B.1.466.2 and B.1.459. This limits the utility of the genomics and prevents PNG from tracking the spread and transmission of SARS-CoV-2 without detailed genomic investigation, a process that is difficult given resource constraints within PNG. We would therefore argue for a closer examination of this area of the global SARS-CoV-2 phylogeny, to resolve the classification issues for lineages routinely seen in the Western Pacific region (O’Toole et al. 2021).

This dataset provides a significant amount of new genomic data in an under-sampled region (Chong et al. 2020) where attempts at representative sequencing have been hampered by resource and logistical issues (Kabuni 2020; Smaghi et al. 2021). The data presented here are relevant to the entire Western Pacific region, as they show how quickly lineages in the region can take hold, regardless of official VOC status, and how issues related to under-representation in databases like PANGO can impact work being done in countries like PNG. However, we acknowledge the limitations of these data: the high sequencing failure rate, possibly due to the age of samples on arrival in Australia, samples with low viral load, or issues with sample storage during transport; bias in sampling sites and regions within PNG; and the impact that limited testing has on the representativeness of this dataset. Our analysis was also impacted by the limited epidemiological data available to provide context for phylogenetic clusters, the time lag from collection to sequencing, and the logistical constraints that mean only a small proportion of swabs from an already under-sampled population can be sent for sequencing.

The genome sequencing and bioinformatic analyses for this programme of work were undertaken offshore at MDU PHL in Australia; however, significant consideration was given to the training opportunities that this model of work afforded. While a longer-term goal will be in-country deployment of sequencing capacity, this programme of work included significant training in genomic sampling strategies, genomic and epidemiological data governance, combined genomic and epidemiological data analysis, and genomic reporting for public health.
The international referral of samples was identified as the only rapid, short-term solution for generating genome sequence data from PNG early in the pandemic. However, the partnership between the laboratories and National Coordinating Centre in PNG and the offshore counterparts has significantly improved knowledge on the approach to and use of genome sequence data, which will inform future in-country strategies and improve the likelihood of success.

Analysis of a small set of sequences from SARS-CoV-2 cases in PNG has provided insight into how quickly lineages can take hold in a country or region, particularly where testing and response resources are limited. The ongoing sequencing work with PNG also highlights the need for curation of PANGO lineages in all areas of the global SARS-CoV-2 tree to ensure stability in lineage assignment, enabling countries with limited ability to undertake detailed genomic analysis to still utilise this important public health tool for outbreak and cluster characterisation. This work has also demonstrated the value of equitable access to advanced technologies, including genomic sequencing, for informing public health decisions, particularly when it is necessary to rapidly identify or characterise certain pathogens.

13 in total

1. Genome Profiling of SARS-CoV-2 in Indonesia, ASEAN and the Neighbouring East Asian Countries: Features, Challenges and Achievements.

Authors: Inswasti Cahyani; Eko W Putro; Asep M Ridwanuloh; Satrio Wibowo; Hariyatun Hariyatun; Gita Syahputra; Gilang Akbariani; Ahmad R Utomo; Mohammad Ilyas; Matthew Loose; Wien Kusharyoto; Susanti Susanti
Journal: Viruses Date: 2022-04-08 Impact factor: 5.818

2. MAFFT multiple sequence alignment software version 7: improvements in performance and usability.

Authors: Kazutaka Katoh; Daron M Standley
Journal: Mol Biol Evol Date: 2013-01-16 Impact factor: 16.240

3. Barriers and enablers experienced by health care workers in swabbing for COVID-19 in Papua New Guinea: a multi-methods cross-sectional study, November - December 2020.

Authors: Bernnedine S Smaghi; Julie Collins; Rosheila Dagina; Gilbert Hiawalyer; Stefanie Vaccher; James Flint; Tambri Housen
Journal: Int J Infect Dis Date: 2021-05-12 Impact factor: 3.623

4. Fast Dating Using Least-Squares Criteria and Algorithms.

Authors: Thu-Hien To; Matthieu Jung; Samantha Lycett; Olivier Gascuel
Journal: Syst Biol Date: 2015-09-30 Impact factor: 15.683

5. BEAST 2.5: An advanced software platform for Bayesian evolutionary analysis.

Authors: Remco Bouckaert; Timothy G Vaughan; Joëlle Barido-Sottani; Sebastián Duchêne; Mathieu Fourment; Alexandra Gavryushkina; Joseph Heled; Graham Jones; Denise Kühnert; Nicola De Maio; Michael Matschiner; Fábio K Mendes; Nicola F Müller; Huw A Ogilvie; Louis du Plessis; Alex Popinga; Andrew Rambaut; David Rasmussen; Igor Siveroni; Marc A Suchard; Chieh-Hsi Wu; Dong Xie; Chi Zhang; Tanja Stadler; Alexei J Drummond
Journal: PLoS Comput Biol Date: 2019-04-08 Impact factor: 4.475

6. A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology.

Authors: Andrew Rambaut; Edward C Holmes; Áine O'Toole; Verity Hill; John T McCrone; Christopher Ruis; Louis du Plessis; Oliver G Pybus
Journal: Nat Microbiol Date: 2020-07-15 Impact factor: 17.745

7. Establishment and lineage dynamics of the SARS-CoV-2 epidemic in the UK.

Authors: Louis du Plessis; John T McCrone; Alexander E Zarebski; Verity Hill; Christopher Ruis; Moritz U G Kraemer; Andrew Rambaut; Oliver G Pybus; Bernardo Gutierrez; Jayna Raghwani; Jordan Ashworth; Rachel Colquhoun; Thomas R Connor; Nuno R Faria; Ben Jackson; Nicholas J Loman; Áine O'Toole; Samuel M Nicholls; Kris V Parag; Emily Scher; Tetyana I Vasylyeva; Erik M Volz; Alexander Watts; Isaac I Bogoch; Kamran Khan; David M Aanensen
Journal: Science Date: 2021-01-08 Impact factor: 47.728

8. Temporal signal and the phylodynamic threshold of SARS-CoV-2.

Authors: Sebastian Duchene; Leo Featherstone; Melina Haritopoulou-Sinanidou; Andrew Rambaut; Philippe Lemey; Guy Baele
Journal: Virus Evol Date: 2020-08-19

9. Genomics-informed responses in the elimination of COVID-19 in Victoria, Australia: an observational, genomic epidemiological study.

Authors: Courtney R Lane; Norelle L Sherry; Ashleigh F Porter; Sebastian Duchene; Kristy Horan; Patiyan Andersson; Mathilda Wilmot; Annabelle Turner; Sally Dougall; Sandra A Johnson; Michelle Sait; Anders Gonçalves da Silva; Susan A Ballard; Tuyet Hoang; Timothy P Stinear; Leon Caly; Vitali Sintchenko; Rikki Graham; Jamie McMahon; David Smith; Lex Ex Leong; Ella M Meumann; Louise Cooley; Benjamin Schwessinger; William Rawlinson; Sebastiaan J van Hal; Nicola Stephens; Mike Catton; Clare Looker; Simon Crouch; Brett Sutton; Charles Alpren; Deborah A Williamson; Torsten Seemann; Benjamin P Howden
Journal: Lancet Public Health Date: 2021-07-10

10. Near-Complete Genome Sequences of Nine SARS-CoV-2 Strains Harboring the D614G Mutation in Malaysia.

Authors: Ummu Afeera Zainulabid; Norhidayah Kamarudin; Ahmad Hafiz Zulkifly; Han Ming Gan; Darren Dean Tay; Shing Wei Siew; Aini Syahida Mat Yassim; Sharmeen Nellisa Soffian; Ahmad Afif Mohd Faudzi; Ahmad Mahfuz Gazali; Gaanty Pragas Maniam; Hajar Fauzan Ahmad
Journal: Microbiol Resour Announc Date: 2021-08-05