
Blood group typing from whole-genome sequencing data.

Julien Paganini¹, Peter L Nagy², Nicholas Rouse², Philippe Gouret¹, Jacques Chiaroni³, Christophe Picard³, Julie Di Cristofaro³.

Abstract

Many questions can be explored thanks to whole-genome data. The aim of this study was to overcome their main limits, software availability and database accuracy, and to estimate the feasibility of red blood cell (RBC) antigen typing from whole-genome sequencing (WGS) data. We analyzed whole-genome data from 79 individuals for HLA-DRB1 and 9 RBC antigens. WGS data were analyzed with software that allows phasing of variable positions to define alleles or haplotypes and that has been validated for HLA typing from next-generation sequencing data. A dedicated database was set up with 1648 variable positions analyzed in KEL (KEL), ACKR1 (FY), SLC14A1 (JK), ACHE (YT), ART4 (DO), AQP1 (CO), CD44 (IN), SLC4A1 (DI) and ICAM4 (LW). WGS typing was compared to that previously obtained by amplicon-based monoallelic sequencing and by SNaPshot analysis. WGS data were also explored for other alleles. Our results showed 93% concordance for blood group polymorphisms and 91% for HLA-DRB1. Incorrect typing and unresolved results confirm that WGS should be considered reliable only with read depths strictly above 15x. Our results support that RBC antigen typing from WGS is feasible but requires improvements in read depth for accurate SNV typing. We also showed the potential of WGS for screening donors with rare blood antigens, such as weak JK alleles. The development of WGS analysis in immunogenetics laboratories would offer personalized care in the management of RBC disorders.


Year: 2020 PMID: 33180819 PMCID: PMC7660531 DOI: 10.1371/journal.pone.0242168

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240

Introduction

Whole-genome data have become more accessible thanks to simpler techniques, the availability of sequencing machines or contractors, and the release of public data. Only a small part of these genomes is exploited beyond the scope of the initial purpose. Amplicon-based next-generation sequencing (NGS) assays have in many ways laid the groundwork for whole-genome analyses as they require equivalent reagents, equipment and experimental skills. Much software for amplicon-based NGS has been developed, validated and certified in clinical fields. In particular, most immunogenetics laboratories are equipped with amplicon-based NGS for HLA typing and some have also developed and validated such techniques for human platelet and RBC antigens [1-3]. Many questions can be explored in various fields thanks to WGS resources and their integrative investigation: first, in population genetics, where such data may improve understanding of natural selection, local adaptation, demographic history and early human migration [4,5]; then, in evolutionary genetics, where they can address more fundamental issues such as gene evolution and functional investigation [4,5] thanks to haplotype reconstruction or the localization of new variants. Finally, these rapidly evolving techniques are now used for analysis on an individual scale, for example in forensics and for clinical purposes [3,6]. However, many issues raised by WGS handling limit the implementation of these techniques in both research and clinical laboratories working within regulatory approved frameworks, e.g. Council of Europe (CE): software availability, database accuracy and editing, and coverage and read depth quality indicators [5,7]. Thus, many whole-genome experiments designed for one scientific purpose are not used for any further analyses. The aim of this study was to overcome these limitations and to estimate the feasibility of RBC antigen typing from WGS data.
We analyzed whole-genome data from 79 individuals from Central Asia [8] for the highly polymorphic HLA-DRB1 gene and for 9 blood group antigens. The same samples had previously been typed for HLA-DRB1 by amplicon-based monoallelic sequencing and for blood group bi-allelic polymorphisms using SNaPshot analysis [9]. WGS data were analyzed with software validated for HLA typing from NGS data [10]. This software relies on a database of allele alignments; whereas the HLA system has a very convenient database and a consensus on allele naming [11] with monthly updates, the genetic polymorphisms of RBC antigens are provided in Portable Document Format (PDF) and need to be converted. Most importantly, the software used in this study allows phasing of variable positions to define alleles or haplotypes. In a second analysis, the software was set up to search WGS data for new alleles. Indeed, previous investigations for blood groups, and also for specific anthropogenic analyses, revealed that this cohort presented a singular genetic mosaic of components from various geographic regions of Eurasian ancestry [12].

Materials and methods

DNA samples

Seventy-nine samples, previously analyzed for anthropogenic markers and described in [12], were used in this study. All samples were obtained from unrelated male Afghan volunteers after written informed consent was obtained. The study protocol was registered by the Ministere de l’Enseignement Superieur et de la Recherche in France (committee 208C06, decision AC-2008-232). The institutional review board of the Ministere de l’Enseignement Superieur et de la Recherche in France (committee 208C06, decision AC-2008-232) specifically approved this study.

Blood group genotyping by SNaPshot analysis

Samples were analyzed for the main RBC antigens and the results have been previously published [9]. DNA was genotyped for the Kell (KEL), Duffy (FY), Kidd (JK), Cartwright (YT), Dombrock (DO), Indian (IN), Colton (CO), Diego (DI) and Landsteiner-Wiener (LW) systems by SNaPshot analysis (corresponding genes according to ISBT nomenclature: KEL, ACKR1 (FY), SLC14A1 (JK), ACHE (YT), ART4 (DO), AQP1 (CO), CD44 (IN), SLC4A1 (DI) and ICAM4 (LW)) [13]. Determination of blood group antigens, other than those of the ABO, RH and MNS systems, depends mainly on the presence of one or more SNPs in the coding sequence. Fourteen SNPs corresponding to bi-allelic polymorphisms were analyzed (KEL p.Thr193Met (KEL:1,-2), KEL p.Leu597Pro (KEL:6,-7), FY p.Gly42Asp (FY:2), FY p.Arg89Cys (Fya+w), FY c.-67T>C (Fy(a-b-) erythroid cells only), JK p.Asp280Asn (JK:2), YT p.His353Asn (YT:-1,2), DO p.Asn265Asp (DO:2), DO p.Gly108Val (DO:-4), DO p.Thr117Ile (DO:-5), IN p.Arg46Pro (IN:1,-2), CO p.Ala45Val (CO:2), DI p.Pro854Leu (DI:1,-2) and LW p.Gln100Arg (LW:7)) (https://www.isbtweb.org).

HLA-DRB1 typing by monoallelic sequencing

HLA-DRB1 was typed by monoallelic sequencing using Protrans HLA SBT S3 (Protrans) according to manufacturer's instructions. This kit relies on locus specific amplification followed by monoallelic Sanger sequencing.

Whole-genome NGS library preparation and data acquisition

A detailed description of the WGS procedure is given in [8]. DNA samples were sonicated using a Covaris S220 Ultrasonicator to yield fragments with a median length of 300 bp according to the manufacturer’s recommendations. Low molecular weight DNA (<300 bp) enrichment from all samples was performed using AMPure XP beads (NEB). The library was prepared using the TruSeq Nano DNA LT kit (Illumina) according to the manufacturer’s recommendations. Library size and quality were confirmed with Fragment Analyzer (Advanced Analytical) and quantitative PCR (Biorad S1000; CFX96 Real Time System). Paired-end sequencing (2x150 bp) was performed on the Illumina NovaSeq 6000 System (Illumina) following the manufacturer’s recommendations.

Whole genome data analysis

Pre-alignment processing

Demultiplexing of runs was performed in BaseSpace (www.illumina.com/BaseSpaceApps). Prior to analysis, quality and adapter trimming was performed with Trim Galore (Babraham Bioinformatics, http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/) on all fastq files from all runs. Low quality bases with a Phred score below 20 (Q20) were removed from the 3 prime end of the reads, followed by the removal of any Illumina adapter contamination (minimum adapter match of 3 with an allowed matching error rate of 0.1). Reads shorter than 40 bases after quality and adapter trimming were removed, and only properly paired-end read data were retained and analyzed.
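As an illustration of the trimming rules described above, the following minimal Python sketch removes low quality bases from the 3 prime end and applies the minimum length filter to a read pair. It is a simplified stand-in and not the algorithm implemented in Trim Galore: the function names are hypothetical, adapter removal is not covered, and the pair rule (both mates must pass the length filter) is an assumption about how the paired-end filter was applied.

```python
MIN_QUALITY = 20  # Phred cutoff quoted in the text (Q20)
MIN_LENGTH = 40   # minimum read length after trimming quoted in the text


def trim_3prime(seq, quals, min_quality=MIN_QUALITY):
    """Drop trailing bases whose Phred score is below min_quality.

    Simplified rule: stop at the first base (scanning from the 3' end)
    that reaches the cutoff. Real trimmers use a more tolerant scheme.
    """
    end = len(seq)
    while end > 0 and quals[end - 1] < min_quality:
        end -= 1
    return seq[:end], quals[:end]


def keep_pair(read1, read2, min_length=MIN_LENGTH):
    """Keep a pair only if both trimmed mates reach min_length (assumption)."""
    return len(read1) >= min_length and len(read2) >= min_length


# Hypothetical read: the last base (Q5) is removed, the internal Q10 base stays.
trimmed_seq, trimmed_quals = trim_3prime("ACGTAC", [30, 30, 30, 10, 30, 5])
print(trimmed_seq, trimmed_quals)
```
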

Sequencing data quality assessment

Sequencing performance relies mainly on genome coverage and read depth [14]. WGS data quality was assessed by the quantity of reads obtained per sample. Mean read depth of the genome was estimated for each sample as (total number of reads × read size [150 bp]) / genome size [2,867,437,753 bp]. Mean read depth for each gene was likewise estimated as (number of reads mapped × read size [150 bp]) / gene size.
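The depth estimate described above can be written as a short Python sketch. The read size and genome size are the values quoted in the text; the function name, the read counts and the gene size in the example are hypothetical and serve only to show the arithmetic.

```python
GENOME_SIZE_BP = 2_867_437_753  # genome size quoted in the text
READ_LENGTH_BP = 150            # read size quoted in the text


def mean_read_depth(n_reads, target_size_bp, read_length_bp=READ_LENGTH_BP):
    """Estimate mean depth as (number of reads x read size) / target size."""
    if target_size_bp <= 0:
        raise ValueError("target size must be positive")
    return n_reads * read_length_bp / target_size_bp


# Hypothetical sample with 230 million reads over the whole genome.
genome_depth = mean_read_depth(230_000_000, GENOME_SIZE_BP)

# Hypothetical gene of 21,000 bp with 1,200 mapped reads.
gene_depth = mean_read_depth(1_200, 21_000)

print(f"genome: {genome_depth:.1f}x, gene: {gene_depth:.1f}x")
```

The same function covers both estimates because only the numerator (all reads or mapped reads) and the denominator (genome or gene size) change.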

Statistical analyses

Statistical analyses were performed with GraphPad Prism 5 software (California USA, www.graphpad.com). Numbers of reads are presented as mean and range [min, max]. Differences in the number of reads according to gene typing status were tested using the Kruskal-Wallis one-way ANOVA for comparisons among three groups and the Mann-Whitney test for comparisons between two groups. The threshold for significance (alpha) was set at 0.05.

Blood group typing and HLA-DRB1 allelic assignment

PolyPheMe software (Xegen, France) was used to perform all typing from WGS data. WGS data were directly aligned to each gene as reference; no human genome reference was used for read mapping. Alignments were generated by PolyPheMe software with a Bowtie tool [15,16]. Genetic polymorphisms of RBC antigens described in [3] and by the International Society of Blood Transfusion (ISBT; http://www.isbtweb.org) were used for genetic alignment construction. Reference alleles were generated for the KEL, ACKR1 (FY), SLC14A1 (JK), ACHE (YT), ART4 (DO), AQP1 (CO), CD44 (IN), SLC4A1 (DI) and ICAM4 (LW) genes. The other blood group database, for which updates were stopped in 2017, was not used for this study [17,18]. A total of 1648 variable positions (68 for KEL, 228 for FY, 904 for SLC14A1, 4 for ACHE, 21 for ART4, 387 for CO, 5 for IN, 20 for DI and 11 for LW) were analyzed with PolyPheMe v1.2 on WGS data. All positions analyzed and their corresponding alleles are given in S1 Table. A minimum threshold was defined at 5 reads per position analyzed. The PolyPheMe software can phase heterozygous positions and identify haplotypes when reads overlap. WGS analysis validation was based on a comparison with the 14 positions described by SNaPshot assays. In a second phase, potential new alleles were estimated from previously unidentified combinations of known polymorphisms and also from polymorphisms unmapped in the ISBT database. For these unreported alleles, WGS data were re-analyzed and polymorphisms were taken into consideration if they had a minimum threshold of 10 reads per position combined with a minimum of 5 occurrences. HLA-DRB1 was typed at second field resolution with specific parameters for HLA systems previously described [10], using the IMGT 3.39.0 database [11] as reference. This analysis used allele typing according to polymorphisms described in the database. A second analysis was performed on WGS data to find potential new polymorphisms.
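The thresholds described above can be illustrated with a minimal Python sketch. It is not the PolyPheMe algorithm, which is not detailed in the text: the function names, the variant labels and the minimum allele fraction of 0.2 are assumptions made for illustration, and "5 occurrences" is interpreted here as 5 samples. Only the read thresholds (5 reads for known positions, 10 reads for unreported polymorphisms) come from the text.

```python
from collections import Counter

MIN_READS_KNOWN = 5        # from the text: reads per known position
MIN_READS_NOVEL = 10       # from the text: reads per unreported position
MIN_SAMPLES_NOVEL = 5      # from the text: minimum number of occurrences
MIN_ALLELE_FRACTION = 0.2  # assumption, not stated in the text


def call_position(base_counts, min_reads=MIN_READS_KNOWN,
                  min_fraction=MIN_ALLELE_FRACTION):
    """Return a genotype tuple for one position, or None if unresolved (ND)."""
    depth = sum(base_counts.values())
    if depth < min_reads:
        return None  # too few reads to type this position
    alleles = sorted(b for b, n in base_counts.items()
                     if n / depth >= min_fraction)
    if len(alleles) == 1:
        return (alleles[0], alleles[0])  # homozygous
    if len(alleles) == 2:
        return (alleles[0], alleles[1])  # heterozygous
    return None  # no allele or more than two alleles supported


def recurrent_novel_variants(calls_per_sample, known_variants):
    """Keep unreported variants seen with enough depth in enough samples.

    calls_per_sample maps sample id -> {variant label: read depth}.
    """
    seen = Counter()
    for variants in calls_per_sample.values():
        for variant, depth in variants.items():
            if variant not in known_variants and depth >= MIN_READS_NOVEL:
                seen[variant] += 1
    return {v for v, n in seen.items() if n >= MIN_SAMPLES_NOVEL}


print(call_position({"G": 6, "A": 5}))  # heterozygous call
print(call_position({"G": 4}))          # unresolved: below 5 reads
```

With few reads at a position, one allele of a heterozygous sample may fall below the fraction cutoff or be absent altogether, so the sample is called homozygous; this is the kind of error that a read depth requirement is meant to limit.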

Results

Sequencing data quality

Sequencing data are available at http://www.ncbi.nlm.nih.gov/bioproject/662371. Genome sequencing yielded a mean of 34 Gb [16-53]. The mean read depth of the genome, estimated for each sample as (total number of reads × read size) / genome size, was 11.8x [5.5x-18.4x]. Mean read depth for each gene, estimated as (number of reads mapped × read size [150 bp]) / gene size, is given in S2 Table.

Blood group analyses

Blood group genotypes determined by SNaPshot are given in Table 1. Most analyses focused on one SNP leading to bi-allelic results, except for the KEL, DO and FY systems, for which 2 or 3 SNPs were analyzed. Most antigens displayed low or no allelic diversity except for DO (p.Asn265Asp), FY (p.Gly42Asp), JK (p.Asp280Asn) and YT (p.His353Asn).

Table 1

Blood group typing by SNaPshot analysis.

Polymorphism	Allele	Ho wt	Ho mt	He	ND
KEL (578C>T)	KEL*02	75		4
KEL (1790T>C)	KEL*02.06	72			7
ACKR1 (-67T>C)	FY*01N.01	72			7
ACKR1 (125G>A)	FY*02	19	16	30	14
ACKR1 (265C>T)	FY*01W.01	65			14
SLC14A1 (838G>A)	JK*01	17	24	37	1
ACHE (1057C>A)	YT*02	68		11
ART4 (793A>G)	DO*02	12	27	33	7
ART4 (323G>T)	DO*02.–04	72			7
ART4 (350C>T)	DO*01.–05	79
CD44 (137G>C)	IN*01	72			7
AQP1 (134C>T)	CO*02	79
SLC4A1 (2561C>T)	DI*02	77		2
ICAM4 (299A>G)	LW*07	72			7
Total		851	67	117	71

Blood groups genotyped by SNaPshot analysis (Ho wt: Homozygous wild type, Ho mt: Homozygous mutated, He: Heterozygous, ND: Not defined).

Blood group typing based on WGS analysis was performed targeting all of the variable positions described in S1 Table. Sixty-three alleles out of the 1035 described by SNaPshot could not be resolved by WGS analysis (6.1%). For all genes analyzed, typing resolution was associated with the number of reads mapped on their genetic sequence (S3 Table). WGS-based typing showed 100% concordance for homozygous SNPs analyzed by SNaPshot (N = 865 SNPs) and 95.3% for heterozygous positions (N = 102/107 SNPs; Table 2).

Table 2

Blood group typing by WGS analysis.

Gene	Polymorphism	Homozygous positions			Heterozygous positions
Gene	Polymorphism	Correct typing	Incorrect typing	ND	Correct typing	Incorrect typing	ND
KEL	KEL (578C>T)	70		5	2	1	1
KEL	KEL (1790T>C)	69		3
ACKR1 (FY)	ACKR1 (-67T>C)	64		8
	ACKR1 (125G>A)	32		3	26	1	3
	ACKR1 (265C>T)	61		4
SLC14A1 (JK)	SLC14A1 (838G>A)	37		4	32	1	4
ACHE (YT)	ACHE (1057C>A)	60		8	10		1
ART4 (DO)	ART4 (793A>G)	36		3	31	1	1
	ART4 (323G>T)	68		4
	ART4 (350C>T)	76		3
CD44 (IN)	CD44 (137G>C)	70		2
AQP1 (CO)	AQP1 (134C>T)	75		4
SLC4A1 (DI)	SLC4A1 (2561C>T)	76		1	1	1
ICAM4 (LW)	ICAM4 (299A>G)	71		1
Total		865		53	102	5	10

Validation of WGS-based blood group typing according to SNaPshot results (N: Number, ND: Not defined, unresolved).
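The percentages reported for the WGS validation can be recomputed from the totals of Table 2. The short Python sketch below does this; the counts are taken from the "Total" row of the table, while the variable and function names are illustrative.

```python
# Totals from Table 2 (homozygous and heterozygous positions).
hom = {"correct": 865, "incorrect": 0, "nd": 53}
het = {"correct": 102, "incorrect": 5, "nd": 10}


def pct(numerator, denominator):
    """Percentage helper."""
    return 100.0 * numerator / denominator


typed_het = het["correct"] + het["incorrect"]                 # 107
typed_all = hom["correct"] + hom["incorrect"] + typed_het     # 972
correct_all = hom["correct"] + het["correct"]                 # 967
unresolved = hom["nd"] + het["nd"]                            # 63
total = typed_all + unresolved                                # 1035

het_concordance = pct(het["correct"], typed_het)    # heterozygous positions
typed_concordance = pct(correct_all, typed_all)     # among typed positions
unresolved_rate = pct(unresolved, total)            # share left unresolved
overall_concordance = pct(correct_all, total)       # unresolved counted as misses

print(f"{het_concordance:.1f} {typed_concordance:.1f} "
      f"{unresolved_rate:.1f} {overall_concordance:.1f}")
```

The two overall figures differ only in the denominator: 972 typed positions when unresolved results are set aside, and 1035 positions when they are counted.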

For KEL (p.Met193Thr), 98.6% of WGS-based typing results were concordant with SNaPshot results; one heterozygous sample was incorrectly typed (KEL*02) and 5 samples remained unresolved. The monomorphic position KEL (p.Pro597Leu) was 100% concordant; 3 samples were unresolved. Concordance was 100% for the monomorphic positions FY (p.Arg89Cys) and FY -67T>C; 4 and 8 samples remained unresolved, respectively. For FY (p.Gly42Asp), 98.4% of WGS-based typing results were concordant with SNaPshot results; among the 30 heterozygous samples, 1 was typed FY*01. Six samples were not resolved. For JK (p.Asp280Asn), 98.6% of WGS-based results were concordant with SNaPshot results, with 1 incorrect typing for a heterozygous sample (JK*02). Eight samples were not resolved. Concordance was 100% for the monomorphic positions DO (p.Gly108Val) and DO (p.Thr117Ile); 4 and 3 samples remained unresolved, respectively. For DO (p.Asn265Asp), 98.6% of WGS-based results were concordant with SNaPshot results; 1 heterozygous sample was incorrectly typed (DO*02). Four samples were not resolved. Concordance was 100% for YT (p.His353Asn) (9 samples were unresolved), IN (p.Pro46Arg) (2 samples were not resolved), CO (p.Ala45Val) (4 samples were not resolved) and LW (p.Gln100Arg) (1 sample was not resolved). Concordance was 98.7% for DI (p.Pro854Leu), with 1 incorrect typing for a heterozygous sample (DI*02); 2 samples remained unresolved. WGS-based typing targeting all of the variable positions (described in S1 Table) led to ambiguities (described in S4 Table) but also to more precise typing. WGS analysis allowed typing of the JK*01W.01 allele, corresponding to the JK:1WK phenotype, in 28 samples [19]; 10 samples were JK*02/JK*01W.01, 6 were JK*01/JK*01W.01 and 1 sample was homozygous for JK*01W.01.
FY*02 allele associated with c.298G>A (p.Ala100Thr) was found in 18 samples [20]. No SNaPshot results were available to confirm or refute these typing results. WGS data analysis also revealed polymorphisms that were unmapped in the ISBT database. A total of 267 previously unidentified polymorphisms covered with a minimum depth of 10x and observed in a minimum of 5 samples were found (S5 Table). Among these, 5 SNPs were in exonic regions but none led to amino-acid changes. Two SNPs in the DO gene were observed in 18 and 21 samples, 2 SNPs in IN were observed in respectively 37 and 41 samples and, in the JK gene, one SNP was observed in 37 individuals (S6 Table).

HLA-DRB1 analyses

Thirty-four HLA-DRB1 alleles were defined at maximum resolution by amplicon-based monoallelic sequencing; 5 samples could not be analyzed (Table 3).

Table 3

HLA-DRB1 analysis by monoallelic sequencing.

HLA-DRB1	No.
*07:01:01	16
*11:04:01	16
*03:01:01	15
*13:01:01	13
*01:01:01	11
*15:01:01	11
*14:01:01/14:54	9
*15:02:01	8
*11:01:01/11:01:08	7
*10:01:01	5
*01:02:01	3
*04:01:01	3
*11:03	3
*15:06	3
*16:02:01	3
*04:03:01	2
*04:04:01	2
*04:05:01	2
*11:01:01	2
*14:04	2
*04:01:03	1
*04:02	1
*07:05	1
*08:01:01/08:01:03	1
*08:03:02	1
*11:42	1
*12:01:01/12:06/12:10/12:17	1
*12:02:01	1
*13:02:01	1
*13:03:01	1
*14:01:03	1
*14:07:01	1
*14:10	1
*15:02:02	1
ND	10

HLA-DRB1 allelic typing results using monoallelic sequencing (No.: Number of alleles; ND: Not defined).

Ninety-one percent of WGS-based HLA-DRB1 typing results, i.e. 135 out of 148 alleles, showed an exact match at second field resolution with the typing defined by monoallelic sequencing. Most discordances were due to insufficient coverage and low read numbers leading to differences in the 3rd and 4th digits; two samples (accounting for 4 alleles) could not be typed. No novel polymorphism could be detected in HLA-DRB1 during the second analysis of the WGS data.
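A comparison at second field resolution can be sketched in Python as follows. This is an illustration and not the procedure used by the authors: the function names are hypothetical, and treating an ambiguous reference typing such as *14:01:01/14:54 as concordant when any listed option matches is an assumption.

```python
def second_field(allele):
    """Reduce an allele name to its first two fields, e.g. '07:01'."""
    name = allele.split("*")[-1]  # drop any gene prefix such as 'HLA-DRB1*'
    return ":".join(name.split(":")[:2])


def second_field_options(typing):
    """Expand an ambiguity string ('a/b/c') into a set of two-field names."""
    return {second_field(option) for option in typing.split("/")}


def concordant(wgs_allele, reference_typing):
    """True if the WGS allele matches the reference at second field."""
    return second_field(wgs_allele) in second_field_options(reference_typing)


# Hypothetical WGS calls compared with typings listed in Table 3.
print(concordant("HLA-DRB1*07:01:01", "*07:01:01"))    # same two fields
print(concordant("HLA-DRB1*14:54:01", "*14:01:01/14:54"))
print(concordant("HLA-DRB1*15:02:01", "*15:01:01"))    # differs in field 2

# Concordance rate reported in the text: 135 matching alleles out of 148.
rate = 100.0 * 135 / 148
print(f"{rate:.1f}%")
```
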

Discussion

In this study we explored diploid markers in WGS data generated for Y-chromosome analysis from 79 individuals [8]. Analyses were performed with PolyPheMe software, validated for HLA typing from NGS data [10] and set up for RBC analysis. The HLA-DRB1 gene and 9 blood group antigens were typed (KEL, ACKR1 (FY), SLC14A1 (JK), ACHE (YT), ART4 (DO), AQP1 (CO), CD44 (IN), SLC4A1 (DI) and ICAM4 (LW)) according to standard nomenclature (IMGT 3.39.0 database [11], ISBT (http://www.isbtweb.org) and RBC antigens [3]). Targeted strategies, such as PCR followed by sequencing or SNaPshot, circumvent specificity issues of genes with structural changes and hybrids such as RHCE/RHD and GPA/GPB, whereas their analysis from WGS data requires specific bioinformatic approaches including copy number variation (CNV) analysis. Therefore, such systems were not included in this study. Our results showed that blood group typing deduced from WGS was 99.5% correct compared to SNaPshot analysis (967 SNPs correctly identified out of 972 typed), and 93% when ambiguous typing was taken into account. In a clinical or research context, however, ambiguous RBC results need to be reanalyzed. HLA-DRB1 typing from WGS showed 91% concordance with that obtained by amplicon-based monoallelic sequencing. These performances on RBC antigens were similar to those presented in a former study on WGS from donor data [3], which included the typing of highly complex genes such as the MNS, RHD/RHCE and ABO systems. WGS data quality is assessed by the estimation of read depth. A former study conducted on WGS data established a minimum of 15x for RBC antigen typing in the clinical field [3,14]. Here, mean read depth of the genome was estimated at 11.8x [5.5x-18.4x] and read depth for each gene reached higher values.
For each gene, typing resolution was significantly associated with the number of reads mapped on its sequence, and ambiguous and incorrect typing showed low numbers of reads corresponding to the missing allele and read depth equal to or below 15x. Our study thus confirms that RBC typing from WGS should be considered reliable only with read depths strictly above 15x. To reach this goal, one human genome (3 Gb) should be sequenced with at least 45 Gb of data; here, mean data was 34 Gb [16-53]. In our study, WGS data analysis allowed refined typing and identification of both potential new alleles and haplotypes, as the PolyPheMe software used here allows phasing of polymorphisms subject to sufficient coverage and variable positions. We were able to type the JK*01W.01 allele [19] and the FY*02 allele associated with c.298G>A [20]. The weak JK allele may present a risk of hemolytic transfusion reactions [21], as it has been shown that among samples screened as JK:-1,-2, a fraction was JK:1WK [22]. JK*01W.01 has been reported in Caucasian, Asian and Chinese individuals [19] but there is a lack of description of this allele among different populations. Given the frequency found here, our results strongly support the need for a better description of this allele, particularly in Asia. Serological typing is the gold standard for blood group analysis but in particular situations molecular analysis can provide valuable information. In hematology laboratories, molecular biology based on sequence analysis was superseded by ready-to-use closed systems mainly based on SNP analysis and validated for clinical purposes. Although WGS is unthinkable for routine patient care, some situations would benefit from it, such as screening donors for rare blood antigens and the management of RBC disorders. In this regard, our results showing rare and potential new alleles are particularly relevant in diseases such as sickle cell disease, where allo-immunization is a major complication [23].
Research into minor alleles and their potential role in allo-immunization in these patients would be a major advance in personalized medicine. In a second analysis, WGS data were screened for new polymorphisms. A total of 262 new variable positions in intronic regions and 5 polymorphisms in exons were identified; none led to non-synonymous changes. Insight into their frequencies in populations described as being related to the Afghan population would contribute to refining their origins [12]. Molecular testing for the HLA system has been integrated in immunogenetics laboratories for a long time and evolves according to new technologies. Amplicon-based NGS is suitable for donor HLA typing, with robust and certified protocols, high throughput and highly resolutive typing results. These protocols can be performed with methods requiring several days and are also suitable for patients, for whom typing results are rarely needed urgently. Immunogenetics laboratories are thus quite prepared to integrate WGS in their pipeline and use it to analyze other immune markers. Patients with auto-immune diseases, solid organ and HSC transplantation, or inflammatory diseases would benefit from personalized care with specific typing of non-classical HLA, Fc receptors, KIR or LILRs [24-27]. In conclusion, the implementation of WGS can serve many purposes, from anthropogenic integrative studies to handling specific diseases in clinical fields.

S1 Table. Positions analyzed in blood group genes.

Positions analyzed for blood group typing and their corresponding alleles. (XLSX)

S2 Table. Number of reads and read depth.

Mean [min-max] number of reads and estimated read depth for each blood group gene analyzed. For each locus, gene size and effective size (i.e. sequence without repeated patterns in intronic sequences) are given. (DOCX)

S3 Table. Typing resolution and number of reads.

Typing status according to number of reads (mean [min-max]) (No.: Number) and read depth (mean [min-max]). Incorrectly typed samples could not be included in the statistical analysis (N = 1). (DOCX)

S4 Table. Typing ambiguities.

Ambiguities in WGS blood group results (No.: Number). (DOCX)

S5 Table. Unreported polymorphisms.

Number of polymorphisms revealed by whole-genome analysis but not described in the ISBT database (observed in at least 5 samples with a minimum coverage of 10 reads). (DOCX)

S6 Table. New polymorphisms in exons.

Description of exonic SNPs revealed by whole-genome analysis. Note that mutations in IN are located after the stop codon (exon 9) in IN isoform 4 described in ISBT. (DOCX)

6 Aug 2020

PONE-D-20-17434

BLOOD GROUP TYPING FROM WHOLE-GENOME SEQUENCING DATA

PLOS ONE

Dear Dr. Di Cristofaro,

Thank you for submitting your manuscript to PLOS ONE. I apologize for the long time that it has taken for the review process. I have received comments from two reviewers so far, and, after carefully considering them, feel that while the manuscript has merit, it does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, I invite you to submit a revised version of the manuscript that addresses all the points that have been raised by the reviewers. Please submit your revised manuscript by Sep 20 2020 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript: A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'. An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'. If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.
If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols We look forward to receiving your revised manuscript. Kind regards, Santosh K. Patnaik, MD, PhD Academic Editor PLOS ONE Journal Requirements: When submitting your revision, we need you to address these additional requirements. 1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf 2. Please provide additional details regarding participant consent. In the ethics statement in the Methods and online submission information, please clarify what type of consent you obtained (for instance, written or verbal, and if verbal, how it was documented and witnessed). 3. To comply with PLOS ONE submission guidelines, in your Methods section, please provide additional information regarding your statistical analyses. For more information on PLOS ONE's expectations for statistical reporting, please see https://journals.plos.org/plosone/s/submission-guidelines.#loc-statistical-reporting. 4.Thank you for stating the following in the Financial Disclosure section: [The author(s) received no specific funding for this work.]. We note that one or more of the authors are employed by a commercial company: Xegen, Gemenos and Praxis Genomics LLC Please provide an amended Funding Statement declaring this commercial affiliation, as well as a statement regarding the Role of Funders in your study. 
If the funding organization did not play a role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript and only provided financial support in the form of authors' salaries and/or research materials, please review your statements relating to the author contributions, and ensure you have specifically and accurately indicated the role(s) that these authors had in your study. You can update author roles in the Author Contributions section of the online submission form. Please also include the following statement within your amended Funding Statement. “The funder provided support in the form of salaries for authors [insert relevant initials], but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the ‘author contributions’ section.” If your commercial affiliation did play a role in your study, please state and explain this role within your updated Funding Statement. 2. Please also provide an updated Competing Interests Statement declaring this commercial affiliation along with any other relevant declarations relating to employment, consultancy, patents, products in development, or marketed products, etc. Within your Competing Interests Statement, please confirm that this commercial affiliation does not alter your adherence to all PLOS ONE policies on sharing data and materials by including the following statement: "This does not alter our adherence to PLOS ONE policies on sharing data and materials.” (as detailed online in our guide for authors http://journals.plos.org/plosone/s/competing-interests) . If this adherence statement is not accurate and there are restrictions on sharing of data and/or materials, please state these. Please note that we cannot proceed with consideration of your article until this information has been declared. 
Please include both an updated Funding Statement and Competing Interests Statement in your cover letter. We will change the online submission form on your behalf. Please know it is PLOS ONE policy for corresponding authors to declare, on behalf of all authors, all potential competing interests for the purposes of transparency. PLOS defines a competing interest as anything that interferes with, or could reasonably be perceived as interfering with, the full and objective presentation, peer review, editorial decision-making, or publication of research or non-research articles submitted to one of the journals. Competing interests can be financial or non-financial, professional, or personal. Competing interests can arise in relationship to an organization or another person. Please follow this link to our website for more details on competing interests: http://journals.plos.org/plosone/s/competing-interests [Note: HTML markup is below. Please do not edit.] Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Partly Reviewer #2: Partly ********** 2. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: Yes Reviewer #2: N/A ********** 3. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). 
The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: Yes Reviewer #2: Yes ********** 4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes Reviewer #2: Yes ********** 5. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: The submitted manuscript, by Paganini et al, describes and validates a method to determine HLA-DRB1 and 9 blood group genotypes from whole genome Next Generation Sequencing (NGS) data. The described approach is technically sound, with appropriate description of the methods and of the important bioinformatic parameters in the results. Blood group genotyping is validated by comparing with previously-published SNaPshot data, and HLA-DRB1 typing is compared with amplicon-based monoallelic sequencing. Ethics approval is properly documented. This research contribution demonstrates one of the advantages of employing NGS to predict red blood cell phenotypes: the capacity to also detect novel blood group alleles. The software employed includes physical phasing capabilities, allowing for unambiguous haplotype determination in many cases. 
The authors provide sufficient supplementary data for replication of this approach with other datasets, but the actual source NGS sequences do not appear to be publicly available. Although the authors provide a thorough description of the bioinformatic pre-alignment process, there is no mention of the aligner used, the specific alignment parameters, and the human genome build employed in this study. Is this because the alignment is done by the PolyPheMe software as well? The manuscript reports a potential novel weak FY allele, which in Table 1 is described as c.125A (p.Gly42Asp – the Fyb antigen) with c.298G>A (p.Ala100Thr) in cis. The presence of the Ala100Thr variant alone in a Fy(b) background was in fact reported previously in the literature (Olsson et al, BJH 1998, 103, 1184-1191) with no reported weakening of FY expression. Four novel missense variants are reported by the study. For the Dombrock blood group the authors report p.Asp265Asn as a novel change; however, this actually corresponds to the known Do(a/b) antithetical antigens. This is, however, one of the instances where the human genome build reference nucleotide does not match the ISBT reference table. In hg38, the reference nucleotide at chr12:14840505 is a C, which leads to Asp265 (ART4 is encoded on the minus strand). However, for immunohematologists and in the ISBT database, Asn265 is considered the reference. The manuscript also lists p.Pro179Arg as a novel missense change in the JK blood group; however, this variant is listed as part of JK*02N.13 in the most recent ISBT JK database (v6.0 01-MAR-2020 v2.0). Was this variant identified in cis with p.Asp280Asn (Jkb)? Was it identified without concomitant p.Met167Val, which is part of the definition of JK*02N.13? The manuscript clearly reports the number of cases where heterozygosity was missed, and mentions that this was associated with low depth. 
It would be useful to provide the range of depth that was associated with this phenomenon, to aid in the determination of a minimum depth for this particular approach. Minor revisions: - Page 11, line, ‘targetingall’ is missing a space. - Page 15, second paragraph, first line – the subject is singular (‘laboratory’) while the verb is plural (‘are’). Reviewer #2: Manuscript number PONE-D-20-17434 The study aimed to validate red blood cell antigen typing from WGS data using software validated for HLA typing. Seventy-nine (79) samples, representing male Afghan volunteers, were used for the study. The samples had been tested by SNaPshot analysis. This SNaPshot analysis genotyped for SNVs associated with major blood group antigen polymorphisms for the nine blood groups analysed in this study. The WGS data was compared to the previously published SNaPshot genotyping data. WGS also revealed other variants, including the presence of JK allelic variants associated with weak Jk antigen expression as well as five SNVs in exonic regions for DO, IN and JK variants that have been described in GenBank but for which the blood group phenotype association is unknown. The limitation of the study is that no serology was performed and the authors note this in the discussion. The study also revealed a number of intronic variants which could be informative in future population studies. Major comments 1 The NGS data correlated with SNaPshot in defining alleles that were homozygous for the target SNV; however, there were five incorrect typing results (102/107 correct) at the heterozygous positions. Of note, a large number of positions (both homozygous and heterozygous) also could not be called. The authors should discuss whether the incorrect typing at heterozygous positions as well as this level of calls “not defined” is acceptable. The mean read depth of the genome was 11.8, which is not high and is a reflection of the WGS approach used. 
Is this arguing for a need for a more targeted and efficient approach to obtain a higher read depth? 2 The approach of using the HLA software to interpret the blood group (Other than ABO) data appears novel although the study appears more a feasibility study than a validation study at this stage. 3 Please check the bi-allelic polymorphisms listed in the Material and Methods under Blood group genotyping e.g KEL, JK , etc. as a few are now not matching the most recently revised ISBT reference tables. Other comments: Results Page 11 Line 210 should the number be ‘148’ not ‘146’? Please also check the ISBT terminology used e.g. use Jk (a+w ) phenotype Discussion First paragraph, last sentence: Please consider including an explanation for not including MNS because this was included in the study SNaPshot study. The authors state in the discussion that the “results showed 99.5% of concordance for blood group polymorphisms” – please show how they arrived at this figure. Editorial Page 11 Line 189 ‘WGS-based typing targeting all…’ Space between targeting and all Same paragraph – last sentence – do the authors mean ‘confirm or refute’…? ********** 6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: No Reviewer #2: No [NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.] 
While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step. 
21 Sep 2020 Dr. Di Cristofaro UMR7268 Faculté de Médecine Timone Marseille France Santosh K. Patnaik, MD, PhD Academic Editor PLOS ONE PONE-D-20-17434 BLOOD GROUP TYPING FROM WHOLE-GENOME SEQUENCING DATA PLOS ONE To the Editor and the Reviewers, Thank you for having considered our manuscript for publication in PLOS ONE. We are grateful to both reviewers for their careful review. Accordingly, we are submitting a revised version of our manuscript that addresses the points raised during the review process. Please find in this letter responses to each point raised by the editor and the reviewers. Editor comments: 1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. > File naming and authors’ affiliations have been modified according to PLOS ONE style requirements. 2. Please provide additional details regarding participant consent. In the ethics statement in the Methods and online submission information, please clarify what type of consent you obtained (for instance, written or verbal, and if verbal, how it was documented and witnessed). 
> Type of consent obtained from participants was clarified in the ethics statement in the Methods and online submission information as follows: « All samples were obtained from unrelated male Afghan volunteers after obtaining written informed consent. The study protocol was registered by the Ministere de l’Enseignement Superieur et de la Recherche in France (committee 208C06, decision AC-2008-232). Institutional review board Ministere de l’Enseignement Superieur et de la Recherche in France committee 208C06, (decision AC-2008-232) specifically approved this study. » 3. To comply with PLOS ONE submission guidelines, in your Methods section, please provide additional information regarding your statistical analyses. > Additional information regarding statistical analyses was added in the Methods section as follows : « Statistical analyses were performed with GRAPH PAD Prism 5 software (California USA, www.graphpad.com). Number of reads are presented as mean and range [min, max]. Differences among number of reads according to typing gene status were tested using Kruskal-Wallis one-way ANOVA for three values and Mann Whitney test for two values. Threshold for significance (alpha) was set at 0.05. » 4. Financial Disclosure section, Funding Statement and Competing Interests Statement modifications >Funding, Financial disclosure, disclosure statement and authors’ contributions sections have been added to the manuscript as follows: FUNDING No funding was received for this research. The funder provided support in the form of salaries for authors JP, PG and PN, but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are detailed in the ‘author contributions’ section. Financial Disclosure The authors received no specific funding for this work. Authors Julien Paganini and Philippe Gouret are employed by a commercial company: Xegen, Gemenos, France. 
Author Peter L. Nagy is employed by a commercial company: Praxis Genomics LLC, Atlanta, Georgia, USA. Disclosure Statement The authors have no conflicts of interest to declare. Commercial affiliation of JP, PG and PN does not alter our adherence to all PLOS ONE policies on sharing data and materials. Author Contributions Conceptualization, Methodology, Visualization and Writing – Original Draft Preparation: JP and JDC; Data Curation and Formal Analysis: JP, PN, NR, PG and JDC; Investigation and Resources: CP, JC and JDC; Project Administration: CP and JC; Funding Acquisition: PN and JC; Software: JP and PG; Supervision and Validation: JC and CP; Writing – Review & Editing: all authors. Reviewers’ comments: The authors would like to thank both reviewers for their accurate review and kindness; their knowledge of blood groups has improved our manuscript. Reviewer #1 Actual source NGS sequences do not appear to be publicly available. > Authors agree with the reviewer. A reference for the NGS data has been added to the Introduction and Materials sections (Determination of the phylogenetic origins of the Árpád Dynasty based on Y chromosome sequencing of Béla the Third. Nagy PL, et al. Eur J Hum Genet. 2020 Jul 7. PMID: 32636469). The link to the uploaded data has been added in the Results section: “Sequencing data are available at http://www.ncbi.nlm.nih.gov/bioproject/662371”. This link will become public once the paper is published. A reviewer link is also normally created once the data set is fully processed. Although the authors provide a thorough description of the bioinformatic pre-alignment process, there is no mention of the aligner used, the specific aligning parameters, and the human genome build employed in this study. Is this because the alignment is done by the PolyPheMe software as well? > Authors agree with the reviewer. 
Additional information regarding alignment process was added in the Material and Methods section as follows: “WGS data were directly aligned to each gene as reference, no human genome was used for read mapping. Alignments were generated in PolyPheMe software with a Bowtie tool (Langmead et al. 2009; Langmead 2010).” The manuscript reports a potential novel weak FY allele, which on Table S1 is described as c.125A (p.Gly42Asp – the Fyb antigen) with c.298G>A (p.Ala100Thr) in cis. The presence of the Ala100Thr variant alone in a Fy(b) background was in fact reported previously in the literature (Olsson et al, BJH 1998, 103, 1184-1191 9886340) with no reported weakening of FY expression. > Authors thank the reviewer. Mention of weak FY phenotype was deleted concerning this FY allele with c.125A and c.298G>A in cis. Olsson et al. has been added as reference. Results and Tables were modified accordingly: “FY*02 allele associated with c.298G>A (p.Ala100Thr) was found in 18 samples (Olsson et al.; 1998).” Discussion was modified as follows: “ We were able to type the JK*01W.01 allele (Jk (a+w)) (19) and the FY*02 allele associated with c.298G>A (p.Ala100Thr) (Olsson et al.; 1998).” Four novel missense variants are reported by the study. For the Dombrock blood group the authors report p.Asp265Asn as a novel change; however, this actually corresponds to the known Do(a/b) antithetical antigens. This is however, one of the instances where the human genome build reference nucleotide does not match the ISBT reference table. In hg38, the reference nucleotide in chr12: 14840505 is a C, which leads to Asp265 (ART4 is coded in the minus strand). However, for immunohematologists and in the ISBT database, Asn265 is considered the reference. The manuscript also lists p.Pro179Arg as a novel missense change in the JK blood group, however this variant is listed as part of JK*02N.13 in the most recent ISBT JK database (v6.0 01-MAR-2020 v2.0). 
Was this variant identified in cis with p.Asp280Asn (Jkb)? Was it identified without concomitant p.Met167Val, which is part of the definition of JK*02N.13? > Authors thank the reviewer. We made a mistake when we used GenBank sequences for translation. The coordinates were checked and corrected. The new DO and JK polymorphisms induce no amino acid change. Authors apologize for these mistakes. Mutations in the IN gene were double-checked. They are located after the stop codon (exon 9) in IN isoform 4 as described by the ISBT. This information has been added to the Supporting Table S6 legend. Text and tables have been modified accordingly throughout the manuscript. The manuscript clearly reports the number of cases where heterozygosity was missed, and mentions that this was associated with low depth. It would be useful to provide the range of depth that was associated with this phenomenon, to aid in the determination of a minimum depth for this particular approach. > Authors thank the reviewer. The range of depth associated with incorrectly typed heterozygous samples was added in Supporting Table S3. Discussion has been modified accordingly: “WGS data quality is assessed by the estimation of read depth. A former study conducted on WGS data established a minimum of 15x for RBC antigen typing in the clinical field (6, 14). Here, mean read depth of the genome was estimated at 11.8x [5.5x-18.4x] and read depth for each gene reached higher values. For each gene, typing resolution was significantly associated with the number of reads mapped on its sequence and ambiguous and incorrectly typing showed low numbers of reads corresponding to the missing allele and read depth equal to or below 15x. Our study thus confirms that RBC typing from WGS should be considered reliable with read depths strictly above 15x. To reach this goal, genome sequencing of one human (3Gb) should be analyzed with at least 45 Gb of data, here mean data was 34 Gb [16-53]”. 
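As a rough cross-check of the depth argument quoted above, mean genome-wide read depth can be approximated as the amount of sequenced data divided by the genome size. The short Python sketch below applies that approximation to the figures in the response (3 Gb genome, 45 Gb target, 34 Gb mean with a 16-53 Gb range). The function and constant names are illustrative assumptions, not part of PolyPheMe or of the authors' pipeline, and the approximation ignores unmapped and duplicate reads, so its output is not expected to match the reported 11.8x [5.5x-18.4x] exactly.

```python
# Assumption: depth is approximated as sequenced gigabases / genome gigabases.
GENOME_SIZE_GB = 3.0  # approximate human genome size quoted in the response
MIN_DEPTH = 15.0      # typing is considered reliable only strictly above this


def mean_depth(data_gb, genome_gb=GENOME_SIZE_GB):
    """Approximate mean read depth (x) for a given amount of sequence data."""
    return data_gb / genome_gb


def is_reliable(depth, threshold=MIN_DEPTH):
    """Apply the 'strictly above 15x' criterion stated in the discussion."""
    return depth > threshold


# Data volumes quoted in the response, in Gb.
for label, data_gb in [("min", 16), ("mean", 34), ("max", 53), ("target", 45)]:
    depth = mean_depth(data_gb)
    print(f"{label}: {data_gb} Gb -> {depth:.1f}x, reliable={is_reliable(depth)}")
```

Under this simplification, 45 Gb corresponds to exactly 15x, which sits on the threshold rather than strictly above it.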
Also, this sentence has been added in the abstract: “Our study confirms that RBC typing from WGS should be considered reliable with read depths strictly above 15x.” Minor revisions: - Page 11, line, ‘targetingall’ is missing a space. - Page 15, second paragraph, first line – the subject is singular (‘laboratory’) while the verb is plural (‘are’). > Authors thank the reviewer, typos have been corrected. Reviewer #2 Major comments 1 The authors should discuss whether the incorrect typing at heterozygous positions as well as this level of calls “not defined” is acceptable. The mean read depth of the genome was 11.8 which is not high and a reflection of the WGS approach used. Is this arguing for a need for a more targeted and efficient approach to obtain a higher read depth? > Authors agree with the reviewer, the corresponding paragraph has been modified as follows: ” WGS data quality is assessed by the estimation of read depth. A former study conducted on WGS data established a minimum of 15x for RBC antigen typing in the clinical field (6, 14). Here, mean read depth of the genome was estimated at 11.8x [5.5x-18.4x] and read depth for each gene reached higher values. For each gene, typing resolution was significantly associated with the number of reads mapped on its sequence and ambiguous and incorrectly typing showed low numbers of reads corresponding to the missing allele and read depth equal to or below 15x. Our study thus confirms that RBC typing from WGS should be considered reliable with read depths strictly above 15x. 
To reach this goal, genome sequencing of one human (3Gb) should be analyzed with at least 45 Gb of data, here mean data was 34 Gb [16-53].” This sentence has been added in the abstract: “Our study confirms that RBC typing from WGS should be considered reliable with read depths strictly above 15x.” 2 The approach of using the HLA software to interpret the blood group (Other than ABO) data appears novel although the study appears more a feasibility study than a validation study at this stage. > Authors agree with the reviewer, abstract and introduction have been modified accordingly. 3 Please check the bi-allelic polymorphisms listed in the Material and Methods under Blood group genotyping e.g KEL, JK , etc. as a few are now not matching the most recently revised ISBT reference tables. > Authors agree with the reviewer, ISBT Terminology has been checked. Other comments: Results Page 11 Line 210 should the number be ‘148’ not ‘146’? Please also check the ISBT terminology used e.g. use Jk (a+w ) phenotype. > Authors agree with the reviewer, the number has been corrected, also abstract, results and discussion sections have been modified accordingly: “Our results showed 93% of concordance for blood group polymorphisms and 91% for HLA-DRB1.” ISBT Terminology has been checked. Discussion First paragraph, last sentence: Please consider including an explanation for not including MNS because this was included in the study SNaPshot study. > Authors agree with the reviewer. As for RHD/RHCE system, MNS could not be included because of the existence of hybrids. The sentence has been modified accordingly: “Whereas targeted strategies, such as PCR followed by sequencing or SnaPshot, circumvent specificity issues of genes with structural changes and hybrids such as RHCE/RHD and GPA/GPB ; their analysis from WGS data requires specific bioinformatic approaches including CNV (copy number variation) analysis. 
Therefore, such systems were not included in this study.” The authors state in the discussion that the “results showed 99.5% of concordance for blood group polymorphisms” – please show how they arrived at this figure. > WGS analyses allowed identification of 972 SNPs, of which 967 were correctly identified (i.e. 99.5%). A concordance of 93.4% was obtained when ambiguous typing was taken into account. The sentence in the discussion was modified as follows: “Our results showed that blood group typing deduced from WGS were correct at 99.5% compared to SNaPshot analysis (967 SNP correctly identified out of 972 typed); 93% of concordance was obtained when taking into account ambiguous typing. In a clinical or research context however, ambiguous RBC results need to be reanalyzed. HLA-DRB1 typing from WGS showed 91% of concordance with those obtained by amplicon-based monoallelic sequencing.” Editorial Page 11 Line 189 ‘WGS-based typing targeting all…’ Space between targeting and all Same paragraph – last sentence – do the authors mean ‘confirm or refute’…? > Authors thank the reviewer; modifications have been made. The authors would like to thank the reviewers for their critical review; we feel that our manuscript has been improved. We look forward to the editor and reviewers’ comments and the editorial board’s decision. Please do not hesitate to contact us if you require any further information. Yours sincerely, 
21 Oct 2020 PONE-D-20-17434R1 BLOOD GROUP TYPING FROM WHOLE-GENOME SEQUENCING DATA PLOS ONE Dear Dr. Di Cristofaro, Thank you for submitting your revised manuscript to PLOS ONE. It has now been examined by one of the two referees who had reviewed the original submission. Based on the new review, I am making a decision for 'major revision' and requesting you to submit a revised version of the manuscript that addresses the points raised by the reviewer. 
Please submit your revised manuscript by Dec 05 2020 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript: A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'. An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'. If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter. If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols We look forward to receiving your revised manuscript. Kind regards, Santosh K. Patnaik, MD, PhD Academic Editor PLOS ONE [Note: HTML markup is below. Please do not edit.] Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. 
If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation. Reviewer #2: (No Response) ********** 2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #2: Partly ********** 3. Has the statistical analysis been performed appropriately and rigorously? Reviewer #2: N/A ********** 4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #2: Yes ********** 5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. 
Reviewer #2: Yes ********** 6. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #2: PLOS ONE-D-20-1743R1 The authors have clarified some of the queries with regard to Incorrect Typing and with regard to Blood Group Nomenclature however anomalies still remain in both regards. Abstract the Conclusion: Page 2 Line 31 – 32, The authors should reconsider the conclusion which is too general for the results presented in this study. The stated aim includes estimating the feasibility of RBC antigen typing from WGS data. Table 2 shows the authors find incorrect typing results for 5 samples and correct results for 102 samples with heterozygous calls. They also find a large number of not determined (unresolved) results with 53 homozygous samples and 10 heterozygous not resolved. The authors elsewhere show that the read depth is insufficient at 11.8 which supports previous findings that read depth needs to be above 15. The conclusion is that RBC antigen typing is feasible however improvements in read depth are needed to improve the accuracy in typing for SNV polymorphisms. The paper does show the potential for WGS in detecting other alleles, such as the weak JK alleles, with potential to be used to screen for rare donors. Nomenclature: Materials and Methods: Page 5 line 84 & 85 the bracketed list gives the internationally HUGO blood group gene names with the Blood Group System symbols in brackets. However the gene names given for FY, IN, CO DI and LW are incorrect. The gene names for these Blood Group Systems are ACKR1 (FY), CD44 (IN), AQP1 (CO), SLC4A1 (DI) and ICAM4 (LW). 
Reference http://www.isbtweb.org/fileadmin/user_upload/Table_of_blood_group_systems_v6.0_6th_August_2019.pdf Page 5 Line 87 to 89: After “Fourteen SNPs were analysed corresponding to bi-allelic polymorphisms” the authors show the following amino acid changes (phenotypes). The Duffy needs editing to match ISBT Tables as follows: Page 5 Line 89: The phenotype for (FY p.Arg89Cys) is Fya+w, and the phenotype for the FY promoter change (p.0) with the FYc-67T>C change is (Fy(a-b-) erythroid cells only) Presentation of data in Table 2: There are ambiguities in the presentation of data in Table 2. For example for FY the third column and fourth column indicate that the FY*01N.01 allele showed Homozygous Correct Typing for 64 cases with 8 not defined. This is generally a rare allele. The next line shows for the FY*02 allele (the alternate for the FY*01) that 32 were Homozygous. Yet there are only 79 individuals in the study and these numbers as written exceed the number in the study (64+ 32). This is not counting the 26 Heterozygous cases and 7 incorrect or undefined. The next line indicates for FY*0W.01 there are a further 61 cases homozygous (now there are 64 + 32 + 61). Again in Table 2 The authors also report 70 Homozygous for IN*01 which is indeed a very rare allele and should, if confirmed, be commented on. References Introduction Line 46: Suggest include the papers by Lane et al with (1,2) as well as in the next paragraph. Minor Edits: Introduction: Line 3: Research and CE-environment labs – suggest Research and Clinical laboratories working within regulatory approved frameworks e.g. Council of Europe (CE) . ********** 7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? 
For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #2: No [NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.] While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step. 
26 Oct 2020 To the Editor and the Reviewer, Thank you for having considered our revised manuscript for publication in PLOS ONE. We are grateful to the reviewer for her/his second review. Accordingly, we are submitting a revised version of our manuscript. Please find in this letter responses to each point raised by the reviewer. ******* Reviewer #2 The authors have clarified some of the queries with regard to Incorrect Typing and with regard to Blood Group Nomenclature however anomalies still remain in both regards. Abstract the Conclusion: Page 2 Line 31 – 32, The authors should reconsider the conclusion which is too general for the results presented in this study. The stated aim includes estimating the feasibility of RBC antigen typing from WGS data. Table 2 shows the authors find incorrect typing results for 5 samples and correct results for 102 samples with heterozygous calls. 
They also find a large number of not determined (unresolved) results, with 53 homozygous samples and 10 heterozygous not resolved. The authors elsewhere show that the read depth is insufficient at 11.8, which supports previous findings that read depth needs to be above 15. The conclusion is that RBC antigen typing is feasible; however, improvements in read depth are needed to improve the accuracy in typing for SNV polymorphisms. The paper does show the potential for WGS in detecting other alleles, such as the weak JK alleles, with potential to be used to screen for rare donors.

> Authors agree with the reviewer, the abstract has been modified: “Incorrect typing and unresolved results confirm that WGS should be considered reliable with read depths strictly above 15x. Our results supported that RBC antigen typing from WGS is feasible but requires improvements in read depth for SNV polymorphisms typing accuracy. We also showed the potential for WGS in screening donors with rare blood antigens, such as weak JK alleles. The development of WGS analysis in immunogenetics laboratories would offer personalized care in the management of RBC disorders.”

Nomenclature: Materials and Methods: Page 5 line 84 & 85: the bracketed list gives the international HUGO blood group gene names with the Blood Group System symbols in brackets. However, the gene names given for FY, IN, CO, DI and LW are incorrect. The gene names for these Blood Group Systems are ACKR1 (FY), CD44 (IN), AQP1 (CO), SLC4A1 (DI) and ICAM4 (LW). Reference http://www.isbtweb.org/fileadmin/user_upload/Table_of_blood_group_systems_v6.0_6th_August_2019.pdf

> Authors thank the reviewer, gene names have been corrected.

Page 5 Line 87 to 89: After “Fourteen SNPs were analysed corresponding to bi-allelic polymorphisms” the authors show the following amino acid changes (phenotypes).
The Duffy entries need editing to match ISBT Tables as follows: Page 5 Line 89: The phenotype for (FY p.Arg89Cys) is Fya+w, and the phenotype for the FY promoter change (p.0) with the FYc-67T>C change is (Fy(a-b-) erythroid cells only).

> Authors thank the reviewer, phenotypes have been corrected.

Presentation of data in Table 2: There are ambiguities in the presentation of data in Table 2. For example, for FY the third and fourth columns indicate that the FY*01N.01 allele showed Homozygous Correct Typing for 64 cases with 8 not defined. This is generally a rare allele. The next line shows for the FY*02 allele (the alternate for the FY*01) that 32 were Homozygous. Yet there are only 79 individuals in the study and these numbers as written exceed the number in the study (64 + 32). This is not counting the 26 Heterozygous cases and 7 incorrect or undefined. The next line indicates for FY*0W.01 there are a further 61 cases homozygous (now there are 64 + 32 + 61). Again in Table 2, the authors also report 70 Homozygous for IN*01, which is indeed a very rare allele and should, if confirmed, be commented on.

> Authors thank the reviewer; the ambiguities in Table 2 were due to the use of allele names instead of polymorphisms. No sample was typed FY*01N.01; among the 72 homozygous samples (wild type as shown in Table 1) at position ACKR1 (-67T>C), 64 samples were correctly defined and 8 samples were not defined. Among the 35 samples homozygous at position ACKR1 (125G>A), whether wild type or mutated, 32 were correctly defined and 3 were not defined. Among the 30 samples heterozygous at the same position, 26 samples were correctly typed, 1 sample was not correctly typed and 3 samples were not defined. No sample was typed IN*01. Among the 72 homozygous samples (wild type as shown in Table 1) at position CD44 (137G>C), 70 samples were correctly defined and 2 were not defined. Allele names have been replaced with polymorphisms in Table 2.
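The counts in the reply above can be checked with a few lines of arithmetic. The following is only an illustrative sketch, not part of the review or of the authors' analysis: the variable and function names are hypothetical, the rows contain only the figures quoted in the reply (not the full Table 2), and a zero "incorrect" count is assumed wherever the reply reports only correct and not-defined samples. It shows why the numbers stop exceeding the cohort of 79 once rows are read as positions rather than alleles.

```python
from collections import defaultdict

COHORT_SIZE = 79  # individuals in the study

# (position, zygosity, correct, incorrect, not_defined), as quoted in the reply.
# Assumption: "incorrect" is 0 where the reply lists only correct and not-defined counts.
ROWS = [
    ("ACKR1 (-67T>C)", "homozygous", 64, 0, 8),
    ("ACKR1 (125G>A)", "homozygous", 32, 0, 3),
    ("ACKR1 (125G>A)", "heterozygous", 26, 1, 3),
    ("CD44 (137G>C)", "homozygous", 70, 0, 2),
]


def totals_per_position(rows):
    """Sum every typing outcome for each variable position."""
    totals = defaultdict(int)
    for position, _zygosity, correct, incorrect, not_defined in rows:
        totals[position] += correct + incorrect + not_defined
    return dict(totals)


totals = totals_per_position(ROWS)
for position, n in totals.items():
    status = "within cohort" if n <= COHORT_SIZE else "exceeds cohort"
    print(f"{position}: {n} samples ({status})")
```

Grouping by position rather than by allele name is the point of the check: each individual contributes at most one genotype per position, so every per-position total must stay at or below 79, whereas adding rows across different positions (64 + 32 + 61) has no such bound.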
References: Introduction Line 46: Suggest including the papers by Lane et al with (1,2) as well as in the next paragraph.

> Authors agree with the reviewer, reference has been added.

Minor Edits: Introduction: Line 3: Research and CE-environment labs – suggest Research and Clinical laboratories working within regulatory approved frameworks e.g. Council of Europe (CE).

> Authors agree with the reviewer, text has been modified accordingly.

The authors would like to thank the reviewer. We look forward to the editor and reviewer’s comments and the editorial board’s decision. Please do not hesitate to contact us if you require any further information.

Yours sincerely,

Submitted filename: reply to rebuttal letter PONE D2017434_R2_102620.docx

28 Oct 2020

BLOOD GROUP TYPING FROM WHOLE-GENOME SEQUENCING DATA

PONE-D-20-17434R2

Dear Dr. Di Cristofaro,

Thank you for submitting the second revised version of your manuscript for our appraisal. The concerns raised by Referee #2 in the last review have been appropriately addressed with the revision. The manuscript is therefore scientifically suitable for publication and it will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.
If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible, no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Santosh K. Patnaik, MD, PhD
Academic Editor
PLOS ONE

4 Nov 2020

PONE-D-20-17434R2

BLOOD GROUP TYPING FROM WHOLE-GENOME SEQUENCING DATA

Dear Dr. Di Cristofaro:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff
on behalf of
Dr. Santosh K. Patnaik
Academic Editor
PLOS ONE
