Andrea A Cabrera, Alba Rey-Iglesia, Marie Louis, Mikkel Skovrind, Michael V Westbury, Eline D Lorenzen.
Abstract
Accurate sex identification is crucial for elucidating the biology of a species. In the absence of directly observable sexual characteristics, sex identification of wild fauna can be challenging, if not impossible. Molecular sexing offers a powerful alternative to morphological sexing approaches. Here, we present SeXY, a novel sex-identification pipeline for very low-coverage shotgun sequencing data from a single individual. SeXY was designed to utilize low-effort screening data for sex identification and does not require a conspecific sex-chromosome assembly as reference. We assess how the accuracy of our pipeline depends on data quantity, by downsampling sequencing data from 100,000 to 1000 mapped reads, and on reference genome selection, by mapping to a variety of reference genomes of varying quality and phylogenetic distance. We show that our method is 100% accurate when mapping to a high-quality (highly contiguous; N50 > 30 Mb) conspecific genome, even down to 1000 mapped reads. For lower-quality reference assemblies (N50 < 30 Mb), our method is 100% accurate with 50,000 mapped reads, regardless of reference assembly quality or phylogenetic distance. The SeXY pipeline provides several advantages over previously implemented methods: SeXY (i) requires sequencing data from only a single individual, (ii) does not require assembled conspecific sex chromosomes, or even a conspecific reference assembly, (iii) takes into account variation in coverage across the genome, and (iv) is accurate with only 1000 mapped reads in many cases.
Keywords: bioinformatics; low‐coverage; molecular sexing; sex assessment; sex chromosome
Year: 2022 PMID: 36035270 PMCID: PMC9405501 DOI: 10.1002/ece3.9185
Source DB: PubMed Journal: Ecol Evol ISSN: 2045-7758 Impact factor: 3.167
FIGURE 2 Sex determination of beluga and polar bear individuals using four reference genome assemblies (RefGEN), one combination of reference sex‐chromosome assembly (RefX and RefY) for each target species, and various numbers of mapped reads. The ten beluga and ten polar bear individuals tested each comprised five females (red) and five males (blue). The X axis shows the number of mapped reads (square) and the average number of raw reads necessary to obtain the required number of mapped reads (triangle). The Y axis shows the ratio of X chromosome to autosome coverage (X:A ratio) for each combination of RefGEN, RefX and RefY (CowX and HumanY, DogX and DogY), and number of mapped reads. Individuals were determined as females if their X:A ratio was ≥0.8, and as males if their X:A ratio was ≤0.7. Grey shaded horizontal bars indicate an X:A ratio of 0.7–0.8, which we interpreted as undetermined sex.
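The thresholds in this caption can be illustrated with a minimal Python sketch. Only the cut-offs (≥0.8 female, ≤0.7 male, undetermined in between) come from the caption. The function names `xa_ratio` and `call_sex` are hypothetical, and taking the X:A ratio as a plain ratio of mean coverages is an assumption; this record does not describe how SeXY itself computes or normalizes coverage.

```python
from statistics import mean

FEMALE_MIN = 0.8  # X:A ratio at or above this is called female (from the caption)
MALE_MAX = 0.7    # X:A ratio at or below this is called male (from the caption)


def xa_ratio(x_coverage, autosome_coverage):
    """Mean X-linked coverage divided by mean autosomal coverage.

    Assumption: a plain ratio of means; the real pipeline may differ.
    """
    return mean(x_coverage) / mean(autosome_coverage)


def call_sex(ratio):
    """Apply the caption's thresholds to a single X:A ratio."""
    if ratio >= FEMALE_MIN:
        return "female"
    if ratio <= MALE_MAX:
        return "male"
    return "undetermined"  # the grey 0.7-0.8 band in the figure


# Made-up per-scaffold coverage values, for illustration only.
ratio = xa_ratio([0.9, 1.1, 1.0], [2.0, 2.1, 1.9])
print(call_sex(ratio))
```

Under this simple model, an individual with one X chromosome is expected to show roughly half the autosomal coverage on X-linked scaffolds, and an individual with two X chromosomes roughly equal coverage, which is why the two cut-offs sit between 0.5 and 1.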
FIGURE 1 Schematic representation of the data sets and reference assemblies (RefGEN, RefX, RefY) analyzed for the two target species: beluga and polar bear. Each branch of the flowchart shows the evaluated combination of (a) reference genome assembly (RefGEN) used as mapping reference for the raw reads of each target species, (b) number of mapped reads of the target species (representing six independent data sets), and (c) reference sex‐chromosome assembly (RefX and RefY) used to identify the sex‐linked scaffolds (synteny). Total number of evaluated data sets per branch of the flowchart is shown at the bottom of the figure.
Summary table showing the percentage of correct sex determinations across tested combinations of reference genome assembly (RefGEN), reference sex‐chromosome assembly (RefX and RefY), and number of mapped reads. Results are shown for the beluga data and the cetacean/cow RefGEN assemblies tested (left columns) and for the polar bear data and the bear/dog RefGEN assemblies tested (right columns). The value below each RefGEN indicates the assembly N50. For cells with two estimates, the left value counts both incorrectly determined and undetermined sex as errors, and the right value counts incorrectly determined sex only (excluding undetermined sex). Only one value is given if both estimates were the same. Percentages in each cell are based on 10 sample individuals: five females and five males. Sex determination for each individual was calculated using the average value of 10 replicates. Individuals were determined as females if their X:A ratio was ≥0.8, and as males if their X:A ratio was ≤0.7. We interpreted an X:A ratio of 0.7–0.8 as undetermined sex. The corresponding summary table for tests using HumanX and HumanY as RefX and RefY, respectively, is provided in Table S7.
| Target species | Beluga | Beluga | Beluga | Beluga | Polar bear | Polar bear | Polar bear | Polar bear |
|---|---|---|---|---|---|---|---|---|
| RefGEN | Beluga v1 | Beluga v3 | Orca | Cow | Polar bear v1 | Polar bear v1 HiC | Panda | Dog |
| RefGEN N50 | 161 kb | 31 Mb | 13 Mb | 103 Mb | 16 Mb | 71 Mb | 129 Mb | 64 Mb |
| RefX and RefY | CowX and HumanY | CowX and HumanY | CowX and HumanY | CowX and HumanY | DogX and DogY | DogX and DogY | DogX and DogY | DogX and DogY |
| 100,000 mapped reads | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| 50,000 mapped reads | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| 10,000 mapped reads | 50 | 100 | 90/100 | 100 | 100 | 100 | 100 | 100 |
| 5000 mapped reads | 50/56 | 90/100 | 80/89 | 100 | 100 | 100 | 100 | 100 |
| 2500 mapped reads | 100 | 100 | 50/63 | 80/100 | 100 | 100 | 90/100 | 100 |
| 1000 mapped reads | 50 | 80/100 | 60 | 80/89 | 70 | 100 | 80/89 | 100 |
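
The two values in cells such as 50/56 follow from simple arithmetic on the 10 individuals per cell, sketched below in Python. The function name `accuracy_estimates` and the example calls are hypothetical: the breakdown of 5 correct, 4 incorrect and 1 undetermined is one set of counts that would produce 50/56, not a breakdown reported in this record. Percentages are rounded half up, which is an assumption chosen because it is consistent with the whole-number values in the table.

```python
import math


def round_half_up(value):
    """Round to the nearest integer, with .5 rounding up (assumed convention)."""
    return math.floor(value + 0.5)


def accuracy_estimates(truth, calls):
    """Return (left, right) percentages as described in the table caption.

    left:  correct calls over all individuals (undetermined counts as an error)
    right: correct calls over individuals with a determined sex only
    """
    n_correct = sum(t == c for t, c in zip(truth, calls))
    n_determined = sum(c != "undetermined" for c in calls)
    left = round_half_up(100 * n_correct / len(truth))
    right = round_half_up(100 * n_correct / n_determined) if n_determined else None
    return left, right


# Hypothetical cell: five females and five males, with made-up calls.
truth = ["female"] * 5 + ["male"] * 5
calls = (["female"] * 3 + ["male"] * 2                   # females: 3 correct, 2 wrong
         + ["male"] * 2 + ["female"] * 2 + ["undetermined"])  # males: 2 correct, 2 wrong, 1 undetermined
print(accuracy_estimates(truth, calls))  # → (50, 56)
```

Here 5 of 10 individuals are called correctly (50%), while 5 of the 9 individuals with a determined sex are correct (55.6%, shown as 56).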