Paschalis Natsidis1,2, Alexandros Tsakogiannis1, Pavlos Pavlidis3, Costas S Tsigenopoulos1, Tereza Manousaki1.
Abstract
Sparidae (Teleostei: Spariformes) are a family of fish comprising approximately 150 species with high popularity and commercial value, such as porgies and seabreams. Although the phylogeny of this family has been investigated multiple times, its position among other teleost groups remains ambiguous. Most studies have used a single gene or a few genes to decipher the phylogenetic relationships of sparids. Here, we conducted a thorough phylogenomic analysis using five recently available Sparidae gene-sets and 26 high-quality, genome-predicted teleost proteomes. Our analysis suggested that Tetraodontiformes (puffer fish, sunfish) are closer relatives to sparids than all other groups used. By analytically comparing this result to our own previous contradicting finding, we show that this discordance is not due to different orthology assignment algorithms; on the contrary, we prove that it is caused by the increased taxon sampling of the present study, outlining the great importance of this aspect in phylogenomic analyses in general.
Keywords: Computational biology and bioinformatics; Molecular evolution; Phylogenetics; Phylogeny
Year: 2019 PMID: 31701028 PMCID: PMC6825128 DOI: 10.1038/s42003-019-0654-5
Source DB: PubMed Journal: Commun Biol ISSN: 2399-3642
Fig. 1 a The two protogynous species used in this study, Pagellus erythrinus and Pagrus pagrus. The other three Sparidae used in this study are gonochoristic (Dentex dentex), obligatory protandrous (Sparus aurata) and rudimentary protandrous (Diplodus puntazzo). Image copyrights: Alexandros Tsakogiannis. b–d The main workflow, divided into three main components: taxon sampling and quality assessment, orthology assignment and MSA, and phylogenomic analysis, respectively.
Preprocessing of the four Sparidae transcriptomes
| Species | Transcripts in assemblies | Total number of ORFs found with length >50 a.a. | Transcripts with at least one ORF (% of transcripts with ORF) | Number of coding genes used in the final analysis |
|---|---|---|---|---|
| | 129,012 | 1,272,493 | 113,208 (87.75%) | 83,527 |
| | 118,258 | 1,285,298 | 113,684 (96.13%) | 78,451 |
| | 141,309 | 1,416,980 | 129,523 (91.66%) | 89,124 |
| | 98,012 | 1,264,706 | 91,787 (93.64%) | 62,116 |
For each transcriptome we present the number of sequences contained, the number of open reading frames (ORF) found, the number of transcripts with at least one ORF and the final proteome included in the analysis after keeping the longest ORF per gene
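The last step described above, keeping the longest ORF per gene, can be illustrated with a minimal Python sketch. The input layout (pairs of gene identifier and protein sequence) and the function name `longest_orf_per_gene` are assumptions made for illustration; the source does not say how ORFs were mapped to genes or which tool performed this step. The length cutoff mirrors the ">50 a.a." threshold in the table header.

```python
def longest_orf_per_gene(orfs, min_length=50):
    """Keep one protein per gene: the longest ORF longer than `min_length`.

    `orfs` is an iterable of (gene_id, protein_sequence) pairs. This input
    layout is hypothetical and chosen only for this sketch.
    """
    best = {}
    for gene_id, protein in orfs:
        # Discard ORFs that do not exceed the length threshold (>50 a.a.).
        if len(protein) <= min_length:
            continue
        # Retain the ORF only if it is longer than the current best for this gene.
        if gene_id not in best or len(protein) > len(best[gene_id]):
            best[gene_id] = protein
    return best


if __name__ == "__main__":
    toy_orfs = [
        ("gene1", "M" * 60),
        ("gene1", "M" * 120),
        ("gene2", "M" * 40),   # too short, dropped
        ("gene3", "M" * 51),
    ]
    proteome = longest_orf_per_gene(toy_orfs)
    for gene_id, protein in sorted(proteome.items()):
        print(gene_id, len(protein))
```

On real data the pairs would come from parsing the ORF predictor's FASTA output; that parsing is left out because the header format is not given in the source.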
List of species included in the phylogenomic analysis
| Species | Series (for Percomorphaceae) | Source | Reference | # of proteins |
|---|---|---|---|---|
| | (Ostariophysi) | Ensembl database | | 22,998 |
| | Gobiaria | NCBI ftp server | | 21,541 |
| | Anabantaria | GigaDB | | 20,541 |
| | Carangaria | NCBI ftp server | | 24,489 |
| | (Ostariophysi) | Ensembl database | | 25,644 |
| | Eupercaria | in-house sequenced | PRJNA481721 | 83,527 |
| | Eupercaria | species database | | 26,719 |
| | Eupercaria | in-house sequenced | | 78,451 |
| | (Paracanthopterygii) | Ensembl database | | 19,978 |
| | Eupercaria | Ensembl database | | 20,625 |
| | Syngnatharia | GigaDB | | 20,788 |
| | Ovalentaria | NCBI ftp server | | 25,257 |
| | Eupercaria | NCBI ftp server | | 28,009 |
| | Carangaria | NCBI ftp server | | 22,221 |
| | (Holostei) | Ensembl database | | 18,304 |
| | Eupercaria | GigaDB | | 26,539 |
| | Eupercaria | GigaDB | | 19,605 |
| | Anabantaria | NCBI ftp server | | 24,943 |
| | Eupercaria | NCBI ftp server | | 25,937 |
| | Ovalentaria | Ensembl database | | 21,383 |
| | Ovalentaria | Ensembl database | | 19,603 |
| | Eupercaria | in-house sequenced | | 89,124 |
| | Eupercaria | in-house sequenced | | 62,116 |
| | Eupercaria | provided by authors | | 32,713 |
| | Ovalentaria | Ensembl database | | 23,315 |
| | Carangaria | NCBI ftp server | Araki et al., unpublished | 24,000 |
| | Eupercaria | in-house sequenced | | 61,850 |
| | Eupercaria | Ensembl database | | 18,433 |
| | Eupercaria | Ensembl database | | 19,511 |
| | Pelagiaria | species database | | 26,433 |
| | Ovalentaria | Ensembl database | | 20,343 |
For each species we indicate the series (or another distinct taxonomic group for the non-Percomorphaceae) they belong to, the sources of the proteomes used, the reference paper and the number of the protein sequences contained in each proteome
Fig. 2 Quality assessment using BUSCO. The five Sparidae proteomes are shown in the top five bars
Comparison of the two orthology inference tools and the respective superalignments
| Software | Groups of orthologs returned | Single-copy groups with at least 27 taxa | Average aligned group length (a.a.) | Concatenated alignment length (a.a.) | Filtered alignment length (a.a.) |
|---|---|---|---|---|---|
| OrthoFinder | 45,730 | 793 | 591.06 | 468,718 | 231,078 |
| PorthoMCL | 42,693 | 533 | 603.56 | 321,695 | 141,608 |
OrthoFinder returned a greater number of orthogroups than PorthoMCL, both initially and after filtering for 1-1 groups with representation from at least 27 species
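The filtering criterion mentioned above (1-1 groups represented in at least 27 species) can be sketched in Python as follows. The nested-dictionary input and the function name `single_copy_groups` are hypothetical and do not reflect the actual output format of OrthoFinder or PorthoMCL. The sketch also assumes that "1-1" means at most one sequence per species in a group.

```python
def single_copy_groups(orthogroups, min_taxa=27):
    """Select groups with one sequence per species and >= `min_taxa` species.

    `orthogroups` maps a group id to {species: [protein ids]}. This layout is
    an assumption of the sketch, not a tool's native format.
    """
    kept = {}
    for group_id, members in orthogroups.items():
        # Ignore species that contribute no sequence to this group.
        present = {sp: ids for sp, ids in members.items() if ids}
        enough_taxa = len(present) >= min_taxa
        all_single_copy = all(len(ids) == 1 for ids in present.values())
        if enough_taxa and all_single_copy:
            kept[group_id] = {sp: ids[0] for sp, ids in present.items()}
    return kept


if __name__ == "__main__":
    toy = {
        "OG1": {"spA": ["a1"], "spB": ["b1"], "spC": ["c1"]},
        "OG2": {"spA": ["a2", "a3"], "spB": ["b2"], "spC": ["c2"]},  # paralogs in spA
        "OG3": {"spA": ["a4"], "spB": ["b4"], "spC": []},            # too few taxa
    }
    # A threshold of 3 is used here only because the toy data has 3 species.
    print(sorted(single_copy_groups(toy, min_taxa=3)))
```

With 31 species in the analysis, the threshold of 27 tolerates up to four missing species per group.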
Fig. 3 Maximum likelihood (RAxML) tree of 793 concatenated OrthoFinder groups using the JTT + F + Γ model and 100 bootstrap replicates. The spotted gar (L. oculatus) was used as an outgroup
Comparison of the topology presented here, with Tetraodontiformes as the closest group to Sparidae, and the topology suggested in [25], with croaker and seabass as the closest group to Sparidae, using CONSEL
| Tree | obs | au | np | bp | kh | sh | wkh | wsh |
|---|---|---|---|---|---|---|---|---|
| OrthoFinder | | | | | | | | |
| Nats | −558.7 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| Paul | 558.7 | 4e-07 | 2e-06 | 0 | 0 | 0 | 0 | 0 |
| PorthoMCL | | | | | | | | |
| Nats | −345.8 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| Paul | 345.8 | 1e-50 | 2e-17 | 0 | 0 | 0 | 0 | 0 |
The table shows the p-values of various statistical tests. A topology can be rejected as the most likely true topology when au < 0.05, i.e., at the 0.05 significance level [55]. Nats: present study; Paul: ref. [25], in press
obs observed log-likelihood difference, au approximately unbiased test, np multiscale bootstrap probability, bp usual bootstrap probability, kh Kishino-Hasegawa test, sh Shimodaira-Hasegawa test, wkh weighted Kishino-Hasegawa test, wsh weighted Shimodaira-Hasegawa test
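The decision rule stated in the table note (reject a topology when au < 0.05) can be expressed as a short Python sketch. The function name `rejected_topologies` and the dictionary input are hypothetical; the p-values are copied from the OrthoFinder and PorthoMCL rows of the table above, and the sketch does not reproduce how CONSEL computes them.

```python
def rejected_topologies(au_pvalues, alpha=0.05):
    """Return the names of topologies whose AU p-value is below `alpha`.

    `au_pvalues` maps a topology label to its approximately unbiased (AU)
    test p-value. The layout is an assumption made for this sketch.
    """
    return sorted(name for name, p in au_pvalues.items() if p < alpha)


if __name__ == "__main__":
    # AU p-values taken from the table above.
    orthofinder = {"Nats": 1.000, "Paul": 4e-07}
    porthomcl = {"Nats": 1.000, "Paul": 1e-50}
    print("OrthoFinder rejected:", rejected_topologies(orthofinder))
    print("PorthoMCL rejected:", rejected_topologies(porthomcl))
```

For both orthology tools only the earlier topology (Paul) falls below the threshold, which matches the conclusion in the abstract that the discordance is not explained by the choice of orthology assignment algorithm.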