Bette J Hecox-Lea, David B Mark Welch.
Abstract
BACKGROUND: Bdelloid rotifers are the oldest, most diverse and successful animal taxon for which males, hermaphrodites, and traditional meiosis are unknown. Their degenerate tetraploid genome, with 2-4 copies of most loci, includes thousands of genes acquired from all domains of life by horizontal transfer. Many bdelloid species thrive in ephemerally aquatic habitats by surviving desiccation at any life stage with no loss of fecundity or lifespan. Their unique genomic diversity and the intense selective pressure of desiccation provide an exceptional opportunity to study the evolution of diversity and novelty in genes involved in DNA repair.
Keywords: APLF; AlkD; Blm; Fpg; Ku 70/80; Ligase K; NHEJ; Polymerase lambda; UVDE; XRCC4
Year: 2018 PMID: 30486781 PMCID: PMC6264785 DOI: 10.1186/s12862-018-1288-9
Source DB: PubMed Journal: BMC Evol Biol ISSN: 1471-2148 Impact factor: 3.260
Fig. 1 Gene Copy Number per Gene by DDR Category. Metazoan genes are indicated with open circles; non-metazoan genes by solid green diamonds. DR: Direct Reversal; BER: Base Excision Repair; NER: Nucleotide Excision Repair; AER: Alternate Excision Repair; MMR: Mismatch Repair; HR: Homologous Recombination; NHEJ: Non-homologous End Joining; TLS: Translesion Synthesis; Pol: Polymerase
Major Conserved Metazoan DDR Genes
| Gene | Gene Description | KO | # | Cel | Dme | Hsa |
|---|---|---|---|---|---|---|
| Direct Reversal (DR) | ||||||
| PHRB | deoxyribodipyrimidine photo lyase | K01669 | 4 | X | ||
| ALKBH1 | alkylated DNA repair protein | K10765 | 1 | X | X | X |
| MGMT | alkylated DNA repair protein | K00567 | 1 | X | X | X |
| Base Excision Repair (BER) | ||||||
| FPG | formamidopyrimidine-DNA glycosylase | K10563 | 2 | |||
| NEIL1/2/3 | endonuclease VIII like 1,2, or 3 | K10567/8/9 | 0 | X | ||
| MPG | DNA-3-methyladenine glycosylase | K03652 | 0 | X | ||
| OGG1 | 8-oxoguanine DNA glycosylase | K03660 | 3 | X | X | |
| AlkD | DNA alkylation repair enzyme | K00000 | 2 | |||
| MUTYH | mutY homolog | K03575 | 2 | X | ||
| SMUG1 | ss-selective monofunctional uracil DNA glycosylase | K10800 | 0 | X | ||
| MBD4 | methyl-CpG-binding domain protein 4 | K10801 | 0 | X | ||
| NTHL1 | nth endonuclease III-like 1 | K10773 | 1 | X | X | X |
| TDG/MUG | TDG/ MUG DNA glycosylase family protein | K20813 | 4 | X | X | |
| UNG1 | mitochondrial uracil DNA glycosylase | K03648 | 2 | X | X | |
| UNG2 | nuclear uracil DNA glycosylase | K03648 | 4 | X | X | |
| APEX1 | AP endonuclease 1 | K10771 | 4 | X | X | X |
| APEX2 | AP endonuclease 2 | K10772 | 2 | X | ||
| PARP1 | poly (ADP-ribose) polymerase family, member 1 | K10798 | 4 | X | X | X |
| XRCC1 | X-ray repair complementing defective repair in Chinese hamster cells 1 | K10803 | 2 | X | X | |
| TDP1 | tyrosyl-DNA phosphodiesterase 1 | K10862 | 2 | X | X | |
| FEN1 | flap endonuclease-1 | K04799 | 3 | X | X | X |
| POLB | DNA polymerase beta | K02330 | 2 | X | ||
| LIG3 | DNA ligase 3 | K10776 | 2 | X | ||
| Nucleotide Excision Repair (NER) | ||||||
| RBX1 | RING-box protein 1 | K03868 | 4 | X | X | X |
| CUL4 | cullin 4 | K10609 | 4 | X | X | X |
| DDB1, XPE | xeroderma pigmentosum group E-complementing protein | K10610 | 3 | X | X | X |
| DDB2 | DNA damage binding protein 2 | K10140 | 2 | X | ||
| CSA, ERCC8 | excision repair cross-complementing rodent repair deficiency, complementation group 8 | K10570 | 2 | X | X | X |
| XPC | xeroderma pigmentosum group C-complementing protein | K10838 | 2 | X | X | X |
| RAD23 | UV excision repair protein RAD23 | K10839 | 4 | X | X | X |
| CETN2 | centrin-2 | K10840 | 4 | X | ||
| XPA | xeroderma pigmentosum group A-complementing protein | K10847 | 2 | X | X | X |
| ERCC1 | excision repair cross-complementing rodent repair deficiency, complementation group 1 | K10849 | 2 | X | X | X |
| ERCC2, XPD | xeroderma pigmentosum group D-complementing protein | K10844 | 2 | X | X | X |
| ERCC3, XPB | xeroderma pigmentosum group B-complementing protein | K10843 | 3 | X | X | X |
| ERCC4, XPF | xeroderma pigmentosum group F-complementing protein | K10848 | 2 | X | X | X |
| ERCC5, XPG | xeroderma pigmentosum group G-complementing protein | K10846 | 2 | X | X | X |
| ERCC6, CSB | excision repair cross-complementing rodent repair deficiency, complementation group 6 | K10841 | 2 | X | X | |
| TFIIH1 | transcription initiation factor TFIIH subunit 1 | K03141 | 1 | X | X | X |
| TFIIH2 | transcription initiation factor TFIIH subunit 2 | K03142 | 2 | X | X | X |
| TFIIH3 | transcription initiation factor TFIIH subunit 3 | K03143 | 1 | X | X | X |
| TFIIH4 | transcription initiation factor TFIIH subunit 4 | K03144 | 1 | X | X | X |
| MMS19 | DNA repair/transcription protein MET18/MMS19 | K15075 | 1 | X | X | X |
| RPB1 | DNA-directed RNA polymerase II subunit RPB1 | K03006 | 4 | X | X | X |
| RPB2 | DNA-directed RNA polymerase II subunit RPB2 | K03010 | 2 | X | X | X |
| RPB3 | DNA-directed RNA polymerase II subunit RPB3 | K03011 | 1 | X | X | X |
| RPB4 | DNA-directed RNA polymerase II subunit RPB4 | K03012 | 2 | X | X | X |
| RPB5 | DNA-directed RNA polymerases I, II, and III subunit RPABC1 | K03013 | 4 | X | X | X |
| RPB6 | DNA-directed RNA polymerases I, II, and III subunit RPABC2 | K03014 | 4 | X | X | X |
| RPB7 | DNA-directed RNA polymerase II subunit RPB7 | K03015 | 2 | X | X | X |
| RPB8 | DNA-directed RNA polymerases I, II, and III subunit RPABC3 | K03016 | 2 | X | X | X |
| RPB11 | DNA-directed RNA polymerase II subunit RPB11 | K03008 | 2 | X | X | X |
| RPB12 | DNA-directed RNA polymerases I, II, and III subunit RPABC4 | K03009 | 2 | X | X | X |
| Alternative Excision Repair (AER) | ||||||
| UVDE | UV DNA damage endonuclease | K13281 | 2 | |||
| SMC6 | structural maintenance of chromosomes 6 | K22804 | 4 | X | X | |
| RAD51 | DNA repair protein RAD51 | K04482 | 4 | X | X | X |
| RAD54 | DNA repair and recombination protein RAD54 and RAD54-like protein | K10875 | 2 | X | X | X |
| RAD54L2 | RAD54-like protein 2 | K10876 | 3 | X | X | X |
| EXO1 | exonuclease 1 | K10746 | 2 | X | X | X |
| FEN1 | flap endonuclease-1 | K04799 | 3 | X | X | X |
| Cross Pathway | ||||||
| LIG1 | DNA ligase 1 | K10747 | 2 | X | X | X |
| LIGK | DNA ligase (ATP) | K00000 | 4 | |||
| PCNA | proliferating cell nuclear antigen | K04802 | 4 | X | X | X |
| PNKP | bifunctional polynucleotide phosphatase/kinase | K08073 | 4 | X | X | X |
| HMGB1 | high mobility group protein B1 | K10802 | 2 | X | X | |
| RFC1 | replication factor C subunit 1 | K10754 | 4 | X | X | X |
| RFC2 | replication factor C subunit 2 of RFC2_4 | K10755 | 4 | X | X | X |
| RFC4 | replication factor C subunit 4 of RFC2_4 | K10755 | 2 | X | X | X |
| RFC3_5 | replication factor C subunit 3_5 | K10756 | 2 | X | X | X |
| RPA1 | replication protein A1 | K07466 | 4 | X | X | X |
| RPA2 | replication protein A2 | K10739 | 3 | X | X | X |
| RPA3 | replication protein A3 | K10740 | 2 | X | X | |
| Mismatch Repair (MMR) | ||||||
| MSH2 | DNA mismatch repair protein MSH2 | K08735 | 1 | X | X | X |
| MSH3 | DNA mismatch repair protein MSH3 | K08736 | 0 | X | ||
| MSH6 | DNA mismatch repair protein MSH6 | K08737 | 2 | X | X | X |
| MSH4 | DNA mismatch repair protein MSH4, canonically meiotic | K08740 | 2 | X | X | X |
| MSH5 | DNA mismatch repair protein MSH5, canonically meiotic | K08741 | 2 | X | X | X |
| PMS2 | DNA mismatch repair protein PMS2 | K10858 | 2 | X | X | X |
| MLH1 | DNA mismatch repair protein MLH1 | K08734 | 2 | X | X | X |
| MLH3 | DNA mismatch repair protein MLH3 | K08739 | 0 | X | ||
| EXO1 | exonuclease 1 | K10746 | 2 | X | X | X |
| Homologous Recombination (HR) | ||||||
| RAD50 | DNA repair protein RAD50 | K10866 | 4 | X | X | X |
| MRE11 | double-strand break repair protein MRE11 | K10865 | 2 | X | X | X |
| NBS1 | nibrin | K10867 | 0 | X | X | |
| ATM | ataxia telangiectasia mutated family protein | K04728 | 2 | X | X | X |
| RAD51 | DNA repair protein RAD51 | K04482 | 4 | X | X | X |
| RAD51L1 | RAD51-like protein 1 | K10869 | 2 | X | ||
| RAD51L2 | RAD51-like protein 2 | K10870 | 2 | X | X | |
| RAD52 | DNA repair protein RAD52 | K10873 | 0 | X | ||
| BRCA2 | breast cancer 2 susceptibility protein | K08775 | 0 | X | ||
| RAD54 | DNA repair and recombination protein RAD54 and RAD54-like protein | K10875 | 2 | X | X | X |
| RAD54L2 | RAD54-like protein 2 | K10876 | 3 | X | X | X |
| EME1 | crossover junction endonuclease EME1 | K10882 | 1 | X | X | |
| MUS81 | crossover junction endonuclease MUS81 | K08991 | 2 | X | X | X |
| RECQ1 | ATP-dependent DNA helicase Q1 | K10899 | 4 | X | X | |
| BLM | Bloom’s syndrome DNA helicase | K10901 | 5 | X | X | X |
| RECQL5 | ATP-dependent DNA helicase Q5 | K10902 | 2 | X | X | X |
| RMI1 | RecQ-mediated genome instability protein 1 | K10990 | 1 | X | X | |
| TOP3A | DNA topoisomerase 3 alpha | K03165 | 2 | X | X | X |
| TOP3B | DNA topoisomerase 3 beta | K03165 | 2 | X | X | X |
| SLX1 | structure specific endonuclease subunit SLX1 | K15078 | 2 | X | X | X |
| SLX4 | structure-specific endonuclease subunit SLX4 | K10484 | 2 | X | ||
| GEN1 | Gen homolog 1, endonuclease | K15338 | 2 | X | X | X |
| Non-Homologous End Joining (NHEJ) | ||||||
| KU70 | ATP-dependent DNA helicase 2 subunit 1 | K10884 | 4 | X | X | X |
| KU80 | ATP-dependent DNA helicase 2 subunit 2 | K10885 | 4 | X | X | X |
| DNAPKcs | DNA-dependent protein kinase catalytic subunit | K06642 | 3 | X | ||
| ARTEMIS | DNA cross-link repair 1C protein | K10887 | 4 | X | ||
| APTX | aprataxin | K10863 | 2 | X | X | |
| APLF | aprataxin and PNK-like factor | K13295 | 8 | X | X | |
| POLL | DNA polymerase lambda | K03512 | 6 | X | ||
| XLF | non-homologous end-joining factor 1 | K10980 | 2 | X | ||
| XRCC4 | DNA-repair protein XRCC4 | K10886 | 3 | X | ||
| LIG4 | DNA ligase 4 | K10777 | 1 | X | ||
| Replicative Polymerases (Pol) | ||||||
| POLA1 | DNA polymerase alpha subunit A | K02320 | 2 | X | X | X |
| POLA2 | DNA polymerase alpha subunit B | K02321 | 2 | X | X | X |
| POLD1 | DNA polymerase delta subunit 1 | K02327 | 2 | X | X | X |
| POLD2 | DNA polymerase delta subunit 2 | K02328 | 2 | X | X | X |
| POLE1 | DNA polymerase epsilon subunit 1 | K02324 | 2 | X | X | X |
| POLE2 | DNA polymerase epsilon subunit 2 | K02325 | 2 | X | X | X |
| POLG1 | DNA polymerase gamma 1 | K02332 | 2 | X | X | X |
| POLG2 | DNA polymerase gamma 2 | K02333 | 2 | X | X | |
| Translesion Synthesis (TLS) Polymerases (Pol) | ||||||
| POLH | DNA polymerase eta | K03509 | 2 | X | X | X |
| POLK | DNA polymerase kappa | K03511 | 2 | X | X | |
| POLQ | DNA polymerase theta | K02349 | 1 | X | X | X |
| REV1 | DNA polymerase zeta, Rev1 subunit | K03515 | 2 | X | X | X |
| REV7 | DNA polymerase zeta, Rev7 subunit | K03508 | 2 | X | ||
| REV3L | DNA polymerase zeta, Rev3-like subunit | K02350 | 2 | X | X | X |
Genes are classified into the ten categories shown in Fig. 1; some genes appear in more than one category. See Additional file 1 for specific A. vaga locus identifiers and other details. KO: KEGG Orthology accession; #: copy number in A. vaga; Cel: C. elegans; Dme: D. melanogaster; Hsa: H. sapiens. An “X” indicates an ortholog is present in the KEGG Orthology Database for the indicated species
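The copy-number column (#) can be grouped by DDR category, which is the quantity Fig. 1 plots. The following is a minimal Python sketch of that grouping, using only the three Direct Reversal rows and the first three Mismatch Repair rows of the table; the `rows` list and the `copies_by_category` helper are illustrative names and are not taken from the study.

```python
from collections import defaultdict

# (category, gene, copy number in A. vaga), copied from a few table rows.
rows = [
    ("DR", "PHRB", 4),
    ("DR", "ALKBH1", 1),
    ("DR", "MGMT", 1),
    ("MMR", "MSH2", 1),
    ("MMR", "MSH3", 0),
    ("MMR", "MSH6", 2),
]


def copies_by_category(rows):
    """Group copy numbers by category, preserving the row order."""
    grouped = defaultdict(list)
    for category, _gene, copies in rows:
        grouped[category].append(copies)
    return dict(grouped)


summary = copies_by_category(rows)
for category, copies in summary.items():
    # A copy number of 0 means no copy is listed for A. vaga.
    absent = sum(1 for c in copies if c == 0)
    print(f"{category}: copies={copies}, genes with 0 copies={absent}")
```

Extending `rows` to the full table would give the per-category distributions; genes listed under more than one category (for example FEN1 and EXO1) would then be counted once per category in which they appear.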
Fig. 2 Fpg. a Phylogeny of Fpg and Nei genes; Adineta vaga Fpg is in red. Clades with greater than 70% RAxML bootstrap support or 90% MrBayes posterior probability are marked with red and blue asterisks, respectively. Complete trees and accession numbers and species names of OTUs are available in Additional file 2. b The 8-oxoG capping loop region of Fpg (DNA shown in black). Left, A. vaga (AvFpg) in bronze threaded onto Geobacillus stearothermophilus Fpg (BstFpg, from Bacillus basonym, PDB 1R2Y) in blue. The BstFpg αF-β9/10 loop (purple) extends down to cover and trap the 8-oxoG, but the AvFpg αF-β9/10 loop (red) is predicted to be too short to fully cover an 8-oxoG in the binding pocket. Right, Arabidopsis thaliana Fpg (AthFpg, PDB: 3TWK) in bronze overlaying BstFpg in blue, as shown in [43]. Here, the much shorter AthFpg αF-β9/10 loop (orange) cannot trap 8-oxoG in the binding pocket as the BstFpg loop (purple) can
Fig. 3 Phylogeny of UVDE. Adineta vaga and Trichuris spp. UVDE are in red. Clades with greater than 70% RAxML bootstrap support or 90% MrBayes posterior probability are marked with red and blue asterisks, respectively. Complete trees and species names and accession numbers of OTUs are available in Additional file 2
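The marking rule shared by the phylogeny captions (red asterisk for RAxML bootstrap support, blue asterisk for MrBayes posterior probability) can be written as a small predicate. This is a minimal sketch only: the function name `support_markers` is hypothetical, and treating both thresholds as strict "greater than" comparisons is an assumption based on the caption wording.

```python
def support_markers(bootstrap_pct=None, posterior_prob=None):
    """Return the asterisk colours a clade would receive.

    bootstrap_pct: RAxML bootstrap support in percent (0-100), or None.
    posterior_prob: MrBayes posterior probability as a fraction (0-1), or None.
    """
    markers = []
    # Red asterisk: bootstrap support greater than 70%.
    if bootstrap_pct is not None and bootstrap_pct > 70:
        markers.append("red")
    # Blue asterisk: posterior probability greater than 90% (assumed strict).
    if posterior_prob is not None and posterior_prob > 0.90:
        markers.append("blue")
    return markers


print(support_markers(bootstrap_pct=85, posterior_prob=0.95))
print(support_markers(bootstrap_pct=60, posterior_prob=0.95))
```

A clade supported by only one of the two methods receives a single asterisk, so the two colours should be read independently.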
Fig. 4 Phylogeny of AlkD. Adineta vaga AlkD is in red, well-separated from the clade of metazoan sequences, shaded in orange. Clades with greater than 70% RAxML bootstrap support or 90% MrBayes posterior probability are marked with red and blue asterisks, respectively. Complete trees and species names and accession numbers of OTUs are available in Additional file 2
Fig. 5 Ligase K. a Simplified phylogenetic tree of Ligase K with Ligase III as an outgroup; Adineta vaga Ligase K is in red. Clades with greater than 70% RAxML bootstrap support or 90% MrBayes posterior probability are marked with red and blue asterisks, respectively. Complete trees and species names and accession numbers of OTUs are available in Additional file 2. b Domain models of A. vaga Ligase K copies A1 and B1, with domain models of Ligase K peptides from other species (Mortierella verticillata KFH62561.1, Rhizophagus irregularis ESA15105.1, Capitella teleta ELT89513.1, Salpingoeca rosetta XP_004993722.1, Lottia gigantea XP_009061413.1, Aplysia californica XP_005100834.1, Blastopirellula marina WP_002650560.1, Tetrahymena thermophila XP_001011861.1). c Comparison of A. vaga Ligase K PBZ domains with sequence logo of the Pfam model. d, e Dotplots generated with EMBOSS dotmatcher of copy B1 vs A1 and of B1 vs itself; lines along the diagonal indicate regions of similarity between the compared sequences. f Differential expression of A. vaga Ligase K ohnologs entering and recovering from desiccation, compared to hydrated controls. Values are log2 fold change of normalized counts; significance test values are listed in Additional file 1
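To illustrate how a dotplot such as those in panels d and e reveals similarity, here is a minimal Python sketch of a windowed dotplot. It is not the EMBOSS dotmatcher algorithm, which scores windows with a substitution matrix and a threshold; this simplified version counts identical residues instead. The function name, the parameters, and the toy sequence are all hypothetical.

```python
def dotplot(seq_a, seq_b, window=3, min_matches=3):
    """Return (i, j) positions where windows of seq_a and seq_b match.

    A dot is placed at (i, j) when at least `min_matches` of the `window`
    residues starting at seq_a[i] and seq_b[j] are identical.
    """
    dots = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(
                1 for k in range(window) if seq_a[i + k] == seq_b[j + k]
            )
            if matches >= min_matches:
                dots.append((i, j))
    return dots


# Toy self-comparison (not a real Ligase K sequence): the repeated "ACDE"
# produces short diagonals away from the main diagonal.
toy = "ACDEFGACDE"
dots = dotplot(toy, toy)
off_diagonal = [(i, j) for i, j in dots if i != j]
print(off_diagonal)
```

In a self-comparison the main diagonal is always filled; off-diagonal runs indicate internal repeats, which is how a plot of B1 against itself can expose repeated regions.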
Fig. 6 Bloom helicases. a Domain models of a Bloom helicase from Homo sapiens (XP_011520183) and Bloom-like helicases from Dictyostelium purpureum (XP_003287311.1, “hypothetical protein”), Triticum monococcum (AGH18689.1, “PHD-finger family protein”), Symbiodinium microadriaticum (OLP97093.1, “ATP-dependent RNA helicase DHH1”), Stentor coeruleus (OMJ79001.1, “hypothetical protein”), Daldinia sp. (OTB17292.1, “hypothetical protein”). b Domain models of the five copies of the Blm-like helicase from A. vaga. c Sliding window analysis of nonsynonymous (Ka) difference (solid line) and ratio of nonsynonymous to synonymous differences (Ka/Ks, dashed line) between copies C1 and C2. d Differential expression of A. vaga Blm ohnologs entering and recovering from desiccation, compared to hydrated controls. Values are log2 fold change of normalized counts; significance test values are listed in Additional file 1
Fig. 7 Ku70 and Ku80. a Domain model of A. vaga Ku70 A and B ohnologs, with sliding window analysis of nonsynonymous (Ka) difference (solid line) and ratio of nonsynonymous to synonymous differences (Ka/Ks, dashed line) between AvKu70A1 and B1. The alignment on the upper left shows the region where Ka/Ks > 1 near the N-terminus; the alignment to the lower right shows the SAP domains compared to human Ku70. Predicted sumoylation sites are in red, predicted acetylation sites are highlighted in blue. b Domain model of A. vaga Ku80 A and B ohnologs, with sliding window analysis of Ka and Ka/Ks. The alignment in the upper left shows the Q3E4Q8 track at the terminus of the α/β domain present in copy A and not in B. c Crystal structure PDB 1JEY, human Ku70 (yellow) Ku80 (red) heterodimer complexed with DNA (grey). d Three views of the superposition of the predicted structure of A. vaga Ku70A1 (purple) and A. vaga Ku70B1 (green). The N-terminal region under putative positive selection and the SAP domain are indicated in red and blue for Ku70 A1 and B1, respectively. First orientation is the same as in (c), second is an elevated view (45° rotation along horizontal axis), third is a top view (90° rotation along horizontal axis). e Superposition of the predicted structures of A. vaga Ku80A1 (blue) and Ku80B1 (copper) in the same orientation as in (c). f Differential expression of A. vaga Ku70 A and B and Ku80 A and B ohnologs entering and recovering from desiccation, compared to hydrated controls. Values are log2 fold change of normalized counts; significance test values are listed in Additional file 1
Fig. 8 DNAPKcs. Domain model of A. vaga DNAPKcs A (top) and DNAPKcs B (bottom) ohnologs with sliding window analysis of nonsynonymous difference (Ka, solid line) and ratio of nonsynonymous to synonymous differences (Ka/Ks, dashed line) between A1 and B1
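The sliding window analyses in these captions compare two aligned coding sequences codon by codon. The Python sketch below shows the general idea under strong simplifying assumptions: it only counts synonymous and nonsynonymous codon differences per window using the standard genetic code, whereas a real Ka/Ks estimate also normalizes by the number of synonymous and nonsynonymous sites and corrects for multiple substitutions. The function names, window size, and toy sequences are hypothetical and are not A. vaga data.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_differences(cds_a, cds_b):
    """Classify each aligned codon pair as 'same', 'syn', or 'nonsyn'.

    Assumes uppercase, gap-free, in-frame sequences of equal length.
    """
    if len(cds_a) != len(cds_b) or len(cds_a) % 3:
        raise ValueError("sequences must be aligned, equal length, in frame")
    calls = []
    for pos in range(0, len(cds_a), 3):
        ca, cb = cds_a[pos:pos + 3], cds_b[pos:pos + 3]
        if ca == cb:
            calls.append("same")
        elif CODON_TABLE[ca] == CODON_TABLE[cb]:
            calls.append("syn")
        else:
            calls.append("nonsyn")
    return calls


def sliding_counts(calls, window=3, step=1):
    """Yield (start codon, nonsyn count, syn count) for each window."""
    for start in range(0, len(calls) - window + 1, step):
        chunk = calls[start:start + window]
        yield start, chunk.count("nonsyn"), chunk.count("syn")


# Toy in-frame sequences, for illustration only.
a = "ATGAAACTGGGTTTT"
b = "ATGAGACTAGGTTTC"
calls = codon_differences(a, b)
windows = list(sliding_counts(calls))
print(calls)
print(windows)
```

Windows in which nonsynonymous differences dominate are the kind of region a Ka/Ks profile flags for closer inspection, although only a properly normalized estimate supports a claim such as Ka/Ks > 1.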
Fig. 9 Artemis. a Domain models of A. vaga Artemis A (top) and B (bottom) ohnologs showing predicted DNAPKcs phosphorylation sites (blue circles indicate sites conserved between A and B peptides, yellow circles indicate unique sites on each peptide) with sliding window analysis of nonsynonymous difference (Ka, solid line) and ratio of nonsynonymous to synonymous differences (Ka/Ks, dashed line) between A1 and B1. b Differential expression of A. vaga Artemis A and B ohnologs entering and recovering from desiccation, compared to hydrated controls. Values are log2 fold change of normalized counts; significance test values are listed in Additional file 1. c Normalized read counts of the two ohnologs under hydrated, entering, and recovering conditions
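The relationship between the normalized read counts in panel c and the log2 fold changes in panel b can be shown with a short calculation. This is a minimal Python sketch under stated assumptions: the counts are made-up values, the `log2_fold_change` helper and its `pseudocount` parameter are hypothetical, and the study's own differential-expression procedure is not described in this section and may estimate fold changes from a statistical model rather than a simple ratio.

```python
import math


def log2_fold_change(treated, control, pseudocount=1.0):
    """log2 ratio of two normalized counts.

    The pseudocount keeps the ratio defined when a count is zero.
    """
    return math.log2((treated + pseudocount) / (control + pseudocount))


# Hypothetical normalized counts, for illustration only.
hydrated = 100.0
entering = 400.0
recovering = 25.0

lfc_entering = log2_fold_change(entering, hydrated, pseudocount=0.0)
lfc_recovering = log2_fold_change(recovering, hydrated, pseudocount=0.0)
print(lfc_entering, lfc_recovering)
```

On this scale a value of 2 means a fourfold increase relative to hydrated controls and a value of -2 a fourfold decrease, so ohnologs with opposite signs are regulated in opposite directions.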
Fig. 10 XRCC4. a Domain models of AvXRCC4 A (top) and B (bottom) ohnologs showing the three regions of XRCC4, the total charge of each region, and predicted DNAPKcs phosphorylation sites (blue circles indicate sites conserved between A and B peptides, yellow circles indicate unique sites on each peptide) with sliding window analysis of nonsynonymous difference (Ka, solid line) and ratio of nonsynonymous to synonymous differences (Ka/Ks, dashed line) between copies A1 and B1. b Superposition of the predicted structures of A. vaga XRCC4A1 (blue) and XRCC4B1 (copper) showing the conserved structure of the N-terminal head region, the central helix with Ligase 4 binding regions shown in purple (XRCC4A1) and red (XRCC4B1), and the poorly conserved C-terminus with 22 residues present in A1 but not in B1 shown in magenta. c Alignment of the Ligase 4 binding region in A1, B1, and human XRCC4; colons (:) indicate residues involved in binding [116]. d Differential expression of A. vaga XRCC4 A and B ohnologs entering and recovering from desiccation, compared to hydrated controls. Values are log2 fold change of normalized counts; significance test values are listed in Additional file 1
Fig. 11 Ohnologs of Polλ. a Domain structure of the three types of polymerase λ in A. vaga. Boundaries of domains defined by hmmscan of PfamA are shown above (start) and below (end); the disordered Ser/Pro-rich region (SP) is not a defined domain. The three residues that make up the phosphate binding pocket in the 8kD domain are shown (RRK or RSK). The positions of residues encoded by codons determined to be under positive selection in the lineage leading to AvPolA are shown above the A. vaga A structure, clustered by diamonds for each domain. b Secondary structure of the BRCT domain in polymerase λ as determined by Phyre. Beta sheets are shown as blue arrows, alpha helices as pink cylinders. The region identified as the Pfam domain BRCT_2 by hmmscan is shown with domain-specific expectation value; positions of structure boundaries outside of the predicted BRCT domain are indicated. c Alignment of the disordered SP region in A. vaga copies of polymerase λ. Numbering is to AvPolLA1. Serine and proline residues are highlighted in blue and yellow, respectively. d Unrooted gene tree of the six copies of Pol λ in A. vaga and three additional rotifer species used for codeml tests of selection. 1, Seison sp.; 2, Brachionus manjavacas; 3, Brachionus calyciflorus. e Differential expression of the three paralogs entering and recovering from desiccation, compared to hydrated controls. Values are log2 fold change of normalized counts; significance test values are listed in Additional file 1
Fig. 12 Ohnologs of APLF. a Domains and interactions of H. sapiens APLF assigned using UniProt Q8IW19 as template with domain annotation and function refined with reference to [60,83,84,93,119]. b Domain models and phylogenetic relationship of the two pairs of A. vaga APLF ohnologs, with cladograms showing the relationship of gene copies and ohnologs. c Alignment of the tandem PBZ domains. Each PBZ domain has a conserved C(M/P)Y and CRY motif, highlighted in aqua along with nearby conserved residues, and these form a basic, hydrophobic pocket for ADP-ribose binding. APLF binds multiple ADP-ribose residues within PAR; Y381/Y386 and Y423/Y428 are critical for interactions with the adenine ring, and R387/R429 coordinate the interactions with the phosphates [91]. All are marked with (*). The Y423F difference in D is found in some other metazoans. The C and D ohnologs substitute Q for P in the first PBZ domain, which would not be expected to maintain the characteristics of the basic, hydrophobic binding pocket. Both also terminate before the final H of the second PBZ domain, which undoubtedly alters the domain’s binding properties. d Alignment of the Ku80 Binding Domain (KBD) regions. Copies A, C and D retain R184 and W189, the residues found critical for Ku binding in mammals [117]; copy A lacks one of the conserved positively charged residues found in most KBD domains. e Differential expression of all four paralogs entering and recovering from desiccation, compared to hydrated controls. Values are log2 fold change of normalized counts; significance test values are listed in Additional file 1