Literature DB >> 29706541

The Egyptian Rousette Genome Reveals Unexpected Features of Bat Antiviral Immunity.

Stephanie S Pavlovich¹, Sean P Lovett², Galina Koroleva², Jonathan C Guito³, Catherine E Arnold², Elyse R Nagle², Kirsten Kulcsar², Albert Lee⁴, Françoise Thibaud-Nissen⁵, Adam J Hume⁶, Elke Mühlberger⁶, Luke S Uebelhoer³, Jonathan S Towner³, Raul Rabadan⁴, Mariano Sanchez-Lockhart⁷, Thomas B Kepler⁸, Gustavo Palacios⁹.

Abstract

Bats harbor many viruses asymptomatically, including several notorious for causing extreme virulence in humans. To identify differences between antiviral mechanisms in humans and bats, we sequenced, assembled, and analyzed the genome of Rousettus aegyptiacus, a natural reservoir of Marburg virus and the only known reservoir for any filovirus. We found an expanded and diversified KLRC/KLRD family of natural killer cell receptors, MHC class I genes, and type I interferons, which dramatically differ from their functional counterparts in other mammals. Such concerted evolution of key components of bat immunity is strongly suggestive of novel modes of antiviral defense. An evaluation of the theoretical function of these genes suggests that an inhibitory immune state may exist in bats. Based on our findings, we hypothesize that tolerance of viral infection, rather than enhanced potency of antiviral defenses, may be a key mechanism by which bats asymptomatically host viruses that are pathogenic in humans.

Entities: Chemical

Keywords: Chiroptera; antiviral immunity; filovirus; genome; innate immunity; natural killer cell receptors; type I interferon

Mesh：

Substances：

Year: 2018 PMID： 29706541 PMCID： PMC7112298 DOI： 10.1016/j.cell.2018.03.070

Source DB: PubMed Journal: Cell ISSN： 0092-8674 Impact factor: 66.850

Introduction

Bats, members of the large, diverse order Chiroptera, appear to harbor significantly more zoonotic viruses than other mammals and do so without overt pathology (Calisher et al., 2006, Olival et al., 2017). Such asymptomatic infection is especially noteworthy in the case of human pathogens such as henipaviruses (Nipah and Hendra viruses), coronaviruses (severe acute respiratory syndrome [SARS] and Middle East respiratory syndrome [MERS] coronaviruses), and filoviruses (Marburg virus [MARV]), which cause severe, and often lethal, systemic disease in humans and non-human primates (Calisher et al., 2006, Smith and Wang, 2013). This stark difference between bats and primates has motivated efforts to deeply characterize the genes involved in the immune system of bats and understand the antiviral immune mechanisms used to control viral infection. Genomic analyses of immune genes in bats have produced conflicting and surprising observations. The most thoroughly studied bat genome is that of Pteropus alecto (Zhang et al., 2013), a reservoir host of Hendra virus. Additional bat genomes have been studied to a more limited extent (Seim et al., 2013, Zhang et al., 2013, Parker et al., 2013). The most notable findings from these studies involve two large classes of immune genes: natural killer (NK) cell receptors and type I interferons (IFNs). Multiple studies have reported the absence of canonical NK cell receptors in bat genomes (Shaw et al., 2012, Zhang et al., 2013, Lee et al., 2015), although a few receptors were identified in the P. alecto transcriptome (Papenfuss et al., 2012). A few studies suggest that significant differences exist in type I IFNs between bats and humans (Zhang et al., 2017, Zhou et al., 2016, Kepler et al., 2010, De La Cruz-Rivera et al., 2018), although the precise nature of the differences remains unclear. For example, the type I IFN locus has contracted in Pteropus alecto (Zhou et al., 2016), but expanded in Pteropus vampyrus and Myotis lucifugus (Kepler et al., 2010). While these genome projects provide important insights into the unique biology of bats, the bat genomes currently available were generated with low-coverage sequencing or with only short-read next-generation sequencing (NGS) technologies (Zhang et al., 2013, Seim et al., 2013). These sequencing strategies impact the overall contiguity of genome assemblies and limit the ability to resolve repetitive genome loci where important immune gene loci reside. To overcome this limitation, we used a hybrid strategy combining both short- and long-read NGS technologies to generate a high-quality annotated genome for the Egyptian rousette bat (Rousettus aegyptiacus), an asymptomatic host of MARV (Towner et al., 2009). Here, we use this genome, the most contiguous bat genome available, to understand the evolution of immune genes and gene families in bats, and describe several observations relevant to defense against viruses. We report an unusual expansion of the KLRC (NKG2) and KLRD (CD94) gene families in R. aegyptiacus relative to other species and show genomic evidence of unique features and expression of these receptors that may result in a net inhibitory balance within bat NK cells. The expansion of NK cell receptors is matched by an expansion of potential MHC class I ligands, which are distributed both within and, surprisingly, outside the canonical MHC loci. We also observe that the type I IFN locus is considerably expanded and diversified in R. aegyptiacus. The IFN-ω subfamily contributes most to this expansion, and members of this subfamily are induced after viral infection and show antiviral activity. All these features strengthen the notion of the unique biology of bats and suggest the existence of a distinct immunomodulatory mechanism used to control viral infection.

Results

The Genome Assembly Has High Contiguity and Completeness

We used a hybrid paired-end short-read plus long-read strategy to generate 720 Gb of short-read data and 34.9 Gb of long-read data, for an approximate total coverage of 169×. We generated a draft genome (Raegyp2.0) comprising 1.91 Gb of sequence represented in 2,490 scaffolds (N50 = 2.007 Mb; NG50 = 1.811 Mb), which is ∼90% of the estimated genome size. Of 248 core eukaryotic genes, 88.31% were complete in the genome, while 92.34% were at least partially represented, suggesting that gene information is covered well. The genome size and N50 are comparable to the assembled genomes of other bats. However, Raegyp2.0 has a larger contig N50 and a lower total gap length than any other available bat genome, making it the most contiguous bat genome to date (Table 1 ; Figures S1 C and S1D). Whole-genome annotation was performed via the NCBI eukaryotic annotation pipeline using all R. aegyptiacus-specific transcript and RNA sequencing (RNA-seq) data in the NCBI databases, as well as proteins of other well-characterized species (Thibaud-Nissen et al., 2013). Similar to the number estimated by transcriptomic analysis (Lee et al., 2015, Hölzer et al., 2016) and the numbers in other bats (Table S2), 19,668 of the annotated genes in Raegyp2.0 are protein-coding genes with high support from transcriptomic or protein data.

Table 1

Contiguity and Coverage among Bat Genomes

Species (Assembly Name)	Scaffold N50 (Kb)	Contig N50 (Kb)	Total Length (Gb)	Total Gap Length (Mb)	Coverage/Sequencing Technology
Rousettus aegyptiacus (Raegyp2.0)	2,007.2	1,489.0	1.910	0.482	169.2× Illumina HiSeq 2500 and Pacific Biosciences RS IIa
Pteropus vampyrus (Pvam_2.0)	5,954.0	21.9	2.198	181.040	188.0× Illumina
Pteropus alecto (ASM32557v1)	15,954.8	31.8	1.986	41.334	110× Illumina HiSeq 2000
Myotis lucifugus (Myoluc2.0)	4,293.3	64.3	2.035	68.155	7× Sanger
Myotis brandtii (ASM32734v1)	3,225.8	23.3	2.107	125.473	120× Illumina HiSeq 2000
Myotis davidii (ASM32734v1)	3,454.5	15.2	2.060	181.338	110× Illumina HiSeq 2000
Miniopterus natalensis (Mnat.v1)	4,315.2	29.8	1.803	68.170	77.0× Illumina HiSeq
Eptesicus fuscus (EptFus1.0)	13,455.9	21.4	2.027	215.248	84× Illumina HiSeq
Megaderma lyra (ASM46534v1)	16.9	7.0	1.736	20.607	18.0× Illumina HiSeq
Eidolon helvum (ASM46528v1)	27.7	12.7	1.838	7.320	18.0× Illumina HiSeq
Rhinolophus ferrumequinum (ASM46549v1)	21.2	11.7	1.926	4.731	17.0× Illumina HiSeq
Rhinolophus sinicus (ASM188883v1)	3,754.4	37.8	2.073	58.015	146.44× Illumina HiSeq
Pteronotus parnellii (ASM46540v1)	22.7	9.5	1.960	13.7	17.0× Illumina HiSeq
Hipposideros armiger (ASM189008v1)	2,328.2	39.9	2.237	281.734	218.6× Illumina HiSeq

The contiguity statistics (scaffold and contig N50, total length, and total gap length) are reported for each bat genome available in the NCBI GenBank database (accessed on 8/8/17), along with the GenBank Assembly Name and the given coverage. Kb, kilobases; Mb, megabases; Gb, gigabases. See also Figure S1, Tables S1 and S2, and STAR Methods.

Approximately 145× coverage of Illumina HiSeq 2500 data and 24× coverage of Pacific Biosciences RS II data.

Figure S1

Genome Characteristics of Raegyp2.0, Related to Table 1

(A) K-mer frequency distribution in Raegyp2.0. The percentage (frequency, y axis) of all 25-mers that are present a given number of times (depth, x axis) in the Rousettus aegyptiacus genome sequence.

(B) Heterozygosity of Raegyp2.0. SNPs: Single-nucleotide polymorphisms; indels: insertions or deletions. Scaffold and contig.

(C and D) (C) Nx and (D) NGx plots in megabases (Mbp) for Raegyp2.0 with an estimated reference genome size of 2.11 Gb. See STAR Methods for detailed description of Nx and NGx plots.

Contiguity and Coverage among Bat Genomes The contiguity statistics (scaffold and contig N50, total length, and total gap length) are reported for each bat genome available in the NCBI GenBank database (accessed on 8/8/17), along with the GenBank Assembly Name and the given coverage. Kb, kilobases; Mb, megabases; Gb, gigabases. See also Figure S1, Tables S1 and S2, and STAR Methods. Approximately 145× coverage of Illumina HiSeq 2500 data and 24× coverage of Pacific Biosciences RS II data. Genome Characteristics of Raegyp2.0, Related to Table 1 (A) K-mer frequency distribution in Raegyp2.0. The percentage (frequency, y axis) of all 25-mers that are present a given number of times (depth, x axis) in the Rousettus aegyptiacus genome sequence. (B) Heterozygosity of Raegyp2.0. SNPs: Single-nucleotide polymorphisms; indels: insertions or deletions. Scaffold and contig. (C and D) (C) Nx and (D) NGx plots in megabases (Mbp) for Raegyp2.0 with an estimated reference genome size of 2.11 Gb. See STAR Methods for detailed description of Nx and NGx plots.

Several Gene Families Have Expanded Significantly during the Evolution of Chiroptera

To gain insight into the evolutionary relationship of R. aegyptiacus to other bats and to incidental hosts of MARV, we inferred homologous protein groups among R. aegyptiacus and 14 other mammals and constructed a maximum-likelihood phylogenetic tree of all 15 species using single-copy orthologous genes (Figure 1 ). As previously established, the closest taxon to bats among the taxa included in the analysis is the horse (Equus caballus) (Seim et al., 2013, Zhang et al., 2013). We performed gene family expansion and contraction analysis on inferred gene families and observed many more expanded families in microbats than megabats, similar to what has previously been shown (Zhang et al., 2013, Tsagkogeorga et al., 2017) (Figure 1). 55 gene families are significantly expanded in R. aegyptiacus compared to the megabat ancestor (Figure 1; Tables S3 and S4).

Figure 1

Gene Family Expansion and Contraction across a Phylogenetic Tree of 15 Mammalian Species

A maximum likelihood tree based on 2,400 orthologous proteins was generated and used to infer expansion and contraction of 7,698 gene families. The number of expanded and contracted gene families is in blue and red, respectively. Numbers in black are the bootstrap evidence for partitions based on 1,000 bootstrap replicates. Images used under a creative commons license. MRCA, most recent common ancestor.

See also Tables S3 and S4, Data S1, and STAR Methods.

Gene Family Expansion and Contraction across a Phylogenetic Tree of 15 Mammalian Species A maximum likelihood tree based on 2,400 orthologous proteins was generated and used to infer expansion and contraction of 7,698 gene families. The number of expanded and contracted gene families is in blue and red, respectively. Numbers in black are the bootstrap evidence for partitions based on 1,000 bootstrap replicates. Images used under a creative commons license. MRCA, most recent common ancestor. See also Tables S3 and S4, Data S1, and STAR Methods.

Several Key Immune Genes Have Experienced Positive Selection

To investigate whether evolution at the gene level could contribute to the unique phenotype of R. aegyptiacus compared to other mammals, we studied disease-relevant immune genes for selection pressures (Figure S2 ; Table S5). Multiple innate immune response genes experience positive selection pressures along the R. aegyptiacus branch, including ISG15, an interferon-stimulated gene, IFNAR1, a subunit of the type I IFN receptor, and SIKE1, a negative regulator of the interferon response (Table 2 ). Several more innate immune genes experience relaxed purifying selection, including JAK2 and STAT3, components of the JAK/STAT signaling cascade, DDX58 (RIG-I), a sensor of viral double-stranded RNA, and TLR8, a pathogen-sensing molecule (Table 2). These results are consistent with those from similar analyses in P. alecto and M. davidii (Zhang et al., 2013).

Figure S2

Model Selection Hierarchy for Positive and Purifying Selection Analysis, Related to Table 2 and STAR Methods

RA = R. aegyptiacus, Mega = megabats other than R. aegyptiacus, Micro = microbats, Non-bat = all species in non-bat branches (human, crab-eating macaque, rhesus macaque, mouse, dog, cow, pig, guinea pig, hamster, and horse). Arrows indicate nested models (e.g., an arrow pointing from Model 1a to 2a means that Model 1a is nested in Model 2a). For each applicable model, color indicates which branches were used to estimate which evolution rate (orange - ω1; green - ω2; purple - ω3; gray - ω0, i.e., the background rate of evolution). Model 0 was the best-fitting for 330 genes, Model 1a for 5 genes, Model 1b for 65 genes, Model 1c for 23 genes, Model 2a for 0 genes, model 2b for 32 genes, and Model 3 for 1 gene.

Table 2

Positive and Relaxed Purifying Selection in R. aegyptiacus Immune Genes

Symbol	Gene	ω₀	ω₁	FDR
SIKE1	Suppressor of IKBKE 1	0.117	1.107b	1.85E−09
ISG15	Interferon stimulated gene 15 ubiquitin-like modifier	0.151	1.841b	3.01E−08
PFN1	Profilin 1	0.161	2.003b	1.58E−05
CD48	CD48 molecule	0.638	2.074b	1.62E−04
IL1RL1	Interleukin 1 receptor like 1	0.417	1.246b	5.83E−04
NQO1	NAD(P)H quinone dehydrogenase 1	0.204	1.210c	6.23E−04
IL17A	Interleukin 17A	0.316	1.146b	2.22E−03
LEP	Leptin	0.360	2.289a	2.42E−03
OSMR	Oncostatin M receptor	0.534	1.053b	6.85E−03
TNFRSF1A	TNF receptor superfamily member 1A	0.346	1.133a	1.58E−02
IFNAR1	Interferon alpha and beta receptor subunit 1	0.527	1.193c	4.64E−02
DDX58	Dexd/h-box helicase 58	0.339	0.586b	5.77E−03
TLR8	Toll like receptor 8	0.346	0.608b	1.82E−03
NOD2	Nucleotide binding oligomerization domain containing 2	0.150	0.267b	1.02E−03
LTBR	Lymphotoxin beta receptor	0.314	0.803b	5.24E−04
NFKB2	Nuclear factor kappa B subunit 2	0.083	0.324b	6.67E−12
REL	REL proto-oncogene, NF-κB subunit	0.225	0.424b	1.11E−02
RELA	RELA proto-oncogene, NF-κB subunit	0.093	0.424b	4.10E−11
RELB	RELB proto-oncogene, NF-κB subunit	0.087	0.545c	8.31E−15
RNASEL	Ribonuclease L	0.538	0.906b	5.77E−03
JAK2	Janus kinase 2	0.090	0.303c	1.60E−03
STAT3	Signal transducer and activator of transcription 3	0.002	0.016d	4.72E−02

Evolution rates were estimated under several models, and the best fitting model was chosen. ω0 refers to the background rate of evolution and always includes non-bat species, but also includes some bat species under certain models. ω1 refers to the rate of evolution of the R. aegyptiacus gene but also includes other bat species under certain models. FDR, false discovery rate. See Figure S2 and STAR Methods for full description of the models. See also Table S5.

ω1 estimate under Model 1a.

ω1 estimate under Model 1b.

ω1 estimate under Model 1c.

ω1 estimate under Model 3.

Model Selection Hierarchy for Positive and Purifying Selection Analysis, Related to Table 2 and STAR Methods RA = R. aegyptiacus, Mega = megabats other than R. aegyptiacus, Micro = microbats, Non-bat = all species in non-bat branches (human, crab-eating macaque, rhesus macaque, mouse, dog, cow, pig, guinea pig, hamster, and horse). Arrows indicate nested models (e.g., an arrow pointing from Model 1a to 2a means that Model 1a is nested in Model 2a). For each applicable model, color indicates which branches were used to estimate which evolution rate (orange - ω1; green - ω2; purple - ω3; gray - ω0, i.e., the background rate of evolution). Model 0 was the best-fitting for 330 genes, Model 1a for 5 genes, Model 1b for 65 genes, Model 1c for 23 genes, Model 2a for 0 genes, model 2b for 32 genes, and Model 3 for 1 gene. Positive and Relaxed Purifying Selection in R. aegyptiacus Immune Genes Evolution rates were estimated under several models, and the best fitting model was chosen. ω0 refers to the background rate of evolution and always includes non-bat species, but also includes some bat species under certain models. ω1 refers to the rate of evolution of the R. aegyptiacus gene but also includes other bat species under certain models. FDR, false discovery rate. See Figure S2 and STAR Methods for full description of the models. See also Table S5. ω1 estimate under Model 1a. ω1 estimate under Model 1b. ω1 estimate under Model 1c. ω1 estimate under Model 3.

Natural Killer Cell Receptors in R. aegyptiacus Have Unusual Origins and Unique Interaction Motifs

In mammals, NK cell receptors are encoded in two distinct gene complexes: the natural killer complex (NKC) contains killer lectin-like receptors (KLRs) such as CD94 (KLRD), NKG2 (KLRC), and Ly49, while the leukocyte receptor complex (LRC) contains immunoglobulin superfamily proteins such as the ILT/LIR family and the killer cell immunoglobulin-like receptors (KIRs). Both complexes vary in gene content among species, and encode both inhibitory and activating receptors. We did not find functional KIRs in the LRC, but consistent with what was found in P. alecto (Papenfuss et al., 2012), we identified 14 NKG2A/B-like genes (referred to as NKG2-1 through NKG2-14, including four pseudogenes), one NKG2D-like gene, one NKG2C-like gene, and 5 CD94-like genes (Table S3). Remarkably, six of the ten putatively functional NKG2A/B-like genes simultaneously encode activating and inhibitory interaction motifs. Of the remaining genes, three encode only inhibitory motifs, and only one gene encodes an activating motif alone (Figures 2A and 2B). No other potential NK cell receptor genes were found. While inhibitory NKG2 receptors signal via ITIMs in their cytoplasmic domains, activating CD94/NKG2 receptors transmit signals via a positively charged residue in their transmembrane domains. This residue recruits adaptor molecules that contain activating motifs, with a lysine residue associated with DAP12 recruitment, and an arginine residue associated with DAP10 recruitment. R. aegyptiacus NKG2 proteins with putative activating function show a strong preference for arginine at this location, suggesting that these receptors favor an association with DAP10 (Figure 2B).

Figure 2

Expansion of the NKG2 Genes in R. aegyptiacus

(A) CD94 and NKG2 genes in the natural killer complex in Raegyp2.0. Each arrow designates a scaffold sequence in the Raegyp2.0 genome (see STAR Methods for accessions). Not pictured are pseudogenes and non-coding genes. The ellipse indicates the presence of additional non-NKG2 genes on the same scaffold.

(B) Multiple sequence alignments showing activating and inhibitory signaling motifs in NKG2 genes in humans and three bats. There were no putative functional bat NKG2 genes identified in in P. alecto or M. davidii except NKG2-D. ITIM (immunoreceptor tyrosine-based inhibition motif) residues are in red, and the signal anchor residue lysine (K) or arginine (R) are in green and blue respectively. Dashes represent gaps in the alignment.

(C) Expression of putative functional NKG2 genes in transcriptomic data from 10 tissues in a R. aegyptiacus bat. Rows are ordered by highest average expression of transcripts across all tissues for a given gene. Expression is reported in log2(TPM), where TPM refers to transcripts per million. Data analyzed from Lee et al. (2015).

(D) Maximum likelihood phylogenetic tree of bat NKG2 proteins and homologs in other species. NKG2 proteins from R. aegyptiacus are colored by predicted function. Bootstrap evidence (percentage of 500 bootstrap replicates) is labeled on branches if 65 or over.

See also Figures S3, S5, and S7 and STAR Methods.

Expansion of the NKG2 Genes in R. aegyptiacus (A) CD94 and NKG2 genes in the natural killer complex in Raegyp2.0. Each arrow designates a scaffold sequence in the Raegyp2.0 genome (see STAR Methods for accessions). Not pictured are pseudogenes and non-coding genes. The ellipse indicates the presence of additional non-NKG2 genes on the same scaffold. (B) Multiple sequence alignments showing activating and inhibitory signaling motifs in NKG2 genes in humans and three bats. There were no putative functional bat NKG2 genes identified in in P. alecto or M. davidii except NKG2-D. ITIM (immunoreceptor tyrosine-based inhibition motif) residues are in red, and the signal anchor residue lysine (K) or arginine (R) are in green and blue respectively. Dashes represent gaps in the alignment. (C) Expression of putative functional NKG2 genes in transcriptomic data from 10 tissues in a R. aegyptiacus bat. Rows are ordered by highest average expression of transcripts across all tissues for a given gene. Expression is reported in log2(TPM), where TPM refers to transcripts per million. Data analyzed from Lee et al. (2015). (D) Maximum likelihood phylogenetic tree of bat NKG2 proteins and homologs in other species. NKG2 proteins from R. aegyptiacus are colored by predicted function. Bootstrap evidence (percentage of 500 bootstrap replicates) is labeled on branches if 65 or over. See also Figures S3, S5, and S7 and STAR Methods.

Figure S3

Multiple Sequence Alignments and Locus Maps of NKG2 Proteins in Bats, Related to Figures 2 and 3

Dots in alignments represent identity to the human protein sequence.

(A and B) Alignment of human NKG2A and C and putative functional bat NKG2 protein sequences. (A) shows the NKG2 residues that are known to contact CD94 in the human NKG2A protein (in orange), and (B) shows the NKG2 residues that are known to contact HLA-E/peptide in human NKG2A protein (in blue) (Kaiser et al., 2008).

(C) Locus maps of the NKG2 and CD94 genes in the natural killer complex. Each arrow designates a scaffold sequence in corresponding bat genome (see STAR Methods for accessions). Not pictured are unrelated pseudogenes and non-coding genes. The ellipse indicates the presence of additional non-NKG2 genes on the same scaffold.

(D) Alignment of the transmembrane domain of putative functional NKG2D proteins in humans and bats. Dashes indicate positions of diversity in the consensus sequence. The conserved arginine residue that serves as a signal anchor is shown in red. NKG2D-1 is shown for M. davidii, and NKG2D-2 is shown for M. lucifugus.

Figure S5

Related to Figures 2 and 3

(A–C) Expanded maximum likelihood phylogenies of (A) NKG2, (B) CD94 proteins, and (C) NKG2D proteins. R. aegyptiacus proteins are marked by red dots. Bootstrap evidence (percentage of 500 bootstrap replicates) is labeled on branches if over 65.

Figure S7

Diversity and Expression of Type I IFN Genes in R. aegyptiacus, Related to Figure 5

(A) Phylogeny of R. aegyptiacus type I IFN proteins. Maximum likelihood phylogenetic tree of bat type I IFN proteins. Bootstrap evidence (percentage of 500 bootstrap replicates) is labeled on branches if over 65.

(B) Antiviral effect of recombinant R. aegyptiacus IFN-ω4. RoNi cells were treated with recombinant IFN-ω4 (rIFN-ω4), rIFN-β1, or an unrelated protein (rPA-D1) for 4 hr, infected with VSV-eGFP at an MOI of 0.05, and imaged for eGFP expression 1 day post infection. Higher concentrations of recombinant IFN-ω4 inhibit viral replication as demonstrated by the absence of eGFP expression in cells after multiple viral replication cycles. Brightness was increased by 20% on all images.

(C) Sendai virus (SeV) infection of RoNi cells elicits an IFN response, including IFN-ω. RoNi cell monolayers were infected with SeV strain Cantell at an MOI of 1.0 or mock infected, and harvested for total RNA extraction and sequencing at 3, 8, and 24 hr. Sequencing data were quantified by IFN subtype in transcripts per million (TPM). Values plotted are the mean ± standard deviation of three replicates for each time point. IFN-ε and IFN-δ were not expressed. Adj. p values from unpaired t test between SeV and mock: ∗ < 0.05, ∗∗ < 0.005, ∗∗∗ < 0.0005. See STAR Methods.

Multiple diverse NKG2 and CD94 genes presumably allow substantial combinatorial diversity among heterodimeric NKG2/CD94 receptors (Figures 2A, 3 , and S3 C). Four of the five CD94 genes in R. aegyptiacus lack two cysteines at position 58 and 59 that are highly conserved in multiple mammal species. These cysteines participate in the disulfide-mediated heterodimeric interaction between CD94 and NKG2 proteins in humans (Kaiser et al., 2008, Petrie et al., 2008), suggesting that these CD94 molecules might interact with their NKG2 co-receptors in an alternate way. Mouse and rat CD94 are capable of direct association with DAP12 or DAP10 via a lysine residue in their transmembrane domains (Koch et al., 2013), but the CD94 sequences from all bats we examined have no lysine residues in their transmembrane regions, so are unlikely to associate directly with DAP proteins (Figure S4 C).

Figure 3

Expression and Diversity of CD94 in R. aegyptiacus

(A) Multiple sequence alignments showing conserved cysteine residues in CD94 genes in humans and five bats. Pseudogenes are indicated with the letter p in protein name. Asterisks indicate missing residues from a partial P. vampyrus CD94.

(B) Expression of CD94 genes in transcriptomic data from ten tissues in an Egyptian rousette bat. Rows are ordered by highest average expression of transcripts across all tissues for a given gene. Expression is reported in log2(TPM), where TPM refers to transcripts per million. Data analyzed from Lee et al. (2015).

(C) Maximum likelihood phylogenetic tree of bat CD94 proteins and homologs in other species. CD94 proteins from R. aegyptiacus are marked by red dots. Bootstrap evidence (percentage of 500 bootstrap replicates) is labeled on branches if over 65.

See also Figure S3, Figure S4, Figure S5 and STAR Methods.

Figure S4

Multiple Sequence Alignments of CD94 Proteins in Bats, Related to Figure 3

Dots in alignments represent identity to the human protein sequence, while dashes represent gaps in the alignment.

(A and B) Alignment of human CD94 and putative functional bat CD94 protein sequences. (A) shows the CD94 residues that are known to contact NKG2A in the human CD94 protein (in orange), and (B) shows the CD94 residues that are known to contact HLA-E/peptide in human CD94 protein (in blue) (Kaiser et al., 2008).

(C) Alignment of the transmembrane domain of CD94 in human, mouse, rat, and bats. The lysine (K) residue that serves as a signal anchor for DAP10 and DAP12 in rodents is shown in blue. This residue is not conserved in bat CD94 proteins.

(D) Alignment of the cytoplasmic domain of CD94 in human and bats.

Expression and Diversity of CD94 in R. aegyptiacus (A) Multiple sequence alignments showing conserved cysteine residues in CD94 genes in humans and five bats. Pseudogenes are indicated with the letter p in protein name. Asterisks indicate missing residues from a partial P. vampyrus CD94. (B) Expression of CD94 genes in transcriptomic data from ten tissues in an Egyptian rousette bat. Rows are ordered by highest average expression of transcripts across all tissues for a given gene. Expression is reported in log2(TPM), where TPM refers to transcripts per million. Data analyzed from Lee et al. (2015). (C) Maximum likelihood phylogenetic tree of bat CD94 proteins and homologs in other species. CD94 proteins from R. aegyptiacus are marked by red dots. Bootstrap evidence (percentage of 500 bootstrap replicates) is labeled on branches if over 65. See also Figure S3, Figure S4, Figure S5 and STAR Methods. Multiple Sequence Alignments and Locus Maps of NKG2 Proteins in Bats, Related to Figures 2 and 3 Dots in alignments represent identity to the human protein sequence. (A and B) Alignment of human NKG2A and C and putative functional bat NKG2 protein sequences. (A) shows the NKG2 residues that are known to contact CD94 in the human NKG2A protein (in orange), and (B) shows the NKG2 residues that are known to contact HLA-E/peptide in human NKG2A protein (in blue) (Kaiser et al., 2008). (C) Locus maps of the NKG2 and CD94 genes in the natural killer complex. Each arrow designates a scaffold sequence in corresponding bat genome (see STAR Methods for accessions). Not pictured are unrelated pseudogenes and non-coding genes. The ellipse indicates the presence of additional non-NKG2 genes on the same scaffold. (D) Alignment of the transmembrane domain of putative functional NKG2D proteins in humans and bats. Dashes indicate positions of diversity in the consensus sequence. The conserved arginine residue that serves as a signal anchor is shown in red. NKG2D-1 is shown for M. davidii, and NKG2D-2 is shown for M. lucifugus. Multiple Sequence Alignments of CD94 Proteins in Bats, Related to Figure 3 Dots in alignments represent identity to the human protein sequence, while dashes represent gaps in the alignment. (A and B) Alignment of human CD94 and putative functional bat CD94 protein sequences. (A) shows the CD94 residues that are known to contact NKG2A in the human CD94 protein (in orange), and (B) shows the CD94 residues that are known to contact HLA-E/peptide in human CD94 protein (in blue) (Kaiser et al., 2008). (C) Alignment of the transmembrane domain of CD94 in human, mouse, rat, and bats. The lysine (K) residue that serves as a signal anchor for DAP10 and DAP12 in rodents is shown in blue. This residue is not conserved in bat CD94 proteins. (D) Alignment of the cytoplasmic domain of CD94 in human and bats. To determine whether CD94 and NKG2 genes are expressed, we queried our published transcriptomic data (Lee et al., 2015) and found that the majority of these genes are expressed at low levels in peripheral blood mononuclear cells and secondary lymphoid organs (Figures 2C and 3B), similar to human baseline expression. However, two receptors with inhibitory signaling motifs—NKG2-13 and NKG2-14—along with CD94-1, are expressed at higher levels in the same tissues, suggesting that inhibitory signaling dominates in uninfected bats. To determine whether NKG2/CD94 receptors are diversified across bats, we examined the NKC in additional bats, and found that all bats we studied have multiple NKG2-like genes, although some were not originally classified as C-type lectins because of missing exons; P. vampyrus and P. alecto also have multiple CD94-like genes (Figures 2, 3, and S3). Each bat has one CD94 gene with canonical cysteine residues at position 58 and 59, (Figures 3A and S4A) and like R. aegyptiacus, multiple bats have CD94 genes without these residues, suggesting that they may have alternative functions. Phylogenetic analysis shows that NKG2 genes have undergone considerable diversification before and after the speciation of megabats (Figures 2D and S5 ). As with R. aegyptiacus, the putative functional and truncated NKG2 genes in both megabats (P. alecto and P. vampyrus) and microbats (M. lucifugus and M. davidii) contain ITIMs or both ITIMs and a positively charged transmembrane residue, suggesting that both inhibitory and activating signaling in these receptors is common among all bats in our analysis (Figure 2B). Related to Figures 2 and 3 (A–C) Expanded maximum likelihood phylogenies of (A) NKG2, (B) CD94 proteins, and (C) NKG2D proteins. R. aegyptiacus proteins are marked by red dots. Bootstrap evidence (percentage of 500 bootstrap replicates) is labeled on branches if over 65.

The MHC Class I Genes Are Located Both within and outside of the Canonical MHC Locus

While KIRs interact with classical MHC class I molecules (cMHCs), NKG2A/B/C and F receptors interact with HLA-E, a non-classical MHC class I molecule (ncMHC) that displays nonamers derived from the signal peptides of cMHCs. This mechanism is thought to allow NK cells to monitor the expression of cMHCs and deliver cytotoxic hits to cells lacking such expression (Yokoyama and Plougastel, 2003). We hypothesized that the expansion of the NKG2 receptor family would be matched by an expansion of cMHCs or ncMHCs. In humans, MHC class I genes are located in three areas referred to as the α, κ, and β duplication blocks, which are separated by framework regions containing non-HLA genes (Kulski et al., 2002). HLA-E and its functional equivalent in mice, H2-Qa1, are located in the κ block. Raegyp2.0 appears to lack the α and κ blocks (Figure 4 A), as do P. alecto and E. fuscus (Ng et al., 2016).

Figure 4

Characterization of the MHC Class I Region in Raegyp2.0

(A and B) Locus maps of (A) the MHC class I region and (B) MHC class I genes outside the canonical class I region in Raegyp2.0. Each arrow designates a scaffold sequence in the Raegyp2.0 genome (see STAR Methods for accessions). Not pictured are non-MHC pseudogenes and non-coding genes. The ellipse indicates the presence of additional genes on the same scaffold. Black, MHC class I genes; gray, non-MHC genes; dark blue, MICB; unfilled boxes, MHC class I pseudogenes. The α, κ, and β class I duplication blocks are shown in red, purple, and green, respectively.

(C) Expression of MHC class I genes in transcriptomic data from 10 tissues in an Egyptian rousette bat. Rows are ordered by highest average expression of transcripts across all tissues for a given gene. Expression is reported in log2(TPM), where TPM refers to transcripts per million. Data analyzed from Lee et al. (2015).

(D) Maximum likelihood phylogenetic tree of bat MHC class I proteins (R. aegyptiacus proteins in red) with human MHC class I proteins as an outgroup. Bootstrap evidence (percentage of 500 bootstrap replicates) is labeled on branches if 65 or over.

(E) Sequence logo plots showing the sequence diversity of predicted nonamer peptides derived from the signal sequences of MHC class I genes from human (H. sapiens), mouse (M. musculus), R. aegyptiacus, and the gray mouse lemur (M. murinus). The y axis shows information content in bits, and the x axis shows position in the nonamer.

Type I Interferons Are Expanded in R. aegyptiacus

We estimate 20 type I IFN genes in the megachiropteran ancestor and find 46 putative functional genes in Raegyp2.0, including 12 IFN-α genes, one of each of the IFN-β, -ε, and -κ genes, nine IFN-δ genes, and 22 IFN-ω genes (Figure 5 ; Table S3). The greatest expansion occurred in the IFN-ω subfamily, which has only one copy in humans.

Figure 5

Diversity of the Type I Interferons in Raegyp2.0

(A) Locus map of the type I IFNs in Raegyp2.0. Each arrow designates a scaffold sequence in the Raegyp2.0 genome (see STAR Methods for scaffold accessions). Unfilled boxes indicate pseudogenes. Orange, IFNβ; blue, IFNω; red, IFNα; yellow, IFNδ; purple, IFNε; green, IFNκ. The single non-IFN gene within the locus (KLHL9) is in gray. Not pictured are non-coding genes. The ellipse indicates the presence of additional non-IFN genes on the same scaffold.

(B) Maximum likelihood phylogenetic tree of bat type I IFN proteins and homologs in other species. R. aegyptiacus proteins are marked in red, with groups of closely related proteins collapsed. Bootstrap evidence (percentage of 500 bootstrap replicates) is labeled on branches if 65 or over.

See also Figure S6, Table S6, and STAR Methods.

Diversity of the Type I Interferons in Raegyp2.0 (A) Locus map of the type I IFNs in Raegyp2.0. Each arrow designates a scaffold sequence in the Raegyp2.0 genome (see STAR Methods for scaffold accessions). Unfilled boxes indicate pseudogenes. Orange, IFNβ; blue, IFNω; red, IFNα; yellow, IFNδ; purple, IFNε; green, IFNκ. The single non-IFN gene within the locus (KLHL9) is in gray. Not pictured are non-coding genes. The ellipse indicates the presence of additional non-IFN genes on the same scaffold. (B) Maximum likelihood phylogenetic tree of bat type I IFN proteins and homologs in other species. R. aegyptiacus proteins are marked in red, with groups of closely related proteins collapsed. Bootstrap evidence (percentage of 500 bootstrap replicates) is labeled on branches if 65 or over. See also Figure S6, Table S6, and STAR Methods. Given previous reports of constitutive IFN expression in bats, we sought to determine whether the expanded IFN genes may be constitutively expressed and found limited baseline expression of these genes (Table S6). To determine whether these IFN-ωs may be induced by viral infection, we infected immortalized R. aegyptiacus cells (RoNi) with the Cantell strain of Sendai virus, a known inducer of IFNs in other species. We observed induction of IFN-ω transcripts (Figure S7 C), although at relatively low levels compared to IFN-β transcripts. To examine whether IFN-ω proteins retain the canonical antiviral function of type I IFNs, we synthesized recombinant Egyptian rousette IFN-β and IFN-ω4, and an unrelated protein of similar size, and used these proteins in an antiviral assay in RoNi cells. Consistent with a bona-fide antiviral function, treatment of RoNi cells with IFN-ω4 blocks infection with vesicular stomatitis virus encoding eGFP (VSV-eGFP) (Figure S7). Diversity and Expression of Type I IFN Genes in R. aegyptiacus, Related to Figure 5 (A) Phylogeny of R. aegyptiacus type I IFN proteins. Maximum likelihood phylogenetic tree of bat type I IFN proteins. Bootstrap evidence (percentage of 500 bootstrap replicates) is labeled on branches if over 65. (B) Antiviral effect of recombinant R. aegyptiacus IFN-ω4. RoNi cells were treated with recombinant IFN-ω4 (rIFN-ω4), rIFN-β1, or an unrelated protein (rPA-D1) for 4 hr, infected with VSV-eGFP at an MOI of 0.05, and imaged for eGFP expression 1 day post infection. Higher concentrations of recombinant IFN-ω4 inhibit viral replication as demonstrated by the absence of eGFP expression in cells after multiple viral replication cycles. Brightness was increased by 20% on all images. (C) Sendai virus (SeV) infection of RoNi cells elicits an IFN response, including IFN-ω. RoNi cell monolayers were infected with SeV strain Cantell at an MOI of 1.0 or mock infected, and harvested for total RNA extraction and sequencing at 3, 8, and 24 hr. Sequencing data were quantified by IFN subtype in transcripts per million (TPM). Values plotted are the mean ± standard deviation of three replicates for each time point. IFN-ε and IFN-δ were not expressed. Adj. p values from unpaired t test between SeV and mock: ∗ < 0.05, ∗∗ < 0.005, ∗∗∗ < 0.0005. See STAR Methods.

Discussion

Few viruses are known to cause acute disease in bats, including those that cause profound, often lethal, disease in humans. The precise mechanism(s) of this viral resistance is not known. One hypothesis that has been gaining acceptance is that bats present especially potent innate antiviral defenses compared to primates, controlling viral replication early in infection, and as a result, developing effective adaptive immune responses (Baker et al., 2013). Our results suggest a different hypothesis that is consistent with the idea that bat antiviral mechanisms are different in essential ways from those of other mammals, but is additionally associated with enhanced infection tolerance rather than enhanced defense. This hypothesis is supported by infection studies of MARV transmission among R. aegyptiacus bats, in which bats that are “naturally” infected by other experimentally inoculated bats appear to have a protracted incubation period (Amman et al., 2015, Schuh et al., 2017). Further, these studies showed that infected bats can remain viremic and shed infectious virus for extended periods of time (up to 3 weeks after infection) before eventually clearing the virus (Data S1). Despite prolonged infection, limited inflammation is observed in even the most highly infected tissues, similar to what was previously observed in infected wild-caught bats (Jones et al., 2015, Towner et al., 2009).

Type I IFNs

Bats may tolerate viral infections to a greater extent by minimizing the proinflammatory effectors that promote damage to the host in many viral infections. Type I IFNs are induced very early in viral infection and act by inducing effectors encoded by interferon stimulated genes (ISGs). Different IFN subtypes specifically interact with the common IFN receptor inducing a distinct spectra of ISGs with different antiviral potencies (Hoffmann et al., 2015). The magnitude and nature of the IFN response determine whether the resulting effects on the host are harmful or beneficial (Malireddi and Kanneganti, 2013). For example, dysregulation of the type I IFN response has been implicated in the pathogenesis of multiple emerging viruses, including MARV (Liu et al., 2017, Connor et al., 2015). Although IFN gene families differ substantially across mammals (Secombes and Zou, 2017), the extensive expansion of IFN-ω genes in R. aegyptiacus to almost two dozen genes is striking. This expansion is dramatically different from what was observed in P. alecto (Zhou et al., 2016), but is consistent with the expansion in P. vampyrus (Kepler et al., 2010). In our hands, at least one of the IFN-ω members, IFN-ω4, has a marked antiviral effect against VSV-eGFP infection in RoNi cells (Figure S7). However, the effect observed in this in vitro system, was less potent than that of IFN-β. Thus, IFN-ω4 most likely induces a unique ISG pattern and may be more effective against different viruses, or may induce lower levels of effectors in a more regulated response. Consistent with this latter scenario, Banerjee et al. (2017) have shown that poly I:C treatment induces type I IFNs in both human and Eptesicus fuscus bat cells, but bat cells express much lower levels of inflammatory mediators. A wider variety of signaling mediators may provide R. aegyptiacus greater flexibility to develop a more nuanced antiviral response. It is also possible that different IFN-ωs may complement and/or synergize with each other. We observed IFN-ω gene induction in RoNi cells after Sendai virus infection, albeit at lower levels than IFNβ or IFN-α genes (Figure S7). Given that Sendai virus is known to be a potent inducer of IFN-α and -β in particular, it is possible that different viruses or other stimuli may preferentially induce IFN-ω genes. Unlike P. alecto, R. aegyptiacus shows no evidence of constitutive IFN expression (Table S6). Both bats are reservoirs for different emerging viruses of high virulence to humans, so it will be of great interest to determine whether these apparent differences in IFN expression represent distinct mechanisms of viral control by both bats species.

NK Cell Receptors

NK cells are an important component of innate antiviral responses, and have been associated with survival of Ebola virus infection (Liu et al., 2017). The unique organization, structure, and increased signaling complexity of the NK cell receptors in multiple bats point to adaptations that are also consistent with the infection tolerance hypothesis. Our findings suggest that NKG2/CD94 receptors, which are more associated with an inhibitory response in other species, serve as the primary NK cell receptors in bats. Consistent with this, all but one of the NKG2A/B-like genes in R. aegyptiacus have inhibitory motifs at the cytosolic tail. The expansion of CD94 genes makes the possible combinatorial diversity of heterodimeric receptors very large, as previously demonstrated for prosimians (Averdam et al., 2009). Because signal transmission occurs via a conserved set of residues, the additional receptor diversity is likely to be associated with the capacity to bind additional ligands or differentially interact with the same ligands. Greater diversity in ligand binding would provide better recognition of MHC class I alleles to recognize self for NK cell tuning and licensing, and also to distinguish a variety of pathogen mimics of MHC class I molecules (Parham and Moffett, 2013). The KLRD variants missing conserved cysteines add an additional level of complexity to this interaction that needs further characterization once tools for isolating NK cells in bats are available. In other mammals, KLRC genes are also expressed on T cells, and NKG2A has been shown to control the level of T cell activation during viral infection, preventing excessive activation and immunopathology in mice (Rapaport et al., 2015). The high baseline expression of inhibitory KLRC genes may indicate an immune-inhibitory state associated with both NK cells and T cells in R. aegyptiacus bats in the absence of infection, although this remains to be confirmed with functional studies. A further striking feature of the NK cell receptor genes in R. aegyptiacus and the other bats we studied is the signaling mode they are predicted to use. In genes with potential for activating signaling, an arginine residue is preferred (Figure 2B), suggesting DAP10, rather than DAP12, recruitment (Koch et al., 2013). DAP12 has been shown to be more efficient in activating cytokine production than DAP10 (Lanier 2009), therefore a preference for DAP10 could mean that activating NK receptors in bats have adapted to be less potent inducers of cytokines, and thus less inflammatory. NK cell receptors that possess both inhibitory and activating domains are unusual across mammals. The only known genes with both functions are the single gene KIR2DL4 in humans and the NKG2 genes in lemurs (Parham and Moffett, 2013). KIR2DL4 activation promotes robust cytokine secretion but not cytotoxicity (Kikuchi-Maki et al., 2005). It is possible that the R. aegyptiacus KLRC genes mimic the signaling of human KIR2DL4, although parallels between both receptors are difficult to make without functional assays. However, our genomic evidence suggests that NK cell receptors in bats are uniquely regulated, especially given the multiple changes in the signaling potential of these receptors, which are highly conserved in eukaryotes ranging from Drosophila to humans.

MHC Class I Genes

The potential extended capacity of the KLRC/KLRD system to bind diverse ligands is accompanied by expansion of MHC class I genes outside of the canonical MHC locus, although there is no obvious bat ortholog of HLA-E. A similar expansion and absence of a clear HLA-E ortholog has been observed in prosimians (Averdam et al., 2009). If one of the MHC class I genes in Raegyp2.0 is functionally similar to HLA-E, the high similarity of predicted class I nonamers (Figure 4E) suggests that presenting a peptide from one MHC class I molecule would likely be interchangeable with presenting a peptide from another. Thus, expression of MHC class I molecules would have to be dramatically decreased in order for “missing-self” detection to occur. While functional orthologs cannot be inferred without functional assays, the large number of MHC genes encoded outside the canonical locus in the bats we studied (Figure 4B) may be suitable ligands for the expanded KLR genes. Distribution of the MHC genes across the genome might serve as a mechanism to generate redundancy as different KLRC receptors might interact with distinct MHC class I genes. This could potentially result in a higher activation threshold for NK cells. Taken together, these results show that multiple bats, including R. aegyptiacus, have expanded and diversified numerous antiviral loci, and potentially developed unique adaptations in NK cell receptor signaling, and type I IFN responses. Our findings are consistent with the hypothesis that certain key components of the immune system in bats have coevolved with viruses toward a state of respective tolerance and avirulence, although tolerance is likely not the only mechanism at play. For example, in addition to enhanced flexibility, the expansion of type I IFNs may also point to enhanced potency of antiviral defenses. Recent studies have shown that IFN stimulation in bats induces some ISGs that are not induced by IFN in humans, and that the response kinetics may vary as well (De La Cruz-Rivera et al., 2018). Adaptations in potency are also indicated by observations of lower viral loads in R. aegyptiacus bats compared to humans (Amman et al., 2015, Jones et al., 2015). Finally, even with mutual disarmament, the host must be alert to viruses that may spontaneously revert to an antagonistic phenotype. In either case, definitive tests of these hypotheses await the development of further experimental reagents for cytometry and biochemical intervention—reagents that are being developed now with information made available by the completed genome project.

STAR★Methods

Key Resources Table

Contact for Reagent and Resource Sharing

Further information and requests for resources and reagents should be directed to and will be fulfilled by the Lead Contact, Gustavo Palacios (gustavo.f.palacios.ctr@mail.mil).

Experimental Model and Subject Details

Source organism

Genomic DNA was isolated from the liver and spleen tissue of a wild-caught healthy older juvenile male Egyptian rousette bat captured at Python Cave in Uganda. Research was conducted under an IACUC approved protocol in compliance with the Animal Welfare Act, PHS Policy, and other Federal statutes and regulations relating to animals and experiments involving animals. The facility where this research was conducted is accredited by the Association for Assessment and Accreditation of Laboratory Animal Care, International and adheres to principles stated in the Guide for the Care and Use of Laboratory Animals, National Research Council, 2011.

Cell lines

Rousettus aegyptiacus immortalized fibroblasts (RoNi) were originally generated in the work described in Biesold et al. (2011). Briefly, kidney tissue from a wild-caught sub-adult Egyptian rousette bat was cultured and immortalized via lentiviral transduction of the SV40 large T antigen. The sex of the bat was not recorded at the time of capture. Cell cultures were genotyped and tested for mycoplasma, SV5, filoviurses and lyssaviruses by RT-PCR. Cells were cultured at 37°C at 5% CO2 in DMEM (Dulbecco’s Modified Eagles Medium) with 4.5g L glucose supplemented with 10% fetal bovine serum, 1% penicillin/streptomycin 100x conentrate, 1% L-Glutamine 200mM, 1% Sodium Pyruvate 100mM, and 1% MEM nonessential amino acids 100x concentrate.

Method Details

Nucleic acid extraction and sequencing

We used the UltraClean Blood DNA Isolation kit (MO BIO Laboratories, Carlsbad, CA) with a customized protocol for tissue (available upon request) to isolate genomic DNA from liver and spleen tissue of a healthy older juvenile male Rousettus aegyptiacus bat captured at Python Cave in Uganda. For sequencing on the Illumina platform, 1 ug of genomic DNA was sheared to 400 bp using Covaris LE220 Focused-ultrasonicator (Covaris, Woburn, MA). End repair, A-tailing and ligation of adapters were performed on the Apollo 324 automated system, using Prep X Complete ILMN DNA Library Kit (WaferGen Biosystems, Fremont, CA). KAPA Library Amplification Kit with 10 cycles of PCR was used for library enrichment. Libraries were quantified by qPCR using KAPA Library Quantification Kit (KAPA Biosystems, Wilmington, MA). Each library was loaded on 8 lanes of the high output flow cell. Cluster amplification was performed on the cBot with the TruSeq PE Cluster Kit v3-cBot-HS (Illumina, San Diego, CA). Clustered flow cell was sequenced on the HiSeq 2500 instrument with the TruSeq SBS Kit –HS (Illumina). 720 Gb of 2x101bp paired-end data were produced (approximate coverage of 145x). For sequencing on the PacBio platform, genomic DNA was sheared to 20kb average size using g-TUBE (Covaris). After DNA damage repair and ends repair, hairpin adapters were ligated to form a SMRTbell template. ExoIII and ExoVII treatment was used to remove failed ligation products. Size selection was performed on Blue Pippin system (Sage Sciences, Beverly, MA) using 0.75% dye-free agarose gel cassette, marker S1 and Hi-Pass protocol; low cut was set on 4000 bp. Final library assessment was obtained by Qubit dsDNA BR assay and Agilent 2100 Bioanalyzer DNA 12000 chip. To obtain longer reads, libraries were sequenced with P5-C3 chemistry. Annealing of sequencing primer and binding polymerase P5 to the SMRTbell template was performed according to PacBio calculator. The polymerase-template complexes were bound to MagBeads, loaded onto SMRTcells (SMRT Cell 8 pack V3) at final concentration 180 pM, and sequenced with 180 min movies on PacBio RS II instrument (Pacific Biosciences, Menlo Park, CA).

Genome Size Estimation and Heterozygosity analysis

Genome size was estimated by k-mer analysis. K-mers from 4 lanes of the Illumina dataset were counted using Jellyfish (Marçais and Kingsford, 2011). Excluding rare 25-mers that occur at low depth, the most frequently occurring depth was 57x (Figure S1). The presence of multiple peaks suggests that this genome has a high degree of heterozygosity. Peak k-mer frequency (M), real sequencing depth (N), read length (L), k-mer length (K), total bases (T), and genome size (G) are related by the following formulas (Li et al., 2010): Using this method, we estimate a complete genome size of 2.08 Gb, which is very close to previous estimates of 2.11 Gb based on nuclear densitometry (Kwiecinski and Griffiths, 1999) (Figure S1A).

Genome Assembly

Illumina reads were assembled separately from the PacBio reads with SparseAssembler, a short read assembler that exploits high coverage to construct a modified de Bruijn graph (DBG) (Ye et al., 2012). Because raw Pacific Biosciences (PacBio) long sequencing reads can be error-prone, we corrected potential sequencing errors in these reads with ∼34x of high-accuracy Illumina reads using LoRDEC v0.5 (long read de Bruijn graph error correction) (Salmela and Rivals, 2014). Contiguity of the short-read assembly was improved by incorporating long-read data using DBG2OLC (Ye et al., 2016), which anchors the short-read contigs generated with SparseAssembler to the long PacBio reads. Multiple approaches were used to scaffold the assembly. First, we used PacBio reads with the long-read scaffolding program LINKS v1.5.1, which uses a paired k-mer approach to scaffold assemblies with long reads even if they have already been used in the assembly (Warren et al., 2015). LINKS was used iteratively until no improvement was observed in contiguity, which was after seven iterations. Second, we made use of transcriptome data recently published (Lee et al., 2015) to scaffold our assembly with L_RNA_Scaffolder (downloaded 9/28/2015) (Xue et al., 2013). Last, we reincorporated the paired-end information from our Illumina data for scaffolding with SSPACE v3.0 (Boetzer et al., 2011). As a post-processing step, we aligned the Illumina reads to the assembly using Bowtie2 (Langmead and Salzberg, 2012) and ran the variant calling program Pilon (Walker et al., 2014) on the resulting alignment to correct potential mis-assemblies, consensus calling errors, or homopolymer indel errors. We also tested the use of the Quiver algorithm from Pacific Biosciences as a post-processing step prior to pilon, but found that it reintroduced insertion and deletion errors that led to poor gene models. The genome-wide average GC content is about 40%, and intra-scaffold GC content ranges from 31.1% to 71.4%. Raegyp2.0 has a higher degree of heterozygosity (0.53%, Figure S1B) than has been reported for other bat genomes (Zhang et al., 2013). To check for mis-assemblies, we used the gene coverage program CEGMA v2.5 with mammalian settings to identify the presence and coverage of highly conserved eukaryotic genes in the genome (Parra et al., 2007). We also remapped ∼54x of short paired reads onto the assembly with bwa-mem v0.7.10 and assessed the number of reads that map appropriately (in the correct orientation and with the expected insert size) with SAMtools flagstat (SAMtools v0.1.18) (Li, 2013, Li et al., 2009). 98.41% of reads align, and 91.19% of reads map appropriately, indicating no major misassemblies. To assess the quality of the Raegyp2.0 assembly, we used a statistical analysis program, QUAST v3.0 (Gurevich et al., 2013), to look at baseline statistics, including the total assembly length, the percentage of estimated genome size covered, and contiguity statistics. The Nx and NGx plots (generated using QUAST) in Figure S1 show the shortest sequence for which the total length of all sequences of its length or longer make up “x” percentage of the total assembly size (Nx) or the total estimated reference genome size (NGx). For example, In a length-sorted list of sequences, the N50 refers to the length of the smallest sequence for which the sum of the length of all sequences of the same or longer length constitute 50% of the total assembly length. The scaffold N50 refers to the N50 of all the scaffolds in the genome, while the contig N50 refers to the N50 of all the contigs in the genome. To compare genome contiguity statistics across all available bat genome projects (Table 1), we accessed the appropriate GenBank Genome page for each species and reported contiguity statistics as listed (accessed on 8/8/17).

Genome Annotation

The assembly was submitted to GenBank (accession GCA_001466805.2) and annotated using the NCBI eukaryotic genome annotation pipeline (Thibaud-Nissen et al., 2013). A total of 26.25% of the genomic sequence was identified as repetitive by the de novo repeat finder WindowMasker (Morgulis et al., 2006) and was masked for the purpose of aligning evidence and predicting genes. Transcripts and proteins available in GenBank and known RefSeq transcripts and proteins for bats and human were aligned to the genome with Splign (Kapustin et al., 2008) and ProSplign, along with model RefSeq proteins previously annotated on the Pteropus alecto and Myotis brandtii genomes. In addition, over 2 billion RNA-Seq reads derived from 12 different Rousettus aegyptiacus tissues and available in SRA were aligned. Model precursors were created by Gnomon (National Center for Biotechnology Information n.d.), a gene calling algorithm trained on Pteropus alecto, by collapsing overlapping alignments with compatible splice patterns. In a second step also run by Gnomon, model precursors with high coding propensity were extended or joined if missing start or stop codon using an HMM model. The resulting models were evaluated and filtered based on multiple criteria, including supporting evidence, conflicts with models on the other genomic strand, Blast hits to UniProtKB/SwissProt if over 50% ab initio sequence, and number of exons (single-exon non-coding RNA were eliminated). Models in the final set were assigned a function by orthology calculation to human, or if no ortholog could be calculated, by homology to proteins in UniProtKB/SwissProt. Finally, models were assigned GeneIDs and RefSeq model transcript and protein accession (with XM_, XP_, XR_ prefixes) and loaded to the Nucleotide, Protein and Gene databases as part of NCBI Rousettus aegyptiacus Annotation Release 100. The genome contains 36.4% repetitive content, similar to that found in other bats (Table S1). 19,668 of the annotated genes were protein-coding genes and 2,380 were non-coding genes. 2,198 coding sequences were corrected for premature stop codons, small internal gaps, or frameshifts based on aligning evidence (NCBI Eukaryotic Annotation Group, 2016). When present, the corrected models were used for downstream analysis. 5,958 long non-coding RNAs were predicted with full support from transcript data and 340 tRNA models were predicted with tRNAscan-SE (Lowe and Eddy, 1997). 96.03% of predicted mRNAs were fully supported by transcriptomic or protein data. A total of 16,254 of predicted protein-coding genes (83%) had at least one protein aligning to a UniProtKB/SwissProt protein for over 95% of its length, which was higher than for any other bat annotated by the NCBI pipeline at the time. In addition, the number of the UniProtKB/SwissProt hits covered over 95% of their length by a predicted protein was similarly high (82%), indicating that the predicted proteins on Raegyp2.0 represent full-length models. Of the 19,668 protein-coding genes annotated in the genome, 317 genes have no associated gene labels or homolog in open reading frames from genomes of other species in RefSeq, and may represent novel R. aegyptiacus-specific genes. However, the median length of the protein product of these genes (257 amino acids) is significantly lower than the median length of all annotated proteins (498 amino acids). Thus, it remains a possibility that these genes have been previously identified in other species but are partial or poor models in Raegyp2.0 because of misannotation or local misassembly.

Designating gene families

We inferred homologous protein groups among 15 species of mammals using a similarity-based approach within the OrthoMCL pipeline (Fischer et al., 2011). The following species were included in the analysis: Rousettus aegyptiacus (Egyptian rousette bat), Pteropus vampyrus (large flying fox), Pteropus alecto (black flying fox), Myotis davidii, Myotis lucifugus (little brown bat), Homo sapiens, Macaca fascicularis (crab-eating macaque), Macaca mulatta (rhesus macaque), Mus musculus (mouse), Cavia porcellus (guinea pig), Cricetulus griseus (Chinese hamster), Sus scrofa (pig), Bos taurus (cow), Equus caballus (horse), Canis familiaris (dog). Protein data were downloaded from RefSeq and filtered to include only the longest protein product of a gene for use in the pipeline. Briefly, OrthoMCL gathers all-against-all blastp hits into reciprocal best hits (between species) and reciprocal better hits (within species). These hits are converted into a graph network describing likely orthologous or paralogous relationships among proteins, and MCL clustering is performed to group proteins into families. Using OrthoMCL v2.0.9, we obtained 19,310 groups of proteins and 7,041 single-copy orthologous groups. We annotated all OrthoMCL groups with functional family designations from the PANTHER database (Mi et al., 2016). We first labeled each group with a representative RefSeq protein accession from that group, using a protein from a well-curated genome (human, mouse, macaque) whenever possible. We mapped all RefSeq accessions directly to PANTHER family IDs or to UniProtKB accessions for subsequent mapping to PANTHER IDs using bioDB, PantherDB, and/or the UniProtKB accession mapping feature (Mi et al., 2016, Mudunuri et al., 2009, UniProt Consortium, 2015). All OrthoMCL groups with the same PANTHER family ID were collapsed into new gene families, which produced a total of 9,555 gene families (2,400 single-copy orthologous gene families). Some families remain unlabeled after this process because proteins from relatively new genomes may be missing in PANTHER. Of note, 3,450 OrthoMCL groups did not map to PANTHER families and were designated unlabeled gene families for downstream analysis. Using this process, upon examining the type I IFN gene family, we observed that many genes were not accounted for despite being present in the NCBI annotation. We discovered that these genes were labeled as singletons or in families with only R. aegyptiacus proteins, and therefore would not be classified as a PANTHER family since no R. aegyptiacus proteins were in the PANTHER database at the time of analysis. We manually corrected the numbers of genes in this family based on NCBI annotations and yielding 9,550 families, 3,445 of which were unlabeled.

Identification of repetitive content

Repetitive content was identified based on homology to the RepBase database using RepeatMasker (Smit et al. n.d.). “One code to find them all” (Bailly-Bechet et al., 2014) was used to resolve nested repeats and provide family-level quantitative information.

Heterozygosity analysis

All short reads were mapped to the genome using Bowtie2 (Langmead and Salzberg, 2012). Duplicate reads were marked and removed with PicardTools v1.131 (Broad Institute n.d.). Variants were called using Samtools v1.3 (Li et al., 2009). BCFTools v1.3.1 was used to filter variant calls in regions with parameters –g3 –G10 –e ‘%QUAL < 20 || (RPB < 0.1 && %QUAL < 30 || (DP < 30) || (DP > 250) || (MQ < 20)’.

Endogenous viral content

All genomes were screened for endogenous viral elements (EVEs) with DIGS for EVEs (Gifford n.d.) using the EVEs v0.1 reference library composed of EVEs derived from viruses in the Parvoviridae, Bornaviridae, Hepadnaviridae, Circoviridae, Filoviridae, and Bunyaviridae families.

Species tree generation

We extracted all 2,400 single-copy orthologous proteins (inferred by methods described above) and performed multiple sequence alignments of each group with Mafft v7.305b (Katoh and Standley, 2013). All alignments were trimmed with trimAL v1.3 (Capella-Gutiérrez et al., 2009) using the -automated1 parameter, and concatenated into a super-protein for each species. After concatenation, the super-proteins were re-trimmed with trimAL (-automated1 parameter) and used to generate a maximum likelihood species tree with RAxML v8.2.9 under a JTT + Г substitution model with empirical base frequencies (Stamatakis, 2014). 1,000 bootstrap replicates were used to assess branch reliability.

Gene family expansion and contraction analysis

Gene families from the proteomes of 15 mammals (as designated in above methods) were used as input for the CAFE v3.1 (Computational Analysis of gene Family Evolution) algorithm (De Bie et al., 2006) to estimate a value for lambda (the birth and death rate for a given gene per million years) by maximum likelihood. Since CAFE assumes that all input gene families have at least one member in the most recent common ancestor of all species, gene families were filtered to exclude single lineage families. Only families with at least one gene in both the Laurasiatheria superorder (bats, ungulates, carnivores) and the Euarchontoglires superorder (primates, rodents) were included. We manually corrected type I IFN gene family sizes after observing that several genes that were annotated as type I IFNs by the NCBI did not appear in our inferred gene families. Further exploration revealed that these additional IFNs get classified as single gene families or as unlabeled gene families because they consist of only bat IFNs. The resulting families were further filtered to exclude eight families with large ranges in size (> 100) across the tree, leaving 7,698 families for expansion and contraction analysis. CAFE was used to obtain a maximum likelihood estimate of a global birth and death rate parameter λ (lambda; rate of gain/loss per gene per million years) across the species tree (created by above methods) of 0.0169284. The size of each family at each ancestral node was estimated and used to obtain a family-wise p value to indicate non-random expansion or contraction for each family, as well as significant expansion or contraction across all branches of the species tree. Tables S3 and S4 contain all the gene families with expansions and contractions in R. aegyptiacus prior to and post IFN-correction (described above), respectively.

Annotation of genes

We used annotations derived from the NCBI pipeline to extract and characterize NK cell receptor genes and MHC genes. These annotations were supplemented with homology-based predictions from the gene family analysis described above, and manual correction after examination in BioEdit v7.0.0 (Hall, 1999). For example, out of the 12 genes classified as C-type lectins in our gene family analysis, one gene is missing exons and is considered a pseudogene. The NCBI gene database contains another gene annotated as an NKG2A-like gene, but was classified as a singleton in our gene family analysis because of missing exons. Additionally, one protein sequence identified as an NK cell receptor-like C-type lectin by our gene family analysis remained uncharacterized by the NCBI pipeline. Upon manual reannotation, we discovered that the uncharacterized protein sequence was actually a misannotation of two NKG2A-like genes—one complete pseudogene and one partial gene. Thus, this analysis is able to detect homologs and is robust to potential misannotation issues. To annotate the type I IFN genes, we used IFN sequences inferred from P. vampyrus sequencing traces from Kepler et al. (2010) as queries in a blastn search with the following parameters: -task blastn -evalue 0.05. Resulting hits were screened based on the lowest e-value, and the sequences were extracted with blastdbcmd and examined in BioEdit v7.0.0 for coding potential (Altschul et al., 1990, Hall, 1999). Gene annotations are shown in Figures 2, 4, 5, and S3. The scaffolds containing the NKC locus in Figure 2A are in order from top to bottom: NW_15493182.1, NW_15493451.1, NW_15493213.1, and NW_15494625.1. The scaffolds containing the MHC class I locus and extra-locus genes in Figure 4 are, in order from top to bottom: panel A. NW_015494903.1, NW_015493289.1, and NW_015494931.1; panel B, column 1: NW_015493957.1, NW_015493337.1, NW_015493066.1, NW_015493167.1, NW_015493330.1, NW_015493360.1, and NW_015494802.1; B, column 2: NW_015494846.1, NW_015493471.1, NW_015492968.1, NW_015494660.1, and NW_015493352.1. The scaffolds containing the type I IFN locus in Figure 5A are, in order from top to bottom – column 1: NW_015494712.1, NW_015494244.1, NW_015493479.1, NW_015493859.1, NW_015494258.1, NW_015492835., NW_015494622.1; column 2: NW_015493694.1, NW_015493581.1, NW_015493794.1, NW_015493974.1, NW_015494085.1, NW_015494371.1, NW_015494373.1, NW_015494147.1, NW_015494111.1, NW_015494299.1. The scaffolds containing the NKC loci in Figure S3B are, in order from top to bottom: P. vampyrus - NW_011888897.1, NW_011889578.1, NW_011889241.1, NW_011889581.1, NW_011889318.1; P. alecto - NW_006431924.1, NW_006436696.1, NW_006429163.1, NW_006432008.1; M. davidii - NW_006299270.1, NW_006295002.1, NW_006281977.1, NW_006289839.1; M. lucifugus - NW_005871058.1, NW_005873184.1, NW_005874133.1. To extract MHC class I putative nonamers, proteins derived from MHC class I genes (or the closest mouse homologs to HLA-A) were aligned, and nonamers determined based on location of known nonamer sequences (O’Callaghan, 2000, Kaiser et al., 2008). NKG2D ligands were collected from the NCBI Gene database or from UniProt (UniProt Consortium, 2015). For bat proteins, all putative functional, in-frame, non-partial genes were identified based on their annotation as NKG2D ligand-like by the NCBI Eukaryotic Annotation pipeline. In the phylogenetic tree displaying NKG2D ligands (Figure S7), abbreviations are as follows: Hsap (H. sapiens), Raeg (R. aegyptiacus), Pvam (P. vampyrus), Efus (E. fuscus), Pale (P. alecto), Mluc (M. lucifugus), Mbra (M. brandtii), Mdav (M. davidii), Harm (H. armiger), Mnat (M. natalensis), Rsin (R. sinicus), Mmus (M. musculus), Rnor (R. norvegicus).

Phylogenetic analysis

Gene sequences were retrieved from Ensembl release 90 (Yates et al., 2016) by looking up the human ortholog (horse ortholog for IFND) and retrieving every annotated placental mammal ortholog (except NKG2D ligands as noted above). To this, we added a curated set of bat orthologs retrieved from GenBank. The protein sequences were then aligned in Mega 6.0 (Tamura et al., 2013) with MUSCLE (Edgar, 2004) using a maximum of 20 iterations. Mega 6.0 was then used to estimate the most likely model, which in all cases was JTT+Γ. This model was used to generate phylogenies for each gene. Sites were not considered for phylogenetic analysis if a deletion was present in > 5% of sequences (10% for NKG2A). Bootstrap confidence values were determined using 500 replicates.

Transcriptome analysis

Reads from the Lee et al. (2015) dataset (SRA project: SRP066106) (Lee et al., 2015) were pseudoaligned to the set of CDSs annotated in the Raegyp_2.0 assembly supplemented with manually annotated type I IFN genes and NKG2 genes using kallisto v0.43.0 (Bray et al., 2016). Gene-level transcript counts were calculated by summing the transcript per million (TPM) values for all transcripts of a given gene. The resulting TPM values were log2 transformed and heatmaps were generated using pheatmap (Kolde, 2012). For Table S6, TPM values of IFNs were normalized by dividing by GAPDH TPM in the same tissue.

Sendai virus infection of RoNi cells

RoNi cells in 10% DMEM (GIBCO) were seeded at 5x104 in 24-well plates (Corning), then infected in triplicate by Sendai virus (SeV, Cantell strain, Charles River Laboratories, Wilmington, MA) at an MOI of 1, or mock infected. Fresh media was exchanged following a 1hr adsorption and plates were incubated at 37°C. Cells were harvested in 1x RNA Lysis/Binding Solution Concentrate (Thermo Fisher Scientific, Waltham, MA) at 3hr, 8hr and 24hr post-infection, followed by magnetic bead purification and TURBO DNase treatment using the MagMAX-96 Total RNA Isolation Kit (Ambion, Thermo Fisher Scientific) according to manufacturer guidelines. RNA extractions were performed on a MagMAX Express 96 Magnetic Particle Processor (Applied Biosystems). Purified total RNA samples were verified by NanoDrop spectrophotometer (Thermo Fisher Scientific) and stored at −80°C prior to use. RNA libraries were generated using the TruSeq Stranded Total RNA Library Prep Kit with Ribo-Zero Human/Mouse/Rat High-Throughput kit (Illumina). The completed libraries were screened for quality with the High Sensitivity D1000 Screentape and Reagents on the Tapestation 2200 (Agilent) and the Library Qauntification kit (KAPA Biosystems). RNA-sequencing was performed using a dual-index paired end (2x125 bp) format on Illumina HiSeq 2500 with the Hiseq SBS Kit v4 (250 cycles) and the HiSeq PE Cluster Kit v4 (Illumina). Raw reads were cleaned with Trimmomatic v0.33 (Bolger et al., 2014), aligned to the R. aegyptiacus transcriptome (from NCBI, supplemented with manually annotated type I IFN genes and NKG2 genes) and quantified in TPM using kallisto v0.43.0 as described above (Bray et al., 2016).

Production of recombinant 6xHis-tagged IFNs

Human FreeStyle 293F cells (Thermo Fisher Scientific; see Key Resources Table) were maintained in FreeStyle 293 Expression Medium (Thermo Fisher Scientific) at 37°C at 8% CO2 with continuous shaking at 135 rpm. 293F cells at a density of 1x106/mL were transfected with plasmids encoding pCAGGS/6xHis–IFN-β1, and –IFN-ω4 (Blue Heron Biotech, Bothell, WA) using FreeStyle MAX reagent diluted in OptiPRO SFM (Thermo Fisher Scientific) according to the manufacturer’s protocol (https://www.thermofisher.com/us/en/home/references/protocols/cell-culture/transfection-protocol/freestyle-max-reagent.html#2). 3 days post-transfection, clarified media (centrifuged at 4°C for 1 hr at 4000rpm) was collected and frozen at −20°C till purification. BL21(DE) cells were transformed with pET22b/6xHis-PA-D1 and streaked on LB agar plates containing 50ug/mL ampicillin. Resulting colonies were grown in 4mL of 2xYt media (ampicillin 50ug/mL) at 37°C till a 600nm OD of 0.6 was reached. 3 mL of starter culture was added to 97 mL of 2xYt media (ampicillin 50ug/mL) and grown till a 600nm OD > 1. Cells were stimulated with IPTG (0.5mM) for 3 hr at 37°C, and then spun down. Cell pellets were frozen at −20°C before protein purification. Proteins were purified via a Capturem His-tagged purification maxiprep kit (Clontech, Takara Bio, Mountain View, CA) and dialyzed into sterile 1x PBS with a Vivaspin 2 protein concentration column (MWCO 10kDa; GE Life Sciences). Proteins were quantified by western blot for the 6x-His tag, and by 280nm absorbance measured on a NanoDrop spectrophotometer. Proteins were stored at −20°C; immediately before use, proteins were diluted in PBS and sterile filtered with a 0.2 μm pore filter.

RoNi cell antiviral assay

RoNi cells in 10% FBS media were seeded at 3x104 in 96-well plates, and treated with serial dilutions of recombinant 6xHis-tagged IFN-β1, recombinant 6xHis-tagged IFN-ω4, or an unrelated recombinant 6xHis-tagged protein (rPA-D1) for 4 hr at 37°C. The interferon-containing media was removed, and cells were infected with vesicular stomatitis virus encoding eGFP (VSV-eGFP; kindly provided by John Connor, Boston University School of Medicine; Whitlow et al., 2006, Whitlow et al., 2008) at an MOI of 0.05 in 2% FBS RoNi media, and imaged for GFP expression 1 day post infection (multiple viral replication cycles).

Selection analysis

Among many others, we included genes involved in dendritic cell maturation, induction, and signaling of type I IFNs, and cytokine responses, since dysfunctions of these pathways have all been reported as contributors to the pathogenesis of filoviruses in primates (Connor et al., 2015, Liu et al., 2017). We also included genes in DNA damage response pathways, since adaptations in DNA damage responses during the selection for flight in bats have been hypothesized to influence immune responses in bats (for a full list of genes studied, see Table S5). All genes of interest (immune genes, echolocation genes, flight genes) were downloaded from RefSeq in GenBank format and coding sequences for each gene were extracted using a python script. For each member of an ortholog group, the longest isoform was selected. Amino acid sequences of ortholog groups were aligned with Muscle v3.8.31 (Edgar, 2004). The resulting alignments were then trimmed with trimAl v1.4 (Capella-Gutiérrez et al., 2009) to retain only high-quality aligned regions and alignment columns containing gaps. We used codeml within the PAML v4.9b suite of phylogenetic analysis tools to estimate ω (omega), the nonsynonymous/synonymous nucleotide substitution rate ratio, and κ (kappa), the transition/transversion rate ratio with the F3 × 4 codon frequency matrix (Bielawski and Yang, 2005, Yang, 1998). Seven separate models were used (Figure S2) and maximum likelihood scores estimated for each. The models shown in Figure S2 describe the hierarchy of models tested for each gene included in the analysis. Likelihood ratio tests were performed, progressively allowing more degrees of freedom. Model 0 assumes one rate of evolution in all branches. The fit of three models (1a, 1b, and 1c) are separately compared to Model 0 via a likelihood ratio test, and the best fitting model (highest likelihood of the three) is then compared to the next model in the hierarchy (e.g., Model 1a is compared to 2a, Model 1c is compared to both 2a and 2b), and the best fitting model is then compared to Model 3. If the next model in the hierarchy does not fit the data better than the previous model, the previous model is taken to be the most likely. Two-ratio unconstrained models were first tested against the one-ratio null model to test for differential selective pressure. Any gene for which the null model was rejected proceeded to testing of the most likely two-ratio model against all three-ratio models in which it was nested using the same procedure, and, finally, any gene for which the two-ratio model was rejected against any three-ratio model proceeded to testing the most likely three-ratio model against the four-ratio model. FDR was controlled at each level using the Benjamini-Hochberg procedure (Benjamini and Hochberg, 1995). Models showing ω1 > 1 along any branch were then compared against two-ratio constrained models with ω1 fixed at 1 to test for neutral evolution. Branches showing significantly higher than average, but not greater than 1, ω1 may have positively selected sites. However, relaxation of purifying selection along these branches cannot be ruled out using these models. Analyses were run with multiple starting values of both κ (2,2.5,4) and ω (0.1,1,2) to provide a check against local optimas. These analyses were automated using LMAP v1.0.0 (Maldonado et al., 2016).

Quantification and Statistical Analysis

Significance of expanded and contracted gene families was determined as follows: CAFE v3.1 provides a family-wide p value for gene families that evolved differently than expected, and additionally provides a Viterbi p value that assigns a p value to the contribution of each species to the family-wide evolution across the tree (De Bie et al., 2006). The Benjamini-Hochberg procedure was applied to family-wide p values for the gene families that evolved differently than expected. The p values were ranked from lowest to highest with identical p values assigned the same rank, and the rank for the next non-identical p value incremented by the total size of the identical group. The rank of each p value was divided by the total number of hypotheses (7,698, for each gene family) and multiplied by 0.05 to obtain the adjusted p value with a false-discovery rate of 0.05. For families with adjusted p values < 0.05, the Viterbi p values were used to determine whether the expansion or contraction in a specific lineage was significant. Families with expansions and contractions in Figure 1 have family-wide adjusted p values less than 0.05 and Viterbi p values less than 0.05. Significance of differential expression between Sendai virus-infected samples and mock-infected samples in Figure S6C was determined via an unpaired two tailed t test with TPM values from three biological replicates, with adjusted p values considered significant if less than 0.05. Meta-analysis of viremia and oral shedding of MARV-infected R. aegyptiacus bats from two previous studies (Amman et al., 2015, Schuh et al., 2017) was conducted by averaging raw TCID50 equivalent units per mL values across both experiments, with values matched to respective time point post-infection up to 28 days. The standard deviation of the averages was taken for each time point against the number of MARV-positive bat samples, except at time points with fewer than two MARV-positive bats. Linear values were then converted into log form, and trendlines of the average log values from each dataset across time were included using the moving average option with a periodicity of 2.

Data and Software Availability

The accession number for the Rousettus aegyptiacus genome Raegyp2.0 reported in this paper is GenBank: GCA_001466805.2 and RefSeq: GCF_001466805.2. The accession number for the raw and analyzed SeV-infected RoNi cell transcriptome data reported in this paper is GEO: GSE108941. The accession number for the whole genome shotgun sequencing project reported in this paper is GenBank: LOCP00000000.2. The images of bats, cow, mouse, horse, monkey, and dog are used under a creative commons license (CC BY 4.0, Clipart by AnimalsClipart.com). The image of a pig is used under a creative commons license (CC BY 3.0, Anysnapshot.com). The images of hamster and guinea pig are used under a creative commons license (CC BY 4.0, Author: Bob Comix). The image of a human is in the public domain (CC0 1.0). These images can be found at the following websites: http://anysnapshot.com/pig-graphics/, http://animalsclipart.com/bats-flying-silhouette/, http://animalsclipart.com/gallery/mouses/page/2/, http://animalsclipart.com/gallery/cows/page/2/, http://animalsclipart.com/gallery/dogs/page/3/, https://commons.wikimedia.org/wiki/File:Typical_ASL_Signing_Space_representation.svg, http://animalsclipart.com/gallery/horses/page/2/, http://www.supercoloring.com/silhouettes/hamster, http://www.supercoloring.com/silhouettes/guinea-pig, http://animalsclipart.com/gallery/monkeys/page/2/

REAGENT or RESOURCE	SOURCE	IDENTIFIER
Bacterial and Virus Strains

Sendai virus, Cantell strain	Charles River Laboratories	Cat# 10100774
Vesicular stomatitis virus-eGFP	Whitlow et al., 2006, Whitlow et al., 2008; (Laboratory of John Connor, Boston University School of Medicine)	N/A

Biological Samples

Healthy, wild-caught, older juvenile male Egyptian Rousette bat (R. aegyptiacus) – liver and spleen tissue	This paper	https://www.ncbi.nlm.nih.gov/biosample/SAMN04287759

Chemicals, Peptides, and Recombinant Proteins

Recombinant 6xHis-IFN-β1 (R. aegyptiacus protein)	This paper (Blue Heron Biotech)	N/A
Recombinant 6xHis-IFN-ω4 (R. aegyptiacus protein)	This paper (Blue Heron Biotech)	N/A
Recombinant 6xHis-PA-D1 (Domain 1 of B. anthracis PA protein)	This paper (Blue Heron Biotech)	N/A
Human: FreeStyle 293-F Cells	Thermo Fisher Scientific	Cat# R79007
FreeStyle MAX Reagent	Thermo Fisher Scientific	Cat# 16447500
OptiPRO SFM (1X)	Thermo Fisher Scientific	Cat# 12309-050
Lysis/Binding Solution Concentrate	Thermo Fisher Scientific	Cat# 4462362

Critical Commercial Assays

Capturem His-tagged purification maxiprep kit	Clontech, Takara Bio	Cat# 635713
Vivaspin 2 Protein Concentrators, MWCO 10000	GE Life Sciences	Cat# 28932247
UltraClean Blood DNA Isolation Kit (customized protocol)	MO BIO Laboratories	Cat# 12000-100
Prep X Complete ILMN DNA Library Kit	WaferGen Biosystems	Cat# 640101
KAPA Library Amplification kit	KAPA Biosystems	Cat# KK2621
KAPA Library Quantification kit	KAPA Biosystems	Cat# KK4824
TruSeq PE Cluster Kit v3-cBot-HS	Illumina	Cat# PE-401-3001
TruSeq SBS Kit – HS (200 cycle)	Illumina	Cat# FC-401-3001
MagMAX-96 Total RNA Isolation Kit	Ambion	Cat# AM1830
SMRTbell Template Prep Kit 1.0	Pacific Biosciences	Cat# 100-259-100
DNA/Polymerase Binding Kit P4	Pacific Biosciences	Cat# 100-236-500
DNA/Polymerase Binding Kit P5	Pacific Biosciences	Cat# 100-256-000
DNA Sequencing Reagent 2.0	Pacific Biosciences	Cat# 100-216-400
DNA Sequencing Reagent 3.0	Pacific Biosciences	Cat# 100-254-800
SMRT Cell 8Pack V3	Pacific Biosciences	Cat# 100-171-800
Bioanalyzer DNA 12000 Kit	Agilent	Cat# 5067-1508
TruSeq Stranded Total RNA Library Prep Kit with Ribo-Zero Human/Mouse/Rat High Throughput (96 samples, 96 indexes)	Illumina	Cat# 20020597
HiSeq PE Cluster Kit V4 - cBot	Illumina	Cat# PE-401-4001
HiSeq SBS Kit V4 250 cycle kit	Illumina	Cat# FC-401-4003
High Sensitivity D1000 ScreenTape	Agilent	Cat# 5067-5582
High Sensitivity D1000 Reagents	Agilent	Cat# 5067-5583

Deposited Data

Raegyp2.0 genome sequence	This paper	GenBank: GCA_001466805.2; RefSeq: GCF_001466805.2; GenBank: LOCP00000000.2
R. aegyptiacus transcriptome raw and analyzed data	Lee et al., 2015	GenBank: GECF00000000.1; SRA Project: SRP066106
Sendai virus-infected RoNi cell transcriptome data	This paper	NCBI GEO: GSE108941
Ensembl database release 90	Yates et al., 2016	http://aug2017.archive.ensembl.org/index.html
RefSeq proteins – Pteropus vampyrus	Baylor College of Medicine; NCBI RefSeq	RefSeq: GCF_000151845.1
RefSeq proteins – Pteropus alecto	Zhang et al., 2013; NCBI RefSeq	RefSeq: GCF_000325575.1
RefSeq proteins – Myotis davidii	Zhang et al., 2013; NCBI RefSeq	RefSeq: GCF_000327345.1
RefSeq proteins – Myotis lucifugus	Broad Institute; NCBI RefSeq	RefSeq: GCF_000147115.1
RefSeq proteins – Macaca fascicularis	Washington University; NCBI RefSeq	RefSeq: GCF_000364345.1
RefSeq proteins – Macaca mulatta	Baylor College of Medicine Genome Sequencing Center; NCBI RefSeq	RefSeq: GCF_000772875.2
RefSeq proteins – Cavia porcellus	The Genome Sequencing Platform, The Genome Assembly Team; NCBI RefSeq	RefSeq: GCF_000151735.1
RefSeq proteins – Cricetulus griseus	Beijing Genomics Institute; NCBI RefSeq	RefSeq: GCF_000223135.1
RefSeq proteins – Equus caballus	The Genome Assembly Team; NCBI RefSeq	RefSeq: GCF_000002305.2
RefSeq proteins – Mus musculus	Genome Reference Consortium; NCBI RefSeq	RefSeq: GCF_000001635.24
RefSeq proteins – Sus scrofa	The Swine Genome Sequencing Consortium (SGSC); NCBI RefSeq	RefSeq: GCF_000003025.5
RefSeq proteins – Bos taurus	Cattle Genome Sequencing International Consortium; NCBI RefSeq	RefSeq: GCF_000003205.7
RefSeq proteins – Canis familiaris	Dog Genome Sequencing Consortium; NCBI RefSeq	RefSeq: GCF_000002285.3
RefSeq proteins – Homo sapiens	Genome Reference Consortium; NCBI RefSeq	RefSeq: GCF_000001405.33
RefSeq proteins – Rousettus aegyptiacus	This paper	RefSeq: GCF_001466805.2

Experimental Models: Cell Lines

R. aegyptiacus: kidney fibroblast cells (RoNi)	Biesold et al., 2011	N/A

Recombinant DNA

pCAGGS/6xHis–IFN-ω4	This paper	N/A
pCAGGS/6xHis–IFN-β1	This paper	N/A
pET22b/6xHis-PA-D1	This paper	N/A

Software and Algorithms

Jellyfish	Marçais and Kingsford, 2011	https://github.com/gmarcais/Jellyfish/releases
SparseAssembler	Ye et al., 2012	https://sourceforge.net/projects/sparseassembler/
LoRDEC v0.5	Salmela and Rivals, 2014	https://gite.lirmm.fr/lordec/lordec-releases/wikis/home
DBG2OLC	Ye et al., 2016	https://github.com/yechengxi/DBG2OLC
LINKS v1.5.1	Warren et al., 2015	https://github.com/warrenlr/LINKS
L_RNA_Scaffolder (downloaded 9/28/2015)	Xue et al., 2013	http://www.fishbrowser.org/software/L_RNA_scaffolder/index.php/Home/Index/downloads.html
SSPACE v3.0	Boetzer et al., 2011	https://github.com/nsoranzo/sspace_basic
Bowtie2	Langmead and Salzberg, 2012	http://bowtie-bio.sourceforge.net/bowtie2/index.shtml
Pilon	Broad Institute,, Walker et al., 2014	https://github.com/broadinstitute/pilon/releases/
Quiver	Pacific Biosciences	https://github.com/PacificBiosciences/GenomicConsensus
QUAST v3.0	Gurevich et al., 2013	http://bioinf.spbau.ru/quast
CEGMA v2.5	Parra et al., 2007	http://korflab.ucdavis.edu/datasets/cegma/
bwa-mem v0.7.10	Li, 2013	https://sourceforge.net/projects/bio-bwa/files/
SAMtools v0.1.18; v1.3	Li et al., 2009	http://samtools.sourceforge.net/; http://www.htslib.org/
NCBI Eukaryotic Genome Annotation Pipeline; Gnomon tRNAscan-SE	Thibaud-Nissen et al., 2013, National Center for Biotechnology Information, n.d.; Lowe and Eddy, 1997	https://www.ncbi.nlm.nih.gov/genome/annotation_euk/;
		https://www.ncbi.nlm.nih.gov/genome/annotation_euk/gnomon/
WindowMasker	Morgulis et al., 2006	ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools++/CURRENT/
Splign; ProSplign	Kapustin et al., 2008	https://www.ncbi.nlm.nih.gov/sutils/splign/splign.cgi
OrthoMCL v2.0.9	Fischer et al., 2011	http://orthomcl.org/common/downloads/software/v2.0/
bioDB	Mudunuri et al., 2009	https://biodbnet-abcc.ncifcrf.gov/db/db2db.php
RepeatMasker	Smit et al. n.d.	http://www.repeatmasker.org/RMDownload.html
“One code to find them all” script	Bailly-Bechet et al., 2014	http://doua.prabi.fr/software/one-code-to-find-them-all
PicardTools v1.131	Broad Institute	http://broadinstitute.github.io/picard/
BCFTools v1.3.1	Li et al., 2009	http://www.htslib.org/
Mafft v7.305b	Katoh and Standley, 2013	https://mafft.cbrc.jp/alignment/software/
trimAL v1.3; v1.4	Capella-Gutiérrez et al., 2009	http://trimal.cgenomics.org/downloads
RAxML v8.2.9	Stamatakis, 2014	https://github.com/stamatak/standard-RAxML
CAFE v3.1	De Bie et al., 2006	https://github.com/hahnlab/CAFE
Mega 6.0	Tamura et al., 2013	https://www.megasoftware.net/
MUSCLE v3.8.31	Edgar, 2004	http://www.drive5.com/muscle/
kallisto v.0.43.0	Bray et al., 2016	https://pachterlab.github.io/kallisto/download
pheatmap	Kolde 2012	https://cran.r-project.org/web/packages/pheatmap/
PAML v4.9b	Bielawski and Yang, 2005, Yang, 1998	http://abacus.gene.ucl.ac.uk/software.html
LMAP v1.0.0	Maldonado et al., 2016	http://lmapaml.sourceforge.net/
BioEdit v7.0.0	Hall, 1999	http://www.mbio.ncsu.edu/BioEdit/bioedit.html
Trimmomatic-0.33	Bolger et al., 2014	http://www.usadellab.org/cms/?page=trimmomatic
DIGS for EVE, EVE library v0.1	Gifford n.d.	https://github.com/giffordlabcvr/DIGS-for-EVEs

Other

PANTHER database	Mi et al., 2016	http://www.pantherdb.org/
Ensembl release 90	Yates et al., 2016	http://www.ensembl.org/index.html?redirect=no

74 in total

Review 1. Immune functions encoded by the natural killer gene complex.

Authors: Wayne M Yokoyama; Beatrice F M Plougastel
Journal: Nat Rev Immunol Date: 2003-04 Impact factor: 53.106

2. CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes.

Authors: Genis Parra; Keith Bradnam; Ian Korf
Journal: Bioinformatics Date: 2007-03-01 Impact factor: 6.937

3. QUAST: quality assessment tool for genome assemblies.

Authors: Alexey Gurevich; Vladislav Saveliev; Nikolay Vyahhi; Glenn Tesler
Journal: Bioinformatics Date: 2013-02-19 Impact factor: 6.937

4. Cutting edge: KIR2DL4 transduces signals into human NK cells through association with the Fc receptor gamma protein.

Authors: Akiko Kikuchi-Maki; Tracey L Catina; Kerry S Campbell
Journal: J Immunol Date: 2005-04-01 Impact factor: 5.422

Review 5. Activating natural cytotoxicity receptors of natural killer cells in cancer and infection.

Authors: Joachim Koch; Alexander Steinle; Carsten Watzl; Ofer Mandelboim
Journal: Trends Immunol Date: 2013-02-13 Impact factor: 16.687

6. The Sequence Alignment/Map format and SAMtools.

Authors: Heng Li; Bob Handsaker; Alec Wysoker; Tim Fennell; Jue Ruan; Nils Homer; Gabor Marth; Goncalo Abecasis; Richard Durbin
Journal: Bioinformatics Date: 2009-06-08 Impact factor: 6.937

7. Experimental Inoculation of Egyptian Rousette Bats (Rousettus aegyptiacus) with Viruses of the Ebolavirus and Marburgvirus Genera.

Authors: Megan E B Jones; Amy J Schuh; Brian R Amman; Tara K Sealy; Sherif R Zaki; Stuart T Nichol; Jonathan S Towner
Journal: Viruses Date: 2015-06-25 Impact factor: 5.048

8. Differential transcriptional responses to Ebola and Marburg virus infection in bat and human cells.

Authors: Martin Hölzer; Verena Krähling; Fabian Amman; Emanuel Barth; Stephan H Bernhart; Victor A O Carmelo; Maximilian Collatz; Gero Doose; Florian Eggenhofer; Jan Ewald; Jörg Fallmann; Lasse M Feldhahn; Markus Fricke; Juliane Gebauer; Andreas J Gruber; Franziska Hufsky; Henrike Indrischek; Sabina Kanton; Jörg Linde; Nelly Mostajo; Roman Ochsenreiter; Konstantin Riege; Lorena Rivarola-Duarte; Abdullah H Sahyoun; Sita J Saunders; Stefan E Seemann; Andrea Tanzer; Bertram Vogel; Stefanie Wehner; Michael T Wolfinger; Rolf Backofen; Jan Gorodkin; Ivo Grosse; Ivo Hofacker; Steve Hoffmann; Christoph Kaleta; Peter F Stadler; Stephan Becker; Manja Marz
Journal: Sci Rep Date: 2016-10-07 Impact factor: 4.379

9. Ensembl 2016.

Authors: Andrew Yates; Wasiu Akanni; M Ridwan Amode; Daniel Barrell; Konstantinos Billis; Denise Carvalho-Silva; Carla Cummins; Peter Clapham; Stephen Fitzgerald; Laurent Gil; Carlos García Girón; Leo Gordon; Thibaut Hourlier; Sarah E Hunt; Sophie H Janacek; Nathan Johnson; Thomas Juettemann; Stephen Keenan; Ilias Lavidas; Fergal J Martin; Thomas Maurel; William McLaren; Daniel N Murphy; Rishi Nag; Michael Nuhn; Anne Parker; Mateus Patricio; Miguel Pignatelli; Matthew Rahtz; Harpreet Singh Riat; Daniel Sheppard; Kieron Taylor; Anja Thormann; Alessandro Vullo; Steven P Wilder; Amonida Zadissa; Ewan Birney; Jennifer Harrow; Matthieu Muffato; Emily Perry; Magali Ruffier; Giulietta Spudich; Stephen J Trevanion; Fiona Cunningham; Bronwen L Aken; Daniel R Zerbino; Paul Flicek
Journal: Nucleic Acids Res Date: 2015-12-19 Impact factor: 16.971

Review 10. Evolution of Interferons and Interferon Receptors.

Authors: Chris J Secombes; Jun Zou
Journal: Front Immunol Date: 2017-03-02 Impact factor: 7.561

81 in total

1. A comprehensive annotation and differential expression analysis of short and long non-coding RNAs in 16 bat genomes.

Authors: Nelly F Mostajo; Marie Lataretu; Sebastian Krautwurst; Florian Mock; Daniel Desirò; Kevin Lamkiewicz; Maximilian Collatz; Andreas Schoen; Friedemann Weber; Manja Marz; Martin Hölzer
Journal: NAR Genom Bioinform Date: 2019-09-30

2. Structure and Peptidome of the Bat MHC Class I Molecule Reveal a Novel Mechanism Leading to High-Affinity Peptide Binding.

Authors: Zehui Qu; Zibin Li; Lizhen Ma; Xiaohui Wei; Lijie Zhang; Ruiying Liang; Geng Meng; Nianzhi Zhang; Chun Xia
Journal: J Immunol Date: 2019-05-10 Impact factor: 5.422

3. Evolution of Hepatitis B Virus Receptor NTCP Reveals Differential Pathogenicities and Species Specificities of Hepadnaviruses in Primates, Rodents, and Bats.

Authors: Lucie Etienne; Dominique Pontier; Stéphanie Jacquet; Jean-Baptiste Pons; Ariel De Bernardo; Barthélémy Ngoubangoye; François-Loic Cosset; Corinne Régis
Journal: J Virol Date: 2019-02-19 Impact factor: 5.103

4. Orthogonal genome-wide screens of bat cells identify MTHFD1 as a target of broad antiviral therapy.

Authors: Danielle E Anderson; Jin Cui; Qian Ye; Baoying Huang; Ya Tan; Chao Jiang; Wenhong Zu; Jing Gong; Weiqiang Liu; So Young Kim; Biao Guo Yan; Kristmundur Sigmundsson; Xiao Fang Lim; Fei Ye; Peihua Niu; Aaron T Irving; Haoyu Zhang; Yefeng Tang; Xuming Zhou; Yu Wang; Wenjie Tan; Lin-Fa Wang; Xu Tan
Journal: Proc Natl Acad Sci U S A Date: 2021-09-28 Impact factor: 11.205

5. RTP4 Is a Potent IFN-Inducible Anti-flavivirus Effector Engaged in a Host-Virus Arms Race in Bats and Other Mammals.

Authors: Ian N Boys; Elaine Xu; Katrina B Mar; Pamela C De La Cruz-Rivera; Jennifer L Eitson; Benjamin Moon; John W Schoggins
Journal: Cell Host Microbe Date: 2020-10-27 Impact factor: 21.023

6. Antibody-Mediated Virus Neutralization Is Not a Universal Mechanism of Marburg, Ebola, or Sosuga Virus Clearance in Egyptian Rousette Bats.

Authors: Amy J Schuh; Brian R Amman; Tara K Sealy; Markus H Kainulainen; Ayan K Chakrabarti; Lisa W Guerrero; Stuart T Nichol; Cesar G Albarino; Jonathan S Towner
Journal: J Infect Dis Date: 2019-05-05 Impact factor: 5.226