
Micropathogen community identification in ticks (Acari: Ixodidae) using third-generation sequencing.

Jin Luo¹,², Qiaoyun Ren¹, Wenge Liu¹, Xiangrui Li², Mingxin Song³, Guiquan Guan¹, Jianxun Luo¹, Guangyuan Liu¹.

Abstract

Ticks are important vectors that facilitate the transmission of a broad range of micropathogens to vertebrates, including humans. Because of their role in disease transmission, it has become increasingly important to identify and characterize the micropathogen profiles of tick populations. The objective of the present study was to survey the micropathogens of ticks by third-generation metagenomic sequencing using the PacBio Sequel platform. Approximately 46.481 Gbp of raw micropathogen sequence data were obtained from samples from four different regions of Heilongjiang Province, China. The clean consensus sequences were compared with host sequences and filtered at 90% similarity. Most of the identified genomes represent previously unsequenced strains. The draft genomes contain an average of 397,746 proteins predicted to be associated with micropathogens, over 30% of which do not have an adequate match in public databases. In these data, Anaplasma phagocytophilum and Coxiella burnetii were detected in all samples, while Borrelia burgdorferi was detected only in Ixodes persulcatus ticks from G1 samples. Viruses are a key component of micropathogen populations. In the present study, Simian foamy virus, Pustyn virus and Crimean-Congo haemorrhagic fever orthonairovirus were detected in different samples, and 10-30% of the viral community in all samples comprised unknown viruses. Deep metagenomic shotgun sequencing has emerged as a powerful tool to investigate the composition and function of complex microbial communities. Thus, our dataset substantially improves the coverage of tick micropathogen genomes in public databases and represents a valuable resource for micropathogen discovery and for studies of tick-borne diseases.


Keywords: Metagenomic; Microbial communities; Micropathogens; Third-generation sequencing; Ticks

Year: 2021 PMID: 34258218 PMCID: PMC8253887 DOI: 10.1016/j.ijppaw.2021.06.003

Source DB: PubMed Journal: Int J Parasitol Parasites Wildl ISSN: 2213-2244 Impact factor: 2.674

Introduction

Micropathogens pose serious threats to the health of livestock, wildlife and humans. In many cases, disease is spread by micropathogens residing within arthropod vectors (Paula et al., 2017). Ticks are important vectors that facilitate the transmission of a broad range of micropathogens to vertebrates, including humans (Eisen et al., 2017). These small arachnids are capable of transmitting viruses, bacteria, protozoa and fungi (Jahfari et al., 2017; Krzysztof et al., 2015). Ticks and the micropathogens that they transmit cause direct damage to animals by reducing animal weight, milk production and leather quality (Luo et al., 2019). However, a wide variety of micropathogens are present in ticks, and the confirmed micropathogens represent only a small percentage of the total. Because of their role in disease transmission, it has become increasingly important to identify and characterize the micropathogen profiles of tick populations. In addition, the African swine fever incident that was spread by soft ticks in 2018 in China caused substantial losses to the pig industry, attracting the attention of related fields. Some undetected micropathogens are also likely to cause severe disease in animals or humans. For example, in 2010, a previously undetected bunyavirus severely impacted many people in Henan Province (Xu et al., 2011). Therefore, it is particularly important to predict the micropathogens that may be carried by vectors.

Currently, PCR amplicon sequencing is limited to a few predefined targets (Adrian et al., 2020; Latrofa et al., 2020), so deep sequencing is used for micropathogen detection. In 2017, micropathogens were detected in ticks by small RNA sequencing. Micropathogens detected in that study, such as viruses and bacteria, had not previously been detected in ticks, and the results provided a good basis for research on important tick-borne pathogens (Luo et al., 2017). However, because the sequences were short, the assembled data were incomplete. As a result, research on the identification of pathogens is largely lacking. In recent years, unbiased next-generation metagenomic sequencing (NGMS) has been used to identify novel and emerging human pathogens circulating in tick vectors. For example, Heartland virus, discovered in Missouri in 2012 via NGMS, is transmitted by the lone star tick (Amblyomma americanum) and is a potential cause of febrile illness and death in humans (Laura et al., 2012). Heartland virus has since been detected in mammalian hosts in 13 U.S. states (Riemersma et al., 2015). However, a shortcoming of NGMS is its short read length, which makes it difficult to distinguish similar sequences accurately. In 2015, PacBio launched a new and upgraded third-generation sequencing instrument, the PacBio Sequel sequencing system (Lavezzo et al., 2016; Rhoads et al., 2015; Wagner et al., 2016; Shingo et al., 2018). The long read length, high throughput, high accuracy and other features of this instrument have made third-generation metagenomic sequencing (TGMS) available to the research field. In the present study, micropathogens were detected using TGMS to assess population characteristics and the diversity of micropathogens in ticks.

Materials and methods

Tick collection and DNA extraction

Unengorged ticks [Haemaphysalis japonica (H. japonica) (n = 102), Ixodes persulcatus (I. persulcatus) (n = 97), Dermacentor silvarum (D. silvarum) (n = 150) and Haemaphysalis concinna (H. concinna) (n = 112)] were collected throughout Jiameng forest farm and Sanguliu Yichun city (longitude 128°48′-129°08′ east, latitude 47°41′-48°04′ north), Luobei County, Hegang city (longitude 130°01′-131°34′ east, latitude 47°12′-48°21′ north) and Jiejinkou Tongjiang city (longitude 132°50′25.61″ east, latitude 47°56′3.60″ north), Heilongjiang Province, China, from 15 May to 23 July in 2018. All ticks were collected by the flag method, which is suitable for less populated grasslands and involves dragging a white cloth of approximately 1 square metre on grass to collect unfed ticks. The collected ticks were identified by morphology in the Department of Veterinary Parasitology, Lanzhou Veterinary Research Institute, Chinese Academy of Agricultural Sciences. Based on the epidemiological characteristics of local dominant ticks, these samples were separated into four different groups. All H. japonica, I. persulcatus, D. silvarum and H. concinna ticks were collected from different regions and mixed as groups 1 (G1), 2 (G2), 3 (G3) and 4 (G4). To sterilize the external surface of the ticks to ensure that the micropathogen sequences were internal to the ticks, the collected ticks were immediately placed in phosphate-buffered saline (PBS) and washed twice in a solution containing 0.133 M NaCl, 1.11% sodium dodecyl sulphate (SDS) and 0.0088 M ethylenediaminetetraacetic acid (EDTA) (Luo et al., 2017). These clean ticks were mixed and stored in liquid nitrogen until ground using a mortar and pestle with liquid nitrogen, and genomic DNA (gDNA) was extracted with a QIAamp DNA Mini Kit (QIAGEN, China) following the manufacturer's instructions.

Library construction and sequencing

Total DNA quality was analysed on a PacBio RS II sequencing platform analyser system and by resolution on a denaturing polyacrylamide gel electrophoresis system. A DNA library was generated according to the DNA sample preparation instructions. Subsequently, DNA was amplified with Pfx DNA polymerase (Invitrogen, China) using 20 PCR cycles and a PacBio DNA primer set. PCR products were purified, and the recovered DNA was precipitated and quantified with both a Nanodrop Spectrophotometer (Thermo Scientific) and a TBS-380 mini fluorometer (Turner Biosystems) using PicoGreen dsDNA quantitation reagent (Invitrogen). The sample concentration was adjusted to 10 nM, and a final volume of 10 mL was used for the sequencing reaction. The purified DNA library was used for cluster generation (on the PacBio Cluster Station). Subsequently, DNA was sequenced on a PacBio Sequel machine following the manufacturer's instructions (Nextomics). The library construction process is shown in Fig. 1. The gDNA concentration was normalized by dilution from a high to a low concentration before sequencing on the PacBio platform.

Fig. 1

The genomic DNA library was generated according to the PacBio Sequel sample preparation instructions.

Transcriptome sequence analysis

The raw sequencing reads contained low-quality sequences (Table 1). To generate reliable data for analysis, we processed the raw reads as follows.

1) Quality control of the sequencing data: The circular consensus sequencing (CCS) workflow of SMRT Link software (Yan et al., 2020) and the Arrow algorithm (Sontag et al., 2009) were used to obtain high-quality raw CCS reads with the primary parameters --minPasses 1 --polish --minPredictedAccuracy 0.8. To obtain clean CCS reads, accuracy and length filtering were performed with the parameters accuracy ≥ 99% and length ≥ 500 bp. Host sequence filtering was then performed using BLAST (https://blast.ncbi.nlm.nih.gov/Blast.cgi) to compare the clean CCS reads with the host sequences, and reads were filtered at a 90% similarity level. In this process, Rhipicephalus microplus (https://www.ncbi.nlm.nih.gov/assembly/GCA_002176555.1/) and Ixodes scapularis (https://www.ncbi.nlm.nih.gov/assembly/GCF_000208615.1/) were used as references.

2) Gene prediction: Genes were predicted from the metagenomic sequences of each sample using MetageneMark (Wazim et al., 2014) with the primary parameter -p meta. Redundant sequences were removed from the predicted genes at a 95% similarity level and 90% coverage using CD-HIT.

3) Species annotation and taxonomic analysis of pathogens: The nonredundant gene sets were aligned against the entire nonredundant (NR) database and the CAZy database with DIAMOND, and all gene annotation results were obtained. The primary parameters were --evalue 0.00001 and --sensitive. For species annotation and statistical analysis of the gene annotation results, the lowest common ancestor (LCA) algorithm was used to annotate all CCS reads, and species abundance information was then calculated based on the CCS annotation results.

4) Functional annotation: The evolutionary genealogy of genes: non-supervised orthologous groups (eggNOG) database provides functional annotation of constructed orthologous groups using the Smith-Waterman matching algorithm. The Kyoto Encyclopedia of Genes and Genomes (KEGG) database is a comprehensive database, at the core of which are the KEGG Pathway and KEGG Orthology (KO) databases.
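The read-level filtering in step 1 can be illustrated with a minimal sketch. It assumes that each CCS read carries a predicted accuracy and that BLAST results against the host references have already been reduced to (read ID, percent identity) pairs. The `CcsRead` class, the function names and the interpretation that reads with a host hit at or above 90% identity are discarded are illustrative assumptions, not the authors' actual scripts.

```python
from dataclasses import dataclass

# Thresholds taken from the text; everything else is a hypothetical illustration.
MIN_ACCURACY = 0.99          # accuracy >= 99%
MIN_LENGTH = 500             # length >= 500 bp
HOST_IDENTITY_CUTOFF = 90.0  # assumed: host hits at >= 90% identity are removed


@dataclass
class CcsRead:
    read_id: str
    sequence: str
    accuracy: float  # predicted accuracy in the range 0..1


def is_clean(read):
    """Return True if the read passes the accuracy and length filters."""
    return read.accuracy >= MIN_ACCURACY and len(read.sequence) >= MIN_LENGTH


def host_read_ids(blast_hits):
    """Collect IDs of reads with at least one host hit above the cutoff.

    blast_hits is an iterable of (read_id, percent_identity) pairs.
    """
    return {rid for rid, identity in blast_hits if identity >= HOST_IDENTITY_CUTOFF}


def filter_reads(reads, blast_hits):
    """Apply quality filtering first, then remove host-derived reads."""
    host = host_read_ids(blast_hits)
    return [r for r in reads if is_clean(r) and r.read_id not in host]
```

Keeping the quality filter and the host filter as separate functions mirrors the "Clean CCS" and "Host Removal" columns of Table 1, which report the read counts after each stage.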

Table 1

Sequencing data statistics for each sample.

Samples	Raw Bases (Gbp)	Raw CCS	Low quality reads	Clean CCS	Host Removal	Bases (Mbp)	N50 Length (bp)
G1	14.876	525,604	142,425	383,179	348,745	494.500	1417
G2	9.844	380,783	185,783	195,000	153,240	368.792	2406
G3	10.798	457,749	246,077	211,672	145,837	359.253	2463
G4	10.963	442,502	225,154	217,348	162,561	405.766	2496
Average	11.620	451,659	199,860	251,799	202,596	407.078	2196

Note: Sample: Sample name. Raw Bases: Raw bases of subreads (Gbp). Raw CCS: Sequence consistency analysis and preliminary quality control of subreads were performed to obtain the number of original CCS data. Clean CCS: The number of CCS results obtained by further performing a series of data quality controls on the original CCS data. Host Removal: The host filtering sequence for Clean CCS is the final sequence set to enter subsequent analyses.
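The N50 length reported in Table 1 is a standard summary statistic: the length L such that reads of length L or longer contain at least half of all bases. A minimal sketch of the calculation follows; the function name is hypothetical and the code is not taken from the authors' pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Lengths are sorted from longest to shortest and accumulated until
    the running total reaches half of the summed length.
    """
    if not lengths:
        raise ValueError("n50 requires at least one length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
```

Because N50 is weighted by bases rather than by read count, a sample with many short reads, such as G1, can have a much lower N50 than the other samples even though it contains more reads.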


Species abundance and community composition

From the phylum to species level, the top 50 species were selected by the maximum ranking method, and a community composition heat map was drawn based on species abundance. The heat map of community composition allowed the proportions of different species in different samples to be easily identified, which is convenient for the discovery of dominant species. The redundant genes were analysed in the NR database using DIAMOND, and the corresponding gene copies were then combined to calculate the species abundance in each sample. For all samples, from the kingdom to species level (kingdom, phylum, class, order, family, genus and species), the maximum ranking method was used to select the species with an abundance greater than 1% among the top 100 species. Then, a histogram of the relative abundances of species was drawn, which is convenient for viewing the dominant species in each sample.
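The selection of dominant taxa described above can be sketched as follows, assuming that each annotated gene copy has already been assigned a taxon label at the rank of interest. The function names are hypothetical, and treating the "maximum ranking method" as a simple sort by descending relative abundance is an assumption.

```python
from collections import Counter


def relative_abundance(taxon_per_gene):
    """Convert a list of per-gene taxon labels into relative abundances."""
    counts = Counter(taxon_per_gene)
    total = sum(counts.values())
    return {taxon: count / total for taxon, count in counts.items()}


def dominant_taxa(abundance, min_fraction=0.01, top_n=100):
    """Rank taxa by abundance, keep the top_n, then keep those above min_fraction.

    Ties are broken alphabetically so that the ranking is deterministic.
    """
    ranked = sorted(abundance.items(), key=lambda item: (-item[1], item[0]))
    return [(taxon, frac) for taxon, frac in ranked[:top_n] if frac > min_fraction]
```

With `top_n=50` and `min_fraction=0` the same helper would give the input for the heat map, while the defaults correspond to the histogram of taxa above 1% among the top 100.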

Microbial sequence detection using BLAST and PCR confirmation

BLAST searches were conducted with NCBI BLAST 2.2.26 to identify micropathogen sequences in the clean unique reads. The results were then manually analysed to screen for potential viral, bacterial, fungal, and protozoan sequences. To verify the reliability of the data pertaining to microbial community composition in various samples following third-generation sequencing, the presence of previously described micropathogens, including Simian foamy virus, Crimean-Congo haemorrhagic fever orthonairovirus, Coxiella burnetii (C. burnetii), Borrelia burgdorferi (B. burgdorferi) and Anaplasma phagocytophilum (A. phagocytophilum), was assessed by PCR using micropathogen-specific primers (listed in Table 4). PCR amplification was performed in an automatic DNA thermocycler (Bio-Rad, Hercules, CA, USA), and the PCR products were separated by 1.5% agarose gel electrophoresis to assess the presence of specific bands indicative of micropathogens (data not shown). The DNA fragments generated were recovered, ligated into the pGEM-T Easy vector (Invitrogen, Carlsbad, CA, USA) and transformed into competent Escherichia coli DH5α cells (Takara Bio Inc., Dalian, China). At least three positive clones were sequenced per sample by GenScript Corporation (Piscataway, NJ, USA).

Table 4

Oligonucleotides used as primers for PCR analysis of micropathogens.

Micropathogens	Primer names	Primer sequence (5′–3′)	References
Simian foamy virus	SFVpol	5′-CCTGGATGCAGAGTTGGATC-3′	Reid MJC et al. (2017)
Simian foamy virus	SFVpol874	5′-CACGAATTTCCTGTAAAAAGA-3′	Reid MJC et al. (2017)
Crimean-Congo haemorrhagic fever agent	CCHFVF	5′-TGGACACCTTCACAAACTC-3′	Tekin S et al. (2012)
Crimean-Congo haemorrhagic fever agent	CCHFV536R	5′-GACAAATTCCCTGCACCA-3′	Tekin S et al. (2012)
Coxiella burnetii	CbUF	5′-AAGGATCCAATTAACCGTTGTAGTT-3′	Qi Y et al. (2018)
Coxiella burnetii	CbUR1042	5′-CGGAATTCTCACTCTTTCCTATGTT-3′	Qi Y et al. (2018)
Borrelia burgdorferi	BBUF	5′-CACGACTTTCTTCGCCTTAAAGC-3′	Maggi RG et al. (2019)
Borrelia burgdorferi	BBUR	5′-GTTAAGCTCTTATTCGCTGATGGTA-3′	Maggi RG et al. (2019)
Anaplasma phagocytophilum	APHmsp4F	5′-ATGAATTACAGAGAATTGCTTGTAGG-3′	de la Fuente J et al. (2005)
Anaplasma phagocytophilum	APHmsp849R	5′-TTAATTGAAAGCAAATCTTGCTCCTATG-3′	de la Fuente J et al. (2005)
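Basic properties of the primers in Table 4 can be checked with a short script. The sketch below strips the 5′/3′ labels and any spacing, then reports length, GC fraction and a rough melting temperature from the Wallace rule, Tm = 2(A+T) + 4(G+C), a rule of thumb for short oligonucleotides. The helper names are hypothetical, and nothing here reflects how the cited authors designed or validated these primers.

```python
def normalise_primer(text):
    """Keep only the bases A, C, G and T, dropping labels, dashes and spaces."""
    return "".join(ch for ch in text.upper() if ch in "ACGT")


def gc_fraction(primer):
    """Fraction of G and C bases in the primer."""
    seq = normalise_primer(primer)
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(primer):
    """Approximate melting temperature in degrees Celsius (Wallace rule)."""
    seq = normalise_primer(primer)
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    # Two entries copied from Table 4.
    for name, primer in [
        ("CCHFVF", "5′-TGGACACCTTCACAAACTC-3′"),
        ("BBUF", "5′- CACGA CTT TCT TCG CCT TAA AGC-3′"),
    ]:
        seq = normalise_primer(primer)
        print(name, len(seq), round(gc_fraction(primer), 3), wallace_tm(primer))
```

The Wallace estimate becomes unreliable for longer primers, so it is best read as a quick consistency check between the forward and reverse primer of a pair rather than as an annealing temperature.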

Phylogenetic and taxonomic analysis

In the present study, Simian foamy virus, Crimean-Congo haemorrhagic fever orthonairovirus, C. burnetii, B. burgdorferi and A. phagocytophilum were analysed to assess the diversity of the putative pathogens in the ticks. The different pathogen-specific sequences were amplified and sequenced to construct a phylogenetic tree using the neighbour joining method in MEGA 7 (Livak and Schmittgen, 2001).

Results

Sequencing data statistics

In the present study, we performed metagenomic sequencing of four tick samples on the PacBio Sequel platform. The raw data and clean CCS quality statistics of the samples before and after quality control are shown in Table 1. A total of 46.481 Gbp of raw data were obtained, with an average of 451,659 raw CCS reads and 251,799 clean CCS reads per sample (Table 1). The identified genes were compiled into a nonredundant catalogue of 282,746 genes, and the average maximum and N50 lengths were 4290 and 2195 bp, respectively. The length distribution is shown for each sample (Fig. 2).

Fig. 2

Summary of the tag length distribution following the sequencing of genes from the four samples.

Gene prediction and functional annotation

MetageneMark and FragGeneScan were used to directly predict the genes in the CCS reads to avoid introducing assembly errors (Ismail WM et al., 2014). On average, 397,746 genes were predicted per sample, with an average total predicted gene length of 44,897,989 bp and an average maximum gene length of 4290 bp (Table 2).

Table 2

Gene prediction statistics of each sample.

Samples	Total Number	Total Length (bp)	Max Length (bp)	Min Length (bp)
G1	513,897	56,853,886	4371	60
G2	360,245	42,224,994	3726	60
G3	333,896	36,617,219	4212	60
G4	382,946	43,895,857	4851	60
Average	397,746	44,897,989	4290	60

Note: MetageneMark was used to directly predict the genes of the CCS reads to avoid the introduction of error into the assembly.
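The "Average" row of Table 2 can be reproduced directly from the four sample rows. The short check below uses only the values printed in the table; the variable and function names are illustrative.

```python
# (total gene number, total length in bp, maximum length in bp) per sample, from Table 2.
TABLE2 = {
    "G1": (513_897, 56_853_886, 4371),
    "G2": (360_245, 42_224_994, 3726),
    "G3": (333_896, 36_617_219, 4212),
    "G4": (382_946, 43_895_857, 4851),
}


def column_means(rows):
    """Return the arithmetic mean of each column across all samples."""
    columns = zip(*rows.values())
    return tuple(sum(col) / len(rows) for col in columns)


mean_genes, mean_total_length, mean_max_length = column_means(TABLE2)
```

The three means agree with the "Average" row: 397,746 genes, 44,897,989 bp and 4290 bp.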

DIAMOND was used to compare the gene set sequences with the eggNOG database (the e-value threshold was set to 1e-5), and the corresponding functional classifications (categories) and homology groups (non-supervised orthologous groups, NOGs) were obtained. In every sample, more than 9000 genes were identified as being involved in replication, recombination and repair (Table 3). Genes involved in maintaining nuclear structure were almost absent from the four samples, with only one such gene detected in the G2 sample. Of all the genes with known functions, 1376 were common to all samples. There were 503 genes specific to G1, 2226 genes specific to G2, 937 genes specific to G3 and 622 genes specific to G4, while the other genes were shared between two or three samples (Fig. 3).

Table 3

Functional classification of the eggNOG annotation results of the four samples.

Functional Category	Description	G1 gene number	G2 gene number	G3 gene number	G4 gene number
A	RNA processing and modification	24	24	15	21
B	Chromatin structure and dynamics	320	443	238	545
C	Energy production and conversion	964	2047	795	618
D	Cell cycle control, cell division, chromosome partitioning	208	377	175	163
E	Amino acid transport and metabolism	898	3288	930	643
F	Nucleotide transport and metabolism	536	975	343	409
G	Carbohydrate transport and metabolism	609	1815	615	450
H	Coenzyme transport and metabolism	691	1195	310	466
I	Lipid transport and metabolism	590	1251	510	444
J	Translation, ribosomal structure and biogenesis	1408	2082	829	949
K	Transcription	617	2047	701	462
L	Replication, recombination and repair	18547	14150	9432	16255
M	Cell wall/membrane/envelope biogenesis	751	2047	642	435
N	Cell motility	8	324	100	19
O	Posttranslational modification, protein turnover, chaperones	1158	1484	793	832
P	Inorganic ion transport and metabolism	336	1961	670	337
Q	Secondary metabolites biosynthesis, transport and catabolism	208	791	245	211
R	General function prediction only	0	0	0	0
S	Function unknown	42726	30115	21232	26982
T	Signal transduction mechanisms	407	1811	656	385
U	Intracellular trafficking, secretion, and vesicular transport	654	1051	501	538
V	Defence mechanisms	215	588	257	170
W	Extracellular structures	0	5	9	0
Y	Nuclear structure	0	1	0	0
Z	Cytoskeleton	115	69	87	88

Fig. 3

Venn diagram based on the eggNOG database. Note: The corresponding functional categories and non-supervised orthologous group (NOG) numbers were obtained from the eggNOG database.
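The counts behind a four-set Venn diagram such as this one can be derived with plain set operations. The sketch below assumes that each sample is represented by the set of NOG identifiers annotated in it and returns, for every combination of samples, the identifiers found in exactly that combination. The function name and the toy identifiers in the usage note are hypothetical.

```python
def venn_partitions(groups):
    """Partition identifiers by the exact set of groups that contain them.

    groups maps a group name to a set of identifiers. The result maps a
    frozenset of group names to the identifiers present in exactly those
    groups and in no others.
    """
    membership = {}
    for name, identifiers in groups.items():
        for identifier in identifiers:
            membership.setdefault(identifier, set()).add(name)

    partitions = {}
    for identifier, names in membership.items():
        partitions.setdefault(frozenset(names), set()).add(identifier)
    return partitions
```

For example, with `groups = {"G1": {"a", "b", "c"}, "G2": {"a", "d"}, "G3": {"a", "b"}, "G4": {"a"}}`, the entry for `frozenset({"G1", "G2", "G3", "G4"})` holds the identifiers common to all samples, and the entry for `frozenset({"G1"})` holds those specific to G1.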

Analysis of the CAZy annotations showed that glycosyl transferases had a high match rate in the four samples, with more than 4000 matched gene copies (data not shown). The KEGG database is also an important tool for studying molecular functions. In this analysis, DIAMOND was used to compare the gene set sequences with the KEGG database (the e-value threshold was set to 1e-5), and the corresponding metabolic pathways and KOs were obtained. The results identified a number of human disease-related genes, including those related to neurodegenerative diseases, viral infection-related diseases, parasitic infection-related diseases, bacterial infection-related diseases, immune diseases, endocrine diseases and metabolic diseases. In addition, a large number of pathogen-related genes were detected in different samples (data not shown). Based on the analysis of microbial marker genes (16S or 18S rRNA), some of the microbes were common among different ticks, while others were specific to certain ticks (Fig. 4). These genes are most likely involved in pathogen invasion or in protecting the body against various environmental stimuli (Luo et al., 2019).

Fig. 4

Venn diagram based on the KEGG Orthology database. Note: DIAMOND was used to compare gene sequences with the KEGG database, and the corresponding metabolic pathway information and KEGG orthology results of the genes were obtained from the KEGG database.

Community composition

Ixodes showed a relatively high abundance in the community composition analysis because all of the CCS data from the ticks were mapped directly to the genomes of all known species, so sequences originating from the ticks themselves were assigned to this genus (Fig. 5). Its relative abundance was therefore high in all samples. Anaplasma and Coxiella also had a high relative abundance in the four samples, indicating that they are common micropathogens in ticks. Moreover, a relatively high abundance of Pseudomonas was observed only in the G2 sample, suggesting that this bacterium may be associated specifically with I. persulcatus ticks.

Fig. 5

Heat maps of community composition based on genera. Note: X-axis, template name; Y-axis, genus. The darker the blue colour is, the higher the enrichment of the genus in the sample.

Species abundance detection and annotation

The above results showed that bacteria play an important role in the composition of the microbial populations in ticks. Analysis of the relative abundance of the same species in different samples indicated that A. phagocytophilum was an important pathogen in these libraries and accounted for 40, 22, 38 and 50% of the G1, G2, G3 and G4 community compositions, respectively. Importantly, B. burgdorferi was detected in G1 from I. persulcatus ticks, and C. burnetii was detected in all samples; both of these species induce severe diseases in humans (Fig. 6).

Fig. 6

Bar chart of the relative abundances of genera.

Moreover, a number of viruses that infect humans and animals were identified, such as Simian foamy virus, Pustyn virus and Crimean-Congo haemorrhagic fever orthonairovirus from G2, G3 and G4, respectively (Fig. 7). However, in addition to these known micropathogens, there are still many important undetected micropathogens that remain to be identified (Fig. 8).

Fig. 7

Population distribution of micropathogens in different samples. Note: “%” represents the proportion of micropathogen in the total community from each sample.

Fig. 8

Bar chart of the relative abundances of bacteria.

In G1, Bole tick virus, Wuhan tick virus and Tjuloc virus had a high abundance of 18% (Fig. 9A). In G2, Lymphocystis disease virus comprised 33% of the community (Fig. 9B). In G3, Blacklegged tick phlebovirus and Pustyn virus had high abundances of 20 and 19%, respectively (Fig. 9C). In G4 (Fig. 9D), Hubei diptera virus and Tacheng tick virus had high abundances of 21 and 19%, respectively, while other viruses, such as Crimean-Congo haemorrhagic fever orthonairovirus, Ambidensovirus CaaDV1, Lambdina fiscellaria nucleopolyhedrovirus and Culex tritaeniorhynchus totivirus, accounted for 10% of the viral composition in the communities. Interactive pie charts were generated using KRONA for the species annotation results, where the order from the inside to the outside represents the different classification levels, and the sectorial areas represent the relative proportions of different species. The relative levels of the different pathogens in the samples may indicate how likely each pathogen is to cause harm to the host.

Fig. 9

Pie chart of the distribution of viral abundances in different samples. Note: “%” represents the proportion of the virus in the total community from each sample, and the different colours represent different viruses. A: The primary micropathogens analysed in G1. B: The primary micropathogens analysed in G2. C: The main micropathogens analysed in G3. D: The main micropathogens analysed in G4.

Bacterial infection of ticks identified by PCR

To assess the presence of micropathogens in different tick samples, PCR was performed on the samples collected. We assessed the presence of some major micropathogens, including Simian foamy virus, Crimean-Congo haemorrhagic fever orthonairovirus, Coxiella burnetii, Borrelia burgdorferi and Anaplasma phagocytophilum, using the specific primers listed in Table 4. The results were negative for Simian foamy virus, while the other pathogens tested positive in different samples, and the results were consistent with the data generated from third-generation metagenomic sequencing.

Phylogenetic distribution of novel lineages

The 16S rRNA gene analysis indicated a high level of conservation among the C. burnetii sequences identified in the four samples. The same was observed for B. burgdorferi; however, Borrelia was highly conserved at the genus level (Fig. 10A), whereas Coxiella was highly conserved at the species level (Fig. 10B). The MSP4 gene of A. phagocytophilum was clearly divided into three main genotypes (Fig. 10C). In addition, Simian foamy virus and Crimean-Congo haemorrhagic fever orthonairovirus showed a more complex taxonomy, and their sequences were less conserved for the same genes (Fig. 10D and E).

Fig. 10

Phylogenetic analysis of the isolated bacteria/viruses. Reference oligonucleotide sequences were selected by BLAST searches of the NCBI nt database. (A) Subtrees of the experimental sequences from the Borrelia burgdorferi 16S rRNA gene. (B) Subtrees of the experimental sequences from the Coxiella burnetii 16S rRNA gene. (C) Subtrees of the experimental sequences from the Anaplasma phagocytophilum MSP4 gene. (D) Subtrees of the experimental sequences from the Simian foamy virus pathogen. (E) Subtrees of the experimental sequences from the Crimean-Congo haemorrhagic fever orthonairovirus segment-S gene.

Discussion

The vast woodland resources in northeast China are an important base for livestock breeding and provide an excellent environment for tick survival. Therefore, understanding ticks and tick-borne diseases is important for animal and human health. Thus, the future identification of complex communities of micropathogens via metagenomic sequencing may facilitate the development of effective control measures against ticks and tick-borne diseases. In the present study, we surveyed field-collected ticks from northeast China for tick-borne micropathogens. These findings will improve our understanding of the factors that affect the transmission of micropathogens by tick vectors and also be beneficial for the analysis of the relationship between ticks and their resident microbial populations. A total of 47 Gbp of read data were generated. After generating the raw sequence data, some insertion tags, low-quality tags, poly-A tags and small tags were removed. Then, the length distributions of clean CCSs and common/specific tags were summarized for the four samples. In addition, the length distribution results indicate that the sequencing length was continuous (Fig. 2), which is a unique feature for TGMS and indirectly indicates the reliability of the data and integrity of the gDNA for further analysis. Gene prediction is an important index in micropathogen sequence analysis. Gene composition is of great significance to the source of sequences, functional analysis of genes, and species and population compositions of micropathogens (Fig. 3, Fig. 4). The results showed that disease-related pathogens comprised an important part of all detected genes, including genes from viruses, bacteria and parasites, and some expressed genes were related to immunity and metabolism (data not shown). 
This result strongly suggests that micropathogens are involved in most of the infectious diseases transmitted by ticks and indicates that the livestock industry and human health will likely be severely harmed if prevention efforts are not strengthened. These results also provide a good foundation for screening disease-related pathogens and carrying out research on gene-related vaccines. This is the first time that a TGMS method has been used to assess the potential threat of ticks, and the results provide important guidance for the prevention and control of ticks in the future. To further analyse the structure of micropathogen communities in ticks, all sequences were mapped to the genomes of different species. The microbial population in ticks is a complex system that includes viruses, bacteria, parasites and fungi. At the genus level, a high abundance of Ixodes sequences was identified in the four samples (Fig. 5); these sequences clearly originated from the ticks themselves, and their recovery also supports the reliability of our sequencing. Moreover, Anaplasma and Coxiella had high abundances in these samples, indicating that these pathogens are common in ticks from Heilongjiang Province, China, and are widely distributed among ticks. These results serve as a reminder that attention must be paid to the prevention of tick bites in daily activities because of the high risk of tick-borne infections, which can result in avoidable health-related and economic losses. The clean CCS reads were mapped to micropathogen genomes, revealing the presence of many micropathogens and demonstrating that these micropathogens infect not only animals but also humans. Among the identified micropathogens, A. phagocytophilum, C. burnetii and B. burgdorferi cause serious diseases in humans, although B. burgdorferi was only detected in I. persulcatus ticks in the G2 sample.
B. burgdorferi causes severe clinical symptoms and even death in infected animals. Although B. burgdorferi was detected only in I. persulcatus, this species is a dominant tick in Heilongjiang Province, so the pathogen is likely responsible for the prevalence of Lyme disease in the region and is potentially harmful to human health and animal husbandry. A. phagocytophilum (formerly Ehrlichia phagocytophilum) (Dumler et al., 2001) is a gram-negative bacterium that is unusual in its tropism to neutrophils. This pathogen causes anaplasmosis in sheep and cattle, also known as tick-borne fever and pasture fever, and causes the zoonotic disease human granulocytic anaplasmosis (Annetta et al., 2017). C. burnetii is one of the most infectious organisms (Li et al., 2005; David et al., 2017), and the disease caused by this bacterium occurs in two stages: an acute stage in which patients present with headaches, chills, and respiratory symptoms and an insidious chronic stage. These micropathogens were detected by TGMS in ticks from different geographical environments in Heilongjiang Province, suggesting the need for further attention and continued public health monitoring. In addition, disease prevention in the livestock breeding industry should be strengthened to achieve healthy breeding. According to the results presented in Fig. 6, approximately 30% of the sequences in each group of samples could not be mapped to any genome. Because the species range of micropathogens is wide, there is no database containing all micropathogens. TGMS was also able to detect only a limited number of micropathogens, while many remain undetected. Therefore, it is necessary to continue to identify and study these undetected micropathogens. Furthermore, these unknown sequences pose a challenge to our study of the microbial community composition of ticks and to the prevention and control of tick-borne diseases that may harm human health and animal husbandry.
Because little is known about ticks, especially tick-borne pathogens, substantial effort needs to be devoted to studies on tick-borne pathogens to provide better and more accurate information for enhanced prevention and control of the diseases transmitted by these vectors.

Among the known sequences, many virus sequences were detected, such as Simian foamy virus, Bole tick virus, and Pustyn virus sequences (Fig. 7). Tjuloc virus was present in the G1 and G2 samples and accounted for 8% and 11% of the total community, respectively. Blacklegged tick phlebovirus was identified in the G3 and G4 samples and accounted for 20% and 10% of the total community, respectively. In particular, viruses that are named for the place in which they were found, such as Wuhan tick virus, Hubei diptera virus, Tacheng tick virus and Indiana vesiculovirus, were identified, probably because animal trade or migration introduces micropathogens from other regions and even other countries to Heilongjiang Province. These findings are highly significant for the investigation and research of foreign diseases. The above results are also shown in Fig. 8 and Fig. 9.

The pathways, enzymes, and protein families that were observed to be overrepresented in the tick groups are potentially relevant to tick-related processes. Glycoside transferases play an important role as antibiotic glycosyltransferases in the formation of glycosides, a category of compounds that are widely used in the clinic for their antibacterial and anticancer activities (Huang et al., 2018). With the development of sequencing techniques, large numbers of glycosyltransferases have been identified in various microbial genomes, of which a few are known for their glycosylation specificity and efficiency (Dirk et al., 2002; Blanchard et al., 2002; Luzhetskyy et al., 2008). The use of efficient glycosyltransferases from among these unexplored glycosyltransferase sequences is a promising strategy for the biosynthesis of glycoside compounds. Based on annotation of the gene sequences against the CAZy database with DIAMOND, glycoside transferases were identified as the most abundant class, followed by glycoside hydrolases, and the fold difference between the two was as high as 25. Therefore, these two enzyme classes play an important role in glycoside metabolism. This finding also indicates that glycoside metabolism is of great importance for micropathogens infecting hosts and for protecting the body from injury.

The distinct geographical environment and climatic conditions in Heilongjiang Province may influence the diversity of resident species. To characterize the potential microbial population characteristics of different tick species at the regional scale, we used PCR to detect some major micropathogens, such as Simian foamy virus, Crimean-Congo haemorrhagic fever orthonairovirus, C. burnetii, B. burgdorferi and A. phagocytophilum, in H. japonica, I. persulcatus, D. silvarum and H. concinna tick samples from four regions in Heilongjiang. Approximately 45.62% of the specimens from the four tick species analysed were positive for A. phagocytophilum and C. burnetii. B. burgdorferi was only detected in the G1 sample, and Crimean-Congo haemorrhagic fever orthonairovirus was only detected in the G1 and G3 samples. However, we were unable to detect any genes belonging to Simian foamy virus. Furthermore, we did not detect any genes from the causative agent of Crimean-Congo haemorrhagic fever in the wild I. persulcatus and H. concinna ticks. In contrast, A. phagocytophilum and C. burnetii were clearly detected in different tick species isolated from Heilongjiang Province. This result suggests that these micropathogens might have increased potential to lead to epidemic outbreaks, especially where they are widely distributed.
However, further experiments are necessary to verify that some of these microbial agents can be transmitted in the wild.

Based on the 16S rRNA gene, C. burnetii and B. burgdorferi are highly conserved pathogens and show only slight differences in sequence among various strains, indicating the potential for the detection, prevention and control of these pathogens. In addition, the MSP4 genotypes of A. phagocytophilum are clearly divided into three major types, of which type I is relatively limited in distribution, while types Ⅱ and Ⅲ are widely distributed worldwide and are also the primary pathogens causing zoonotic diseases. Although virus sequences are simple, they vary based on the environment or host, allowing them to adapt to different environments, as has been observed for Simian foamy virus and Crimean-Congo haemorrhagic fever orthonairovirus.
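The statement that marker sequences differ only slightly among strains is usually quantified as pairwise percent identity. The following is a minimal Python sketch of that calculation, assuming the two sequences have already been aligned to the same length. The function name `percent_identity` and both sequence fragments are invented for illustration and are not sequences from this study.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where both sequences carry a gap ('-') are skipped; a gap in
    only one sequence is counted as a mismatch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable alignment columns")
    return 100.0 * matches / compared


# Made-up 20-nt fragments differing at a single position (illustration only).
strain_1 = "ACGTTGCAAGTCGAACGGCA"
strain_2 = "ACGTTGCAAGTCGAACGACA"

identity = percent_identity(strain_1, strain_2)
print(f"pairwise identity: {identity:.1f}%")
```

Counting a one-sided gap as a mismatch is one common convention; other tools exclude gapped columns entirely, which gives slightly higher identity values for sequences with indels.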

Conclusion

In the present study, TGMS methodologies were used for the first time for micropathogen discovery in wild-caught tick vectors. The results suggest that metagenomic sequencing can facilitate the identification of numerous micropathogens in the microbiomes of wild-caught ticks. This approach could be used not only to monitor microbial communities in arthropod vectors of infectious disease but also as a tool for the surveillance of novel emerging bacterial and viral diseases. In addition, a more thorough understanding of the ecological factors associated with the prevalence and persistence of vector-associated micropathogen lineages will ultimately aid in the prediction and prevention of the spread of disease. Finally, the analysis of tick micropathogens by TGMS will help to identify potentially novel micropathogens and to predict and prevent the spread of disease.

Ethics approval

The present study was approved by the Ethics Committee of Lanzhou Veterinary Research Institute, Chinese Academy of Agricultural Sciences (approval no. LVRIAEC, 2020–006), and tick samples were collected in strict accordance with the requirements of the Ethics Procedures and Guidelines of the People's Republic of China.

Author contributions

G.L. and J.L. designed the experiments. Q.R. and J.L. performed the experiments. J.L. and H.Y. analysed the data. J.L., H.S., and X.L. wrote the manuscript. G.G., B.Z., G.L., X.Q., Y.T., and M.S. collected experimental materials. All authors read and approved the final version of the manuscript.

Funding

This study was financially supported by grants from the (no. 2019YFC1200502, 2019YFD1200500, 2017YFD0501200), National Parasite Resource library (NPRC-2019-194-30), NSFC (31572511), Fundamental Research Funds of the Chinese Academy of Agricultural Sciences (Y2019YJ07-04, Y2018PT76), ASTIP (CAAS-ASTIP-2016-LVRI), NBCITS (CARS-37).

Declaration of competing interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

22 in total

1. Regional prevalences of Borrelia burgdorferi, Borrelia bissettiae, and Bartonella henselae in Ixodes affinis, Ixodes pacificus and Ixodes scapularis in the USA.

Authors: Ricardo G Maggi; Marcée Toliver; Toni Richardson; Thomas Mather; Edward B Breitschwerdt
Journal: Ticks Tick Borne Dis Date: 2018-11-27 Impact factor: 3.744

2. Bayesian inference reveals ancient origin of simian foamy virus in orangutans.

Authors: Michael J C Reid; William M Switzer; Michael A Schillaci; Amy R Klegarth; Ellsworth Campbell; Manon Ragonnet-Cronin; Isabelle Joanisse; Kyna Caminiti; Carl A Lowenberger; Birute Mary F Galdikas; Hope Hollocher; Paul A Sandstrom; James I Brooks
Journal: Infect Genet Evol Date: 2017-03-06 Impact factor: 3.342

3. Sequence analysis of the msp4 gene of Anaplasma phagocytophilum strains.

Authors: José de la Fuente; Robert F Massung; Susan J Wong; Frederick K Chu; Hans Lutz; Marina Meli; Friederike D von Loewenich; Anna Grzeszczuk; Alessandra Torina; Santo Caracappa; Atilio J Mangold; Victoria Naranjo; Snorre Stuen; Katherine M Kocan
Journal: J Clin Microbiol Date: 2005-03 Impact factor: 5.948

4. Protective immunity against Q fever induced with a recombinant P1 antigen fused with HspB of Coxiella burnetii.

Authors: Qingfeng Li; Dongsheng Niu; Bohai Wen; Meiling Chen; Ling Qiu; Jingbo Zhang
Journal: Ann N Y Acad Sci Date: 2005-12 Impact factor: 5.691

5. Features and applications of bacterial glycosyltransferases: current state and prospects. (Review)

Authors: Andriy Luzhetskyy; Andreas Bechthold
Journal: Appl Microbiol Biotechnol Date: 2008-09-06 Impact factor: 4.813

6. Gene finding in metatranscriptomic sequences.

Authors: Wazim Mohammed Ismail; Yuzhen Ye; Haixu Tang
Journal: BMC Bioinformatics Date: 2014-09-10 Impact factor: 3.169

7. Rapid and Visual Detection of Coxiella burnetii Using Recombinase Polymerase Amplification Combined with Lateral Flow Strips.

Authors: Yong Qi; Qiong Yin; Yinxiu Shao; Suqin Li; Hongxia Chen; Wanpeng Shen; Jixian Rao; Jiameng Li; Xiaoling Li; Yu Sun; Yu Lin; Yi Deng; Wenwen Zeng; Shulong Zheng; Suyun Liu; Yuexi Li
Journal: Biomed Res Int Date: 2018-04-12 Impact factor: 3.411

8. Comparative analysis of microRNA profiles between wild and cultured Haemaphysalis longicornis (Acari, Ixodidae) ticks.

Authors: Jin Luo; Qiaoyun Ren; Ze Chen; Wenge Liu; Zhiqiang Qu; Ronghai Xiao; Ronggui Chen; Hanliang Lin; Zegong Wu; Jianxun Luo; Hong Yin; Hui Wang; Guangyuan Liu
Journal: Parasite Date: 2019-03-26 Impact factor: 3.000

9. Heartland Virus Neutralizing Antibodies in Vertebrate Wildlife, United States, 2009-2014.

Authors: Kasen K Riemersma; Nicholas Komar
Journal: Emerg Infect Dis Date: 2015-10 Impact factor: 6.883

10. PacBio Sequencing and Its Applications. (Review)

Authors: Anthony Rhoads; Kin Fai Au
Journal: Genomics Proteomics Bioinformatics Date: 2015-11-02 Impact factor: 7.691