Pedro Furió-Tarí, Ana Conesa, Sonia Tarazona.
Abstract
BACKGROUND: The integrative analysis of multiple genomics data often requires that signals defined by genomic coordinates be associated with proximal genes. The relative location of a genomic region with respect to the gene (gene area) is important for functional data interpretation; hence, algorithms that match regions to genes should also report this information.
Keywords: Associations; Gene; Genomic region; NGS; Omics integration; Peak
Year: 2016 PMID: 28185573 PMCID: PMC5133492 DOI: 10.1186/s12859-016-1293-1
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Definition of the areas of a gene used by the RGmatch algorithm
Fig. 2 Examples of two situations in which a region is associated with more than one gene. (a) Two overlapping genes with different isoforms. (b) Two different genes with areas that both overlap the region (quasi-overlapping genes)
Fig. 3 Flowchart describing the rules RGmatch uses to decide which gene area to annotate for a region-transcript association (default algorithm options)
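The rules in Fig. 3 decide which single area to report when a region overlaps several areas of the same transcript. Those rules are not spelled out in this text, so the sketch below uses a deliberately simplified, hypothetical rule instead: keep the area that covers the largest share of the region (PercRegion) and break ties with a fixed priority list. Both the rule and the `PRIORITY` order are assumptions and may differ from the actual flowchart. Applied to rows of the exon-level table for the Fig. 2 example, it also shows how a single region ends up associated with two genes.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical priority order (highest first), used only to break ties.
# This is an assumption for illustration, not RGmatch's documented default.
PRIORITY = ["TSS", "1st_EXON", "PROMOTER", "TTS", "INTRON",
            "GENE_BODY", "UPSTREAM", "DOWNSTREAM"]


@dataclass
class AreaHit:
    region: str
    gene: str
    transcript: str
    area: str
    perc_region: float  # % of the region falling in this area


def rank(area: str) -> int:
    """Lower is better; areas missing from PRIORITY sort after all listed ones."""
    return PRIORITY.index(area) if area in PRIORITY else len(PRIORITY)


def pick_area(hits: list[AreaHit]) -> AreaHit:
    """Simplified rule: largest PercRegion wins; ties go to the higher priority."""
    return max(hits, key=lambda h: (h.perc_region, -rank(h.area)))


def assign_areas(hits: list[AreaHit]) -> dict[tuple[str, str], str]:
    """Collapse exon-level hits to one area per (region, gene) pair."""
    groups: dict[tuple[str, str], list[AreaHit]] = defaultdict(list)
    for h in hits:
        groups[(h.region, h.gene)].append(h)
    return {key: pick_area(group).area for key, group in groups.items()}


# A subset of the exon-level rows (region, gene, transcript, area, PercRegion).
EXAMPLE_ROWS = [
    ("1_3400_3700", "Gene2", "Tr1_Gene2", "INTRON", 66.45),
    ("1_3400_3700", "Gene2", "Tr1_Gene2", "GENE_BODY", 33.55),
    ("1_3400_3700", "Gene1", "Tr1_Gene1", "TSS", 66.45),
    ("1_3400_3700", "Gene1", "Tr1_Gene1", "1st_EXON", 33.55),
    ("2_2102_2702", "Gene4", "Tr1_Gene4", "TSS", 33.28),
    ("2_2102_2702", "Gene4", "Tr1_Gene4", "PROMOTER", 48.42),
    ("2_2102_2702", "Gene4", "Tr1_Gene4", "1st_EXON", 18.30),
    ("2_2102_2702", "Gene3", "Tr1_Gene3", "TSS", 33.28),
    ("2_2102_2702", "Gene3", "Tr1_Gene3", "PROMOTER", 33.61),
    ("2_2102_2702", "Gene3", "Tr1_Gene3", "1st_EXON", 11.65),
    ("2_2102_2702", "Gene3", "Tr1_Gene3", "INTRON", 21.46),
]

areas = assign_areas([AreaHit(*row) for row in EXAMPLE_ROWS])
```

Under this toy rule, region 1_3400_3700 is reported as TSS for Gene1 and INTRON for Gene2, so both genes stay associated with it rather than one being discarded.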
Exon-level results for the example shown in Fig. 2
| Region | Midpoint | Gene | Transcript | Exon | Area | Distance | PercRegion | PercArea |
|---|---|---|---|---|---|---|---|---|
| 1_3400_3700 | 3550 | Gene2 | Tr1_Gene2 | 2 | INTRON | 0 | 66.45 | −1 |
| 1_3400_3700 | 3550 | Gene2 | Tr1_Gene2 | 2 | GENE_BODY | 0 | 33.55 | 6.73 |
| 1_3400_3700 | 3550 | Gene1 | Tr1_Gene1 | 1 | TSS | 0 | 66.45 | 100.0 |
| 1_3400_3700 | 3550 | Gene1 | Tr1_Gene1 | 1 | 1st_EXON | 0 | 33.55 | 5.94 |
| 1_3400_3700 | 3550 | Gene1 | Tr2_Gene1 | 1 | TSS | 0 | 66.45 | 100.0 |
| 1_3400_3700 | 3550 | Gene1 | Tr2_Gene1 | 1 | 1st_EXON | 0 | 33.55 | 5.94 |
| 1_5900_6250 | 6075 | Gene2 | Tr1_Gene2 | 1 | 1st_EXON | 0 | 100 | 29.23 |
| 1_5900_6250 | 6075 | Gene1 | Tr2_Gene1 | 2 | INTRON | 0 | 56.98 | −1 |
| 1_5900_6250 | 6075 | Gene1 | Tr2_Gene1 | 2 | GENE_BODY | 0 | 43.02 | 37.66 |
| 2_2102_2702 | 2402 | Gene4 | Tr1_Gene4 | 1 | TSS | 0 | 33.28 | 100.0 |
| 2_2102_2702 | 2402 | Gene4 | Tr1_Gene4 | 1 | PROMOTER | 0 | 48.42 | 22.38 |
| 2_2102_2702 | 2402 | Gene4 | Tr1_Gene4 | 1 | 1st_EXON | 0 | 18.30 | 80.88 |
| 2_2102_2702 | 2402 | Gene3 | Tr1_Gene3 | 1 | TSS | 0 | 33.28 | 100.0 |
| 2_2102_2702 | 2402 | Gene3 | Tr1_Gene3 | 1 | PROMOTER | 0 | 33.61 | 15.54 |
| 2_2102_2702 | 2402 | Gene3 | Tr1_Gene3 | 1 | 1st_EXON | 0 | 11.65 | 100 |
| 2_2102_2702 | 2402 | Gene3 | Tr1_Gene3 | 1 | INTRON | 0 | 21.46 | −1 |
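The numeric columns above can be reproduced with simple arithmetic. The sketch assumes that Midpoint is the mean of the start and end encoded in the region ID, that PercRegion is the overlap divided by the region length, and that PercArea is the overlap divided by the area length. The table's values are consistent with counting both endpoints (1_3400_3700 spans 301 bp, and 200/301 = 66.45%), and the PercArea values of the TSS and PROMOTER rows are consistent with a 200 bp TSS area and a 1,300 bp promoter area. These sizes and the overlap lengths in the usage lines are inferred from the numbers, not stated in the text.

```python
def parse_region_id(region_id: str) -> tuple[str, int, int]:
    """Split an ID like '1_3400_3700' into (chrom, start, end).

    rsplit with maxsplit=2 keeps any underscores that belong to the
    chromosome name itself.
    """
    chrom, start, end = region_id.rsplit("_", 2)
    return chrom, int(start), int(end)


def midpoint(start: int, end: int) -> float:
    """Center of the region, as in the Midpoint column."""
    return (start + end) / 2


def span_bp(start: int, end: int) -> int:
    # Both endpoints counted: the convention inferred from the table's percentages.
    return end - start + 1


def percent(overlap_bp: int, total_bp: int) -> float:
    """Overlap as a percentage of a region (PercRegion) or of an area (PercArea)."""
    return round(100 * overlap_bp / total_bp, 2)


chrom, start, end = parse_region_id("1_3400_3700")
length = span_bp(start, end)                # 301
tss_perc_region = percent(200, length)      # 66.45, as in the TSS row for Gene1
promoter_perc_area = percent(291, 1300)     # 22.38, as in the PROMOTER row for Gene4
```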
Comparison of the functionalities of the different algorithms
| Feature | RGmatch | HOMER | GREAT | CisGenome | Seq2pathway | ChIPseeker |
|---|---|---|---|---|---|---|
| User interface | Command line | Command line | Web tool | Command line/GUI (Windows only) | R/Bioconductor | R/Bioconductor |
| Adaptable to pipelines | Yes | Yes^a | No | Yes^a | Yes^a | Yes^a |
| Input format | BED (also gzip-compressed) | BED | BED (only 3 columns) | BED → COD | BED → GRanges | BED |
| Association resolution | Gene, transcript, exon | Gene, transcript | Gene | Gene | Gene | Gene, transcript |
| Area annotation | Yes | Yes | No | No | Yes | Yes |
| Flexibility | Distance, Areas, Rules, Area priorities | No | Distance | Distance | Search radius | Area priorities, TSS distance |
| Supported species | All | All | 3 | 12 | 2 | All^b |
| Output: Gene IDs? | Any in the GTF | Gene and transcript IDs | Gene names | Gene IDs | Gene IDs and gene names | Gene and transcript IDs |
| Output: Distance? | Yes | Yes | Yes | No | Yes | Yes |
| Output: Overlapping genes? | Yes | No | No | No | Yes | No |
^a HOMER and CisGenome can be integrated into analysis pipelines, although obtaining the annotations and parsing the results is less straightforward than with RGmatch. Seq2pathway and ChIPseeker can also be integrated, with additional scripting.
^b ChIPseeker supports all species, provided the annotation is supplied as a TxDb R object. This object can be built from a GTF file with the makeTxDbFromGFF function in the GenomicFeatures package.
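RGmatch is listed as accepting BED input, including gzip-compressed BED. A minimal Python sketch of reading such a file is shown below. It detects gzip by the two-byte gzip magic number, skips blank, comment, `track` and `browser` lines, and keeps coordinates exactly as written (no 0-based/1-based conversion is attempted). The fallback region name in the `chrom_start_end` style of the exon-level table is a hypothetical choice for illustration, not a documented RGmatch behavior.

```python
import gzip
from typing import Iterable, Iterator, NamedTuple, TextIO

GZIP_MAGIC = b"\x1f\x8b"


class BedRegion(NamedTuple):
    chrom: str
    start: int
    end: int
    name: str


def is_gzip_bytes(head: bytes) -> bool:
    """True if the first bytes look like the start of a gzip stream."""
    return head[:2] == GZIP_MAGIC


def open_bed(path: str) -> TextIO:
    """Open a BED file as text, transparently decompressing gzip input."""
    with open(path, "rb") as fh:
        head = fh.read(2)
    return gzip.open(path, "rt") if is_gzip_bytes(head) else open(path, "rt")


def parse_bed_lines(lines: Iterable[str]) -> Iterator[BedRegion]:
    """Yield regions from BED text lines, skipping headers and blank lines."""
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\r\n").split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        # Hypothetical fallback ID when the optional name column is absent.
        name = fields[3] if len(fields) > 3 and fields[3] else f"{chrom}_{start}_{end}"
        yield BedRegion(chrom, start, end, name)


# Usage (file name is illustrative):
# with open_bed("peaks.bed.gz") as fh:
#     regions = list(parse_bed_lines(fh))
```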
Fig. 4 Venn diagram showing the number of region-gene associations obtained with the HOMER, RGmatch, and CisGenome methods
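The counts in a Venn diagram like Fig. 4 follow from treating each method's output as a set of (region, gene) pairs and counting, for every pair, the exact combination of methods that reported it. The sketch below uses small toy sets invented for illustration; they are not the data behind Fig. 4.

```python
def venn_counts(sets: dict[str, set]) -> dict[frozenset, int]:
    """Map each combination of methods to the number of associations found
    by exactly that combination (one entry per non-empty Venn region)."""
    counts: dict[frozenset, int] = {}
    for item in set().union(*sets.values()):
        members = frozenset(name for name, found in sets.items() if item in found)
        counts[members] = counts.get(members, 0) + 1
    return counts


# Toy (region, gene) pairs, invented for illustration only.
toy = {
    "HOMER": {("r1", "G1"), ("r2", "G2"), ("r3", "G3")},
    "RGmatch": {("r1", "G1"), ("r2", "G2"), ("r2", "G5"), ("r4", "G4")},
    "CisGenome": {("r1", "G1"), ("r3", "G3")},
}
counts = venn_counts(toy)
```

Because associations are compared as whole pairs, a method that links one region to several genes (as RGmatch can for overlapping genes) contributes one item per gene.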
Equivalences between the gene areas defined by RGmatch and HOMER
| RGmatch | HOMER |
|---|---|
| INTRON | Intron |
| UPSTREAM | Intergenic |
| DOWNSTREAM | TTS; Intergenic |
| GENE_BODY | exon; 3′ UTR; 5′ UTR |
| TSS | promoter-TSS |
| 1st_EXON | exon; promoter-TSS; 5′ UTR; 3′ UTR |
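The equivalence table can be used directly to decide whether an RGmatch area and a HOMER annotation agree. The sketch below transcribes the table into a dictionary. It assumes that HOMER labels may carry a parenthesized suffix (for example a transcript ID), which it drops, and it compares case-insensitively with the prime in "3′ UTR" treated like an apostrophe. RGmatch areas not listed in the table (such as PROMOTER) have no equivalents here and therefore never match.

```python
# RGmatch area -> HOMER labels treated as equivalent, transcribed from the table above.
EQUIVALENT = {
    "INTRON": ["Intron"],
    "UPSTREAM": ["Intergenic"],
    "DOWNSTREAM": ["TTS", "Intergenic"],
    "GENE_BODY": ["exon", "3′ UTR", "5′ UTR"],
    "TSS": ["promoter-TSS"],
    "1st_EXON": ["exon", "promoter-TSS", "5′ UTR", "3′ UTR"],
}


def normalize_homer(label: str) -> str:
    """Drop any parenthesized detail (assumed format), unify primes, ignore case."""
    return label.split("(")[0].strip().replace("′", "'").lower()


_NORMALIZED = {
    area: {normalize_homer(label) for label in labels}
    for area, labels in EQUIVALENT.items()
}


def annotations_agree(rgmatch_area: str, homer_label: str) -> bool:
    """True if the pair counts as equal or equivalent under the table above."""
    return normalize_homer(homer_label) in _NORMALIZED.get(rgmatch_area, set())
```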
Region locations within the gene as annotated by RGmatch (columns) and HOMER (rows)
Associations with equal or equivalent annotations in both methods are shown in green, and associations with different annotations are shown in red