Gene Prediction Tools
Gene prediction tools are software programs that use computational methods to identify potential protein-coding genes in genomic DNA sequences. Here are some commonly used gene prediction tools:
- GeneMark: GeneMark is a popular family of gene prediction programs that use statistical (Markov chain) models to identify protein-coding genes in genomic DNA sequences. It has been used to annotate the genomes of many organisms, including bacteria, archaea, and eukaryotes, and is best known for its high accuracy on bacterial and archaeal genomes. GeneMark works by analyzing the nucleotide composition of a genomic DNA sequence and identifying regions whose statistics resemble those of coding sequence, such as a bias towards the codons the organism preferentially uses. These regions are then analyzed further to locate the start and stop codons of potential protein-coding genes.
- Augustus: Augustus is a gene prediction tool that uses a probabilistic model (a generalized hidden Markov model) to identify potential protein-coding genes in genomic DNA sequences. It is particularly useful for annotating eukaryotic genomes. Augustus can run ab initio, using only the genomic sequence and parameters trained for the target species or a closely related one. It can also incorporate external evidence, such as alignments of ESTs, RNA-Seq reads, or proteins, as hints. The model scores candidate gene structures based on features such as codon usage, exon-intron structure, and splicing signals.
- Glimmer: Glimmer (Gene Locator and Interpolated Markov ModelER) is a gene prediction tool that uses interpolated Markov models to identify protein-coding genes in microbial genomes. It has been used to annotate the genomes of many bacteria and archaea. Glimmer works by first identifying candidate open reading frames based on the presence of start and stop codons. It then scores each candidate with an interpolated Markov model trained on coding sequences from the same genome and resolves overlaps between candidates. Because it targets prokaryotic genomes, it does not model exon-intron structure or splice sites.
- Fgenesh: Fgenesh is a gene prediction tool that uses a hidden Markov model to identify potential protein-coding genes in genomic DNA sequences. It is particularly useful for annotating eukaryotic genomes.
- GeneID: GeneID is a gene prediction tool that works hierarchically. It first scores signals such as splice sites and start and stop codons with position weight arrays, then builds and scores candidate exons using a Markov model of coding potential, and finally assembles exons into gene structures by dynamic programming. It has been used to annotate the genomes of many organisms, including humans.
- FINDER: FINDER is an automated annotation pipeline for eukaryotic genomes that builds gene models from RNA-Seq data and associated protein sequences. Unlike purely ab initio gene predictors, it relies primarily on expression evidence: it aligns RNA-Seq reads to the genome, assembles transcripts, and derives gene structures, including untranslated regions, from them. It is particularly useful when RNA-Seq data are available from several tissues or conditions, since genes expressed only in specific contexts can then be captured.
- FragGeneScan: FragGeneScan is a gene prediction tool designed for short and error-prone sequences. It uses a hidden Markov model that combines codon usage with sequencing error models, so it can predict genes in sequencing reads as well as in complete genomes. It has been used to annotate genes in many metagenomic datasets, including those from environmental samples, the human gut microbiome, and viral communities, and has helped to advance our understanding of the diversity and function of microbial communities in various environments.
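- Prodigal: Prodigal (PROkaryotic DYnamic programming Gene-finding ALgorithm) was developed by Hyatt and colleagues in 2010 and has since become one of the most widely used gene prediction tools for prokaryotic genomes. It does not use hidden Markov models (HMMs) or interpolated Markov models (IMMs). Instead, it scores candidate genes with a log-likelihood function and chooses among them by dynamic programming.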
- EuGene: EuGene is a gene prediction tool that uses a graph-based approach to identify protein-coding genes in eukaryotic genomes. It was originally described by Schiex and colleagues in 2001 and has since been used to annotate the genomes of many eukaryotic organisms. EuGene works by first constructing a directed acyclic graph that represents the gene structures consistent with the sequence, weighted by the available evidence such as splice-site and coding-potential scores. It then uses a dynamic programming algorithm to find the highest-scoring path through the graph, which corresponds to the most likely set of protein-coding genes.
- FrameD: FrameD is a gene prediction tool that locates genes and frameshifts in GC-rich sequences. It is suitable for prokaryotic genomes and for matured (spliced) eukaryotic sequences such as cDNAs.
- GENSCAN: GENSCAN was developed by Burge and Karlin in 1997. It predicts the locations of exons and introns in genomic sequences using a probabilistic model of gene structure.
- PHANOTATE: PHANOTATE is a gene prediction tool designed to annotate phage genomes. It was developed by McNair and colleagues in 2019. Rather than relying on homology, it treats a phage genome as a weighted graph of open reading frames and selects the path through that graph that favors tightly packed genes with little non-coding sequence.
- ORFfinder: ORFfinder is a gene prediction tool that identifies open reading frames (ORFs) in DNA sequences. It was developed by the National Center for Biotechnology Information (NCBI) and is widely used for gene discovery and annotation. ORFfinder works by scanning a DNA sequence and identifying all possible ORFs based on the presence of start and stop codons. Users can specify the minimum length of ORFs to be considered, as well as the genetic code to be used for translation. However, one limitation of ORFfinder is that it only identifies ORFs based on the presence of start and stop codons and does not incorporate additional information about gene structure, such as splice sites or promoter regions.
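
The ORF-scanning idea described for ORFfinder above can be sketched in a few lines of Python. This is a minimal illustration, not NCBI's implementation: the function names `find_orfs` and `reverse_complement`, the fixed `ATG` start codon, the standard stop codons, and the coordinate convention are all assumptions made for the example.

```python
from typing import List, Tuple

# Assumption: standard genetic code, ATG as the only start codon.
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_len: int = 30) -> List[Tuple[str, int, int, int]]:
    """Scan all six reading frames for ORFs of at least min_len nucleotides.

    Returns (strand, frame, start, end) tuples. Coordinates are 0-based,
    end-exclusive, and relative to the strand that was scanned. The stop
    codon is included in the reported length.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the first in-frame ATG since the last stop
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == START_CODON:
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    end = i + 3
                    if end - start >= min_len:
                        orfs.append((strand, frame, start, end))
                    start = None
    return orfs


if __name__ == "__main__":
    for orf in find_orfs("CCATGAAATTTGGGTAACC", min_len=15):
        print(orf)
```

Because the scan looks only at start and stop codons, it reports every sufficiently long ORF whether or not it is a real gene. That is the limitation noted above, and it is why the statistical tools in this list add a scoring model on top.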
These are just a few examples of the many gene prediction tools that are available. Others include VEIL, NNPP, BioNix, GeneParser, GenomeScan, BGF, and ATGpr. The choice of tool depends on the type of genome being annotated, the availability of training data, and the research question being addressed.
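
Several of the tools above (for example GeneMark and Prodigal) rest on the idea that coding regions show a characteristic codon bias that can be turned into a log-likelihood score. The following Python sketch illustrates that idea in its simplest form, assuming a codon-frequency model with add-one smoothing. The function names `codon_frequencies` and `log_odds_score` are hypothetical, and the real tools use considerably richer models.

```python
import math
from collections import Counter
from typing import Dict, Iterable

BASES = "ACGT"


def codon_frequencies(seqs: Iterable[str], pseudocount: float = 1.0) -> Dict[str, float]:
    """Estimate in-frame codon frequencies from training sequences.

    Every one of the 64 codons starts with a pseudocount so that unseen
    codons never get probability zero.
    """
    counts = Counter({a + b + c: pseudocount for a in BASES for b in BASES for c in BASES})
    for seq in seqs:
        seq = seq.upper()
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in counts:  # skip codons containing ambiguous bases
                counts[codon] += 1
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


def log_odds_score(seq: str, coding: Dict[str, float], background: Dict[str, float]) -> float:
    """Sum of log(P_coding / P_background) over the in-frame codons of seq.

    Positive scores mean the sequence looks more like the coding training
    set than like the background model.
    """
    seq = seq.upper()
    score = 0.0
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in coding:
            score += math.log(coding[codon] / background[codon])
    return score


if __name__ == "__main__":
    # Toy training data, invented for illustration only.
    coding_model = codon_frequencies(["ATGGCTGCTGCTTAA"])
    background_model = codon_frequencies([])  # uniform: pseudocounts only
    print(log_odds_score("GCTGCT", coding_model, background_model))
    print(log_odds_score("CCCCCC", coding_model, background_model))
```

In practice the coding model would be trained on known or high-confidence genes from the target genome, and the background on non-coding sequence. The score would then be combined with start-site and length information before a gene is called.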