
BLAST



In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing biological sequences, such as the amino-acid sequences of different proteins or the nucleotide sequences of DNA. A BLAST search enables a researcher to compare a query sequence with a library or database of sequences, and to identify library sequences that resemble the query sequence above a certain threshold. For example, following the discovery of a previously unknown gene in the mouse, a scientist will typically perform a BLAST search of the human genome to see if humans carry a similar gene; BLAST will identify sequences in the human genome that resemble the mouse gene.

Background

BLAST is one of the most widely used bioinformatics programs, probably because it addresses a fundamental problem and the algorithm emphasizes speed over sensitivity. This emphasis on speed is vital to making the algorithm practical on the huge genome databases currently available, although subsequent algorithms can be even faster.

Examples of other questions that researchers use BLAST to answer are

BLAST is also often used as part of other algorithms that require approximate sequence matching.

The BLAST algorithm and the computer program that implements it were developed by Stephen Altschul, Warren Gish, and David Lipman at the U.S. National Center for Biotechnology Information (NCBI), Webb Miller at The Pennsylvania State University, and Gene Myers at the University of Arizona. It is available on the web at [link]. Alternative implementations are available at [link] (WU-BLAST) and [link] (FSA-BLAST).

The original paper (Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. "Basic local alignment search tool." J Mol Biol 215(3):403-10, 1990) was the most highly cited paper published in the 1990s.

Input/Output

Query and database sequences are supplied to BLAST in the FASTA format.
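As a rough illustration of the FASTA layout (a `>` header line followed by one or more sequence lines), the following minimal Python sketch reads records into (header, sequence) pairs. The function name `read_fasta` and the choices to upper-case residues and skip blank lines are assumptions made for this example; it is not the parser used inside BLAST.

```python
import io
from typing import Iterator, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text (illustrative sketch)."""
    header = None
    chunks: list[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:].strip()
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line.upper())  # a sequence may wrap over several lines
    if header is not None:
        yield header, "".join(chunks)


example = """>query1 hypothetical example record
AGTTACGGA
TTACC
>query2
acttag
"""
records = list(read_fasta(io.StringIO(example)))
print(records)
# [('query1 hypothetical example record', 'AGTTACGGATTACC'), ('query2', 'ACTTAG')]
```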

Algorithm

To run, BLAST requires two inputs: a query sequence (also called the target sequence) and a sequence database. BLAST finds subsequences in the query that are similar to subsequences in the database. In typical usage, the query sequence is much smaller than the database; for example, the query may be one thousand nucleotides long while the database contains several billion nucleotides.

BLAST searches for high-scoring sequence alignments between the query sequence and the sequences in the database using a heuristic approach that approximates the Smith-Waterman algorithm. The exhaustive Smith-Waterman approach is too slow for searching large genomic databases such as GenBank, so BLAST instead uses a heuristic that is slightly less accurate than Smith-Waterman but over 50 times faster. This combination of speed and relatively good accuracy is the key technical innovation of the BLAST programs, and arguably the reason BLAST is the most popular bioinformatics search tool.
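For reference, the exhaustive method that BLAST approximates can be sketched in a few lines. This is a minimal, score-only Smith-Waterman in Python, assuming a simple linear gap penalty and illustrative scores (match +2, mismatch -1, gap -2) rather than the substitution matrices and affine gap costs used in practice. It fills a table with one cell per pair of positions, so its running time grows with the product of the two sequence lengths, which is what makes it slow across a whole database.

```python
def smith_waterman_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score of a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            # Flooring at 0 is what makes the alignment local.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("AGTTAC", "ACTTAG"))  # 7 (AGTTA aligned with ACTTA)
```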

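The BLAST algorithm can be conceptually divided into three stages.

1. In the first stage, BLAST searches for exact matches of a short, fixed length W (the word size) between the query and the sequences in the database. For example, given the sequences AGTTAC and ACTTAG and a word size of W = 3, BLAST would identify the substring TTA that is common to both sequences. By default, W = 11 for nucleotide searches.

2. In the second stage, BLAST tries to extend each such seed match in both directions without allowing gaps (insertions or deletions), in an attempt to raise the alignment score. For the example above, the ungapped alignment around the common word TTA is:

..AGTTAC..
  | |||
..ACTTAG..

If a high-scoring ungapped alignment is found, the database sequence is passed on to the third stage.

3. In the third stage, BLAST performs a gapped alignment between the query sequence and the database sequence using a variation of the Smith-Waterman algorithm, and reports the statistically significant alignments.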
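The seeding and ungapped-extension steps can be sketched as follows: index every word of length `w` in the query, look up each word of the database sequence to find exact seed hits, then extend a seed to the left and right without gaps, stopping once the running score drops more than `x_drop` below the best score seen (an X-drop rule). This is a simplified Python illustration under stated assumptions (plain match/mismatch scores of +2/-1, an X-drop cutoff of 3, hypothetical function names); NCBI BLAST's actual seeding and extension differ in many details.

```python
def find_seeds(query: str, subject: str, w: int) -> list[tuple[int, int]]:
    """Seeding sketch: exact word hits of length w as (query_pos, subject_pos)."""
    index: dict[str, list[int]] = {}
    for i in range(len(query) - w + 1):
        index.setdefault(query[i:i + w], []).append(i)
    hits = []
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], []):
            hits.append((i, j))
    return hits


def _extend(query, subject, qi, sj, step, match, mismatch, x_drop):
    """Walk from (qi, sj) one column at a time in direction step (+1 or -1).

    Returns the best score gain found and how many columns it used.
    """
    run = best = best_len = length = 0
    while 0 <= qi < len(query) and 0 <= sj < len(subject):
        run += match if query[qi] == subject[sj] else mismatch
        length += 1
        if run > best:
            best, best_len = run, length
        elif best - run > x_drop:
            break  # X-drop: score has fallen too far below the best seen
        qi += step
        sj += step
    return best, best_len


def extend_seed(query, subject, qi, sj, w, match=2, mismatch=-1, x_drop=3):
    """Ungapped-extension sketch for the seed at (qi, sj).

    Returns (score, query_start, query_end) with query_end exclusive; the
    subject span is the same shifted by sj - qi, since no gaps are allowed.
    """
    right, r_len = _extend(query, subject, qi + w, sj + w, +1, match, mismatch, x_drop)
    left, l_len = _extend(query, subject, qi - 1, sj - 1, -1, match, mismatch, x_drop)
    return w * match + left + right, qi - l_len, qi + w + r_len


query, subject = "AGTTAC", "ACTTAG"
seeds = find_seeds(query, subject, w=3)
print(seeds)  # [(2, 2)], the shared word TTA
print(extend_seed(query, subject, *seeds[0], w=3))  # (7, 0, 5), i.e. AGTTA / ACTTA
```

On these sequences the extension keeps the leading A/A column (worth it despite the G/C mismatch) but drops the trailing C/G mismatch, because including it would lower the score.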

An extremely fast but considerably less sensitive alternative to BLAST for comparing nucleotide sequences to a genome is BLAT (BLAST-Like Alignment Tool). BLASTZ is a version designed for comparing multiple large genomes or chromosomes. Another well-known program, PatternHunter, produces significantly better sensitivity than BLAST at the same speed, or very similar sensitivity at a much faster speed.

Parallel BLAST

Parallel versions of BLAST have been implemented using MPI and Pthreads, and have been ported to various platforms including Windows, Linux, Solaris, Mac OS X, and AIX. Popular approaches to parallelizing BLAST include query distribution, hash table segmentation, computation parallelization, and database segmentation (partitioning).
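The database segmentation approach can be sketched as: split the database into partitions, search each partition independently, then merge the per-partition hits and keep the best ones. The Python sketch below assumes a hypothetical stand-in scorer (the best ungapped score over all diagonals) in place of the full BLAST pipeline, and uses a thread pool only to show the structure; because of Python's global interpreter lock it would not give a real speedup, whereas the MPI and Pthreads implementations mentioned above spread the work across processes, machines, or native threads.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def ungapped_score(query: str, subject: str, match: int = 2, mismatch: int = -1) -> int:
    """Hypothetical stand-in scorer: best ungapped local score over all diagonals."""
    best = 0
    for offset in range(-len(query) + 1, len(subject)):
        run = 0
        for i in range(len(query)):
            j = i + offset
            if 0 <= j < len(subject):
                # Running score along this diagonal, floored at 0 (local alignment).
                run = max(0, run + (match if query[i] == subject[j] else mismatch))
                best = max(best, run)
    return best


def search_partition(query, partition):
    """Score every (name, sequence) record in one database partition."""
    return [(ungapped_score(query, seq), name) for name, seq in partition]


def parallel_search(query, database, n_parts=4, top_k=3):
    # Database segmentation: deal records round-robin into n_parts partitions.
    parts = [database[k::n_parts] for k in range(n_parts)]
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        per_part = list(pool.map(lambda part: search_partition(query, part), parts))
    # Merge step: pool all partition results and keep the best-scoring hits.
    merged = [hit for hits in per_part for hit in hits]
    return heapq.nlargest(top_k, merged)


database = [("seq1", "ACTTAG"), ("seq2", "GGGGGG"), ("seq3", "AGTTAC"), ("seq4", "TTTACC")]
print(parallel_search("AGTTAC", database))
# [(12, 'seq3'), (8, 'seq4'), (7, 'seq1')]
```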

Program

The BLAST program can either be downloaded and run as the command-line utility blastall, or accessed for free over the web. The BLAST web server, hosted by NCBI, allows anyone with a web browser to perform similarity searches against constantly updated databases of proteins and DNA that include most newly sequenced organisms.

BLAST is actually a family of programs (all included in the blastall executable). The following are some of the programs, ranked mostly in order of importance:
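As a hedged summary, the core members of the family are commonly distinguished by the type of query and database they compare: blastn, blastp, blastx, tblastn and tblastx. The small Python lookup below encodes that mapping; the `BLAST_PROGRAMS` dictionary and `choose_program` helper are hypothetical illustrations, although the program names themselves are the ones accepted by blastall's `-p` option.

```python
# Hypothetical lookup: (query type, database type) -> blastall program name.
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("nucleotide", "protein"): "blastx",   # query translated in all six frames
    ("protein", "nucleotide"): "tblastn",  # database translated in all six frames
}


def choose_program(query_type: str, db_type: str, translate_both: bool = False) -> str:
    """Pick a BLAST program name for the given molecule types (illustrative only)."""
    if translate_both:
        # tblastx translates both a nucleotide query and a nucleotide database.
        if (query_type, db_type) != ("nucleotide", "nucleotide"):
            raise ValueError("tblastx needs a nucleotide query and a nucleotide database")
        return "tblastx"
    try:
        return BLAST_PROGRAMS[(query_type, db_type)]
    except KeyError:
        raise ValueError(f"unsupported combination: {query_type!r} vs {db_type!r}") from None


print(choose_program("protein", "nucleotide"))  # tblastn
```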

From Wikipedia, the Free Encyclopedia. All text is available under the terms of the GNU Free Documentation License; see Wikipedia Copyrights for details.
