A Dictionary of Bioinformatics

The Stanford Glossary may also be useful!


BLAST
PSI-BLAST

BLAST is a very rapid sequence searching method. The original BLAST did not allow gaps in the sequence matches. WU-BLAST and the current version of BLAST (BLAST2) overcome that problem.
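
To make the "no gaps" restriction concrete, here is a minimal sketch of the best gap-free local match between two sequences. It is not BLAST's heuristic (there is no word seeding, substitution matrix, or statistics); the function name and the +1/-1 scoring are assumptions chosen for illustration.

```python
def best_ungapped_segment(a, b, match=1, mismatch=-1):
    """Return (score, start_a, start_b, length) of the best gap-free local match."""
    best = (0, 0, 0, 0)
    # Diagonal d pairs a[i] with b[i + d]; with no gaps, a match stays on one diagonal.
    for d in range(-(len(a) - 1), len(b)):
        run_score = 0
        run_start = None
        for i in range(max(0, -d), min(len(a), len(b) - d)):
            if run_score <= 0:
                # A non-positive running score cannot help; start a new segment here.
                run_score, run_start = 0, i
            run_score += match if a[i] == b[i + d] else mismatch
            if run_score > best[0]:
                best = (run_score, run_start, run_start + d, i - run_start + 1)
    return best


score, start_a, start_b, length = best_ungapped_segment("GATTACA", "CCATTACC")
segment = "GATTACA"[start_a:start_a + length]
```

Because every candidate match lies on a single diagonal, an insertion or deletion in one sequence splits a true alignment into separate segments, which is the limitation that gapped versions address.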

PSI-BLAST is an enhancement in which searches are iterated using a position-specific scoring matrix. The matrix used in each iteration is computed from the significant alignments found in the previous iteration. Success depends on the quality of the matrix, which in turn depends on whether the sequences that match the query with an E-value better than a chosen threshold are truly homologous. The sequences used to generate the matrix are weighted according to Henikoff S and Henikoff JG 1994 J. Mol. Biol. 243:574-578, so that highly similar sequences are weighted lower than more divergent ones.
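
The effect of the weighting can be sketched with a simple position-based scheme: in each alignment column a sequence receives 1/(k*n), where k is the number of distinct residues in the column and n is the number of sequences sharing its residue. This is an illustrative sketch only; the function name is hypothetical, gap characters are treated like any other symbol, and PSI-BLAST's actual implementation differs in detail.

```python
from collections import Counter


def position_based_weights(alignment):
    """Position-based weights for a list of equal-length aligned sequences."""
    n_cols = len(alignment[0])
    weights = [0.0] * len(alignment)
    for col in range(n_cols):
        column = [seq[col] for seq in alignment]
        counts = Counter(column)
        k = len(counts)  # number of distinct residues in this column
        for idx, residue in enumerate(column):
            # Residues shared by many sequences contribute less per sequence.
            weights[idx] += 1.0 / (k * counts[residue])
    total = sum(weights)
    return [w / total for w in weights]  # normalise so the weights sum to 1


weights = position_based_weights(["ACDE", "ACDE", "GHIK"])
```

In this toy alignment the two identical sequences share the weight that the single divergent sequence receives on its own, so a cluster of near-duplicates cannot dominate the matrix.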

See:
Goodman L 1997 More blast for the buck. Genome Research. 7:858-859
Altschul SF et al. 1997 Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389-3402.


BLOCKS

Blocks are multiply aligned ungapped segments corresponding to the most highly conserved regions of proteins. Block Searcher, Get Blocks and Block Maker are aids to detection and verification of protein sequence homology.

The Blocks database is made automatically by looking for the most highly conserved regions in groups of proteins documented in the Prosite Database.

See: http://www.blocks.fhcrc.org/


COGS

Clusters of Orthologous Groups

Typically a COG database is built by pairwise comparisons of all proteins from a set of complete genomes. For each protein, the best hit (BeT) in each of the other genomes is identified. A COG is then defined by a triangular relationship of BeTs.
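
The triangle criterion can be sketched as a brute-force search over triples of proteins from three different genomes. The data layout (`best_hits`, `genome_of`), the function name, and the rule that a best hit in either direction is enough to link two proteins are all assumptions made for illustration; the text above does not specify them.

```python
from itertools import combinations


def bet_triangles(best_hits, genome_of):
    """Find triples of proteins from three genomes linked pairwise by BeTs.

    best_hits[p][g] is the best hit of protein p in genome g (hypothetical layout).
    genome_of[p] is the genome that protein p belongs to.
    """
    def linked(p, q):
        # Assumption: a best hit in either direction links the two proteins.
        return (best_hits.get(p, {}).get(genome_of[q]) == q
                or best_hits.get(q, {}).get(genome_of[p]) == p)

    triangles = []
    for p, q, r in combinations(sorted(genome_of), 3):
        if len({genome_of[p], genome_of[q], genome_of[r]}) < 3:
            continue  # all three proteins must come from different genomes
        if linked(p, q) and linked(q, r) and linked(p, r):
            triangles.append((p, q, r))
    return triangles


genome_of = {"a1": "A", "a2": "A", "b1": "B", "c1": "C"}
best_hits = {
    "a1": {"B": "b1", "C": "c1"},
    "a2": {"B": "b1"},
    "b1": {"A": "a1", "C": "c1"},
    "c1": {"A": "a1", "B": "b1"},
}
triangles = bet_triangles(best_hits, genome_of)
```

Checking every triple is cubic in the number of proteins, so this is only practical for toy inputs; it is meant to show the relationship, not how a real database is built.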

The database is used by running BLAST with an unknown sequence against all of the genomes in the COGs database and looking for cases in which the unknown sequence has BeTs to more than one member of a COG.

See:
Tatusov RL, Koonin EV, and Lipman DJ. (1997) A genomic perspective on protein families. Science. 278:631-637.


COILS

A program for finding coiled coils.

See:
Lupas, A (1996) Prediction and analysis of coiled coil structures. Methods Enzymol 266:513-525.


ISOLOG

Duplicated genes within a single organism with the same activity. Diversification of function during evolution of isologs will lead to paralogs.


KEGG

Kyoto Encyclopedia of Genes and Genomes (KEGG) is an effort to computerize current knowledge of molecular and cellular biology in terms of the information pathways that consist of interacting molecules or genes and to provide links from the gene catalogs produced by genome sequencing projects.

Tutorial: From Pathway to Genes and Molecules


ORTHOLOG

Homologous sequences from different genomes with the same function. Strictly, two genes are orthologs only if they had a common ancestral gene in the most recent common ancestral species. Defined by Fitch, W.M. (1970) Syst. Zool. 19:99-110.


PARALOG

Paralogs are homologous (i.e. they have an evolutionary relationship). Two definitions have been used: 1) similar sequences with different functions which have arisen through duplication prior to diversification; 2) similar sequences in a single organism (in some cases these might better be named isologs). Originally the term was used to imply that functional differences would evolve after duplication. Without biochemical data, one cannot prove that the functions are different, so the word is often used loosely to describe similar proteins thought not to be orthologs. Defined by Fitch, W.M. (1970) Syst. Zool. 19:99-110.


PHD

A neural-network based program for predicting secondary structure.

See:
Rost, B (1996) PHD: predicting one-dimensional protein structure by profile-based neural networks. Methods Enzymol 266:525-539.


PRODOM

The ProDom protein domain database consists of an automatic compilation of homologous domains.


Radiation Hybrid (RH) Mapping

A technique, used as part of the human genome mapping project, for identifying landmarks every 100 kb in the human genome. An RH map simply shows the order of a set of landmarks, the distances between neighbours, and an indication of the level of support for the ordering.

A set of overheads on RH mapping is available from http://www.nhgri.nih.gov/COURSE99/Pdf/matise.pdf and a set of useful links is at http://linkage.rockefeller.edu/tara/rhmap/


SEALS

System for Easy Analysis of Lots of Sequences

Designed to let large-scale bioinformatics research projects rapidly implement standard sequence analysis protocols and design new investigations.

See:
Walker, DR, and Koonin, EV (1997) SEALS: A System for Easy Analysis of Lots of Sequences. Intelligent Systems for Molecular Biology 5:333-339

Functions include:


SEG

A masking program used by BLAST to identify low-complexity regions. It runs automatically as part of BLAST, but a standalone version may be downloaded (ftp://ncbi.nlm.nih.gov/pub/seg/seg/). Note that the automatic BLAST version masks only your probe (query) sequence, not the database itself.
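
The general idea of low-complexity masking can be sketched with a sliding window and Shannon entropy. This is a simplified illustration, not SEG's actual algorithm; the function names, the window length of 12, and the entropy threshold of 2.2 bits are assumptions chosen for the example.

```python
import math
from collections import Counter


def shannon_entropy(window):
    """Shannon entropy (in bits) of the residue composition of a window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mask_low_complexity(seq, window=12, threshold=2.2):
    """Replace residues that fall in any low-entropy window with 'X'."""
    masked = list(seq)
    for start in range(0, len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) <= threshold:
            for i in range(start, start + window):
                masked[i] = "X"
    return "".join(masked)


diverse = "ACDEFGHIKLMNPQRSTVWY"
masked_repeat = mask_low_complexity("Q" * 20)
masked_diverse = mask_low_complexity(diverse)
masked_mixed = mask_low_complexity("ACDEFGHIKLMN" + "Q" * 12 + "PRSTVWYACDEF")
```

A homopolymer run has zero entropy and is masked completely, while a window of twelve distinct residues has entropy log2(12), about 3.58 bits, and is left untouched.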

See:
Wootton, JC, and Federhen, S (1996) Analysis of compositionally biased regions in sequence databases. Methods Enzymol 266:554-571.


SIGNALP

A neural-network based program for finding signal peptides.

See:
Nielsen H, Engelbrecht J, Brunak S, and von Heijne G (1997) Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Protein Engineering 10:1-6.

For a review of signal prediction methods, see:
Claros MG, Brunak S, and von Heijne G (1997) Prediction of N-terminal protein sorting signals. Current Opinion in Structural Biology 7:394-398.