Threading and Fold Recognition

As a general rule, all proteins within a given fold are homologous, even though their sequences may have diverged considerably. (If there is less than ~25% sequence identity, additional evidence, such as conserved function and more statistically rigorous measures of sequence similarity, is needed to confirm homology.) However, certain folds ("superfolds") contain many non-homologous protein families. In these cases, unrelated sequences adopt the same fold.
e.g. soybean trypsin inhibitor and interleukin 1,
or myoglobin and colicin A
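
The ~25% identity threshold mentioned above can be checked for a pair of already-aligned sequences with a few lines of Python. This is only a minimal sketch: the function names, the gap character and the choice of denominator (columns where neither sequence has a gap) are assumptions made for illustration, and other definitions of percent identity are in common use.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # gapped columns are not counted (an assumption)
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Approximate cutoff quoted in the text; below it, identity alone is weak evidence.
IDENTITY_THRESHOLD = 25.0


def needs_extra_evidence(aligned_a: str, aligned_b: str) -> bool:
    """True if identity is too low to infer homology from identity alone."""
    return percent_identity(aligned_a, aligned_b) < IDENTITY_THRESHOLD
```

For example, `percent_identity("ACDE-F", "ACDQGF")` compares five ungapped columns, four of which match.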

There are almost certainly a limited number of protein folds in nature.

Threading and fold recognition attempt to match a sequence of unknown structure against a library of known folds.






Profile matching

Bowie et al., Science 253(1991), 164

This technique was designed to find sequences in the sequence database which are likely to adopt a known fold.








Fold recognition

e.g. Russell, Copley & Barton, J. Mol. Biol. 259(1996), 349-365

Predict the secondary structure of the sequence of unknown structure and align the prediction against the known secondary structures of the folds in the library.

In practice, the sequence itself, or a simplified version of it, may also be used in the alignment. Predicted solvent accessibility may also be included, and filters that ensure good packing may be applied when scoring the predictions.
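
The idea of aligning a predicted secondary structure against the secondary structures of known folds can be sketched with ordinary (single) dynamic programming over three-state strings (H = helix, E = strand, C = coil). The scoring values, the linear gap penalty and the function names below are illustrative assumptions, not the scheme of the cited paper.

```python
def ss_score(predicted_state: str, observed_state: str) -> int:
    """Score one aligned pair of secondary-structure states (assumed values)."""
    if predicted_state == observed_state:
        return 1 if predicted_state == "C" else 2  # reward helix/strand agreement more
    if "C" in (predicted_state, observed_state):
        return -1  # coil against helix or strand: mild penalty
    return -2      # helix against strand: strong penalty


def align_score(predicted: str, observed: str, gap: int = -1) -> int:
    """Global alignment score of two secondary-structure strings."""
    n, m = len(predicted), len(observed)
    prev = [j * gap for j in range(m + 1)]  # row for the empty prefix of `predicted`
    for i in range(1, n + 1):
        cur = [i * gap] + [0] * m
        for j in range(1, m + 1):
            cur[j] = max(
                prev[j - 1] + ss_score(predicted[i - 1], observed[j - 1]),  # align pair
                prev[j] + gap,      # gap in the observed structure
                cur[j - 1] + gap,   # gap in the predicted string
            )
        prev = cur
    return prev[m]


def rank_folds(predicted: str, library: dict) -> list:
    """Return (score, fold name) pairs, best first. `library` maps name -> SS string."""
    scored = [(align_score(predicted, ss), name) for name, ss in library.items()]
    return sorted(scored, reverse=True)
```

With a toy library such as `{"helical": "CHHHHHHC", "sheet": "CEEEEEEC"}`, a prediction of `"CHHHHHHC"` ranks the helical fold first. A real method would also weight the prediction confidence and normalise for length.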








Threading

e.g. Jones et al., Nature 358(1992), 86-89

Matches a sequence to all known folds directly in three dimensions.

Each structure in the fold library is examined in turn.

For each threading of the sequence onto the structure, an "energy" (actually a log odds propensity score for residue pair interactions plus a solvation propensity) is calculated.
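
As a rough illustration of such a score, the sketch below sums a pair term over residues in contact and a solvation term over all residues, with each term defined as a negative log odds ratio. Everything specific here is an assumption made for the example: the two-class (hydrophobic/polar) alphabet, the frequency values, the 8 Å contact cutoff and the ungapped threading in which residue i sits on structure position i. Published threading potentials are considerably more detailed.

```python
import math
from itertools import combinations


def log_odds_energy(observed: float, expected: float) -> float:
    """Pseudo-energy: negative log of observed over expected frequency."""
    return -math.log(observed / expected)


HYDROPHOBIC = set("AVLIMFWC")  # assumed class definition


def residue_class(amino_acid: str) -> str:
    return "H" if amino_acid in HYDROPHOBIC else "P"


# Hypothetical (observed, expected) contact frequencies per class pair.
PAIR_ENERGY = {
    ("H", "H"): log_odds_energy(0.30, 0.20),
    ("H", "P"): log_odds_energy(0.40, 0.50),
    ("P", "P"): log_odds_energy(0.30, 0.30),
}

# Hypothetical burial frequencies: key is (class, is_buried).
SOLVATION_ENERGY = {
    ("H", True): log_odds_energy(0.7, 0.5),
    ("H", False): log_odds_energy(0.3, 0.5),
    ("P", True): log_odds_energy(0.3, 0.5),
    ("P", False): log_odds_energy(0.7, 0.5),
}


def threading_energy(sequence, coords, buried, contact_cutoff=8.0, min_separation=3):
    """Score an ungapped threading: residue i is placed on structure position i."""
    if not (len(sequence) == len(coords) == len(buried)):
        raise ValueError("sequence, coords and buried must have equal length")
    classes = [residue_class(aa) for aa in sequence]
    # Solvation term: one contribution per residue.
    energy = sum(SOLVATION_ENERGY[(c, b)] for c, b in zip(classes, buried))
    # Pair term: one contribution per contacting pair not close in sequence.
    for i, j in combinations(range(len(sequence)), 2):
        if j - i < min_separation:
            continue
        if math.dist(coords[i], coords[j]) <= contact_cutoff:
            energy += PAIR_ENERGY[tuple(sorted((classes[i], classes[j])))]
    return energy
```

A sequence that places hydrophobic residues on buried positions scores lower (better) than one that places them on exposed positions of the same structure.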

This "energy" is minimized by allowing insertions and deletions in the threading of sequence onto structure using a technique called "double dynamic programming". (Normal single dynamic programming is used in sequence alignment.)

The energies are plotted as a histogram and low energy threadings are selected as likely fold match candidates. A "z-score" (the number of standard-deviation units away from the mean) is calculated to indicate the confidence in the best matches.
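
The z-score calculation can be written down directly from that definition. The sketch below assumes the threading energies are already available as a mapping from fold name to energy; the function names and the use of the population standard deviation are choices made for the example.

```python
from statistics import mean, pstdev
from typing import Dict, Tuple


def z_scores(energies: Dict[str, float]) -> Dict[str, float]:
    """Z-score of each fold's energy relative to the whole fold library."""
    values = list(energies.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        raise ValueError("all energies are identical; z-scores are undefined")
    return {name: (energy - mu) / sigma for name, energy in energies.items()}


def best_match(energies: Dict[str, float]) -> Tuple[str, float]:
    """Fold with the lowest energy and its z-score (more negative = more confident)."""
    z = z_scores(energies)
    name = min(z, key=z.get)
    return name, z[name]
```

Because low energies indicate good matches, the best candidates have the most negative z-scores.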








Other methods

Other methods (e.g. Cohen et al., J. Mol. Biol. 156(1982), 821) work by predicting secondary structure elements and attempting to assemble them in 3D. All sensible tertiary arrangements are explored, and rules about preferred packing arrangements can be applied.

Other filters include:

Ab initio methods attempt to fold a protein using purely energetic methods. They generally use a simplified model of the protein (e.g. treating each residue as a single point in space).
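
To make the "one point per residue" idea concrete, the sketch below uses a deliberately crude model that is an assumption of this example, not something taken from the methods above: residues are either hydrophobic (H) or polar (P), each occupies one site of a 2D square lattice, and the energy is -1 for every pair of H residues on neighbouring sites that are not adjacent in the chain. The lowest-energy conformation is found by exhaustive enumeration, which is feasible only for very short chains.

```python
MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))


def hp_energy(sequence: str, path: list) -> int:
    """-1 per H-H lattice contact between residues not adjacent in the chain."""
    index_at = {pos: i for i, pos in enumerate(path)}
    energy = 0
    for i, (x, y) in enumerate(path):
        if sequence[i] != "H":
            continue
        for dx, dy in MOVES:
            j = index_at.get((x + dx, y + dy))
            # j > i + 1 counts each contact once and skips chain neighbours.
            if j is not None and j > i + 1 and sequence[j] == "H":
                energy -= 1
    return energy


def fold(sequence: str):
    """Enumerate all self-avoiding walks; return (lowest energy, one such path)."""
    if not sequence:
        return 0, []
    best = {"energy": None, "path": None}

    def extend(path, occupied):
        if len(path) == len(sequence):
            energy = hp_energy(sequence, path)
            if best["energy"] is None or energy < best["energy"]:
                best["energy"], best["path"] = energy, list(path)
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            nxt = (x + dx, y + dy)
            if nxt in occupied:
                continue  # keep the walk self-avoiding
            path.append(nxt)
            occupied.add(nxt)
            extend(path, occupied)
            path.pop()
            occupied.remove(nxt)

    extend([(0, 0)], {(0, 0)})
    return best["energy"], best["path"]
```

For the four-residue chain `"HPPH"`, the lowest-energy conformation bends the chain into a square so that the two H residues touch. The number of conformations grows exponentially with chain length, which is one reason practical methods rely on sampling rather than enumeration.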

While much work is going on in this area, none of the methods is very successful at present.