As a general rule, all proteins within a given fold are homologous, even though their sequences may have diverged considerably. (If there is less than ~25% sequence identity, then one needs additional evidence, such as conserved function and more statistically rigorous measures of sequence similarity, to confirm that they are homologous.) However, certain folds ("superfolds") contain many non-homologous protein families. In these cases unrelated sequences adopt the same fold.
e.g. Soybean trypsin inhibitor and interleukin 1
-or- Myoglobin and colicin-A
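The ~25% identity threshold mentioned above refers to a pair of sequences that have already been aligned. As a minimal sketch, percent identity can be computed as below. The function name and the choice of denominator (columns where neither sequence has a gap) are assumptions made for illustration; other definitions of the denominator are in use and give somewhat different values.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns in which neither sequence has a gap.

    Assumes both strings come from the same pairwise alignment, so they
    have equal length and gaps are written with the `gap` character.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0   # columns with a residue in both sequences
    identical = 0  # columns where those residues are the same
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            identical += 1

    # With no comparable columns the identity is reported as 0.
    return 100.0 * identical / compared if compared else 0.0
```

A value below roughly 25 from such a calculation is the situation in which, as noted above, sequence identity alone does not establish homology.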
There are almost certainly a limited number of protein folds in nature.
Threading and fold recognition attempt to match a sequence of unknown structure against a library of known folds.
Bowie et al., Science 253(1991), 164
This technique was designed to find sequences in the sequence database which are likely to adopt a known fold.
e.g. Russell, Copley & Barton, J. Mol. Biol. 259(1996), 349-365
Use secondary structure prediction on the sequence of unknown structure.
In practice, the sequence itself, or a simplified version of the sequence, may also be used in the alignment. Predictions of solvent accessibility may also be included, and filters which ensure good packing may be applied in scoring the predictions.
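One way to picture this kind of approach is as an ordinary (single) dynamic programming alignment of a predicted secondary structure string against the observed secondary structure string of each library fold. The sketch below is only an illustration of that idea and is not the published method: the H/E/C string encoding, the scoring values, and the names align_score and rank_folds are all assumptions.

```python
def align_score(pred: str, known: str,
                match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Global (Needleman-Wunsch) alignment score of two secondary structure
    strings, e.g. over the alphabet H (helix), E (strand), C (coil).

    The scoring values are arbitrary illustrative choices.
    """
    rows, cols = len(pred) + 1, len(known) + 1
    score = [[0] * cols for _ in range(rows)]

    # First column and row: aligning a prefix against nothing costs gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            same = pred[i - 1] == known[j - 1]
            diagonal = score[i - 1][j - 1] + (match if same else mismatch)
            gap_in_known = score[i - 1][j] + gap
            gap_in_pred = score[i][j - 1] + gap
            score[i][j] = max(diagonal, gap_in_known, gap_in_pred)

    return score[-1][-1]


def rank_folds(pred: str, library: dict) -> list:
    """Return (score, fold name) pairs, best-scoring fold first."""
    return sorted(((align_score(pred, ss), name)
                   for name, ss in library.items()), reverse=True)


# Hypothetical two-entry fold library, for illustration only.
library = {"all_alpha": "HHHHCCHHHH", "all_beta": "EEEECCEEEE"}
ranking = rank_folds("HHHHCCHHHH", library)
```

Additional terms of the kind mentioned above, such as predicted solvent accessibility, would enter as extra contributions to the per-position score.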
e.g. Jones et al., Nature 358(1992), 86-89
Matches a sequence to all known folds directly in three dimensions.
Each structure in the fold library is examined in turn.
For each threading of the sequence onto the structure, an "energy" (actually a log odds propensity score for residue pair interactions plus a solvation propensity) is calculated.
This "energy" is minimized by allowing insertions and deletions in the threading of sequence onto structure using a technique called "double dynamic programming". (Normal single dynamic programming is used in sequence alignment.)
The energies are plotted as a histogram and low energy threadings are selected as likely fold match candidates. A "z-score" (the number of standard-deviation units away from the mean) is calculated to indicate the confidence in the best matches.
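The z-score step can be sketched as follows, assuming one best threading "energy" per fold in the library. The function name, the fold names, the energy values, and the use of the population standard deviation are assumptions made for this example. Because low energies indicate good matches, the best candidates have the most negative z-scores.

```python
from statistics import mean, pstdev


def z_scores(energies: dict) -> dict:
    """Convert threading energies into z-scores.

    z = (energy - mean) / standard deviation, taken over all folds
    in the library. Uses the population standard deviation (an
    assumption; a sample standard deviation could be used instead).
    """
    values = list(energies.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("all energies are identical; z-score is undefined")
    return {name: (e - mu) / sigma for name, e in energies.items()}


# Made-up energies for a hypothetical five-fold library.
energies = {"fold_a": -10.0, "fold_b": 0.0, "fold_c": 0.0,
            "fold_d": 0.0, "fold_e": 0.0}
z = z_scores(energies)

# The fold with the lowest z-score is the top candidate.
best = min(z, key=z.get)
```

In this toy library the mean is -2 and the standard deviation is 4, so fold_a lies two standard deviations below the mean while the other folds sit half a standard deviation above it.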
Other methods (e.g. Cohen et al., J. Mol. Biol. 156(1982), 821) work by predicting secondary structure elements and attempting to assemble them in 3D. All sensible tertiary arrangements are explored, and rules about preferred packing arrangements can be applied.
Other filters include:
While much work is going on in this area, none of the methods is very successful at present.