Detection of homology - why is it useful?

Proteins are homologous if they share a common ancestor. Homology is therefore an all-or-nothing property, so phrases such as "the proteins show 40% homology" are meaningless; "sequence identity" or "similarity" should be used instead. Similarity is calculated from a similarity matrix such as the Dayhoff Mutation Matrix (Dayhoff et al. (1978) in 'Atlas of Protein Sequence and Structure' (ed. Dayhoff), National Biomedical Research Foundation, Washington DC, Volume 5, Suppl. 3, pp. 345-352).
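To make the distinction between identity and similarity concrete, here is a minimal sketch that computes both for a pair of already-aligned sequences. The function name, the example sequences and the grouping of "similar" residues are assumptions made for illustration; the groups are a crude stand-in and are not the Dayhoff matrix. Conventions also differ on the denominator; this sketch divides by the number of columns without a gap.

```python
# Toy groups of chemically similar residues. This is an assumption for
# illustration only; a real analysis would score pairs with a matrix
# such as Dayhoff's.
SIMILAR_GROUPS = ["ILVM", "FYW", "KRH", "DE", "ST", "NQ"]

def percent_identity_and_similarity(a, b, gap="-"):
    """Return (% identity, % similarity) over the ungapped columns."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = similar = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue  # columns containing a gap are not counted
        compared += 1
        if x == y:
            identical += 1
            similar += 1  # identical residues also count as similar
        elif any(x in g and y in g for g in SIMILAR_GROUPS):
            similar += 1
    if compared == 0:
        return 0.0, 0.0
    return 100.0 * identical / compared, 100.0 * similar / compared

# Hypothetical aligned pair: 3 identical and 3 similar out of 6 columns.
ident, sim = percent_identity_and_similarity("MKV-LSD", "MRVALTE")
print(ident, sim)
```

Neither number says anything directly about common ancestry; both are only evidence from which homology may be inferred.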

Homologous proteins show the same three-dimensional protein fold. In general, proteins with the same fold are homologues (i.e. structurally convergent evolution is rare), with the exception of certain very commonly occurring "super-folds". Where two proteins share the same fold and the same (or a related) function, they are almost certainly homologues even if they show no significant sequence similarity. Evolution is very unlikely to have solved the same problem twice in essentially the same way!

Therefore, if we wish to build a model of a protein of unknown structure and we know the structure of a homologous protein, we can use this as a template.

Similarly, if we wish to know the function of a protein and we know the function of a homologue, we can transfer some level of functional annotation from the homologue.

Various sophisticated methods are available to try to identify very distant homologues, whose sequence identity is so low that it falls in what is known as "the twilight zone". Many of these rely explicitly or implicitly on searching for sparse patterns of conserved residues in a sea of noise (freely mutating residues).

                     Percent Identity   Alignment Method
                     100                Automatic pairwise
                      90
                      80
                      70
                      60
                      50
                      40                Consensus
    Twilight Zone     30
                      20                Profile
                      10
                       0                Structure prediction
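The idea of a sparse pattern of conserved residues in a sea of noise can be sketched with a regular expression in which a few positions are fixed and all the others are free to vary. This is a much cruder device than the profile and consensus methods listed in the table, and the motif, the function name and the two sequences below are hypothetical, chosen only to illustrate the principle.

```python
import re

# Hypothetical sparse motif: two cysteines and two histidines at roughly
# fixed spacings. Every "." position may be any residue (the "noise").
MOTIF = re.compile(r"C.{2,4}C.{12}H.{3,5}H")

def find_motif(seq):
    """Return (start index, matched segment) or None if the motif is absent."""
    m = MOTIF.search(seq)
    return (m.start(), m.group()) if m else None

# Made-up sequences: the second differs only in that the two cysteines
# have been replaced by alanine, which destroys the conserved pattern.
with_motif = "MAKPYKCPECGKSFSQKSNLQKHQRTHTGEK"
without_motif = "MAKPYKAPEAGKSFSQKSNLQKHQRTHTGEK"

print(find_motif(with_motif))
print(find_motif(without_motif))
```

Only four of the matched positions are constrained, so two sequences can both match while sharing very little overall identity.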

Alignment of two random sequences can produce around 20% sequence identity. One might expect 5% given that there are 20 amino acids, but amino acid usage is non-uniform and an alignment is optimised (including the placement of gaps) to maximise the number of matches, so ~20% identity is observed on average between randomly chosen pairs of sequences.
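The 5% baseline mentioned above is the chance that two residues drawn independently are the same, which is the sum of the squared residue frequencies. The sketch below computes it for a uniform composition and for a skewed one. The function name and the skewed composition are assumptions (the numbers are illustrative, not measured data), and the calculation covers only a fixed, ungapped pairing of positions; it does not model the extra matches an alignment program gains by optimising the alignment, so it is not expected to reproduce the ~20% figure.

```python
def chance_identity(freqs):
    """Probability that two independently drawn residues are identical.

    `freqs` maps residue -> relative abundance (need not sum to 1).
    """
    total = sum(freqs.values())
    return sum((f / total) ** 2 for f in freqs.values())

# Uniform usage of the 20 amino acids.
uniform = {aa: 1.0 for aa in "ACDEFGHIKLMNPQRSTVWY"}

# Hypothetical skewed composition: a few residues made commoner or rarer.
skewed = dict(uniform, L=2.0, A=1.8, G=1.6, W=0.3, C=0.4)

print(round(100 * chance_identity(uniform), 2))  # 5.0, i.e. 1 in 20
print(round(100 * chance_identity(skewed), 2))   # higher than the uniform case
```

Any departure from uniform usage raises this baseline, because the sum of squares is smallest when all frequencies are equal.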

Where we can find a homologue of known structure/function we can use "comparative modelling" and inheritance of annotation.