RMS Deviation and Least Squares Fitting

Root Mean Square Deviation (RMSD) is the conventional measure used to say how similar one structure is to another. It is thus useful as a measure of the accuracy of a model when a crystal structure of the protein is available for comparison.

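The RMSD between two structures is simply the square root of the average squared distance between equivalent atoms. Mathematically, this is defined by the equation:

$$ \mathrm{RMSD} = \sqrt{ \frac{1}{N} \sum_{n=1}^{N} d_n^2 } $$

where N is the number of equivalent atom pairs and d_n is the distance between the two atoms of the n-th pair.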
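As a minimal sketch of this definition, the function below computes the RMSD for two sets of coordinates that are already superimposed and listed in the same atom order. The function name `rmsd`, the use of NumPy, and the two three-atom structures are assumptions made for illustration only.

```python
import numpy as np

def rmsd(coords_a, coords_b):
    """RMSD between two equal-length sets of equivalent atom coordinates."""
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("coordinate arrays must have the same shape")
    # d_n^2: squared distance between the n-th pair of equivalent atoms
    squared_distances = np.sum((a - b) ** 2, axis=1)
    # Square root of the average squared distance
    return float(np.sqrt(squared_distances.mean()))

# Two hypothetical three-atom structures (coordinates in Å)
structure_a = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
structure_b = [[0.0, 0.0, 1.0], [1.0, 0.0, 1.0], [2.0, 0.0, 1.0]]

# Every atom is displaced by exactly 1 Å, so the RMSD is 1.0
print(rmsd(structure_a, structure_b))  # prints 1.0
```

Row n of one array is taken to be equivalent to row n of the other, so the caller is responsible for putting the atoms in matching order.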

RMSD does have limitations. Because each distance is squared before averaging, a single very poorly placed atom can mask the fact that all other atoms are placed extremely well. Similarly, if a structure is locally correct in two separate regions but a single bond between the two regions is rotated, the local accuracy will be masked.
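The effect of a single badly placed atom can be illustrated numerically. The sketch below uses an invented 100-atom structure; the coordinates, the 0.1 Å and 10 Å displacements, and all names are assumptions chosen only to make the arithmetic easy to follow.

```python
import numpy as np

def rmsd(coords_a, coords_b):
    """RMSD between two equal-length sets of equivalent atom coordinates."""
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))

# Hypothetical 100-atom reference structure (arbitrary coordinates in Å)
reference = np.arange(300, dtype=float).reshape(100, 3)

# Model A: every atom is displaced by 0.1 Å along x
model_a = reference + np.array([0.1, 0.0, 0.0])

# Model B: the same as model A, except that one atom is misplaced by 10 Å
model_b = model_a.copy()
model_b[0] = reference[0] + np.array([10.0, 0.0, 0.0])

rmsd_a = rmsd(reference, model_a)
rmsd_b = rmsd(reference, model_b)
print(f"model A: {rmsd_a:.2f} Å")  # prints "model A: 0.10 Å"
print(f"model B: {rmsd_b:.2f} Å")  # prints "model B: 1.00 Å"
```

In model B, 99 of the 100 atoms are still within 0.1 Å of their reference positions, yet the single outlier raises the RMSD roughly tenfold.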

RMSDs are often quoted over "all atoms", "backbone atoms" or "C-alpha atoms". Different authors interpret "backbone" differently; it may mean N, C-alpha, C; or N, C-alpha, C, O; or N, C-alpha, C-beta, C, O.
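Because the atom set changes which distances enter the average, it helps to make the selection explicit. The sketch below is illustrative only: the selection labels and the `select_atoms` helper are hypothetical, and the atom names follow the usual PDB convention (CA for C-alpha, CB for C-beta).

```python
# Hypothetical labels for the atom selections mentioned above
ATOM_SELECTIONS = {
    "ca": {"CA"},
    "backbone_n_ca_c": {"N", "CA", "C"},
    "backbone_n_ca_c_o": {"N", "CA", "C", "O"},
    "backbone_n_ca_cb_c_o": {"N", "CA", "CB", "C", "O"},
}

def select_atoms(atoms, selection):
    """Return the (name, residue_number) atoms belonging to a selection.

    The special selection "all" keeps every atom.
    """
    if selection == "all":
        return list(atoms)
    names = ATOM_SELECTIONS[selection]
    return [atom for atom in atoms if atom[0] in names]

# Atoms of one hypothetical glutamate residue as (name, residue_number) pairs
residue = [("N", 1), ("CA", 1), ("C", 1), ("O", 1), ("CB", 1),
           ("CG", 1), ("CD", 1), ("OE1", 1), ("OE2", 1)]

for label in ["all", "ca", "backbone_n_ca_c",
              "backbone_n_ca_c_o", "backbone_n_ca_cb_c_o"]:
    print(label, len(select_atoms(residue, label)))
```

For this residue the five selections contain 9, 1, 3, 4 and 5 atoms respectively, so an RMSD quoted over "backbone atoms" is only comparable with another when both use the same definition.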

Atomic coordinates in proteins are generally expressed in Ångströms (where 1 Å = 10^-10 m = 0.1 nm). Consequently, RMSDs are also expressed in Å units.

Two crystal structures of the same protein (solved by different groups or in slightly different crystallization conditions) typically have a C-alpha RMSD of 0.6-0.7Å.

Least Squares Fitting is a technique used to find the optimum fit (i.e. the lowest RMSD) between two three-dimensional structures.

Fitting is a three-stage process:

  1. The centre of geometry of each structure is identified and one structure (the "mobile" structure) is moved such that its centre of geometry coincides with that of the other ("reference") structure.
  2. Equivalent atom pairs are assigned between the two structures. (For example the C-alpha of residue Glu-1 in the mobile structure would be assigned as equivalent to the C-alpha of residue Glu-1 in the reference structure.)
  3. The mobile structure is then rotated about its centre of geometry such that the RMS deviation between the equivalent atoms is minimized. This is generally performed using a standard mathematical iterative minimization procedure, though analytical techniques may also be used.
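The three stages above can be sketched in code. This example uses the Kabsch algorithm (a singular value decomposition of the covariance matrix) as one of the analytical techniques for the rotation step; it is not the iterative procedure mentioned above and is not presented as the method used by any particular fitting program. The function name `fit`, the test coordinates and the use of NumPy are assumptions for illustration.

```python
import numpy as np

def fit(mobile, reference):
    """Superimpose mobile onto reference; return (fitted coordinates, RMSD)."""
    mobile = np.asarray(mobile, dtype=float)
    reference = np.asarray(reference, dtype=float)
    if mobile.shape != reference.shape:
        raise ValueError("structures must contain the same number of atoms")

    # Stage 1: move both centres of geometry to the origin
    mobile_centre = mobile.mean(axis=0)
    reference_centre = reference.mean(axis=0)
    p = mobile - mobile_centre
    q = reference - reference_centre

    # Stage 2: equivalences are assumed, row n of mobile pairs with
    # row n of reference

    # Stage 3: rotation minimizing the RMSD (Kabsch algorithm)
    covariance = p.T @ q
    u, _, vt = np.linalg.svd(covariance)
    # Flip one axis if needed so the result is a proper rotation,
    # not a reflection
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0.0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T

    # Rotate about the origin, then place on the reference centre
    fitted = p @ rotation.T + reference_centre
    deviation = fitted - reference
    fit_rmsd = float(np.sqrt(np.mean(np.sum(deviation ** 2, axis=1))))
    return fitted, fit_rmsd

# Hypothetical four-atom reference structure (coordinates in Å)
reference = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0],
                      [1.5, 1.5, 0.0], [0.0, 1.5, 1.0]])

# Mobile structure: the reference rotated 40 degrees about z and translated
angle = np.radians(40.0)
rot_z = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                  [np.sin(angle),  np.cos(angle), 0.0],
                  [0.0,            0.0,           1.0]])
mobile = reference @ rot_z.T + np.array([5.0, -3.0, 2.0])

fitted, fit_rmsd = fit(mobile, reference)
print(f"RMSD after fitting: {fit_rmsd:.3f} Å")  # prints "RMSD after fitting: 0.000 Å"
```

Since the mobile structure here is an exact rigid-body copy of the reference, the fit recovers it to within rounding error. For two genuinely different structures the returned RMSD is the lowest value achievable by translation and rotation alone, for the given atom equivalences.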