19. Protein Health


Overview

The Protein Health utility can be accessed as a separate application from the Application pulldown, or from within the Protein Design application via its tool in the Protein Utilities palette. Protein Health identifies features in a protein structure that are either wrong (for example, the wrong chirality of Cα atoms) or uncommon and therefore worth closer examination (for example, main chain conformations that fall outside the accepted regions of the Ramachandran map, or side chain conformations that do not correspond to regularly observed rotamers). The application can be used as an aid to model building and provides criteria for judging the quality of imported data such as PDB files.

Multiple conformations of the same structure can be read from a csr file and the results for all conformations tabulated for easy comparison.

Protein User's Reference:

C. M. Wilmot & J. M. Thornton, Prot. Eng. 3 479-493 (1990).

Ponder & Richards, J. Mol. Biol. 194 775-791 (1987).

Sutcliffe et al., Prot. Eng. 1 385-392 (1987).

R. L. Dunbrack & M. Karplus, J. Mol. Biol. (in press).


Using Protein Health

This palette is activated from the Protein Utilities palette or the QUANTA Applications menu. These tools are intended to guide model building and to help judge the quality of imported data, such as PDB files. The individual checks are described under Tools and Options.

There are three ways of presenting the results of the health check: listing them to a file, listing them to the textport, or highlighting the bad features on the molecule display. If multiple conformations are read in from csr files, a comparison of their health properties is presented in a table.

If the Display Exception tool is active, the molecule is colored to indicate health exceptions. The atoms involved in each exception are colored; for example, the side chain atoms are colored when the side chain is not close to an optimal rotamer conformation. The display may be easier to read if you use the Display Cα and Side Chain option from the Molecule Colors tool on the Protein Utilities palette. The one health exception that may be overlooked with this simplified display is a buried backbone N or O atom that is not hydrogen bonded.

Analysis of Multiple Conformations

Several model building methods generate multiple alternative models; in particular, NMR experiments and automated model building with MODELER both produce multiple models. Analyzing the protein health criteria for each model can indicate which regions of the model may be poor, and comparing the health exceptions between models can suggest which models are better.

In the QUANTA environment, multiple models are usually stored in a csr file. The Tabulate MultiConformations tool reads the conformations from a csr file and applies the selected health checks to all of them. Csr files can be generated from PDB files (such as those output by XPLOR NMR refinement) using the tool in Protein Health, or they can be generated in the Protein MODELER application. A csr file contains multiple sets of coordinates, but it must be associated with an MSF, which contains the other data such as atom names and residue types.

The Tabulate MultiConformations tool requires that the MSF associated with the csr file is the only selected molecule; if it is not, the required MSF is selected automatically. The health checks that are currently active and highlighted on the palette are applied to each conformation in the csr file. As the checks are performed, the molecule window is updated to display the current conformation and, if the Display Exception tool is active, colors it to indicate health check exceptions. Activate the Legend tool on the Protein Utilities palette to provide a key to the color coding.

When the analysis is complete the full results are presented in a table. The molecule conformation reverts to that from the MSF file but the Display Conformation tool becomes available for you to display any of the conformations.

The multi-conformation table has one column per conformation and one row per health exception. The rows are labelled with the name of the residue and a short mnemonic for the exception.

If a particular conformation has a health exception then the appropriate cell in the table is marked with an "X". At the top of the table the number of exceptions for each conformation is given. The left-most column in the table has the number of conformations which have the exception.
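The bookkeeping behind this table can be sketched in a few lines of Python. This is a minimal illustration, not QUANTA's implementation: it assumes the exceptions for each conformation have already been computed as a set of (residue ID, mnemonic) pairs, and the function name `tabulate` and the residue IDs and mnemonics in the usage lines are hypothetical.

```python
from collections import Counter


def tabulate(results):
    """Lay out health exceptions for several conformations.

    results maps each conformation name to a set of (residue_id, mnemonic)
    exceptions. Returns (totals, rows): totals gives the exception count per
    conformation (the counts at the top of the table); each row is
    (residue_id, mnemonic, n_conformations, marks), where marks holds "X" or ""
    for each conformation in input order.
    """
    names = list(results)
    totals = {name: len(results[name]) for name in names}
    # How many conformations share each exception.
    shared = Counter(exc for name in names for exc in results[name])
    rows = []
    for residue_id, mnemonic in sorted(shared):  # lexicographic order in this sketch
        marks = ["X" if (residue_id, mnemonic) in results[name] else "" for name in names]
        rows.append((residue_id, mnemonic, shared[(residue_id, mnemonic)], marks))
    return totals, rows


# Hypothetical residue IDs and mnemonics, for illustration only.
totals, rows = tabulate({
    "conf1": {("A12", "RAMA"), ("A40", "ROT")},
    "conf2": {("A12", "RAMA")},
})
print(totals)  # {'conf1': 2, 'conf2': 1}
for row in rows:
    print(row)
```

Here the row for A12 is marked in both columns and carries a count of 2, while A40 is marked only for conf1.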

If either the Close Contacts or the Holes option is active, the number of close contacts or holes in each conformation is listed at the top of the table. Use the Display Conformation tool to display the close contacts or holes for each individual conformation.

To see the exceptions reported in the table, pick the table to reset the display. You can change the displayed conformation by picking its name in the top row of the table. Picking a residue ID in the left-hand column centers the display on that residue in the currently displayed conformation. Picking a cell in the body of the table displays the conformation given by the table column and centers the display on the residue given by the table row.


Tools and Options

This tool identifies two classes of undefined coordinates.

Boxes are drawn and labelled with the residue ID for each residue that has undefined coordinates. This information is also listed to the textport.

The peptide bond is normally expected to be planar, so that the omega angle (Cα-C-N-Cα) is about 180°. By default, omega angles that deviate by more than 10° from 180°, that is, angles between -170° and 170°, are flagged.
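Because omega should sit near 180°, the test amounts to measuring how far omega lies from 180°, wrapping correctly around ±180°. The sketch below assumes angles in degrees and a default tolerance of 10° either side of 180°; the function names are illustrative and not part of QUANTA.

```python
def omega_deviation(omega):
    """Angular distance, in degrees, between omega and the planar trans value of 180."""
    # omega % 360 lies in [0, 360); subtracting 180 gives the signed offset from
    # 180 already wrapped into [-180, 180), so -179 and 179 are both 1 degree away.
    return abs(omega % 360.0 - 180.0)


def omega_flagged(omega, tolerance=10.0):
    """True when the peptide bond is more than `tolerance` degrees from planar trans."""
    return omega_deviation(omega) > tolerance


# Trans bonds near +/-180 pass; a distorted bond and a cis bond (omega near 0) are flagged.
print([omega_flagged(w) for w in (180.0, -175.0, 165.0, 0.0)])  # [False, False, True, True]
```

Note that in this sketch a cis peptide bond is reported as an exception as well, since it lies far from 180°.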

The data file $HYD_LIB/protein_param.dat contains a Ramachandran map derived from an analysis of observed conformations in well-resolved crystal structures¹. The map is divided into a grid of 10° by 10° blocks, and the commonly observed conformations lie in six different regions on this map. The health check flags any residue whose conformation lies outside these regions.
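The block lookup can be sketched as follows. The format of protein_param.dat is not described here, so this minimal Python illustration assumes the allowed regions have already been loaded as a set of (row, column) block indices; the allowed set in the example is hypothetical and does not reproduce the real map.

```python
BLOCK = 10.0  # block size in degrees, matching the 10 x 10 degree grid described above


def block_index(phi, psi):
    """Map (phi, psi) in degrees to a (row, column) block of a 36 x 36 grid."""
    # Shift so that -180 maps to 0; the modulo makes +180 wrap onto -180.
    i = int(((phi + 180.0) % 360.0) // BLOCK)
    j = int(((psi + 180.0) % 360.0) // BLOCK)
    return i, j


def outside_allowed(phi, psi, allowed_blocks):
    """True if the (phi, psi) block is not part of any allowed region."""
    return block_index(phi, psi) not in allowed_blocks


# Hypothetical allowed set for illustration only; the real regions come from the data file.
allowed = {(i, j) for i in range(10, 14) for j in range(12, 15)}
print(block_index(-60.0, -45.0), outside_allowed(-60.0, -45.0, allowed))  # (12, 13) False
```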

Several analyses of the Protein Data Bank show that most side chains occur in a limited range of conformations. These observed conformations are usually called rotamers.

Several libraries of commonly occurring rotamers are described in the literature, and three of them are used within QUANTA. The libraries differ in the form of analysis that was applied, particularly in how they handle the dependence of the side chain conformation on the backbone conformation.

These rotamer libraries can be used either for protein health checking or for side chain model building. They are listed in the file $HYD_LIB/protein_param.dat, except for the Dunbrack and Karplus rotamers, which are listed in $HYD_LIB/harvard_torsion.dat.

Table 2. Rotamer Library Types

Rotamer library | Description
Ponder and Richards | No analysis of main chain conformation dependency. The percentage of observed occurrence is given for each rotamer.
Sutcliffe et al. | Rotamers are designated as specific for helix or strand, or as independent of main chain conformation. No percentage of observed occurrence is listed.
Dunbrack and Karplus | In analyzing the dependency on main chain conformation, the phi/psi torsion space was divided into a grid of 20° by 20° blocks.

This library was generated by compiling statistics on the side chain torsions observed for a given range of main chain torsions. The side chain torsion space was divided into chemically sensible rotamers. For example, the torsion about a bond connecting two sp3 carbon atoms is split into three rotamers:

gauche+ (g+): 0° < chi < 120°
trans (t): 120° < chi < 240°
gauche- (g-): -120° < chi < 0°

If the observed torsion falls within the appropriate range, it is counted in the statistics even if it is a long way from the presumed optimal center of the range.
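The three-way split can be sketched as a small Python function, assuming torsions in degrees. The ranges in the text are open intervals, so the way this sketch assigns a torsion lying exactly on 0°, 120° or 240° is an arbitrary choice of the example; the function name is illustrative.

```python
def classify_chi(chi):
    """Assign a torsion about an sp3-sp3 bond (degrees) to g+, t or g-."""
    a = chi % 360.0  # normalize into [0, 360), so -60 becomes 300
    if a < 120.0:
        return "g+"  # 0 to 120
    if a < 240.0:
        return "t"   # 120 to 240
    return "g-"      # 240 to 360, i.e. -120 to 0


print([classify_chi(x) for x in (60.0, 180.0, -60.0, -170.0)])  # ['g+', 't', 'g-', 't']
```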


Dunbrack and Karplus rotamer definitions

Table 3. Recognized rotamers for chi1 (all residue types)

Rotamer | chi1 | Definition
1 | g+ | 0° < chi1 < 120°
2 | t | 120° < chi1 < 240°
3 | g- | -120° < chi1 < 0°

Table 4. Recognized rotamers for chi2
Residue types: Leu, Ile, Gln, Glu, Met, Arg, Lys, and Pro¹

Rotamer | chi1 | chi2
1 | g+ | g+
2 | g+ | t
3 | g+ | g-
4 | t | g+
5 | t | t
6 | t | g-
7 | g- | g+
8 | g- | t
9 | g- | g-

¹ The ranges for g+, t, and g- are the same as those defined for chi1 in Table 3.
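The rotamer numbering in this table follows directly from the two classifications: chi1 selects a block of three rows and chi2 the row within it. The Python sketch below assumes that chi2 uses the same g+, t and g- ranges as chi1; the function names are illustrative.

```python
ORDER = ("g+", "t", "g-")  # row order used in the chi1/chi2 table


def classify_chi(chi):
    """Assign a torsion (degrees) to g+ (0 to 120), t (120 to 240) or g- (-120 to 0)."""
    a = chi % 360.0
    if a < 120.0:
        return "g+"
    if a < 240.0:
        return "t"
    return "g-"


def rotamer_number(chi1, chi2):
    """Rotamer number 1-9: chi1 selects a block of three rows, chi2 the row within it."""
    return 3 * ORDER.index(classify_chi(chi1)) + ORDER.index(classify_chi(chi2)) + 1


print(rotamer_number(-65.0, 175.0))  # 8 (g-, t)
```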

Table 5. Residue types His, Phe, and Tyr (NB: by definition all chi2 > 0°)

Rotamer | chi1 | chi2
1 | g+ | 0° < chi2 < 60°
2 | g+ | 60° < chi2 < 120°
3 | g+ | 120° < chi2 < 180°
4 | t | 0° < chi2 < 60°
5 | t | 60° < chi2 < 120°
6 | t | 120° < chi2 < 180°
7 | g- | 0° < chi2 < 60°
8 | g- | 60° < chi2 < 120°
9 | g- | 120° < chi2 < 180°

Table 6. Residue type Trp

Rotamer | chi1 | chi2
1 | g+ | 0° < chi2 < 180° (+90° rotamer)
3 | g+ | -180° < chi2 < 0° (-90° rotamer)
4 | t | 0° < chi2 < 180° (+90° rotamer)
6 | t | -180° < chi2 < 0° (-90° rotamer)
7 | g- | 0° < chi2 < 180° (+90° rotamer)
9 | g- | -180° < chi2 < 0° (-90° rotamer)

Table 7. Residue types Asp and Asn

Rotamer | chi1 | chi2
1 | g+ | -90° < chi2 < -30° (g- rotamer)
2 | g+ | -30° < chi2 < 30° (t rotamer)
3 | g+ | 30° < chi2 < 90° (g+ rotamer)
4 | t | -90° < chi2 < -30° (g- rotamer)
5 | t | -30° < chi2 < 30° (t rotamer)
6 | t | 30° < chi2 < 90° (g+ rotamer)
7 | g- | -90° < chi2 < -30° (g- rotamer)
8 | g- | -30° < chi2 < 30° (t rotamer)
9 | g- | 30° < chi2 < 90° (g+ rotamer)

For the Dunbrack and Karplus rotamer library, any side chain that is not within an accepted rotamer range for its main chain conformation is assigned color 7 (green). If the side chain is within an acceptable rotamer range but has a torsion angle more than the cutoff (default 30°) from the center of the rotamer range, it is assigned color 1 (light green).
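The two-level coloring rule can be sketched for a single side chain torsion as below. This is a minimal Python illustration under stated assumptions: the range centers are taken as the midpoints of the chi ranges (60°, 180° and -60°), and the set of ranges accepted for the residue's main chain conformation is passed in directly, because the library lookup is not described here. The names `CENTERS` and `rotamer_color` are hypothetical.

```python
CENTERS = {"g+": 60.0, "t": 180.0, "g-": -60.0}  # assumed: midpoints of the chi ranges


def classify_chi(chi):
    """Assign a torsion (degrees) to g+, t or g-."""
    a = chi % 360.0
    if a < 120.0:
        return "g+"
    if a < 240.0:
        return "t"
    return "g-"


def angular_difference(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def rotamer_color(chi, allowed, cutoff=30.0):
    """Color number for one side chain torsion, or None when there is no exception.

    `allowed` is the set of ranges (g+, t, g-) that the library accepts for this
    residue's main chain conformation.
    """
    rotamer = classify_chi(chi)
    if rotamer not in allowed:
        return 7  # green: not in any accepted rotamer range
    if angular_difference(chi, CENTERS[rotamer]) > cutoff:
        return 1  # light green: accepted range, but far from its center
    return None


print(rotamer_color(65.0, {"g+", "t"}), rotamer_color(100.0, {"g+"}), rotamer_color(-60.0, {"t"}))
# None 1 7
```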

The chiral atoms in the standard amino acids that are tested are listed in Table 8. For the valine and leucine side chains, bad chirality results from inappropriate atom naming.

Table 8. Atoms tested for chirality

Atom name | Residue type
CA | All amino acids
CB | Ile
CB | Val
CG | Leu
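One common way to test the handedness at CA is the sign of a chiral volume: the scalar triple product of the N, C and CB bond vectors taken from CA. The manual does not say how QUANTA performs the test, so the Python sketch below simply illustrates that technique. With the argument order (N, C, CB), an L-amino acid gives a positive volume and its mirror image a negative one. The function needs a CB atom, and the coordinates in the example are idealized directions rather than real bond lengths.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def chiral_volume(ca, n, c, cb):
    """Scalar triple product (N-CA) . ((C-CA) x (CB-CA)); its sign encodes handedness."""
    return dot(sub(n, ca), cross(sub(c, ca), sub(cb, ca)))


def ca_handedness(ca, n, c, cb):
    """'L' for a positive chiral volume, 'D' for a negative one, 'planar' for zero."""
    v = chiral_volume(ca, n, c, cb)
    if v > 0.0:
        return "L"
    if v < 0.0:
        return "D"
    return "planar"


def mirror(p):
    """Reflect a point through the yz plane, which inverts handedness."""
    return (-p[0], p[1], p[2])


# Idealized L-configuration: H would point along +z; N, C and CB sit below CA.
ca = (0.0, 0.0, 0.0)
n = (-0.866, -0.5, -0.33)
c = (0.0, 1.0, -0.33)
cb = (0.866, -0.5, -0.33)
print(ca_handedness(ca, n, c, cb))                           # L
print(ca_handedness(ca, mirror(n), mirror(c), mirror(cb)))   # D
```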

This tool flags as buried polar atoms all oxygen and nitrogen atoms that have a solvent accessibility below a given cutoff (default 0.01) and are not hydrogen bonded.

The hydrogen bonds used in this analysis are derived using generous criteria that include the "near" hydrogen bonds, as defined by the parameters in the Hydrogen Bond utility.
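Once accessibilities and hydrogen bonds are known, the buried polar atom test is a simple filter. In this Python sketch the `Atom` record is hypothetical, and the accessibility and `h_bonded` values are assumed to have been computed beforehand (for example, hydrogen bonding by the generous criteria described above).

```python
from dataclasses import dataclass


@dataclass
class Atom:
    """Hypothetical per-atom record; accessibility and h_bonded are assumed precomputed."""
    name: str
    element: str
    accessibility: float
    h_bonded: bool


def buried_polar(atoms, cutoff=0.01):
    """N and O atoms with accessibility below `cutoff` that are not hydrogen bonded."""
    return [a for a in atoms
            if a.element in ("N", "O") and a.accessibility < cutoff and not a.h_bonded]


atoms = [
    Atom("N", "N", 0.0, False),    # buried, no hydrogen bond: flagged
    Atom("O", "O", 0.0, True),     # buried but hydrogen bonded: not flagged
    Atom("OG", "O", 0.3, False),   # exposed: not flagged
    Atom("CB", "C", 0.0, False),   # not polar: not flagged
]
print([a.name for a in buried_polar(atoms)])  # ['N']
```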

Hydrophilic residues normally occur on the protein surface and hydrophobic residues in the protein core. Residues can nevertheless occur in inappropriate environments for several reasons.

The parameter used for residue accessibility is the side chain fractional accessibility. This is the sum of the accessibilities of the side chain atoms, relative to the maximum possible accessibility for the side chain, which assumes an extended conformation and minimal occlusion of the side chain by the neighboring main chain. Hydrophobic residues with an accessibility greater than 0.9 and hydrophilic residues with an accessibility less than 0.1 are flagged.
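The ratio and the two thresholds can be sketched as follows. This Python illustration assumes hypothetical hydrophobic and hydrophilic residue sets, since the grouping QUANTA uses is not listed here, and the maximum side chain accessibility in the example is an invented reference value.

```python
# Hypothetical residue classes for illustration; the manual does not list QUANTA's grouping.
HYDROPHOBIC = {"ALA", "VAL", "LEU", "ILE", "MET", "PHE", "TRP"}
HYDROPHILIC = {"ASP", "GLU", "LYS", "ARG", "ASN", "GLN"}


def fractional_accessibility(side_chain_areas, max_area):
    """Sum of side chain atom accessibilities relative to the maximum for that side chain."""
    return sum(side_chain_areas) / max_area


def environment_flag(resname, frac, exposed=0.9, buried=0.1):
    """Flag hydrophobic residues above `exposed` and hydrophilic residues below `buried`."""
    if resname in HYDROPHOBIC and frac > exposed:
        return "exposed hydrophobic"
    if resname in HYDROPHILIC and frac < buried:
        return "buried hydrophilic"
    return None


# Invented atom accessibilities and maximum, giving a fraction of about 0.97.
print(environment_flag("LEU", fractional_accessibility([60.0, 55.0, 70.0], 190.0)))
# exposed hydrophobic
```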

This tool flags close contacts when the distance between two atoms is less than a given fraction of the sum of their van der Waals radii (default 0.80). Close contacts within one residue, which are usually due to a bad side chain conformation, and contacts between Cβ atoms and neighboring residues are not flagged.
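The contact test can be sketched as a brute-force pairwise scan. In this Python illustration the atom tuples and radii are hypothetical inputs. The exclusion of "contacts between Cβ atoms and neighboring residues" is interpreted here as skipping any pair that involves a CB atom and an atom whose residue index differs by one, which is an assumption about the intended rule.

```python
import math


def close_contacts(atoms, radii, scale=0.80):
    """Return index pairs of atoms closer than scale * (sum of van der Waals radii).

    atoms: list of (residue_index, atom_name, element, (x, y, z)).
    radii: van der Waals radius per element, in angstroms.
    """
    hits = []
    for i in range(len(atoms)):
        ri, ni, ei, pi = atoms[i]
        for j in range(i + 1, len(atoms)):
            rj, nj, ej, pj = atoms[j]
            if ri == rj:
                continue  # contacts within one residue are not flagged
            if abs(ri - rj) == 1 and "CB" in (ni, nj):
                continue  # CB against an adjacent residue is not flagged
            if math.dist(pi, pj) < scale * (radii[ei] + radii[ej]):
                hits.append((i, j))
    return hits


radii = {"C": 1.7, "O": 1.52}
atoms = [
    (1, "CA", "C", (0.0, 0.0, 0.0)),
    (1, "CB", "C", (1.5, 0.0, 0.0)),
    (2, "O", "O", (2.5, 0.0, 0.0)),
    (5, "O", "O", (0.0, 2.0, 0.0)),
]
print(close_contacts(atoms, radii))  # [(0, 2), (0, 3), (1, 3)]
```

The scan is O(n²); a real implementation would normally use a neighbor grid, but the simple loop keeps the rule easy to read.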

This tool checks for holes within protein structures that are large enough to accommodate a solvent molecule. The tool identifies probable solvent sites and is potentially useful to crystallographers trying to identify solvent molecules during electron density refinement. The minimum radius of the hole can be set with the Options tool.

The solvent is modeled as a simple sphere with a radius of about 1.3 Å. The method places the protein on a 3D grid and marks each grid point as either protein or solvent, depending on its proximity to a protein atom: a point is considered 'protein' if it is within the atomic radius plus the solvent radius of the center of a protein atom. A flood-fill technique is then used to identify all the connected solvent points that make up the bulk solvent around the protein. The remaining solvent grid points, which are not connected to the bulk solvent, are putative holes within the protein. A second-pass analysis at a higher resolution determines whether there is truly enough space for the solvent sphere to fit. The centroid and volume of each hole are listed to the textport. The volume given is the space within which the centroid of a solvent sphere could move without the sphere contacting the neighboring atoms; the reported volume is greater than 0.0 but may be very small.
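The first pass of this procedure can be sketched as follows. This is a brute-force Python illustration that assumes atoms are given as (center, radius) pairs in angstroms. The grid spacing, the padding of the box and the name `find_cavities` are choices made for the sketch, and the higher-resolution second pass, centroid and volume calculations are not shown.

```python
import itertools
import math
from collections import deque

STEPS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def find_cavities(atoms, probe=1.3, spacing=0.5):
    """Return enclosed solvent regions as lists of grid-point coordinates.

    atoms: list of ((x, y, z), radius) pairs in angstroms. A grid point counts as
    'protein' if it lies within radius + probe of any atom center.
    """
    # Pad the box so the outermost grid layer is guaranteed to be solvent.
    pad = max(r for _, r in atoms) + probe + spacing
    lo = [min(p[k] for p, _ in atoms) - pad for k in range(3)]
    hi = [max(p[k] for p, _ in atoms) + pad for k in range(3)]
    shape = [math.ceil((hi[k] - lo[k]) / spacing) + 1 for k in range(3)]

    def coord(idx):
        return tuple(lo[k] + idx[k] * spacing for k in range(3))

    # Classify every grid point (O(points x atoms), fine for a sketch).
    protein = {}
    for idx in itertools.product(*(range(s) for s in shape)):
        q = coord(idx)
        protein[idx] = any(math.dist(q, c) < r + probe for c, r in atoms)

    def flood(seed, allowed):
        """Collect the 6-connected points reachable from seed within `allowed`."""
        region, queue = {seed}, deque([seed])
        while queue:
            i, j, k = queue.popleft()
            for di, dj, dk in STEPS:
                nb = (i + di, j + dj, k + dk)
                if nb in allowed and nb not in region:
                    region.add(nb)
                    queue.append(nb)
        return region

    solvent = {idx for idx, is_protein in protein.items() if not is_protein}
    # A box corner is always solvent, so flooding from it marks the bulk solvent.
    remaining = solvent - flood((0, 0, 0), solvent)

    # Whatever solvent is left is enclosed; split it into connected cavities.
    cavities = []
    while remaining:
        region = flood(next(iter(remaining)), remaining)
        remaining -= region
        cavities.append([coord(idx) for idx in region])
    return cavities


# Usage: a closed spherical shell of 150 atoms (radius 6 angstroms) encloses one cavity.
golden = math.pi * (3.0 - math.sqrt(5.0))
shell = []
for i in range(150):
    z = 1.0 - 2.0 * (i + 0.5) / 150
    rho = math.sqrt(1.0 - z * z)
    shell.append(((6.0 * rho * math.cos(golden * i), 6.0 * rho * math.sin(golden * i), 6.0 * z), 1.5))
print(len(find_cavities(shell, spacing=1.0)))  # 1
```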

This tool opens the Protein Health Options dialog box from which you can change default variables for the Protein Health tools.

This tool activates the Residue Selection palettes. Whilst this tool is active the health checks will be applied to only the selected residues.

This tool, which is active by default, displays the exceptions to the currently active health checks on the molecule. The molecule is colored gray (color 11), and areas of bad structure are highlighted in bright colors. The legend at the bottom left of the screen describes what each color indicates. When the Close Contacts tool is selected, close atoms are indicated by dashed lines.

When this tool is picked the exceptions to the currently active health checks are listed to the textport.

When this tool is picked, the exceptions to the currently active health checks are written to a file named molecule_health.out. If close contacts are active, they are written to the file molecule_bumps.out.

This tool generates a phi/psi plot with the allowed regions drawn in the colors listed in Table 9. The selected residues are marked on the plot.

Table 9. Allowed Main Chain Conformations¹

Conformation | Code letter | Color on plot
αR Right-handed α-helix | A | 8 purple
βE Extended β-strand | B | 4 yellow
βP Polyproline β-strand | P | 13 brown
αL Left-handed α-helix | L | 12 pale blue
γ Gly left-handed α-helix | G | 10 salmon pink
ε Gly accessible region | A | 5 white

¹ Wilmot & Thornton (1990), Protein Engineering 3, 479-493.

This tool opens the Define phi/psi Dihedrals dialog box, which enables you to change the default torsion angles.

This tool enables cross-highlighting of picked residues on the phi/psi plot, the sequence viewer or structure.

This tool writes the phi/psi angles and the side chain angles to the textport.

This tool writes the phi/psi angles and the side chain angles to the file molecule_torsions.out.

You are prompted by the file librarian to select an XPLOR PDB file and to give a name for the csr file which is created.

You are prompted to select a csr file. If the MSF associated with the csr file is not the only currently selected MSF then a reselection is performed automatically. The currently selected, highlighted health tools are applied to every conformation in the csr file and the results tabulated. The Display Conformation tool becomes available. If the Tabulate MultiConformations tool is picked again the current table will be overwritten.

You can select a conformation number to display or opt to return to the conformation of the MSF. The conformation is displayed with the health exceptions that are listed in the table. Other health checks can be performed on this conformation by picking the appropriate tool.

¹ C. M. Wilmot & J. M. Thornton (1990). Prot. Eng. 3 479-493.

© 2006 Accelrys Software Inc.