This utility performs sequence analysis and can be used for sequences whose structure coordinates are not defined. Six types of analysis can be displayed, three of which are prediction methods. The results are usually presented as plots of a property versus sequence position, drawn above the sequences in the Sequence Viewer. Where there are gaps in the sequence alignment, there will normally be gaps in the plots.
3. L. H. Holley and M. Karplus, Proc. Natl. Acad. Sci. USA 86, 152-156 (1989) and G. L. LaRosa et al., Science 249, 932-935 (1990).
4. J. Garnier, D. J. Osguthorpe and B. Robson, J. Mol. Biol. 120, 97-120 (1978).
5. G. D. Rose et al., Science 229, 834-838 (1985).
6. J. L. Fauchère and V. Pliska, Eur. J. Med. Chem. - Chim. Ther. 18, 369-375 (1983).
7. D. Eisenberg et al., Faraday Symp. Chem. Soc. 17, 109-120 (1982).
Several methods are available to predict the secondary structure of a sequence. Protein Design uses three: the Momany, GOR, and Holley/Karplus methods. In addition, this module provides tools to plot hydrophobicity and conservation profiles for the active molecules.
Secondary structure prediction methods usually consider three classes of secondary structure: α-helix, β-strand, and "neither of these". Some methods also have a turn classification. Most methods derive, for each residue in the sequence, a probability, or propensity, of the residue occurring in each of the secondary structure types. The calculated propensities are plotted in the Sequence Viewer. The predicted secondary structure type for each residue is the type with the highest propensity, with some allowance made for the fact that secondary structure elements have some minimal length.
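The assignment rule described above can be illustrated with a short Python sketch. It picks the type with the highest propensity at each residue and then discards runs that are shorter than a minimum length. The minimum lengths, the one-letter state codes, and the function name assign_states are assumptions made for this example; the manual does not state the values or the exact smoothing rule that the utility uses.

```python
from itertools import groupby

# Hypothetical minimum run lengths (assumption, not taken from the manual).
MIN_LENGTH = {"H": 4, "E": 2}


def assign_states(propensities, min_length=MIN_LENGTH, default="-"):
    """Assign one state per residue.

    propensities: list of dicts, one per residue, mapping state -> propensity.
    Runs shorter than the minimum length for their state become `default`.
    """
    # Step 1: take the state with the highest propensity at each position.
    raw = [max(p, key=p.get) for p in propensities]

    # Step 2: replace runs that are too short to be a plausible element.
    out = []
    for state, run in groupby(raw):
        n = len(list(run))
        if n < min_length.get(state, 1):
            out.extend([default] * n)
        else:
            out.extend([state] * n)
    return "".join(out)
```

A real implementation might extend or merge short elements rather than simply discarding them; this sketch only shows the general idea of combining a per-residue maximum with a length constraint.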
This prediction modifies the Zimm/Bragg method. The Zimm/Bragg method, which is based on the classical Chou-Fasman technique, was developed by Dr. Harold Scheraga and co-workers Momany, Lewis, and Zimmerman [1].
The Zimm/Bragg method has two coefficients: a value of one for helices and a value of zero for non-helices. Momany modified this method by enhancing these parameters with additional values, so that when specific characteristics are found in the sequence, such as turns and antiparallel or parallel β-sheet regions, the value of the coefficient increases.
This method makes an initial pass through the primary sequence to determine the Zimm/Bragg coefficients. Subsequent passes are then made to enhance the coefficients by identifying certain patterns found in the primary sequence [2].
For example, in the initial pass on a sequence there are regions found and categorized as being helical. On the subsequent pass it is noted that several of those regions have polar residues separated by two residues, indicating a classical 1-4 helical arrangement. Therefore, the coefficients for those polar residues would be enhanced for their 1-4 helical character.
After multiple passes, the resulting prediction coefficients are normalized and used for the final prediction of helix, β-strand, and turn regions.
This prediction is based on a neural network that identifies three secondary structure types: helix, strand, and coil. The neural network was trained on 48 unrelated proteins. It uses a window of 17 residues to assign the secondary structure of the central residue, recognizing that a residue may be affected by another residue up to eight places away in the sequence. The method is implemented within QUANTA as a translated neural net.
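As an illustration of the 17-residue window, the following Python sketch extracts the window around a residue and encodes it as binary inputs of the kind a neural network could consume. The padding symbol, the one-hot encoding with a 21-symbol alphabet, and the names window and encode are assumptions made for this example; they are not taken from the QUANTA implementation.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "-"                      # spacer for window positions beyond the chain ends
ALPHABET = AMINO_ACIDS + PAD   # 20 amino acids plus the spacer
HALF_WINDOW = 8                # 8 residues on either side gives 17 in total


def window(sequence, center, half=HALF_WINDOW):
    """Return the 2*half+1 residues centred on index `center`, padded at the ends."""
    return "".join(
        sequence[i] if 0 <= i < len(sequence) else PAD
        for i in range(center - half, center + half + 1)
    )


def encode(win):
    """One-hot encode a window as a flat list of 0/1 values.

    Raises ValueError for symbols that are not in ALPHABET.
    """
    vec = []
    for aa in win:
        unit = [0] * len(ALPHABET)
        unit[ALPHABET.index(aa)] = 1
        vec.extend(unit)
    return vec
```

Calling encode(window(seq, i)) for every index i yields one input vector per residue, each describing the residue itself and its eight neighbors on either side.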
Once the assignments have been made they are smoothed such that:
This method identifies four secondary structure types: helix, extended, reverse turn, and coil. It analyzes a 17-residue window to determine the secondary structure of the central residue; the residues at the center of the window have the greatest influence.
The parameters used in this method are derived from a statistical analysis of protein structures. They give the probability of each amino acid type occurring at each position in a 17-residue window around a residue of each of the four secondary structure types.
The prediction can be weighted for a particular type by varying the decision constant. This constant is subtracted from the score for a weighted secondary structure type.
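The windowed scoring and the effect of the decision constant can be sketched in Python as follows. This is a simplified illustration, not the utility's implementation: the parameter layout params[state][offset][residue], the state codes, and the function name gor_like_predict are all assumptions, and real parameter tables would be far larger than the ones used in the usage note.

```python
STATES = ("H", "E", "T", "C")  # helix, extended, reverse turn, coil
HALF_WINDOW = 8                # 17-residue window


def gor_like_predict(sequence, params, decision_constants=None):
    """Predict one state per residue from position-specific window scores.

    params[state][offset][residue] is the score contribution of `residue`
    found `offset` positions from the central residue; missing entries
    contribute 0. decision_constants[state] is subtracted from that
    state's total score before the states are compared.
    """
    dc = decision_constants or {}
    prediction = []
    for i in range(len(sequence)):
        scores = {}
        for state in STATES:
            total = 0.0
            for offset in range(-HALF_WINDOW, HALF_WINDOW + 1):
                j = i + offset
                if 0 <= j < len(sequence):  # ignore positions beyond the chain ends
                    by_offset = params.get(state, {}).get(offset, {})
                    total += by_offset.get(sequence[j], 0.0)
            scores[state] = total - dc.get(state, 0.0)
        # Ties are resolved in the order given by STATES.
        prediction.append(max(STATES, key=scores.get))
    return "".join(prediction)
```

With a toy table such as {"H": {0: {"A": 1.0}}, "E": {0: {"V": 1.0}}}, the sequence "AAVV" is predicted as "HHEE". Passing a decision constant of 2.0 for "H" lowers every helix score by that amount, so helix is no longer the top-scoring type for the alanine residues.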
The calculation of conservation profiles uses the table identified with the label CONSERV in the file $HYD_LIB/protein_seq_param.dat. This table defines 10 classes of amino acid (e.g. small, aromatic, acidic) and specifies which amino acids belong in which class. The degree of conservation between two amino acid types is the number of classes to which they both belong divided by 10 so the maximum conservation score is one. The conservation value of a column of aligned residues in the sequence table is the sum of all the pairwise conservation comparisons divided by the number of comparisons. The maximum value is one. The conservation profile can be smoothed by averaging over a range of residues. The window length is set by the Profile Options tool.
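The conservation calculation can be sketched in Python as shown below. The class table here is a small placeholder with four classes, not the ten-class CONSERV table from the parameter file, so the divisor is the number of classes supplied rather than a fixed 10. The function names and the decision to skip gap characters are likewise assumptions made for this example.

```python
from itertools import combinations

# Placeholder class table (assumption). The real table in
# protein_seq_param.dat defines 10 classes; membership here is illustrative.
TOY_CLASSES = {
    "acidic": set("DE"),
    "aromatic": set("FWY"),
    "small": set("AGS"),
    "hydrophobic": set("AFILMVWY"),
}


def pair_conservation(a, b, classes):
    """Number of classes containing both residues, divided by the class count."""
    shared = sum(1 for members in classes.values() if a in members and b in members)
    return shared / len(classes)


def column_conservation(column, classes, gap="-"):
    """Mean pairwise conservation over one alignment column.

    Gap characters are skipped; a column with fewer than two residues
    returns None because no pairwise comparison is possible.
    """
    residues = [aa for aa in column if aa != gap]
    pairs = list(combinations(residues, 2))
    if not pairs:
        return None
    return sum(pair_conservation(a, b, classes) for a, b in pairs) / len(pairs)
```

For the column "FWA" and the placeholder table, the three pairs score 0.5, 0.25, and 0.25, giving a column value of one third. Note that under this definition two identical residues score one only if the residue belongs to every class.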
Hydrophobicity is a measure, for each of the amino acids, of its immiscibility with water. Generally apolar amino acids have higher hydrophobicity parameters and are more likely to occur on the interior of proteins rather than exposed to solvent.
Three hydrophobicity scales are used: the Rose [5], Fauchère and Pliska [6], and Eisenberg [7] scales. The parameters are stored in the file $HYD_LIB/protein_seq_param.dat, and alternative parameter sets can be added to that file.
The Rose scale is based on a statistical analysis of residue environments in protein crystal structures. The Fauchère and Pliska scale is derived from the free energy of transfer of amino acid analogs between octanol and water. The Eisenberg consensus scale is an average of several other scales.
Hydrophobicity is usually analyzed by averaging over a fairly long window (e.g. 7 to 21 residues). Regions of low hydrophobicity are generally found to be loop regions of the protein that are exposed to solvent.
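Window averaging of a hydrophobicity scale can be sketched in Python as follows. The scale values below are placeholders chosen only to make the example runnable; they are not the Rose, Fauchère and Pliska, or Eisenberg parameters. The treatment of chain ends (a shortened window) and of alignment gaps (no value at the gap, with the window running over the ungapped sequence) is also an assumption.

```python
# Placeholder values only (assumption); real scales come from the parameter file.
TOY_SCALE = {"A": 0.5, "L": 1.0, "D": -1.0, "K": -1.5, "G": 0.0}


def hydrophobicity_profile(aligned_seq, scale, window=7, gap="-"):
    """Return one averaged value per alignment column, or None at gap columns."""
    # Columns that hold a residue, in sequence order.
    cols = [i for i, aa in enumerate(aligned_seq) if aa != gap]
    values = [scale[aligned_seq[i]] for i in cols]

    half = window // 2
    profile = [None] * len(aligned_seq)
    for k, col in enumerate(cols):
        # The window is truncated at the ends of the sequence.
        lo, hi = max(0, k - half), min(len(values), k + half + 1)
        profile[col] = sum(values[lo:hi]) / (hi - lo)
    return profile
```

For example, hydrophobicity_profile("AL-DK", TOY_SCALE, window=3) gives a value for each of the four residues and None at the gap column, which corresponds to the break that appears in the plot where the alignment has a gap.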
All of the parameters analyzed in this utility are plotted in the Sequence Viewer above the sequences. The parameters are usually calculated for all active sequences, and the plot is aligned to the sequence, so there may be gaps in the plot where there are gaps in the sequence alignment. If you exit the Prediction Utility, enter the Align and Superpose utility, and use any of the tools there to change the sequence alignment, then, where appropriate, the plot is updated to stay in sync with the sequence. The hydrophobicity plot can be useful in alignment because, generally, the hydrophobicity plots of two homologous structures are strongly correlated.
The plot legends are colored the same as the plots that they identify. By picking a plot legend you can toggle the display of the plot on or off. The legend for an undisplayed plot is colored gray. To change the display status of several plots, it may be quicker to pick the plot title (at the top of the legend), which presents a selection dialog box. To toggle the display of all plots on or off, pick the G icon at the bottom left of the Sequence Viewer.
The secondary structure prediction tools are applied to all active sequences and the sequences recolored according to their predicted secondary structure. The secondary structure propensities for one sequence will be plotted in the Sequence Viewer. If there is more than one sequence active, then you are prompted to select one sequence for which propensities are plotted.
Predictions are automatically saved to a file which is given a name of the form sequence_method_predict.out where sequence is the sequence name and method is the prediction method. For MSFs there is an option to save the predicted secondary structure to an MSF as extra information.
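The naming convention can be expressed as a one-line Python helper, which may be handy when locating saved predictions from a script. The function name prediction_filename is hypothetical; only the pattern sequence_method_predict.out comes from this manual.

```python
def prediction_filename(sequence_name, method):
    """Build a file name of the form sequence_method_predict.out."""
    return f"{sequence_name}_{method}_predict.out"
```

For a sequence named myseq (a made-up name) and the GOR method, this returns myseq_GOR_predict.out.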
This tool plots the hydrophobic profile for the active molecules.
This opens the Hydrophobic Profile Options dialog box.
You can change the hydrophobicity scale and the window length used in the hydrophobicity plot. The window length for the conservation plot is also changeable.
The options in this dialog box allow you to select different scales, residue window lengths, molecules, drawing averages, and difference profiles.
This tool plots the conservation profile for two or more active molecules. The conservation profile is a measure of the extent of sequence conservation along two or more aligned sequences. Sequences must first be aligned. A high conservation number is given when similar chemical types of amino acid occur at a position, and a lower number is given when chemical types differ. Conservation profiles can be averaged over a window - the length can be changed via the Profile Options tool.
This plots a graph in the sequence viewer showing, for each column of residues in the sequence viewer, the number of residues which fit into a given chemical classification such as acidic or aromatic. The classes are defined in $HYD_LIB/protein_seq_param.dat under the keyword CONSERV. Inactive sequences are excluded from this analysis. The plot shows ten different classifications and may be difficult to interpret when all classifications are displayed simultaneously. You can toggle off or on the display of a given classification by picking its name on the plot legend. To change the display status of multiple classifications pick the legend title Composition and you will be presented with a dialog box.
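The composition count can be sketched in Python as shown below. The two-class table in the usage note is a placeholder, not the ten-class CONSERV table, and the function name composition_profile is hypothetical. The sketch assumes that only the active sequences are passed in and that they are aligned to the same length.

```python
def composition_profile(aligned_seqs, classes):
    """Count, per alignment column, the residues that fall in each class.

    aligned_seqs: equal-length aligned sequences (active sequences only).
    classes: dict mapping class name -> set of one-letter residue codes.
    Returns a dict mapping class name -> list of counts, one per column.
    Gap characters belong to no class and are therefore never counted.
    """
    profile = {name: [] for name in classes}
    for column in zip(*aligned_seqs):
        for name, members in classes.items():
            profile[name].append(sum(1 for aa in column if aa in members))
    return profile
```

With the sequences "DEF", "DAW", and "-AY" and the placeholder classes {"acidic": set("DE"), "aromatic": set("FWY")}, the acidic counts are [2, 1, 0] and the aromatic counts are [0, 0, 3]. Each list corresponds to one line in the plot, which is why displaying all ten classifications at once can be hard to read.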
This tool performs a Momany secondary structure prediction for each active sequence and recolors the sequences according to the predicted secondary structure. Each prediction is written to a file of the form sequence_momany_predict.out. The prediction propensities for one sequence are plotted in the Sequence Viewer.
This tool performs a GOR prediction based on all the currently active sequences. To get meaningful results the sequences must be aligned. The prediction is written to a file of the form sequence_GOR_predict.out. The prediction propensities are plotted to the sequence viewer.
This tool opens a dialog box that allows you to reset the ranges and variables for the GOR prediction.
This tool performs a Holley/Karplus prediction for each active sequence and recolors the sequence according to the predicted secondary structure. The prediction is written to a file of the form sequence_holley-karplus_predict.out, and the prediction propensities are plotted in the Sequence Viewer.
This tool allows you to change the secondary structure assignment for a single residue or a range of residues. The mode of residue selection is determined by the Pick Residue and Pick Residue Range tools. Once the residue or residue range has been chosen, the Secondary Structure dialog box opens. Secondary structures can then be reassigned to the selected residues.
This allows you to select single residues in order to edit their secondary structures.
This allows you to select a range of residues in order to edit their secondary structures.
Predictions for sequences which are from MSF files can be saved as secondary structure extra information in the MSF. You will be prompted to give the data a label. Be careful not to confuse predicted secondary structure with that derived from analysis of the structure.
Saved secondary structure predictions can be restored from the MSF file.
This tool exits the palette. You will be prompted to save any unsaved secondary structure predictions to the MSFs.