Profile Analysis can be activated either from the Protein Design palette or from the QUANTA Applications menu. When it is activated from the Applications menu, the Protein Utilities menu is also displayed. Profile Analysis follows the method of Bowie, Luthy and Eisenberg, which reduces a protein structure to a 1D profile that can be assessed against protein sequences to quantify the quality of a structural model.
J. U. Bowie, R. Luthy & D. Eisenberg, "A method to identify protein sequences that fold into a known three-dimensional structure", Science 253, 164-170 (1991).
R. Luthy, J. U. Bowie & D. Eisenberg, "Assessment of protein models with three-dimensional profiles", Nature 356, 83-85 (1992).
In this method, each residue in the protein structure is characterized by its secondary structure and its side-chain environment, and a profile sequence is generated in which each residue is assigned to one of 18 environment classes.
The side-chain environment of a residue is defined by two parameters: the buried area of the side chain and the polar environment of the side chain. The buried area of the side chain is the difference between the maximum possible solvent accessible area of the side chain and its actual solvent accessible area in the structure. The maximum solvent accessible area is the solvent accessible area of the side chain in the tripeptide Gly-X-Gly in a fully extended conformation; in this situation there are no other residues to occlude the central residue X.
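The buried-area definition above can be sketched as a small calculation. This assumes the side chain's accessible area in the structure and its reference value in the extended Gly-X-Gly tripeptide have already been computed; how QUANTA computes them is not described here. The function names are hypothetical, the clamp at zero is an added assumption, and the numbers in the usage lines are made-up inputs, not real reference areas.

```python
def side_chain_buried_area(accessible: float, reference: float) -> float:
    """Buried side-chain area (same units as the inputs, e.g. square angstroms).

    accessible: solvent accessible area of the side chain in the structure.
    reference:  its accessible area in an extended Gly-X-Gly tripeptide,
                i.e. the maximum possible accessible area.
    """
    if reference <= 0:
        raise ValueError("reference accessible area must be positive")
    # Assumption: clamp at zero in case the side chain is computed as slightly
    # more exposed than in the Gly-X-Gly reference.
    return max(0.0, reference - accessible)


def fraction_buried(accessible: float, reference: float) -> float:
    """Buried area as a fraction of the maximum possible accessible area."""
    return side_chain_buried_area(accessible, reference) / reference


# Made-up example inputs, not real reference areas.
print(side_chain_buried_area(30.0, 150.0))  # 120.0
print(fraction_buried(30.0, 150.0))         # 0.8
```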
The polar environment of a side chain is the fraction of the side-chain area that is covered by polar atoms; these may be either solvent or polar atoms of the protein in contact with the side-chain surface.
Three categories of solvent accessibility are defined: E (exposed), P (partially buried) and B (buried). Depending on the fraction of the environment that consists of polar atoms, the partially buried category is further subdivided into two categories and the buried category into three. These are designated P1, P2, B1, B2 and B3, where a higher number denotes a more polar environment. Combining these six side-chain environment categories with the three recognized secondary structure types gives 18 possible residue environment classes.
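The classification into 18 classes can be sketched as follows. This is a minimal illustration, not QUANTA's code: the secondary-structure labels (H, S, C) and the class naming ("B2:H") are hypothetical, and the numeric cutoffs are placeholders chosen only to make the example run. Consult Bowie et al. (1991) for the published definitions.

```python
from dataclasses import dataclass

# Assumed secondary-structure labels: H = helix, S = sheet, C = other/coil.
SECONDARY = ("H", "S", "C")
SIDE_CHAIN = ("E", "P1", "P2", "B1", "B2", "B3")


@dataclass(frozen=True)
class Cutoffs:
    # Placeholder cutoffs for illustration only; not QUANTA's settings.
    exposed_below: float = 40.0            # buried area (A^2) below this -> E
    partial_below: float = 114.0           # buried area (A^2) below this -> P, else B
    p_polar_split: float = 0.67            # polar fraction splitting P1 | P2
    b_polar_splits: tuple = (0.45, 0.58)   # polar fraction splitting B1 | B2 | B3


def polar_fraction(polar_covered_area: float, side_chain_area: float) -> float:
    """Fraction of the side-chain area covered by polar atoms (solvent or protein)."""
    if side_chain_area <= 0:
        raise ValueError("side-chain area must be positive")
    return polar_covered_area / side_chain_area


def environment_class(secondary: str, buried_area: float, polar: float,
                      c: Cutoffs = Cutoffs()) -> str:
    """Return one of the 18 classes as 'SIDECHAIN:SECONDARY', e.g. 'B2:H'."""
    if secondary not in SECONDARY:
        raise ValueError(f"unknown secondary structure label: {secondary!r}")
    if buried_area < c.exposed_below:
        side = "E"
    elif buried_area < c.partial_below:
        # A higher number denotes a more polar environment.
        side = "P1" if polar < c.p_polar_split else "P2"
    else:
        low, high = c.b_polar_splits
        side = "B1" if polar < low else ("B2" if polar < high else "B3")
    return f"{side}:{secondary}"


# Six side-chain categories x three secondary structure types = 18 classes.
ALL_CLASSES = [f"{s}:{ss}" for s in SIDE_CHAIN for ss in SECONDARY]

print(environment_class("H", 150.0, 0.30))  # B1:H
print(len(ALL_CLASSES))                     # 18
```

Keeping the cutoffs in a separate object makes it easy to substitute the values your installation actually uses without touching the classification logic.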
A profile sequence is similar to a conventional sequence except that it lists residue environments rather than amino acids. From an analysis of known structures, a quantitative score can be derived for the preference of each of the 20 amino acids for each of the 18 residue environments. With this score for the suitability of an amino acid in a given residue environment, an amino acid sequence can be aligned to a profile sequence in the same way as in a conventional sequence alignment. The resulting alignment score gives a measure of how well the amino acid sequence fits the profile sequence.
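Scoring a sequence against a profile can be sketched as a table lookup per position. This sketch assumes a one-to-one correspondence between sequence and profile with no gaps, and takes the overall score to be the sum of the per-residue scores. The scoring table here is hypothetical: a real table covers all 18 environments and 20 amino acids, and the few values below are made up.

```python
from typing import Mapping, Sequence

# Hypothetical scoring table: table[environment][amino_acid] -> score.
# The entries are made-up values for illustration only.
TOY_TABLE = {
    "B1:H": {"L": 1.0, "K": -1.0},
    "E:C":  {"L": -0.5, "K": 0.5},
}


def residue_scores(sequence: str, profile: Sequence[str],
                   table: Mapping[str, Mapping[str, float]]) -> list:
    """Score each amino acid against the environment class at the same position."""
    if len(sequence) != len(profile):
        raise ValueError("sequence and profile must have the same length")
    return [table[env][aa] for aa, env in zip(sequence, profile)]


def total_score(sequence: str, profile: Sequence[str],
                table: Mapping[str, Mapping[str, float]]) -> float:
    """Overall score, taken here as the sum of the per-residue scores."""
    return sum(residue_scores(sequence, profile, table))


profile = ["B1:H", "B1:H", "E:C"]
print(residue_scores("LLK", profile, TOY_TABLE))  # [1.0, 1.0, 0.5]
print(total_score("LLK", profile, TOY_TABLE))     # 2.5
print(total_score("KKL", profile, TOY_TABLE))     # -2.5
```

With this toy table, the hydrophobic leucines placed in the buried helical environments score higher than lysines in the same positions, which is the kind of preference the real table encodes.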
Calculating a structure profile requires several fairly time-consuming calculations of residue buried area and polar environment. Once calculated, these values are usually saved to the MSF file as extra information and retrieved whenever the profile of that structure is needed again. The Plot Structure Profile tool calculates the profile for a structure (or restores it from the MSF, if possible) and plots the structure's own sequence assessed against its own profile. This plot, which gives a score for each residue in the structure, is an indication of the quality of the model. It is conventional to smooth the residue scores by averaging them over a sliding window of about nine residues, because the smoothed plot is easier to interpret.
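The windowed smoothing of residue scores can be sketched as a centered moving average. How QUANTA treats the residues near the chain ends is not stated here; this sketch assumes the window is simply truncated there, and it requires an odd window so that each average is centered on its residue.

```python
def window_average(scores, window=9):
    """Centered moving average of per-residue scores.

    Near the chain ends the window is truncated to the residues that exist
    (an assumption; other programs may leave the ends unplotted instead).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        lo, hi = max(0, i - half), min(len(scores), i + half + 1)
        smoothed.append(sum(scores[lo:hi]) / (hi - lo))
    return smoothed


print(window_average([1, 2, 3, 4, 5], window=3))  # [1.5, 2.0, 3.0, 4.0, 4.5]
```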
To assess another sequence against structure profiles, first use the Select Sequence tool to select one sequence. Plot Sequence Profile then generates a graph showing the score of that sequence against every currently active structure that has a profile. This tool assumes the current alignment between the sequence and the structure(s). You can attempt to optimize the sequence-structure alignment with the Align and Dot Plot tools.
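Scoring under an existing alignment can be sketched as scoring column by column. This assumes gaps are written as "-" on either side and that gapped columns simply receive no score (None), so that a plot could show a break there; both conventions are assumptions, and the scoring table is made up for illustration.

```python
from typing import Mapping, Sequence

GAP = "-"

# Made-up scores for illustration: table[environment][amino_acid].
TOY_TABLE = {
    "B1:H": {"L": 1.0, "K": -1.0},
    "B2:S": {"L": -1.0, "K": -1.0},
    "E:C":  {"L": -1.0, "K": 1.0},
}


def aligned_scores(aligned_seq: str, aligned_profile: Sequence[str],
                   table: Mapping[str, Mapping[str, float]]) -> list:
    """Score each alignment column; None where either side has a gap."""
    if len(aligned_seq) != len(aligned_profile):
        raise ValueError("aligned sequence and profile must have equal length")
    return [None if aa == GAP or env == GAP else table[env][aa]
            for aa, env in zip(aligned_seq, aligned_profile)]


print(aligned_scores("L-K", ["B1:H", "B2:S", "E:C"], TOY_TABLE))  # [1.0, None, 1.0]
```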
Within Profile Analysis are tools to calculate the 3D profile for a selected structure. The calculation of buried areas and polar environments is slow. Therefore, once a profile analysis has been calculated, it is automatically saved to the MSF as extra information with the titles:
Once a profile has been calculated, the molecule is colored according to residue environment class. The Protein Utilities Legend tool toggles the display of the color legend.
For all currently active structures, the residue buried area, residue polar environment and secondary structure are calculated, and the 1D profile is derived from these data. This information is saved as extra information in the MSF; if it is already present in the MSF, it is reused rather than recalculated. The assessment of each active structure against its own profile is plotted in the Sequence Viewer. The window used for the plot is set with the Profile Options tool.
You are prompted to select one sequence, which may be either an MSF or a sequence without an MSF. The Plot Sequence Profile and Dot Plot tools assess the selected sequence against the active molecules that have profiles.
The currently selected sequence is assessed against the profile of each active MSF structure, and the result is plotted in the Sequence Viewer. If more than one structure is active, there is one plot per structure, each labeled with the structure name in the graph legend area. The legend title includes the name of the sequence.
Dot plots are explained in Chapter 5. The dot plot parameters of window length and color range can be changed with the Options tool. The dot plot shows the current sequence against one structure profile; stronger diagonal lines indicate possible alignments of sequence and structure. A dot plot of a "good" structure's profile against its own sequence shows the quality of signal that can reasonably be expected from this method.
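A sequence-versus-profile dot plot can be sketched as a matrix of windowed diagonal scores: each cell sums the environment scores along a short diagonal starting at one sequence position and one profile position, so runs of high values along a diagonal suggest a possible alignment. The function name, default window and scoring table are hypothetical, and the exact scoring QUANTA uses for its dot plot is not described here.

```python
# Made-up scores for illustration: table[environment][amino_acid].
TOY_TABLE = {
    "B1:H": {"L": 1.0, "K": -1.0},
    "E:C":  {"L": -0.5, "K": 0.5},
}


def profile_dot_plot(sequence, profile, table, window=2):
    """Matrix D where D[i][j] sums the scores along a diagonal of length
    `window` starting at sequence position i and profile position j."""
    if window < 1:
        raise ValueError("window must be >= 1")
    rows = []
    for i in range(len(sequence) - window + 1):
        row = []
        for j in range(len(profile) - window + 1):
            row.append(sum(table[profile[j + k]][sequence[i + k]]
                           for k in range(window)))
        rows.append(row)
    return rows


plot = profile_dot_plot("LLK", ["B1:H", "B1:H", "E:C"], TOY_TABLE, window=2)
print(plot)  # [[2.0, 0.5], [0.0, 1.5]]
```

The main diagonal (2.0 then 1.5) stands out from the off-diagonal cells, which is the pattern a display would reveal by mapping cell values onto a color range.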
The current sequence is aligned against one active structure profile. Because profile scores are usually low, the gap penalty used here should probably also be small; otherwise gaps would be penalized out of proportion to the match scores. The gap penalties can be changed with the Options tool.
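Aligning a sequence to a profile can be sketched with a standard Needleman-Wunsch global alignment, using the environment scores as match scores and a single linear gap penalty. This is a swapped-in textbook technique for illustration: QUANTA's actual alignment algorithm is not described here, and its plural "gap penalties" may mean separate parameters (for example, opening and extension) that this sketch does not model. The scoring table is made up.

```python
# Made-up scores for illustration: table[environment][amino_acid].
TOY_TABLE = {
    "B1:H": {"L": 1.0, "K": -1.0},
    "B2:S": {"L": -1.0, "K": -1.0},
    "E:C":  {"L": -1.0, "K": 1.0},
}


def align_to_profile(sequence, profile, table, gap=0.5):
    """Global (Needleman-Wunsch) alignment of a sequence to a profile with a
    linear gap penalty. Returns (score, aligned_sequence, aligned_profile)."""
    m, n = len(sequence), len(profile)
    # F[i][j] = best score aligning sequence[:i] with profile[:j].
    F = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        F[i][0] = F[i - 1][0] - gap
    for j in range(1, n + 1):
        F[0][j] = F[0][j - 1] - gap

    def match(i, j):
        return F[i - 1][j - 1] + table[profile[j - 1]][sequence[i - 1]]

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            F[i][j] = max(match(i, j),
                          F[i - 1][j] - gap,   # residue opposite a profile gap
                          F[i][j - 1] - gap)   # profile position opposite a sequence gap

    # Trace back from the bottom-right cell to recover one optimal alignment.
    seq_out, prof_out = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == match(i, j):
            seq_out.append(sequence[i - 1])
            prof_out.append(profile[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and (j == 0 or F[i][j] == F[i - 1][j] - gap):
            seq_out.append(sequence[i - 1])
            prof_out.append("-")
            i -= 1
        else:
            seq_out.append("-")
            prof_out.append(profile[j - 1])
            j -= 1
    return F[m][n], "".join(reversed(seq_out)), list(reversed(prof_out))


score, seq_aln, prof_aln = align_to_profile("LK", ["B1:H", "B2:S", "E:C"], TOY_TABLE)
print(score, seq_aln)  # 1.5 L-K
```

Because the toy match scores are of order 1, a gap penalty of 0.5 lets the alignment skip the unfavorable B2:S position; with a much larger penalty relative to the scores, gaps would dominate the result, which is why the text recommends a small penalty.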
Remove any gaps in the alignment of all active sequences.
By default, once a profile has been calculated and saved to the MSF by the Plot Structure Profile tool, it is used in all future assessments and plots. This tool forces the structure profile to be recalculated, which may be necessary if the structure has been changed.
The adjustable parameters are:
This tool saves the currently calculated profile to the MSF. The standard MSF saving options are displayed.
This tool rereads the last saved version of an MSF and makes it current.