13. Analyze Domain Structure

Overview

Protein domains can be characterized, theoretically and experimentally, in several ways: by protein coordinates, by relative motions between domains, by the stability and folding of independent domains, or by different genetic origins and functions.

Several definitions of domains based on atomic coordinates have been proposed. Domains can be defined by:

Using inter-Cα distances

Deriving a single cutting plane

Minimizing the surface area of each domain

Grouping structural elements

The Protein Design application uses geometric relations between secondary structure elements to automatically identify domains, and provides tools that allow you to define and edit the domains.

References

G. M. Crippen, J. Mol. Biol. 126, 315-332 (1978).

G. D. Rose, J. Mol. Biol. 134, 447-470 (1979).

S. J. Wodak & J. Janin, Proc. Natl. Acad. Sci. USA 77, 1736-1740 (1980).

Analyzing Domain Structures

QUANTA describes a domain in terms of secondary structure elements rather than individual residues. A domain is defined as a group of close secondary structure elements, and loop regions are considered to be in the same domain as the secondary structure elements that they connect. The distance between two secondary structure elements is defined as the average distance between all pairwise combinations of Cα atoms in the two elements. If this average distance is less than a given cutoff distance, the elements are considered to be in the same domain. The number of domains into which the structure subdivides depends on the cutoff distance. For example, if the cutoff distance is decreased, fewer pairs of elements qualify as being in the same domain and the protein divides into more, smaller domains.
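The distance definition above can be sketched in a few lines of Python. This is a minimal illustration, not QUANTA's code: the function names and the coordinates are hypothetical, and each element is assumed to be given as a list of Cα coordinates in Å.

```python
import math
from itertools import product

def average_ca_distance(element_a, element_b):
    """Mean distance over all pairwise combinations of C-alpha atoms
    taken one from each element (hypothetical helper)."""
    pairs = list(product(element_a, element_b))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)

def same_domain(element_a, element_b, cutoff):
    """True when the average inter-element distance is below the cutoff."""
    return average_ca_distance(element_a, element_b) < cutoff

# Made-up C-alpha coordinates (in Angstroms) for two short elements.
helix = [(0.0, 0.0, 0.0), (0.0, 0.0, 3.0)]
strand = [(4.0, 0.0, 0.0), (4.0, 0.0, 3.0)]

average = average_ca_distance(helix, strand)  # (4 + 5 + 5 + 4) / 4 = 4.5
print(average, same_domain(helix, strand, cutoff=5.0))
```

With these coordinates the two elements fall in the same domain at a cutoff of 5.0 Å but not at 4.0 Å, which is the behavior described above: a smaller cutoff splits the protein into more domains.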

A simple clustering algorithm is used to analyze the distances between secondary structure elements, and this generates a dendrogram. A dendrogram is a "family tree" of the secondary structure elements in which the pairs of elements that are closest in space are shown as most closely related in the tree.

Often the difficult step in domain analysis is deciding the appropriate cutoff distance for the inter-element average distance. The automatic algorithm will use a fixed value and report the number of domains which this will generate when you use the Number of Domains tool. You can alter the number of domains that are generated.
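The link between the cutoff distance and the number of domains can be illustrated with a short sketch. It assumes that the clustering has already produced a list of distances at which clusters were joined, and that every join below the cutoff keeps the joined elements in one domain. Both the function name and the distances are hypothetical.

```python
def count_domains(n_elements, merge_distances, cutoff):
    """Number of domains left when only joins below the cutoff are accepted.

    Assumption: each accepted join reduces the number of groups by one.
    """
    accepted = sum(1 for distance in merge_distances if distance < cutoff)
    return n_elements - accepted

# Made-up join distances (in Angstroms) for four secondary structure elements.
merge_distances = [5.0, 6.0, 20.5]

for cutoff in (4.0, 5.5, 10.0, 25.0):
    print(cutoff, count_domains(4, merge_distances, cutoff))
```

Lowering the cutoff accepts fewer joins, so the same protein is reported as having more domains.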

The Clustering Algorithm

This clustering algorithm finds the pair of closest secondary structure elements and joins them into one cluster. Then it repeatedly finds the closest pair of either individual secondary structure elements or clusters, which represent two or more elements. This continues until all the elements have been drawn together into one single cluster.

Clustering algorithms differ in how the distance from a cluster is calculated and how it is scaled compared with a distance from a single element. QUANTA's algorithm takes the distance from a cluster to be the average of the distances from all the elements in the cluster. Therefore, the distance between two clusters is the average of the distances between every element in one cluster and every element in the other cluster.
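The procedure described above corresponds to what is usually called average-linkage agglomerative clustering. The following is a minimal sketch of that general technique, not QUANTA's implementation. The function names and the distance matrix are hypothetical, and stopping at a requested number of clusters is an assumption about how a given domain count could be obtained.

```python
from itertools import combinations

def cluster_distance(dist, a, b):
    """Average of all element-to-element distances between clusters a and b."""
    return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

def cluster_elements(dist, n_clusters=1):
    """Repeatedly join the closest pair of clusters until n_clusters remain.

    dist is a symmetric matrix of inter-element average distances.
    Returns the final clusters and the list of joins (a, b, distance).
    """
    clusters = [(i,) for i in range(len(dist))]
    merges = []
    while len(clusters) > n_clusters:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: cluster_distance(dist, *pair))
        merges.append((a, b, cluster_distance(dist, a, b)))
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(tuple(sorted(a + b)))
    return clusters, merges

# Made-up distances (in Angstroms) between four secondary structure elements.
dist = [
    [0.0, 5.0, 20.0, 22.0],
    [5.0, 0.0, 21.0, 19.0],
    [20.0, 21.0, 0.0, 6.0],
    [22.0, 19.0, 6.0, 0.0],
]

domains, merges = cluster_elements(dist, n_clusters=2)
print(domains)
```

In this example elements 0 and 1 are joined first, then elements 2 and 3. Continuing to a single cluster would join the two groups at their average distance of 20.5 Å.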

Associated with each cluster is a score that is the average of the distances between all the pairs of elements in the cluster. The result of the clustering is displayed as a dendrogram with the secondary structure elements listed down the screen. Elements or clusters that have been paired into a cluster are connected by a vertical line whose x-axis position is proportional to the cluster score.
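The cluster score differs from the distance used to join two clusters: it averages over every pair of elements inside the cluster, not only the pairs that span the two groups being joined. A minimal sketch follows, with a hypothetical function name and made-up distances. Returning 0.0 for a single element is an assumption, since the text does not define a score for that case.

```python
from itertools import combinations

def cluster_score(dist, cluster):
    """Average distance between all pairs of elements within one cluster."""
    pairs = list(combinations(cluster, 2))
    if not pairs:
        return 0.0  # assumption: a lone element has no internal distance
    return sum(dist[i][j] for i, j in pairs) / len(pairs)

# Made-up distances (in Angstroms) between four secondary structure elements.
dist = [
    [0.0, 5.0, 20.0, 22.0],
    [5.0, 0.0, 21.0, 19.0],
    [20.0, 21.0, 0.0, 6.0],
    [22.0, 19.0, 6.0, 0.0],
]

print(cluster_score(dist, (0, 1)))        # one pair: 5.0
print(cluster_score(dist, (0, 1, 2, 3)))  # six pairs: 93 / 6 = 15.5
```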

Loop Regions

Residues in loop regions between secondary structure elements are assigned to domains using the following criteria:

1. Residues in loop regions between two secondary structure elements in the same domain are assigned to that domain.

2. For loop regions between secondary structure elements in different domains, a domain boundary is defined between two consecutive residues in the loop. The boundary is chosen to minimize the sum of the distances from loop Cα atoms to the nearest secondary structure element in the same domain. All residues in the sequence before that boundary are assigned to the preceding domain, and residues in the sequence after the boundary are assigned to the following domain.

3. N-terminal residues that are not in secondary structure elements are assigned to the next domain along the protein sequence. C-terminal residues that are not in secondary structure elements are assigned to the previous domain along the protein sequence.
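The second criterion above can be sketched as a small search over the possible boundary positions in a loop. This is an illustration under stated assumptions, not QUANTA's code: the function names are hypothetical, the distance from a loop residue to an element is taken as the distance to that element's closest Cα atom, and each domain is passed in as the pooled Cα coordinates of its secondary structure elements.

```python
import math

def distance_to_domain(point, domain_atoms):
    """Distance from a loop C-alpha atom to the nearest atom of the domain's
    secondary structure elements (an assumed point-to-element distance)."""
    return min(math.dist(point, atom) for atom in domain_atoms)

def loop_boundary(loop, preceding_atoms, following_atoms):
    """Return k so that loop[:k] joins the preceding domain and loop[k:] the
    following domain, minimizing the summed distances. Needs >= 2 residues."""
    def cost(k):
        before = sum(distance_to_domain(p, preceding_atoms) for p in loop[:k])
        after = sum(distance_to_domain(p, following_atoms) for p in loop[k:])
        return before + after
    # The boundary lies between two consecutive loop residues.
    return min(range(1, len(loop)), key=cost)

# Made-up coordinates (in Angstroms): a four-residue loop between two domains.
preceding_atoms = [(0.0, 0.0, 0.0)]
following_atoms = [(10.0, 0.0, 0.0)]
loop = [(2.0, 0.0, 0.0), (4.0, 0.0, 0.0), (7.0, 0.0, 0.0), (9.0, 0.0, 0.0)]

k = loop_boundary(loop, preceding_atoms, following_atoms)
print(k)  # the first k loop residues go to the preceding domain
```

Here the first two loop residues are closer to the preceding domain and the last two to the following domain, so the boundary falls in the middle of the loop.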

Tools and Options

The overall structure of a protein can be better seen if you have only the Cα atom trace displayed and colored according to secondary structure. The secondary structure elements can be highlighted by the Secondary Structure tool on the Protein Utilities palette. This shows a single vector for each element.

When this utility is used, only one molecule is active at a time. If more than one molecule is active when entering the utility, only the first remains active. If a domain definition has been saved to the MSF of a molecule, it is retrieved and used; otherwise, the molecule is initially colored as a single domain.

All displayed molecules are colored to show their domain structure. For example, the first domain is color 1 (green) and the second domain is color 2 (blue). A legend on the bottom-right of the screen gives the molecule name and domain number in the appropriate color.

Three of the tools on the palette - Number of Domains, One More Domain and One Less Domain - automatically analyze the protein into some given number of domains. When these tools are selected, the molecule and sequence viewer are recolored to show the domain assignment of each residue.

There is a set of tools for manual assignment of secondary structure elements or individual residues to domains. If these are used, the molecule and sequence viewer coloring are updated appropriately, but the dendrogram coloring is not changed. If any of the automatic assignment tools are used after the manual tools, the manual changes are overwritten.

Display Cluster

This tool toggles the display of a dendrogram.

Number of Domains

This tool displays the Enter Number of Domains dialog box from which to select the number of domains to be assigned to the protein. The maximum number of domains is equal to the number of secondary structure elements. The initial value in the dialog box is the automatic algorithm's best estimate of the number of domains using a fixed inter-element cutoff distance.

One More Domain

This option increases the number of domains by one. It is grayed out when the Number of Domains option has not been previously selected. When the maximum value has been reached, no more domains are added.

One Less Domain

This option decreases the number of domains by one. It is grayed out when the Number of Domains option has not been previously selected. When the number of domains reaches one, no more domains are removed.

Reassign Residue Range

This tool displays the Pick Range palette from which to select a range of residues from either the sequence table or the active structure. Once the range is selected, you are prompted to select a domain from a multiple choice list. The selected residue range is assigned to the selected domain.

Reassign Element

This tool prompts you to select a domain from a multiple choice list, and then to select an atom in the secondary structure element that is to be reassigned to the selected domain.

Create Domain

This tool displays the Pick Range palette from which to select a residue range. Once the range has been selected, it is assigned to a new domain and given the next unused number.

Merge Domains

This tool displays the Pick Range palette from which to pick two residues, one in each of the two domains that are to be merged.

Undo Domain Edit

This tool reverts the latest edit made to the domains.

List Domains

This tool lists to the textport the identity and residue range of the domains in the active molecule. The format is:

Domain identifier - first residue in range - last residue in range

Write Domains to File

This tool writes the domains to a file named MOLECULE_domain.out. The format of the file is:

Domain identifier - first residue in range - last residue in range
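A listing of this kind can be derived from a per-residue domain assignment by collapsing consecutive residues with the same domain into ranges. The sketch below is illustrative only: the function name and data are hypothetical, and the space-separated output is an assumption, since the exact field separators used by the program are not specified here.

```python
from itertools import groupby

def domain_ranges(assignment):
    """Collapse (residue_id, domain_id) pairs, given in sequence order, into
    (domain_id, first_residue, last_residue) ranges."""
    ranges = []
    for domain, run in groupby(assignment, key=lambda item: item[1]):
        residues = [residue for residue, _ in run]
        ranges.append((domain, residues[0], residues[-1]))
    return ranges

# Made-up assignment: residue 6 returns to domain 1 after a stretch in domain 2.
assignment = [(1, 1), (2, 1), (3, 1), (4, 2), (5, 2), (6, 1)]

ranges = domain_ranges(assignment)
for domain, first, last in ranges:
    print(domain, first, last)  # assumed layout: identifier, first, last
```

A domain that is not contiguous in sequence appears as more than one range, as domain 1 does in this example.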

Write Geometry to File

This tool writes inter-secondary structure geometry to a file named MOLECULE_geometry.out. Listed for each pair of secondary structure elements are the structure type (such as B = beta strand or H = alpha helix), the IDs of the first and last residues, the minimum and average distances between the elements, and the angle between them.
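The quantities listed for each pair of elements can be sketched as follows. This is a hedged illustration, not the program's method: the function names are hypothetical, distances are assumed to be measured between Cα atoms, and each element's direction is approximated by the vector from its first to its last Cα atom.

```python
import math

def axis_vector(element):
    """Vector from the first to the last C-alpha atom (assumed element axis)."""
    first, last = element[0], element[-1]
    return tuple(b - a for a, b in zip(first, last))

def angle_between(element_a, element_b):
    """Angle in degrees between the assumed axes of two elements."""
    u, v = axis_vector(element_a), axis_vector(element_b)
    dot = sum(a * b for a, b in zip(u, v))
    cosine = dot / (math.hypot(*u) * math.hypot(*v))
    cosine = max(-1.0, min(1.0, cosine))  # guard against rounding error
    return math.degrees(math.acos(cosine))

def pair_geometry(element_a, element_b):
    """Return (minimum distance, average distance, angle) for two elements."""
    distances = [math.dist(p, q) for p in element_a for q in element_b]
    average = sum(distances) / len(distances)
    return min(distances), average, angle_between(element_a, element_b)

# Made-up C-alpha coordinates (in Angstroms): one element along z, one along x.
helix = [(0.0, 0.0, 0.0), (0.0, 0.0, 3.0)]
strand = [(4.0, 0.0, 0.0), (7.0, 0.0, 0.0)]

minimum, average, angle = pair_geometry(helix, strand)
print(minimum, average, angle)
```

For these perpendicular elements the minimum distance is 4.0 Å and the angle is 90 degrees.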

Options

This tool displays a dialog box to change the setting of the Cutoff Difference in Average Distance. The default is set to 2.5 Å.

Save to MSF...

This tool saves domains as extra information to the MSF.

Reread MSF...

This tool displays a dialog box that has extra information titles from which to read the domain structure.

Finish

This tool removes the Domain Analysis palette and returns the Protein Design palette. If the domain structure has been changed, a dialog box for each structure is displayed offering the option of saving domain structure information to the MSF.
