18. Motif Database

Overview

This module provides tools that search a database for structures whose folds resemble that of the single active molecule. By default, the reference motif consists of all the secondary structure elements of the active molecule; however, individual elements can be set inactive to exclude them from the reference motif.

This chapter describes:

Searching the Motif Database

Tools and Options

Motif Database Log File

For more information, see:

Protein User's Reference

Superpose Motif

Domain Analysis

Searching the Motif Database

The Motif Database utility is based on the same principles as the Superpose Motif utility; the underlying theory is explained in that chapter. The motif data for proteins from the Brookhaven Protein Data Bank are stored in the file $HYD_LIB/motif.geo. The reference template used to search the database is taken from the currently selected MSF. The template can be the entire MSF structure or a fragment defined by deselecting secondary structure elements with the Change Active tool. Because the time a motif search takes depends on the size of the search fragment, it is worth defining the minimal reference motif before you start. A search job can be left to run and the results reviewed later from the log file.

Tools and Options

Using this utility, you can search the structures in a motif database for secondary structure elements that match the reference motif. The standard database is $HYD_LIB/motif.geo. The Auto Database Search can also be run from a batch file in non-graphical QUANTA.

Only one molecule can be active when using this module. If more than one molecule is active when you enter the module, all but the first are made inactive. Additional structures can be appended to the database.

The motif database can be used in two ways:

Finding a match with a specific structure

Running an automatic search against all the entries

An automatic database search generates a log file that can be browsed for the results of the database scan.

Change Active

Initially, all secondary structure elements are active and their vectors are displayed as fat lines. Selecting an element toggles it between active and inactive. When this option is selected, the Pick Element tool is selected by default and the other picking tools are ungrayed. When this option is selected again, all these tools are switched off and grayed.

Pick Element

This option is grayed until Change Active is selected. It lets you deselect any secondary structure element of the active structure. The tool is switched off when Pick Element Range, Pick Domain, or Change Active is selected.

Pick Element Range

This option is grayed until Change Active is selected. It lets you pick two secondary structure elements and deselect all the elements in the range between them. If there is a conflict, for example if the elements in the range are not all in the same state, a dialog box prompts you to specify how to proceed. The tool is switched off when Pick Element, Pick Domain, or Change Active is selected.

Pick Domain

This option is grayed until Change Active is selected. It lets you deselect any defined domain. The tool is switched off when Pick Element, Pick Element Range, or Change Active is selected.
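The picking tools can be pictured as flipping an active flag on each secondary structure element, with the reference motif being whatever remains active. The Python sketch below models that behaviour under stated assumptions: the Element class and all function names are hypothetical and are not part of QUANTA, and where the real Pick Element Range tool opens a dialog box on a conflict, this model simply leaves the states unchanged and returns False.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """One secondary structure element (SSE) of the active molecule (hypothetical model)."""
    number: int
    sse_type: str        # e.g. "helix" or "strand"
    active: bool = True  # all elements start active


def pick_element(elements, number):
    """Like Pick Element: switch one element off, or back on if already off."""
    for e in elements:
        if e.number == number:
            e.active = not e.active
            return
    raise KeyError(f"no element numbered {number}")


def pick_element_range(elements, first, last):
    """Like Pick Element Range: deselect every element from first to last.

    If the range mixes active and inactive elements, nothing is changed and
    False is returned; QUANTA instead asks you how to proceed.
    """
    lo, hi = sorted((first, last))
    in_range = [e for e in elements if lo <= e.number <= hi]
    if len({e.active for e in in_range}) > 1:
        return False
    for e in in_range:
        e.active = False
    return True


def reference_motif(elements):
    """The reference motif is the list of currently active element numbers."""
    return [e.number for e in elements if e.active]


if __name__ == "__main__":
    sses = [Element(i, "helix" if i % 2 else "strand") for i in range(1, 7)]
    pick_element(sses, 2)
    print(reference_motif(sses))  # [1, 3, 4, 5, 6]
```

Keeping the motif as "everything still active" mirrors the manual's description: the search always uses the current activity state, so no separate motif object has to be kept in sync.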

Overlay Database

If no database file has been selected, this option opens the File Librarian, from which you can choose one. Once a database has been selected, a scrolling list of all the structures in the database is displayed. The structure you select is tested against the reference motif defined by the active secondary structure elements of the active molecule. Vectors representing the secondary structures of all the matches are drawn superposed on the reference motif, and the browse tools are ungrayed.

Auto Database Search

If no database file is open, this tool prompts you to select one and to enter the name of a log file. Every structure in the database is then searched for the reference motif defined by the currently active secondary structure elements. Depending on the size of the database, this can take considerable time. The name of each structure is listed below the command line as it is searched. The final results are written to the textport and to the log file, in the same format as the tables.
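The search loop can be sketched in Python as follows. This is an illustration under assumptions, not QUANTA's code: the database layout, the match_motif callable and the fixed field width are hypothetical, and the records use the keywords described in the Motif Database Log File section, with their order and numeric formatting assumed.

```python
def auto_database_search(database, match_motif, msf_name, library_file, log,
                         width=4):
    """Test every database entry against the reference motif and log matches.

    database:     dict of structure name -> motif data (hypothetical layout)
    match_motif:  hypothetical callable(motif_data) returning a list of
                  (number_of_elements, score, list_of_matches) tuples;
                  None in list_of_matches marks an unmatched reference element
    log:          any object with a write(str) method
    width:        assumed fixed width of each list_of_matches field
    """
    log.write(f"*MOLECULE {msf_name}\n")
    log.write(f"*LIBRARY {library_file}\n")
    for name, motif_data in database.items():
        print(name)  # progress: the name of each structure as it is searched
        matches = match_motif(motif_data)
        log.write(f"*TEST {name}\n")
        log.write(f"*NMATCH {len(matches)}\n")
        for n, (n_elements, score, mlist) in enumerate(matches, start=1):
            # Fixed-width fields keep a blank (unmatched) entry in its column.
            fields = "".join(f"{('' if m is None else m):>{width}}" for m in mlist)
            log.write(f"{n} {n_elements} {score:.3f}{fields}\n")
```

Fixed-width columns are chosen here because the log format leaves unmatched fields blank, and a blank field would be lost if the columns were only separated by whitespace.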

Table Database Results

This option reads the log file created by the Auto Database Search tool. It displays information about the motif search for the reference molecule and lists the overlays available for display once a search is done. Two tables are displayed: the Overlay Motif table and the Secondary Structure Elements table.

The Overlay Motif table displays information about each matched overlay. The columns, from left to right, are: the overlay number, the overlay name from the database, the number of elements matched, and the RMS difference of the superposition. The remaining columns identify the elements that are matched.

The Secondary Structure Elements table displays information on each secondary structure element in the active molecule. The columns, from left to right, are: the molecule name, the element number, the secondary structure type, the ID of the first residue in the element, and the ID of the last residue in the element.
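The RMS difference in the Overlay Motif table measures how closely the matched element endpoints coincide after superposition. The Python sketch below shows only that final step, assuming the test endpoints have already been fitted onto the reference endpoints; the least-squares fitting itself is not shown, the function name is hypothetical, and dividing by the number of points (the conventional RMS definition) is an assumption about how QUANTA normalizes the score.

```python
import math


def endpoint_rms(ref_points, test_points):
    """RMS deviation between paired endpoints that are already superposed.

    Each matched element contributes two points (the start and end of its
    vector), given as (x, y, z) tuples in the same order in both lists.
    """
    if not ref_points or len(ref_points) != len(test_points):
        raise ValueError("need two non-empty point lists of equal length")
    total = 0.0
    for p, q in zip(ref_points, test_points):
        total += sum((a - b) ** 2 for a, b in zip(p, q))
    return math.sqrt(total / len(ref_points))


# Two matched helices, each test endpoint displaced by 1 unit along z:
ref = [(0.0, 0.0, 0.0), (0.0, 0.0, 10.0), (5.0, 0.0, 0.0), (5.0, 0.0, 10.0)]
test = [(x, y, z + 1.0) for x, y, z in ref]
print(endpoint_rms(ref, test))  # 1.0
```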

Show Overlay

After a search has been completed and Table Database Results has read the search results, this tool displays the matched overlays and unmasks the browse tools.

The Display Overlay From Database Search dialog box opens, in which one database structure must be selected. Because the atomic coordinates of the database structure are not known at this stage, only the secondary structure vectors can be displayed.

Options...

This tool opens the Motif Superposition dialog box, in which you can set the criteria used to match secondary structure elements, the cut-off values, and the number of matches to display. For each secondary structure matching criterion, a toggle determines whether the criterion is used, and its cut-off value can be set.

All Overlays

This tool is masked until a search has been performed and Show Overlay has been picked. It is the default browse option and displays all the resulting overlays.

Next Overlay

This tool is masked until a search has been performed and Show Overlay has been picked. It displays the motifs one at a time in ascending order; after the last motif is displayed, it cycles back to the first.

Previous Overlay

This tool is masked until a search has been performed and Show Overlay has been selected. It displays the motifs one at a time in descending order; after the first motif is displayed, it cycles back to the last.
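The wrap-around behaviour of Next Overlay and Previous Overlay amounts to modular arithmetic on the overlay index. A minimal Python sketch, assuming overlays are numbered from 0; the function names are hypothetical:

```python
def next_overlay(current, count):
    """Next Overlay: advance one overlay, wrapping from the last to the first."""
    return (current + 1) % count


def previous_overlay(current, count):
    """Previous Overlay: go back one overlay, wrapping from the first to the last."""
    return (current - 1) % count


# Stepping forward through three overlays (numbered 0, 1, 2) from overlay 0:
order = []
i = 0
for _ in range(4):
    i = next_overlay(i, 3)
    order.append(i)
print(order)  # [1, 2, 0, 1]
```

Python's % operator always returns a non-negative result for a positive count, so stepping back from overlay 0 lands on the last overlay without a special case.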

Select Overlay(s)...

This tool is masked until a search has been performed and Show Overlay has been selected. It lets you select one or more overlays for display from a toggle list.

Clear Display

This tool is masked until a search has been performed and Show Overlay has been selected. It removes all overlays from the display and masks the browse tools.

Select Database...

This tool opens the File Librarian, from which you can select or change the database file used.

Save Structure to Database

This tool saves the active molecule to the database. If no database file is open, you are prompted to select one. You can use this tool to create your own database or to extend an existing one.

Finish

If molecules have been superposed but not saved, this tool prompts you to save the coordinates to an MSF file.

Motif Database Log File

This section describes the contents of a Motif Database log file, which can be browsed using the Sys Window option from the QUANTA File menu. The file uses the following keywords:

*MOLECULE msf_name

The name of the MSF from which the reference motif was generated is msf_name.

*LIBRARY library_file

The name of the library file is library_file.

*TEST molecule_name

The test structure is molecule_name, which, by default, is a four-letter PDB code.

*NMATCH number_of_matches

The number of matches for molecule_name is number_of_matches. This line is followed by number_of_matches lines, one for each match, in the format:

N number_of_elements score list_of_matches

where:

N

Is the match number for that test molecule;

NUMBER_OF_ELEMENTS

Is the number of elements which are matched;

SCORE

Is the score from RMS superposition of matched element endpoints;

LIST_OF_MATCHES

Is a list with one field for each element in the reference structure. The nth field contains the number of the element in the test structure that matches the nth element of the reference structure; if the nth reference element has no match, the field is blank.
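Because the log file is plain text, its match records can also be read back programmatically. The Python sketch below is a hypothetical helper, not part of QUANTA. It assumes that each keyword line consists of the keyword, a space, and its value, and that the LIST_OF_MATCHES fields are fixed-width columns (4 characters here) so that blank fields keep their position; adjust the width to the actual file layout.

```python
import re

# Assumed layout: N, number_of_elements and score separated by whitespace,
# followed directly by fixed-width list_of_matches fields (width assumed).
MATCH_LINE = re.compile(r"\s*(\d+)\s+(\d+)\s+(\S+)")


def parse_motif_log(text, width=4):
    """Parse Motif Database log text into a dictionary (hypothetical helper)."""
    log = {"molecule": None, "library": None, "tests": {}}
    current = None  # record for the most recent *TEST structure
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("*"):
            key, _, value = line.partition(" ")
            value = value.strip()
            if key == "*MOLECULE":
                log["molecule"] = value
            elif key == "*LIBRARY":
                log["library"] = value
            elif key == "*TEST":
                current = {"nmatch": 0, "matches": []}
                log["tests"][value] = current
            elif key == "*NMATCH" and current is not None:
                current["nmatch"] = int(value)
            continue
        m = MATCH_LINE.match(line)
        if m is None or current is None:
            continue  # not a recognizable match line
        tail = line[m.end():]
        fields = [tail[i:i + width].strip() for i in range(0, len(tail), width)]
        current["matches"].append({
            "n": int(m.group(1)),
            "number_of_elements": int(m.group(2)),
            "score": float(m.group(3)),
            # None marks a reference element with no match (blank field)
            "list_of_matches": [int(f) if f else None for f in fields],
        })
    return log
```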
