G. Running the Search Standalone

Overview

The Protein Design application includes an interface to the search program, but input files can also be created and/or edited, and the search run, independently of QUANTA. The format of the input file is described below.

Search Commands

Each line of a search command file consists of a four-letter command keyword followed by a free-format list of parameters separated by spaces. Many of the parameters need not be specified. However, if a later parameter on the same command line is specified, a * character must be entered in place of each omitted parameter that precedes it. Omitted parameters are given default values. In the following documentation, the keywords and their arguments are listed. Arguments that have default values and can be given as * are enclosed in square brackets ([ ]).
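The line format described above can be illustrated with a short Python sketch. This is not part of the search program; the function name parse_command_line is hypothetical, and the sketch assumes only what is stated above: a four-letter keyword, space-separated parameters, and * for an omitted parameter.

```python
def parse_command_line(line):
    """Split one command-file line into (keyword, parameters).

    A parameter entered as '*' is returned as None so that the caller
    can substitute the default value for it.
    """
    fields = line.split()
    if not fields:
        raise ValueError("empty command line")
    keyword = fields[0]
    if len(keyword) != 4:
        raise ValueError(f"keyword must be four letters: {keyword!r}")
    params = [None if field == "*" else field for field in fields[1:]]
    return keyword, params


# The residue type is omitted, so a * holds its place before the
# secondary structure code and the phi angle.
keyword, params = parse_command_line("RESD * E -120.0")
print(keyword, params)
```

Trailing parameters that are not needed are simply left off the line, so they do not appear in the returned list at all.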

DBAS database_file

Specifies the name of the database file. The default is $HYD_LIB/database.dat.

WILD wildcard_file

Specifies the name of the wildcard file. The default is $HYD_LIB/wildcard.dat.

MRNG n1 n2

Defines a range of proteins to be searched. The numbers n1 and n2 refer to the positions of the proteins in the database file. By default, all proteins in the database are searched.

MNAM name1 [name2] [name3] ...

Specifies a list of proteins to be searched. The names should correspond to the original PDB file names. The PDB file names stored in the database are in uppercase, so the names entered are converted to uppercase. For a long list of file names, this command can be invoked more than once.

MKEY keyword

Searches only those proteins whose description contains a text string matching the keyword. The text in the description is all uppercase, so the keyword text is converted to uppercase.

Only one keyword should be entered on each command line. This command can be invoked more than once; any protein that matches any of the given keywords is searched.

RSLN resol

Searches only the molecules with a resolution less than or equal to resol.

INFO

For each hit protein, extracts the header information for that protein from the database file and writes it to the log file. This command can be used in conjunction with the commands that select proteins (MRNG, MNAM, MKEY, and RSLN) without any structural template being defined.

NHIT nhit

Stops the search once nhit hits have been found. The default is 50.

DTOL distance

Specifies the distance tolerance on all inter-fragment inter-atomic distance searches. The default is 1.0 Å.

ATOL angle1 angle2

Specifies the angle tolerances on torsion angle searches for the phi and psi torsion angles. The defaults are 30.0° and 30.0°.

CTOL angle

Specifies the angle tolerance on torsion angle searches for C alpha pseudo torsion angles. The default is 50.0°.

STOL angle

Specifies the angle tolerance on torsion angle searches for side chain torsion angles. The default is 30.0°.
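The following Python sketch shows one way an angle tolerance of this kind might be applied. It assumes that angle differences are taken around the circle, so that 175° and -170° differ by 15°; the manual does not state how the search program handles this, and the function names are hypothetical.

```python
def angle_difference(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    diff = (a - b) % 360.0
    return min(diff, 360.0 - diff)


def torsion_matches(observed, target, tolerance=30.0):
    """True if observed is within tolerance degrees of target.

    The default of 30.0 mirrors the documented default for the phi,
    psi, and side chain tolerances; the comparison itself is an
    assumption.
    """
    return angle_difference(observed, target) <= tolerance


print(torsion_matches(175.0, -170.0))        # differ by 15 degrees
print(torsion_matches(-60.0, 60.0))          # differ by 120 degrees
print(torsion_matches(-60.0, 60.0, 120.0))   # wider tolerance
```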

TMPL nres template_name

Signifies the start of the definition of a template for a fragment of nres residues. This card must be followed by nres RESD cards. The template is given a name, template_name, that is used for reference by the CONS command.

RESD [restyp] [secstr] [phi] [psi] [cators]

Specifies each residue in a fragment template.

Restyp is the single character code for the residue type or a number for the wildcard template. The default is wildcard, meaning that this residue in the template can be any residue type.

Secstr is a character code to denote the secondary structure type:

H - folded conformation
A - alpha helix
T - a turn
3 - a 3 residue turn
4 - a 4 residue turn
5 - a 5 residue turn
E - extended chain
N - N terminal residue

If any of these codes is prefaced by NOT, then the search is for residues that are not of that secondary structure type. The default is wildcard.

Phi and psi specify the main chain torsion angles in degrees.

Cators for residue i is the pseudo torsion between the four consecutive C alpha atoms Ca(i-1) - Ca(i) - Ca(i+1) - Ca(i+2).
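As an illustration of the quantity that cators describes, the following Python sketch computes the torsion angle defined by four points, such as four consecutive C alpha positions. It uses the standard atan2 dihedral formula. The function names are hypothetical, and the sign convention used by the search program is assumed, not confirmed, to be the usual one.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def pseudo_torsion(p1, p2, p3, p4):
    """Torsion angle in degrees about the p2-p3 axis for four points."""
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b3 = _sub(p4, p3)
    n2 = _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(_cross(b1, b2), n2)
    return math.degrees(math.atan2(y, x))


# Idealized coordinates for illustration only, not real C alpha positions.
angle = pseudo_torsion((1.0, 0.0, 0.0), (0.0, 0.0, 0.0),
                       (0.0, 0.0, 1.0), (0.0, 1.0, 1.0))
print(round(angle, 1))
```

For residue i, the four points would be the coordinates of Ca(i-1), Ca(i), Ca(i+1), and Ca(i+2), in that order.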

SIDE [sidtor1 sidtor2 sidtor3 sidtor4 sidtor5]

Sidtor1 to sidtor5 are the side chain torsion angles as defined by IUPAC-IUB and listed in Appendix H. This card should follow the RESD card for the residue to which it applies.

DIST type tarres tardis

This optional card specifies an interatomic distance between two residues in the same template. It must follow the RESD card for one of the two residues to which it applies. The distance is measured between the Ca atoms or the sidechain centers, depending on the type parameter, which may be:

CACA Ca-Ca distance
CASI Ca-sidechain distance
SICA sidechain-Ca distance
SISI sidechain-sidechain distance

The distance from this residue to the residue that is tarres residues further on in the template should be within the tolerance distance DTOL of tardis.
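The distance test described above can be sketched in Python as follows. The function name and the use of plain coordinate tuples are assumptions made for illustration; the default tolerance of 1.0 mirrors the documented DTOL default.

```python
import math


def dist_matches(point_a, point_b, tardis, dtol=1.0):
    """True if the distance between two points is within dtol of tardis."""
    return abs(math.dist(point_a, point_b) - tardis) <= dtol


# Hypothetical coordinates in Angstroms: the points are 6.0 apart.
origin = (0.0, 0.0, 0.0)
other = (6.0, 0.0, 0.0)
print(dist_matches(origin, other, tardis=5.5))   # 0.5 from the target
print(dist_matches(origin, other, tardis=3.8))   # 2.2 from the target
```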

CONS contype tmpnam1:res1 tmpnam2:res2 minlim maxlim [chntest]

Defines a constraint between two residues in different templates. The constrained parameter is defined by contype:

CACA Ca-Ca distance
CASI Ca-sidechain distance
SICA sidechain-Ca distance
SISI sidechain-sidechain distance

IRNG number of residues in sequence between template residues
XRNG exclude range of number of residues in sequence

The residues between which the constraint holds are given in the format:

template_name:residue_number

Minlim and maxlim are the minimum and maximum allowed values of the parameter specified by contype.

Chntest flags whether to test that the fragments are in the same protein chain. The segment IDs given in the Protein Data Bank file are taken to indicate the different chains. If chntest is 0, no check is performed; if chntest is 1, the fragments must be in the same chain.
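To make the layout of a constraint line concrete, the following Python sketch splits one into its parts. The names Constraint, parse_residue_ref, and parse_cons are hypothetical, as are the template names in the example. Treating minlim and maxlim as inclusive bounds is an assumption.

```python
from typing import NamedTuple, Optional, Tuple


class Constraint(NamedTuple):
    contype: str
    first: Tuple[str, int]     # (template_name, residue_number)
    second: Tuple[str, int]
    minlim: float
    maxlim: float
    chntest: Optional[int]     # None when omitted or given as *

    def allows(self, value):
        """True if value lies between minlim and maxlim (inclusive)."""
        return self.minlim <= value <= self.maxlim


def parse_residue_ref(text):
    """Split 'template_name:residue_number' into its two parts."""
    name, sep, number = text.rpartition(":")
    if not sep or not name:
        raise ValueError(f"expected template_name:residue_number, got {text!r}")
    return name, int(number)


def parse_cons(line):
    fields = line.split()
    if len(fields) not in (6, 7) or fields[0] != "CONS":
        raise ValueError(f"not a valid constraint line: {line!r}")
    chntest = None
    if len(fields) == 7 and fields[6] != "*":
        chntest = int(fields[6])
    return Constraint(fields[1], parse_residue_ref(fields[2]),
                      parse_residue_ref(fields[3]),
                      float(fields[4]), float(fields[5]), chntest)


cons = parse_cons("CONS CACA frag1:2 frag2:1 4.0 10.0 1")
print(cons.first, cons.second, cons.allows(6.5))
```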

DBUG

Outputs some diagnostic information to the log file.
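The commands described in this section can be combined into a command file. The following Python sketch writes a small example to job_name.ddb. Every value in it (template names, residue codes, limits) is hypothetical and chosen only to show the layout; the order of the commands is an assumption apart from the documented rules that RESD cards follow their TMPL card and that a DIST card follows a RESD card.

```python
from pathlib import Path

# Hypothetical command lines, one per card.
EXAMPLE_COMMANDS = [
    "NHIT 20",                               # stop after 20 hits
    "DTOL 1.5",                              # distance tolerance
    "TMPL 2 frag1",                          # two-residue template frag1
    "RESD G E",                              # glycine, extended chain
    "DIST CACA 1 3.8",                       # Ca-Ca distance to next residue
    "RESD * * -60.0 -45.0",                  # any residue; phi and psi given
    "TMPL 1 frag2",                          # one-residue template frag2
    "RESD P",                                # proline, other fields default
    "CONS CACA frag1:1 frag2:1 4.0 10.0 1",  # constraint between templates
]


def write_command_file(job_name, directory="."):
    """Write the example commands to <job_name>.ddb and return the path."""
    path = Path(directory) / f"{job_name}.ddb"
    path.write_text("\n".join(EXAMPLE_COMMANDS) + "\n")
    return path


if __name__ == "__main__":
    print(write_command_file("example").read_text())
```

The comments belong to the Python list only. They are not written to the file, because the manual does not say whether command files may contain comments.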

Running a Search

The standard command line to run the database search is:

$HYD_EXE/search job_name

If job_name is omitted, you are prompted for it. The input command file, job_name.ddb, is described above. The search program creates two output files. The log file (job_name.log) contains the following:

A listing of the input command file

A list of the tests (such as sequence and secondary structure) that have been performed

Information on the database file used

For each hit, the name of the protein (the PDB filename).

For each residue, the residue ID, residue type, and secondary structure code

If an MSF does not exist for this protein structure in the MSF library or your working directory, a warning message to that effect

The selection file (job_name.sel) is in standard QUANTA selection format. It enables the selection of all the hits that occur in structures for which there is an MSF in either the MSF library or your working directory. If the MSF is not found, the selection file does not include that structure.
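When the search is launched from a script, the command line and file names given in this section can be assembled as in the following Python sketch. The function names are hypothetical. The sketch assumes that the HYD_EXE environment variable is set and that the input and output files are in the current working directory.

```python
import os
import subprocess


def search_command(job_name, hyd_exe=None):
    """Build the argument list for $HYD_EXE/search job_name."""
    if hyd_exe is None:
        hyd_exe = os.environ["HYD_EXE"]
    return [os.path.join(hyd_exe, "search"), job_name]


def job_files(job_name):
    """Names of the input and output files for a job."""
    return {
        "commands": f"{job_name}.ddb",
        "log": f"{job_name}.log",
        "selection": f"{job_name}.sel",
    }


def run_search(job_name):
    """Run the search and raise CalledProcessError on a nonzero exit."""
    return subprocess.run(search_command(job_name), check=True)


# Hypothetical installation path, used here so that nothing is executed.
print(search_command("example", hyd_exe="/usr/local/hyd/exe"))
print(job_files("example"))
```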

Search Command Syntax Summary

DBAS database_file

WILD wildcard_file

MRNG n1 n2

MNAM name1 [name2] [name3] .......

MKEY keyword

RSLN resol

INFO

NHIT nhit

DTOL distance

ATOL angle1 angle2

CTOL angle

STOL angle

TMPL nres template_name

RESD [restyp] [secstr] [phi] [psi] [cators]

SIDE [sidtor1 sidtor2 sidtor3 sidtor4 sidtor5]

DIST type tarres tardis

CONS contype tmpnam1:res1 tmpnam2:res2 minlim maxlim [chntest]

DBUG
