C. File Formats

This appendix describes the format of several files that are used with QUANTA, including molecular structure files (.msf), residue topology files (.rtf), display parameter files, external files, template files, template files for hydrogen atom addition, dummy atom files (.dum), brick map files (.mbk), QUANTA plot files (.qpt), ChemNote data files, and atom type files for ChemNote and the Molecular Editor.

Molecular structure files

Molecular structure files (MSFs) contain data and information about a molecule. An MSF contains three levels of information:

Segment data that applies to regions of the structure

Group data for individual groups or residues within the molecule

Atom data for individual atoms

In addition, extra information such as solvent accessibility, thermal mobility, and electrostatic potential can be included in the MSF. Extra information is incorporated as a number associated with each atom and can be retrieved through a label. This label enables the selection and coloring of a molecule based on one of these parameters. The extra information can also hold pointers to surface files, symmetry information, or vectors for each atom. In this way, virtually any information about a molecule can be held in the MSF.

The MSF is a sequential binary file. Word length is assumed to be at least 4 bytes in QUANTA unless specified otherwise. The records in the file are as follows.

1. nseg, ngroup, natom, version, header

This record is three integers, the dataset version (character * 10), and a header (character* 200) that contains various flags. The integers represent the number of segments in the file, the number of residues, and the number of atoms.

For files created in QUANTA98 and beyond, the version number is QUANTAR98. QUANTA 2006 reads earlier versions (e.g., QUANTAR3.3) and automatically updates them to QUANTA 2006 format. The utility $HYD_UTL/.msfi2i4 allows conversion between the old and new versions, if that is required.
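
As a rough illustration of record 1, the following Python sketch unpacks its five fields. It assumes the MSF was written as a Fortran sequential unformatted file in which every record is framed by 4-byte little-endian length markers; this appendix does not state the framing or byte order, so treat both as assumptions. The function names are hypothetical and are not part of QUANTA.

```python
import struct

def read_fortran_record(buf, offset=0):
    """Return (payload, next_offset) for one sequential unformatted record.

    Assumption: each record is framed by 4-byte little-endian length markers.
    """
    (length,) = struct.unpack_from("<i", buf, offset)
    start = offset + 4
    payload = buf[start:start + length]
    (trailer,) = struct.unpack_from("<i", buf, start + length)
    if trailer != length:
        raise ValueError("record length markers disagree")
    return payload, start + length + 4

def parse_msf_record1(payload):
    """Split record 1 into nseg, ngroup, natom, version and header."""
    nseg, ngroup, natom = struct.unpack_from("<3i", payload, 0)
    version = payload[12:22].decode("ascii").rstrip()    # character*10
    header = payload[22:222].decode("ascii").rstrip()    # character*200
    return nseg, ngroup, natom, version, header
```

The same record reader could then be applied repeatedly to walk through the title, segment, residue, and atom records in order.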

2. TITLE lines

Character*80 records, containing the title for the file. Last line is END.

3. Segment Data consists of three segment records each of length nseg. They are:

a. Segment names (character * 4)

b. Residue pointers (integer*4) that point to the first residue in the segment

c. Number of residues in the segment (integer*4)

4. Residue Data contains nine records, each of length ngroup (the number of residues). They are:

a. Residue identifiers (character*6)

b. Residue names (character*4)

c. Atom pointers (integer*4) point to the first atom in the residue in the atom list

d. Number of atoms in the residue (integer*4)

e. Segment numbers (integer*4)

f. X Coordinate of the center of the residue (real)

g. Y Coordinate of the center of the residue (real)

h. Z Coordinate of the center of the residue (real)

i. Radius of the residue from the center (real)

5. Atom Data consists of seven records each of length natom:

a. X coordinate (real)

b. Y coordinate (real)

c. Z coordinate (real)

d. Atom names (character*6)

e. Residue numbers (integer*4)

f. Atom types (integer*4)

g. Atom charges (real)

The residue number is a pointer back into the residue data so that the appropriate residue information is available for each atom. The number for each atom type points to parameters held in the parameter file.

6. Extra Data

The QUANTA MSF format allows the storage of extra per-atom information. An MSF created by QUANTA will by default contain a set of extra information that is loaded into the fourth parameter array. This set is the atomic temperature factors labeled BVALUE.

Eight types of data can be stored: real, integer, integer*4, real vector, symmetry, connectivity, bond orders, and atom constraints. Each extra piece of information requires two records (four in the case of a real vector).

a. The first record is a header record: label type nitems file

where:


Label	is a character*10 unique label used within QUANTA to reference the information.
Type	is character*4 defining the type of data as REAL, ASTR, INTG, INT2, or REA3.
Nitems	says how many items are in each record
File	is a character*100 file name. It is used when the information is a pointer to items in a separate file (for example, in a surface file).

b. The next record(s) contain the nitems pieces of data.

c. For REAL, INTG, ASTR, and INT2, there is just one record. (Atom constraints (ASTR) can only be generated by the program.) For REA3 there are three records.

d. Connectivity information and bond order information can only be generated by the program:


Connectivity	Label	CONNECT
	Type	BOND
	Nitems	natom

A record containing the connectivity information would be: (nbc(i), (ibc(j,i), j = 1, nbc(i)), i = 1, nitems)

where:

nbc	is an integer*4 array containing the number of bonds to each atom.
ibc	is an integer*4 array containing the atom numbers of the connected atoms.
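
The interleaved layout of the connectivity record (a bond count followed by that many atom numbers, repeated for every atom) can be unpacked as in the sketch below. It assumes the integer*4 values have already been decoded into a flat Python sequence and that atom numbers are 1-based, as is usual in Fortran; the function name is hypothetical.

```python
def unpack_connectivity(values, nitems):
    """Split the flat CONNECT record into one neighbor list per atom.

    values: flat sequence nbc(1), ibc(1..nbc(1),1), nbc(2), ...
    nitems: number of atoms (natom).
    """
    neighbors = []
    pos = 0
    for _ in range(nitems):
        nbonds = values[pos]            # nbc(i): number of bonds to atom i
        pos += 1
        neighbors.append(list(values[pos:pos + nbonds]))
        pos += nbonds
    if pos != len(values):
        raise ValueError("record length does not match the bond counts")
    return neighbors
```

For a three-atom molecule in which atom 1 is bonded to atoms 2 and 3, the flat record `[2, 2, 3, 1, 1, 1, 1]` would unpack to `[[2, 3], [1], [1]]`.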

Bond orders	Label	ORDER
	Type	BOND
	Nitems	number of bonds

A record, containing a list of the bond types for each bond would be: (ibf (i), (ib(j,i), j = 1,2), i = 1, nitems)

where:

ibf	is an integer*4 array containing bond type for each bond (single = 1, double = 2, triple = 3, aromatic = 7, non-ring resonant = 12).
ib	is an integer*4 array containing the atom numbers of the atoms forming this bond.

e. There is a special format for symmetry information. The records are character*80 records, which begin with a five-character special label. These are CELL, CRYS, SYMM, SYMMC, and END. The CELL line contains a, b, c (angstroms), alpha, beta, gamma (degrees), and the space group name. The CRYS line contains the cell type (for example, TRICLINIC or MONOCLINIC).

The SYMM line contains symmetry operations as defined in the International tables (this should only include the unique symmetry operations); lattice translations are defined by the lattice type.

SYMMC lines contain matrices defining non-crystallographic symmetry.

END defines the end of the symmetry information.

Display parameter file

The display parameter file (param.par) contains the names, bonding, van der Waals radii, and energy parameters of the atoms recognized by QUANTA. This file is read every time QUANTA is started up. This data structure and the QUANTA dictionaries assign graphical display parameters to atoms. For molecules read into QUANTA from an external coordinate file, atom types are assigned solely on the basis of the atom name. This assignment may not be the same as the types determined in ChemNote or the Molecular Editor applications. The parameters in this file are consistent with those used by CHARMm. However, CHARMm accesses a different data structure (PARM.PRM) to obtain parameters for calculation. If a display parameter file is not specified, then all atoms are set to a predetermined default value.

See the QUANTA Parameter Handbook for a complete listing of the atom types used in QUANTA.

This format example is part of $HYD_LIB/param.par, the parameter file for proteins.

no	bndrad	vdwrad	plurad	global	emin	rmin	patom	hbond	atype	atmass
1	0.4000	0.9000	0.1000	F	-0.0498	0.800	0.044	D	H	1.00800
2	0.4000	0.9000	0.1000	F	-0.0498	0.600	0.044	D	HC	1.00800
3	0.4000	1.0000	0.1000	F	-0.0420	1.330	0.1	N	HA	1.00800
4	0.4000	0.9000	0.1000	F	-0.0498	0.920	0.044	D	HT	1.00800

The following values are applied to the atom type number listed in the first column:

Bonding radius (bndrad)

van der Waals spheres radius (vdwrad)

Sphere radius (plurad) in plots

Value (global) used in the global search for bonds

Values (emin and rmin) used in the calculation of van der Waals and electrostatic energy

Hydrogen bonding code (hbond): the atom is a hydrogen bond acceptor (A), a hydrogen bond donor (D or E), or not hydrogen bonded (N)

CHARMm atom type (atype), included for reference only. The energy parameters are taken from the CHARMm topology and parameter files for CHARMm version 22.

Atomic mass (atmass)

If the hydrogen atoms are not defined explicitly, then the parameters for the atom to which they are bonded are modified to take some account of the hydrogen atoms.

The modified atom types are called extended atom types. These CHARMm atomtype names end with the letter E preceded by the number of undefined hydrogen atoms allowed for in parameterization (for example, CH2E is the atom type for an aliphatic carbon atom with two undefined attached hydrogen atoms).

If you do not enter a parameter for a particular atom type, the atom type uses the default of:

no	bndrad	vdwrad	plurad	global	emin	rmin	patom	hbond	atype	atmass
499	0.9	1.4	0.2	F	0.0	0.0	0.0	N	DEFA

Atoms with Mxx are metals; atoms with Xxx are halogens.

If you want to extend the parameter file to include your own atom types, it is recommended that you use atom numbers between 300 and 400.
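
A minimal sketch of reading one row of the parameter table is shown below. It assumes that the columns are separated by whitespace and that the atmass column may be absent, as in the default entry above; the class and function names are hypothetical and do not reflect how QUANTA itself reads the file.

```python
from typing import NamedTuple, Optional

class AtomParams(NamedTuple):
    no: int
    bndrad: float
    vdwrad: float
    plurad: float
    global_flag: str        # the "global" column (a Python keyword, so renamed)
    emin: float
    rmin: float
    patom: float
    hbond: str
    atype: str
    atmass: Optional[float]

def parse_param_line(line):
    """Parse one whitespace-separated row of the display parameter table."""
    f = line.split()
    if len(f) not in (10, 11):
        raise ValueError("expected 10 or 11 columns, got %d" % len(f))
    atmass = float(f[10]) if len(f) == 11 else None
    return AtomParams(int(f[0]), float(f[1]), float(f[2]), float(f[3]), f[4],
                      float(f[5]), float(f[6]), float(f[7]), f[8], f[9], atmass)
```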

External file formats

A variety of external file formats are used in QUANTA for input and output of atomic data. This section defines what these formats mean to QUANTA.

Terms used are:

x y z - orthogonal angstrom coordinates of each atom
resid - residue identifier (residue number)
resnam - residue name, such as TRP GLU
atnam - atom name, such as CE2 CA N
bvalue - bvalue or some fourth parameter

Protein Data Bank format

QUANTA reads a standard PDB data file. Atomic coordinates are taken from both ATOM ("standard" groups) and HETATM ("non-standard" groups) records.

Chain identifiers (if present) are used to define segments. If none are present, a segment name is created. HETATMs are placed in separate segments.

The second character of two-symbol element names is lowercase on input. Lowercase characters are specified in QUANTA by preceding the character with the escape character, usually, but this can be altered using SET ESCAPE. The atom names are all left-justified. This process is reversed on output, to maintain the correct PDB convention.

CHARMm/CNX/X-PLOR PDB variant

The CHARMm/CNX/X-PLOR PDB must be used if a file is to be produced or read by CHARMm, CNX, or X-PLOR. The PDB differs from the standard Brookhaven PDB format in the following respects:

1. The segment name is a four-character string in columns 73 through 76.

2. On export, any * characters in nucleic acid names are converted to ` (the * is the CHARMm/CNX/X-PLOR wildcard character).

3. On export, amino/nucleic acids are not reordered to the Brookhaven conventional order.

4. Atom names are read or written straight into the atom name field (the Brookhaven convention is right-justified within the first two characters of the atom name field).

5. CNX and X-PLOR expect the residue ID to be left-justified.

CHARMm ASCII format

QUANTA recognizes both the standard CHARMm and Brunger CHARMm formats. Output is only in standard format. The CHARMm format (.crd) is as follows:

1. TITLE lines (character*80) begin with a *. The last title line is a * followed by at least seven blanks. The title is followed by a line giving Natom in format i5.

2. ATOM lines: atom# resid1 resnam atnam X Y Z segid resid2 bvalue

3. format: I5,I5,1x,a4,1x,a4,3f10.5,1x,a4,1x,a4,f10.5

You are given the option of using resid1 or resid2 as the residue identifier.
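
The fixed-width ATOM line format above translates into column slices as in the following sketch. The slice positions are derived directly from the format string (I5,I5,1x,a4,1x,a4,3f10.5,1x,a4,1x,a4,f10.5); the function name and the dictionary keys are hypothetical, and the sketch assumes every numeric field is present.

```python
def parse_crd_atom_line(line):
    """Parse one ATOM line of a standard CHARMm .crd file by column position."""
    line = line.rstrip("\n").ljust(70)      # the format spans 70 columns
    return {
        "atom": int(line[0:5]),             # I5
        "resid1": int(line[5:10]),          # I5
        "resnam": line[11:15].strip(),      # 1x,a4
        "atnam": line[16:20].strip(),       # 1x,a4
        "x": float(line[20:30]),            # 3f10.5
        "y": float(line[30:40]),
        "z": float(line[40:50]),
        "segid": line[51:55].strip(),       # 1x,a4
        "resid2": line[56:60].strip(),      # 1x,a4
        "bvalue": float(line[60:70]),       # f10.5
    }
```

Note that resid1 is numeric while resid2 is a four-character string, which is why the two residue identifiers are returned with different types.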

Konnert format (command mode only)

resnam resid atnam X Y Z bvalue

format: 2x,a4,1x,a4,a4,4f10.5

Diamond format (command mode only)

X Y Z bvalue resid resnam atnam

format: 4f10.5,6x,a4,15x,a3,7x,a4

CHARMm binary format

The CHARMm binary format (.dcd) is used to hold many sets of coordinates, including the results of a dynamics run at various time steps. The format is as follows:

HDR,ICNTRL

character*4 HDR, integer icntrl(20)
real*4 X(NATOM), Y(NATOM), Z(NATOM)
real*8 XTLABC(6)
logical QCRYS

HDR - not used in QUANTA
ICNTRL - contains information about the datasets held in file
QCRYS=ICNTRL(11).EQ.1

(1) - number of datasets in the file. Not necessarily correct.
(2) - time of the first dataset - usually in femtoseconds
(3) - time step between datasets
(9) - number of fixed atoms (NFIXED)
(11) - 1 for crystal/constant pressure calculation, 0 otherwise.
(20) - version number (22 for CHARMm 22, 0 for previous versions)

TITLE

ntitl,(title(i),i=1,ntitl)
character*80 title(32)

NATOM

NFREAT = NATOM-NFIXED

If the number of fixed atoms (NFIXED) is not 0, then the next record is:

(IFREAT(I),I=1,NFREAT), where integer ifreat(*) points to the free atoms in the whole list of atoms.

IF(QCRYS) XTLABC

(X(I),I = 1,NATOM)
(Y(I),I = 1,NATOM)
(Z(I),I = 1,NATOM)

The coordinates for the first dataset in the file. Time icntrl(2). If NFIXED is 0, this first dataset is the complete set of data giving positions for both the free and fixed atoms.

Note: If this is a file from a Crystal/Constant Pressure calculation, i.e. ICNTRL(11)=1 and QCRYS=TRUE, then there will be an extra record containing symmetric shape index data, XTLABC. XTLABC is a symmetric shape matrix, only lower triangle is used.

To the end of file

IF(QCRYS) XTLABC
(X1(I),I = 1,NFREAT)
(Y1(I),I = 1,NFREAT)
(Z1(I),I = 1,NFREAT)

Coordinates for the next dataset. If NFIXED is not 0, these coordinates are used as:

do 10 i = 1,NFREAT
10 X(IFREAT(I)) = X1(I)
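
The same update, extended to all three coordinate arrays, can be sketched in Python as follows. The function name is hypothetical; the only assumption beyond the text is that IFREAT holds 1-based atom numbers, as in the Fortran loop, so one is subtracted when indexing Python lists.

```python
def apply_free_atom_frame(x, y, z, ifreat, x1, y1, z1):
    """Overwrite the free-atom entries of x, y, z in place.

    x, y, z    : full coordinate lists (length NATOM) from the first dataset
    ifreat     : 1-based atom numbers of the free atoms (length NFREAT)
    x1, y1, z1 : coordinates of the free atoms in the next dataset
    """
    for n, atom in enumerate(ifreat):
        x[atom - 1] = x1[n]
        y[atom - 1] = y1[n]
        z[atom - 1] = z1[n]
```

Fixed atoms keep the positions read from the first dataset, so after each call x, y, and z again describe the complete structure.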

CHARMm binary property files

The CHARMm binary property files are similar to the binary coordinate files except there is only one entry for each dataset (in contrast to the three - x y z in a coordinate file).

Fractional coordinates (old Cambridge database format)

The Cambridge database file has a four-line header. In the following example, required spaces are represented by explanatory notes enclosed in square brackets [], indicating the number of spaces.

Line 1	'Reference_Structure_=_',I5,'[3 spaces]A,B,C=',3F8.3
Line 2	'[2 spaces]ALPHA,BETA,GAMMA=',3F8.3,4X,'SPGR_=',I3,1X,A6
Line 3	I3,3X,'0 CODON=_',9X,'SYMOPS=',I5
Line 4	2X,'0',5X,'RFAC=_',F3.1,'_ERRFLAG=0_(C-C)ESD=0'

where the Reference Structure is the Cambridge Data Bank code number for the structure. The cell parameters given as A, B, C and ALPHA, BETA, GAMMA, and the space group code are the only header information used and stored by QUANTA. If you are writing out a file in C. D. B. format, you must provide the other information or accept meaningless defaults.

The coordinates (in cell fractional coordinates) then follow, using the form atom number, atom name, fractional coordinates, and atoms bonded to this atom (e.g., I4, 1X, A4, 1X, 3F10.5, 1X, 6I4). All atoms bonded to each atom should be listed so each bond is in effect defined twice.

Cambridge database FDAT format

Due to licensing considerations, for more information regarding this format, please contact:

Crystallographic Data Centre
12 Union Road
Cambridge
CB2 1EZ
UK
+44 1223 336408

Gromos atom format

TITLE LINE

Number of atoms I5

Coordinates

Residue number, residue type, atom type, atom number, orthogonal coordinates (e.g., I5, 2A5, I5, 3F8.3)

Orthogonal coordinates may be in angstroms or nanometers. The program tests the coordinates and suggests which units are being used, but you can override this decision.

The Gromos program expects the atoms of a residue to be given in a specified order. When outputting Gromos files, QUANTA attempts to reorder atoms correctly. The file $HYD_LIB/gromos.ord contains a list of all the amino acid atoms in required order. You can edit this file if necessary. The required ordering for non-amino acid residues is not included in the file. The routine expects atom names to follow IUPAC-IUB conventions. If they do not, atoms are liable not to be recognized and placed at the end of the residue. You are informed when this is the case.

Converting Gromos Trajectory Files. The GROCH program converts Gromos format trajectory files to CHARMm format trajectory files, which can be read by QUANTA. When using this program, you should be prepared to provide the following information on the Gromos format file:

Coordinates in angstroms or nanometers

Number of atoms in dataset

Number of datasets

Accelrys provides the GROCH program as source code in the QUANTA Utility directory.

QM coordinate file format

Coordinate files that are written as a result of quantum mechanics calculations are identified in QUANTA by the extension .qmc.

Dictionary files

Nohpro.dic file

The nohpro.dic file provides a reasonable assignment of charges in proteins when there are no hydrogens in the structure. The file contains documentation describing the strategy, which is essentially:

1. If the amino acid residue is charged, then the sidechain total charge adds up to +1 or -1.

2. If it is not charged, then:

a. If it is a donor, the total charge is slightly positive.

b. If it is an acceptor, the total charge is slightly negative.

c. If it is both, the total charge is null.

Generic.dic file

The generic.dic file adds some additional charges to atoms, reduces the default charges on oxygens, and adds a default charge on nitrogen.

Template files

Template files contain ideal coordinates for a residue along with other information required to perform mutations.

The first record consists of:

resnam natom npos ntor

where:

resnam =	residue name;
natom =	number of atoms in the residue, number of atom records in the file;
npos =	number of records defining bonding used to fit template mutations;
ntor =	number of torsion angles to be set up after mutation.

1. N Atom Records - The format for each atom record is:

ATNAM TYPE X Y Z

where:

ATNAM =	atom name
TYPE =	atom type
X Y Z =	atomic coordinates

These files are adapted from the Brookhaven Protein Data Bank (PDB) format so there are extra fields in this record which are not used by QUANTA.

2. N Pos Records - The format for each pos record is:

AT1 AT2 AT3 AT4

This specifies that the position of AT1 is defined in terms of the positions of AT2, AT3, and AT4.

3. N Tor Records - The format for each tor record is:

At1 At2 At3 At4

This specifies that, if AUTO TOR is turned on, the torsion angle At1- At2-At3-At4 should be set up after the mutation.

The example shown here is the template file for Valine, $HYD_LIB/tmplatnoh/val.pdb.

VAL 7 3 1
ATOM 151 N VAL 19105 52.833 36.648 0.000 -0.35 -10.00
ATOM 152 CA VAL 19 11 53.367 37.966 0.000 0.10 -10.00
ATOM 153 C VAL 19 14 54.887 37.966 0.000 0.55 -10.00
ATOM 154 O VAL 19 40 55.529 36.947 0.000 -0.55 -10.00
ATOM 155 CB VAL 19 11 52.723 38.811 1.142 0.00 -10.00
ATOM 156 CG1 VAL 19 13 53.562 38.898 2.418 0.00 -10.00
ATOM 157 CG2 VAL 19 13 52.366 40.201 0.587 0.00 -10.00
5 2 1 3
6 5 2 3
7 5 2 3
1 2 5 6

The template files are in subdirectories of the library directory and references to them are kept in template library files in the library directory. There are three sets of templates:

tmplatnoh.tlf - Proteins with no hydrogen atoms defined

tmplatpol.tlf - Proteins with polar hydrogen atoms defined

tmplatall.tlf - Proteins with all hydrogen atoms defined

QUANTA determines the correct files to use automatically. The default template library is tmplatnoh.tlf for proteins with no hydrogen atoms defined.

The following example, the protein_polarhydrogen template library file, $HYD_LIB/tmplatpol.tlf, shows the template library format.

*N CA C
tmplatpol/ala.pdb Alanine ala A
tmplatpol/arg.pdb Arginine arg R
tmplatpol/asn.pdb Asparagine asn N
tmplatpol/asp.pdb Aspartic_acid asp D
tmplatpol/cys.pdb Cysteine cys C

The first record is a * followed by the names of three backbone atoms (that is, atoms whose position is invariant when residues are mutated).

If these three atoms are not found in the residue to be mutated, the program issues the error message: Join atoms not properly defined. If this error message appears, you should not continue with the mutation. Old versions of the template library files do not contain this first line and the program assumes that the molecule is a protein with main chain atoms N, Ca, C.

The file then contains one record for each residue, listing the name of the template file, the name of the residue, a three-letter code for the residue, and a one-letter code for the residue. The residue name must be a single word. If necessary, words can be connected with underscores (for example, aspartic_acid).

During interactive mutation, QUANTA looks at the atoms present in the residue to be mutated and attempts to assign the correct template library file.
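
The template library layout described above can be read with a short routine such as the sketch below. It assumes whitespace-separated fields and falls back to N, CA, C when the leading * record is absent, mirroring the behavior described for old library files; the function name and dictionary keys are hypothetical.

```python
def parse_template_library(text):
    """Return (join_atoms, entries) from the text of a .tlf file."""
    join_atoms = ["N", "CA", "C"]       # default for old files without a * record
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("*"):
            join_atoms = line[1:].split()
            continue
        path, name, code3, code1 = line.split()
        entries.append({"file": path, "name": name,
                        "code3": code3, "code1": code1})
    return join_atoms, entries
```

Because the residue name must be a single word, a plain whitespace split is enough to separate the four fields.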

Hydrogen atom addition template file for Protein Design

In the Protein Design application, QUANTA contains a simple algorithm to add hydrogen atoms to a protein. This algorithm uses a template to fit hydrogen atoms, but it does not perform energy minimization, so the structure geometry may not be ideal. However, a molecular mechanics program such as CHARMm can be used to improve the structure geometry. The hydrogen atom addition routine assumes atoms have the atom type numbers defined in the param.par parameter file. If errors occur when you use this routine, they are probably due to bad atom type assignments.

The hydrogen addition algorithm superimposes a template over the existing atoms in the structure and then takes hydrogen atom coordinates from the template and adds them to the structure. In order to fit the template unambiguously, the template must contain the coordinates of three atoms which already exist in the structure. These three atoms are usually the atom to which hydrogen atoms will be attached and the two first neighbors. For example, in adding two hydrogen atoms to a tetrahedral carbon atom which is already bonded to two non-hydrogen atoms, the required template must contain coordinates for the atoms.

For some structures, the atom to which hydrogen atoms are to be added may only have one first neighbor. In this case, the third guiding atom in the template must be a second neighbor. For example, in adding three hydrogen atoms to a methyl carbon atom, the template file contains coordinates for the atoms.

The coordinates for all the templates are contained in a single file hydtpl.dat which may be found in the Library Directory.

The following format is the template for methyl C:

	* add 3 hydrogens to tetrahedral C
	4  1  1  3  2
	13
	10
	2  	3.980    	-1.526    	-2.575
	0  	3.494    	-0.781    	-1.318
	0  	1.956    	-0.745    	-1.257
	3  	5.069    	-1.537    	-2.593
	3  	3.607    	-2.550    	-2.558
	3  	3.607   	 -1.019    	-3.465

Lines beginning with a * are treated as comments and are not read by the program. Each template begins with one or more comment lines.

The first line read by the program contains the following parameters:

Template number
Number of first neighbors
Number of second neighbors
Number of hydrogen atoms
Polar/non polar code

The template number must correspond to the position of the template in the template file. The polar/nonpolar code has a value of 1 for polar hydrogen atoms and 2 for nonpolar hydrogen atoms.

The next line lists the atom types for which the template is applicable. This is followed on the next line by the new atom type codes for the atoms once the hydrogens have been added.

The atom coordinates that follow are in the order:

atom to which hydrogen atoms will be attached
first neighbors
second neighbors
hydrogen atoms

The first column of the coordinate data is the atom type code. It is redundant except for hydrogen atoms. This is the atom type code that is given to added hydrogen atoms. Table 46 lists the atom type codes.

Table 46. Atom type codes
Atom number	CHARMm atom code	Number of hydrogens added	Geometry	New atom type
11	CH1E	1	tetrahedral	20
12	CH2E	2	tetrahedral	10
13	CH3E	3	tetrahedral	10
23	C5RE	1	trigonal	21
24	C6RE	1	trigonal	22
71	SH1E	1	trigonal	70
201	CU1E	1	trigonal	16
202	CU2E	2	trigonal	16
203	CW1E	1	trigonal	17
204	CW2E	2	trigonal	17
205	NP1E	1	trigonal	32
206	N5RE	1	trigonal	34
207	NT1E	1	tetrahedral	36
208	NT2E	2	tetrahedral	36
209	NT3E	3	tetrahedral	36
210	NC2E	2	trigonal	37
211	OI1E	1	tetrahedral	44
212	OT1E	1	tetrahedral	45
220	N6RE	1	trigonal	35
221	NP2E	2	trigonal	32
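
The layout of a single hydrogen addition template in hydtpl.dat, as described in this section, can be read with the sketch below. It assumes whitespace-separated fields and that the lines passed in belong to exactly one template; the function name and dictionary keys are hypothetical.

```python
def parse_hydrogen_template(lines):
    """Parse one template: parameter line, two type lines, then coordinates."""
    rows = [ln.split() for ln in lines
            if ln.strip() and not ln.lstrip().startswith("*")]  # skip comments
    number, nfirst, nsecond, nhyd, polar = (int(v) for v in rows[0])
    applies_to = [int(v) for v in rows[1]]   # atom types the template fits
    new_types = [int(v) for v in rows[2]]    # atom types after H addition
    # Order: attachment atom, first neighbors, second neighbors, hydrogens.
    ncoords = 1 + nfirst + nsecond + nhyd
    atoms = [(int(r[0]), float(r[1]), float(r[2]), float(r[3]))
             for r in rows[3:3 + ncoords]]
    if len(atoms) != ncoords:
        raise ValueError("template has too few coordinate lines")
    return {"number": number, "polar_code": polar,
            "applies_to": applies_to, "new_types": new_types,
            "guide_atoms": atoms[:ncoords - nhyd],
            "hydrogens": atoms[ncoords - nhyd:]}
```

For the methyl C template shown earlier, the parameter line 4 1 1 3 2 implies six coordinate lines: the carbon, one first neighbor, one second neighbor, and three hydrogens.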

Dummy atom file

Dummy atom files (.dum) have the following format:

NAME TYPE nat other

where:

NAME is the dummy atom name.

TYPE is the dummy type (MIDPoint, VECTor, and so on).

nat is the number of atoms defining this dummy; these follow on the nat subsequent records.

other depends on type. For example, there may be 3 coordinates for COOR, a distance for VECT and PERP, and so on.

The following example is a file for a static dummy which was defined as a midpoint of 3 atoms. The commented lines (starting with #) form the midpoint definition.

DUM1      	COOR 0           		.103      	.332         	.418
#DUM1     	MIDP 3
#ATOM C2 	RESI      BENZ:1    MOLE benz.msf
#ATOM C4 	RESI      BENZ:1    MOLE benz.msf
#ATOM C6 	RESI      BENZ:1    MOLE benz.msf

The following file is used for a dynamic dummy:

DUM1      	MIDP 3
ATOM   C2   	RESI      BENZ:1    MOLE benz.msf
ATOM   C4   	RESI      BENZ:1    MOLE benz.msf
ATOM   C6   	RESI      BENZ:1    MOLE benz.msf
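
Both example files can be read with the same small routine, sketched below. It assumes whitespace-separated fields, that lines starting with # are skipped by the reader, that any values after nat are numeric, and that each atom record consists of keyword/value pairs (ATOM, RESI, MOLE); the function name and dictionary keys are hypothetical.

```python
def parse_dummy_file(text):
    """Return a list of dummy atom definitions from the text of a .dum file."""
    lines = [ln for ln in text.splitlines()
             if ln.strip() and not ln.lstrip().startswith("#")]
    dummies = []
    i = 0
    while i < len(lines):
        fields = lines[i].split()
        name, dtype, nat = fields[0], fields[1], int(fields[2])
        other = [float(v) for v in fields[3:]]      # type-dependent values
        atoms = []
        for rec in lines[i + 1:i + 1 + nat]:
            tok = rec.split()
            # e.g. ATOM C2 RESI BENZ:1 MOLE benz.msf -> keyword/value pairs
            atoms.append(dict(zip(tok[0::2], tok[1::2])))
        dummies.append({"name": name, "type": dtype,
                        "other": other, "atoms": atoms})
        i += 1 + nat
    return dummies
```

A static dummy yields nat = 0 with its coordinates in the type-dependent values, while a dynamic dummy carries the defining atom records instead.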

Brick map file

The QUANTA brick map files (.mbk) can be used to store 3D information on a grid. These files can then be used to create contoured wire-frame representations of the information, graphical objects in QUANTA, or a map within the X-Ray structure package.

The information on this 3D grid is stored in bricks. The overall grid is divided into 6 x 6 x 6 pieces, and the data for each brick is stored in a single direct-access record. This approach increases the speed and flexibility of selecting and retrieving portions of a complete map for contouring. In practice, bricks overlap one another on one edge to ensure that a continuous surface is generated in contouring.

The QUANTA brick map file can be used to store single values at each point on a 3D grid, a vector at each grid point, or both a number of vectors and single points. In addition, each vector or single grid point can be a single byte or an integer*4 value. Which size you choose depends on the balance between dynamic range and disk space. Within a file, the type of data used must be the same. QUANTA contains facilities to recognize and work with any of these combinations of data types.

The header of the brick map file contains all the information about the contents of the file. QUANTA reads brick map files generated by earlier versions of the program.

Brick map files are direct-access files with record lengths of 54 words for byte-type files and 216 words for integer*4-type files. The format is:

line 1: version, ntitle, filetype

version is a character*7 variable. For earlier versions of QUANTA, this should be mbk_1.0; for QUANTA 2000 and above it should be mbk_2.0.

ntitle is the number of title records to follow (integer*4)

filetype (integer*4) is the type of file: 1 - byte 2 - integer*4

lines 2 to ntitle+1: (title(i),i=1,ntitle)

title lines of char*100 length

There is a limit of 50 title lines. The title contains not only textual information about the file, but also some HEADER records that detail the type of information (single point or vector) held in the file. If no HEADER records are included, QUANTA assumes this is a file containing just a single grid of scalar values.

The following additional HEADER records are only necessary if a vector field display is to be generated. The order of the HEADER records reflects the order of the data in the file. A "V" at position 27 in the HEADER record indicates that the file contains three sets of grid data corresponding to the x, y and z of a vector for the grid points. An "S" at position 27 in the HEADER record indicates a scalar set of grid data.

For example, a brick map file containing the magnitude and direction of an electrostatic field as byte values around a molecule would have the following header information:

mbk_1.0, 3,1

Gives the version number and the number of title lines (3) and indicates a byte map.

HEADERFIELD 	S1

Indicates that the first grid of data will be a single byte scalar value of the electrostatic field. The scale and offset from the main header apply to this grid of data

HEADER VECTOR ORIENTATION V1 1.000000E+01 0.000000E+00 -

Indicates that the next three grids of data in the file will form a vector of bytes with a scale and offset taken from the values given.

Electrostatic Field map generated on ........

A title line

The final record before the grid data is the main header block of information specifying the various parameters that define the position, scale, grid, and so on of the map as indicated below.

card ntitle+2: nsec, mxyz,nbxyz,nw1,nu1,nu2,nv1,nv2,
cell,rhrms,offset,scale,lenbrk,ncode,rhmin,rhmax 
(int4 or real4 as per first character of name).

where:

nsec - number of sections of density stored in map

mxyz(3) - grid points per unit cell edge in a,b,c

nbxyz(3) - number of bricks in a,b and c in the file

nu1,nv1,nw1 - starting grid point in a,b and c in file

nu2 nv2 - finishing grid point in a and b in file

cell(6) - cell constants

ncode - orthogonalisation code

rhmin,rhmax,rhrms - minimum, maximum and mean value

offset, scale - the offset and scale to convert from the value in the bricked map to the value in the original map.

lenbrk - length of the bricks in the map (usually 6)

card ntitle+3 - end
bricks of density written as
for brick i,j,k
write(n,rec=irecno)brick written along a fast, b medium, c slow
where:
irecno = (k-1)*nbxyz(2)*nbxyz(1)+(j-1)*nbxyz(1)+i+2+ntitle
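
The record lookup for a brick can be sketched in Python as follows. The sketch assumes that bricks are stored with the a index varying fastest, then b, then c, and that the first brick occupies the record immediately after the main header (record ntitle+3); the function name is hypothetical.

```python
def brick_record_number(i, j, k, nbxyz, ntitle):
    """Direct-access record number of brick (i, j, k), all indices 1-based.

    nbxyz  : number of bricks along a, b and c
    ntitle : number of title records in the file
    Records 1 .. ntitle+2 hold the version line, the titles and the main header.
    """
    nbx, nby, nbz = nbxyz
    if not (1 <= i <= nbx and 1 <= j <= nby and 1 <= k <= nbz):
        raise ValueError("brick index out of range")
    return (k - 1) * nbx * nby + (j - 1) * nbx + i + ntitle + 2
```

With 4 x 3 x 2 bricks and three title lines, the first brick is at record 6 and the last at record 29, so the 24 bricks fill the records directly after the five header records.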

The cell constants can be interpreted in one of two ways, depending on the value of the orthogonalization code, ncode. If ncode is between 1 and 6, then the grid is in a standard crystallographic fractional coordinate system with the cell constants, specified as a, b, c, alpha, beta, gamma, defining the transformation to orthogonal angstroms. This requires that one of the grid points falls on the origin. If ncode is 0, then the cell constants define the origin and extent of the grid of points, specified as origin(x), origin(y), origin(z), extent(x), extent(y), extent(z), in orthogonal angstroms. This allows a grid that does not fall on the origin to be stored.

Special value

If a grid point in an integer*4 brick map has a value of 32766 then it will be ignored in the contouring within QUANTA. This is useful for masking certain parts of a grid, without getting a contour at this boundary.

QUANTA plot file

The plot file (.qpt) in QUANTA is a binary file with each record written as a, x, y, z (that is, four real numbers). The command number is a; x, y, and z are parameters for the command:

a = 1 change to color (or pen) x, line width y
a = 2 move to x y z
a = 3 line to x y z
a = 4 dot at x y z
a = 5 draw x characters followed by a record of string (1: x)
a = 6 character scale x y z (interpretation of this in the program

a = 7 define patterned line where:

z is length of pattern
y is space length
x is dash lengths

a = 8 new frame
a = 9 rotate everything by x degrees
a = 10 set the units for the plot
a = 11 plot a symbol as char(x) at the current point
a = 12 delete this
a = 13 set the plotter limits to x,y in physical device
coordinates
a = 14 set the plotting window to x,y
a = 15 flag to specify stereo plotting (x is the stereo angle to
use) where the stereo is created by the plotting program
a = 16 two records to specify an rgb value for a color number
the first record specifies the color number (x) the second
record specifies r g b as x, y, z
a = 17 two records to specify a filled rectangle the first record
contains the bottom left corner the second record contains
the top right corner

Not all the commands are currently used by QUANTA and the interpretation of many of them depends on your plotter and the program you use to drive it.
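
As a rough sketch of how a driver program might interpret a few of these commands, the example below works on records that have already been decoded into (a, x, y, z) tuples. How the records are read from the binary file depends on the platform and is not covered here. The function name describe_plot is hypothetical, only commands 1 to 4, 8, and 16 are decoded, and command 5 is rejected because its following string record is not a set of four reals.

```python
def describe_plot(records):
    """Turn decoded (a, x, y, z) records into readable command descriptions."""
    simple = {2: "move to", 3: "line to", 4: "dot at"}
    out = []
    it = iter(records)
    for a, x, y, z in it:
        cmd = int(a)
        if cmd == 1:
            out.append(f"color {int(x)}, line width {y}")
        elif cmd in simple:
            out.append(f"{simple[cmd]} ({x}, {y}, {z})")
        elif cmd == 5:
            raise NotImplementedError("string records are not handled in this sketch")
        elif cmd == 8:
            out.append("new frame")
        elif cmd == 16:
            # The second record carries r, g, b in its x, y, z fields.
            _, r, g, b = next(it)
            out.append(f"color {int(x)} = rgb({r}, {g}, {b})")
        else:
            out.append(f"command {cmd} ({x}, {y}, {z})")
    return out
```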

ChemNote data files

The following data files are used by ChemNote, the Sequence Builder, and the Molecular Editor applications.

1. $QNT_CHEM/quanta.tpl

This is a template file for the ChemNote to CHARMm conversion mode.

2. $QNT_CHEM/chrmtype.typ and $QNT_CHEM/chrmpost.typ

These files define the CHARMm atom types.

3. $QNT_SEQ/peptide.bck

This file contains values for backbone structures of polypeptides. This file is read by the Sequence Builder. Changes are made by editing the file directly.

Conformation of a molecule is specified by indicating values for dihedral angles. Backbone structure often extends over several residues. Commonly used structures are defined in this file.

The format of the file and instructions for modifying its contents are given in the file itself.

4. $QNT_SEQ/peptide.nom

This file contains the shorthand names for dihedral angles and is read by the Sequence Builder. Changes are made by editing the file directly.

Conformation of a molecule is specified by indicating values for dihedral angles. Dihedral angles are identified by the four atoms involved in the angle. However, there are many shorthand names for the important angles that make dihedral identification easier. Phi, psi, and omega are some of these shorthand names in a polypeptide backbone. Because there is more than one convention for naming dihedral angles and the shorthand names may change from residue to residue, the Sequence Builder must have a flexible way of giving dihedral angles shorthand names.

The format of the file and instructions for modifying its contents are given in the file itself.

5. $QNT_SEQ/sequence.tpl

This template is used by the Sequence Builder to create a command input file for residue sequences.

6. $QNT_SEQ/seq_menu.aud

This file contains the menu and dialog box information for the Sequence Builder.

Atom type files for ChemNote and the Molecular Editor

The files chrmtype.typ and chrmpost.typ contain the definitions used to assign atom types to atoms in ChemNote and the QUANTA Molecular Editor. The files contain several rules, each associating a pattern with an atom type. If the atoms and bonds around a specific atom match the pattern in a rule, then the atom is assigned the rule's atom type.

The format of both files is the same; however, the files differ in usage. The file chrmtype.typ is applied first to obtain most of the basic typing. Highly complicated systems such as some heterocycles and conjugated systems must have their typing refined. This is the function of the file chrmpost.typ. Both files are applied to all molecules, but molecules with simple typing are not affected by the rules contained in chrmpost.typ.

In the atom type files, lines starting with `*' must appear exactly as illustrated. Lines starting with `!' are comments which are ignored by the program. The file is divided into the following sections:

1. The standard file header

The program checks the format version number against the expected number. If the numbers do not match, the program issues an error message and does not read the file. The file update version number, which represents the last date that the file was changed by Accelrys, should not be altered.

2. The number of rules in the file

The format of this line is "P [#]", where [#] represents the number of rules defined in the file. In the example, the full file must define 298 rules. If you add or remove rules, adjust this number accordingly. There is no predefined array limit on the number of type rules that can be added, but each rule takes up some memory. If too many rules are added, typing may become slower, or in extreme cases even generate "out of memory" errors.

3. The rules

There must be exactly as many rules as indicated by the number on the `P' line mentioned above. It is a good idea to try to illustrate the pattern being defined in the rule using comment lines before each rule, as shown in the example. Most rules have such illustrations, and you are encouraged to keep the pictures up-to-date if you change or add rules.

A rule begins with a line containing "T [#]", where [#] states the number of subsequent lines which make up the rule. Each line of the rule after the "T" defines an atom in the pattern, so [#] also represents the number of atoms that the pattern matches. In the first rule in the example, the line is "T 4", and the rule contains four subsequent lines.
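
Because the number on the `P' line must agree with the number of rules, a quick consistency check can be useful after editing a file. The sketch below is an illustration only: the function name check_rule_count is hypothetical, and it assumes that every rule starts with a line whose first field is T and that lines starting with `!' or `*' can be skipped for counting purposes.

```python
def check_rule_count(lines):
    """Return (declared, found): the count on the P line and the number of T lines."""
    declared = None
    found = 0
    for line in lines:
        text = line.strip()
        # Skip blank lines, comments, and fixed header/footer lines.
        if not text or text.startswith("!") or text.startswith("*"):
            continue
        fields = text.split()
        if fields[0] == "P":
            declared = int(fields[1])
        elif fields[0] == "T":
            found += 1
    return declared, found
```

If the two returned numbers differ, either the `P' line or the set of rules needs to be corrected before the file is used.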

The first atom in the rule is special, since it is the atom whose type will be assigned when the pattern is found to match a part of the structure. The next few lines describe the atoms directly connected to the first atom, subsequent lines define atoms connected to these atoms, and so on. There is no real limit on how far a rule can extend, but on a practical basis rules rarely travel more than three atoms out.

The format of an atom line is: a b c element

The first field, a, specifies the line number, relative to the current atom line, where the definitions of the attached atoms begin. For example, in the first atom line in the first rule in the example, a is 1. This means that the next line in the rule begins the definition of atoms attached to the central atom of the rule, in this case a hydrogen. The value of a is always 1 in the first line of a rule.

The second field, b, specifies the number of atoms connected to the current atom. All connected atoms must be defined in contiguous lines in the rule, starting with the line specified by the first field as described above. If b is positive, then exactly that many atoms must be attached to the atom for it to match the pattern. If b is negative, then there must be at least |b| atoms attached (|b| is the absolute value of b); if there are more the atom will still fit the pattern. If b is zero then the rule does not continue beyond this atom; it doesn't matter how many atoms are connected to it.
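
The three cases for b can be summarized in a short sketch. The function name neighbor_count_matches is hypothetical; n_attached stands for the number of atoms actually bonded to the atom being tested.

```python
def neighbor_count_matches(b, n_attached):
    """Apply the rule for field b to an atom with n_attached bonded atoms."""
    if b > 0:
        return n_attached == b      # exactly b attached atoms
    if b < 0:
        return n_attached >= -b     # at least |b| attached atoms
    return True                     # b == 0: any number of attached atoms
```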

The third field, c, has a different meaning for the first atom than it does for subsequent atoms. For the first atom, it specifies the numeric atom type that will be assigned to the atom if the rule matches the atom and its surroundings. This number is associated with the atom type names in the file MASSES.RTF. For all lines after the first, this number represents the bond order for the bond between this atom and the previously defined atom it is connected to. In the second line of the rule in the example, this number is 1, which means there must be a single bond between the N and H. A `?' may be used as a wildcard, in which case it is only important that a bond exists, not what sort of bond. Allowable bond orders are 1, 2, 3, 7, and 12, where 7 and 12 may be used interchangeably to indicate a resonant or aromatic bond.

The fourth field, element, specifies the element each atom must be to match the rule. In all lines after the first, a `?' may be used as a wildcard to match any element. So in the example rule, the first line specifies that the rule matches a hydrogen; the second line matches a nitrogen.

Looking at the first rule, we see that the first line specifies a hydrogen atom; it must be attached to only one atom, whose definition is on the next line. If the rule matches, the hydrogen will be given the atom type 2 (HC). The next line specifies that the attached atom must be a nitrogen, connected to the hydrogen by a single bond; furthermore it must be attached to exactly two other atoms, whose definitions follow. The following lines specify that it doesn't matter what element those two atoms are, just that one of the bonds must be resonant, and one single.
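
To make the field layout concrete, the following sketch parses a single rule into a list of records. It is only an illustration of the format described above, not the parser used by QUANTA. The function name parse_rule and the dictionary keys are hypothetical, and a `?' in the third field is stored as None.

```python
def parse_rule(lines):
    """Parse one rule: a 'T n' line followed by n atom lines 'a b c element'."""
    header = lines[0].split()
    if header[0] != "T":
        raise ValueError("a rule must start with a T line")
    n = int(header[1])
    atom_lines = lines[1:1 + n]
    if len(atom_lines) != n:
        raise ValueError("rule is shorter than the T line states")
    atoms = []
    for line in atom_lines:
        a, b, c, element = line.split()
        atoms.append({
            "a": int(a),                          # offset to first attached atom
            "b": int(b),                          # attached-atom count rule
            "c": None if c == "?" else int(c),    # atom type (line 1) or bond order
            "element": element,                   # element symbol or '?'
        })
    return atoms
```

Applied to the first rule of the example, the first record describes the hydrogen that receives atom type 2, and the last record describes a wildcard atom joined by a bond of order 12.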

4. Special ring type rules

The above rules do not deal with cyclic systems; some atom types, however, are specific to ring systems. The last section of the file contains rules which assign ring-specific atom types.

The first line of this section has the format "R [#]", where [#] is the number of ring rules that follow. Since each ring rule is a single line, [#] also specifies the number of lines that follow before the end of the file (excluding blank or comment lines).

Each line has the following format:

	type size new_type ring1 ring2 ring3

These rules are applied after the initial typing rules are finished, so they can depend on the atom types the initial rules have assigned.

The first field, type, specifies the numeric atom type an atom must have for QUANTA to attempt to apply the rule to the atom.

The second field, size, specifies that the atom must be a member of a ring of the specified size. If size is negative, it means that the ring must be aromatic as well as having |size| number of atoms. If size is -1, however, it means the rule applies to atoms in conjugated ring systems, and then the fourth, fifth, and optionally sixth fields are used to specify the sizes of the two or three rings the atom must be a member of. If size is zero, it means the atom must be a member of at least one ring, but the number of rings and the ring size is unimportant.

The third field, new_type, specifies the atom type to assign to any atom that matches the pattern described by the current rule.

So the ring rule:

	33 0 32

means that any atom of type NX (33) that appears in a ring should be changed to an NP (32). The more specific rule:

	22 -5 21

means that any C6R atom (22) that is in a 5-member aromatic ring should be changed to C5R (21). Finally, the conjugated ring rule:

	27 -1 26 6 5 0

means that any atom of type CR66 (27) that is a member of both a 5-membered and a 6-membered ring should become CR56 (26).

An atom can be a member of up to three conjugated rings, so a conjugated ring rule has three ring-size fields, separated by spaces. If fewer than three rings are used, fill the remaining fields with 0.
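
The ring rule semantics can be sketched as follows. This is a simplified illustration under stated assumptions: the names parse_ring_rule and apply_ring_rule are hypothetical, each ring containing the atom is supplied as a (size, is_aromatic) pair, and for size -1 the sketch checks ring membership only, since the way QUANTA decides that a ring system is conjugated is not described here.

```python
from collections import Counter


def parse_ring_rule(line):
    """Split 'type size new_type [ring1 ring2 ring3]' into its parts."""
    fields = [int(f) for f in line.split()]
    atom_type, size, new_type = fields[:3]
    ring_sizes = [s for s in fields[3:6] if s != 0]   # 0 marks an unused field
    return atom_type, size, new_type, ring_sizes


def apply_ring_rule(rule, atom_type, rings):
    """Return the new type if the rule matches, otherwise the original type.

    rings is a list of (size, is_aromatic) pairs for rings containing the atom.
    """
    wanted_type, size, new_type, ring_sizes = rule
    if atom_type != wanted_type or not rings:
        return atom_type
    if size == 0:
        matched = True                                # any ring will do
    elif size == -1:
        have = Counter(s for s, _ in rings)
        need = Counter(ring_sizes)
        matched = bool(need) and all(have[s] >= n for s, n in need.items())
    elif size < 0:
        matched = any(s == -size and aromatic for s, aromatic in rings)
    else:
        matched = any(s == size for s, _ in rings)
    return new_type if matched else atom_type
```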

5. End of file

The file is terminated by a line containing "* End of File".

Atom typing rules example

The following example represents a portion of the atom typing rule file.

* Polygen Corporation: ChemNote atomtype rules file
* File format version number
86.1124
* File update version number
91.0621
*
! Total number of patterns in the data file.
P 298
! H2-N- H on a charged group - HC
! |r
!

T 4
 1 1 2 H
 1 2 1 N
 0 0 1 ?
 0 0 12 ?
! 
! HC-NC- H on a uncharged guanidinium group - HC
! 
.
.
.
.
T 1
 1 0 176 Re
T 1
 1 0 6 MBe
T 1
 1 0 7 B
!
! Ring cycles
!
.
.
.
.
R 21
 33 0 32
 22 -5 21
 27 5 25
 28 6 29
 28 0 14
 30 6 39 
 23 6 24
 34 6 35
 72 6 73
 52 6 53 
 182 6 181 
 10 3 191 
 10 4 193 
 14 3 190 
 14 4 192 
 32 4 33
 27 -1 26 6 5 0
 182 -1 180 5 6 0
 25 -1 26 5 6 0
 181 -1 180 6 5 0
 195 -1 26 6 5 0
* End of File