A. Conversion of External Sequence Data Files to QUANTA Format

Data generated outside QUANTA can be read into QUANTA and displayed in the Sequence Viewer, either as graphs or by coloring sequences according to the input data. The import utility, Read Sequence Data File, is a pullright from Sequences on the Files pulldown. The utility currently reads only data in the QUANTA sequence data format described below. A demonstration of reading and displaying data from PHD, the EMBL secondary structure prediction server, is given in Chapter 2.

A demo jiffy program which converts data from the PHD format to QUANTA format can be built from two source files:

$QNT_ROOT/user_group_files/sequence_data/phd_quanta.f
$QNT_ROOT/user_group_files/sequence_data/sequparam.inc

To build the program, copy these files to your area and enter the following command:

> f77 -o phd_quanta phd_quanta.f 

The resultant executable program, phd_quanta, should read the demonstration PHD output file:

$QNT_ROOT/user_group_files/sequence_data/dfr.phd 

and generate a QUANTA data file identical to:

$QNT_ROOT/user_group_files/sequence_data/dfr.sqdat

Two subroutines in the demo program may be useful for anyone writing their own conversion jiffy:

writetitle, which writes the title line for the file
writerealdata, which writes out one dataset in the appropriate format


Sequence Data File Format

QUANTA reads the ASCII file in free format, so the exact formatting of each line is flexible. The binary file contains the same items in the same order as the ASCII file, but without the formatting.

The file has two header lines, defined here in FORTRAN:

character*30 title      !  title for data file
integer n_data_sets ! number of datasets in file
integer max_data_length ! maximum number of elements in a dataset

write(*,fmt='(a30)')title
write(*,fmt='(i7,1x,i7)')n_data_sets,max_data_length

and then for each of the n_data_sets data sets:

       character*6 type          ! data type (currently only `REAL' supported)
       character*30 label ! a label for the dataset
       character*30 seq_name ! the name of the sequence to which data should
c be attached (can be left blank)
       integer color ! the color for graph display
       integer visibility ! if >0 then graph should be visible by default
       integer data_length ! number of elements in data list
       real data(max_data_length) ! the data
c
       type='*REAL'
c
       write(*,fmt='(a6,a30)')type,label
       write(*,fmt='(a30,1x,2(i4,1x),i7)')seq_name,color,visibility,
     &       data_length
       write(*,fmt='(10f8.3)')(data(i),i=1,data_length)

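The layout above can also be produced without FORTRAN. The following is a minimal sketch in Python, not part of the QUANTA distribution, that mirrors the FORTRAN format descriptors field by field. The names DataSet and format_sequence_data are hypothetical, the color numbers are whatever QUANTA color indices you choose, and the maximum dataset length is assumed to be simply the length of the longest dataset in the file.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class DataSet:
    """One dataset, holding the per-dataset fields described above (hypothetical helper)."""
    label: str                 # label for the dataset (a30)
    values: Sequence[float]    # the data, written ten per line as f8.3
    color: int                 # color for graph display (i4)
    seq_name: str = ""         # sequence to attach to; may be left blank (a30)
    visibility: int = 1        # >0 means the graph is visible by default (i4)


def format_sequence_data(title: str, data_sets: List[DataSet]) -> str:
    """Return the ASCII text of a sequence data file for the given datasets."""
    # Assumption: max_data_length is the length of the longest dataset.
    max_len = max((len(d.values) for d in data_sets), default=0)
    lines = [
        f"{title:<30.30}",                       # (a30)
        f"{len(data_sets):7d} {max_len:7d}",     # (i7,1x,i7)
    ]
    for d in data_sets:
        # (a6,a30): the type string '*REAL' padded to six characters
        lines.append(f"{'*REAL':<6}{d.label:<30.30}")
        # (a30,1x,2(i4,1x),i7)
        lines.append(
            f"{d.seq_name:<30.30} {d.color:4d} {d.visibility:4d} "
            f"{len(d.values):7d}"
        )
        # (10f8.3): ten values per line
        for start in range(0, len(d.values), 10):
            chunk = d.values[start:start + 10]
            lines.append("".join(f"{v:8.3f}" for v in chunk))
    return "\n".join(lines) + "\n"
```

Writing the returned string to a file gives an ASCII file laid out as described in this section. Unlike FORTRAN, Python widens a field rather than filling it with asterisks when a value does not fit in eight characters, so very large values should be checked before writing.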

Sequence User Color File Format

When a user data file is read into QUANTA and you opt to color sequences according to the data in that file, the file seq_user_color.dat is read. QUANTA searches for this file in the standard search path: your current working directory, your library directory (defined by the environment variable $QNT_USR), or the QUANTA library directory (defined by the environment variable $HYD_LIB).

When data from the sequence data file is used to color a sequence, the relationship between a data value and the color in which the residue is drawn must be defined. A simple example: all residues with data values less than zero are colored blue and all residues with data values above zero are colored red. Each color scheme in the user color file has a label, which should match the label(s) of the dataset(s) in the user data file.

The demonstration user color file ($QNT_ROOT/user_group_files/sequence_data/seq_user_color.dat) defines two coloring schemes: SECSTR, for the secondary structure, and ACCESS, for the predicted residue solvent accessibility.

Each coloring scheme starts with a label line, which begins with an asterisk (*), followed by one line for each color. The scheme can be defined as follows:

        character*30 label      ! color scheme label
        character*2  operation  ! the operation to apply in coloring
        integer n_colors        ! number of colors in scheme
        real data_limit(*)      ! the data cutoff value for color
        integer color(*)        ! the color

        write(*,fmt='(a30,1x,a2)')label,operation
        do 10 n=1,n_colors
        write(*,fmt='(f8.3,1x,i3)')data_limit(n),color(n)
 10     continue

The parameter operation can be either LT or GT, that is, "less than" or "greater than". If, for example, the operation is LT, then the color scheme will be interpreted as:

"If the data value is less than data_limit(n) then color the residue color(n)."

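To make the interpretation concrete, here is a minimal sketch in Python of how a scheme might be applied to a single data value. It is an illustration only, not QUANTA's own code. It assumes that the (data_limit, color) entries are tested in the order they appear in the file and that the first entry satisfied determines the color; the text above does not state the order of evaluation, nor what happens when no entry matches, so returning None in that case is also an assumption. The function name color_for_value is hypothetical.

```python
import operator
from typing import List, Optional, Tuple

# The two operations allowed in a color scheme label line.
_OPERATIONS = {"LT": operator.lt, "GT": operator.gt}


def color_for_value(value: float, operation: str,
                    scheme: List[Tuple[float, int]]) -> Optional[int]:
    """Return the color of the first (data_limit, color) entry that value satisfies."""
    compare = _OPERATIONS[operation]  # raises KeyError for anything but LT or GT
    for data_limit, color in scheme:
        # LT: value < data_limit; GT: value > data_limit
        if compare(value, data_limit):
            return color
    # Assumption: a value that satisfies no entry is left without a scheme color.
    return None
```

For example, with operation LT and the entries (0.0, 1) and (10.0, 2), where 1 and 2 stand for arbitrary color numbers, a data value of -0.5 maps to color 1 and a value of 3.0 maps to color 2. Under these assumptions, LT entries should be listed with increasing cutoffs and GT entries with decreasing cutoffs so that each entry can be reached.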

© 2006 Accelrys Software Inc.