16. Sequence Database

Overview

This utility provides a group of options to search the Protein Data Bank for sequences that closely match a specific sequence. QUANTA uses the FASTA sequence search algorithm.

For more information see:

Reading a Protein Sequence

Protein User's Reference

Align and Superpose

Appendices: Reading Sequence Formats

Read Sequence/Alignment File

References

D. J. Lipman and W. R. Pearson, "Rapid and Sensitive Protein Similarity Searches", Science, 227, 1435 (1985).

M. O. Dayhoff, Atlas of Protein Sequence and Structure, (National Biomedical Research Foundation, Silver Spring, Md., 1978), volume 5, supplement 3.

FASTA Sequence Searching

The Sequence Database application in QUANTA provides a sequence search option that searches the Protein Data Bank for sequences that closely match a specified sequence. The FASTA sequence search algorithm¹ is used to search the protein sequences stored in the file $HYD_LIB:pdbseqence.lib.

When FASTA initially scans the protein sequence library, it compares the query sequence to each library sequence. For each comparison it finds the best region of homology without gaps and calculates an initial score. If the initial score is greater than the cutoff score, the library sequence is stored for later consideration.
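The initial scan described above can be illustrated with a minimal sketch. It is not the FASTA implementation: the published method uses word lookup and a substitution matrix, whereas this sketch assumes a simple match/mismatch score and scans every diagonal directly. The function names and the `match`/`mismatch` values are hypothetical.

```python
def best_ungapped_score(query, library_seq, match=2, mismatch=-1):
    """Score of the best gap-free region shared by two sequences.

    Simplified stand-in for the initial score: identical residues add
    `match`, differing residues add `mismatch` (assumed values).
    """
    best = 0
    n, m = len(query), len(library_seq)
    # Each diagonal is fixed by the offset between positions in the two sequences.
    for offset in range(-(n - 1), m):
        running = 0
        for i in range(n):
            j = i + offset
            if j < 0 or j >= m:
                continue  # position falls outside the library sequence
            running += match if query[i] == library_seq[j] else mismatch
            if running < 0:
                running = 0  # restart the region when the score drops below zero
            best = max(best, running)
    return best


def initial_scan(query, library, cutoff):
    """Keep library entries whose initial score is greater than the cutoff."""
    kept = []
    for name, seq in library.items():
        score = best_ungapped_score(query, seq)
        if score > cutoff:
            kept.append((name, score))
    # Highest-scoring entries first, as candidates for later consideration.
    return sorted(kept, key=lambda item: item[1], reverse=True)
```

Here `library` is assumed to be a dictionary mapping sequence names to one-letter sequences; only entries scoring above `cutoff` are returned.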

The cutoff score used in the FASTA search is calculated automatically and depends on the query sequence. If the query sequence is short, say, fewer than 28 residues, it may be necessary to specify a cutoff score, because the automatically generated cutoff is usually too severe. When searching with short sequences, use a fairly high cutoff score initially; if this fails to find any matches, use a lower cutoff score.
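The advice for short sequences amounts to trying cutoff scores from high to low until something is found. A minimal sketch of that loop follows; `run_search` is a hypothetical callable standing in for one search run at a given cutoff, not a QUANTA function.

```python
def search_with_relaxing_cutoff(run_search, cutoffs):
    """Try cutoffs from highest to lowest; stop at the first that yields hits.

    `run_search(cutoff)` is assumed to return a list of matches, which is
    empty when nothing scores above the cutoff.
    """
    for cutoff in sorted(cutoffs, reverse=True):
        hits = run_search(cutoff)
        if hits:
            return cutoff, hits
    # No cutoff in the list produced a match.
    return None, []
```

Starting high keeps the first result list short; the cutoff is lowered only when that list is empty.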

Database search results are automatically listed in the QUANTA textport and written to an output file with the extension .log. The results include:

A histogram indicating the number of database sequences found against their initial score

The names of protein sequences that have the highest scores

Protein sequences showing the alignment between the query sequence and the retrieved sequence
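The first item in the results, the histogram of initial scores, can be pictured with a short sketch. The bin width and the text layout are assumptions made for illustration and do not reproduce the format of the QUANTA log file.

```python
from collections import Counter


def score_histogram(scores, bin_width=2):
    """Count sequences per score bin; returns {bin_start: count} in score order."""
    counts = Counter((s // bin_width) * bin_width for s in scores)
    return dict(sorted(counts.items()))


def format_histogram(hist, symbol="="):
    """Render one text line per bin: bin start, count, and a bar of symbols."""
    lines = []
    for start, count in hist.items():
        lines.append(f"{start:>4} {count:>5}: {symbol * count}")
    return "\n".join(lines)
```

For example, `format_histogram(score_histogram([1, 2, 3, 8, 9]))` produces one line each for the bins starting at 0, 2, and 8.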

Tools

This utility displays the Sequence Database dialog box. All the options activate the File Librarian.

Enter search sequence

You are prompted to give a file name for a sequence file and then to enter the sequence in single-letter or three-letter amino acid code.

Enter a blank line to terminate sequence input. The sequence is written to a FASTA sequence file with the file extension .fta. The same file can also be generated with the Create Sequence option in the Protein Editor or with the Write Sequence File option of the Sequence utility on the Files pulldown. The latter option writes out any currently selected sequence, whether it is sequence-only data or comes from an MSF, to a FASTA format file.
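For readers preparing a query file outside QUANTA, the following sketch converts three-letter input to single-letter code and writes a FASTA file. It assumes the .fta file follows the common FASTA layout (a `>` header line followed by sequence lines); the exact layout QUANTA writes is not stated here. The function names and the 60-character line width are hypothetical.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}
ONE_LETTER = set(THREE_TO_ONE.values())


def normalize_sequence(text):
    """Return the sequence in single-letter code.

    Input is treated as three-letter code only when every whitespace-separated
    token is a known three-letter residue name; otherwise it is read as
    single-letter code. (A token such as "ALA" is therefore read as alanine.)
    """
    tokens = text.upper().split()
    if tokens and all(t in THREE_TO_ONE for t in tokens):
        return "".join(THREE_TO_ONE[t] for t in tokens)
    sequence = "".join(tokens)
    unknown = sorted(set(sequence) - ONE_LETTER)
    if unknown:
        raise ValueError(f"unrecognized residue codes: {unknown}")
    return sequence


def write_fasta(path, name, sequence, width=60):
    """Write one sequence as a header line plus fixed-width sequence lines."""
    with open(path, "w") as handle:
        handle.write(f">{name}\n")
        for start in range(0, len(sequence), width):
            handle.write(sequence[start:start + width] + "\n")
```

A call such as `write_fasta("query.fta", "query", normalize_sequence("Ala Gly Ser"))` would write the sequence `AGS` under the header `>query`.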

Run sequence search

You are prompted to select a sequence file, and the search job is then run. The results are displayed in the textport and also saved to a log file.

Read sequence search log file

This option displays the File Librarian so that you can select a .log file to read. Once the log file is selected, the results are displayed in the textport. To browse the file, press the <Enter> key to display the information in the textport, and use the slidebar to move up and down.

Change sequence database file

This option displays the File Librarian so that you can select an alternate sequence database file to use. The default database file is pdbseqence.lib.

¹D. J. Lipman and W. R. Pearson, "Rapid and Sensitive Protein Similarity Searches", Science, 227, 1435 (1985).

