GenBank is the NIH genetic sequence database, an annotated
collection of all publicly available DNA sequences.
A new release is made every two months. GenBank
is part of the International Nucleotide Sequence
Database Collaboration, which comprises
the DNA DataBank of Japan (DDBJ), the European
Molecular Biology Laboratory (EMBL), and GenBank
at the National Center for Biotechnology Information.
These three organizations exchange data on a
daily basis.

Each GenBank entry includes a concise description
of the sequence, the scientific name and taxonomy
of the source organism, and a table of features
that identifies coding regions and other sites
of biological significance, such as transcription
units, sites of mutations or modifications,
and repeats. Protein translations for coding
regions are included in the feature table. Bibliographic
references are included along with a link to
the Medline unique identifier for all published
sequences.

Most sequence analysis programs on PSC supercomputers
can read GenBank data in the
GenBank flat file format. The location of the
data in the flat file format is built into the
MAKSEQ program. However, if you find it necessary
to view the GenBank files in the flat file format,
they can be found in the AFS directory /afs/psc/biomed/db/genbank
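
For readers who do open the flat files directly, the following is a minimal Python sketch of how one record might be read. It relies only on the general layout of the flat file format: keywords occupy the first 12 columns, the sequence follows the ORIGIN line, and each record ends with a line containing "//". The function name parse_genbank_records and the sample record are made up for illustration. The sketch skips the feature table rather than parsing it, and it keeps only the last value of any repeated keyword such as REFERENCE. The names of the files under the AFS directory are not given here, so the example reads an in-memory sample instead of a real file.

```python
import io


def parse_genbank_records(handle):
    """Yield one dict per record from a GenBank flat file handle (simplified sketch)."""
    record, keyword, section = {}, None, "header"
    for raw in handle:
        line = raw.rstrip("\n")
        if line.startswith("//"):
            # "//" terminates a record
            if record:
                yield record
            record, keyword, section = {}, None, "header"
        elif line.startswith("ORIGIN"):
            # sequence lines follow the ORIGIN keyword
            section = "sequence"
            record["sequence"] = ""
        elif line.startswith("FEATURES"):
            # the feature table is skipped in this sketch
            section = "features"
        elif section == "sequence":
            # drop the leading position number, join the blocks of bases
            record["sequence"] += "".join(line.split()[1:])
        elif section == "features":
            continue
        else:
            if not record and not line.startswith("LOCUS"):
                # ignore anything before the first LOCUS line (e.g. a file preamble)
                continue
            head, body = line[:12].strip(), line[12:].strip()
            if head:
                # keyword or indented sub-keyword (e.g. ORGANISM) in columns 1-12
                keyword = head
                record[keyword] = body
            elif keyword == "ORGANISM":
                # continuation lines under ORGANISM carry the taxonomy
                record["TAXONOMY"] = (record.get("TAXONOMY", "") + " " + body).strip()
            elif keyword and body:
                # continuation line of the previous keyword
                record[keyword] += " " + body
    if record:
        yield record


# Invented sample record, used only to show the layout.
SAMPLE = """\
LOCUS       EXAMPLE1                  12 bp    DNA     linear   SYN 01-JAN-2000
DEFINITION  Made-up record used only to illustrate the layout.
ACCESSION   EXAMPLE1
SOURCE      synthetic construct
  ORGANISM  synthetic construct
            other sequences; artificial sequences.
FEATURES             Location/Qualifiers
     source          1..12
                     /organism="synthetic construct"
ORIGIN
        1 acgtacgtac gt
//
"""

if __name__ == "__main__":
    for rec in parse_genbank_records(io.StringIO(SAMPLE)):
        print(rec["ACCESSION"], rec["ORGANISM"], len(rec["sequence"]))
```

To try the sketch on a real flat file, replace io.StringIO(SAMPLE) with an open file handle. For production work, an established sequence analysis program or parsing library is a safer choice than this simplified reader.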