Protein Data Bank
The Protein Data Bank (PDB) is a repository for 3-D structural data of proteins and nucleic acids. These data, typically obtained by X-ray crystallography or NMR spectroscopy, are submitted by biologists and biochemists from around the world, are released into the public domain, and can be accessed for free. The database is the central repository for biological structural data.
History
Founded in 1971 by Brookhaven National Laboratory, the Protein Data Bank was transferred in 1998 to the Research Collaboratory for Structural Bioinformatics (RCSB), which is composed of Rutgers University, the University of Wisconsin-Madison, NIST and the San Diego Supercomputer Center. Funding comes from the National Science Foundation, Department of Energy, National Library of Medicine and the National Institute of General Medical Sciences. The European Bioinformatics Institute in the UK and the Institute for Protein Research in Japan also collect, process and submit data files.
In 2003, the [Worldwide Protein Data Bank] was formed, consisting of three member organizations that act as deposition, data processing and distribution centers for PDB data. The founding members are [RCSB PDB (USA)], [MSD-EBI (Europe)] and [PDBj (Japan)]. The mission of the wwPDB is to maintain a single Protein Data Bank Archive of macromolecular structural data that is freely and publicly available to the global community.
The PDB is a key resource in structural biology and is critical to more recent work in structural genomics.
Many derived databases and projects have been developed to integrate and classify the PDB in terms of protein structure, protein function and protein evolution.
Growth
When the PDB was founded it contained just 7 protein structures. Since then the number of structures has grown approximately exponentially, with no sign of falling off. The growth rate of the PDB has been the subject of fairly extensive analysis.
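As a rough illustration of what exponential growth means here, the sketch below fits N(t) = N0 * exp(k * t) through just two points quoted in this article: 7 structures at the founding in 1971 and 37,269 entries in June 2006. The 35-year span is a rounding assumption, the function name is hypothetical, and a two-point fit is only a back-of-the-envelope estimate, not the kind of analysis the literature on PDB growth performs.

```python
import math

# Figures quoted in this article; the 35-year span is a rounding assumption.
START_YEAR, START_COUNT = 1971, 7
END_YEAR, END_COUNT = 2006, 37_269


def exponential_fit(n0: float, n1: float, years: float) -> tuple[float, float]:
    """Return (continuous rate k, doubling time) for N(t) = n0 * exp(k * t)."""
    k = math.log(n1 / n0) / years
    return k, math.log(2) / k


rate, doubling_time = exponential_fit(START_COUNT, END_COUNT, END_YEAR - START_YEAR)
annual_growth = math.exp(rate) - 1  # equivalent year-over-year growth factor minus 1

print(f"average annual growth: {annual_growth:.1%}")
print(f"doubling time: {doubling_time:.2f} years")
```

Under these assumptions the average works out to roughly 28% per year, that is, a doubling time of a little under three years.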
Contents
As of 20 June 2006, the database contained 37,269 released atomic coordinate entries (or "structures"), 34,109 of them proteins, the rest being nucleic acids, nucleic acid-protein complexes, and a few other molecules. About 5,000 new structures are released each year. Data are stored in the mmCIF format, developed specifically for the purpose.
Note that the database stores information about the exact location of all atoms in a large biomolecule; if one is only interested in sequence data, i.e. the list of amino acids making up a particular protein or the list of nucleotides making up a particular nucleic acid, the much larger databases from Swiss-Prot and the International Nucleotide Sequence Database Collaboration should be used.
Statistics
As of 20 June 2006, the "PDB Holdings List" at [RCSB] reported the following statistics:

| | Proteins | Nucleic Acids | Protein/NA complexes | Other | Total |
|---|---|---|---|---|---|
| X-ray diffraction | 29258 | 902 | 1353 | 28 | 31541 |
| NMR | 4690 | 705 | 121 | 6 | 5522 |
| Electron microscopy | 88 | 9 | 29 | 0 | 126 |
| Other | 73 | 4 | 3 | 0 | 80 |
| Total | 34109 | 1620 | 1506 | 34 | 37269 |
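
The totals in the table can be cross-checked with a few lines of arithmetic. The sketch below copies the per-method counts from the table, recomputes the row, column and grand totals, and derives each method's share of all entries. The variable names are illustrative only.

```python
# Counts copied from the table above, per experimental method:
# (proteins, nucleic acids, protein/NA complexes, other)
HOLDINGS = {
    "X-ray diffraction": (29258, 902, 1353, 28),
    "NMR": (4690, 705, 121, 6),
    "Electron microscopy": (88, 9, 29, 0),
    "Other": (73, 4, 3, 0),
}

# Totals as reported in the table's last column and last row.
REPORTED_ROW_TOTALS = {
    "X-ray diffraction": 31541,
    "NMR": 5522,
    "Electron microscopy": 126,
    "Other": 80,
}
REPORTED_COLUMN_TOTALS = (34109, 1620, 1506, 34)
REPORTED_GRAND_TOTAL = 37269

# Recompute the totals from the individual cells.
row_totals = {method: sum(counts) for method, counts in HOLDINGS.items()}
column_totals = tuple(sum(column) for column in zip(*HOLDINGS.values()))
grand_total = sum(row_totals.values())

assert row_totals == REPORTED_ROW_TOTALS
assert column_totals == REPORTED_COLUMN_TOTALS
assert grand_total == REPORTED_GRAND_TOTAL

# Share of all entries contributed by each method.
shares = {method: total / grand_total for method, total in row_totals.items()}
for method, share in shares.items():
    print(f"{method}: {share:.1%}")
```

The recomputed totals agree with the reported ones, and X-ray diffraction accounts for roughly 85% of all entries in this snapshot.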
File format
Over the years the PDB file format has undergone many changes and revisions. Its original format was dictated by the width of computer punch cards.
- [PDB Format Guide - Prepared by the PDB Staff at BNL] The PDB format specification can be found here; it should be read before working with the raw data.
- PDB data are now also available in an XML representation, the [PDBML] format.
- [ftp.rcsb.org] The raw data can be downloaded from here.
- [www.rcsb.org] Statistics about the PDB can be found here.
- [The Molecular Modeling DataBase (MMDB)] from NCBI
- [The Macromolecular Structure Database] from the European Bioinformatics Institute
- [The Data Uniformity Project] from PDB
Some would say that this is a good thing; others would ask how, without a universal repository of information (i.e., a common dictionary), we can be sure we are talking about the same thing.
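Because of its punch-card origins, the legacy PDB format is column-based: each record is a line of up to 80 characters, and every field sits at fixed character positions. The sketch below parses a few fields of a single ATOM record. The column positions are those commonly documented for ATOM records and should be checked against the Format Guide before being relied on; the `Atom` class, the function name and the sample coordinates are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom_record(line: str) -> Atom:
    """Parse selected fields of a fixed-column ATOM/HETATM record."""
    if not line.startswith(("ATOM  ", "HETATM")):
        raise ValueError("not an ATOM/HETATM record")
    return Atom(
        serial=int(line[6:11]),        # columns 7-11
        name=line[12:16].strip(),      # columns 13-16
        res_name=line[17:20].strip(),  # columns 18-20
        chain_id=line[21],             # column 22
        res_seq=int(line[22:26]),      # columns 23-26
        x=float(line[30:38]),          # columns 31-38
        y=float(line[38:46]),          # columns 39-46
        z=float(line[46:54]),          # columns 47-54
    )


# Hypothetical record, assembled field by field so the widths stay visible.
SAMPLE = (
    "ATOM  " + "    1" + " " + " N  " + " " + "MET" + " " + "A" + "   1" + " " + "   "
    + "  27.340" + "  24.430" + "   2.614"   # x, y, z (8 characters each)
    + "  1.00" + "  9.67"                    # occupancy, temperature factor
    + " " * 10 + " N" + "  "                 # padding, element, charge
)

atom = parse_atom_record(SAMPLE)
print(atom)
```

Slicing by position rather than splitting on whitespace matters here: adjacent fields in this format may run together without any separating space, so a whitespace split can silently merge two values.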
Each structure published in PDB receives a four-character alphanumeric identifier, its PDB ID. This should not be used as an identifier for biomolecules, since often several structures for the same molecule (in different environments or conformations) are contained in PDB with different PDB IDs.
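A minimal sketch of checking that a string has the shape just described, four alphanumeric characters. The function name `is_pdb_id` is hypothetical, and the check is purely syntactic: it says nothing about whether an entry with that ID exists, nor about which molecule it describes.

```python
import re

# Exactly four ASCII letters or digits, as described in the text.
PDB_ID_PATTERN = re.compile(r"[0-9A-Za-z]{4}")


def is_pdb_id(text: str) -> bool:
    """Return True if text looks like a four-character alphanumeric PDB ID."""
    return PDB_ID_PATTERN.fullmatch(text) is not None


for candidate in ("1abc", "1ab", "1ab_", "12345"):
    print(candidate, is_pdb_id(candidate))
```

Since several entries can describe the same molecule, code that groups structures by biomolecule needs a separate key; the PDB ID only identifies an entry.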
When a biologist submits structure data for a protein or nucleic acid, PDB staff review and annotate them. The data are then automatically checked for plausibility. The source code for this validation software has been released for free. The main database accepts only experimentally derived structures, not theoretically predicted ones (see protein structure prediction).
Various funding agencies and scientific journals now require scientists to submit their structure data to PDB.
Viewing the data
The structural data can be used to visualize the biomolecules with appropriate software, such as RasMol, Jmol, Chime, a web browser VRML plugin, or web-based software designed to visualize and analyse protein structures, such as STING. The PDB website also contains resources for education, structural genomics, and related software.
Links to
- [link] The best mapping is provided by Kim Henrick's group at EBI.
- [link] PDB provides a mapping on its beta site, but at the level of whole PDB entries rather than individual chains.
- [link] Search at BRENDA enzyme database portal.
- [link] PDBSProtEC:
References
Printed
- Bernstein FC, Koetzle TF, Williams GJ, Meyer Jr EF, Brice MD, Rodgers JR, Kennard O, Shimanouchi T, Tasumi M. The Protein Data Bank: a computer-based archival file for macromolecular structures. J Mol Biol 1977;112:535-542. PMID 875032.
- Sussman JL, Lin D, Jiang J, Manning NO, Prilusky J, Ritter O, Abola EE. Protein Data Bank (PDB): a database of 3D structural information of biological macromolecules. Acta Cryst 1998;D54:1078-1084. PMID 10089483.
Online
- [Protein Data Bank] - home page
- [Banking on structures] An overview article by Tracy Smith Schmidt (PDF link).
- [The Protein Data Bank] - A very extensive and highly cited paper on PDB by Berman et al. PMID 10592235
Other external links
- [ExPASy - Swiss-Prot and TrEMBL]
- [DNA Sequence Collaborator's Page] International Nucleotide Sequence Database Collaboration
