Protein structure


Proteins are amino acid chains, made up from 20 different L-α-amino acids, also referred to as residues, that fold into unique three-dimensional protein structures. The shape into which a protein naturally folds is known as its native state, which is determined by its sequence of amino acids. Below about 40 residues the term peptide is frequently used. A certain number of residues is necessary to perform a particular biochemical function, and around 40-50 residues appears to be the lower limit for a functional domain size. Protein sizes range from this lower limit to several thousand residues in multi-functional or structural proteins. However, the current estimate for the average protein length is around 300 residues. Very large aggregates can be formed from protein subunits; for example, many thousands of actin molecules assemble into an actin filament. Large protein complexes with RNA are found in ribosome particles, which are in fact 'ribozymes'.

Protein structure, from primary to quaternary structure.

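Biochemists refer to four distinct aspects of a protein's structure: primary structure (the amino acid sequence), secondary structure (local, hydrogen-bonded elements such as the alpha helix and the beta sheet), tertiary structure (the overall three-dimensional fold of a single polypeptide chain), and quaternary structure (the arrangement of several chains, or subunits, in a complex).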

In addition to these levels of structure, proteins may shift between several similar structures in performing their biological function. In the context of these functional rearrangements, these tertiary or quaternary structures are usually referred to as "conformations," and transitions between them are called conformational changes.

The primary structure is held together by covalent peptide bonds, which are made during the process of translation. The secondary structures are held together by hydrogen bonds. The tertiary structure is held together primarily by hydrophobic interactions but hydrogen bonds, ionic interactions, and disulfide bonds are usually involved too.

The two ends of the amino acid chain are referred to as the carboxy terminus (C-terminus) and the amino terminus (N-terminus) based on the nature of the free group on each extremity.

Amino acid structure

The basic structure of an α-amino acid is quite simple. R denotes any one of the 20 possible side chains (see table below). We notice that the Cα-atom has 4 different ligands (the H is omitted in the drawing) and is thus chiral. An easy trick to remember the correct L-form is the CORN-rule: when the Cα-atom is viewed with the H in front, the groups read "CO-R-N" in a clockwise direction.
Basic structure of amino acid

CO-R-N rule

The different side chains R determine the chemical properties of the amino acid or residue (the residue is the amino acid side chain plus the peptide backbone, see below).  

Name           3-letter  Single-letter  Relative abundance  MW   pK    VdW volume  Class
(residue)      code      code           (%) E.C.                       (Å³)        (C/P/H)
Alanine        ALA       A              13.0                71         67          H
Arginine       ARG       R              5.3                 157  12.5  148         C+
Asparagine     ASN       N              9.9                 114        96          P
Aspartate      ASP       D              9.9                 114  3.9   91          C-
Cysteine       CYS       C              1.8                 103        86          P
Glutamate      GLU       E              10.8                128  4.3   109         C-
Glutamine      GLN       Q              10.8                128        114         P
Glycine        GLY       G              7.8                 57         48          -
Histidine      HIS       H              0.7                 137  6.0   118         P,C+
Isoleucine     ILE       I              4.4                 113        124         H
Leucine        LEU       L              7.8                 113        124         H
Lysine         LYS       K              7.0                 129  10.5  135         C+
Methionine     MET       M              3.8                 131        124         H
Phenylalanine  PHE       F              3.3                 147        135         H
Proline        PRO       P              4.6                 97         90          H
Serine         SER       S              6.0                 87         73          P
Threonine      THR       T              4.6                 101        93          P
Tryptophan     TRP       W              1.0                 186        163         P
Tyrosine       TYR       Y              2.2                 163  10.1  141         P
Valine         VAL       V              6.0                 99         105         H

Class: C+ / C- = charged (positive / negative), P = polar, H = hydrophobic.
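The MW values in the table are residue masses, that is, the mass of each amino acid minus the water lost on forming a peptide bond. As a rough illustration, the mass of a linear chain can therefore be estimated by summing the residue masses and adding one water for the free terminal groups. The sketch below copies the MW column as printed; the function name and the rounded water mass of 18 Da are choices made here for illustration only.

```python
# Residue masses (Da), copied from the MW column of the table above.
RESIDUE_MASS = {
    "A": 71, "R": 157, "N": 114, "D": 114, "C": 103,
    "E": 128, "Q": 128, "G": 57, "H": 137, "I": 113,
    "L": 113, "K": 129, "M": 131, "F": 147, "P": 97,
    "S": 87, "T": 101, "W": 186, "Y": 163, "V": 99,
}

# One water (rounded to 18 Da) accounts for the H on the N-terminus
# and the OH on the C-terminus of a linear chain.
WATER_MASS = 18


def approximate_mass(sequence: str) -> int:
    """Approximate mass in Da of a linear chain given in single-letter code."""
    sequence = sequence.upper()
    unknown = sorted(set(sequence) - set(RESIDUE_MASS))
    if unknown:
        raise ValueError(f"unknown residue codes: {unknown}")
    if not sequence:
        return 0
    return sum(RESIDUE_MASS[code] for code in sequence) + WATER_MASS


if __name__ == "__main__":
    # Glycine + alanine: 57 + 71 + 18
    print(approximate_mass("GA"))
```

Because the table values are rounded to whole daltons, the result is only an estimate and drifts for long chains.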

Side chain conformation

The atoms along the side chain are named with Greek letters in Greek alphabetical order: alpha, beta, gamma, delta, epsilon and so on. Alpha refers to the carbon atom closest to the carbonyl group of that amino acid, beta the second closest, and so on. The alpha atom is usually considered a part of the backbone. The dihedral angles around the bonds between these atoms are named chi1, chi2, chi3 and so on. For example, the first and second carbon atoms of lysine are named alpha and beta, and the dihedral angle around the alpha-beta bond is named chi1. Side chains can be in different conformations called gauche(-), trans and gauche(+). Side chains generally tend to adopt a staggered conformation around chi2.
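To make the idea of staggered conformations concrete, the following sketch assigns a chi angle to the nearest of the three staggered positions, taken here as -60, +60 and 180 degrees. Trans corresponds to 180 degrees; the two gauche states sit near plus and minus 60 degrees, but which of them is labelled gauche(+) differs between conventions, so the code returns the angle instead of a name. All function names are illustrative.

```python
# Ideal staggered positions for a dihedral angle, in degrees (an assumption
# of this sketch; real side chains deviate from these values).
STAGGERED_ANGLES = (-60.0, 60.0, 180.0)


def wrap_degrees(angle: float) -> float:
    """Map any angle onto the interval (-180, 180]."""
    wrapped = (angle + 180.0) % 360.0 - 180.0
    return 180.0 if wrapped == -180.0 else wrapped


def angular_distance(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees."""
    return abs(wrap_degrees(a - b))


def nearest_staggered(chi: float) -> float:
    """Return the staggered position closest to the given chi angle."""
    return min(STAGGERED_ANGLES, key=lambda ref: angular_distance(chi, ref))


if __name__ == "__main__":
    for chi in (-65.0, 70.0, -175.0):
        print(chi, "->", nearest_staggered(chi))
```

Note that -175 degrees is assigned to 180 degrees because the distance is measured around the circle, not along the number line.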

The polypeptide chain

Two amino acids

Bond angles for ψ and ω

Two amino acids are combined in a condensation reaction. Notice that the peptide bond is in fact planar due to the delocalization of the electrons. The sequence of the different amino acids is considered the primary structure of the peptide or protein. Counting of residues always starts at the N-terminal end (NH2-group).

In contrast to the rather rigid peptide bond angle ω (the bond between C1 and N), which is almost always close to 180 degrees, the dihedral angles phi φ (the bond between N and Cα) and psi ψ (the bond between Cα and C1) can have a certain range of possible values. These angles are the degrees of freedom of a protein; they control the protein's three-dimensional structure. They are restrained by geometry to allowed ranges typical for particular secondary structure elements, and are represented in a Ramachandran plot. A few important bond lengths are given in the table below.
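Each of these angles is a dihedral angle defined by four consecutive atoms. The sketch below shows one common way to compute a signed dihedral from four sets of Cartesian coordinates. It is a minimal illustration in plain Python, not the method of any particular software package, and the helper names are made up for this example.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond.

    For backbone angles the four atoms would be:
      phi   : C(i-1), N(i),  CA(i),  C(i)
      psi   : N(i),   CA(i), C(i),   N(i+1)
      omega : CA(i),  C(i),  N(i+1), CA(i+1)
    where C is the carbonyl carbon (called C1 in the text above).
    """
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Four points in a planar zigzag: a trans arrangement, as for omega.
    print(abs(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))))
```

A value of 0 degrees corresponds to the cis (eclipsed) arrangement and 180 degrees to trans; the function is undefined when three consecutive atoms are collinear.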

Peptide bond  Average length  Single bond  Average length  Hydrogen bond  Average length (±30 pm)
Cα - C        153 pm          C - C        154 pm          O-H --- O-H    280 pm
C - N         133 pm          C - N        148 pm          N-H --- O=C    290 pm
N - Cα        146 pm          C - O        143 pm          O-H --- O=C    280 pm
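The hydrogen bond column can be turned into a simple distance test. The sketch below checks whether a measured donor-acceptor distance falls within ±30 pm of the tabulated average. It assumes the distance is measured between the two heavy atoms, and it ignores bond angles, so it is a crude filter and not a full hydrogen bond criterion. The dictionary keys and function names are illustrative.

```python
# Average hydrogen bond lengths in picometres, from the table above.
HBOND_LENGTH_PM = {
    ("O-H", "O-H"): 280,
    ("N-H", "O=C"): 290,
    ("O-H", "O=C"): 280,
}

# Spread quoted in the table header (plus or minus 30 pm).
TOLERANCE_PM = 30


def angstrom_to_pm(distance_angstrom: float) -> float:
    """Convert a distance from angstroms to picometres (1 A = 100 pm)."""
    return distance_angstrom * 100.0


def within_hbond_range(donor: str, acceptor: str, distance_pm: float) -> bool:
    """True if the distance lies within the tabulated average +/- tolerance."""
    mean = HBOND_LENGTH_PM[(donor, acceptor)]
    return abs(distance_pm - mean) <= TOLERANCE_PM


if __name__ == "__main__":
    # A backbone N-H ... O=C contact measured as 3.0 angstroms.
    print(within_hbond_range("N-H", "O=C", angstrom_to_pm(3.0)))
```

Coordinates in structure files are usually given in angstroms, which is why the conversion helper is included.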

Secondary structure elements

The polypeptide chain of a protein seldom forms just a random coil. Remember that proteins have either a chemical (enzymes) or structural function to fulfil. High specificity requires an intricate arrangement of 3-dimensional interactions and therefore a defined conformation of the polypeptide chain. In fact, some neurodegenerative diseases like Huntington's may be related to random coil formation in certain proteins. The two most common secondary structure arrangements are the right-handed alpha helix and the beta sheet, which can be connected into a larger tertiary structure (or fold) by turns and loops of a variety of types. These two secondary structure elements satisfy a strong hydrogen bond network within the geometric constraints of the bond angles ω, ψ, and φ. β-sheets can be formed by a parallel or, more commonly, an antiparallel arrangement of individual β-strands.

Only the atoms of the backbone are involved in secondary structure, not the amino acid side chains ("R groups").

The left panel shows the hydrogen bonding in an actual α-helix backbone. Note that the O of the nth residue (Lys 153) bonds to the N of the (n+4)th residue (Arg 147). The actual values of some displayed H-bond distances give you some idea about the variations to expect within a helix. The center panel includes the side chains, which were omitted in the left panel for clarity. You see the side chains pointing towards the N-terminus of the chain (lower residue numbers), and thus it is usually possible to determine the direction of the helix quite well during initial model building. A very nice 0.2 nm electron density is shown in the right panel.
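The n to n+4 bonding pattern described in this caption can be enumerated for a helix of any length. The sketch below lists the expected pairs for an idealised helix; the function name and the `offset` parameter are illustrative, and real helices often deviate from the pattern at their ends.

```python
def helix_hbond_pairs(first: int, last: int, offset: int = 4):
    """List (acceptor, donor) residue numbers for an idealised helix.

    The backbone C=O of residue n accepts a hydrogen bond from the
    backbone N-H of residue n + offset. Residues first..last (inclusive)
    are assumed to form one continuous helix.
    """
    return [(n, n + offset) for n in range(first, last - offset + 1)]


if __name__ == "__main__":
    for acceptor, donor in helix_hbond_pairs(1, 8):
        print(f"O of residue {acceptor} ... N of residue {donor}")
```

With `offset=3` the same function gives the tighter pattern of a 3-10 helix. A stretch shorter than `offset + 1` residues yields no pairs at all.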

Here are some more representations of the same helix.
Ball and stick model

Backbone

Secondary structure cartoon (linguini diagram)

The hydrogen bond network in a 2-stranded, antiparallel β-sheet. The side chains are sticking out above or below the plane of the picture. It is less clear cut than in the case of the helix in which direction to initially trace a beta sheet strand. The beta sheet can be extended indefinitely due to the repeatable H-bonding pattern on either side of a strand.

The pleated nature of the sheet becomes distinctly visible in the right panels of this figure, showing also the side chains sticking out above and below the sheet plane. If you look carefully, you will also notice that the sheet has a left twist (centre panel).

Turns, loops and a few other secondary structure elements such as a 3-10 helix complete the picture. We have now enough pieces to assemble a complete protein, displaying its typical tertiary structure.

Multimeric states

A protein composed of a single polypeptide is called a monomeric protein. If it is a complex of two or more polypeptides (i.e. multiple subunits), it is called a multimer. Specifically it would be called a dimer if it contains two subunits, a trimer if it contains three subunits, and a tetramer if it contains four subunits. Multimers made up of identical subunits may be referred to with a prefix of "homo-" (e.g. a homotetramer) and those made up of different subunits may be referred to with a prefix of "hetero-" (e.g. a heterodimer).
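The naming scheme can be written down as a small function. In this sketch each subunit is represented by an arbitrary label, and two subunits count as identical when their labels are equal. The function name and the fallback to "multimer" for more than four subunits are choices made for this illustration.

```python
# Names for the subunit counts mentioned in the text.
COUNT_NAMES = {2: "dimer", 3: "trimer", 4: "tetramer"}


def oligomer_name(subunits):
    """Name a protein from a list of subunit labels, e.g. ["alpha", "beta"]."""
    count = len(subunits)
    if count == 0:
        raise ValueError("a protein needs at least one polypeptide chain")
    if count == 1:
        return "monomer"
    # Larger assemblies fall back to the generic term.
    base = COUNT_NAMES.get(count, "multimer")
    prefix = "homo" if len(set(subunits)) == 1 else "hetero"
    return prefix + base


if __name__ == "__main__":
    print(oligomer_name(["A", "A", "A", "A"]))
    print(oligomer_name(["alpha", "beta"]))
```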

Folds and motifs of protein structure

Although about 100,000 different proteins are expressed in eukaryotic systems, there are far fewer different structural motifs and folds, partly as a consequence of evolved pathways and mechanisms. Motif in this sense refers to a small specific combination of secondary structural elements (such as helix-turn-helix). These elements are often called supersecondary structures. Fold refers to a global type of arrangement, like helix-bundle or beta-barrel. Structure motifs usually consist of just a few elements, e.g. the 'helix-turn-helix' has just three. Note that while the spatial sequence of elements is the same in all instances of a motif, they may be encoded in any order within the underlying gene. Protein structural motifs often include loops of variable length and unspecified structure, which in effect create the "slack" necessary to bring together in space two elements that are not encoded by immediately adjacent DNA sequences in a gene. Note also that even when two genes encode the secondary structural elements of a motif in the same order, they may nevertheless specify somewhat different sequences of amino acids. This is true not only because of the complicated relationship between tertiary and primary structure, but also because the size of the elements varies from one protein to the next.

Protein folding

Main article: Protein folding

The process by which the higher structures form is called protein folding and is a consequence of the primary structure. A unique polypeptide may have more than one stable folded conformation, which could have a different biological activity, but usually, only one conformation is considered to be the active, or native conformation.

Structural domain

Main article: Structural domain

Within a protein, a structural domain ("domain") is an element of overall structure that is self-stabilizing and often folds independently of the rest of the protein chain. Many domains are not unique to the protein products of one gene or one gene family but instead appear in a variety of proteins. Domains often are named and singled out because they figure prominently in the biological function of the protein they belong to; for example, the "calcium-binding domain" of calmodulin. Because they are self-stabilizing, domains can be "swapped" by genetic engineering between one protein and another to make chimeras. A domain may be composed of one structural motif, several, or none.

Structure classification

Several methods have been developed for the structural classification of proteins. These seek to classify the data in the Protein Data Bank in a structured order. Several databases exist which classify proteins by different methods; SCOP, CATH and FSSP are the largest ones. The methods they use are, respectively, purely manual, manual and automated, and purely automated. Work is being done to better integrate the current data. The classification is consistent between SCOP, CATH and FSSP for the majority of proteins which have been classified, but there are still some differences and inconsistencies.

Protein structure determination

Around 90% of the protein structures available in the Protein Data Bank have been determined by X-ray crystallography. This method allows the 3D coordinates of all the atoms in the protein to be determined to within a certain resolution. Roughly 9% of the known protein structures have been obtained by nuclear magnetic resonance (NMR) techniques, which can also be used to determine secondary structure. Note that secondary structure can also be determined via other biochemical techniques such as circular dichroism. Secondary structure can also be predicted with a high degree of accuracy (see next section).

A rough guide to the resolution of protein structures
Resolution (Å)    Meaning
>4.0 Coordinates meaningless
3.0 - 4.0 Fold possibly correct, but errors are very likely. Many sidechains placed with wrong rotamer.
2.5 - 3.0 Fold likely correct except that some surface loops might be mismodelled. Several long, thin sidechains (lys, glu, gln, etc) and small sidechains (ser, val, thr, etc) likely to have wrong rotamers.
2.0 - 2.5 As 2.5 - 3.0, but number of sidechains in wrong rotamer is considerably less. Many small errors can normally be detected. Fold normally correct and number of errors in surface loops is small.
1.5 - 2.0 Few residues have wrong rotamer. Many small errors can normally be detected. Fold always correct, also in surface loops.
0.5 - 1.5 Even at this resolution we can find threonines with the wrong chirality on the C-beta. But in general, structures have only some small errors at this resolution.
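The guide above amounts to a lookup from a resolution value to a reliability band. The sketch below encodes it with shortened descriptions. The table does not say which band a boundary value such as 2.5 belongs to, so this example assumes the upper bound of each band is inclusive; the function name is also just illustrative.

```python
# (upper bound in angstroms, shortened meaning), taken from the guide above
# and ordered from best to worst resolution.
RESOLUTION_GUIDE = [
    (1.5, "only some small errors"),
    (2.0, "few wrong rotamers; fold always correct, also in surface loops"),
    (2.5, "fold normally correct; few errors in surface loops"),
    (3.0, "fold likely correct; some surface loops may be mismodelled"),
    (4.0, "fold possibly correct, but errors are very likely"),
]

LOWEST_COVERED = 0.5  # the guide starts at 0.5 angstroms


def describe_resolution(resolution: float) -> str:
    """Return the guide's (shortened) assessment for a resolution in angstroms."""
    if resolution < LOWEST_COVERED:
        raise ValueError("resolution is below the range covered by the guide")
    for upper_bound, meaning in RESOLUTION_GUIDE:
        if resolution <= upper_bound:  # assumption: upper bound is inclusive
            return meaning
    return "coordinates meaningless"


if __name__ == "__main__":
    for value in (1.2, 1.8, 2.8, 4.5):
        print(value, "->", describe_resolution(value))
```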

Computational prediction of protein structure

Determining a protein's sequence is much simpler than determining its structure. However, the structure of a protein gives much more insight into its function than its sequence does. Therefore, a number of methods for the computational prediction of protein structure from sequence have been proposed. Ab initio prediction methods use just the sequence of the protein. Threading uses existing protein structures.

Rosetta@home is a distributed computing project which tries to predict the structures of proteins with massive sampling on thousands of home computers.

Software

There are many software packages available, such as the free web-based STING, for visualizing and analyzing protein structures. Several packages, such as Quantum, are used to predict conformational changes of proteins and their influence on the protein's function. Several methods have been developed to compare the structures of different proteins; see structural alignment.


From Wikipedia, the Free Encyclopedia.
All text is available under the terms of the GNU Free Documentation License.
