Chapters 4-6: Peptide Sequencing and Synthesis, Protein Structure
1. Protein Sequencing
2. Peptide Synthesis
3. 3D Structure Determination
4. Protein Folding by Chaperones
1) Protein Sequencing
Learning objectives:
1) Understand the steps required to sequence a protein, including N-terminal analysis, cleavage of the disulfide bridges, chain fragmentation, Edman degradation (or mass spectrometry), and reconstruction of the sequence
2) Understand the importance of protein databases
Why Protein Sequencing?
1) Sequence -> understand function, determine structure of the protein
2) Sequence comparison of related proteins
3) Diagnostic tests for inherited diseases caused by mutations in a given protein
1953: first protein sequence, bovine insulin; took 10 years and 100 g of protein (Nobel Prize: F. Sanger)
Principle: you need a pure sample of your protein; break the protein down into pieces that can be sequenced by a chemical method
See guided exploration 4
Ways to get information about the sequence of a protein
1) Clone/sequence the corresponding DNA/cDNA
2) Identify and/or sequence the protein by mass spectrometry
3) Perform classical amino acid sequencing of the purified protein
If you want to know about post-translational modifications, or suspect RNA editing, you need to purify and sequence the protein of interest (example: ApoB-48 / ApoB-100)
Quantification of proteins
To purify and subsequently analyze/sequence a protein, you need a way to detect it so that you can follow its enrichment through the purification steps
Assay:
- monitor the catalytic activity of your protein
- use an antibody against your protein (ELISA)
ELISA: enzyme-linked immunosorbent assay. Example: pregnancy tests based on ELISA detect chorionic gonadotropin in urine (Animated Figure)
Protein purification is a multistep process. Principle: the unique physicochemical properties of the desired protein are exploited to separate it stepwise from all other proteins
Protein Sequencing: The steps
1) Break disulfide bridges and prevent their re-formation
2) Separate and purify the individual chains of the protein complex, if there is more than one
3) Determine their amino acid composition
4) Determine the N-terminal amino acid of each subunit
5) Break each chain into fragments of about 50 AA
6) Separate the fragments from each other
7) Determine the sequence of each fragment
8) Repeat with a different cleavage method
9) Reconstruct the sequence of the overall protein
Steps to the primary structure/sequence
1. Preparation for sequencing
- Determine the number of different subunits
- Cleave the disulfide bridges
- Purify the subunits
- Determine the AA composition of the subunits
2. Sequencing of the subunits
- Fragment the subunits into peptides short enough to be sequenced completely
- Separate and purify the fragments
- Determine the sequence of the fragments
- Repeat with a different fragmentation method so that the fragments can be ordered by their overlaps
3. Assembly of the complete structure
- Arrange the overlapping fragments
- Determine the inter- and intra-chain disulfide bridges
Overview of protein sequencing
A) First step: separate subunits
How many subunits is my protein composed of?
N-terminal amino acid analysis reveals the number of different types of subunits:
- Modify the N-terminal AA with dansyl chloride, hydrolyze all peptide bonds, and identify the dansyl-modified AA
- Or perform 1 cycle of Edman degradation
If I get Gly and Phe, I know that my protein has at least two nonidentical subunits
Analysis of AAs by HPLC. HPLC = High Performance Liquid Chromatography. Used to separate AAs from each other, usually by reverse phase (hydrophobic column material). Detection of the AAs usually after derivatization, e.g. with o-phthalaldehyde (-> fluorescent), or by other means (UV-vis, radioactivity, etc.)
Disulfide bonds between and within polypeptides are cleaved by 2-mercaptoethanol
Alkylation of the cysteines with iodoacetic acid
B) The polypeptide chains are then cleaved
Sequence determination has a size limit of about 40 AA
Peptides longer than about 40 to 100 AA need to be shortened: cleaved into fragments that can be subjected to chemical sequencing
Chemical or enzymatic cleavage of proteins by endopeptidases, exopeptidases, or CNBr
Trypsin is an example of an endopeptidase; it cleaves after Arg or Lys
Trypsin cleavage
Cyanogen bromide (CNBr) cleavage Chemical cleavage after Met
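The two cleavage rules above (trypsin after Arg or Lys, CNBr after Met) can be sketched as an in-silico digest. This is a minimal illustration, not a validated tool: the function names and the test peptide are hypothetical, and the rule that trypsin does not cut when the next residue is Pro is an added assumption that the slides do not mention.

```python
def cleave_after(sequence, residues, skip_before_pro=False):
    """Split a one-letter-code sequence after each residue found in `residues`."""
    fragments, start = [], 0
    for i, aa in enumerate(sequence[:-1]):  # never cut after the last residue
        if aa not in residues:
            continue
        if skip_before_pro and sequence[i + 1] == "P":
            continue  # assumption (not on the slides): no cleavage when Pro follows
        fragments.append(sequence[start:i + 1])
        start = i + 1
    fragments.append(sequence[start:])
    return fragments


def trypsin(sequence):
    """Endopeptidase rule from the slides: cleave after Lys (K) or Arg (R)."""
    return cleave_after(sequence, "KR", skip_before_pro=True)


def cnbr(sequence):
    """Chemical rule from the slides: cleave after Met (M)."""
    return cleave_after(sequence, "M")


peptide = "GAKMFRPLMSKW"  # hypothetical test peptide
print(trypsin(peptide))   # ['GAK', 'MFRPLMSK', 'W']
print(cnbr(peptide))      # ['GAKM', 'FRPLM', 'SKW']
```

The two digests cut at different positions, which is what later allows the fragments to be ordered by their overlaps.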
C) Edman degradation removes a peptide's first amino acid residue
Stepwise removal of one amino acid after the other from the N-terminus of a purified peptide in a cyclic chemical process: 1 = Ala, 2 = Gly, 3 = Phe (sequence)
- Edman's reagent: PITC, phenylisothiocyanate => phenylthiocarbamyl (PTC) adduct
- Cleave with anhydrous trifluoroacetic acid => thiazolinone derivative
- Identify the phenylthiohydantoin (PTH) amino acid after each cycle
- Automated process, up to 50 AA in a run
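The cyclic logic of Edman degradation can be mimicked in a few lines. This sketch models only the bookkeeping (one N-terminal residue released and identified per cycle, with a run limit of 50 cycles as stated above); it does not model the chemistry, and the function name is hypothetical.

```python
def edman_degradation(peptide, max_cycles=50):
    """Yield (cycle number, released residue, remaining peptide) for each cycle."""
    remaining = peptide
    for cycle in range(1, max_cycles + 1):
        if not remaining:
            break
        # One cycle: the N-terminal residue is released (as its PTH derivative)
        released, remaining = remaining[0], remaining[1:]
        yield cycle, released, remaining


for cycle, residue, rest in edman_degradation("AGF"):
    print(cycle, residue, rest)  # cycle 1 releases A, cycle 2 releases G, cycle 3 releases F
```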
-> identify after each cycle ON SOLID SUPPORT -> automated Animated Figure
Generating overlapping fragments to determine the amino acid sequence of a polypeptide Animated Figure
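Ordering fragments by their overlaps can be sketched as a greedy merge: repeatedly join the two fragments with the longest suffix/prefix overlap. This is only an illustration under simplifying assumptions (error-free fragment sequences, a single chain, hypothetical fragments from the two digests of a made-up peptide); short overlaps such as a single residue are ambiguous in practice, so real reconstructions rely on longer, unique overlaps.

```python
def overlap(a, b):
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def assemble(fragments):
    """Greedily merge fragments by maximal overlap into one sequence."""
    frags = list(dict.fromkeys(fragments))  # remove duplicates, keep order
    # Fragments fully contained in another fragment add no ordering information
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while len(frags) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j and overlap(a, b) > best_k:
                    best_k, best_i, best_j = overlap(a, b), i, j
        if best_k == 0:
            raise ValueError("fragments do not overlap; another digest is needed")
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for n, f in enumerate(frags) if n not in (best_i, best_j)]
        frags.append(merged)
    return frags[0]


tryptic = ["GAK", "MFRPLMSK", "W"]       # hypothetical trypsin fragments
cnbr_frags = ["GAKM", "FRPLM", "SKW"]    # hypothetical CNBr fragments
print(assemble(tryptic + cnbr_frags))    # GAKMFRPLMSKW
```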
Determining the positions of disulfide bonds
Reconstructed protein sequences are stored in databases
- Sequence overlapping peptides to reconstruct the entire protein sequence
- Determine the positions of the disulfide bridges
- Record the sequence in a public database (UniProt, Swiss-Prot)
D) Protein sequencing by mass spectrometry
- Mass spectrometry measures the mass-to-charge ratio (m/z) of ions in the gas phase
- Positive charge is due to Lys or Arg
- Ile = Leu (isomeric, identical mass); Gln and Lys differ by only 0.036 Da
- Edman chemistry does not work if the N-terminus is blocked/modified (N-formyl-Met; removed by aminopeptidase)
- Nobel Prize 2002: John Bennett Fenn
- Kogan Lecture on ibioseminars (30 min)
Electrospray ionization mass spectrometry (ESI-MS)
- Mass spectrometry separates ions according to their mass-to-charge ratio (m/z)
- Positive charges in proteins come from Lys or Arg
- MALDI-TOF <-> ESI-MS/MS
Ionisation by MALDI and FAB
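The ESI-MS spectrum of the 16,951-Da horse heart protein apomyoglobin. Adjacent peaks differ by one charge: $P_1 = (M + z_1)/z_1$ and $P_2 = (M + z_1 - 1)/(z_1 - 1)$, where $P_1$ and $P_2$ are the m/z values of two adjacent peaks, $M$ is the molecular mass, and $z_1$ is the charge of the ion that gives peak $P_1$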
Sample calculation
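The two peak equations for adjacent ESI-MS peaks, $P_1 = (M + z_1)/z_1$ and $P_2 = (M + z_1 - 1)/(z_1 - 1)$, can be solved for the two unknowns: $z_1 = (P_2 - 1)/(P_2 - P_1)$ and $M = z_1 (P_1 - 1)$. The sketch below assumes, like those equations, that each charge is a proton of mass 1. The peak values are not read from the real spectrum; they are generated from the stated mass of 16,951 Da for illustration, and the function names are hypothetical.

```python
def mz(mass, z):
    """m/z of an ion of molecular mass `mass` carrying z protons (proton mass taken as 1)."""
    return (mass + z) / z


def charge_and_mass(p1, p2):
    """Solve for z1 and M from two adjacent peaks, where p2 > p1 and p2 has charge z1 - 1."""
    z1 = round((p2 - 1.0) / (p2 - p1))
    mass = z1 * (p1 - 1.0)
    return z1, mass


# Synthetic adjacent peaks for apomyoglobin (M = 16,951 Da) at charges 12 and 11
p1, p2 = mz(16951, 12), mz(16951, 11)
z1, mass = charge_and_mass(p1, p2)
print(round(p1, 2), round(p2, 2), z1, round(mass, 1))
```

With measured spectra the computed charge is not an exact integer, which is why the sketch rounds it before calculating the mass.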
Peptide sequencing by tandem MS
The tandem mass spectrum of the doubly charged ion of the 14-residue human [Glu1]-fibrinopeptide B (m/z = 786)
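How a tandem mass spectrum maps to a sequence can be illustrated by computing the expected fragment ladder: b ions contain the N-terminal residues, y ions the C-terminal residues, and neighboring peaks in a ladder differ by one residue mass. This is a sketch under stated assumptions: the monoisotopic residue masses are standard values, but the peptide sequence EGVNDNEEGFFSAR is taken from general knowledge of [Glu1]-fibrinopeptide B and is not given on the slides, and the function names are hypothetical.

```python
# Monoisotopic residue masses in Da (amino acid minus water)
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def precursor_mz(seq, charge):
    """m/z of the intact peptide carrying `charge` protons."""
    neutral = sum(MONO[aa] for aa in seq) + WATER
    return (neutral + charge * PROTON) / charge


def b_ions(seq):
    """Singly charged b ions b1..b(n-1): N-terminal fragments."""
    total, ladder = PROTON, []
    for aa in seq[:-1]:
        total += MONO[aa]
        ladder.append(round(total, 3))
    return ladder


def y_ions(seq):
    """Singly charged y ions y1..y(n-1): C-terminal fragments."""
    total, ladder = WATER + PROTON, []
    for aa in reversed(seq[1:]):
        total += MONO[aa]
        ladder.append(round(total, 3))
    return ladder


glu_fib = "EGVNDNEEGFFSAR"  # assumed sequence, not stated on the slides
print(round(precursor_mz(glu_fib, 2), 2))  # doubly charged precursor, close to m/z 786
print(y_ions(glu_fib)[:3])                 # start of the y ladder, read from the C-terminus
print(round(MONO["K"] - MONO["Q"], 3))     # Gln/Lys mass difference
```

Because Leu and Ile have the same entry in the mass table, a ladder like this cannot distinguish them, and Gln versus Lys requires an instrument that resolves about 0.036 Da.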
Two-dimensional electrophoresis
o To resolve complex protein mixtures
o First dimension: isoelectric focusing (IEF), separation of proteins according to their pI in a stable pH gradient
o Second dimension: SDS-PAGE (separation according to molecular mass)
o Can resolve up to 5 000 protein spots, e.g. the E. coli or yeast proteome
Two-dimensional gel electrophoresis: isoelectric focusing (IEF) coupled with SDS-PAGE => Proteomics: R. Aebersold (ETH-Z), M. Mann (MPI, Munich)
2) Peptide Synthesis. Learning objectives: 1) Understand the steps used to synthesize peptides and proteins
5. Polypeptide synthesis
o Chemical synthesis of peptides from AAs: for structure-function relationships, incorporation of nonstandard AAs, pharmacology (e.g. insulin)
o First: homopolypeptides, e.g. polyglycine, polyserine
o 1953: oxytocin (nonapeptide), a biologically active neuropeptide that triggers contraction of the uterus
Chemical synthesis of (poly)peptides
o Principle: addition of a Boc-protected AA to the N-terminus of the growing chain, C->N growth
Merrifield: solid-phase synthesis
o Coupling of the growing chain (C->N) to a solid matrix (polystyrene)
o Automated peptide synthesizers
o Problem, yield: 98% yield per step, 100 AA -> 200 steps -> $0.98^{200} \times 100 \approx 2\%$ final yield!
o Nobel Prize 1984
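The yield problem is plain compounding: the overall yield is the per-step yield raised to the number of steps. The short sketch below reproduces the number on the slide and shows how sensitive the result is to the per-step yield; the function name is hypothetical, and the higher per-step yields are illustrative values, not data from the lecture.

```python
def overall_yield(step_yield, n_steps):
    """Fraction of correct product left after n_steps, each with the given yield."""
    return step_yield ** n_steps


# 100 residues, counted as 200 chemical steps (one coupling and one deprotection each)
for step_yield in (0.98, 0.99, 0.999):  # 0.99 and 0.999 are illustrative only
    percent = 100 * overall_yield(step_yield, 200)
    print(f"{step_yield:.3f} per step -> {percent:.1f} % overall")
```

At 98% per step the result is about 1.8%, which the slide rounds to 2%.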
Coupling chemistry of the first AA to the matrix: 1. Coupling 2. Deprotection
DCCD (dicyclohexylcarbodiimide) is used to activate and condense the AAs
Examples of protected side chains
Release of the peptide
Native chemical ligation
o Length limit of chemical synthesis: ~60 AA
o Native chemical ligation is used to produce longer peptides
Problems
3) Protein Structure Determination. Learning objectives: 1) Understand how X-ray crystallography and NMR are used to determine the three-dimensional structure of proteins
The tertiary structure of proteins
- The tertiary structure describes the folding of the secondary structure elements (α-helix, β-sheet, and turns) and specifies the spatial coordinates of every atom in the protein = 3D structure
- This information is deposited in dedicated structure databases (Protein Data Bank, PDB)
- Experimentally, the tertiary structure can be determined by two techniques: X-ray crystallography of protein crystals or NMR of proteins in solution
Protein crystals; X-ray diffraction pattern
A) Most protein structures are determined by X-ray crystallography or nuclear magnetic resonance
X-ray crystallography:
- The X-ray wavelength is short, ~1.5 Å (synchrotron), comparable to the distance between atoms (visible light: 4000 Å) (movie)
- Crystal: repetitive arrangement of the same structure => diffraction pattern (the darkness of a spot is a function of the electron density in the crystal)
- X-rays interact with electrons (not with nuclei) -> an X-ray structure is thus an electron density map of a given protein -> it represents the contours of the atoms
A thin section through a 1.5 Å resolution electron density map of a protein that is contoured in three dimensions (video)
Most protein crystal structures exhibit less than atomic resolution
A crystal is built up of repeating units containing the protein in its native conformation
- highly hydrated (40-60% water)
- soft, jellylike consistency, unlike NaCl crystals
- molecules are slightly disordered and display Brownian motion -> this determines the resolution limit of a given protein crystal (typically 1.5-3 Å)
- the inability to obtain crystals that diffract to sufficiently high resolution is a major limiting factor in structure determination
Electron density maps of diketopiperazine at different resolution levels
- The electron density map alone is not sufficient to determine the structure of the protein
- The amino acid sequence is also required
- Computerized fitting of atoms into the experimentally determined electron density map yields the protein structure, with a precision of up to 0.1 Å
Most crystallized proteins maintain their native conformations
Key question: does the structure of a protein in a crystal accurately reflect the structure of the protein in solution, where it normally functions?
1. The protein in the crystal is hydrated as it is in solution
2. The X-ray structure is similar to the NMR structure, which is determined from proteins in solution
3. Many enzymes remain catalytically active in the crystal
Protein structures can be determined by NMR
Nuclear magnetic resonance (NMR): an atomic nucleus resonates when a magnetic field is applied. This resonance is sensitive to the electronic environment of the nucleus and to its interactions with nearby nuclei
- Developed since 1980 by R. Ernst & Kurt Wüthrich (ETH-Z) to determine protein structures in solution
- A protein contains so many nuclei that their signals would crowd a conventional one-dimensional NMR spectrum -> two-dimensional (2D) NMR was developed to measure distances between chemically linked atoms (COSY) or spatially close atoms (NOESY)
- Size limit of about 40 kD, may reach 100 kD soon
- Dynamic: can follow protein motion or folding
D) Structure is more strongly conserved than sequence
- Structures can be grouped into families
- 50 000 structures define 1 400 families of protein domains, of which 200 are very common
- Comparison of cytochrome c
E) Structural bioinformatics provides tools for storing, visualizing, and comparing protein structural information
- Structural data obtained by X-ray crystallography or NMR, describing the spatial coordinates of the atoms, are deposited in a database, similar to sequence information for DNA or proteins
- Structural bioinformatics takes advantage of this information to address biological questions
- Major structural database: Protein Data Bank (PDB); each structure is assigned a unique identifier (PDBid), e.g. sperm whale myoglobin is 1MBO
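To make "spatial coordinates of the atoms" concrete, the sketch below reads ATOM records in the fixed-column layout of the legacy PDB text format and computes an interatomic distance. It is a minimal illustration, not a full parser: the helper names are hypothetical, the two atom records are made up rather than taken from 1MBO, and real files contain many other record types that are ignored here.

```python
import math


def make_atom_line(serial, name, res_name, chain, res_seq, x, y, z, element):
    """Build a made-up ATOM record with the fixed column layout (hypothetical helper)."""
    return (
        f"ATOM  {serial:>5} {name:<4} {res_name:>3} {chain}{res_seq:>4}{'':4}"
        f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}{'':10}{element:>2}"
    )


def parse_atom(line):
    """Extract the main fields of an ATOM record by column position."""
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
    }


# Two made-up atoms of one residue; coordinates in Å are illustrative only
n_atom = parse_atom(make_atom_line(1, " N  ", "VAL", "A", 1, -2.900, 17.600, 15.500, "N"))
ca_atom = parse_atom(make_atom_line(2, " CA ", "VAL", "A", 1, -3.600, 16.400, 15.300, "C"))
print(n_atom)
print(round(math.dist(n_atom["xyz"], ca_atom["xyz"]), 2), "Å between N and CA")
```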
Molecular graphics programs interactively show macromolecules in three dimensions
- Jmol is a Web browser-based application that allows you to directly visualize structures in the PDB on your PC
- Example: potassium channel (KcsA) http://www.pdb.org/pdb/explore/jmol.do?structureid=1f6g
- Swiss PDB Viewer = DeepView allows protein modeling and superimposition of two structures
Structure comparisons reveal evolutionary relationships
- Since evolution tends to conserve structure rather than sequence, programs have been developed to search for structurally related proteins
- CATH classifies proteins in a four-level hierarchy:
1. Class: mainly α / β structure
2. Architecture: gross arrangement of secondary structure
3. Topology: shape of protein domains and interconnectivity
4. Homologous superfamily: group with a common ancestor
- CE (Combinatorial Extension of the optimal path) finds all proteins in the PDB that can be structurally aligned with the query structure
- FSSP (Family of Structurally Similar Proteins)
- SCOP (Structural Classification of Proteins)
- VAST (Vector Alignment Search Tool)
4) Protein Folding with the Help of Chaperones
Learning objectives:
1) Understand the function of protein disulfide isomerase
2) Understand how proteins contribute to the correct folding of misfolded or denatured proteins, i.e. catalyze their correct folding
Protein Folding
- Protein folding is directed largely by residues that occupy the interior of the folded protein
- How does a protein fold into its three-dimensional structure?
- Not by sampling all possible conformations! This would take longer than the universe has existed: n residues -> 2n torsion angles, each with 3 stable conformations -> $3^{2n} \approx 10^{n}$ possible conformations; at $10^{-13}$ s per conformation -> $t = 10^{n}/10^{13}$ s => for n = 100 residues, $t = 10^{87}$ s, whereas 20 billion years = $6 \times 10^{17}$ s
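The search-time estimate can be checked numerically. The sketch below works with base-10 logarithms to avoid huge numbers and uses only the figures from the slide (2 torsion angles per residue, 3 conformations per angle, $10^{13}$ conformations sampled per second); the function name is hypothetical. Without the rounding $3^{2n} = 9^{n} \approx 10^{n}$ the exponent comes out somewhat smaller than 87, but the conclusion is unchanged.

```python
import math


def log10_search_time_s(n_residues, angles_per_residue=2, states_per_angle=3,
                        conformations_per_s=1e13):
    """log10 of the time in seconds needed to sample every conformation once."""
    log10_conformations = angles_per_residue * n_residues * math.log10(states_per_angle)
    return log10_conformations - math.log10(conformations_per_s)


exact = log10_search_time_s(100)       # uses 3^(2n) without rounding
rounded = 100 - 13                     # slide's approximation: 10^n / 10^13
age_of_universe = math.log10(6e17)     # 20 billion years in seconds, as on the slide

print(f"exact: 10^{exact:.1f} s, slide estimate: 10^{rounded} s, "
      f"universe: 10^{age_of_universe:.1f} s")
```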
A) Protein folding is ordered
- Folding happens within a few seconds
- Folding is not random but follows a defined pathway
- Local formation of the secondary structure elements
- followed by a hydrophobic collapse (molten globule)
- The folding process is cooperative (i.e. all or nothing)
Energy-entropy diagram of protein folding. Labels: folding funnel; temporary folding traps; local energy minima
Protein structure prediction and protein design
- The sequences of 1 million proteins are known, but the structure has been determined for only 50 000
- How is the structure encoded in the primary sequence? -> ab initio prediction of structure
- Homology modeling of a new sequence against an existing structure
- Structural genomics: determine X-ray structures for all the representative protein domains in a genome
- Chou & Fasman prediction does not take into account the influence of the neighboring residues
Protein structure prediction and protein design - Protein design, inverse of structural prediction - Design an amino acid sequence that will form the target structure or even target function - 28 residue peptide that forms ββα structure
Protein disulfide isomerase acts during protein folding
- Proteins fold more slowly in vitro (in the test tube) than in vivo (in the cell)
- This is frequently due to the formation of nonnative disulfide bridges, which are then slowly exchanged for the native ones
- In vivo, disulfide bond formation is catalyzed by an enzyme: protein disulfide isomerase (PDI)
- PDI binds a variety of unfolded proteins via a hydrophobic patch to form a mixed disulfide
Mechanism of protein disulfide isomerase. Takes place in the lumen of the endoplasmic reticulum (and not in the cytosol)!
Mechanism of protein disulfide isomerase
B) Molecular chaperones assist protein folding
- Proteins begin to fold as they are synthesized and grow on the ribosome
- In vivo, a peptide chain folds in the presence of a very high concentration of other proteins
- Molecular chaperones are essential proteins that help newly synthesized proteins to fold and partially unfolded proteins to refold correctly
- Many molecular chaperones were first described as heat shock proteins (Hsp); their expression is strongly induced upon heat treatment of cells
Chaperone activity requires ATP
Classes of molecular chaperones in prokaryotes and eukaryotes
1. Hsp70 family: function as monomers with the cochaperone Hsp40, fold newly made proteins
2. Chaperonins: large multisubunit proteins (see below)
3. Hsp90 proteins: folding of proteins involved in signal transduction, such as steroid receptors
4. Trigger factor: associates with the ribosome and prevents improper folding of newly made proteins
All of them operate by binding to solvent-exposed hydrophobic surfaces and subsequently releasing them
Most are ATPases
The GroEL/ES chaperonin forms closed chambers in which proteins fold
The chaperonin system in E. coli consists of two types of subunits, GroEL and GroES
Structure:
- 14 identical 549-residue GroEL subunits arranged in two stacked rings of seven subunits each
- The complex is capped at one end by a domelike heptameric ring of 97-residue GroES subunits
- Bullet-shaped complex with C7 symmetry
- Central chamber of ~45 Å in which peptides fold
X-ray structure of the GroEL-GroES-(ADP)$_7$ complex. Labels: GroES (cap), GroEL (cis), GroEL (trans)
Note the larger size of the cavity formed by the cis ring
ATP binding and hydrolysis drive the conformational changes in GroEL/ES
- Each GroEL subunit can bind and hydrolyze ATP, which induces a conformational change -> movement -> work
- All 7 subunits act in concert on the unfolded substrate protein
- Exposure (ATP) or hiding (ADP) of the hydrophobic patch domain allows the protein to refold within an isolated hydrophilic microenvironment
- Eukaryotic counterpart: TRiC
Reaction cycle of the GroEL/ES chaperonin
Some diseases are caused by protein misfolding
- At least 20, mostly fatal, human diseases are caused by extracellular deposits of normally soluble proteins
- Amyloids = insoluble fibrous protein aggregates
- Symptoms usually appear late (at 30-70 years of age), with progressive deterioration over 5-15 years until death