Data, Data Collection, Database
Contents: molecular structures, spectra, patent information, molecular properties, technical literature, references, supplier information, prices
Implementation: flat file, local database, WWW access
Definition of a Database
Database = management component + storage component for persistent data that serve a specific purpose.
Chemical Databases
[Figure: database architecture diagram with the labels Raw data, Filtering, Source file, Index file, Library file, Data 1, Data 2, Data 3, Application software, User interface, User.]
Databases
Database | No. of molecules | Description
ACD | > 250,000 | Available Chemicals Directory; catalogue of commercially available specialty and bulk chemicals from over 225 international suppliers
Beilstein | > 7,000,000 | Covers organic chemistry from 1779
CSD | > 200,000 | Cambridge Structural Database; experimentally determined three-dimensional structures of small molecules
CMC | > 7,000 | Comprehensive Medicinal Chemistry database; structures and activities of drugs having generic names (on the market)
MDDR | > 85,000 | MACCS-II Drug Data Report; structures and activity data of compounds in the early stages of drug development
MedChem | > 35,000 | Medicinal Chemistry database; pharmaceutical compounds
SPRESI | > 3,400,000 | Substances and bibliographic data abstracted from the world's chemical literature
WDI | > 50,000 | World Drug Index; pharmaceutical compounds from all stages of development
Data Formats
Molecules: MDL SDF, SMILES, Molfile, Mol2, PDB
Reactions: RXN, RDF, SMIRKS, Reaction SMILES
Further formats: Alchemy, Cactvs/Ascii, Cactvs/Scan, Cactvs/Binary, CAR, Cerius II, Charmm, CIF, CML, Compass, Cosmo, CTX, FIG, Gaussian Archives, Gaussian Cube, Gaussian Input, GIF, Hitlist, Hyperchem, Index, JCAMP, JME, M3D, Molconn-Z, MDL Molfile, MDL SDF, Molgen, Mopac, NETCDF, PDB, RDF, RXN, SCF, Sharc, Shel-X, SLN, SMD 4, SMD 5, SMILES, STF, Sybyl, Sybyl II, Vamp, VRML, XBSA, Xtelplot, XYZ, XYZR
Molecule Encoding: HOSE Code (1)
HOSE = Hierarchically Ordered Spherical description of Environment
Belongs to the fragment codes.
Concept: a molecular center and the 1st to 4th spheres around that center.
Molecule Encoding: HOSE Code (2)
[Figure: example structure with two Cl substituents; the center and the 1st, 2nd, and 3rd spheres are marked.]
Molecule Encoding: HOSE Code (3)
Symbol | Meaning
R | ring
% | triple bond
= | double bond
* | aromatic bond
C | C
N | N
S | S
X | Cl
Y | Br
& | ring closure
, | separator
(//) | sphere generator
Molecule Encoding: HOSE Code (4)
[Figure: example structure with two Cl substituents and its HOSE code, labelled Center, 1st sphere, and 2nd sphere.]
*C*CC( *C,*C,=C/
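The sphere concept behind the HOSE code can be sketched as a breadth-first walk that groups atoms by their bond distance from the chosen center. This is only an illustration of the sphere assignment, not a HOSE code generator: it ignores symbol ranking, bond symbols, and ring closures. The molecule (chlorobenzene, heavy atoms only) and the function name `spheres` are assumptions made for the example.

```python
from collections import deque

def spheres(adjacency, center, max_sphere=4):
    """Group atoms by bond distance (1..max_sphere) from the center atom."""
    distance = {center: 0}
    queue = deque([center])
    while queue:
        atom = queue.popleft()
        if distance[atom] == max_sphere:
            continue  # atoms beyond the last sphere are not visited
        for neighbor in adjacency[atom]:
            if neighbor not in distance:
                distance[neighbor] = distance[atom] + 1
                queue.append(neighbor)
    result = [[] for _ in range(max_sphere)]
    for atom, d in distance.items():
        if d > 0:
            result[d - 1].append(atom)
    return [sorted(sphere) for sphere in result]

# Hypothetical example: chlorobenzene, ring carbons 0..5, chlorine 6 on carbon 0.
chlorobenzene = {
    0: [1, 5, 6], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 0], 6: [0],
}

print(spheres(chlorobenzene, center=0))  # → [[1, 5, 6], [2, 4], [3], []]
```

A real HOSE code would additionally order the atoms within each sphere by symbol priority and write them with the bond and delimiter symbols of the HOSE alphabet.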
Shortest Paths: Flood Fill
[Figure: molecular graph containing N and S atoms, with two marked atoms A and B.]
d_AB = 8 bonds
Other options: many! E.g. Dijkstra's algorithm.
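A flood fill on a molecular graph amounts to a breadth-first search: starting at atom A, the "wave front" advances one bond per step until it reaches atom B. The sketch below assumes an unweighted graph stored as an adjacency dictionary; the graph (a six-membered ring with a two-atom side chain) and the name `shortest_path` are made up for illustration.

```python
from collections import deque

def shortest_path(adjacency, start, goal):
    """Flood-fill from start; return the atoms along one shortest path."""
    previous = {start: None}  # also serves as the "already flooded" marker
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        if atom == goal:
            break
        for neighbor in adjacency[atom]:
            if neighbor not in previous:
                previous[neighbor] = atom
                queue.append(neighbor)
    if goal not in previous:
        return None  # start and goal are not connected
    path = []
    atom = goal
    while atom is not None:  # walk back from goal to start
        path.append(atom)
        atom = previous[atom]
    return path[::-1]

# Hypothetical graph: ring atoms 0..5, side chain 0-6-7.
graph = {
    0: [1, 5, 6], 1: [0, 2], 2: [1, 3], 3: [2, 4],
    4: [3, 5], 5: [4, 0], 6: [0, 7], 7: [6],
}

path = shortest_path(graph, 7, 3)
print(path, "distance =", len(path) - 1, "bonds")
```

Because every bond counts as one step, breadth-first search is sufficient here; Dijkstra's algorithm only becomes necessary when the edges carry different weights.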
Adjacency Matrix & Connection Table
[Figure: phosgene with its atoms numbered 1 to 4, together with its adjacency matrix A, in which the C=O double bond appears as the entry 2.]
Canonical atom numbering, e.g. by means of
1. Morgan's algorithm
2. Jochum-Gasteiger algorithm
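As a rough sketch, the adjacency matrix of phosgene can be built from a connection table (a bond list), and the first stage of Morgan's algorithm, the extended connectivity values, can be computed from it. The atom numbering (C, O, Cl, Cl as indices 0 to 3) is an assumption, and the function names are hypothetical. The full Morgan algorithm goes on to assign the actual canonical numbers, breaking ties by atom and bond type; that part is not shown.

```python
# Assumed numbering for phosgene: 0 = C, 1 = O, 2 = Cl, 3 = Cl.
atoms = ["C", "O", "Cl", "Cl"]
bonds = [(0, 1, 2), (0, 2, 1), (0, 3, 1)]  # connection table: (atom i, atom j, bond order)

def adjacency_matrix(n_atoms, bonds):
    """Symmetric matrix whose entries are bond orders (0 = not bonded)."""
    matrix = [[0] * n_atoms for _ in range(n_atoms)]
    for i, j, order in bonds:
        matrix[i][j] = matrix[j][i] = order
    return matrix

def extended_connectivity(matrix):
    """Morgan's extended connectivity (EC) values.

    Start with the number of neighbors of each atom, then repeatedly replace
    each value by the sum over the neighbors' values. Stop as soon as the
    number of distinct values no longer increases and keep the last good set.
    """
    n = len(matrix)
    neighbors = [[j for j in range(n) if matrix[i][j]] for i in range(n)]
    ec = [len(nb) for nb in neighbors]
    while True:
        new_ec = [sum(ec[j] for j in neighbors[i]) for i in range(n)]
        if len(set(new_ec)) <= len(set(ec)):
            return ec
        ec = new_ec

A = adjacency_matrix(len(atoms), bonds)
for row in A:
    print(row)
print(extended_connectivity(A))  # → [3, 1, 1, 1]
```

For phosgene the carbon receives the highest EC value and would be numbered first; the oxygen and the two chlorines tie and have to be distinguished by the tie-breaking rules.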
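Distance Matrix & Center of the Molecule
Distance matrix ($A$ = number of atoms; $d_{i,j}$ = number of bonds on the shortest path between atoms $i$ and $j$):
\[ D = \begin{pmatrix} d_{1,1} & d_{1,2} & \cdots & d_{1,A} \\ d_{2,1} & d_{2,2} & \cdots & d_{2,A} \\ \vdots & \vdots & \ddots & \vdots \\ d_{A,1} & d_{A,2} & \cdots & d_{A,A} \end{pmatrix} \]
Atom eccentricity:
\[ \eta_i = \max_j \left( d_{i,j} \right) \]
Vertex distance degree:
\[ \sigma_i = \sum_{j=1}^{A} d_{i,j} \]
Center of the graph: 1) minimal atom eccentricity, 2) minimal vertex distance degree
Caution! The geometric center of the molecule is a different quantity:
\[ \bar{x} = \frac{1}{A} \sum_{i=1}^{A} x_i \]
and analogously for $y$ and $z$.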
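The two criteria for the graph center can be sketched as follows: compute the topological distance matrix, take the atoms of minimal eccentricity, and break ties by the minimal vertex distance degree. The sketch uses the Floyd-Warshall algorithm for the distance matrix, which is one of several possible choices, and assumes a connected graph. The example molecule (the carbon skeleton of 2-methylbutane) and the function names are assumptions.

```python
def distance_matrix(n_atoms, bonds):
    """All-pairs topological distances (in bonds) via Floyd-Warshall."""
    inf = float("inf")
    d = [[0 if i == j else inf for j in range(n_atoms)] for i in range(n_atoms)]
    for i, j in bonds:
        d[i][j] = d[j][i] = 1
    for k in range(n_atoms):
        for i in range(n_atoms):
            for j in range(n_atoms):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

def graph_center(d):
    """Atoms with minimal eccentricity; ties broken by minimal distance degree."""
    eccentricity = [max(row) for row in d]      # eta_i
    distance_degree = [sum(row) for row in d]   # sigma_i
    min_ecc = min(eccentricity)
    candidates = [i for i, e in enumerate(eccentricity) if e == min_ecc]
    min_deg = min(distance_degree[i] for i in candidates)
    return [i for i in candidates if distance_degree[i] == min_deg]

# Hypothetical example: carbon skeleton of 2-methylbutane, 0-1-2-3 with 4 on atom 1.
bonds = [(0, 1), (1, 2), (2, 3), (1, 4)]
D = distance_matrix(5, bonds)
print(graph_center(D))  # → [1]
```

Here atoms 1 and 2 share the minimal eccentricity of 2, and the vertex distance degree (5 versus 6) singles out atom 1.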
SMILES (Simplified Molecular Input Line Entry System)
http://www.daylight.com
http://www.daylight.com/smiles/
Homework: SMILES tutorial
SSMILES (Simple SMILES)
Rules:
Atoms are represented by atomic symbols: B, C, N, O, F, P, S, Cl, Br, and I.
Double bonds are =, triple bonds are #.
Branching is indicated by parentheses.
Ring closures are indicated by pairs of matching digits.
Examples: C, CC, C=O, C#N, CC(C)(C)C, CCCCCC, N1=CC=CC=C1, S2C=CC=C2, CC(=O)O
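The four rules translate directly into a small tokenizer. The following is a minimal sketch that only splits a simple SMILES string into atom, bond, branch, and ring-closure tokens and rejects anything else; it does not check valences or whether ring-closure digits are paired. The names `tokenize` and `heavy_atom_count` are hypothetical.

```python
import re

# Two-letter symbols must come first so that "Cl" is not read as "C" + "l".
TOKEN = re.compile(r"Cl|Br|[BCNOFPSI]|[=#()]|\d")
ATOMS = {"B", "C", "N", "O", "F", "P", "S", "Cl", "Br", "I"}

def tokenize(smiles):
    """Split a simple SMILES string into its tokens."""
    tokens, position = [], 0
    while position < len(smiles):
        match = TOKEN.match(smiles, position)
        if match is None:
            raise ValueError(f"unexpected character {smiles[position]!r} at position {position}")
        tokens.append(match.group())
        position = match.end()
    return tokens

def heavy_atom_count(smiles):
    """Number of atom tokens (hydrogens are implicit in simple SMILES)."""
    return sum(token in ATOMS for token in tokenize(smiles))

print(tokenize("CC(=O)O"))            # → ['C', 'C', '(', '=', 'O', ')', 'O']
print(heavy_atom_count("S2C=CC=C2"))  # → 5
```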
SMARTS
Extended SMILES for search definitions
General: [ ] , ; : & !
Atom primitives: * D X R r A a H h v #
Bond primitives: ~ - = # @ :
Examples: [N;D][#6;!R]
Reaction SMILES
CC(=O)O.OCC>[H+].[Cl-].OCC>CC(=O)OCC
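A reaction SMILES has the form reactants>agents>products, with the individual molecules of each part separated by dots. A minimal sketch of splitting such a string is shown below, using the esterification of acetic acid with ethanol as the example; the function name `split_reaction` is hypothetical.

```python
def split_reaction(reaction_smiles):
    """Return (reactants, agents, products) as lists of SMILES strings."""
    parts = reaction_smiles.split(">")
    if len(parts) != 3:
        raise ValueError("expected exactly two '>' separators")
    # An empty part (as in "A>>B") yields an empty list.
    return tuple(part.split(".") if part else [] for part in parts)

reactants, agents, products = split_reaction("CC(=O)O.OCC>[H+].[Cl-].OCC>CC(=O)OCC")
print(reactants)  # → ['CC(=O)O', 'OCC']
print(agents)     # → ['[H+]', '[Cl-]', 'OCC']
print(products)   # → ['CC(=O)OCC']
```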
Atom Mapping: SMARTS
[CH2:1]=[CH:2][CH:3]=[CH:4][CH2:5][H:6]>>[H:6][CH2:1][CH:2]=[CH:3][CH:4]=[CH2:5]
Atom Mapping: SMARTS
[*:1][N:2](=[O:3])=[O:4]>>[*:1][N+:2](=[O:3])[O-:4]
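The map numbers after the colon tie each atom on the left-hand side to its counterpart on the right-hand side. As a rough sketch, the mapped atoms can be pulled out with a regular expression and compared side by side to see which atoms change. The sketch only handles atoms written as `[expression:number]` without nested brackets, and it does not look at bond changes; the function names are hypothetical. The example is the nitro group transformation, pentavalent N to the charge-separated form.

```python
import re

# Matches "[expr:n]" where expr contains no brackets or colons.
MAPPED_ATOM = re.compile(r"\[([^\[\]:]+):(\d+)\]")

def atom_maps(side):
    """Map number -> atom expression for one side of the transformation."""
    return {int(number): expr for expr, number in MAPPED_ATOM.findall(side)}

def changed_atoms(transform):
    """Atoms whose expression differs between the left and right side."""
    left, right = transform.split(">>")
    before, after = atom_maps(left), atom_maps(right)
    if before.keys() != after.keys():
        raise ValueError("map numbers differ between the two sides")
    return {n: (before[n], after[n]) for n in sorted(before) if before[n] != after[n]}

nitro = "[*:1][N:2](=[O:3])=[O:4]>>[*:1][N+:2](=[O:3])[O-:4]"
print(changed_atoms(nitro))  # → {2: ('N', 'N+'), 4: ('O', 'O-')}
```

A transformation in which only bonds are rearranged, such as a hydrogen shift along a diene, returns an empty dictionary here, because every mapped atom keeps its expression.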
Markush Structures
Eugene A. Markush, 1923 (US patent)
[Figure: generic structure bearing Cl and the variable substituents R1 and R2.]
R1 = phenyl, naphthyl
R2 = [Cl, Br, I]
Patents! Descriptions of sets! Combinatorial chemistry!
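A Markush structure describes a set of compounds: every combination of the allowed substituents is one member. A minimal sketch of enumerating that set with the substituent lists from the slide (two choices for R1, three for R2) follows. The scaffold itself is not modelled; the function name `enumerate_markush` is hypothetical.

```python
from itertools import product

substituents = {
    "R1": ["phenyl", "naphthyl"],
    "R2": ["Cl", "Br", "I"],
}

def enumerate_markush(substituents):
    """One dictionary per concrete compound covered by the generic structure."""
    names = list(substituents)
    choices = product(*(substituents[name] for name in names))
    return [dict(zip(names, choice)) for choice in choices]

members = enumerate_markush(substituents)
print(len(members))  # → 6
for member in members:
    print(member)
```

The size of the set is the product of the list lengths, which is why combinatorial libraries grow so quickly with each additional variable position.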
Markush Structures: CombiChem
[Figure: combinatorial scaffolds with variable substituents (R1 to R4, Ar): imidazole, diketopiperazine, pyrrolidine, isoquinolinone, 1,4-dihydropyridine.]
The Structure-Data Format (SDF)
Homework: SDF tutorial
Example: L-alanine
http://www.mdli.com/downloads/literature/ctfile.pdf
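To give a feel for the layout of an SD file, the sketch below reads the fixed-column V2000 connection table (three header lines, counts line, atom block, bond block) and the data items that follow `M  END`, with records terminated by `$$$$`. It is a simplified reader under stated assumptions: V2000 only, single-line data values, no error handling. The example record is not the L-alanine file from the slide but a hypothetical two-atom record (methanol, heavy atoms only) with placeholder coordinates; `parse_sdf` is a made-up name.

```python
def parse_sdf(text):
    """Parse V2000 SD records into dictionaries (simplified)."""
    molecules = []
    for record in text.split("$$$$\n"):
        if not record.strip():
            continue
        lines = record.split("\n")
        # Counts line: columns 1-3 = number of atoms, columns 4-6 = number of bonds.
        n_atoms, n_bonds = int(lines[3][0:3]), int(lines[3][3:6])
        # Atom lines: x, y, z in 10 columns each, a space, then the element symbol.
        atoms = [lines[4 + i][31:34].strip() for i in range(n_atoms)]
        bonds = []
        for line in lines[4 + n_atoms : 4 + n_atoms + n_bonds]:
            # Bond lines: first atom, second atom, bond type (3 columns each).
            bonds.append((int(line[0:3]), int(line[3:6]), int(line[6:9])))
        data = {}
        for i, line in enumerate(lines):
            if line.startswith(">"):  # data header such as "> <NAME>"
                start = line.index("<") + 1
                data[line[start : line.index(">", start)]] = lines[i + 1]
        molecules.append({"name": lines[0], "atoms": atoms, "bonds": bonds, "data": data})
    return molecules

# Hypothetical record; the coordinates are placeholders, not measured values.
example = "\n".join([
    "methanol",
    "  hypothetical example",
    "",
    "  2  1  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.4000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0",
    "M  END",
    "> <NAME>",
    "methanol",
    "",
    "$$$$",
    "",
])

print(parse_sdf(example))
```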
3D Features
Indexing of Structures: Daylight Fingerprints
The fingerprinting algorithm examines the molecule and generates the following:
- a pattern for each atom
- a pattern representing each atom and its nearest neighbors (plus the bonds that join them)
- a pattern representing each group of atoms and bonds connected by paths up to 2 bonds long
- ... atoms and bonds connected by paths up to 3 bonds long
- ... continuing, with paths up to 4, 5, 6, and 7 bonds long.
Structure: OC=CN
Fingerprint:
0-bond paths: C, O, N
1-bond paths: OC, C=C, CN
2-bond paths: OC=C, C=CN
3-bond paths: OC=CN
... OR exhaustive!
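The path enumeration can be sketched as a depth-first walk from every atom that records each linear path of up to 7 bonds, followed by hashing every pattern to a bit position. This is an illustration of the idea only and not Daylight's actual implementation: the hash function (CRC32), the fingerprint width, and the choice of writing each path in the lexicographically smaller direction (so that "OC" and "CO" count as one pattern, stored as "CO") are all assumptions.

```python
import zlib

def path_patterns(atoms, bonds, max_bonds=7):
    """All linear paths of 0..max_bonds bonds, written as SMILES-like strings."""
    neighbors = {i: [] for i in range(len(atoms))}
    for (i, j), symbol in bonds.items():
        neighbors[i].append((j, symbol))
        neighbors[j].append((i, symbol))

    patterns = set()

    def extend(path, tokens):
        # A path and its reverse are the same pattern; keep one spelling.
        forward, backward = "".join(tokens), "".join(reversed(tokens))
        patterns.add(min(forward, backward))
        if len(path) - 1 == max_bonds:
            return
        for neighbor, symbol in neighbors[path[-1]]:
            if neighbor not in path:  # no atom is visited twice within a path
                extend(path + [neighbor], tokens + [symbol, atoms[neighbor]])

    for start in range(len(atoms)):
        extend([start], [atoms[start]])
    return patterns

def fingerprint(patterns, n_bits=64):
    """Set one bit per pattern; different patterns may collide on a bit."""
    bits = 0
    for pattern in patterns:
        bits |= 1 << (zlib.crc32(pattern.encode()) % n_bits)
    return bits

# OC=CN: single bonds are written as the empty string, the double bond as "=".
atoms = ["O", "C", "C", "N"]
bonds = {(0, 1): "", (1, 2): "=", (2, 3): ""}

patterns = path_patterns(atoms, bonds)
print(sorted(patterns, key=len))
print(format(fingerprint(patterns), "064b"))
```

For OC=CN this yields nine patterns (three atoms, three 1-bond paths, two 2-bond paths, one 3-bond path), matching the counts in the example on the slide up to the direction in which each path is written.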