1 Foundations of Knowledge Management: Broad Knowledge Bases. Markus Strohmaier, Univ. Ass. / Assistant Professor, Knowledge Management Institute, Graz University of Technology, Austria. web:

2 Review. Homonyms: ambiguous terms (e.g. Bank: financial institution or bench). Homophones: terms that sound alike (e.g. Mohr, Moor). Homographs: identical spellings (e.g. Wach(-)s(-)tube, readable as Wachs-Tube "wax tube" or Wach-Stube "guard room"). Synonyms: several designations stand for the same concept (Auto, PKW). Antonyms: opposites (e.g. hart - weich, "hard - soft"). Hypernyms/hyponyms: more abstract / more specific concepts (e.g. Fahrzeug / PKW, "vehicle / car"). Formal concept systems often aim to leave little room for interpretation! Homonym qualifiers (e.g. Ring <Schmuckstück> "jewelry", Ring <Mathematik> "mathematics"). The correct mapping between concepts and terms can often only be interpreted from context! [Figure: semiotic triangle linking the concept (Begriff: concept, knowledge), the object (real world), and the term (word, expression, symbol: language).]

3 Review: knowledge organization systems. Index (keywords); list, catalog, lexicon; taxonomy (hierarchy, "belongs to"); classification; thesaurus (hierarchy, equivalence, association); ontology (concepts, properties, relationships, rules).

4 Overview. Knowledge Organization (last lecture); Broad Knowledge Bases: Ontologies, WordNet, ConceptNet, and more; Knowledge Acquisition (next lecture); Systems Perspective. Based in part on slides prepared by D. Reisinger.

5 Reading the Web. NELL: Never Ending Language Learning

6 Conceptual Graph and Semantic Network. An ordered collection of concepts and terms whose interconnections are defined through arbitrary relationships. Graphical concept networks with defined semantics: both concepts and relationships are typed, and a grammar governs their use. To turn information into applicable knowledge, generic "related-to" relations are no longer sufficient -> the step from thesaurus to semantic network. Introduced by linguists to represent the meaning of words according to their usage.
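To make "typed concepts, typed relationships, and a grammar for their use" concrete, here is a minimal sketch, assuming a hypothetical grammar (`GRAMMAR`) that fixes which concept types each relation type may connect. Unlike a thesaurus-style "related-to" link, an edge that violates the grammar is rejected.

```python
from dataclasses import dataclass

# Hypothetical grammar: each relation type may only link these concept types
GRAMMAR = {
    "drives": ("Person", "Vehicle"),
    "part_of": ("Component", "Vehicle"),
}


@dataclass(frozen=True)
class Concept:
    name: str
    ctype: str  # the concept's type, e.g. "Person"


class SemanticNet:
    def __init__(self) -> None:
        self.edges: list[tuple[Concept, str, Concept]] = []

    def add(self, src: Concept, rel: str, dst: Concept) -> None:
        """Add a typed edge, rejecting it if it violates the grammar."""
        if rel not in GRAMMAR:
            raise ValueError(f"unknown relation type: {rel}")
        expected = GRAMMAR[rel]
        if (src.ctype, dst.ctype) != expected:
            raise ValueError(f"{rel} expects {expected}, got {(src.ctype, dst.ctype)}")
        self.edges.append((src, rel, dst))

    def related(self, node: Concept, rel: str) -> list[str]:
        """Names of all concepts reached from `node` via relation `rel`."""
        return [d.name for s, r, d in self.edges if s == node and r == rel]


anna = Concept("Anna", "Person")
car = Concept("car", "Vehicle")
engine = Concept("engine", "Component")

net = SemanticNet()
net.add(anna, "drives", car)
net.add(engine, "part_of", car)
print(net.related(anna, "drives"))  # ['car']
try:
    net.add(car, "drives", anna)  # a Vehicle cannot drive a Person
except ValueError as err:
    print("rejected:", err)
```

The type check is what distinguishes this from a plain graph: the network only admits statements its grammar can interpret.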

7 Ontology: A Definition. "An ontology is a formal, explicit specification of a shared conceptualization of a domain of interest. ... For AI systems, what exists is that which can be represented." (Gruber) An ontology is a formal description of concepts and relationships: an abstract, simplified view of the world. Explicit: written down, defined. Formal: formalized structure, therefore machine-readable. Shared: agreed upon by a community. Domain of interest: field of knowledge. Conceptualization: establishing the concepts and their terminology.

8 Components of Ontologies: classes (concepts); relations between classes; properties of classes; instances; rules; constraints.
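The six components can be seen working together in a toy example. This is a minimal sketch, assuming a hypothetical schema (classes `Thing`, `Person`, `Parent`; properties `height_m`, `has_child`) invented for illustration: the class hierarchy provides relations between classes, properties carry domain and type constraints, and one rule derives new class memberships for instances.

```python
from typing import Optional

# Classes with their superclass (None for the root); hypothetical toy schema
CLASSES: dict[str, Optional[str]] = {"Thing": None, "Person": "Thing", "Parent": "Person"}

# Properties of classes: property -> (domain class, expected value type)
PROPERTIES = {"height_m": ("Person", float), "has_child": ("Person", str)}

# Instances (individual -> class) and facts about them
instances = {"Anna": "Person", "Ben": "Person"}
facts = [("Anna", "height_m", 1.68), ("Anna", "has_child", "Ben")]


def is_a(cls: Optional[str], target: str) -> bool:
    """Relation between classes: walk up the superclass chain."""
    while cls is not None:
        if cls == target:
            return True
        cls = CLASSES[cls]
    return False


def check_constraints(facts) -> list[str]:
    """Constraints: the subject must belong to the property's domain,
    and the value must have the expected type."""
    errors = []
    for subj, prop, value in facts:
        domain, vtype = PROPERTIES[prop]
        if not is_a(instances[subj], domain):
            errors.append(f"{subj} is not a {domain}")
        if not isinstance(value, vtype):
            errors.append(f"{prop} of {subj} must be {vtype.__name__}")
    return errors


def apply_rules(facts) -> None:
    """Rule: anyone who has a child is classified as a Parent (updates `instances`)."""
    for subj, prop, _ in facts:
        if prop == "has_child":
            instances[subj] = "Parent"


print(check_constraints(facts))  # []
apply_rules(facts)
print(instances["Anna"], is_a(instances["Anna"], "Person"))  # Parent True
print(check_constraints([("Ben", "height_m", "tall")]))  # ['height_m of Ben must be float']
```

Real ontology languages express constraints and rules declaratively rather than as Python functions; the point here is only how the pieces relate.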

9 Terminology. Ontology engineering: development, use and maintenance of ontologies. Meta-ontology: an ontology that underlies another ontology, i.e. an abstracted description of ontologies, which thereby enables linking the knowledge of different domains. Open-world assumption: ontologies should potentially be usable by, or importable into, other ontologies. Ontology mapping: mapping ontologies onto one another. Ontology merge: consolidating, i.e. combining, ontologies.
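Mapping and merging are easiest to tell apart side by side. The sketch below assumes two hypothetical class hierarchies (one English, one German) and a hand-written mapping between them. The mapping only states correspondences; the merge then uses it to build one consolidated hierarchy and refuses to merge contradictory superclasses.

```python
from typing import Optional

# Two hypothetical ontologies: class -> superclass
onto_a = {"Vehicle": None, "Car": "Vehicle"}
onto_b = {"Fahrzeug": None, "PKW": "Fahrzeug", "LKW": "Fahrzeug"}

# Ontology mapping: correspondences from B's classes to A's (hand-written here)
mapping = {"Fahrzeug": "Vehicle", "PKW": "Car"}


def merge(a: dict, b: dict, mapping: dict) -> dict[str, Optional[str]]:
    """Ontology merge: translate B's classes via the mapping, then union with A."""
    merged = dict(a)
    for cls, parent in b.items():
        cls = mapping.get(cls, cls)
        parent = mapping.get(parent, parent)  # None stays None
        if cls in merged and merged[cls] != parent:
            raise ValueError(f"conflicting superclass for {cls}")
        merged[cls] = parent
    return merged


print(merge(onto_a, onto_b, mapping))
# {'Vehicle': None, 'Car': 'Vehicle', 'LKW': 'Vehicle'}
```

Unmapped classes such as `LKW` are carried over unchanged, but their superclass is translated, so they end up attached to A's hierarchy.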

10 Benefits. To share a common understanding of the structure of information among people and software agents; to enable reuse of domain knowledge; to make domain assumptions explicit; to separate domain knowledge from operational knowledge; to analyze domain knowledge (Noy, McGuinness). Achieving interoperability in heterogeneous landscapes; improving the quality of information and interaction; saving time and reducing costs.

11 Application Areas. A supporting technology for knowledge management; knowledge engineering and representation; information retrieval, extraction and visualization; information modeling and integration; artificial intelligence, decision support; enterprise application integration (EAI), open systems, and more.

12 Application: Semantic Web. Why? Problems with conventional information retrieval on the Web: high recall, low precision (Google!); search results depend heavily on the vocabulary used in the query; results are individual web pages; search results are not accessible to other software tools. Improvements in search technology cannot solve these problems! The meaning of web content is not accessible to search engines! Tim Berners-Lee explaining some ideas related to the Semantic Web on video:
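"High recall, low precision" means that the engine returns most of the relevant pages, but buries them among many irrelevant ones. The sketch below uses the standard definitions of both measures, with a hypothetical query result.

```python
def precision_recall(retrieved, relevant):
    """Precision: share of retrieved pages that are relevant.
    Recall: share of relevant pages that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: the engine returns 10 pages, only 2 of which are relevant,
# and those 2 are all the relevant pages that exist.
retrieved = [f"page{i}" for i in range(10)]
relevant = ["page0", "page3"]
print(precision_recall(retrieved, relevant))  # (0.2, 1.0)
```

Recall is perfect, but four out of five results are noise. This is the situation the slide describes.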

13 Possible Solutions. Approach 1: The representation stays as it is, but we develop improved methods (artificial intelligence and computational linguistics) that allow programs to understand the meaning of the content. Approach 2, the Semantic Web: the content of the Web is represented in a form that makes it easier for software tools to process it; other intelligent techniques exploit this to enable new kinds of applications. The Semantic Web is the totality of resources accessible via the Internet that have a semantic structure and are described by metadata.

14 Semantic Web: Definition. "The Semantic Web is an extension of the current web in which information is given a well-defined meaning, better enabling computers and people to work in cooperation." (Berners-Lee, Hendler, Lassila) "The Semantic Web is a vision: the idea of having data on the web defined and linked in a way that it can be used by machines not just for display purposes, but for automation, integration and reuse of data across various applications." (W3C) -> Bringing formal languages closer to natural language.

15 RDF / RDF Schema. [Figure: subject -> predicate -> object] Goal: say anything about anything. Requirement: definition of arbitrary classes and properties, and their reuse. RDF = Resource Description Framework. The RDF model is a formally grounded graph model (a directed graph). Three elements together form a triple: subject (node), predicate (edge), object (node). Subject: the resource about which a statement is made. Predicate: the type of relationship between subject and object. Object: the value of the relationship. Vocabularies can be referenced from other RDF graphs (via URIs).
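An RDF graph is, at its core, a set of subject-predicate-object triples that is queried by pattern matching. This plain-Python sketch assumes the hypothetical namespace `http://example.org/` for everything except `rdf:type`, whose URI is the standard one. The data restates facts from the WordNet and ConceptNet slides. A library such as rdflib offers the same kind of wildcard lookup via `Graph.triples((s, p, o))` with `None` as the wildcard.

```python
EX = "http://example.org/"  # hypothetical namespace for this sketch
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# A graph is a set of (subject, predicate, object) triples.
# Subjects and predicates are URIs; objects are URIs or literal values.
graph = {
    (EX + "WordNet", RDF_TYPE, EX + "KnowledgeBase"),
    (EX + "WordNet", EX + "developedAt", EX + "PrincetonUniversity"),
    (EX + "WordNet", EX + "startedIn", "1985"),
    (EX + "ConceptNet", RDF_TYPE, EX + "KnowledgeBase"),
}


def match(graph, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
    )


# "Which resources are knowledge bases?"
for subj, _, _ in match(graph, p=RDF_TYPE, o=EX + "KnowledgeBase"):
    print(subj)
```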

16 A Simplified Napoleon Ontology. On a website: "Napoleon is 1.50 m tall and made a contribution to ancient history." [Figure: class level with Thing, its subclass (rdf:subclass) Person, Science (Wissensch.), and the properties (rdf:property) name, date of birth, height, has address, contributes to; instance level with Napoleon (rdf:type Person), his name and height, and a "contributes to" link to Ancient History (rdf:type Science).]
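The value of separating class level and instance level is that a reasoner can derive facts nobody wrote down. This is a minimal sketch, assuming our reading of the slide's truncated "Wissensch." as a class `Science` and hypothetical property names `height` and `contributesTo`. It applies two standard RDFS entailment rules (rdfs9 and rdfs11) until a fixed point is reached. Note that the standard vocabulary term is `rdfs:subClassOf`, which the slide abbreviates as `rdf:subclass`.

```python
# Class-level and instance-level triples with prefixed names
triples = {
    ("Person", "rdfs:subClassOf", "Thing"),
    ("Science", "rdfs:subClassOf", "Thing"),
    ("Napoleon", "rdf:type", "Person"),
    ("Napoleon", "height", "1.50"),
    ("Napoleon", "contributesTo", "AncientHistory"),
    ("AncientHistory", "rdf:type", "Science"),
}


def rdfs_closure(triples):
    """Repeatedly apply two RDFS entailment rules until nothing new is derived:
    rdfs11: A subClassOf B, B subClassOf C  =>  A subClassOf C
    rdfs9:  x type A, A subClassOf B        =>  x type B"""
    closure = set(triples)
    while True:
        sub = {(s, o) for s, p, o in closure if p == "rdfs:subClassOf"}
        new = {(a, "rdfs:subClassOf", c) for a, b in sub for b2, c in sub if b == b2}
        new |= {(x, "rdf:type", b) for x, p, a in closure if p == "rdf:type"
                for a2, b in sub if a == a2}
        if new <= closure:
            return closure
        closure |= new


def types_of(closure, x):
    return sorted(o for s, p, o in closure if s == x and p == "rdf:type")


closure = rdfs_closure(triples)
print(types_of(closure, "Napoleon"))        # ['Person', 'Thing']
print(types_of(closure, "AncientHistory"))  # ['Science', 'Thing']
```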

17 Ontology Editor Example: Protégé

18 Freebase: movie demo

19 Two current research efforts focusing on the construction of broad knowledge bases: WordNet and ConceptNet.

20 WordNet. WordNet is a project (started in 1985) at the Cognitive Science Laboratory of Princeton University. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations. The resulting network of meaningfully related words and concepts can be navigated with the browser. One purpose of the dataset is to support natural language processing. RDF models of WordNet are available at:

21 WordNet Glossary [Excerpt]. sense: A meaning of a word in WordNet. Each sense of a word is in a different synset. Example: strike, work stoppage -- (a group's refusal to work in protest against low pay or bad work conditions; "the strike lasted more than a month before it was settled"); strike -- ((baseball) a pitch that the batter swings at and misses, or that the batter hits into foul territory, or that the batter does not swing at but the umpire judges to be in the area over home plate and between the batter's knees and shoulders; "this pitcher throws more strikes than balls"). synset: A synonym set; a set of words that are interchangeable in some context (sharing the same word sense). Example: car, auto, automobile, autocar.
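The sense/synset distinction can be modeled directly: a word has one sense per synset it belongs to, and its synonyms are the other members of those synsets. This sketch uses hypothetical synset identifiers (`s1` to `s3`) and shortened glosses. Real WordNet orders senses by frequency, whereas here they are numbered in list order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Synset:
    synset_id: str          # hypothetical identifier, not a real WordNet offset
    words: tuple[str, ...]  # the synonymous words in this set
    gloss: str


SYNSETS = [
    Synset("s1", ("strike", "work stoppage"),
           "a group's refusal to work in protest against low pay or bad work conditions"),
    Synset("s2", ("strike",),
           "(baseball) a pitch that the batter swings at and misses, ..."),
    Synset("s3", ("car", "auto", "automobile", "autocar"), "(gloss omitted here)"),
]


def senses(word):
    """Each synset that contains the word is one sense of it, numbered from 1."""
    return list(enumerate((s for s in SYNSETS if word in s.words), start=1))


def synonyms(word):
    """Words that share at least one synset (i.e. a sense) with `word`."""
    return sorted({w for s in SYNSETS if word in s.words for w in s.words} - {word})


for n, s in senses("strike"):
    print(n, s.words, "--", s.gloss)
print(synonyms("car"))  # ['auto', 'autocar', 'automobile']
```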

22 WordNet Glossary [Excerpt]. hypernym: The generic term used to designate a whole class of specific instances. Y is a hypernym of X if X is a (kind of) Y. Illustration: vehicle. hyponym: The specific term used to designate a member of a class. X is a hyponym of Y if X is a (kind of) Y. Illustration: automotive vehicle. holonym: The name of the whole of which the meronym names a part. Y is a holonym of X if X is a part of Y. Illustration: car. meronym: The name of a constituent part of, the substance of, or a member of something. X is a meronym of Y if X is a part of Y. Illustration: engine. sister: Terms that are both immediate hyponyms of the same superordinate (or hypernym). Illustration: automotive vehicle, motor vehicle.
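The four relations come in inverse pairs (hypernym/hyponym, holonym/meronym), and sister terms fall out of the hierarchy. This sketch assumes a hypothetical toy vocabulary and links plain strings for readability, whereas WordNet itself links synsets.

```python
# Hypothetical toy links (WordNet itself links synsets, not strings)
HYPERNYM = {"car": "motor vehicle", "truck": "motor vehicle", "motor vehicle": "vehicle"}  # X is a kind of Y
PART_OF = {"engine": "car", "wheel": "car"}  # X is a part of Y


def hypernyms(x):
    """All more generic terms, nearest first (following 'is a kind of' upward)."""
    chain = []
    while x in HYPERNYM:
        x = HYPERNYM[x]
        chain.append(x)
    return chain


def hyponyms(y):
    """Immediate, more specific terms of y (inverse of the hypernym link)."""
    return sorted(x for x, parent in HYPERNYM.items() if parent == y)


def holonym(x):
    """The whole that x is a part of, if any."""
    return PART_OF.get(x)


def meronyms(y):
    """The parts of y (inverse of the holonym link)."""
    return sorted(x for x, whole in PART_OF.items() if whole == y)


def sisters(x):
    """Other immediate hyponyms of x's hypernym."""
    parent = HYPERNYM.get(x)
    return [s for s in hyponyms(parent) if s != x] if parent else []


print(hypernyms("car"))  # ['motor vehicle', 'vehicle']
print(meronyms("car"))   # ['engine', 'wheel']
print(sisters("car"))    # ['truck']
```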

23 WordNet Glossary [Excerpt]. base form: The base form of a word or collocation is the form to which inflections are added. Illustration: play is the base form of playing, played, plays. part of speech: WordNet defines "part of speech" as either noun, verb, adjective, or adverb; same as syntactic category. Illustration: {buy\verb fast\adjective skis\noun}. collocation: A collocation in WordNet is a string of two or more words, connected by spaces or hyphens. Examples: man-eating shark, blue-collar, depend on, line of products.
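To find a word's base form, WordNet's morphology routine (morphy) combines per-part-of-speech exception lists for irregular forms with suffix-detachment rules, and keeps only candidates that exist in the dictionary. The sketch below imitates that idea. It is not morphy's actual implementation. The lexicon and the exception list are hypothetical, and the noun rules are only a subset.

```python
# Hypothetical mini-lexicon of base forms known to the dictionary
LEXICON = {"verb": {"play", "buy", "depend"}, "noun": {"ski", "shark", "product"}}
# Irregular forms handled by an exception list (toy excerpt)
EXCEPTIONS = {"verb": {"bought": "buy"}, "noun": {}}
# Suffix-detachment rules (ending, replacement), in the style of morphy
RULES = {
    "verb": [("s", ""), ("ies", "y"), ("es", "e"), ("es", ""),
             ("ed", "e"), ("ed", ""), ("ing", "e"), ("ing", "")],
    "noun": [("s", ""), ("ses", "s"), ("xes", "x"), ("ches", "ch"), ("ies", "y")],
}


def base_forms(word, pos):
    """Return the candidate base forms of `word` that exist in the lexicon."""
    if word in EXCEPTIONS[pos]:
        return [EXCEPTIONS[pos][word]]
    if word in LEXICON[pos]:
        return [word]
    results = []
    for ending, repl in RULES[pos]:
        if word.endswith(ending):
            candidate = word[: len(word) - len(ending)] + repl
            if candidate in LEXICON[pos] and candidate not in results:
                results.append(candidate)
    return results


for w in ("playing", "played", "plays", "play", "bought"):
    print(w, "->", base_forms(w, "verb"))
print(base_forms("skis", "noun"))  # ['ski']
```

Filtering candidates against the lexicon is what keeps the rule "ing -> e" from producing a non-word like "playe".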

24 WordNet Demo: WordNet Browser / Babylon. Each sense matching the search is displayed as follows: Sense n [{synset_offset}] [<lex_filename>] word1[#sense_number][, word2...], where synset_offset is the byte offset of the synset in the data.pos file corresponding to the syntactic category, lex_filename is the name of the lexicographer file that the synset comes from, word1 is the first word in the synset (note that this is not necessarily the search word), and sense_number is the WordNet sense number assigned to the preceding word. synset_offset, lex_filename, and sense_number are generated only if the appropriate options are specified.
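The display template can be read as a small formatting routine in which the bracketed parts are optional. This is a sketch under stated assumptions: the offset value below is hypothetical, the 8-digit zero padding follows the layout of WordNet's data files, and a sense number is attached to every word listed. The function name `format_sense` is our own.

```python
def format_sense(n, words, synset_offset=None, lex_filename=None, sense_numbers=None):
    """Render one line following the template
    Sense n [{synset_offset}] [<lex_filename>] word1[#sense_number][, word2...]
    Bracketed parts are optional and only printed when a value is given."""
    parts = [f"Sense {n}"]
    if synset_offset is not None:
        parts.append("{%08d}" % synset_offset)  # byte offset, zero-padded to 8 digits
    if lex_filename is not None:
        parts.append(f"<{lex_filename}>")
    if sense_numbers is not None:
        words = [f"{w}#{k}" for w, k in zip(words, sense_numbers)]
    parts.append(", ".join(words))
    return " ".join(parts)


print(format_sense(2, ["strike"]))
# Sense 2 strike
print(format_sense(1, ["car", "auto", "automobile"], synset_offset=1234567,
                   lex_filename="noun.artifact", sense_numbers=[1, 1, 1]))
# Sense 1 {01234567} <noun.artifact> car#1, auto#1, automobile#1
```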

25 ConceptNet. The ConceptNet knowledge base is a semantic network consisting of concepts and relations between concepts. Commonsense knowledge in ConceptNet encompasses the spatial, physical, social, temporal, and psychological aspects of everyday life.

26 WordNet vs. ConceptNet. Liu, H. & Singh, P. (2004). ConceptNet: A Practical Commonsense Reasoning Toolkit. BT Technology Journal, Volume 22, Kluwer Academic Publishers. In ConceptNet: 1. Nodes can be compound elements representing higher-order compound concepts; ConceptNet does not distinguish between word senses. 2. It extends some of WordNet's relationships (synonym, is-a, part-of) to more than twenty semantic relations, including, for example, CapableOf, EffectOf, SubeventOf, PropertyOf, MotivationOf, etc. 3. Knowledge is more informal, defeasible and practically oriented: it contains knowledge that is defeasible (often true, but not always), e.g. EffectOf("fall off bicycle", "get hurt").

27 Overall Objective of ConceptNet: represent commonsense knowledge, i.e. knowledge that every person is assumed to possess. Commonsense knowledge is typically omitted from social communications. ConceptNet was designed to make practical context-based inferences over real-world texts.

28 ConceptNet

29 ConceptNet. In version 2, ConceptNet contained 1.6 million assertions interrelating nodes. f counts the number of times a fact is uttered in the OMCS (Open Mind Common Sense) corpus; i counts how many times an assertion was inferred during the relaxation phase.

30 ConceptNet. ConceptNet supports textual-reasoning tasks over real-world documents, including, for example, topic-gisting (e.g. a news article containing the concepts "gun", "convenience store", "demand money" and "make getaway" might suggest the topics "robbery" and "crime"), affect-sensing (e.g. "this is sad and angry"), analogy-making (e.g. "scissors", "razor", "nail clipper", and "sword" are perhaps like a "knife" because they are all "sharp" and can be used to "cut something"), text summarization, and others.

31 ConceptNet's Relations

32 ConceptNet Features. Contextual neighbourhoods: provided by the API method GetContext(), which performs spreading activation radiating outward from a source node, taking into account the number and strengths of all paths that connect two nodes. Topic generation: also uses GetContext(). Example, query expansion: entering "restaurant" would return related queries such as "order food", "waiter" and "menu".
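Spreading activation can be sketched in a few lines. In the sketch below, the graph and edge strengths are hypothetical, `get_context` is our own stand-in rather than ConceptNet's actual GetContext() implementation, and the multiplicative decay per hop is an assumption. Contributions from all paths are summed, so a node reached along several paths (here "menu") ranks higher.

```python
from collections import defaultdict

# Hypothetical toy graph: node -> list of (neighbour, edge strength in [0, 1])
GRAPH = {
    "restaurant": [("order food", 0.9), ("waiter", 0.8), ("menu", 0.8)],
    "order food": [("eat", 0.9), ("menu", 0.5)],
    "waiter": [("tip", 0.6)],
    "menu": [("choose dish", 0.7)],
}


def get_context(source, depth=2, decay=0.5):
    """Spread activation outward from `source` for `depth` hops.
    Each hop multiplies activation by the edge strength and a decay factor;
    a node's score sums the contributions of all paths that reach it."""
    scores = defaultdict(float)
    frontier = {source: 1.0}
    for _ in range(depth):
        nxt = defaultdict(float)
        for node, act in frontier.items():
            for nbr, strength in GRAPH.get(node, []):
                nxt[nbr] += act * strength * decay
        for node, act in nxt.items():
            scores[node] += act
        frontier = nxt
    scores.pop(source, None)  # the source itself is not part of its context
    return sorted(scores.items(), key=lambda kv: -kv[1])


for concept, score in get_context("restaurant"):
    print(f"{concept}: {score:.3f}")
```

With these numbers the ranking is menu, order food, waiter, eat, choose dish, tip: the directly connected concepts come first, which is what query expansion needs.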

33 ConceptNet Features. Analogy making: two ConceptNet nodes are analogous if their sets of back-edges (incoming edges) overlap; ConceptNet's GetAnalogousConcepts() supports analogy making. Projection: graph traversal from an origin node, following a single transitive relation type (modus ponens: if A->B and B->C, then A->C). Affect sensing: uses ConceptNet's method GuessMood(), leveraging edges between concepts and specified affect categories.
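Both analogy making and projection reduce to simple graph operations. In this sketch, the edges are hypothetical and the relation names (`PropertyOf`, `FunctionOf`, `LocatedIn`) are illustrative labels only. The edges are deliberately oriented so that features point *at* the concept, so that "shared back-edges" matches the slide's definition. `analogous_concepts` and `project` are our own stand-ins, not ConceptNet's API.

```python
from collections import deque

# Hypothetical toy edges (source, relation, target)
EDGES = [
    ("sharp", "PropertyOf", "knife"),
    ("sharp", "PropertyOf", "scissors"),
    ("sharp", "PropertyOf", "sword"),
    ("cut something", "FunctionOf", "knife"),
    ("cut something", "FunctionOf", "scissors"),
    ("soft", "PropertyOf", "pillow"),
    # a chain of one transitive relation, used for projection
    ("knife", "LocatedIn", "kitchen"),
    ("kitchen", "LocatedIn", "house"),
    ("house", "LocatedIn", "city"),
]


def back_edges(node):
    """Incoming edges of a node, as (relation, source) pairs."""
    return {(rel, src) for src, rel, dst in EDGES if dst == node}


def analogous_concepts(node):
    """Rank other nodes by how many back-edges they share with `node`."""
    mine = back_edges(node)
    others = {dst for _, _, dst in EDGES} - {node}
    scored = [(other, len(mine & back_edges(other))) for other in others]
    return sorted([(o, k) for o, k in scored if k > 0], key=lambda ok: (-ok[1], ok[0]))


def project(origin, relation):
    """Breadth-first traversal from `origin` along a single transitive relation."""
    reached, queue = [], deque([origin])
    while queue:
        node = queue.popleft()
        for src, rel, dst in EDGES:
            if src == node and rel == relation and dst != origin and dst not in reached:
                reached.append(dst)
                queue.append(dst)
    return reached


print(analogous_concepts("knife"))    # [('scissors', 2), ('sword', 1)]
print(project("knife", "LocatedIn"))  # ['kitchen', 'house', 'city']
```

Because `LocatedIn` is transitive, every node reached by the traversal can be asserted as a location of the knife. Mixing relation types during traversal would break that guarantee, which is why projection follows only one.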

34 ConceptNet Demo

35 ConceptNet Application Example. Summarize text from Wikipedia[1]: "A car accident or car crash is an incident in which an automobile collides with anything that causes damage to the automobile, including other automobiles, telephone poles, buildings or trees, or in which the driver loses control of the vehicle and damages it in some other way, such as driving into a ditch or rolling over. Sometimes a car accident may also refer to an automobile striking a human or animal. Car crashes also called road traffic accidents (RTAs), traffic collisions, auto accidents, road accidents, personal injury collisions, motor vehicle accidents (MVAs), kill an estimated 1.2 million people worldwide each year, and injure about forty times this number. [1]" Text summarization provided by ConceptNet: "Car accident or car crash was incident in which. Automobile collided with anything that cause damaged to automobile include other automobile telephone pole building or tree. Driver lost control of vehicle and damages. Drove into ditch. Rolled. Car accident referred to automobile. Car crash called road traffic accident. Killed estimate 1. Injured about forty timed number."

36 FrameNet: A Brief Overview. The Berkeley FrameNet project is creating an online lexical resource for English, based on frame semantics and supported by corpus evidence. The aim is to document the range of semantic and syntactic combinatory possibilities (valences) of each word in each of its senses, through computer-assisted annotation of example sentences and automatic tabulation and display of the annotation results. The major product of this work, the FrameNet lexical database, currently contains more than 10,000 lexical units (defined below), more than 6,100 of which are fully annotated, in more than 825 semantic frames, exemplified in more than 135,000 annotated sentences.

37 FrameNet: A Brief Overview. Semantic frames are schematic representations of situation types (eating, spying, removing, classifying, etc.) together with lists of the kinds of participants, props, and other conceptual roles that are seen as components of such situations. Example: Cause_change_of_position_on_a_scale

38 FrameNet. Source: excerpt of FrameNet, accessed on July 2nd, 2007. The verb "increase" is related to a frame whose roles answer questions such as: Who causes the increase? An increase of what numbers? What causes the increase? What is increased?

39 FrameNet: A Brief Overview. Lexical units (LUs) invoke frames. Example: LUs for Cause_change_of_position_on_a_scale
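The frame, its roles, and the lexical units that invoke it fit a simple data structure. This sketch is illustrative only. The role names (`Agent`, `Cause`, `Item`, `Attribute`), their assignment to the questions from the "increase" slide, and the short LU list are assumptions, so check the FrameNet database for the authoritative frame definition. `frames_evoked` is a hypothetical helper, not part of any FrameNet API.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    name: str
    elements: dict[str, str]  # frame element (role) -> the question it answers
    lexical_units: set[str] = field(default_factory=set)  # LUs written lemma.pos


# Illustrative excerpt only; role names and LU list are assumptions
CAUSE_CHANGE = Frame(
    name="Cause_change_of_position_on_a_scale",
    elements={
        "Agent": "Who causes the increase?",
        "Attribute": "An increase of what numbers?",
        "Cause": "What causes the increase?",
        "Item": "What is increased?",
    },
    lexical_units={"increase.v", "raise.v", "lower.v", "reduce.v"},
)
FRAMES = [CAUSE_CHANGE]


def frames_evoked(lemma, pos):
    """Return the names of all frames invoked by the lexical unit lemma.pos."""
    lu = f"{lemma}.{pos}"
    return [f.name for f in FRAMES if lu in f.lexical_units]


for name in frames_evoked("increase", "v"):
    print(name)
    for role, question in CAUSE_CHANGE.elements.items():
        print(f"  {role}: {question}")
print(frames_evoked("eat", "v"))  # []
```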

40 Cyc: A Brief Overview. Began as a research project in 1984; now conducted by Cycorp Inc. Project founder and CEO Doug Lenat: watch his 2006 Google video "Computers Versus Common Sense"! Initially a hand-crafted knowledge base, now built using several strategies. "Once you have a truly massive amount of information integrated as knowledge, then the human-software system will be superhuman, in the same sense that mankind with writing is superhuman compared to mankind before writing." ~ Doug Lenat, June 21, 2001. Open source version available: "OpenCyc is the open source version of the Cyc technology, the world's largest and most complete general knowledge base and commonsense reasoning engine." (com/cyc/opencyc/overview)

41 Cyc. Cyc's Objective: "Cycorp's goal is to break the 'software brittleness bottleneck' once and for all by constructing a foundation of basic 'common sense' knowledge (a semantic substratum of terms, rules, and relations) that will enable a variety of knowledge-intensive products and services." What is Cyc? "The Cyc knowledge base (KB) is a formalized representation of a vast quantity of fundamental human knowledge: facts, rules of thumb, and heuristics for reasoning about the objects and events of everyday life. The medium of representation is the formal language CycL, described below. The KB consists of terms, which constitute the vocabulary of CycL, and assertions, which relate those terms."
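CycL assertions are written as parenthesized expressions over constants prefixed with `#$`. `#$isa` (instance of) and `#$genls` (subcollection of) are CycL's standard predicates. This sketch parses a hypothetical mini-KB with a generic s-expression reader written for illustration (not Cyc's own API). It then separates the vocabulary (terms) from the assertions and derives `isa` through chains of `genls`.

```python
def parse(text):
    """Parse a CycL-style s-expression such as "(#$isa #$Fido #$Dog)" into nested lists."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()

    def read(pos):
        if tokens[pos] == "(":
            items, pos = [], pos + 1
            while tokens[pos] != ")":
                item, pos = read(pos)
                items.append(item)
            return items, pos + 1  # skip the closing ")"
        return tokens[pos], pos + 1  # an atom, e.g. a constant like #$Dog

    expr, end = read(0)
    if end != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return expr


# Hypothetical mini-KB of assertions relating terms
KB = [parse(a) for a in (
    "(#$isa #$Fido #$Dog)",
    "(#$genls #$Dog #$Mammal)",
    "(#$genls #$Mammal #$Animal)",
)]


def terms(kb):
    """The vocabulary: every constant (#$...) occurring in some assertion."""
    found = set()

    def walk(e):
        if isinstance(e, list):
            for x in e:
                walk(x)
        elif e.startswith("#$"):
            found.add(e)

    for assertion in kb:
        walk(assertion)
    return sorted(found)


def isa(kb, x, collection):
    """x isa C holds if asserted directly or via a chain of genls links."""
    cols = {c for pred, inst, c in kb if pred == "#$isa" and inst == x}
    genls = [(sub, sup) for pred, sub, sup in kb if pred == "#$genls"]
    frontier = list(cols)
    while frontier:
        c = frontier.pop()
        for sub, sup in genls:
            if sub == c and sup not in cols:
                cols.add(sup)
                frontier.append(sup)
    return collection in cols


print(terms(KB))
print(isa(KB, "#$Fido", "#$Animal"))  # True
```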

42 What does Cyc know?

43 Next week we will talk about how to construct such knowledge bases, including games with a purpose and other participatory forms of knowledge acquisition.

44 Any further questions? See you next week!

