Unit 14. Decision Trees. Knowledge-Based Methods in Image Processing and Pattern Recognition; Ulrich Bodenhofer

Introduction
- Nearest-prototype classifiers are computationally expensive black-box models.
- Decisions can be made in a more structured way, e.g. by asking questions successively.
- A decision tree is a classifier that classifies by asking questions successively: each level corresponds to a question, and each leaf corresponds to a final classification.

Construction of Decision Trees
- The top-down construction of a decision tree is more or less straightforward.
- To construct a decision tree from data, we have to determine which questions to ask in order to achieve an acceptable result.
- In the popular ID3 method, this is done by considering the gain of information at each node.
- In the following, for convenience, let us make the convention $X_{p+1} = Y$.

The ID3 Algorithm
1. Given: data set $X = \{x^i \mid i = 1, \dots, n\}$; assume that all $p+1$ variables are categorical, i.e. $X_i = \{1, \dots, C_i\}$.
2. Call ID3($X$, Root, $\{1, \dots, p\}$).
3. ID3($X$, $N$, $I$):
   (a) If all $x$ in $X$ belong to the same output class, exit.
   (b) Determine the component $i \in I$ for which the gain of information $g_i(X)$ is maximal.
   (c) Divide $X$ into the disjoint subsets (for $j = 1, \dots, C_i$)
       $$X_{ji} = \{x \in X \mid x_i = j\}. \tag{1}$$
   (d) For all $j$ such that $X_{ji} \neq \emptyset$: generate a new node $N_j$ and call ID3($X_{ji}$, $N_j$, $I \setminus \{i\}$).
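The recursion is compact enough to sketch directly. Below is a minimal, illustrative Python implementation of classical ID3; the names (`Node`, `id3`, `information_gain`) are my own, not from the lecture, and the entropy and gain follow the formulas on the next slide.

```python
from collections import Counter
from math import log2

class Node:
    def __init__(self, attribute=None, label=None):
        self.attribute = attribute   # index i of the attribute tested at this node
        self.children = {}           # attribute value j -> child Node
        self.label = label           # class label, set only for leaf nodes

def entropy(samples):
    """H(X) over the output component (last entry of each sample tuple)."""
    n = len(samples)
    counts = Counter(x[-1] for x in samples)
    return -sum(c / n * log2(c / n) for c in counts.values())

def information_gain(samples, i):
    """g_i(X) = H(X) - sum_j |X_ji| / |X| * H(X_ji)."""
    n = len(samples)
    subsets = {}
    for x in samples:
        subsets.setdefault(x[i], []).append(x)   # X_ji = {x in X | x_i = j}
    return entropy(samples) - sum(len(s) / n * entropy(s) for s in subsets.values())

def id3(samples, attributes):
    classes = {x[-1] for x in samples}
    if len(classes) == 1 or not attributes:      # pure node or no attributes left
        return Node(label=Counter(x[-1] for x in samples).most_common(1)[0][0])
    best = max(attributes, key=lambda i: information_gain(samples, i))
    node = Node(attribute=best)
    for j in {x[best] for x in samples}:         # only non-empty subsets X_ji
        node.children[j] = id3([x for x in samples if x[best] == j],
                               attributes - {best})
    return node
```

For instance, `id3([(1, 1, 'A'), (1, 2, 'A'), (2, 1, 'B')], {0, 1})` builds a tree that splits on attribute 0, since that attribute separates the two classes perfectly.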

Computing the Gain of Information
$$H(Y) = -\sum_{i=1}^{C_{p+1}} \frac{|Y_{i,p+1}|}{|Y|} \log_2 \frac{|Y_{i,p+1}|}{|Y|}$$
$$g_i(X) = H(X) - \sum_{j=1}^{C_i} \frac{|X_{ji}|}{|X|}\, H(X_{ji})$$
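For example, if $X$ contains four samples with output classes $1, 1, 2, 2$, then $H(X) = -(\tfrac{1}{2}\log_2\tfrac{1}{2} + \tfrac{1}{2}\log_2\tfrac{1}{2}) = 1$ bit; if component $i$ separates the two classes perfectly into two subsets of size two, each with entropy $0$, then $g_i(X) = 1 - (\tfrac{2}{4}\cdot 0 + \tfrac{2}{4}\cdot 0) = 1$, the maximal possible gain.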

Fuzzy Decision Trees
- Classical decision trees can only process crisp categorical attributes.
- There are extensions that can process real-valued attributes (CART, C4.5), but they all split the real line into crisp intervals with artificially sharp boundaries; therefore, no interpolative behavior can be modeled.
- Working with fuzzy instead of crisp predicates overcomes this problem.
- The FS-ID3 algorithm is an efficient variant that also accommodates classical decision trees.

The Basic Setting
- Data samples ($i = 1, \dots, n$): $x^i = (x^i_1, \dots, x^i_p, x^i_{p+1}) \in X_1 \times \dots \times X_p \times X_{p+1}$
- A fuzzy predicate in this setting is an $X_1 \times \dots \times X_{p+1} \to [0, 1]$ mapping.
- The dummy mapping $t(\cdot)$ gives the actual truth value (from $[0, 1]$) for a given linguistic expression.

Crisp Categorical Attributes
Assume that $X_r = \{L_{r,1}, \dots, L_{r,N_r}\}$; then the following two predicates can be defined:
$$t(x \text{ is } L_{r,j}) = \begin{cases} 1 & \text{if } x_r = L_{r,j} \\ 0 & \text{otherwise} \end{cases}$$
$$t(x \text{ is not } L_{r,j}) = \begin{cases} 1 & \text{if } x_r \neq L_{r,j} \\ 0 & \text{otherwise} \end{cases}$$

Fuzzy Categorical Attributes
For a fuzzy categorical attribute $r$, we have an unstructured set of $N_r$ labels $\{L_{r,1}, \dots, L_{r,N_r}\}$:
$$X_r = \mathcal{F}\big(\{L_{r,1}, \dots, L_{r,N_r}\}\big) \cong [0, 1]^{N_r}, \qquad x_r = (t_{r,1}, \dots, t_{r,N_r})$$
$$t(x \text{ is } L_{r,j}) = t_{r,j}, \qquad t(x \text{ is not } L_{r,j}) = 1 - t_{r,j}$$

Fuzzy Attributes
Given a set of linguistic labels $M_{r,1}, \dots, M_{r,N_r}$ and their corresponding semantics modeled by fuzzy sets, we can define $4 N_r$ atomic fuzzy predicates:
$$t(x \text{ is } M_{r,j}) = \mu_{M_{r,j}}(x_r)$$
$$t(x \text{ is not } M_{r,j}) = 1 - \mu_{M_{r,j}}(x_r)$$
$$t(x \text{ is at least } M_{r,j}) = \sup\{\mu_{M_{r,j}}(u) \mid u \leq x_r\}$$
$$t(x \text{ is at most } M_{r,j}) = \sup\{\mu_{M_{r,j}}(u) \mid u \geq x_r\}$$
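To illustrate, here is a small Python sketch of the four predicate types for a real-valued attribute. All names are my own; the suprema are approximated on a discretized domain, and the triangular membership function is just one possible choice of semantics.

```python
def triangular(a, b, c):
    """Membership function of a triangular fuzzy set: support [a, c], peak at b."""
    def mu(u):
        if u <= a or u >= c:
            return 0.0
        return (u - a) / (b - a) if u <= b else (c - u) / (c - b)
    return mu

def t_is(mu, x_r):
    return mu(x_r)                    # t(x is M)

def t_is_not(mu, x_r):
    return 1.0 - mu(x_r)              # t(x is not M)

def t_at_least(mu, x_r, domain):
    # t(x is at least M) = sup{ mu(u) | u <= x_r }, approximated on a grid
    return max((mu(u) for u in domain if u <= x_r), default=0.0)

def t_at_most(mu, x_r, domain):
    # t(x is at most M) = sup{ mu(u) | u >= x_r }, approximated on a grid
    return max((mu(u) for u in domain if u >= x_r), default=0.0)
```

With `medium = triangular(2.0, 5.0, 8.0)` and a grid `[i / 10 for i in range(100)]`, `t_at_least(medium, 9.0, grid)` evaluates to 1.0: any value at or above the peak fully satisfies "at least medium", which is exactly the interpolative behavior the slide describes.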

Default Predicate for Missing Values
$$t(x \text{ is } \mathrm{NA}_r) = \begin{cases} 1 & \text{if } x_r \text{ is missing} \\ 0 & \text{otherwise} \end{cases}$$

Compound Fuzzy Predicates
$$t\big((\neg p)(x)\big) = 1 - t(p(x))$$
$$t\big((p \wedge q)(x)\big) = T\big(t(p(x)), t(q(x))\big)$$
$$t\big((p \vee q)(x)\big) = S\big(t(p(x)), t(q(x))\big)$$
where $T$ is a t-norm and $S$ is a t-conorm.
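In code, with the minimum t-norm and maximum t-conorm as one common choice (an assumption; any t-norm/t-conorm pair works):

```python
def t_not(tp):
    return 1.0 - tp                   # t((not p)(x)) = 1 - t(p(x))

def t_and(tp, tq, T=min):             # T: a t-norm, e.g. minimum or product
    return T(tp, tq)

def t_or(tp, tq, S=max):              # S: a t-conorm, e.g. maximum
    return S(tp, tq)
```

For example, `t_and(0.7, 0.4)` gives 0.4 with the minimum t-norm, while `t_and(0.7, 0.4, T=lambda a, b: a * b)` gives 0.28 with the product t-norm.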

Binary Fuzzy Decision Trees
- Each decision tree has a root node and child nodes.
- Each child node is either a leaf node or the root node of a subtree.
- Each non-leaf node has exactly two child nodes.
- Each non-leaf node is associated with a fuzzy predicate.
- Each leaf node is associated with a class assignment.

The FS-ID3 Algorithm
Input: (fuzzy) goal predicates $C = \{C_1, \dots, C_R\}$, fuzzy set of samples $X_{cur}$, set of test predicates $P$
Output: tree node $N_{cur}$

IF stopping criterion is fulfilled THEN BEGIN
    compute class assignment $C_{cur}$
    $N_{cur}$ is a leaf node with class assignment $C_{cur}$
END ELSE BEGIN
    find the best predicate $p^* = \operatorname{argmax}_{p \in P} G(p, X_{cur})$
    compute new memberships for the left branch: $\mu_{X^{\oplus}}(x^i) = t\big((x^i \text{ is } X_{cur}) \wedge p^*(x^i)\big)$
    compute the left branch $N^{\oplus} = \text{FS-ID3}(C, X^{\oplus}, P)$
    compute new memberships for the right branch: $\mu_{X^{\ominus}}(x^i) = t\big((x^i \text{ is } X_{cur}) \wedge \neg p^*(x^i)\big)$
    compute the right branch $N^{\ominus} = \text{FS-ID3}(C, X^{\ominus}, P)$
    $N_{cur}$ is a parent node with children $N^{\oplus}$ and $N^{\ominus}$
END
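Below is a condensed Python sketch of this recursion; it is illustrative, not the lecture's reference implementation. It uses the minimum t-norm for the membership updates, two of the stopping criteria listed on a later slide, a proportional class assignment at the leaves, and the `gain` function sketched two slides below.

```python
from dataclasses import dataclass

@dataclass
class Leaf:
    assignment: list                  # one degree per goal predicate

@dataclass
class Inner:
    predicate: object                 # callable: sample -> truth value in [0, 1]
    left: object                      # subtree where the predicate holds
    right: object                     # subtree where its negation holds

def fs_id3(goals, mu_cur, predicates, samples, depth=0, max_depth=4):
    """goals: goal predicates c_i; mu_cur: membership of each sample in X_cur."""
    # stopping criteria: maximum depth reached, or too little sample mass left
    if depth >= max_depth or sum(mu_cur) < 1.0:
        degrees = [sum(min(m, g(x)) for m, x in zip(mu_cur, samples)) for g in goals]
        total = sum(degrees) or 1.0
        return Leaf([d / total for d in degrees])   # proportional class assignment
    best = max(predicates, key=lambda p: gain(p, mu_cur, samples, goals))
    # membership updates with T = min:
    mu_left  = [min(m, best(x)) for m, x in zip(mu_cur, samples)]        # X-plus
    mu_right = [min(m, 1.0 - best(x)) for m, x in zip(mu_cur, samples)]  # X-minus
    return Inner(best,
                 fs_id3(goals, mu_left, predicates, samples, depth + 1, max_depth),
                 fs_id3(goals, mu_right, predicates, samples, depth + 1, max_depth))
```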

Computing the Gain of Information
$$|N| = \sum_{x \in X} \mu_N(x), \qquad p(X) = \sum_{x \in X} t\big(p(x)\big)$$
$$G(p, X) = E\big(\{p_i(X) \mid i = 1, \dots, R\}\big) - \Big( r^{\oplus}(X)\, E\big(\{p_i^{\oplus}(X) \mid i = 1, \dots, R\}\big) + r^{\ominus}(X)\, E\big(\{p_i^{\ominus}(X) \mid i = 1, \dots, R\}\big) \Big)$$
$$E(P) = -\sum_{q \in P} q \log_2 q$$

Computing the Gain of Information (cont'd)
$$p_i(X) = \frac{c_i(X)}{\sum_{k=1}^{R} c_k(X)} \qquad
p_i^{\oplus}(X) = \frac{(p \wedge c_i)(X)}{\sum_{k=1}^{R} (p \wedge c_k)(X)} \qquad
p_i^{\ominus}(X) = \frac{(\neg p \wedge c_i)(X)}{\sum_{k=1}^{R} (\neg p \wedge c_k)(X)}$$
$$r^{\oplus}(X) = \frac{\sum_{k=1}^{R} (p \wedge c_k)(X)}{\sum_{k=1}^{R} c_k(X)} \qquad
r^{\ominus}(X) = \frac{\sum_{k=1}^{R} (\neg p \wedge c_k)(X)}{\sum_{k=1}^{R} c_k(X)}$$
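A direct transcription into Python, using sigma-counts and the minimum t-norm for $\wedge$ (an assumption); this is the `gain` function assumed by the FS-ID3 sketch above. Membership in $X_{cur}$ is combined with each predicate via min as well.

```python
from math import log2

def entropy_of(dist):
    """E(P) = -sum_{q in P} q * log2(q), skipping zero entries."""
    return -sum(q * log2(q) for q in dist if q > 0)

def gain(p, mu_cur, samples, goals):
    """Fuzzy information gain G(p, X) based on sigma-counts; T = min."""
    c  = [sum(min(m, g(x)) for m, x in zip(mu_cur, samples))
          for g in goals]                                        # c_i(X)
    pc = [sum(min(m, p(x), g(x)) for m, x in zip(mu_cur, samples))
          for g in goals]                                        # (p and c_i)(X)
    nc = [sum(min(m, 1.0 - p(x), g(x)) for m, x in zip(mu_cur, samples))
          for g in goals]                                        # (not p and c_i)(X)
    total, tp, tn = sum(c), sum(pc), sum(nc)
    if min(total, tp, tn) == 0:
        return 0.0
    return (entropy_of([x / total for x in c])                   # E({p_i(X)})
            - (tp / total) * entropy_of([x / tp for x in pc])    # r_plus * E(...)
            - (tn / total) * entropy_of([x / tn for x in nc]))   # r_minus * E(...)
```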

Stopping Criteria
- No more samples: the number of samples decreases below a certain threshold.
- Only one class remains: most samples belong to the same class.
- Maximum depth reached: the depth of the tree reaches a predefined maximum.
- No new rules found: no new rule that increases the classification quality is found.

Applying FS-ID3
- Goal predicates: $R = N_{p+1}$ and $t(c_j(x)) = t(x \text{ is } L_{p+1,j})$ (for $j = 1, \dots, N_{p+1}$)
- Test predicates: the set of all predicates defined for the variables $1, \dots, p$
- Sample set: initialized with $X_{cur} = X$
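Putting the sketches together, a hypothetical end-to-end call on a tiny made-up data set (two classes, one input variable; `triangular`, `fs_id3`, and `gain` are the illustrative helpers from the earlier sketches):

```python
# samples: (x_1, class); goal predicates are crisp indicators of the output class
data = [(0.2, 0), (0.3, 0), (0.7, 1), (0.9, 1)]
goals = [lambda x, j=j: 1.0 if x[1] == j else 0.0 for j in (0, 1)]

high = triangular(0.4, 0.8, 1.2)          # semantics of "x_1 is high"
preds = [lambda x: high(x[0])]            # test predicates on the input variable

tree = fs_id3(goals, [1.0] * len(data), preds, data)
```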

Class Assignments (1/3)
- At each leaf node, a fuzzy sample set $X_{cur}$ remains.
- The class assignment $C_{cur}$ can be either crisp (the leaf node is assigned to one goal predicate) or fuzzy (the leaf node is fuzzily assigned to the goal predicates).
- Crisp majority decision: the leaf node is assigned to the goal predicate $C_j$ for which $\sum_{x \in X} t(c_j(x_{p+1}))$ is maximal.

Class Assignments (2/3)
- Proportional assignment: the leaf node is assigned to each goal predicate $C_j$ with a degree of
$$\frac{\sum_{x \in X} t(c_j(x_{p+1}))}{\sum_{i=1}^{R} \sum_{x \in X} t(c_i(x_{p+1}))}$$
- Normalized assignment: the leaf node is assigned to each goal predicate $C_j$ with a degree of
$$\frac{\sum_{x \in X} t(c_j(x_{p+1}))}{\max_{i=1}^{R} \sum_{x \in X} t(c_i(x_{p+1}))}$$
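The three assignment rules from this and the previous slide as illustrative Python helpers; I assume here that each sample's truth value is weighted by its membership in the leaf's fuzzy sample set, consistent with the earlier sketches.

```python
def class_degrees(goals, mu_cur, samples):
    """Sigma-count of each goal predicate over the leaf's fuzzy sample set."""
    return [sum(min(m, g(x)) for m, x in zip(mu_cur, samples)) for g in goals]

def majority_assignment(goals, mu_cur, samples):
    degrees = class_degrees(goals, mu_cur, samples)
    return degrees.index(max(degrees))      # crisp: index of the winning class

def proportional_assignment(goals, mu_cur, samples):
    degrees = class_degrees(goals, mu_cur, samples)
    total = sum(degrees) or 1.0
    return [d / total for d in degrees]     # degrees sum to 1 (probabilities)

def normalized_assignment(goals, mu_cur, samples):
    degrees = class_degrees(goals, mu_cur, samples)
    top = max(degrees) or 1.0
    return [d / top for d in degrees]       # maximal degree is 1
```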

Class Assignments (3/3)
- Note that, although the approach seems plausible at first glance, the results of proportional assignment cannot be understood as fuzzy membership degrees (since relative frequencies are not truth-functional).
- More correctly, the relative frequencies in the leaf nodes (proportional assignment) can be understood as probabilities with which a sample potentially belongs to the respective class.

Fuzzy Decision Trees vs. Fuzzy Rules
- Every fuzzy decision tree can be interpreted as a set of fuzzy rules.
- Each leaf node corresponds to one rule.
- The antecedent (i.e. IF part) of each rule is the conjunction of the predicates corresponding to the path from the root node to the respective leaf node.
- The consequent (i.e. THEN part) is determined by the class assignment of the respective leaf node.
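A sketch of this reading, reusing the illustrative `Inner`/`Leaf` nodes from the FS-ID3 sketch: each root-to-leaf path yields one rule whose antecedent conjoins the predicates along the path, negated on right branches.

```python
def extract_rules(node, path=()):
    """Yield one (antecedent, class assignment) pair per leaf node."""
    if isinstance(node, Leaf):
        yield path, node.assignment          # path: tuple of (predicate, polarity)
        return
    yield from extract_rules(node.left,  path + ((node.predicate, True),))
    yield from extract_rules(node.right, path + ((node.predicate, False),))
```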

Example: FS-ID3 Decision Tree for the Iris Data Set
[Figure: a binary fuzzy decision tree learned for the Iris data set. Its nodes test the predicates petal_length_isatleast_l (97 samples) and petal_width_isatleast_h (68); its leaves carry the class assignments class_is_iris setosa (33), class_is_iris versicol (29), and class_is_iris virginica (35).]

Example: FS-ID3 Decision Tree for the Wine Data Set
[Figure: a binary fuzzy decision tree learned for the Wine data set. Its nodes test the predicates Flavanoids_IsAtLeast_M (134 samples), Alcohol_IsAtLeast_M (78), and ColorIntensity_IsAtLeast_L (56); its leaves carry the class assignments Class_Is_C1 (48), Class_Is_C2 (30 and 11), and Class_Is_C3 (45).]