Praktische Optimierung

Praktische Optimierung Dozent: Günter Rudolph Vertretung: Nicola Beume Wintersemester 2007/08 Universität Dortmund Fachbereich Informatik Lehrstuhl für Algorithm Engineering (LS11) Fachgebiet Computational Intelligence 17.10.2007 Nicola Beume (LS11) PrOp 17.10.2007 1 / 20

Grundlagen von Optimierproblemen Am Montag gelernt: Formalitäten Überblick über dieses Semester Def.: Optimierproblem Optimum, Optimalstelle lokales, globales Optimum Minimierung wie Maximierung konvexe Funktionen Nicola Beume (LS11) Rückblick 17.10.2007 2 / 20

Grundlagen klassischer Optimiertechniken Themen der heutigen Vorlesung: Exkurs: Differentialrechnung Optimalitätsbedingungen für unrestringierte Optimierprobleme Analyse von quadratischen, konvexen Funktionen Gradientenverfahren Nicola Beume (LS11) Vorschau 17.10.2007 3 / 20

Exkurs: Differentialrechnung Grundlagen der Differentialrechnung f : R R differenzierbar in x existiert und ist endlich lim h 0 f (x+h) f (x) h Der Limes wird mit f (x) bezeichnet und heißt Ableitung von f an der Stelle x Einige grundlegende Ableitungsregeln Sei f : R R, g : R R, x, α, n R (α) = 0 Konstante (αf ) = αf Faktorregel (f + g) = f + g Summenregel (fg) = gf + fg Produktregel ( f g ) = gf fg g 2 Quotientenregel (f g) (x) = (f (g(x))) = g (x)f (g(x)) Kettenregel (x n ) = nx n 1 Potenzregel (ln x) = 1 x logarithmische Ableitung Nicola Beume (LS11) Differentialrechnung 17.10.2007 4 / 20

Partielle Funktion und Ableitung f : R n R Argument x = (x 1,..., x n ) ist ein n-dimensionaler Vektor ˆzeigt an, dass eine Variable (Veränderliche) als konstant angenommen wird Sei ˆx R n und f ( ˆx 1,..., x k 1 ˆ, x k, x k+1 ˆ,..., ˆx n ) eine partielle Funktion mit existierender Ableitung in x k. Partielle Ableitung nach x i : f (x 1,...,x n) x i = f x i (x 1,..., x n ) = f x i = f xi Handwerkliches: Alle Veränderliche außer x i als Konstanten auffassen und die nunmehr nur noch von x i abhängende Funktion wie gewohnt ableiten. Nicola Beume (LS11) Differentialrechnung 17.10.2007 5 / 20

Beispiel: Einfache partielle Ableitungen f (x, y, z) = x 2 + xy 2 + 2z 3 Partielle Ableitungen: f x (x, y, z) = 2x + y 2 f y (x, y, z) = 2xy f z (x, y, z) = 6z 2 f (x, y) = y log(2xy), x, y > 0 Partielle Ableitungen: 1 f x (x, y) = y 2y 2xy = y x 1 f y (x, y) = log(2xy) + y 2x 2xy = log(2xy) + 1 Regel: log(g(x)) = g (x) g(x) Nicola Beume (LS11) Differentialrechnung 17.10.2007 6 / 20

Partielle Ableitung zweiter Ordnung Ordnung: Anzahl abzuleitender Variablen 2 f x i x j (x 1,..., x n ) = f xi x j (höhere Ordnungen analog) Prinzip: x i ( x j ) Handwerkliches: erst nach x j ableiten, dann Resultat nach x i ableiten Reihenfolge egal, falls f C m (G) (f m-fach stetig differenzierbar) Satz von Schwarz: Für jedes f C m (G) mit G R n offen ist die Reihenfolge der partiellen Ableitungen bis zur m-ten Ordnung irrelevant. Nicola Beume (LS11) Differentialrechnung 17.10.2007 7 / 20

Beispiel: Partielle Ableitungen zweiter Ordnung f (x, y) = x 3 2x 2 y 2 + 4xy 3 + y 4 + 10 Partielle Ableitungen f x = 3x 2 4xy 2 + 4y 3 f y = 4x 2 y + 12xy 2 + 4y 3 1. Ordnung f xx = 6x 4y 2 f yx = 8xy + 12y 2 2. Ordnung f xy = 8xy + 12y 2 f yy = 4x 2 + 24xy + 12y 2 f xy = f yx Nicola Beume (LS11) Differentialrechnung 17.10.2007 8 / 20

Gradient f : R n R Gradient f (x): Vektor der partiellen Ableitungen von f (x). f (x) f (x) f (x) f (x) := ( x 1, x 2,..., x n ) ( heißt Nabla-Operator, steht für transponiert) Hessematrix 2 f (x): Matrix der partiellen zweiten Ableitungen von f (x). 2 f (x) 2 f (x) x 1 x 1 x 1 x 2 2 f (x) x 1 x n 2 f (x) :=... 2 f (x) x n x 1 2 f (x) x n x n Falls f C 2 Satz von Schwarz Hessematrix symmetrisch Nicola Beume (LS11) Differentialrechnung 17.10.2007 9 / 20

Taylor Gemäß Taylor-Formel nähert man eine Funktion durch eine Taylor-Reihe bestehend aus Taylor-Polynomen an f (x + h) = f (x) + ( f (x)) h + 1 2 h 2 f (x)h + h 2 ρ(h) mit lim h 0 ρ(h) = 0 Abweichung kann beliebig klein werden Andere Darstellung mit h = x x 0 f (x) f (x 0 ) + ( f (x 0 )) (x x 0 ) + 1 2 (x x 0) 2 f (x 0 )(x x 0 ) Approx. durch Gerade Approx. durch Paraboloid Ende Exkurs Differentialrechnung Nicola Beume (LS11) Differentialrechnung 17.10.2007 10 / 20

Optimalitätsbedingungen für unrestringierte Probleme f C 2, x R n mit f (x ) = 0 und 2 f (x ) positiv definit f hat in x ein lokales Minimum negativ definit lokales Maximum, indefinit kein lokales Optimum Definitheitskriterium für (n n)-matrix A: a 11 a 1k Sei k :=... k-te Abschnittsdeterminate a k1 a kk (von Hauptminor: quadr. Untermatrix beginnend links oben) A pos. def. k : k > 0 A neg. def. k : ( 1) k k > 0 ( ) a b Beispiel: b c a > 0 und ac b 2 > 0 pos. def. a < 0 und ac b 2 > 0 neg. def. a < 0 und ac b 2 < 0 indef. Nicola Beume (LS11) Optimalitätsbedingungen 17.10.2007 11 / 20

Beispiel: Methodik bestimmt lokales Optium globales Optimum? f (x, x) + für x + und f (x, x) für x f unbeschränkt lokales Opt. ist kein globales Opt. Nicola Beume (LS11) Optimalitätsbedingungen 17.10.2007 12 / 20 f (x, y) = x 3 + y 3 3xy f x = 3x 2 3y f y = 3y 2 3x partielle Ableitungen bestimmen Nullstellen der partiellen Ableitungen suchen Gleichungssystem mit zwei Lösungen: (0, 0) und (1, 1) ( ) 6x 3 2 f (x, y) = Hessematrix berechnen 3 6y Eigenschaften( der Hessematrix ) an Lösungsstellen 0 3 2 f (0, 0) = indefinit (0, 0) keine Optimalstelle 3 0 ( ) { 6 3 2 1 = 6 > 0 und f (1, 1) = 3 6 2 = 6 6 ( 3)( 3) = 27 > 0 pos.def. lokales Optimum: f (1, 1) = 1

Beispiel: Methodik hat erste Schwierigkeiten f (x, y) = x 2 + y 2 2xy + 1 f x = 2x 2y f y = 2y 2x Nullstellen der partiellen Ableitungen suchen alle (x, y) mit x = y partielle Ableitungen bestimmen Hessematrix berechnen ( ) { 2 2 2 1 > 0 f (x, y) = 2 2 2 = 0 so keine Entscheidung möglich Analyse durch Nachdenken und Abschätzung der Funktion: f (x, y) = x 2 + y 2 2xy + 1 = (x y) 2 +1 1 }{{} 0 Lösung raten: x : f (x, x) = 1 alle Elemente in {(x, y) R 2 : x = y} sind lokale und globale Minimalstellen Nicola Beume (LS11) Optimalitätsbedingungen 17.10.2007 13 / 20

Beispiel: Methodik versagt f (x, y) = x 2 + y 2 offensichtlich: globales Minimum in (0, 0) Analyse durch partielle Ableitung: f x = 2x 1 2 x 2 +y 2 f y = 2y 1 2 x 2 +y 2 suche Nullstelle der partiellen Ableitung: f x (0, y) = 0 für y 0 und f y (x, 0) = 0 für x 0, aber f x (0, 0) und f y (0, 0) undefiniert es existieren keine partiellen Ableitungen in (0, 0) Nicola Beume (LS11) Optimalitätsbedingungen 17.10.2007 14 / 20

Optimum einer quadratischen, konvexen Funktion (1) f (x) = 1 2 x Ax + b x + c f (x) = Ax + b =! 0 2 f (x) = A A pos. def., da f konvex lokales Opt. = globales Opt. Umformung des Gradienten Ax = b A 1 linksseitig A 1 Ax = A 1 b (Erinnerung: für transponiert) suche Optimalstellen x = A 1 b =: x Optimalstelle eines globalen Optimums Inverse Matrix A 1 existiert immer: A pos. def. det A > 0 det A 0 A 1 existiert Nicola Beume (LS11) Optimalitätsbedingungen 17.10.2007 15 / 20

Optimum einer quadratischen, konvexen Funktion (2) Umformung von f durch ausmultiplizieren f (x) = 1 2 x Ax + b x + c = 1 ( ) ( ) ( ) 2 (x a11 a 1, x 2 ) 12 x1 x1 + (b a 12 a 22 x 1, b 2 ) + c A symmetrisch 2 x 2 = 1 ( ) 2 (x x1 1a 11 + x 2 a 12, x 1 a 12 + x 2 a 22 ) + b 1 x 1 + b 2 x 2 + c = 1 2 (a 11x 2 1 + 2a 12 x 1 x 2 + x 2 a 22 ) + b 1 x 1 + b 2 x 2 + c x 2 Gradient ( ) fx1 f (x) = f x2 ( ) a11 x = 1 + a 12 x 2 + b 1 a 22 x 1 + a 12 x 1 + b 2 = Ax + b Nicola Beume (LS11) Optimalitätsbedingungen 17.10.2007 16 / 20

Optimum einer quadratischen, konvexen Funktion (3) Berechne globales Optimum an Optimalstelle x f (x ) = f ( A 1 b) = 1 2 ( A 1 b) AA 1 b b A 1 b + c = 1 2 b A 1 b b A 1 b + c = 1 2 b A 1 b + c Nicola Beume (LS11) Optimalitätsbedingungen 17.10.2007 17 / 20

Gradientenverfahren f (x) zeigt in Richtung des stärksten Anstiegs an der Stelle x f (x) stärkster Abstieg Idee: iterative Annäherung an lokales Optimum durch Verfolgen des neg. Gradienten Schrittweite s R +, Hochzahl bezeichnet Iterationsschritt x (k+1) = x (k) s (k) f (x (k) ) f (x (k) ) = x (k) s (k) d (k) Vektorlänge normiert optimale Schrittweite: wähle s (k) so, dass f (x (k) s (k) d (k) ) = min{f (x (k) s (k) d (k) : s R + } Wie macht man das? Fortsetzung am Montag... Nicola Beume (LS11) Gradientenverfahren 17.10.2007 18 / 20

Zusammenfassung Heute (mindestens) gelernt: Grundlagen der Differenzialrechnung... sind immer noch wie in der Schule Partielle Ableitungen... betrachten nicht abzuleitende Variablen als Konstanten existieren nicht immer sind oft nicht zum Auffinden einer Optimalstelle geeignet Konvexe, quadratische Funktion... sind harmlos haben als lokale Optima nur das globale Optimum lassen sich mit Hilfe partieller Ableitungen lösen Gradientenverfahren... sind Optimierverfahren arbeiten iterativ benötigen eine passend gewählte Schrittweite Nicola Beume (LS11) Zusammenfassung 17.10.2007 19 / 20

Literatur zu Analysis H. Heuser: Lehrbuch der Analysis (Teil 1) 3. Auflage, Teubner, Stuttgart, 1984 (Kapitel 6) H. Heuser: Lehrbuch der Analysis (Teil 2) 9. Auflage, Teubner, Stuttgart, 1995 (Kapitel 20) (ganz viele andere) Nicola Beume (LS11) Literatur 17.10.2007 20 / 20