Statistics, Data Analysis, and Simulation SS 2015


Mainz, June 11, 2015
Statistics, Data Analysis, and Simulation SS 2015
08.128.730 Statistik, Datenanalyse und Simulation
Dr. Michael O. Distler <distler@uni-mainz.de>

Statistical hypothesis testing
So far: statistical analysis of a data sample in order to extract unknown parameters.
Now: we have prior assumptions about the values of those parameters, i.e. a hypothesis.
We need to check such hypotheses; the procedure is called a statistical test.
Caveat: a test can never prove a hypothesis to be true. However, one can reject a hypothesis on the basis of observations. The degree of statistical compatibility is quantified using confidence limits.

The testing process
There is an initial research hypothesis whose truth is unknown.
The first step is to state the relevant null and alternative hypotheses. This is important, as mis-stating the hypotheses will muddy the rest of the process.
The second step is to consider the statistical assumptions being made about the sample in doing the test, for example assumptions about the statistical independence or about the form of the distributions of the observations. This is equally important, as invalid assumptions mean that the results of the test are invalid.
Decide which test is appropriate, and state the relevant test statistic T.
Derive the distribution of the test statistic under the null hypothesis from the assumptions. In standard cases this will be a well-known result. For example, the test statistic might follow a Student's t distribution or a normal distribution.

The testing process
Select a significance level $\alpha$, a probability threshold below which the null hypothesis will be rejected. Common values are 5% and 1%.
The distribution of the test statistic under the null hypothesis partitions the possible values of T into those for which the null hypothesis is rejected (the so-called critical region) and those for which it is not. The probability of the critical region is $\alpha$.
Compute from the observations the observed value $t_{obs}$ of the test statistic T.
Decide either to reject the null hypothesis in favor of the alternative or not to reject it. The decision rule is to reject the null hypothesis $H_0$ if the observed value $t_{obs}$ is in the critical region, and otherwise to fail to reject it (loosely, to "accept" it).
Source: http://en.wikipedia.org/wiki/statistical_hypothesis_testing
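The decision rule with a critical region can be sketched in a few lines. This is only an illustration under an assumption that is not part of the slides: the test statistic is taken to follow a standard normal distribution under $H_0$, and the critical region is two-sided, $|t_{obs}| > z_c$. The function names are hypothetical.

```python
from statistics import NormalDist


def critical_value_two_sided(alpha: float) -> float:
    """Boundary z_c of the critical region |T| > z_c.

    Assumption for this sketch: T is standard normal under H0, so the
    critical region has total probability alpha (alpha/2 in each tail).
    """
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def reject_null(t_obs: float, alpha: float = 0.05) -> bool:
    """Decision rule: reject H0 if t_obs lies in the critical region."""
    return abs(t_obs) > critical_value_two_sided(alpha)


# Two hypothetical observed values of the test statistic.
for t_obs in (1.2, 2.5):
    print(f"t_obs = {t_obs}: reject H0 at 5%? {reject_null(t_obs)}")
```

For a different test statistic only the quantile function changes; the decision rule itself stays the same.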

The testing process
An alternative process is commonly used:
1. Compute from the observations the observed value $t_{obs}$ of the test statistic T.
2. Calculate the p-value. This is the probability, under the null hypothesis, of sampling a test statistic at least as extreme as the one that was observed.
3. Reject the null hypothesis in favor of the alternative hypothesis if and only if the p-value is less than the significance level $\alpha$ (the selected probability threshold).
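The three steps of the p-value procedure can be illustrated with a small counting experiment. The setup is hypothetical and chosen only for illustration: a subject guesses one of four equally likely symbols in 25 trials and gets 10 right, so under $H_0$ (pure guessing) the number of hits is binomial with $p_0 = 1/4$. "At least as extreme" is taken to mean "at least as many hits".

```python
from math import comb


def binomial_p_value(n: int, k_obs: int, p0: float) -> float:
    """One-sided p-value P(X >= k_obs) for X ~ Binomial(n, p0)."""
    return sum(
        comb(n, k) * p0**k * (1.0 - p0) ** (n - k)
        for k in range(k_obs, n + 1)
    )


alpha = 0.05                      # selected significance level
k_obs = 10                        # step 1: observed test statistic (hypothetical)
p_value = binomial_p_value(n=25, k_obs=k_obs, p0=0.25)   # step 2
reject = p_value < alpha          # step 3

print(f"p-value = {p_value:.4f}, reject H0 at {alpha:.0%}: {reject}")
```

With these numbers the p-value is about 7%, so the null hypothesis is not rejected at the 5% level.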

Clairvoyance example

Chi-square distribution
If $x_1, x_2, \ldots, x_n$ are independent random variables distributed according to the standard Gaussian distribution with mean 0 and variance 1, then the sum
\[ u = \chi^2 = \sum_{i=1}^{n} x_i^2 \]
is distributed according to a $\chi^2$ distribution $f_n(u) = f_n(\chi^2)$, where $n$ is called the number of degrees of freedom:
\[ f_n(u) = \frac{\frac{1}{2} \left( \frac{u}{2} \right)^{n/2 - 1} e^{-u/2}}{\Gamma(n/2)} \]
The $\chi^2$ distribution has its maximum at $u = n - 2$ (for $n \ge 2$). The mean is $n$ and the variance is $2n$.
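A quick numerical check of these statements is possible with the standard library alone. The sketch below evaluates the density $f_n(u)$ and draws samples of $u$ as sums of squared standard Gaussian numbers; the seed, the sample size, and the function names are arbitrary choices made for this illustration.

```python
import math
import random


def chi2_pdf(u: float, n: int) -> float:
    """Density f_n(u) of the chi-square distribution, defined here for u > 0."""
    if u <= 0.0:
        return 0.0
    return 0.5 * (u / 2.0) ** (n / 2.0 - 1.0) * math.exp(-u / 2.0) / math.gamma(n / 2.0)


def sample_chi2(n: int, rng: random.Random) -> float:
    """One draw of u = sum of squares of n independent N(0, 1) variables."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))


rng = random.Random(12345)        # fixed seed, arbitrary
n = 5
samples = [sample_chi2(n, rng) for _ in range(20000)]

mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / (len(samples) - 1)

print(f"sample mean     = {mean:.2f} (expected {n})")
print(f"sample variance = {var:.2f} (expected {2 * n})")
print(f"f_n(n - 2)      = {chi2_pdf(n - 2, n):.4f} (maximum of the density)")
```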

Chi-square distribution
[Figure: probability density functions pdf(n, x) of the $\chi^2$ distribution for n = 2, ..., 9, plotted for x from 0 to 10.]

Chi-square cumulative distribution function
The probability for $\chi^2_n$ to take on a value in the interval $[0, x]$.
[Figure: cumulative distribution functions cdf(n, x) of the $\chi^2$ distribution for n = 2, ..., 9, plotted for x from 0 to 10.]

Chi-square distribution with 5 d.o.f.
[Figure: density of the $\chi^2$ distribution with 5 degrees of freedom, plotted for x from 0 to 14; the central 95% confidence interval is [0.831, 12.83].]
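The quoted interval can be reproduced numerically as the 2.5% and 97.5% quantiles of the $\chi^2$ distribution with 5 degrees of freedom. The sketch below uses only the standard library: the cumulative distribution is computed from the series expansion of the regularized incomplete gamma function, and the quantiles are found by bisection. Both choices, and all function names, are assumptions of this illustration rather than the method used for the slide.

```python
import math


def chi2_cdf(x: float, n: int) -> float:
    """P(chi2_n <= x) via the series for the regularized lower incomplete gamma."""
    if x <= 0.0:
        return 0.0
    a, z = n / 2.0, x / 2.0
    term = 1.0 / a                # k = 0 term of sum_k z^k / (a (a+1) ... (a+k))
    total = term
    k = 1
    while term > 1e-16 * total and k < 10000:
        term *= z / (a + k)
        total += term
        k += 1
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))


def chi2_quantile(p: float, n: int) -> float:
    """Value x with chi2_cdf(x, n) = p, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, n) < p:    # enlarge the bracket until it contains the quantile
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, n) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


lower = chi2_quantile(0.025, 5)
upper = chi2_quantile(0.975, 5)
print(f"central 95% interval for 5 d.o.f.: [{lower:.3f}, {upper:.2f}]")
```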

Student's t-test
A t-test is any statistical hypothesis test in which the test statistic follows a Student's t distribution under the null hypothesis. A common example is the one-sample location test of whether the mean of a normally distributed population has the value specified in the null hypothesis.
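For the one-sample location test, the statistic is $t = (\bar{x} - \mu_0) / (s / \sqrt{n})$ with $n - 1$ degrees of freedom, where $s$ is the sample standard deviation. The sketch below applies it to a made-up sample; the data, the value of $\mu_0$, and the function name are hypothetical. The critical value 2.776 is the tabulated two-sided 95% quantile for 4 degrees of freedom.

```python
import math
from statistics import mean, stdev


def one_sample_t(sample: list[float], mu0: float) -> tuple[float, int]:
    """Return (t, dof) for testing H0: population mean == mu0."""
    n = len(sample)
    t = (mean(sample) - mu0) / (stdev(sample) / math.sqrt(n))
    return t, n - 1


sample = [5.1, 4.9, 5.6, 5.8, 6.0]   # hypothetical measurements
t_obs, dof = one_sample_t(sample, mu0=5.0)

T_CRIT_95_DOF4 = 2.776               # two-sided 95% critical value, 4 d.o.f.
reject = abs(t_obs) > T_CRIT_95_DOF4

print(f"t_obs = {t_obs:.3f} with {dof} d.o.f., reject H0 at 5%: {reject}")
```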

t distribution
The t distribution arises in tests of the statistical compatibility of a sample mean $\bar{x}$ with a given mean $\mu$, or of the statistical compatibility of two sample means. The probability density of the t distribution with $n$ degrees of freedom is
\[ f_n(t) = \frac{1}{\sqrt{n\pi}} \, \frac{\Gamma((n+1)/2)}{\Gamma(n/2)} \left( 1 + \frac{t^2}{n} \right)^{-(n+1)/2} \]
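The density can be evaluated directly with `math.gamma`, and integrating it numerically gives the probabilities needed for a t-test. The following sketch uses Simpson's rule together with the symmetry of the density; the integration scheme, the step count, and the function names are choices made for this illustration.

```python
import math


def t_pdf(t: float, n: int) -> float:
    """Density f_n(t) of Student's t distribution with n degrees of freedom."""
    norm = math.gamma((n + 1) / 2.0) / (math.sqrt(n * math.pi) * math.gamma(n / 2.0))
    return norm * (1.0 + t * t / n) ** (-(n + 1) / 2.0)


def t_cdf(t: float, n: int, steps: int = 2000) -> float:
    """P(T <= t) = 1/2 + integral from 0 to t of f_n, by Simpson's rule.

    steps must be even; a negative t gives a negative step width and
    therefore a negative integral, as required by symmetry.
    """
    if t == 0.0:
        return 0.5
    h = t / steps
    total = t_pdf(0.0, n) + t_pdf(t, n)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * t_pdf(i * h, n)
    return 0.5 + total * h / 3.0


# Two-sided p-value for a hypothetical observed value with 4 d.o.f.
t_obs, dof = 2.776, 4
p_value = 2.0 * (1.0 - t_cdf(abs(t_obs), dof))
print(f"P(T <= {t_obs}) = {t_cdf(t_obs, dof):.4f}, two-sided p-value = {p_value:.4f}")
```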

t distribution
[Figure: Student's t distributions $f(t)$ (left) compared with the standard Gaussian distribution (dashed), and the integrated Student's t distributions $\int_{-\infty}^{t} f(x)\,dx$ (right).]

t distribution
[Table: quantiles of the t distribution, $P = \int_{-\infty}^{t} f_n(x)\,dx$.]

F distribution
Given are $n_1$ sample values of a random variable $x$ and $n_2$ sample values of the same random variable. Let the best estimates of the variances from the two data sets be $s_1^2$ and $s_2^2$. The random number
\[ F = \frac{s_1^2}{s_2^2} \]
then follows an F distribution with $(n_1, n_2)$ degrees of freedom. By convention, F is always greater than one, i.e. the larger variance estimate is placed in the numerator. The probability density of F is
\[ f(F) = \left( \frac{n_1}{n_2} \right)^{n_1/2} \frac{\Gamma((n_1 + n_2)/2)}{\Gamma(n_1/2)\,\Gamma(n_2/2)} \, F^{(n_1 - 2)/2} \left( 1 + \frac{n_1}{n_2} F \right)^{-(n_1 + n_2)/2} \]
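A minimal sketch of both ingredients follows: the variance ratio with the larger estimate in the numerator, and the density $f(F)$ with $n_1$ and $n_2$ taken as the degrees of freedom, as in the formula above. The two samples are made up for illustration and the function names are hypothetical. Logarithms are used internally only to avoid overflow for large arguments.

```python
import math
from statistics import variance


def f_pdf(F: float, n1: float, n2: float) -> float:
    """Density of the F distribution with (n1, n2) degrees of freedom, for F > 0."""
    if F <= 0.0:
        return 0.0
    log_norm = (
        0.5 * n1 * math.log(n1 / n2)
        + math.lgamma(0.5 * (n1 + n2))
        - math.lgamma(0.5 * n1)
        - math.lgamma(0.5 * n2)
    )
    log_shape = 0.5 * (n1 - 2.0) * math.log(F) - 0.5 * (n1 + n2) * math.log1p(n1 * F / n2)
    return math.exp(log_norm + log_shape)


def variance_ratio(sample_a: list[float], sample_b: list[float]) -> float:
    """F = s1^2 / s2^2 with the larger variance estimate in the numerator."""
    var_a, var_b = variance(sample_a), variance(sample_b)
    return max(var_a, var_b) / min(var_a, var_b)


# Hypothetical samples of the same quantity from two measurement series.
series_a = [4.8, 5.3, 5.1, 4.6, 5.4, 5.0]
series_b = [5.9, 4.1, 5.6, 4.4, 5.2, 4.7, 5.8]

F_obs = variance_ratio(series_a, series_b)
print(f"F = {F_obs:.2f}")
print(f"f(1.0) for (2, 2) d.o.f. = {f_pdf(1.0, 2, 2):.3f}")
```

The observed ratio would then be compared with a tabulated quantile of the F distribution at the chosen confidence level.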

[Tables: quantiles of the F distribution for confidence levels 0.68, 0.90, 0.95, and 0.99.]

5.3 Kolmogorov-Smirnov test
This test is sensitive to differences in the global shape or in trends of distributions. Let the theoretical probability density $f(x)$ and its distribution function $F(x) = \int_{-\infty}^{x} f(x')\,dx'$ be given. The $n$ sample values $x_i$ are sorted by size and the cumulative quantity is formed:
\[ F_n(x) = \frac{\text{number of } x_i \text{ values} \le x}{n} \]
The test statistic is
\[ t = \sqrt{n} \, \max_x \left| F_n(x) - F(x) \right| \]

Kolmogorov-Smirnov test
The probability $P$ of obtaining a value $\le t_0$ for the test statistic $t$ is
\[ P = 1 - 2 \sum_{k=1}^{\infty} (-1)^{k-1} e^{-2 k^2 t_0^2} \]
Values for practical use:

P      1%    5%    50%   68%   95%   99%   99.9%
t_0    0.44  0.50  0.83  0.96  1.36  1.62  1.95
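The series converges quickly for the values of $t_0$ that matter in practice, so it can be summed directly. In the sketch below the truncation after 100 terms and the function name are choices made for this illustration.

```python
import math


def ks_probability(t0: float, terms: int = 100) -> float:
    """P(t <= t0) = 1 - 2 * sum_{k=1..terms} (-1)^(k-1) * exp(-2 k^2 t0^2)."""
    series = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * t0 * t0)
        for k in range(1, terms + 1)
    )
    return 1.0 - 2.0 * series


for t0 in (0.83, 1.36, 1.62):
    print(f"t0 = {t0:.2f}: P = {ks_probability(t0):.3f}")
```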

Kolmogorov-Smirnov test
Example: the data 7, -1, 8, 5, 6 are assumed to have been drawn from a normal distribution with $\mu = 5$ and $\sigma = 2$. The test statistic is $t = \sqrt{5} \cdot 0.3 = 0.67$.
[Figure: distribution function F(x) versus the random variable x.]
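The value of the test statistic in this example can be checked with a short script. Because the empirical distribution function is a step function, the sketch compares $F(x)$ with $F_n$ just below and just at each sorted data point. The function name is hypothetical, and `statistics.NormalDist` supplies the normal distribution function.

```python
import math
from statistics import NormalDist


def ks_statistic(data, cdf):
    """Return (max |F_n - F|, t = sqrt(n) * max |F_n - F|)."""
    xs = sorted(data)
    n = len(xs)
    d_max = 0.0
    for i, x in enumerate(xs):
        F = cdf(x)
        # F_n jumps from i/n to (i+1)/n at x; check both sides of the step.
        d_max = max(d_max, abs(F - i / n), abs((i + 1) / n - F))
    return d_max, math.sqrt(n) * d_max


data = [7, -1, 8, 5, 6]
d_max, t = ks_statistic(data, NormalDist(mu=5, sigma=2).cdf)
print(f"max |F_n - F| = {d_max:.2f}, t = {t:.2f}")
```

The largest deviation occurs at $x = 5$, where $F(5) = 0.5$ while $F_n$ is still 0.2 just below the step.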
