Wir schätzen Leben! seit Jahren
60. Biometrisches Kolloquium der Deutschen Region der Internationalen Biometrischen Gesellschaft (IBS-DR)
10-13 March 2014, Bremen
Abstract volume

Contents
Part I Talks
Part II Posters
Part III List of Speakers

Part I Talks

Detection of nano-objects in grey-scale images and image sequences using robust time-series methods for structural-break detection
Abbas, Sermad; TU Dortmund, Germany
Abstract. The PAMONO biosensor (Plasmon Assisted Microscopy of Nano-Objects) serves the indirect detection of nano-objects such as biological viruses. It produces a sequence of grey-scale images of the sensor surface. An adhering object changes the grey value in the image sequence from the time of adhesion onwards. For the time series of grey values at the affected pixel coordinates this corresponds to a structural break. The aim of this work [1] is the automatic detection of the adhering nano-objects. A widespread approach in the literature [2] is based on comparing the individual time series with a template function. The identified candidate pixels are merged into segments with a marching-squares algorithm; these segments serve as adhesion candidates. In contrast, in this work two-sample tests in sliding time windows are applied at the level of the individual time series to detect the structural breaks. The resulting candidate pixels are segmented using the estimated break time and their spatial position. In a simulation study, several parametric and non-parametric two-sample tests are compared with respect to their ability to detect the structural breaks. The proposed detection method is then compared with a reference method based on the approach from the literature. The method presented here achieves detection rates similar to the reference method, while the adhesion time is on average estimated considerably more accurately. Overall, using two-sample tests to identify candidate pixels is a promising new approach for analysing PAMONO data.
References
1. Abbas, S. (2013). Detektion von Nanoobjekten in Graustufenbildern und Bildsequenzen mittels robuster Zeitreihenmethoden zur Strukturbrucherkennung. Master's thesis, Technische Universität Dortmund.
2. Timm, C., Libuschewski, P., Siedhoff, D., Weichert, F., Müller, H. and Marwedel, P. (2011). Improving nanoobject detection in optical biosensor data. Proceedings of the 5th International Symposium on Bio- and Medical Information and Cybernetics (BMIC 2011), 2, pp. 236-240.
Young Statisticians, HS, Großer Hörsaal, Wednesday, 08:50-10:10
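
As a minimal illustration of the sliding-window idea (hypothetical data, window length and significance threshold; not the authors' implementation, which compares several parametric and non-parametric tests):

    # Sketch: detect a jump in one pixel's grey-value time series with a
    # moving two-sample Wilcoxon-Mann-Whitney test (illustrative only).
    import numpy as np
    from scipy.stats import mannwhitneyu

    def detect_changepoint(series, window=20, alpha=1e-3):
        """Compare adjacent windows; return the first index flagged as a break."""
        n = len(series)
        for t in range(window, n - window):
            left = series[t - window:t]
            right = series[t:t + window]
            stat, p = mannwhitneyu(left, right, alternative="two-sided")
            if p < alpha:
                return t  # estimated adhesion time
        return None

    rng = np.random.default_rng(1)
    signal = np.concatenate([rng.normal(100, 2, 300), rng.normal(103, 2, 300)])
    print(detect_changepoint(signal))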

Adjusted excess length-of-stay in hospital due to hospital-acquired infections
Allignol, Arthur 1; Schumacher, Martin 2; Harbarth, Stephan 3; Beyersmann, Jan 1
1: Institute for Statistics, Ulm University, Germany; 2: Institute of Medical Biometry and Medical Informatics, University Medical Center Freiburg, Germany; 3: Infection Control Program, University of Geneva Hospitals and Faculty of Medicine, Geneva, Switzerland
Abstract. The occurrence of a hospital-acquired infection (HAI) is a major complication having severe consequences both in terms of mortality and morbidity. HAI prompts a prolonged length of stay (LoS) in hospital, which is one of the main drivers of the extra costs induced by HAIs. The excess LoS due to HAI is often used in cost-benefit studies that weigh the costs of infection control measures such as isolation rooms against the costs raised by HAIs. Estimation of excess LoS is complicated by the fact that the occurrence of HAI is time-dependent. Cox proportional hazards models that include HAI as a time-dependent variable could be used, but they do not allow a direct quantification of the extra days spent in hospital following an infection. Within the multistate model framework, Schulgen and Schumacher (1) proposed to quantify the excess LoS by comparing the mean LoS given current HAI status. However, the impact of covariates on the excess LoS can only be studied through stratified analyses or through Cox models for all the transition hazards of the multistate model. A predicted excess LoS can then be computed based on the predicted transition probabilities. As an alternative, we fit a direct regression model to the excess LoS using the flexible pseudo-values regression technique (2). Motivated by a recent study on hospital-acquired infection, we investigate the use of pseudo-values regression for identifying risk factors that influence the excess LoS. The proposed model is also compared to the predicted excess LoS obtained through Cox models.
References
1. Allignol, A., Schumacher, M. and Beyersmann, J. (2011). Estimating summary functionals in multistate models with an application to hospital infection data. Computational Statistics, 26(2):181-197.
2. Andersen, P. K. and Perme, M. P. (2010). Pseudo-observations in survival analysis. Statistical Methods in Medical Research, 19(1):71-99.
Survival Methods, GW2, B 2.900, Thursday, 08:50-10:10
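
For readers unfamiliar with the pseudo-value technique mentioned above, the jackknife construction underlying it can be sketched as follows (general form as in Andersen and Perme [2], not specific to this study):

    \hat{\theta}_i \;=\; n\,\hat{\theta} \;-\; (n-1)\,\hat{\theta}^{(-i)}, \qquad i = 1,\dots,n,

where \hat{\theta} is the estimator of the expected (excess) length of stay based on all n patients and \hat{\theta}^{(-i)} is the same estimator with patient i removed; the pseudo-values \hat{\theta}_i are then used as responses in a standard regression model.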

The quality of research publications: past, present and future
Altman, Doug; University of Oxford, United Kingdom
Abstract. In the early part of the last century some enlightened scientists and clinicians began to realise the critical importance of statistics in medical research. There was a rapid move from very little use of statistics to a great deal. Unfortunately that rise was not accompanied by a comparable improvement in statistical thinking. Complaints about misuse of statistics have become ever more common. Many of the concerns relate to misuse of relatively simple methods and failure to understand basic principles, although the ever greater use of complex methods has brought additional problems. On top of those issues, the last 50 years have seen a growing recognition of widespread deficiencies in how research is reported. Major concerns are the selective reporting of research findings, leading to bias, and inadequate reporting of research methods and findings that prevents readers using the information. Well-recognised deficiencies of published research have led to a growing amount of work on reporting guidelines. I will discuss changes over time in the quality and reporting of empirical studies of medical research publications, summarise the current situation, and speculate about how things may change in the coming years. Although the focus is medical research, the problems caused by poor analysis and reporting afflict all areas of science in which empirical data play a major role.
Closing Ceremony, HS, Großer Hörsaal, Thursday, 15:20-16:40

Non-parametric multiple contrast tests with covariates
Asendorf, Thomas; Universitätsmedizin Göttingen, Germany
Abstract. We present simultaneous rank-based analysis of covariance methods in unbalanced designs with independent observations. The hypotheses are formulated in terms of purely non-parametric treatment effects, adjusted by covariates. In particular, we derive rank-based multiple contrast tests and simultaneous confidence intervals, where the individual test decisions and the simultaneous confidence intervals are compatible, meaning individual test decisions can be made through the corresponding simultaneous confidence intervals. The procedures allow for testing arbitrary, purely non-parametric, multiple linear hypotheses (e.g. many-to-one, all-pairs, changepoint or even average comparisons). The procedures are compared with parametric competitors in an extensive simulation study. A real data set illustrates the application.
Nonparametric Statistics, HS, Kleiner Hörsaal, Thursday, 13:30-14:50

Making sense of multivariate data
Bathke, Arne 1,2; Ellis, Amanda 2; Burchett, Woodrow 2; Harrar, Solomon 2
1: Universität Salzburg; 2: University of Kentucky
Abstract. We present a nonparametric method for inference between multivariate data samples. Application is demonstrated using the R package npmv. Unlike in classical MANOVA, multivariate normality is not required for the data. Different response variables may even be measured on different scales (binary, ordinal, quantitative). In addition to permutation tests and F approximations for the overall hypothesis test, we present a multiple testing procedure which identifies significant subsets of response variables and factor levels, while controlling the familywise error rate.
Multivariate Methods, HS, Kleiner Hörsaal, Thursday, 10:40-12:00

Global tests for high-dimensional repeated measures under weak distributional assumptions
Becker, Benjamin; Universität Göttingen, Germany
Abstract. Uncertainty concerning the distribution and the dependency structure of repeated measures poses a major challenge in properly testing hypotheses for clinical studies. High-dimensionality of the data exacerbates this, as the usual robust techniques become unavailable. This talk presents test statistics for multiple samples of repeated measures data that rely only on very weak assumptions: the data do not need to follow a multivariate normal distribution, the sample sizes may be small and may differ among the samples, and the covariance matrices may have any structure. In the first step, the test statistics are constructed in such a way that it can be argued that they are distributed approximately in the same way as if the data were normally distributed. This does not involve asymptotics but a finite error bound for an asymptotic result. In the second step, the distribution of the test statistics under the null hypothesis is calculated in a way that involves parameters which can be estimated uniformly consistently for every dimensionality of the repeated measures.
Regression and Repeated Measurement Analysis in Clinical Trials, GW2, B 3.009, Thursday, 13:30-14:50

Dichotomizing the survival endpoint in biomarker screening: a good idea?
Becker, Natalia; Benner, Axel; DKFZ, Germany
Abstract. When evaluating potentially prognostic molecular markers from a large list of candidates with respect to time-to-event outcomes, a common approach is to test each of the markers individually using univariate Cox regression models [1,2]. Increasingly often, survival times are dichotomized into poor and good prognosis groups to convert the prediction problem into a classification task [3]. This typically appears in biomedical research when large numbers of biomarkers are tested, as is done in microarray marker screening. We therefore aimed to evaluate the effect of dichotomization on the result of screening high-dimensional molecular data with respect to survival prognosis. Royston et al. [4] discussed in detail the impact of dichotomization of continuous predictors in multiple regression and showed that this data simplification overestimates effects. To our knowledge, the comparison of dichotomized surrogate outcomes and survival endpoints has not been done so far. To assess the finite-sample performance of screening procedures based on dichotomized survival time data we performed a simulation study. Here we also considered reasons for dichotomizing survival times, including violation of the proportional hazards assumption, cure, informative censoring, etc. We compared Cox regression models for the original survival times with approaches based on logistic regression and nearest shrunken centroid classification for dichotomized endpoints.
References
1. Tibshirani R. Univariate Shrinkage in the Cox Model for High Dimensional Data. SAGMB 2009; 8: 1-18.
2. Fan J, Feng Y, Wu Y. High-dimensional variable selection for Cox's proportional hazards model. In IMS Collections 2010; 6: 70-86.
3. van 't Veer LJ, Dai H, van de Vijver MJ, He YD, Hart AAM et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature 2002; 415: 530-536.
4. Royston et al. (2006). Dichotomizing continuous predictors in multiple regression: a bad idea. Statistics in Medicine 25(1), 127-41.
New Measures for the Assessment of Risk and Prognostic Factors, GW2, B 2.900, Tuesday, 09:10-10:10

Choosing efficient stratifications in logistic two-phase studies based on administrative data
Behr, Sigrid; Schill, Walter; Pigeot, Iris; Leibniz-Institut für Präventionsforschung und Epidemiologie - BIPS, Germany
Abstract. A common problem of studies based on secondary data from health insurances is unmeasured confounding. Obtaining additional information from another data source for at least a subsample of the study population resolves this problem. Two-phase designs can be employed for the combined analysis of both data sources. Two-phase designs have been developed for field studies which usually comprise very few covariates in phase 1. In these traditional two-phase studies, the stratification is simply defined by cross-classification of all available phase-1 covariates. Two-phase database studies, however, include a multitude of phase-1 covariates. Cross-classification of all available covariates would lead to a tremendously high number of strata and, due to the restricted size of the phase-2 sample, to empty cells. Since two-phase methods cannot be applied in this situation, new stratification strategies are needed which account for all relevant phase-1 covariates but do not result in empty cells. As an alternative stratification strategy, we propose to use percentiles of a disease score which is derived from a logistic model applied to the phase-1 data. Furthermore, a design criterion is developed based on the weighted likelihood approach to identify efficient stratifications at the planning stage of the study when only phase-1 information is available. Both the new stratification strategy and the design criterion are applied to an empirical two-phase database study investigating the risk of serious bleedings associated with phenprocoumon exposure. Additionally, the novel approaches are evaluated in a simulation study mimicking the empirical study. It is shown that those stratifications assessed as efficient by the design criterion result in the smallest standard errors for the estimates of most phase-1 covariates in the simulation study. The best stratifications regarding unbiased and efficient parameter estimation are defined by cross-classification of variables used for sampling of the phase-2 data and percentiles of the disease score.
Combined Analysis of Multiple Data Sources, GW2, B 2.880, Wednesday, 09:10-10:10
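
A minimal sketch of the proposed stratification step on hypothetical phase-1 data (the design criterion itself is not shown): fit a logistic disease score and form strata from its percentiles.

    # Sketch: disease-score stratification for a two-phase design (illustrative data).
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 5000
    X = rng.normal(size=(n, 10))                                    # phase-1 covariates
    y = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] - 0.5 * X[:, 1]))))  # outcome

    score = LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)[:, 1]
    quintile_edges = np.quantile(score, [0.2, 0.4, 0.6, 0.8])
    stratum = np.digitize(score, quintile_edges)    # five score-based strata
    print(np.bincount(stratum))                     # stratum sizes for phase-2 sampling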

Computing confidence intervals for risk differences with MOVER-R
Bender, Ralf 1; Newcombe, Robert G. 2
1: Ressort "Medizinische Biometrie", Institut für Qualität und Wirtschaftlichkeit im Gesundheitswesen (IQWiG), Köln, Germany; 2: Cochrane Institute of Primary Care & Public Health, Cardiff University, Cardiff, Wales, UK
Abstract. In Cochrane reviews and in the GRADE system, absolute treatment effects are frequently derived from relative risk estimates obtained in a meta-analysis combined with an independently estimated baseline risk. The estimation uncertainty of the baseline risk, however, is usually neglected [1]. This talk shows that the Method of Variance Estimates Recovery (MOVER-R) proposed by Newcombe [2] can be used to compute confidence intervals for risk differences that take both sources of uncertainty into account. The MOVER-R method is explained and an Excel spreadsheet is presented with which the necessary calculations can be carried out [3]. Using examples, the impact of accounting for both sources of uncertainty on the confidence intervals for absolute treatment effects is discussed.
References
1. Spencer, F.A., Iorio, A., You, J., Murad, M.H., Schünemann, H.J., Vandvik, P.O., Crowther, M.A., Pottie, K., Lang, E.S., Meerpohl, J.J., Falck-Ytter, Y., Alonso-Coello, P. & Guyatt, G.H. (2012): Uncertainties in baseline risk estimates and confidence in treatment effects. BMJ 345, e7401.
2. Newcombe, R.G. (2014): MOVER-R confidence intervals for ratios and products of two independently estimated quantities. Stat. Methods Med. Res. 23 (in press).
3. Newcombe, R.G. & Bender, R. (2014): Implementing GRADE: calculating the risk difference from the baseline risk and the relative risk. Evid. Based Med. 19 (in press).
Meta-Analysis, GW2, B 2.880, Tuesday, 13:30-14:50
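
As background, the basic MOVER construction for the difference of two independently estimated quantities with estimates \hat\theta_1, \hat\theta_2 and confidence limits (l_1, u_1), (l_2, u_2) is

    L = \hat\theta_1 - \hat\theta_2 - \sqrt{(\hat\theta_1 - l_1)^2 + (u_2 - \hat\theta_2)^2},
    \qquad
    U = \hat\theta_1 - \hat\theta_2 + \sqrt{(u_1 - \hat\theta_1)^2 + (\hat\theta_2 - l_2)^2}.

MOVER-R [2] extends this variance-recovery idea to ratios and products, which is what is needed when the risk difference is formed from a baseline risk and a relative risk; the formula above is shown only to illustrate the principle, not the exact spreadsheet calculation.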

Epidemiology in Germany: 60+
Blettner, Maria; IMBEI, Germany
Abstract. Until the 1980s, epidemiological research in Germany was shaped above all by the DHP study (Deutsche Herz-Kreislauf-Präventionsstudie), a large epidemiological intervention study. However, as early as 1964 an article appeared in Das Deutsche Gesundheitswesen on the need for epidemiologic cancer research. In this talk I will tell some anecdotes about how epidemiology in Germany has developed over the last 60 years: from the foundation of a small working group on epidemiological methods within the Biometric Society at the beginning of the 1990s, the workshop in Bremen in 1992 and the foundation of the Deutsche Arbeitsgemeinschaft für Epidemiologie (DAE) in 1963 under the auspices of the three societies (GMDS, DGSMP, IBS-DR), up to the large initiative of the national cohort. One focus will be on the development of the training of epidemiologists. While in the 1980s there were almost only "self-made" epidemiologists, the DAAD later set up a programme that allowed many scientists to study epidemiology abroad. Only in 1999 could a dedicated Master of Epidemiology be established in Germany (Berlin, Bielefeld, Munich). Today, well over 100 people trained in Germany work in various areas of research, politics, public authorities and also in international organisations. The history of epidemiology in recent years is thus a success story, and I am delighted to be able to tell this story at the birthday of the Biometric Society.
IBS 60+, HS, Großer Hörsaal, Wednesday, 13:30-14:50

Exact SNP-based calculation of the additive and dominance genetic variation of offspring in mating planning
Bonk, Sarah; Teuscher, Friedrich; Reinsch, Norbert; Leibniz-Institut für Nutztierforschung, Germany
Abstract. In a typical mating situation, several bulls are available for a given cow. The best mating partner can be chosen using the distribution of the trait of interest among the offspring. This requires the expected value and the variation of the trait (the Mendelian sampling variance) for each mating. This contribution presents an approach for the exact calculation of the Mendelian sampling variance based on the parental diplotypes. The method sets up the correlation matrix between the additive genetic marker effects, which is needed to compute the variance. In general, the correlation matrix can be computed not only for additive but also for dominance genetic effects. This contribution therefore also deals with the computation and the most important properties of the correlation matrix. Finally, the results of the approach described here (computing the variance via the correlation matrix) are compared with a method already used in practice [1], which is based on the simulation of gametes.
References
1. Segelke et al. Prediction of expected variation in offspring groups and its application to mating decisions (submitted).
Mixed Models and Extensions for Breeding Experiments and Epidemiological Studies, GW2, B 2.900, Wednesday, 15:20-16:20
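
Schematically, and only as an illustration of the role of the correlation matrix described above (notation not taken from the paper), the Mendelian sampling variance of an offspring's additive genetic value is a quadratic form,

    \sigma^2_{\mathrm{MS}} \;=\; \mathbf{a}^{\top}\mathbf{V}\,\mathbf{a}
    \;=\; \sum_{j,k} a_j\, a_k\, \sigma_j\, \sigma_k\, r_{jk},

where \mathbf{a} collects the additive marker effects, \mathbf{V} is the covariance matrix of the transmitted marker genotypes given the parental diplotypes, and r_{jk} are the corresponding correlations.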

Interpretation of linear regression coefficients under model misspecification
Brannath, Werner; Scharpenberg, Martin; Universität Bremen, Germany
Abstract. Linear regression is an important and frequently used tool in medical and epidemiological research. However, its validity and interpretability rely on strong model assumptions. While robust estimates of the coefficients' covariance matrix extend the validity of hypothesis tests and confidence intervals, a clear and simple interpretation of the regression coefficients is lacking under model misspecification. To overcome this deficiency, we suggest a new mathematically rigorous interpretation of the regression coefficients which is independent of specific model assumptions. Instead of asking for the effect of covariate changes in individuals, we ask for the effects in the population. The idea is to quantify how much the (unconditional) mean of the dependent variable Y can be changed by changing the distribution of the independent variable X. We show that, with a suitable standardization of the distributional changes, the maximum change in the mean of Y is well defined and equals zero if and only if the conditional mean of Y given X is independent of X. Restriction to linear functions for the distributional change in X provides the link to linear regression. It leads to a conservative approximation of the newly defined and generally non-linear measure of association. The conservative linear approximation can then be estimated by linear regression. We show how the new interpretation can be extended to multiple regression with the goal of adjusting for confounding covariates. We illustrate the utility (and limitations) of the new interpretation by examples and simulations, and we point to perspectives for new regression analysis strategies.
References
1. Martin Scharpenberg (2012). A population-based approach to analyse the influence of covariates. Diploma Thesis in Mathematics, University of Bremen.
New Measures for the Assessment of Risk and Prognostic Factors, GW2, B 2.900, Tuesday, 09:10-10:10

The case-control study: origins, conception and extensions
Breslow, Norman; University of Washington, Seattle
Abstract. The case-control study had its origins in the work of 19th century epidemiologists such as PCA Louis and John Snow. By the mid 20th century it was well established as a rigorous method for studying the etiology of disease. Statisticians contributed enormously to this success by demonstrating invariance of the odds ratio under case-control sampling and the role of logistic regression in data analysis. Conceptualization of the case-control design as involving sampling from an underlying cohort study, whether real or imagined, paved the way for modern extensions. Stratification of the control sample and development of semi-parametric efficient estimates helped epidemiologists to use more of the available data. Viewing the case-control design from the standpoint of survey sampling provided new insights, including the realization that more general binary response and time-to-event models could be fitted to the sampled data. The popularity of the case-control study has brought with it new challenges. Statisticians have a role to play in preventing abuse of their methods by promoting fundamental principles of statistical good practice.
Opening Ceremony, HS, Großer Hörsaal, Tuesday, 10:40-12:00

Using the whole cohort in the analysis of case-control data: application to the Women's Health Initiative
Breslow, Norman 1; Amorim, Gustavo 2; Pettinger, Mary 3; Roussow, Jacques 4
1: University of Washington, Seattle, USA; 2: University of Auckland, New Zealand; 3: Fred Hutchinson Cancer Research Center, Seattle, USA; 4: National Heart, Lung and Blood Institute, Bethesda, MD, USA
Abstract. Standard analyses of data from case-control studies that are nested in a large cohort ignore information available for cohort members not sampled for the substudy. This paper reviews several methods designed to increase estimation efficiency by using more of the data, treating the case-control sample as a two- or three-phase stratified sample. When applied to a study of coronary heart disease among women in the hormone trials of the Women's Health Initiative, modest but increasing gains in precision of regression coefficients were observed depending on the amount of cohort information used in the analysis. The gains were particularly evident for pseudo- or maximum likelihood estimates whose validity depends on the assumed model being correct. Larger standard errors were obtained for coefficients estimated by inverse probability weighted methods that are more robust to model misspecification. Such misspecification may have been responsible for an important difference in one key regression coefficient estimated using the weighted compared with the more efficient methods.
References
1. Stat Biosci (2013) 5:232-249.
Combined Analysis of Multiple Data Sources, GW2, B 2.880, Wednesday, 09:10-10:10

The Wilcoxon-Mann-Whitney test: developments since 1950
Brunner, Edgar; Universitätsmedizin Göttingen, Germany
Abstract. This talk traces the development and generalisation of the idea of the Wilcoxon-Mann-Whitney test from 1950 onwards. It covers the possibilities for fast and simplified computation of the permutation distribution under $H_0^F: F_1 = F_2$, the extension to the case of ties, and the generalisation to more than two samples, including procedures for multiple comparisons and ordered alternatives. The historical development of procedures for testing the more general hypothesis $H_0^p: p = \int F_1 \, dF_2 = 1/2$ is also reported, as well as the derivation of confidence intervals for p. The development of ideas for computing power, relative efficiency and sample size planning is followed from the 1950s up to the present day. The extension of the Wilcoxon-Mann-Whitney idea to paired samples, general repeated measures designs, clustered data and factorial designs can be traced back to the 1950s and 1960s. Wrong turns, mental blocks and their resolution over the last 20 years are also considered. An interesting application to diagnostic studies in medicine goes back to the 1970s, while the more recent literature discusses the applications and interpretation of the relative effect p in medicine, as a more general treatment effect than the rather unrealistic shift effect.
100 Jahre Wilcoxon-Test, HS, Großer Hörsaal, Wednesday, 15:20-16:20
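
For reference, the relative effect p and its standard estimator from two samples $X_{11},\dots,X_{1n_1}$ and $X_{21},\dots,X_{2n_2}$ (allowing for ties, i.e. with $F_1$ taken as the normalised mid-distribution function) are

    p = P(X_1 < X_2) + \tfrac{1}{2} P(X_1 = X_2) = \int F_1 \, dF_2,
    \qquad
    \hat p = \frac{1}{n_1 n_2} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2}
      \Big( \mathbf{1}\{X_{1i} < X_{2j}\} + \tfrac{1}{2}\,\mathbf{1}\{X_{1i} = X_{2j}\} \Big).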

Adaptive and sequential methods based on the average hazard ratio
Brückner, Matthias; Brannath, Werner; Universität Bremen, Germany
Abstract. In clinical trials involving a time-to-event endpoint the hazard ratio is a standard measure of the treatment effect between two groups. Usage of the hazard ratio requires proportional hazards, i.e. their ratio does not depend on time. In many situations, however, this assumption is violated or at least questionable. Kalbfleisch and Prentice [1] proposed the average hazard ratio as the average of the time-dependent hazard ratio over a fixed time interval. This parameter has a meaningful interpretation in the non-proportional hazards case. In adaptive and/or group-sequential clinical trials with long-term follow-up, such as overall survival, it is often desirable to base interim decisions also on correlated short-term data, such as response to treatment. Inflation of the type I error may occur in classical group-sequential and adaptive designs if short-term data of patients whose primary endpoint has not yet been observed are used in the interim decisions. We extend the method of [1] to allow the usage of discrete short-term endpoint data to stop the trial or change the sample size, while maintaining control of the type I error rate. We show that the sequentially computed test statistics have the independent increments property. We show how these results can be used in adaptive enrichment designs with subgroup selection based on discrete surrogate information.
References
1. Kalbfleisch, J.D. and Prentice, R.L. (1981). Estimation of the average hazard ratio. Biometrika, 68(1):105-112.
Adaptive Designs with Several Treatments, Endpoints or Subgroups, GW2, B 3.009, Thursday, 10:40-12:00
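
One common representation of the average hazard ratio over an interval $[0, \tau]$, written here only as a sketch with a generic weight function $w(t)$ (the specific weights follow Kalbfleisch and Prentice [1]), is

    \mathrm{AHR} = \frac{\displaystyle\int_0^{\tau} w(t)\,\frac{h_1(t)}{h_1(t)+h_0(t)}\,dt}
                        {\displaystyle\int_0^{\tau} w(t)\,\frac{h_0(t)}{h_1(t)+h_0(t)}\,dt},

which reduces to the usual hazard ratio whenever $h_1(t)/h_0(t)$ is constant over time.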

Exploring time-dependency of a gene signature for breast cancer patients
Buchholz, Anika; Sauerbrei, Willi; Center for Medical Biometry and Medical Informatics, Medical Center - University of Freiburg, Germany
Abstract. Survival studies with microarray data often focus on identifying a set of genes with significant influence on a time-to-event outcome for building a gene expression signature (i.e. predictor). Most of these predictors are derived using the multivariable Cox proportional hazards (PH) model, assuming that effects are constant over time. However, there might be time-varying effects, i.e. violation of the PH assumption, for the predictor and some of the genes. Ignoring this may lead to false conclusions about their influence. Hence, it is important to investigate for time-varying effects. We will explore a publicly available gene expression data set with time-to-event outcome from breast cancer patients, for which a strong time-dependence of the identified gene signature has been found [1]. However, for the analysis of the time-varying effect, the genetic risk is dichotomized, an approach which is known to have severe disadvantages [2]. To investigate the time-dependency in more detail, we will refine the analysis of the gene signature (and associated individual genes) in several steps to use the full information and to come towards a better understanding of the underlying mechanisms. The shape of time-varying effects will be modelled using fractional polynomials [3]. To improve reporting and transparency, we will illustrate our analyses by a REMARK-type profile [4].
References
1. Desmedt et al. Strong time dependence of the 76-gene prognostic signature for node-negative breast cancer patients in the TRANSBIG multicenter independent validation series. Clin Cancer Res 2007, 13:3207-3214.
2. Royston et al. Dichotomizing continuous predictors in multiple regression: a bad idea. Stat Med 2006, 25:127-41.
3. Sauerbrei et al. A new proposal for multivariable modelling of time-varying effects in survival data based on fractional polynomial time-transformation. Biom J 2007, 49:453-473.
4. Altman et al. Reporting recommendations for tumor marker prognostic studies (REMARK): Explanation and Elaboration. PLoS Med 2012, 9(5):e1001216.
Regression and Repeated Measurement Analysis in Clinical Trials, GW2, B 3.009, Thursday, 13:30-14:50
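
A minimal sketch of how a time-varying effect can be modelled with a fractional-polynomial time transformation (one simple FP1 choice shown for illustration; the approach in [3] selects the transformation from the full FP class):

    h(t \mid x) = h_0(t) \exp\{\beta(t)\, x\},
    \qquad
    \beta(t) = \beta_0 + \beta_1 \log(t),

so that $\beta_1 \neq 0$ indicates a violation of the proportional hazards assumption for the covariate x.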

Measuring the spatial availability of urban point characteristics to explain physical activity in children: evaluation of kernel density and neighbourhood
Buck, Christoph 1; Kneib, Thomas 2; Tkaczick, Tobias 3; Konstabel, Kenn 4; Pigeot, Iris 1
1: Leibniz-Institut für Präventionsforschung und Epidemiologie - BIPS; 2: Georg-August-Universität Göttingen; 3: Institut für Geographie, Universität Bremen; 4: National Institute for Health Development, Tallinn
Abstract. Studies that use GIS-based measures to investigate the influence of the urban environment on physical activity vary in how they operationalise urban characteristics such as street connectivity and the availability of public transport or recreational spaces [1]. The walkability concept offers a standardised framework for capturing activity-friendly neighbourhoods [2], but uses a simple density estimate to capture point characteristics. Viewing these as an inhomogeneous Poisson process $N(A) \sim \mathrm{Poi}(\lambda_A)$, the mean intensity $\lambda_A = \int_A \lambda(s)\,ds$ can optimise the measurement based on network-dependent neighbourhoods A [3]. So far, the choice of a suitable bandwidth for the kernel density estimator, which can, e.g., be fixed via cross-validation or chosen adaptively depending on population density [4], and the appropriate neighbourhood distance defining the spatial context have not been investigated systematically [5]. Within a DFG project, the influence of the urban environment on children's physical activity is modelled; to optimise the methods, both parameters, bandwidth and distance, are to be specified. For this purpose, spatial variables were measured via different kernel density estimators in neighbourhoods of increasing size and entered separately into a log-gamma regression adjusted for individual factors. The data basis includes accelerometry measurements from the baseline survey of the European IDEFICS study for 394 two- to nine-year-old children from the German study region Delmenhorst. Geodata on urban characteristics were compiled from official and open-source databases [3]. Based on model fit (AIC), a suitable kernel density estimate and the distance for the spatial context could thus be determined simultaneously, with the bandwidth having a smaller influence on model fit than the size of the spatial context.
References
1. Brownson et al. (2009). American Journal of Preventive Medicine, 36(4S): S99-S123.
2. Frank et al. (2005). American Journal of Preventive Medicine, 28(2S2): 117-125.
3. Buck et al. (2011). Health & Place, 17(6): 1191-1201.
4. Carlos et al. (2010). International Journal of Health Geographics, 9: 39-46.
5. Kwan et al. (2012). Annals of the Association of American Geographers, 102(5): 958-968.
Spatio-Temporal Methods, HS, Großer Hörsaal, Tuesday, 13:30-14:50

Just more of the same? Statistical challenges in the Million Women Study and other large-scale cohorts
Cairns, Benjamin J.; University of Oxford, United Kingdom
Abstract. In recent decades, large-scale cohort studies have made it possible to obtain reliable estimates of associations between common exposures and diseases. "Large-scale" might mean different things in different contexts, but such studies have in common a greater statistical power, and a greater scope to investigate interactions and potential causal mechanisms, to re-sample for cohort enrichment or for nested sub-studies, and to generate a wealth of new data on rarer exposures and outcomes. However, "with great power comes great responsibility". For example, a statistic that is biased due to confounding, measurement error, multiple testing, etc., remains biased when it is more precisely estimated. Rightly or not, the importance of bias seems amplified when it cannot hide behind uncertainty. In order to realise the full potential of large-scale cohort studies, investigators are obligated to give greater attention to assessing and countering these biases. Other challenges in cohort studies span the range of study design, analysis, and interpretation; in other words, statistics in its broadest sense. Large-scale cohorts usually fail to be representative, for example by recruiting too few individuals with lower socioeconomic status. But a more important question than whether the study is representative is whether study design and analysis can ensure that study results can be generalised to the wider population. Large-scale cohorts have also raised new questions about how best to discuss new findings, such as by presenting estimates as floating absolute risks, rates or cumulative risks, and putting results in context by meta-analysis when prior studies are much smaller. I will discuss these and other challenges in the context of the Million Women Study, a cohort study of 1.3 million UK women, and with reference to other large-scale European cohort studies such as the EPIC cohort, and also UK Biobank, which is open to bona fide researchers worldwide.
Analysis of Epidemiological Mega Studies, HS, Kleiner Hörsaal, Tuesday, 17:10-18:30

Pathogenicity prediction via machine learning
Deneke, Carlus; Renard, Bernhard; Robert Koch Institut, Germany
Abstract. In recent years, the number of sequenced and annotated organisms has continuously increased and a large variety of pathogenic organisms have been collected. It has become a major goal of virologists and bacteriologists to pinpoint those sequences that ultimately induce pathogenicity. Since information from large data sets needs to be processed, methods from bioinformatics and statistical learning are in strong demand to support this process. This contribution reports on the application of machine learning for the classification of virulence-related protein sequences in bacteria. In a first step, we define positive and negative data sets from the Virulence Factor Database and annotated entries in UniProt, respectively. From these data, we generate a variety of features based on protein sequences. These features range from simple count statistics of the occurrence of amino acids to more indirectly inferred information related to the protein structure. We implement various supervised learning strategies and evaluate their respective advantages. Additionally, we experiment with a semi-supervised learning strategy to be able to handle insufficiently annotated protein data, which commonly occurs, especially with regard to newly emerging pathogens, e.g. in zoonoses. We can furthermore discern the most relevant features via feature importance, which in turn aids the biological interpretation of the findings. In conclusion, the approach presented here describes a powerful method to discriminate previously unknown protein sequences with respect to their potential to affect the virulence of an organism.
Resampling and Machine Learning, GW2, B 2.880, Thursday, 10:40-12:00
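
A minimal sketch of the general workflow, with toy sequences and a generic scikit-learn classifier (the feature set and learners in the talk are richer, including semi-supervised strategies):

    # Sketch: amino-acid composition features plus a supervised classifier (toy data).
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    AA = "ACDEFGHIKLMNPQRSTVWY"

    def composition(seq):
        """Relative frequency of each amino acid in the sequence."""
        counts = np.array([seq.count(a) for a in AA], dtype=float)
        return counts / max(len(seq), 1)

    pos = ["MKKLLIAGALLL", "MKRISTTITTTI"]   # toy virulence-associated sequences
    neg = ["MAAEEGGSSVVD", "MDDEEQQNNPPS"]   # toy negative sequences
    X = np.array([composition(s) for s in pos + neg])
    y = np.array([1] * len(pos) + [0] * len(neg))

    clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
    print(clf.feature_importances_.round(2))  # which residues drive the prediction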

The multistate model as nested competing risks experiments, with an application to hematopoietic cell transplantation data
Di Termini, Susanna 1; Schmoor, Claudia 2; Schumacher, Martin 1; Beyersmann, Jan 3
1: Institut für Medizinische Biometrie und Medizinische Informatik, Universitätsklinikum Freiburg; 2: Universitätsklinikum Freiburg, Studienzentrum; 3: Institut für Statistik, Universität Ulm
Abstract. We consider the situation of a randomized study on hematopoietic cell transplantation (HSCT) in leukemia patients. After HSCT, patients may experience different complications, either because they contract acute or chronic graft-versus-host disease (GVHD) or because they have to undergo immunosuppressive therapy (IST) [1]. This therapy is performed to prevent the body from rejecting the transplant and represents an indicator of graft-versus-host disease burden in HSCT studies. Since patients may repeatedly switch back and forth between the states IST and no IST until death or censoring, we consider a multistate structure. Mimicking the study data, we simulate the multistate pattern through a series of nested competing risks experiments. This procedure, in analogy to the generation of competing risks data, generates event time and event type by using cause-specific hazard functions as building blocks. The algorithm is based on the following key ideas: the time until moving out of the individual's initial state is determined by the all-cause hazard; the state entered at the transition time is determined by a binomial experiment and the type of event by the relative magnitude of the cause-specific hazard functions. Further competing risks experiments are carried out until an absorbing state is reached. We demonstrate how the algorithm generates data as in our motivating study. We also use this simulation approach for studying resampling-based methods of inference for complex outcome probabilities such as the probability to be alive and free of immunosuppressive treatment, which is estimable neither by Kaplan-Meier nor by competing risks methodology.
References
1. Schmoor, C., Schumacher, M., Finke, J., Beyersmann, J. Competing risks and multistate models. Clinical Cancer Research 2013, 19:12-21.
Evaluation of Surrogate Endpoints and Multistate Models in Clinical Trials, HS, Großer Hörsaal, Tuesday, 15:20-16:40
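
A minimal sketch of the nested competing-risks algorithm described above, assuming constant (hypothetical) cause-specific hazards; the study simulation uses the fitted cause-specific hazard functions instead:

    # Sketch: simulate one multistate path by nested competing risks experiments.
    import numpy as np

    rng = np.random.default_rng(42)

    def simulate_path(hazards_by_state, start="no IST", t_max=365.0):
        """hazards_by_state maps a transient state to {target state: constant hazard}."""
        t, state, path = 0.0, start, []
        while state in hazards_by_state and t < t_max:
            haz = hazards_by_state[state]
            total = sum(haz.values())
            t += rng.exponential(1.0 / total)           # waiting time from all-cause hazard
            probs = [h / total for h in haz.values()]   # event type by relative hazard magnitude
            state = rng.choice(list(haz.keys()), p=probs)
            path.append((t, state))                     # administrative censoring at t_max ignored
        return path

    hazards = {
        "no IST": {"IST": 0.010, "death": 0.001},
        "IST":    {"no IST": 0.008, "death": 0.003},
    }
    print(simulate_path(hazards))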

Randomized p-values for multiple testing of composite null hypotheses
Dickhaus, Thorsten; Weierstrass Institute for Applied Analysis and Stochastics (WIAS), Germany
Abstract. We consider the problem of m simultaneous statistical test problems with composite null hypotheses. Usually, marginal p-values are computed under least favorable parameter configurations (LFCs), thus being over-conservative under non-LFCs. Our proposed randomized p-value leads to a tighter exhaustion of the marginal (local) significance level. In turn, it is stochastically larger than the LFC-based p-value under alternatives. While these distributional properties are typically nonsensical for m = 1, the exhaustion of the local significance level is extremely helpful for cases with m > 1 in connection with data-adaptive multiple tests, as we will demonstrate by considering multiple one-sided tests for Gaussian means. The presentation is based on [1].
References
1. Dickhaus, T. (2013). Randomized p-values for multiple testing of composite null hypotheses. Journal of Statistical Planning and Inference, Vol. 143, No. 11, 1968-1979.
Multiple Testing with Dependent or Non-Uniform p-values, GW2, B 3.009, Wednesday, 15:20-16:20

On the statistical comparison of diagnostic tests
Dinev, Todor; Universität Trier, Germany
Abstract. A (dichotomous) diagnostic test is any procedure whose final result is a classification of objects into two states (e.g., presence/absence of a disease for a patient, or of a defect for a product). After a short presentation of some fundamental notions required for the comparison of diagnostic tests, we formulate a (generally non-identifiable) statistical model for the investigation of a finite number of diagnostic tests, without imposing any conditional independence assumption. As a main result, extending previous work by Mattner et al. (2012, 2013), we show how confidence bounds for certain properties of a pair of diagnostic tests can be derived from confidence bounds in usual, identifiable multinomial models. The case of an arbitrary finite number of diagnostic tests leads to the problem of determining linear images of certain semialgebraic sets.
References
1. Mattner, F., Winterfeld, I., and Mattner, L. (2012): Diagnosing toxigenic Clostridium difficile: New confidence bounds show culturing increases sensitivity of the toxin A/B enzyme immunoassay and refute gold standards. Scand. J. Infect. Dis., 44, 578-585.
2. Mattner, L. and Mattner, F. (2013): Confidence bounds for the sensitivity lack of a less specific diagnostic test, without gold standard. Metrika, 76, 239-263.
Latent Variable Models, GW2, B 2.880, Thursday, 13:30-14:50

Graphical tools for investigating variable selection instability caused by correlated variables
Doerken, Sam 1; De Bin, Riccardo 2; Boulesteix, Anne-Laure 2; Sauerbrei, Willi 1
1: Department of Medical Biometry and Statistics, University Medical Center Freiburg; 2: Department of Medical Informatics, Biometry and Epidemiology, University of Munich
Abstract. If a number of candidate variables are available, variable selection is a key task aiming to identify those candidates which influence the outcome of interest. A desirable property for a selection procedure is the stability of the final model, in the sense that the variables included are the same when new data is acquired. In order to investigate this property and to provide insight into the effect of specific variables, various data-dependent methods have been proposed, often based on resampling procedures such as bootstrapping [1]. The main idea of these approaches is to include in the final model those variables which are most often selected when applying a variable selection procedure to a large number of bootstrap samples. A bootstrap-based technique is used also by Murray, Heritier & Müller [2] for visualizing the stability of model selection procedures through novel graphical tools. Here we take advantage of and extend these graphical tools in order to investigate the effect of correlated variables in the model building process. With particular regard to stability, misleading results may be obtained in this situation because the variable selection procedure may alternately identify as relevant only one of the correlated variables in the different bootstrap samples. This may lead to a smaller frequency of selection for those models which include at least one of these correlated variables. Using a benchmark publicly available dataset, we illustrate that suitable graphical tools provide further insight into the model building process and its stability.
References
1. Sauerbrei W. & Schumacher M. (1992). A bootstrap resampling procedure for model building: application to the Cox regression model. Statistics in Medicine, 11, 2093-2109.
2. Murray K., Heritier S. & Müller S. (2013). Graphical tools for model selection in generalized linear models. Statistics in Medicine, 32, 4438-4451.
Statistical Modeling, GW2, B 2.880, Tuesday, 17:10-18:30
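
A minimal sketch of the bootstrap inclusion-frequency idea on hypothetical data with two strongly correlated covariates (the lasso stands in here for a generic selection procedure; the cited work uses richer graphical summaries):

    # Sketch: selection frequency of each variable across bootstrap samples.
    import numpy as np
    from sklearn.linear_model import LassoCV

    rng = np.random.default_rng(0)
    n, p = 200, 8
    X = rng.normal(size=(n, p))
    X[:, 1] = X[:, 0] + 0.2 * rng.normal(size=n)   # two highly correlated variables
    y = 1.5 * X[:, 0] + rng.normal(size=n)

    B = 100
    freq = np.zeros(p)
    for _ in range(B):
        idx = rng.integers(0, n, n)                # bootstrap sample
        coefs = LassoCV(cv=5).fit(X[idx], y[idx]).coef_
        freq += (np.abs(coefs) > 1e-8)             # count variables with non-zero effect

    print((freq / B).round(2))                     # inclusion frequency per variable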

Biometry in clinical research in Germany
Edler, Lutz; Abt. Biostatistik, Deutsches Krebsforschungszentrum, Heidelberg, Germany
Abstract. In earlier centuries, statistics and medicine came together only sporadically. Only after the great wars of the last century, and efficiently only after the beginning of the third industrial revolution, in the course of which electronic data processing became available, did they enter an alliance that is scientifically fruitful for both sides. In West Germany this began after the reform discussions of the 1960s, supported by the recommendations of the Wissenschaftsrat of that time, in which the indispensability of medical statistics for medical research was laid down. Following discussions in the biometrically oriented societies, and above all after the passing of a new drug law (2. AMG), the branch of biometry for clinical research and clinical trials developed from roughly 1976 onwards, both at public scientific institutions and in the research-based pharmaceutical industry; in its wake, biometry in clinical research also became established in the German Region (DR) of the International Biometric Society. In less than a decade, biometricians working on clinical projects formed the largest group among the members of the DR in the West, and the share of contributions from this branch at Biometric Colloquia was correspondingly high. This talk describes the development of biometry in clinical research and of biometrical methodology for clinical trials, with particular attention to the role of the DR and its biometricians. The most important methodological topics are addressed, and their evolution over an epoch of almost 40 years is traced using the programmes of the colloquia and a selection of publications. An attempt is made to relate this development over time to relevant clinical and statistical factors, e.g. computational statistics, the theory and methodology of multiple comparisons, and the needs arising from high-dimensional data following the development of molecular biology. Finally, this development is correlated with that of the Biometrical Journal, for which the DR cooperates with Wiley-VCH (Weinheim).
IBS 60+, HS, Großer Hörsaal, Wednesday, 13:30-14:50

Zipf's law in RNA sequencing
Eilers, Paul; Erasmus University Medical Center, Netherlands
Abstract. Zipf discovered the inverse power law when studying the frequencies of words in written text. The largest frequency is approximately two times the second largest, three times the third largest, and so on. Zipf's distribution has been observed in many other areas of science and technology. In RNA sequencing it occurs too. RNA sequencing measures the expression of a gene by counting RNA fragments that correspond to that gene. In modern experiments the total number of fragments (commonly called reads) for all genes can be many millions. Nevertheless, for a large fraction of genes the counts of reads are very small: some 30 to 50 per cent are below 10. The frequency distribution of genes with 1, 2, 3, and so on, counts follows Zipf's law remarkably accurately. A refined model assumes an inverse power distribution for the expected counts and combines that with the Poisson distribution to model observed counts. This way zero counts can also be incorporated in the model. I present the model and illustrate its use on experimental data.
Statistical Applications in Genetics and Molecular Biology, GW2, B 3.009, Tuesday, 08:50-10:10
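
The refined model described above can be sketched as a Poisson mixture with an inverse power law on the expected counts (the parameterisation here is illustrative, not taken from the talk):

    f(\mu) \propto \mu^{-\alpha} \quad (\mu \ge \mu_{\min}),
    \qquad
    P(Y = y) = \int_{\mu_{\min}}^{\infty} \frac{e^{-\mu}\mu^{y}}{y!}\, f(\mu)\, d\mu,
    \quad y = 0, 1, 2, \dots,

so that zero counts receive positive probability while large counts inherit the power-law tail.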