Estimation of direct effect for survival data using the Aalen additive hazards model
Friday, 7.5.10, 09:30-10:30, IMBI, Stefan-Meier-Str.26
We are interested in estimating the direct effect of an exposure variable X on a survival outcome T. In the presence of an intermediate variable K and an unobserved confounder U for the effect of K on T, standard regression techniques yield a biased estimate of the direct effect of X on T. This problem may be solved by including additional information, L, that removes the effect of U on K. However, if L is also affected by X, then standard methods are still not appropriate. Marginal structural models have been suggested to tackle this problem, but they require the estimation of specific weights that may be quite unstable. To overcome this problem, Goetgeluk et al. (JRSSB, 2009) suggested a so-called G-estimation approach for the case of an uncensored response variable. In this talk I show how to generalize their approach to the setting of survival data. I start out by describing the dynamic path analysis approach and point out that it may give wrong answers in the presence of an unmeasured confounder.
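As a point of reference, the following is a minimal sketch (not the speaker's G-estimation method) of fitting an Aalen additive hazards model in Python with the `lifelines` package, using simulated exposure/mediator data; the data-generating mechanism, variable names and penalizer value are illustrative assumptions only.

```python
# Minimal sketch: naive additive-hazards adjustment for a mediator K.
# With an unmeasured confounder U of K and T (not simulated here) this
# naive estimate of the direct effect of X would be biased, which is
# the problem the talk addresses.
import numpy as np
import pandas as pd
from lifelines import AalenAdditiveFitter

rng = np.random.default_rng(1)
n = 2000
X = rng.binomial(1, 0.5, n)                # exposure
K = rng.binomial(1, 0.3 + 0.3 * X)         # intermediate variable affected by X
hazard = 0.1 + 0.05 * X + 0.05 * K         # additive (constant) hazard
T = rng.exponential(1.0 / hazard)          # event times
C = rng.exponential(10.0, n)               # independent censoring times
df = pd.DataFrame({
    "X": X,
    "K": K,
    "time": np.minimum(T, C),
    "event": (T <= C).astype(int),
})

aaf = AalenAdditiveFitter(coef_penalizer=0.1)
aaf.fit(df, duration_col="time", event_col="event")
print(aaf.cumulative_hazards_.tail())      # cumulative regression functions B_j(t)
```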
Detecting the Emergence of a Signal in a Noisy Image
Friday, 7.5.10, 11:15-12:15, Raum 404, Eckerstr. 1
We study sequential change-point detection when observations form a sequence of independent Gaussian random fields, and the change-point is the time at which a signal of known functional form, involving a finite number of unknown parameters, appears. We first identify a detection procedure of Shiryayev-Roberts type that is asymptotically minimax up to terms that vanish as the false detection rate converges to zero. We then compare approximations to the Shiryayev-Roberts detection procedure with comparatively simple approximations to CUSUM-type procedures. Although the CUSUM-type procedures are suboptimal, our numerical studies indicate that they compare favorably to the asymptotically optimal procedures.
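To illustrate the CUSUM idea in its simplest form, here is a sketch for detecting a known mean shift in an i.i.d. Gaussian sequence; this is a scalar simplification of the random-field setting of the talk, and the shift size delta and threshold b are illustrative choices.

```python
# CUSUM stopping rule for a mean shift from 0 to delta (unit variance).
import numpy as np

def cusum_detect(x, delta, b):
    """Return the first time the CUSUM statistic exceeds threshold b, or None."""
    s = 0.0
    for t, xt in enumerate(x, start=1):
        # log-likelihood ratio increment for N(delta,1) vs N(0,1)
        s = max(0.0, s + delta * (xt - delta / 2.0))
        if s > b:
            return t
    return None

rng = np.random.default_rng(0)
pre = rng.normal(0.0, 1.0, 200)            # no signal
post = rng.normal(1.0, 1.0, 200)           # signal of size 1 appears at t = 201
print(cusum_detect(np.concatenate([pre, post]), delta=1.0, b=8.0))
```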
From genomes to phenotypes - statistical applications in transcriptomics, high-throughput RNAi and microscopy-image-based phenotyping
Friday, 21.5.10, 11:15-12:15, Raum 404, Eckerstr. 1
How do variations in the genomes of individuals shape their phenotypes?
Recent technological progress in high-throughput sequencing, genetic tools and automated microscopy imaging enables powerful experiments to address this question and poses exciting challenges for data analysis and modelling.

The talk will have two parts.

First, I will report a statistical error model for high-throughput nucleotide sequencing data. This technology provides quantitative readouts in assays for RNA expression (RNA-Seq) and protein-DNA binding (ChIP-Seq). Statistical inference of differential signal in these data needs to take into account their natural variability throughout the dynamic range. When the number of replicates is small, error modelling is needed to achieve statistical power. We propose an error model that uses the negative binomial distribution, with variance and mean linked by local regression, to model the null distribution of the count data. The method controls type-I error and provides good detection power. A free open-source R/Bioconductor software package, “DESeq”, is available.

Second, I will describe some aspects of the statistical modelling of large-scale RNAi experiments, where the response of cellular populations to the RNAi perturbations is monitored by live-cell microscopy. The data are analysed by automated image analysis, fitting of dynamic models of cell cycle progression, extraction of multivariate phenotypes, and definition of a multivariate phenotypic landscape.
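A toy sketch of the distributional assumption described in the first part, i.e. counts whose variance is linked to the mean as var = mu + alpha*mu^2; this illustrates the negative binomial parameterization only and is not the DESeq package itself (which additionally estimates the mean-variance relationship by local regression and performs the differential test). The values of mu and alpha are arbitrary.

```python
# Negative binomial counts with a quadratic mean-variance relationship.
import numpy as np
from scipy import stats

def nb_params(mu, alpha):
    """Convert mean mu and dispersion alpha (var = mu + alpha*mu^2)
    to scipy's (n, p) parameterization of the negative binomial."""
    var = mu + alpha * mu ** 2
    p = mu / var
    n = mu * p / (1.0 - p)
    return n, p

rng = np.random.default_rng(0)
mu, alpha = 100.0, 0.2
n, p = nb_params(mu, alpha)
counts = stats.nbinom.rvs(n, p, size=10_000, random_state=rng)
print(counts.mean(), counts.var())   # close to mu = 100 and mu + alpha*mu^2 = 2100
```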
Electrochemical oscillations and chaos on macro- and microscales: phase diffusion, synchronization, and pacemaker design
Monday, 7.6.10, 14:00-15:00, Raum 404, Eckerstr. 1
Complex chemical and biological systems exhibit dynamic self-organization with emergent properties that depend both on the behavior of the constituent parts and on the types and extent of their interactions. We introduce the subject of chemical complexity through the description of macroscale and microscale electrochemical oscillations that occur on multi-particle electrodes.

In the presentation, electrochemical oscillations are described using the concept of cycle phase, and we determine how the frequency and precision of the current oscillations on a single electrode depend on resistance and temperature, and how the extent of interactions affects periodic and chaotic synchronization patterns on electrode arrays.

For optimal pacemaker design, a theory is presented for obtaining the waveform that effectively entrains a weakly forced oscillator. Phase model analysis is combined with the calculus of variations to derive a waveform with which entrainment of an oscillator is achieved with minimum forcing power. The theory is tested in chemical entrainment experiments in which oscillations close to and further away from a Hopf bifurcation exhibited sinusoidal and higher-harmonic nontrivial optimal waveforms, respectively.
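For readers unfamiliar with phase models, here is a minimal simulation of a single oscillator with natural frequency omega and phase sensitivity Z(phi) = sin(phi), weakly forced by a sinusoidal waveform at frequency Omega. All functions and parameter values are illustrative stand-ins; the talk derives the optimal (generally non-sinusoidal) waveform via the calculus of variations, which is not attempted here.

```python
# Weakly forced phase oscillator: dphi/dt = omega + eps * Z(phi) * u(psi).
import numpy as np

omega, Omega, eps = 1.00, 1.02, 0.1       # natural freq., forcing freq., strength
dt, steps = 1e-3, 200_000

phi, psi = 0.0, 0.0                        # oscillator phase, forcing phase
for _ in range(steps):
    u = np.cos(psi)                        # forcing waveform u(psi)
    phi += dt * (omega + eps * np.sin(phi) * u)
    psi += dt * Omega

# If the oscillator is entrained, the phase difference settles near a constant.
print((phi - psi) % (2 * np.pi))
```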
Model selection based on FDR-thresholding - optimizing the area under the receiver operating characteristic curve
Friday, 11.6.10, 11:15-12:15, Raum 404, Eckerstr. 1
In gene expression or proteomic studies large numbers of variables are investigated. We generally cannot assume that a few of the investigated variables show large effects. Instead we often hope that there is at least a combination of several variables which, e.g., allows prediction of the response of an individual patient to a particular therapy. The task of selecting useful variables with rather moderate effects from a very large number of candidates, and of estimating suitable scores to be used for the prediction of a clinical outcome in future patients, is a hard exercise.

We evaluate variable selection by multiple tests controlling the false discovery rate (FDR) to build a linear score for prediction of a clinical outcome in high-dimensional data. Quality of prediction is assessed by the receiver operating characteristic (ROC) curve for prediction in independent patients. Thus we try to combine both goals: prediction and controlled structure estimation. We show that the FDR threshold which provides the ROC curve with the largest area under the curve (AUC) varies considerably over the different parameter constellations, which are not known in advance.

Hence, we investigated a cross-validation procedure based on the maximum rank correlation estimator to determine the optimal selection threshold. This procedure (i) allows an appropriate selection criterion to be chosen, (ii) provides an estimate of the FDR close to the true FDR, and (iii) is simple and computationally feasible also for moderate to small sample sizes. Low estimates of the cross-validated AUC (the estimates generally being positively biased) and large estimates of the cross-validated FDR may indicate a lack of sufficiently prognostic variables and/or too small sample sizes. The method is applied to an example dataset.
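A minimal sketch of the FDR-thresholding idea on simulated data: select variables by a Benjamini-Hochberg cut-off on univariate t-tests, combine the selected variables into a simple linear score, and assess the score by the ROC AUC on held-out patients. The simulated data, the FDR level q and the sign-based score weights are illustrative assumptions, not the procedure of the talk (which in particular uses a maximum-rank-correlation-based cross-validation to pick the threshold).

```python
import numpy as np
from scipy import stats
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n, p, q = 200, 1000, 0.1                   # samples, variables, FDR level
y = rng.binomial(1, 0.5, n)                # binary clinical outcome
X = rng.normal(size=(n, p))
X[:, :20] += 0.5 * y[:, None]              # 20 variables carry a moderate effect

train, test = np.arange(n) < n // 2, np.arange(n) >= n // 2
t, pval = stats.ttest_ind(X[train][y[train] == 1], X[train][y[train] == 0])

# Benjamini-Hochberg: take the largest k with p_(k) <= k*q/p.
order = np.argsort(pval)
passed = pval[order] <= (np.arange(1, p + 1) * q / p)
k = np.max(np.nonzero(passed)[0]) + 1 if passed.any() else 0
selected = order[:k]

# Linear score: selected variables weighted by the sign of their t-statistic.
score = X[test][:, selected] @ np.sign(t[selected])
print(f"selected {k} variables, test AUC = {roc_auc_score(y[test], score):.2f}")
```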
Risk aggregation: New techniques and new problems
Friday, 18.6.10, 11:15-12:15, Raum 404, Eckerstr. 1
Quantitative Risk Management often starts with a vector X of one-period loss random variables. We introduce a general mathematical framework which interpolates between different levels of information on the distribution of X and illustrate some basic issues on how to aggregate and risk measure the random position X. In particular, we study Risk Aggregation under different mathematical set-ups, for different aggregating functionals and risk measures, focusing on Value-at-Risk. We show how the theory of Mass Transportations and tools originally developed to solve so-called Monge-Kantorovich problems turn out to be useful in this context. Finally, we introduce some new numerical integration techniques which solve some open aggregation problems and raise new interesting research issues.
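A small Monte Carlo sketch of why the dependence structure matters in risk aggregation: the Value-at-Risk of X1 + X2 with fixed lognormal margins is computed under independence and under comonotonicity. The margins and the confidence level are illustrative assumptions; the talk treats far more general information set-ups and mass-transportation-based bounds.

```python
# Aggregate VaR under two dependence structures with identical margins.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n = 0.99, 1_000_000

def q(u):
    """Quantile function of a standard lognormal margin."""
    return stats.lognorm.ppf(u, s=1.0)

u1, u2 = rng.uniform(size=n), rng.uniform(size=n)
var_indep = np.quantile(q(u1) + q(u2), alpha)   # independent copula
var_comon = np.quantile(q(u1) + q(u1), alpha)   # comonotonic copula
print(var_indep, var_comon)   # same margins, different aggregate VaR
```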