Seminar on Statistics and Data Science

This seminar series is organized by the research group in statistics and features talks on advances in methods of data analysis, statistical theory, and their applications. The speakers are external guests as well as researchers from other groups at TUM. All talks in the seminar series are listed in the Munich Mathematical Calendar.

The seminar takes place in room 8101.02.110 unless announced otherwise. To stay up to date about upcoming presentations, please join our mailing list. You will receive an email to confirm your subscription.

Upcoming talks

07.11.2024 14:00 Mats Julius Stensrud (Ecole Polytechnique Fédérale de Lausanne): On optimal treatment regimes assisted by algorithms

Decision makers desire to implement decision rules that, when applied to individuals in the population of interest, yield the best possible outcomes. For example, the current focus on precision medicine reflects the search for individualized treatment decisions, adapted to a patient's characteristics. In this presentation, I will consider how to formulate, choose and estimate effects that guide individualized treatment decisions. In particular, I will introduce a class of regimes that are guaranteed to outperform conventional optimal regimes in settings with unmeasured confounding. I will further consider how to identify or bound these "superoptimal" regimes and their values. The performance of the superoptimal regimes will be illustrated in two examples from medicine and economics.
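
For orientation, here is a minimal sketch of the unconfounded baseline that the talk's "superoptimal" regimes improve upon: fit an outcome model \(Q(x,a) \approx E[Y \mid X=x, A=a]\) and act greedily. The function name `plugin_regime`, the scikit-learn learner, and the binary action set are illustrative assumptions, not the speaker's method.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def plugin_regime(X, A, Y, actions=(0, 1)):
    """Plug-in optimal regime assuming no unmeasured confounding:
    fit Q(x, a) ~ E[Y | X=x, A=a], then return d(x) = argmax_a Q(x, a).
    The superoptimal regimes discussed in the talk refine this baseline
    when unmeasured confounding is present."""
    Q = GradientBoostingRegressor().fit(np.column_stack([X, A]), Y)

    def d(X_new):
        values = [Q.predict(np.column_stack([X_new, np.full(len(X_new), a)]))
                  for a in actions]
        return np.asarray(actions)[np.argmax(values, axis=0)]

    return d
```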

07.11.2024 15:30 Nan Ke (Google DeepMind): Scaling Causal Inference with Deep Learning and Foundation Models

A fundamental challenge in causal induction is inferring the underlying graph structure from observational and interventional data. While traditional algorithms rely on candidate graph generation and score-based evaluation, my research takes a different approach. I developed neural causal models that leverage the flexibility and scalability of neural networks to infer causal relationships, enabling more expressive mechanisms and facilitating analysis of larger systems. Building on this foundation, I further explored causal foundation models, inspired by the success of large language models. These models are trained on massive datasets of diverse causal graphs, learning to predict causal structures from both observational and interventional data. This "black box" approach achieves remarkable performance and scalability, significantly surpassing traditional methods: the model generalizes robustly to new synthetic graphs, is resilient under train-test distribution shifts, and achieves state-of-the-art performance on naturalistic graphs with low sample complexity. We then leverage LLMs and their metacognitive processes to causally organize skills. By labeling problems and clustering them into interpretable categories, LLMs gain the ability to categorize skills, which acts as a causal variable enabling skill-based prompting and enhancing mathematical reasoning. This integrated perspective, combining causal induction in graph structures with emergent skills in LLMs, advances our understanding of how skills function as causal variables. It offers a structured pathway to unlock complex reasoning capabilities in AI, paving the way from simple word prediction to sophisticated causal reasoning in LLMs.
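
The training recipe behind such amortized causal discovery can be sketched in a few lines: generate many (data, graph) pairs from random structural causal models and train a network to map one to the other. Below is a deliberately minimal data generator under strong assumptions (linear-Gaussian mechanisms, independent random edges); the models in the talk are trained on far richer mechanism families and interventional regimes.

```python
import numpy as np

def sample_training_pair(d=5, n=200, p_edge=0.4, rng=None):
    """One (data, adjacency) training pair for amortized causal discovery:
    a random linear-Gaussian DAG and n samples drawn from it. A causal
    foundation model is trained on a large stream of such pairs to
    predict the graph directly from the data."""
    rng = rng or np.random.default_rng()
    order = rng.permutation(d)                  # random topological order
    W = np.zeros((d, d))                        # weighted adjacency W[parent, child]
    for i in range(d):
        for j in range(i + 1, d):               # edges only respect the order
            if rng.random() < p_edge:
                W[order[i], order[j]] = rng.normal()
    X = np.zeros((n, d))
    for j in order:                             # ancestral sampling
        X[:, j] = X @ W[:, j] + rng.normal(size=n)
    return X, W != 0
```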

13.11.2024 13:15 Tom Claassen (Radboud University Nijmegen, Netherlands): Anytime-anywhere FCI+: a true ‘anytime’ algorithm for causal discovery with high-density regions

Applying causal discovery algorithms to real-world problems can be challenging, especially when unobserved confounders may be present. In theory, constraint-based approaches like FCI are able to handle this, but implicitly rely on sparse networks to complete within a reasonable amount of time. In practice, however, many relevant systems are anything but sparse. For example, in protein-protein interaction networks there are often high-density nodes (‘hub nodes’) representing key proteins that are central to many processes in the cell. Other systems, like electric power grids and social networks, can exhibit high-clustering tendencies, leading to so-called small-world networks. In such cases, even basic constraint-based algorithms like PC can get stuck in the high-density regions, failing to produce any useful output. Existing approaches to deal with this, like Anytime-FCI, were designed to interrupt the search at any stage and then produce a sound causal model as soon as possible, but unfortunately can require even more time to complete than just letting FCI run by itself. In this talk I will present a new approach to causal discovery in graphs with high-density regions. Based on a radically new search strategy and modified orientation rules, it builds up the causal graph on the fly, updating the model after each validated edge removal, while remaining sound throughout. It exploits the intermediate causal model to efficiently reduce the number and size of the conditional independence tests, and automatically prioritizes ‘low-hanging fruit’, leaving difficult (high-density) regions until last. The resulting ‘anytime-anywhere FCI+’ is a true anytime algorithm that is not only faster than its traditional counterparts, but also more flexible: it can easily be adapted to handle arbitrary edge removals as well, opening up new possibilities for, e.g., targeted cause-effect recovery in large graphs.
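
To make the "anytime, low-hanging fruit first" idea concrete, here is a deliberately simplified PC-style skeleton search, not the speaker's FCI+ algorithm: edges at currently low-degree nodes are tested first, and whenever the test budget runs out the partially pruned adjacency structure is returned as-is. The Gaussian partial-correlation test, the budget mechanism, and all parameter choices are illustrative assumptions.

```python
import itertools
from math import erf, log, sqrt
import numpy as np

def gauss_ci(data, i, j, S, alpha=0.01):
    """CI test X_i _||_ X_j | X_S via partial correlation (Fisher z),
    valid under a joint Gaussian assumption."""
    P = np.linalg.pinv(np.cov(data[:, [i, j, *S]], rowvar=False))
    r = np.clip(-P[0, 1] / sqrt(P[0, 0] * P[1, 1]), -0.9999, 0.9999)
    z = 0.5 * log((1 + r) / (1 - r)) * sqrt(max(len(data) - len(S) - 3, 1))
    pval = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return pval > alpha                       # True => independent

def anytime_skeleton(data, alpha=0.01, max_cond=3, budget=10_000):
    d = data.shape[1]
    adj = {i: set(range(d)) - {i} for i in range(d)}
    for size in range(max_cond + 1):
        # 'low-hanging fruit': visit edges at low-degree endpoints first
        edges = sorted(((i, j) for i in range(d) for j in adj[i] if i < j),
                       key=lambda e: min(len(adj[e[0]]), len(adj[e[1]])))
        for i, j in edges:
            if j not in adj[i]:
                continue
            for S in itertools.combinations(adj[i] - {j}, size):
                budget -= 1
                if budget <= 0:
                    return adj                # anytime: return current graph
                if gauss_ci(data, i, j, S, alpha):
                    adj[i].discard(j); adj[j].discard(i)
                    break
    return adj
```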

20.11.2024 12:15 Dmitrii Pavlov (TU Dresden): t.b.a.

t.b.a.

27.11.2024 12:15 Siegfried Hörmann (Graz University of Technology): Measuring dependence between a scalar response and a functional covariate

We extend the scope of a recently introduced dependence coefficient between a scalar response Y and a multivariate covariate X to the case where X takes values in a general metric space. Particular attention is paid to the case where X is a curve. While on the population level this extension is straightforward, the asymptotic behavior of the estimator we consider is delicate. It crucially depends on the nearest neighbor structure of the infinite-dimensional covariate sample, where deterministic bounds on the degrees of the nearest neighbor graphs available in multivariate settings no longer exist. The main contribution of this paper is to give some insight into this matter and to suggest a way to overcome the problem for our purposes. As an important application of our results, we consider an independence test.
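
The coefficient in question appears to be of the Azadkia–Chatterjee type, which is what makes the metric-space extension natural: the estimator touches X only through its nearest-neighbor graph, so pairwise distances between curves suffice. A rough sketch under that assumption (function name and tie handling illustrative):

```python
import numpy as np

def nn_dependence(dist_x, y):
    """Nearest-neighbor dependence coefficient T_n(Y | X) in the style
    of Azadkia & Chatterjee: X enters only via nearest neighbors, so
    dist_x may come from any metric space (e.g. L2 distances between
    curves). Values near 0 suggest independence; near 1, Y ~ f(X)."""
    n = len(y)
    d = np.asarray(dist_x, dtype=float).copy()
    np.fill_diagonal(d, np.inf)
    nn = d.argmin(axis=1)                        # nearest neighbor of each X_i
    R = (y[None, :] <= y[:, None]).sum(axis=1)   # R_i = #{j : Y_j <= Y_i}
    L = (y[None, :] >= y[:, None]).sum(axis=1)   # L_i = #{j : Y_j >= Y_i}
    num = (n * np.minimum(R, R[nn]) - L**2).sum()
    return num / (L * (n - L)).sum()

# e.g. curves f[i] sampled on a common grid:
# dist_x[i, j] = np.linalg.norm(f[i] - f[j])
```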

04.12.2024 12:15 Heather Battey (Imperial College London): t.b.a.

t.b.a.

08.01.2025 12:15 Hannah Laus (TUM): Non-Asymptotic Uncertainty Quantification in High-Dimensional Learning

Uncertainty quantification (UQ) is a crucial but challenging task in many high-dimensional regression and learning problems, where it increases confidence in a given predictor. In this talk we discuss a new data-driven approach for UQ in regression that applies both to classical regression approaches such as the LASSO and to neural networks. One of the most notable UQ techniques is the debiased LASSO, which modifies the LASSO to allow for the construction of asymptotic confidence intervals by decomposing the estimation error into a Gaussian and an asymptotically vanishing bias component. However, in real-world problems with finite-dimensional data, the bias term is often too significant to be neglected, resulting in overly narrow confidence intervals. In this talk we will address this issue and derive a data-driven adjustment that corrects the confidence intervals for a large class of predictors by estimating the means and variances of the bias terms from training data, exploiting high-dimensional concentration phenomena. This gives rise to non-asymptotic confidence intervals, which can help avoid overestimating uncertainty in critical applications such as MRI diagnosis. Importantly, this analysis extends beyond sparse regression to data-driven predictors like neural networks, enhancing the reliability of model-based deep learning. Our findings bridge the gap between established theory and the practical applicability of such debiased methods. This talk is based on joint work with Frederik Hoppe, Claudio Mayrink Verdun, Felix Krahmer and Holger Rauhut.
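
For orientation, the standard decomposition underlying the debiased LASSO (in generic notation, with \(M\) an approximate inverse of the empirical covariance \(\hat\Sigma = X^\top X / n\)) reads

\[
\hat\beta^{\mathrm u} \;=\; \hat\beta^{\mathrm L} + \tfrac1n M X^\top\bigl(y - X\hat\beta^{\mathrm L}\bigr)
\;=\; \beta^\ast + \underbrace{\tfrac1n M X^\top \varepsilon}_{\text{Gaussian component}}
+ \underbrace{\bigl(I - \tfrac1n M X^\top X\bigr)\bigl(\hat\beta^{\mathrm L} - \beta^\ast\bigr)}_{\text{bias term } R}.
\]

Classical asymptotic intervals keep only the Gaussian component; as the abstract describes it, the talk's adjustment instead estimates the means and variances of the coordinates \(R_j\) from training data and corrects the interval endpoints accordingly, yielding non-asymptotic coverage.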

15.01.2025 12:15 David Huk (University of Warwick, Coventry, UK): t.b.a.

t.b.a.

22.01.2025 12:15 Ingrid van Keilegom (KU Leuven, BE): Semiparametric estimation of the survival function under dependent censoring

This paper proposes a novel estimator of the survival function under dependent random right censoring, a situation frequently encountered in survival analysis. We model the relation between the survival time T and the censoring time C by a parametric copula, whose association parameter is not assumed to be known. Moreover, the survival time distribution is left unspecified, while the censoring time distribution is modeled parametrically. We develop sufficient conditions under which our model for (T,C) is identifiable, and propose an estimation procedure for the distribution of the survival time T of interest. Our model and estimation procedure build further on the copula-graphic estimator proposed by Zheng and Klein (1995) and Rivest and Wells (2001), which has the drawback of requiring the association parameter of the copula to be known, and on the recent work by Czado and Van Keilegom (2023), who suppose that both marginal distributions are parametric, whereas we allow one margin to be unspecified. Our estimator is based on a pseudo-likelihood approach and maintains low computational complexity. The asymptotic normality of the proposed estimator is shown. Additionally, we discuss an extension to include a cure fraction, addressing both identifiability and estimation issues. The practical performance of our method is validated through extensive simulation studies and an application to a breast cancer data set.
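
A sketch of the likelihood structure suggested by the abstract (notation ours, not necessarily the paper's): with observed \(Y = \min(T, C)\) and \(\delta = \mathbf 1\{T \le C\}\), the copula model

\[
P(T > t,\; C > c) \;=\; \mathcal C_\theta\bigl(S_T(t),\, S_C(c)\bigr)
\]

yields the sub-densities

\[
h_1(y) = \partial_u\, \mathcal C_\theta(u, v)\Big|_{u = S_T(y),\, v = S_C(y)} f_T(y),
\qquad
h_0(y) = \partial_v\, \mathcal C_\theta(u, v)\Big|_{u = S_T(y),\, v = S_C(y)} f_C(y),
\]

and a pseudo-likelihood of the form \(\prod_i h_1(y_i)^{\delta_i}\, h_0(y_i)^{1-\delta_i}\), maximized jointly over the association parameter \(\theta\) and the parametric censoring margin while the margin of T is handled flexibly; the precise estimator and its asymptotics are developed in the talk.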

Previous talks

within the last 180 days

23.10.2024 12:15 Ernst C. Wit (Università della Svizzera italiana, Lugano): Causal regularization for risk minimization.

Recently, the problem of predicting a response variable from a set of covariates on a data set that differs in distribution from the training data has received increasing attention. We propose a sequence of causal-like models, estimated from in-sample data, that provide out-of-sample risk guarantees when predicting a target variable from a set of covariates. Whereas ordinary least squares provides the best in-sample risk with limited out-of-sample guarantees, causal models have the best out-of-sample guarantees by sacrificing in-sample risk performance. We introduce causal regularization by defining a trade-off between these properties. As the regularization increases, causal regularization provides estimators whose risk is more stable at the cost of increasing their overall in-sample risk. The increased risk stability is shown to result in out-of-sample risk guarantees. We provide finite sample risk bounds for all models and prove the adequacy of cross-validation for attaining these bounds.
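
A closely related, concrete instance of such an interpolation is anchor regression (Rothenhäusler et al.); whether the speaker's causal regularization takes exactly this form is not claimed here. Exogenous "anchor" variables A encode the environments, and γ trades in-sample fit (γ = 1 recovers OLS) against stability under distribution shifts (γ → ∞ approaches an IV-type, more causal solution):

```python
import numpy as np

def anchor_regression(X, y, A, gamma):
    """Anchor regression: argmin_b ||(I-P)(y-Xb)||^2 + gamma*||P(y-Xb)||^2,
    with P the projection onto the column space of the anchors A (n x k).
    Equivalent to OLS after the whitening W = I + (sqrt(gamma)-1) P,
    since (I-P) and P are orthogonal."""
    n = len(y)
    P = A @ np.linalg.pinv(A)                 # hat matrix of the anchors
    W = np.eye(n) + (np.sqrt(gamma) - 1.0) * P
    return np.linalg.lstsq(W @ X, W @ y, rcond=None)[0]
```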

25.09.2024 09:00 Niels Richard Hansen, Negar Kiyavash, Martin Huber, Niklas Pfister, Leonard Henckel, Jakob Runge, Francesco Locatello, Isabel Valera, Sara Magliacane, Qingyuan Zhao, Jalal Etesami: Miniworkshop on Causal Inference 2024

**September 25, 2024**
09:00-09:45 Niels Richard Hansen (University of Copenhagen)
09:45-10:30 Negar Kiyavash (EPFL)
break
11:00-11:45 Martin Huber (University of Fribourg)
11:45-12:30 Niklas Pfister (University of Copenhagen)
lunch
14:00-14:45 Leonard Henckel (University College Dublin)
14:45-15:30 Jakob Runge (TU Dresden)

**September 26, 2024**
10:00-10:45 Francesco Locatello (ISTA)
10:45-11:30 Isabel Valera (Saarland University)
break
11:45-12:30 Sara Magliacane (University of Amsterdam)
lunch
14:00-14:45 Qingyuan Zhao (University of Cambridge)
14:45-15:30 Jalal Etesami (Technical University of Munich)

See https://collab.dvb.bayern/display/TUMmathstat/Miniworkshop+on+Causal+Inference+2024 for more details.

06.08.2024 10:15 Sven Wang (Humboldt University Berlin): Statistical algorithms for low-frequency diffusion data: A PDE approach.

We consider the problem of making nonparametric inference in multi-dimensional diffusion models from low-frequency data. Statistical analysis in this setting is notoriously challenging due to the intractability of the likelihood and its gradient, and computational methods have thus far largely resorted to expensive simulation-based techniques. In this article, we propose a new computational approach which is motivated by PDE theory and is built around the characterisation of the transition densities as solutions of the associated heat (Fokker-Planck) equation. Employing optimal regularity results from the theory of parabolic PDEs, we prove a novel characterisation for the gradient of the likelihood. Using these developments, for the nonlinear inverse problem of recovering the diffusivity (in divergence form models), we then show that the numerical evaluation of the likelihood and its gradient can be reduced to standard elliptic eigenvalue problems, solvable by powerful finite element methods. This enables the efficient implementation of a large class of statistical algorithms, including (i) preconditioned Crank-Nicolson and Langevin-type methods for posterior sampling, and (ii) gradient-based descent optimisation schemes to compute maximum likelihood and maximum-a-posteriori estimates. We showcase the effectiveness of these methods via extensive simulation studies in a nonparametric Bayesian model with Gaussian process priors. Interestingly, the optimisation schemes provided satisfactory numerical recovery while exhibiting rapid convergence towards stationary points despite the problem nonlinearity; thus our approach may lead to significant computational speed-ups.
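
Schematically (suppressing boundary conditions and the reference measure), the link exploited here is the spectral representation of the transition densities for a divergence-form generator:

\[
p_t(x, y) \;=\; \sum_{j \ge 0} e^{-\lambda_j t}\, e_j(x)\, e_j(y),
\qquad
-\nabla \cdot \bigl(a\, \nabla e_j\bigr) \;=\; \lambda_j\, e_j,
\]

so that the low-frequency log-likelihood \(\sum_i \log p_\Delta\bigl(X_{(i-1)\Delta}, X_{i\Delta}\bigr)\) and its gradient with respect to the diffusivity \(a\) require only the leading eigenpairs \((\lambda_j, e_j)\), which standard finite element solvers deliver efficiently.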

02.07.2024 14:00 Thomas Richardson (University of Washington, Seattle): Short Course on “Graphical causal modeling” (Lecture 3/3)

This short course covers recent developments in graphical and causal modeling in Statistics/Machine Learning. It comprises the following three lectures, each two hours long.
June 25, 2024; Lecture 1: “Learning from conditional independence when not all variables are measured: Ancestral graphs and the FCI algorithm”
June 27, 2024; Lecture 2: “Identification of causal effects: A reformulation of the ID algorithm via the fixing operation”
July 2, 2024; Lecture 3: “Nested Markov models”
The course targets an audience with exposure to basic concepts in graphical and causal modeling (e.g., conditional independence, DAGs, d-separation, Markov equivalence, definition of causal effects/the do-operator).

27.06.2024 14:00 Thomas Richardson (University of Washington, Seattle): Short Course on “Graphical causal modeling” (Lecture 2/3)

This short course covers recent developments in graphical and causal modeling in Statistics/Machine Learning. It comprises the following three lectures, each two hours long.
June 25, 2024; Lecture 1: “Learning from conditional independence when not all variables are measured: Ancestral graphs and the FCI algorithm”
June 27, 2024; Lecture 2: “Identification of causal effects: A reformulation of the ID algorithm via the fixing operation”
July 2, 2024; Lecture 3: “Nested Markov models”
The course targets an audience with exposure to basic concepts in graphical and causal modeling (e.g., conditional independence, DAGs, d-separation, Markov equivalence, definition of causal effects/the do-operator).

25.06.2024 14:00 Thomas Richardson (University of Washington, Seattle): Short Course on “Graphical causal modeling” (Lecture 1/3)

This short course covers recent developments in graphical and causal modeling in Statistics/Machine Learning. It comprises the following three lectures, each two hours long.
June 25, 2024; Lecture 1: “Learning from conditional independence when not all variables are measured: Ancestral graphs and the FCI algorithm”
June 27, 2024; Lecture 2: “Identification of causal effects: A reformulation of the ID algorithm via the fixing operation”
July 2, 2024; Lecture 3: “Nested Markov models”
The course targets an audience with exposure to basic concepts in graphical and causal modeling (e.g., conditional independence, DAGs, d-separation, Markov equivalence, definition of causal effects/the do-operator).

17.06.2024 09:00 Saber Salehkaleybar (Leiden University): Causal Inference in Linear Structural Causal Models.

The ultimate goal of causal inference is so-called causal effect identification (ID), which refers to quantifying the causal influence of a subset of variables on a target set. A stepping stone towards performing ID is learning the causal relationships among the variables, which is commonly called causal structure learning (CSL). In this talk, I mainly focus on problems pertaining to CSL and ID in linear structural causal models, which serve as the basis for problem abstraction in various scientific fields. In particular, I will review identifiability results and algorithms for CSL and ID in the presence of latent confounding. Then, I will present our recent result on the ID problem using cross-moments among observed variables and discuss its applications to natural experiments and proximal causal inference. Finally, I conclude the presentation with possible future research directions.
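
As a flavor of identification from cross-moments (the talk's results are more general), recall the classical instrumental-variable identity in a linear model with latent confounding: if \(Y = \beta T + g(U) + \varepsilon\) and an observed \(Z\) satisfies \(Z \perp (U, \varepsilon)\) with \(\operatorname{Cov}(Z, T) \ne 0\), then

\[
\operatorname{Cov}(Z, Y) \;=\; \beta\, \operatorname{Cov}(Z, T)
\qquad\Longrightarrow\qquad
\beta \;=\; \frac{\operatorname{Cov}(Z, Y)}{\operatorname{Cov}(Z, T)},
\]

so the causal coefficient is recovered purely from second-order cross-moments of observed variables.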

10.06.2024 10:30 Adèle Ribeiro (Philipps-Universität Marburg): Recent Advances in Causal Inference under Limited Domain Knowledge.

One pervasive task found throughout the empirical sciences is to determine the effect of interventions from observational (non-experimental) data. It is well understood that assumptions are necessary to perform causal inferences, which are commonly articulated through causal diagrams (Pearl, 2000). Despite the power of this approach, there are settings where the knowledge necessary to fully specify a causal diagram may not be available, particularly in complex, high-dimensional domains. In this talk, I will briefly present two recent causal effect identification results that relax the stringent requirement of fully specifying a causal diagram. The first is a new graphical modeling tool called cluster DAGs (for short, C-DAGs) that allows for the specification of relationships among clusters of variables, while the relationships between the variables within a cluster are left unspecified [1]. The second includes a complete calculus and algorithm for effect identification from a Partial Ancestral Graph (PAG), which represents a Markov equivalence class of causal diagrams, fully learnable from observational data [2]. These approaches are expected to help researchers and data scientists to identify novel effects in real-world domains, where knowledge is largely unavailable and coarse.

References:
[1] Anand, T. V., Ribeiro, A. H., Tian, J., & Bareinboim, E. (2023). Causal effect identification in cluster DAGs. Proceedings of the AAAI Conference on Artificial Intelligence, 37(10), 12172-12179.
[2] Jaber, A., Ribeiro, A., Zhang, J., & Bareinboim, E. (2022). Causal identification under Markov equivalence: Calculus, algorithm, and completeness. Advances in Neural Information Processing Systems, 35, 3679-3690.

05.06.2024 12:15 Han Li (The University of Melbourne): Constructing hierarchical time series through clustering: Is there an optimal way for forecasting?

Forecast reconciliation has attracted significant research interest in recent years, with most studies taking the hierarchy of time series as given. We extend existing work that uses time series clustering to construct hierarchies, with the goal of improving forecast accuracy. First, we investigate multiple approaches to clustering, including not only different clustering algorithms, but also the way time series are represented and how distance between time series is defined. Second, we devise an approach based on random permutation of hierarchies, keeping the structure of the hierarchy fixed while time series are randomly allocated to clusters. Third, we propose an approach based on averaging forecasts across hierarchies constructed using different clustering methods, which is shown to outperform any single clustering method. Our findings provide new insights into the role of hierarchy construction in forecast reconciliation and offer valuable guidance for forecasting practice.
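
A toy version of the pipeline, in which every modeling choice (k-means on the raw series, a naive aggregate forecast, proportional top-down disaggregation) is a stand-in for the richer options compared in the talk:

```python
import numpy as np
from sklearn.cluster import KMeans

def clustered_forecast(series, horizon, k, seed=0):
    """Cluster bottom-level series (array of shape (m, T)) into a
    two-level hierarchy, forecast each cluster aggregate naively, and
    disaggregate by historical proportions."""
    km = KMeans(n_clusters=k, n_init=10, random_state=seed)
    labels = km.fit_predict(series)
    out = np.empty((series.shape[0], horizon))
    for c in range(k):
        idx = np.flatnonzero(labels == c)
        total = series[idx].sum(axis=0)
        agg_fc = np.full(horizon, total[-4:].mean())    # naive aggregate forecast
        shares = series[idx].sum(axis=1) / total.sum()  # historical proportions
        out[idx] = np.outer(shares, agg_fc)
    return out

# averaging over hierarchies from different clusterings, as in the talk:
# fc = np.mean([clustered_forecast(Y, 8, 5, seed=s) for s in range(20)], axis=0)
```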

15.05.2024 17:00 Richard Samworth (University of Cambridge): Optimal convex M-estimation via score matching.

In the context of linear regression, we construct a data-driven convex loss function with respect to which empirical risk minimisation yields optimal asymptotic variance in the downstream estimation of the regression coefficients. Our semiparametric approach targets the best decreasing approximation of the derivative of the log-density of the noise distribution. At the population level, this fitting process is a nonparametric extension of score matching, corresponding to a log-concave projection of the noise distribution with respect to the Fisher divergence. The procedure is computationally efficient, and we prove that it attains the minimal asymptotic covariance among all convex M-estimators. As an example of a non-log-concave setting, for Cauchy errors the optimal convex loss function is Huber-like, and our procedure yields an asymptotic efficiency greater than 0.87 relative to the oracle maximum likelihood estimator of the regression coefficients that uses knowledge of this error distribution; in this sense, we obtain robustness without sacrificing much efficiency.
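
In symbols, as we read the abstract: writing \(f\) for the noise density, the population target is the best decreasing approximation of its score,

\[
\psi^\dagger \;=\; \operatorname*{arg\,min}_{\psi \text{ decreasing}} \int \bigl(\psi(x) - (\log f)'(x)\bigr)^2 f(x)\, dx,
\]

and the convex loss \(\ell\) is chosen with \(\ell' = -\psi^\dagger\) (a decreasing \(\psi^\dagger\) makes \(\ell'\) increasing, hence \(\ell\) convex); empirical risk minimisation of \(\sum_i \ell\bigl(Y_i - X_i^\top \beta\bigr)\) then achieves the smallest asymptotic variance attainable by any convex M-estimator.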

13.05.2024 15:15 Chandler Squires (MIT, Cambridge): Decision-centric causal structure learning: An algorithm for data-driven covariate adjustment.

When learning a causal model of a system, a key motivation is the use of that model for downstream decision-making. In this talk, I will take a decision-centric perspective on causal structure learning, focused on a simple setting that is amenable to careful statistical analysis. In particular, we study causal effect estimation via covariate adjustment, when the causal graph is unknown, all variables are discrete, and the non-descendants of treatment are given.

We propose an algorithm which searches for a data-dependent "approximate" adjustment set via conditional independence testing, and analyze the bias-variance tradeoff entailed by this procedure. We prove matching upper and lower bounds on omitted confounding bias in terms of small violations of conditional independence. Further, we provide a finite-sample bound on the complexity of correctly selecting an "approximate" adjustment set and of estimating the resulting adjustment functional, using results from the property testing literature.

We demonstrate our algorithm on synthetic and real-world data, outperforming methods which ignore structure learning or which perform structure learning separately from causal effect estimation. I conclude with some open questions at the intersection of structure learning and causal effect estimation.
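
Once an (approximate) adjustment set Z has been selected, the estimand itself is the standard discrete adjustment functional \(E[Y \mid do(A=a)] = \sum_z P(Z=z)\, E[Y \mid A=a, Z=z]\). A minimal pandas sketch of the plug-in estimator (function and argument names hypothetical, not the speaker's code):

```python
import pandas as pd

def adjusted_mean(df, treatment, outcome, adj_set, a):
    """Plug-in covariate adjustment for discrete data:
    sum_z P(Z=z) * E[Y | A=a, Z=z]. Strata with no treated units are
    dropped by dropna(), which silently biases the estimate when
    positivity is violated."""
    if not adj_set:
        return df.loc[df[treatment] == a, outcome].mean()
    p_z = df.groupby(adj_set).size() / len(df)
    mu_z = df[df[treatment] == a].groupby(adj_set)[outcome].mean()
    return float((p_z * mu_z).dropna().sum())
```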

For talks more than 180 days ago please have a look at the Munich Mathematical Calendar (filter: "Oberseminar Statistics and Data Science").