08.01.2025 12:15 Hannah Laus (TUM): Non-Asymptotic Uncertainty Quantification in High-Dimensional Learning
Uncertainty quantification (UQ) is a crucial but challenging task in many high-dimensional regression or learning problems, where it serves to increase the confidence in a given predictor. In this talk we discuss a new data-driven approach for UQ in regression that applies both to classical regression approaches such as the LASSO and to neural networks. One of the most notable UQ techniques is the debiased LASSO, which modifies the LASSO to allow for the construction of asymptotic confidence intervals by decomposing the estimation error into a Gaussian component and an asymptotically vanishing bias component. However, in real-world problems with finite-dimensional data, the bias term is often too large to be neglected, resulting in overly narrow confidence intervals. In this talk we address this issue and derive a data-driven adjustment that corrects the confidence intervals for a large class of predictors by estimating the means and variances of the bias terms from training data, exploiting high-dimensional concentration phenomena. This gives rise to non-asymptotic confidence intervals, which can help avoid overestimating uncertainty in critical applications such as MRI diagnosis. Importantly, the analysis extends beyond sparse regression to data-driven predictors such as neural networks, enhancing the reliability of model-based deep learning. Our findings bridge the gap between established theory and the practical applicability of such debiased methods. This talk is based on joint work with Frederik Hoppe, Claudio Mayrink Verdun, Felix Krahmer and Holger Rauhut.
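For orientation, a standard form of the debiased LASSO (not necessarily the exact variant used in the talk; the notation below is introduced only for this sketch) is the following: given y = X\beta + \varepsilon with X \in \mathbb{R}^{n \times p} and an approximate inverse M of \hat{\Sigma} = X^\top X / n, one sets

\hat{\beta}^{d} = \hat{\beta}^{\mathrm{L}} + \tfrac{1}{n} M X^\top (y - X \hat{\beta}^{\mathrm{L}}),

where \hat{\beta}^{\mathrm{L}} is the LASSO estimate. This yields the decomposition

\sqrt{n}\,(\hat{\beta}^{d} - \beta) = \tfrac{1}{\sqrt{n}} M X^\top \varepsilon + R, \qquad R = \sqrt{n}\,(I - M \hat{\Sigma})(\hat{\beta}^{\mathrm{L}} - \beta),

whose first term is Gaussian (conditionally on X) while the remainder R vanishes only asymptotically; the adjustment discussed in the talk estimates the per-coordinate means and variances of such remainder terms from training data to obtain valid finite-sample intervals.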
15.01.2025 11:15 Elisabeth Maria Griesbauer (Institute of Basic Medical Sciences, Oslo, NO): Synthetic data generation balancing privacy and utility, using vine copulas
The availability of diverse, high-quality data, analysed by means of statistical and machine learning (ML) methods, has led to tremendous advances in science, technology and society at large. However, real-world data in many cases cannot be made public to the research community due to privacy restrictions, obstructing progress, especially in biomedical research. Synthetic data can substitute for the sensitive real data as long as they do not disclose private aspects, and this substitution has proven successful for training downstream ML applications.
We propose TVineSynth, a vine copula based synthetic tabular data generator designed to balance privacy and utility, using the vine tree structure and its truncation to control the trade-off. Contrary to synthetic data generators that achieve differential privacy (DP) by globally adding noise, TVineSynth performs a controlled approximation of the estimated data-generating distribution, so that it does not suffer from the poor utility of the resulting synthetic data in downstream prediction tasks. TVineSynth introduces a targeted bias into the vine copula model that, combined with the specific tree structure of the vine, causes the model to zero out privacy-leaking dependencies while relying on those that are beneficial for utility. Privacy is measured with membership inference (MIA) and attribute inference attacks (AIA). Further, we theoretically justify how the construction of TVineSynth ensures AIA privacy under a natural privacy measure for continuous sensitive attributes. When compared to competitor models, with and without DP, on simulated and real-world data, TVineSynth achieves a superior privacy-utility balance.
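The truncation mechanism at the heart of this trade-off can be illustrated with a small Python sketch (illustrative only, not the TVineSynth implementation); it uses the pyvinecopulib package, and the function and argument names should be treated as assumptions that may differ between versions.

# Illustrative sketch: fit a truncated vine copula and sample synthetic data.
# The pyvinecopulib calls below are assumptions about its API and may need
# adapting to the installed version; this is not the TVineSynth implementation.
import numpy as np
import pyvinecopulib as pv

x = np.random.default_rng(0).normal(size=(500, 5))   # stand-in for real tabular data
u = pv.to_pseudo_obs(x)                              # ranks mapped to the copula scale

# Truncating the vine after tree level 2 discards higher-order dependencies;
# TVineSynth exploits this kind of controlled approximation (plus a targeted bias)
# to trade privacy against utility.
controls = pv.FitControlsVinecop(trunc_lvl=2)
cop = pv.Vinecop(data=u, controls=controls)          # fit the truncated vine copula

synth_u = cop.simulate(500)                          # synthetic sample on the copula scale
# Map back to the data scale via empirical marginal quantiles.
synth = np.column_stack([np.quantile(x[:, j], synth_u[:, j]) for j in range(x.shape[1])])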
15.01.2025 12:15 David Huk (University of Warwick, Coventry, UK): Quasi-Bayes meets Vines
Recently developed quasi-Bayesian (QB) methods have proposed a stimulating change of paradigm in Bayesian computation by directly constructing the Bayesian predictive distribution through recursion, removing the need for the expensive computations involved in sampling the Bayesian posterior distribution. This has proved to be data-efficient for univariate predictions; however, existing constructions for higher-dimensional densities are only possible by relying on restrictive assumptions on the model's multivariate structure. In this talk, we discuss a wholly different approach to extending quasi-Bayesian prediction to high dimensions through the use of Sklar's theorem, by decomposing the predictive distribution into one-dimensional predictive marginals and a high-dimensional copula. We use the efficient recursive QB construction for the one-dimensional marginals and model the dependence using highly expressive vine copulas. Further, we tune hyperparameters using robust divergences (e.g. the energy score) and show that the proposed Quasi-Bayesian Vine (QB-Vine) is a fully non-parametric density estimator with an analytical form and, in some situations, a convergence rate independent of the dimension of the data. Our experiments illustrate that the QB-Vine is appropriate for high-dimensional distributions (64), needs very few samples to train (200), and outperforms state-of-the-art methods with analytical forms for density estimation and supervised tasks by a considerable margin.
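In standard notation, the Sklar-based decomposition referred to above reads

p(x_1, \ldots, x_d) = c\bigl(F_1(x_1), \ldots, F_d(x_d)\bigr) \prod_{j=1}^{d} p_j(x_j),

where each marginal density p_j (with distribution function F_j) is obtained from the recursive quasi-Bayesian update and the copula density c is modeled by a vine copula.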
20.01.2025 13:45 Yuhao Wang (Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, CN; Shanghai Qi Zhi Institute, CN): Residual permutation test for regression coefficient testing
We consider the problem of testing whether a single coefficient is equal to zero in linear models when the dimension of covariates p can be up to a constant fraction of the sample size n. In this regime, an important goal is to propose tests with finite-population valid size control that do not require strong distributional assumptions on the noise. In this paper, we propose a new method, called the residual permutation test (RPT), which is constructed by projecting the regression residuals onto the space orthogonal to the union of the column spaces of the original and permuted design matrices. RPT can be proved to achieve finite-population size validity under fixed design with just exchangeable noise, whenever p < n/2. Moreover, RPT is shown to be asymptotically powerful for heavy-tailed noise with bounded (1+t)-th order moment when the true coefficient is at least of order n^{-t/(1+t)} for t \in [0, 1]. We further prove that this signal size requirement is essentially rate-optimal in the minimax sense. Numerical studies confirm that RPT performs well in a wide range of simulation settings with normal and heavy-tailed noise distributions.
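The projection that underlies RPT can be sketched in a few lines of Python (a schematic illustration of the algebra only, not the authors' exact procedure or its finite-population guarantee; all variable names are introduced here for the sketch).

# Key algebraic step behind the residual permutation test (RPT), simplified:
# after projecting onto the orthogonal complement of the combined column space
# of Z and a permuted copy of Z, the nuisance signal is removed from both y and
# its permuted version, so the two statistics below are comparable under the
# null when the noise is exchangeable.
import numpy as np

rng = np.random.default_rng(0)
n, p = 120, 40
Z = rng.normal(size=(n, p))                          # nuisance design (p < n/2)
x = rng.normal(size=n)                               # covariate whose coefficient is tested
y = Z @ rng.normal(size=p) + rng.standard_t(2, n)    # heavy-tailed noise, null model

perm = rng.permutation(n)
W = np.column_stack([Z, Z[perm]])                    # union of original and permuted design
Q, _ = np.linalg.qr(W)
P = np.eye(n) - Q @ Q.T                              # projector orthogonal to col(W)

t_obs = abs(x @ P @ y)                               # P y = P eps, since P Z = 0
t_perm = abs(x @ P @ y[perm])                        # P also annihilates the permuted Z
# The actual RPT aggregates such comparisons over a carefully chosen set of
# permutations to obtain an exactly valid finite-population p-value.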
22.01.2025 12:15 Ingrid van Keilegom (KU Leuven, BE): Semiparametric estimation of the survival function under dependent censoring
This paper proposes a novel estimator of the survival function under dependent random right censoring, a situation frequently encountered in survival analysis. We model the relation between the survival time T and the censoring time C by using a parametric copula, whose association parameter is not assumed to be known. Moreover, the survival time distribution is left unspecified, while the censoring time distribution is modeled parametrically. We develop sufficient conditions under which our model for (T, C) is identifiable and propose an estimation procedure for the distribution of the survival time T of interest. Our model and estimation procedure build on the copula-graphic estimator proposed by Zheng and Klein (1995) and Rivest and Wells (2001), which has the drawback of requiring the association parameter of the copula to be known, and on the recent work by Czado and Van Keilegom (2023), who assume that both marginal distributions are parametric, whereas we allow one margin to be unspecified. Our estimator is based on a pseudo-likelihood approach and maintains low computational complexity. The asymptotic normality of the proposed estimator is shown. Additionally, we discuss an extension to include a cure fraction, addressing both identifiability and estimation issues. The practical performance of our method is validated through extensive simulation studies and an application to a breast cancer data set.
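In the notation of the abstract, the dependence between T and C can be written through a (survival) copula \mathcal{C}_\theta as

P(T > t, C > c) = \mathcal{C}_\theta\bigl(S_T(t), S_C(c)\bigr),

where the survival function S_T is left unspecified, S_C is modeled parametrically, and the association parameter \theta is estimated from the data rather than fixed in advance; this is the standard way of writing such copula models, and the talk's exact parametrization may differ.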
29.01.2025 12:15 Siegfried Hörmann (Graz University of Technology, AT): Measuring dependence between a scalar response and a functional covariate
We extend the scope of a recently introduced dependence coefficient between a scalar response Y and a multivariate covariate X to the case where X takes values in a general metric space. Particular attention is paid to the case where X is a curve. While on the population level this extension is straightforward, the asymptotic behavior of the estimator we consider is delicate. It crucially depends on the nearest neighbor structure of the infinite-dimensional covariate sample, where the deterministic bounds on the degrees of the nearest neighbor graphs that are available in multivariate settings no longer exist. The main contribution of this paper is to give some insight into this matter and to suggest a way to overcome the problem for our purposes. As an important application of our results, we consider an independence test.
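To make the nearest neighbor issue concrete, the following Python sketch (illustrative only; the data-generating choices are placeholders) builds the 1-nearest-neighbor graph of a sample of discretized curves under the L2 distance and reports its maximal in-degree; in fixed dimension this in-degree is bounded by a dimension-dependent constant, and it is exactly this kind of deterministic bound that is unavailable in the functional setting.

# Illustrative only: nearest-neighbor graph of discretized curves under the L2 distance.
import numpy as np

rng = np.random.default_rng(0)
n, grid = 200, 100
t = np.linspace(0, 1, grid)
# Sample random curves (here: random sine sums) evaluated on a grid.
coef = rng.normal(size=(n, 10))
curves = sum(coef[:, [k]] * np.sin((k + 1) * np.pi * t) for k in range(10))

# Pairwise (approximate) L2 distances between curves, then each curve's nearest neighbor.
d2 = ((curves[:, None, :] - curves[None, :, :]) ** 2).mean(axis=2)
np.fill_diagonal(d2, np.inf)
nn = d2.argmin(axis=1)

# In-degrees of the 1-NN graph: how often each curve is someone else's nearest neighbor.
in_degree = np.bincount(nn, minlength=n)
print("max in-degree of the 1-NN graph:", in_degree.max())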
19.02.2025 12:15 Jane Coons (Max Planck Institute of Molecular Cell Biology and Genetics, Dresden): Algebraic Statistics of Hüsler-Reiss graphical models
In a Hüsler-Reiss graphical model, a $d$-vertex undirected graph encodes extremal conditional independence statements among $d$ random variables. These models are parametrized by a signed Laplacian matrix, which is the analogue of the concentration matrix in the Gaussian setting. Signed Laplacians are in one-to-one correspondence with variogram matrices, which are the analogue of the covariance matrix. In this talk, we study Hüsler-Reiss graphical models from the perspective of algebraic geometry. In particular, we investigate the vanishing ideal of the set of all variogram matrices arising from a given graph. In a similar manner to the Gaussian setting, we find that separation statements in the graph correspond to rank constraints on submatrices of the Cayley-Menger matrix of the variogram. We also investigate the geometry of likelihood inference in the Hüsler-Reiss setting and give upper and lower bounds on the maximum likelihood threshold.
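As a reference point, one common way of writing the correspondence between variogram matrices and precision-type matrices in the Hüsler-Reiss model (conventions in the talk may differ) is

\Sigma^{(k)}_{ij} = \tfrac{1}{2}\bigl(\Gamma_{ik} + \Gamma_{jk} - \Gamma_{ij}\bigr), \qquad i, j \neq k,

where \Gamma is the variogram matrix; the inverses \Theta^{(k)} = (\Sigma^{(k)})^{-1} play the role of concentration matrices and assemble into the signed Laplacian that parametrizes the model.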
19.03.2025 12:15 Vincent Fortuin (Helmholtz/TUM): t.b.a.
t.b.a.