Statistics Seminar
Causal Inference with Cocycles
To join this seminar virtually: Please request Zoom connection details from ea@stat.ubc.ca.
Abstract: Many interventions in causal inference can be represented as transformations of the variables of interest. Abstracting interventions in this way allows us to identify a local symmetry property exhibited by many causal models under interventions. Where present, this symmetry can be characterized by a type of map called a cocycle, an object that is central to dynamical systems theory. We show that such cocycles exist under general conditions and are sufficient to identify interventional distributions and, under suitable assumptions, counterfactual distributions. We use these results to derive cocycle-based estimators for causal estimands and show that they achieve semiparametric efficiency under standard conditions. Since entire families of distributions can share the same cocycle, these estimators can make causal inference robust to mis-specification by sidestepping superfluous modelling assumptions. We demonstrate both robustness and state-of-the-art performance in several simulations, and apply our method to estimate the effects of 401(k) pension plan eligibility on asset accumulation using a real dataset.
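The transformation view of interventions can be illustrated with a toy example. The sketch below is not the cocycle estimator from the talk; it assumes a location-scale model Y = m(X) + s(X)U with m and s known, under which intervening on X via a transformation T transports each observed outcome while carrying its standardized residual along unchanged. All names here (m, s, T) are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Observational data from a location-scale model Y = m(X) + s(X) * U
m = lambda x: np.sin(x)           # conditional mean (assumed known for illustration)
s = lambda x: 0.5 + 0.25 * x**2   # conditional scale (assumed known for illustration)
X = rng.normal(size=n)
U = rng.normal(size=n)
Y = m(X) + s(X) * U

# A shift intervention represented as a transformation of the variable X
T = lambda x: x + 1.0

# Transport each observed outcome to its value under the intervention:
# the standardized residual (Y - m(X)) / s(X) is preserved.
Y_do = m(T(X)) + s(T(X)) * (Y - m(X)) / s(X)

# Ground truth obtained by simulating the intervened system directly
X_do = T(rng.normal(size=n))
Y_truth = m(X_do) + s(X_do) * rng.normal(size=n)
print(Y_do.mean(), Y_truth.mean())  # the two interventional means agree
```

The point of the sketch is that the intervention never needs to be re-simulated: once the transformation and the conditional transport map are known, observed data can be pushed forward to the interventional distribution.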
Joint work with Hugh Dance (UCL/Gatsby Unit): https://arxiv.org/abs/2405.13844
Statistics Seminar
Online Kernel-Based Mode Learning
Abstract: Big data, characterized by exceptionally large sample sizes, often present the challenges of outliers and heavy-tailed distributions. Robust and efficient estimation in this setting calls for an online learning procedure that resists outliers without requiring access to the full historical dataset. In this talk, we introduce an online learning approach built on a kernel-based mode objective function, designed specifically to handle outliers and heavy tails in big data. The approach embeds mode regression in an online learning framework that operates on data subsets, continuously updating the running estimate with pertinent information extracted from each new subset. We show that the resulting estimator is asymptotically equivalent to the mode estimator computed on the entire dataset. Monte Carlo simulations and an empirical study illustrate the finite-sample performance of the proposed estimator.
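The flavour of such an update scheme can be sketched as follows. This is a minimal illustration, not the speaker's algorithm: it assumes a Gaussian kernel with a fixed bandwidth h and plain gradient ascent on each arriving subset, and the function names (mode_objective_grad, online_update) are hypothetical.

```python
import numpy as np

def mode_objective_grad(beta, X, y, h):
    """Gradient of the kernel-based mode objective
    (1/n) * sum_i K((y_i - x_i' beta) / h), with K the Gaussian kernel.
    Large residuals receive exponentially small weight, which is what
    confers robustness to outliers and heavy tails."""
    r = y - X @ beta
    w = np.exp(-0.5 * (r / h) ** 2)           # kernel weights on residuals
    return X.T @ (w * r) / (len(y) * h**2)

def online_update(beta, X_new, y_new, h, lr=0.5, n_steps=50):
    """Refine the running estimate using only the newest data subset;
    no historical observations are revisited."""
    for _ in range(n_steps):
        beta = beta + lr * mode_objective_grad(beta, X_new, y_new, h)
    return beta

rng = np.random.default_rng(0)
beta_true = np.array([1.0, -2.0])
beta = np.zeros(2)                            # running estimate
for _ in range(20):                           # a stream of 20 data subsets
    X = rng.normal(size=(200, 2))
    eps = rng.standard_t(df=2, size=200)      # heavy-tailed noise with mode 0
    y = X @ beta_true + eps
    beta = online_update(beta, X, y, h=1.0)
print(beta)                                   # should lie near beta_true
```

Each subset update starts from the previous estimate, so the stream is processed in one pass; the talk's result is that an estimator of this kind matches, asymptotically, the mode estimator fit to the full dataset at once.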