Seminar

Making a Training Dataset from Multiple Data Distributions

Over time, we may accumulate large amounts of data from several different populations: for example, records of a virus spreading across different countries. Yet what we wish to model is often not any one of these populations. One might want a model of the spread of the virus that is robust across countries, or that is predictive for a new location where only limited data are available. We give an overview of, and formalize, the objectives that these goals pose for mixing different distributions into a training dataset; these objectives have historically been hard to optimize. We show that, by assuming the trained models are near "optimal" for the training distribution, these objectives reduce to convex objectives, and we provide methods to optimize the reduced objectives. Experimental results show improvements across language modeling, bio-assay, and census data tasks.

To join this seminar virtually, please request Zoom connection details from ea@stat.ubc.ca. 

Introduction to the Computer-Based Testing Facility (CBTF)

The Computer-Based Testing Facility (CBTF) aims to solve a key problem in teaching and learning: helping instructors run digital assessments at scale securely and equitably. The easiest way to describe the CBTF is to imagine it as a network-filtered computer lab dedicated to running digital assessments in 50-minute increments throughout the day, invigilated by trained proctors. Students are typically given a multi-day window to write their tests at a time and location convenient for them. With network filtering, centralized invigilation, a distributed exam model, and flexibility for students and instructors, the mission of the CBTF is to spur pedagogical innovation at the university in a broad range of classes, programs, and departments. Although this was not the initial motivation, the CBTF has recently also been used to maintain exam integrity in the face of modern AI tools, particularly for computer-based exams that involve programming and must be run in a controlled environment.

This session will be useful for a range of people, including faculty members teaching courses, administrators, IT staff, and grad students, as well as anyone with an interest in pedagogy and innovative teaching methods. We’ll also hear about the experience of several Statistics faculty members who piloted the CBTF in their courses in previous terms. There will be plenty of time for Q&A and a larger conversation around migration to computer-based testing and different learning technologies. The CBTF currently supports a variety of assessment options, including Canvas, PrairieLearn, MTA and others. We will also discuss the advantages and disadvantages of different learning technologies in the CBTF from a pedagogical, logistical, and financial perspective.

To join this seminar virtually, please request Zoom connection details from hr.ops@stat.ubc.ca. 

UBC Statistics Department Colloquium: Statistical methods for single-cell and spatial data science

The UBC Statistics Department Colloquium Series features talks that are broad, accessible, and engaging - and open to everyone!

The third talk of our series will take place on Monday, June 8th, when we will welcome Stephanie Hicks, Associate Professor of Biomedical Engineering and Biostatistics at Johns Hopkins University.

Date: Monday, June 8, 2026
Time: 3 - 4 PM
Location: ESB 5104/5106

Title: Statistical methods for single-cell and spatial data science

Abstract: Genomics is going through a data revolution where we can now profile gene expression at a single-cell or 2D spatial resolution. However, these data present unique challenges that have required the development of specialized statistical and computational methods and software infrastructure to successfully derive biological insights. Compared to bulk RNA-seq, there is an increased scale in the number of observations (or cells) that are measured, and there is increased sparsity of the data, that is, a larger fraction of observed zeros. Furthermore, as single-cell technologies mature, the increasing complexity and volume of data require fundamental changes in data access, management, and infrastructure alongside specialized methods to facilitate scalable analyses. I will discuss some challenges in the analysis of these data and present some solutions that we have developed to address them.

This colloquium series is sponsored in part by the Constance van Eeden Endowment.

Chatterjee's graph correlation

This talk surveys recent advances in the understanding of Chatterjee's nearest-neighbor graph-based correlation coefficient. I will present, for the first time, a comprehensive theoretical framework for statistical inference based on this coefficient, including results on asymptotic normality, bias correction, and inconsistency of bootstrap methods. I will also discuss several open problems that may be of interest to researchers wishing to explore this area further.
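
For readers new to the topic, the following is a minimal sketch of the original rank-based form of Chatterjee's coefficient for two univariate samples without ties, xi_n = 1 - 3 * sum |r_(i+1) - r_i| / (n^2 - 1), where the pairs are sorted by x and r_i is the rank of the i-th y value in that order. The function name `chatterjee_xi` is an illustrative assumption, and this simple univariate version is not necessarily the graph-based formulation or the inference procedures covered in the talk.

```python
import numpy as np


def chatterjee_xi(x, y):
    """Rank-based Chatterjee coefficient for 1-D samples, assuming no ties.

    Hypothetical helper for illustration only; it does not implement bias
    correction or any of the inferential results mentioned in the abstract.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)

    # Reorder the y values so that the corresponding x values are increasing.
    order = np.argsort(x, kind="stable")
    y_sorted = y[order]

    # Ranks (1..n) of the reordered y values; valid when there are no ties.
    ranks = np.argsort(np.argsort(y_sorted)) + 1

    # Sum of absolute rank differences between consecutive observations.
    total_jump = np.abs(np.diff(ranks)).sum()
    return 1.0 - 3.0 * total_jump / (n**2 - 1)
```

When y is a noiseless monotone function of x, consecutive ranks differ by one and the coefficient is close to 1; when x and y are independent, it is close to 0.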

To join this seminar virtually, please request Zoom connection details from hr.ops@stat.ubc.ca.

Fang Han

The Two Cultures of Prevalence Mapping: Small Area Estimation and Model-Based Geostatistics

In low- and middle-income countries (LMICs), accurate estimates of subnational health and demographic indicators are critical for guiding policy and identifying disparities. Many indicators of interest are proportions of binary outcomes and the task of estimating these fractions is often called prevalence mapping. In LMICs, health and vital records data are limited, so prevalence mapping relies on data from household surveys with complex sampling designs. However, estimates are often desired at spatial resolutions at which data are insufficient for reliable weighted estimation. We review two families of approaches to prevalence mapping: small area estimation (SAE) methods (from the survey statistics literature) and model-based geostatistics (MBG) methods (from the spatial statistics literature). SAE models can be "area-level" or "unit-level" and commonly use area-specific random effects and rely upon high-quality covariate data, often obtained from administrative sources. Unit-level models for binary responses are relatively underdeveloped. MBG approaches explicitly specify binary response models, incorporate continuous spatial random effects, and leverage alternative sources of data such as those arising from satellite imagery. These models are usually studied under a Bayesian framework. SAE methods often address the design by incorporating sampling weights or modeling the sampling mechanism. Two delicate issues arise when using MBG methods for prevalence mapping. First, aggregating unit-level predictions to create area-level summaries requires population-level information that is rarely directly available. Second, MBG approaches typically assume the sampling design is ignorable. We review both SAE and MBG approaches to prevalence mapping, and argue that binary response models can be improved using insights from both the survey sampling and the spatial statistics literature. We highlight these issues using household survey data from different Demographic and Health Surveys, and with various indicators.

This is joint work with Geir-Arne Fuglstad, Peter Gao and Zehang Richard Li.

To join this seminar virtually, please request Zoom connection details from hr.ops@stat.ubc.ca.

Jon Wakefield

UBC Statistics Department Colloquium: Nonparametrics in causal inference: densities, heterogeneity, & beyond

Much work in causal inference focuses on finite-dimensional targets like average treatment effects. However, many substantively important causal questions involve inherently infinite-dimensional objects, such as counterfactual outcome distributions, heterogeneous treatment effect surfaces, and continuous treatment curves. These targets occupy a hybrid space between classical parameter estimation and nonparametric function estimation. In this talk, I survey some recent work involving these infinite-dimensional causal estimands, highlighting both model-based and model-free nonparametric approaches. I discuss how, despite the impossibility of root-n-rate estimation, ideas from semiparametric theory (like double robustness) continue to play a central role. Throughout I emphasize the relevance of these methods in applications in social sciences and medicine.

This talk is part of the UBC Statistics Colloquium Series, which features broad and accessible seminars throughout the term and is sponsored in part by the Constance van Eeden Endowment.

 

Edward Kennedy

SEDI Seminar Series: Dr. Aleksandra Korolova

Registration & Talk details

We invite you to a speaker series focused on learning about equity, diversity and inclusion practices and initiatives in Statistics and Data Science. Our next speaker will be Dr. Aleksandra Korolova, Assistant Professor of Computer Science and Public Affairs, Princeton University. 

Date/Time: March 26, 2026, 11:00am – 12:00pm

Talk title: Lessons from auditing the hidden societal impacts of ad delivery algorithms

Abstract: Although targeted advertising has been touted as a way to give advertisers a choice in who they reach, increasingly, ad delivery algorithms designed by the ad platforms are invisibly refining those choices. In this talk, I will present our findings from "black-box" auditing of the role of ad delivery algorithms in shaping who sees opportunity and political ads using only the tools and data accessible to any advertiser. I will then discuss legal and policy efforts to mitigate the harmful effects of ad delivery in these domains, including their shortcomings and potential paths forward.

Bio: Aleksandra Korolova is an Assistant Professor of Computer Science and Public Affairs at Princeton University, where she is also affiliated with the Center for Information Technology Policy. She studies societal impacts of AI, and develops and deploys algorithms and technologies that enable data-driven innovations while preserving privacy, fairness, and robustness. She also designs and performs algorithm and AI audits. Aleksandra is a co-winner of the 2011 PET Award for outstanding research in privacy enhancing technologies for being among the first to identify privacy risks of microtargeted advertising. Her work on RAPPOR, the first commercial deployment of differential privacy, has been recognized with the ACM Conference on Computer and Communications Security 2024 Test-of-Time Award. Aleksandra's research on discrimination in ad delivery has received the 2019 CSCW Honorable Mention Award and Recognition of Contribution to Diversity and Inclusion, was a runner-up for the 2021 WWW Best Student Paper Award, and was a winner of the 2025 FAccT Best Paper Award. Aleksandra is a recipient of the Presidential Early Career Award for Scientists and Engineers, a Sloan Research Fellowship and the NSF CAREER Award.

If you would like to attend this virtual talk, please register using the link below:

https://ubc.zoom.us/meeting/register/bTdpB5a2S5SngAX1d1ch6Q

***

This talk is part of the Statistics Equity, Diversity and Inclusion Speaker Series. For more information, please visit: https://www.stat.ubc.ca/seminar-series

Dr. Aleksandra Korolova

Stochastic Localization via Iterative Posterior Sampling

Score-based diffusion models have emerged as a powerful framework for generative modeling, progressively transforming noise into structured data samples when access to a dataset is available. In this talk, we explore how to extend these ideas to the sampling setting, where the target distribution is only known up to a normalizing constant. After reviewing the fundamentals of diffusion models, we highlight a key observation: the score function central to these methods can be expressed as an expectation with respect to a time-dependent distribution with known unnormalized density. This perspective motivates Stochastic Localization via Iterative Posterior Sampling (SLIPS), an approach that estimates the score function using Monte Carlo methods and leverages it to construct a denoising process. We will examine the theoretical underpinnings of SLIPS, with particular emphasis on its main limitation, the duality of log-concavity, which restricts its practical applicability. Building on this, I will present a new approach to Iterative Posterior Sampling (forthcoming work) that bypasses explicit score estimation altogether, leading to significantly improved scalability. While this method remains affected by the same duality phenomenon, we will see that its impact is mitigated in practice.

To join this seminar virtually, please request Zoom connection details from hr.ops@stat.ubc.ca.

Louis Grenioux

Dr. Constance van Eeden Seminar

The van Eeden seminar is a yearly event in which graduate students vote for their favorite statisticians. The winner is contacted by the organizing committee and invited to give a talk in the department’s seminar. The speaker spends one or two days on-campus, and graduate students have the opportunity to have lunch and dinner with them.

THIS YEAR'S SPEAKER

The Constance van Eeden Speaker for 2026 is Dr. Ryan Tibshirani, Professor in the Department of Statistics at the University of California, Berkeley, and Principal Investigator in the Delphi Research Group. Before joining Berkeley, Dr. Tibshirani served as a faculty member in the Departments of Statistics and Machine Learning at Carnegie Mellon University from 2011 to 2022. He earned his Ph.D. in Statistics from Stanford University in 2011 under the supervision of Professor Jonathan Taylor, and his B.S. in Mathematics from Stanford University in 2007. 

Seminar Title: Online Conformal Prediction, Multi-Level Quantile Tracking, and Gradient Equilibrium

Event Date: Thursday, April 2nd, 2026. 10:30-12:00.
Location: ESB 5104, University of British Columbia*

Event registration: https://ubc.zoom.us/meeting/register/htGYWnrFSCWhHIs8-d4wqw

Abstract:

This talk is about uncertainty quantification for time series prediction.

The overarching goal is to provide easy-to-use algorithms with formal guarantees. The algorithms we present build upon ideas from conformal prediction and control theory, are able to prospectively model conformal scores in an online setting, and adapt to the presence of systematic errors due to seasonality, trends, and general distribution shifts. We will then discuss an extension of these ideas to the setting of probabilistic forecasting, which is essentially a generalization of the framework to handle vector-valued predictions, i.e., predictions which take the form of a set of ordered quantile forecasts at different probability levels. Finally, we will generalize this even further to discuss an abstract property in online learning called gradient equilibrium, which encapsulates these settings, and more.
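
As a rough illustration of the kind of online update such methods build on, the sketch below implements a basic quantile-tracking rule: the threshold q_t is raised after a miscoverage event and lowered slightly otherwise, q_(t+1) = q_t + eta * (err_t - alpha), with err_t = 1 if the conformal score exceeds q_t. The function name, the step size `eta`, and the starting value `q0` are assumptions chosen for illustration; the abstract does not specify the algorithms, and this is not presented as the speaker's method.

```python
import numpy as np


def quantile_tracker(scores, alpha=0.1, eta=0.05, q0=0.0):
    """Track the (1 - alpha) quantile of a stream of conformal scores.

    Hypothetical helper: returns the threshold used at each time step and a
    boolean array marking the steps where the score exceeded that threshold.
    """
    scores = np.asarray(scores, dtype=float)
    thresholds = np.empty(len(scores))
    miscovered = np.empty(len(scores), dtype=bool)

    q = q0
    for t, s in enumerate(scores):
        thresholds[t] = q          # threshold is fixed before seeing the score
        miscovered[t] = s > q      # err_t = 1 when the score is not covered
        # Move up by eta * (1 - alpha) after an error, down by eta * alpha otherwise.
        q += eta * (float(miscovered[t]) - alpha)
    return thresholds, miscovered
```

Because each update moves the threshold by exactly eta * (err_t - alpha), the long-run miscoverage frequency stays close to alpha for bounded scores, even when the score distribution drifts over time.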

Dr. Ryan Tibshirani has been invited to be this year’s van Eeden speaker by the graduate students in the Department of Statistics at the University of British Columbia. A van Eeden speaker is a prominent statistician who is chosen each year to give a lecture, supported by the UBC Constance van Eeden Fund. The 2024 seminar is additionally sponsored by the Canadian Statistical Sciences Institute (CANSSI), the Pacific Institute for the Mathematical Sciences (PIMS), and the Walter H. Gage Memorial Fund.

*The room location may change.

Dr. Ryan Tibshirani

Distributional Balancing for Causal Inference: A Unified Framework via Characteristic Function Distance

Weighting methods are essential tools for estimating causal effects in observational studies, with the goal of balancing pre-treatment covariates across treatment groups. Traditional approaches pursue this objective indirectly, for example, via inverse propensity score weighting or by matching a finite number of covariate moments, and therefore do not guarantee balance of the full joint covariate distributions. Recently, distributional balancing methods have emerged as robust, nonparametric alternatives that directly target alignment of entire covariate distributions, but they lack a unified framework, formal theoretical guarantees, and valid inferential procedures. We introduce a unified framework for nonparametric distributional balancing based on the characteristic function distance (CFD) and show that widely used discrepancy measures, including the maximum mean discrepancy and energy distance, arise as special cases. Our theoretical analysis establishes conditions under which the resulting CFD-based weighting estimator achieves root-N consistency. Since the standard bootstrap may fail for this estimator, we propose subsampling as a valid alternative for inference. We further extend our approach to an instrumental variable setting to address potential unmeasured confounding. Finally, we evaluate the performance of our method through simulation studies and a real-world application, where the proposed estimator performs well and exhibits results consistent with our theoretical predictions.
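
To make one of the discrepancy measures named above concrete, the sketch below computes the energy distance between two weighted samples, 2 E|X - Y| - E|X - X'| - E|Y - Y'|, using weighted empirical averages of pairwise Euclidean distances. The function names and the optional weight arguments are assumptions for illustration only; this is not the CFD-based estimator from the paper, just the kind of quantity a distributional balancing method would try to make small by choosing weights.

```python
import numpy as np


def _pairwise_dist(a, b):
    """Euclidean distances between every row of a and every row of b."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)


def weighted_energy_distance(x, y, wx=None, wy=None):
    """Energy distance between two weighted empirical distributions.

    Hypothetical helper: x and y hold one observation per row, and the
    weights (uniform by default) are normalized to sum to one.
    """
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)

    if wx is None:
        wx = np.full(len(x), 1.0 / len(x))
    else:
        wx = np.asarray(wx, dtype=float) / np.sum(wx)
    if wy is None:
        wy = np.full(len(y), 1.0 / len(y))
    else:
        wy = np.asarray(wy, dtype=float) / np.sum(wy)

    cross = wx @ _pairwise_dist(x, y) @ wy      # estimates E|X - Y|
    within_x = wx @ _pairwise_dist(x, x) @ wx   # estimates E|X - X'|
    within_y = wy @ _pairwise_dist(y, y) @ wy   # estimates E|Y - Y'|
    return 2.0 * cross - within_x - within_y
```

In a balancing context, x might hold covariates of one treatment group and y those of another, with the weights adjusted so that the distance between the weighted groups shrinks.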

The paper is available at https://arxiv.org/abs/2601.15449

Bio:

Dr. Chan Park is an assistant professor at the University of Illinois Urbana-Champaign. His research focuses on causal inference in complex settings, including dependence among units and omitted variables. He specializes in applying nonparametric methods and semiparametric theory to address these challenges.

To join this seminar virtually, please request Zoom connection details from hr.ops@stat.ubc.ca.