Valid Inference After Hierarchical Clustering

To join via Zoom: Please request Zoom connection details from headsec@stat.ubc.ca.

Title: Valid Inference After Hierarchical Clustering 

Abstract: Testing for a difference in means between two groups is fundamental to answering research questions across virtually every scientific area. Classical tests control the type I error rate when the groups are defined a priori. However, if the groups are instead defined using a clustering algorithm, then applying a classical test yields an extremely inflated type I error rate. Surprisingly, this problem persists even if two separate and independent data sets are used for clustering and for hypothesis testing.
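The inflation described above can be checked with a small simulation. The following is a minimal sketch, not the speaker's method: it draws data with no true clusters, splits it into two groups by hierarchical clustering, and then applies a classical two-sample test to those groups. The function name `naive_rejection_rate`, the single Gaussian feature, Ward linkage, Welch's t-test, and all parameter values are assumptions made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.stats import ttest_ind


def naive_rejection_rate(n_sims=200, n=50, alpha=0.05, seed=0):
    """Fraction of null simulations in which a classical test rejects
    after the two groups were chosen by hierarchical clustering.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        # One feature, all observations from the same N(0, 1):
        # there is no true difference in means between any groups.
        x = rng.normal(size=(n, 1))

        # Cut a Ward-linkage dendrogram into two clusters (labels 1 and 2).
        tree = linkage(x, method="ward")
        labels = fcluster(tree, t=2, criterion="maxclust")
        a = x[labels == 1, 0]
        b = x[labels == 2, 0]

        # Welch's t-test needs at least two points per group;
        # otherwise count the simulation as a non-rejection.
        if min(len(a), len(b)) < 2:
            continue

        p_value = ttest_ind(a, b, equal_var=False).pvalue
        rejections += int(p_value < alpha)
    return rejections / n_sims


if __name__ == "__main__":
    rate = naive_rejection_rate()
    print(f"Naive rejection rate under the null: {rate:.2f} (nominal level 0.05)")
```

Because the clustering step separates the observations precisely so that the groups look different, one would expect the printed rate to sit far above the nominal level, even though every observation comes from the same distribution.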

In this talk, I will propose a test for a difference in means between two estimated clusters that accounts for the fact that the null hypothesis is a function of the data, using a selective inference framework. Then, I will describe how to efficiently compute exact p-values for clusters obtained using hierarchical clustering. I will also show an application in the context of single-cell RNA-sequencing data, where it is common for researchers to cluster the cells, then test for a difference in mean gene expression between the clusters.

This talk is based on joint work with Jacob Bien (University of Southern California) and Daniela Witten (University of Washington).

Location: Zoom

Speaker: Lucy Gao, Assistant Professor, Department of Statistics and Actuarial Science, University of Waterloo