We are a computational biology laboratory in the Department of Pathology and Laboratory Medicine and the Department of Statistics at the University of British Columbia. We usually work at the BC Cancer Research Centre, or anywhere else we can open a terminal.

We love high-dimensional, large-volume biological data. If you give us data, we would love to understand the mechanisms of the data-generating process. If you give us a cool hypothesis, we will be excited to test the causality using data. We also love pretty scientific figures with a healthy dose of obsession.


Lightweight tools for integrative single-cell data analysis

TL;DR. We develop scalable and memory-efficient tools for single-cell data analysis. There are already many tools. Ever since single-cell technology was introduced to the world, every year, every month, we see more tools.

Causal Inference in Genomics

TL;DR. We reinvent traditional bioinformatics methods to ascertain the causality of discoveries made in high-dimensional omics data. Massive data generation in genomics has transformed the methodology and practice of research in medicine and human biology.

Bayesian modelling of summary statistics data

TL;DR. We develop Bayesian machine learning methods that uncover latent structure or underlying joint models of summary statistics data. What are summary statistics data? There is no clear boundary that defines whether data are made available at the summary level or the observation level.

Recent Posts

Adjusting batch effect in pseudobulk samples is a causal inference problem

Inspiration Our previous work demonstrates that finding causal differences at a pseudobulk level is easier than dealing with noisy cell-level data: Counterfactual inference for single-cell gene expression analysis

STAT 548 PhD Qualifying Papers (2022 - 2023)

Introduction I am interested in almost all problems in computational biology and genomics. I expect a student to propose novel statistical approaches that can address challenges in data analysis and modelling of high-dimensional, large-volume biological problems.

Recipes for GWAS data conversion/extraction

Useful tools Download PLINK 1.9 from https://www.cog-genomics.org/plink2. Install bcftools from https://samtools.github.io/bcftools/, but on macOS it is easier to install UNIX tools via Homebrew. Once Homebrew is installed, just type in brew install bcftools.

On the analysis methods used in the "omnigenic" paper

A recent Quanta Magazine article on the omnigenic hypothesis re-ignited my interest, so I read the original omnigenic paper again. I am not against the model itself. First, I should make it clear: please do not get me wrong.


Graduate Students


Kevin Lam

PhD student in Statistics

Causal Inference, Computational Biology, Machine Learning, Public Health


Ming Yuan

PhD student

Variable selection, Causal inference


Pattie Ye

Master’s student

Computational Biology, Statistical Genomics, Complex Disease, Autoimmune and Inflammatory Disease, Immunometabolism


Sishir Subedi

PhD student in Bioinformatics

Cancer genomics, Machine learning, Causal inference, Bayesian statistics


Yichen Zhang

PhD student

Computational Biology, Bayesian Statistics, Deep Learning, Graphical Models

Co-op Student


Sam Khalilitousi

Data Analysis and Scientific Computing Research Co-op

Computational Biology, Cellular Bioengineering

Principal Investigator


Yongjin Park

Assistant Professor of Pathology and Statistics

Bayesian statistics, Causal inference, Computational Biology, Network science, Single-cell genomics