Welcome

We are a computational biology laboratory in the Department of Pathology and Laboratory Medicine and the Department of Statistics at the University of British Columbia. We usually work at the BC Cancer Research Centre, or anywhere we can open a terminal.

We love high-dimensional, large-volume biological data. If you give us data, we would love to understand the mechanisms of the data-generating process. If you give us a cool hypothesis, we will be excited to test the causality using data. We also love pretty scientific figures with a healthy dose of obsession.

Projects

Lightweight tools for integrative single-cell data analysis

TL;DR. We develop scalable and memory-efficient tools for single-cell data analysis. Many tools already exist: ever since single-cell technology was introduced, new ones have appeared every year, even every month.

Causal Inference in Genomics

TL;DR. We reinvent traditional bioinformatics methods to ascertain the causality of discoveries made in high-dimensional omics data. Massive data generation in genomics has transformed the methodology and practice of research in medicine and human biology.

Bayesian modelling of summary statistics data

TL;DR. We develop Bayesian machine learning methods that uncover latent structures or underlying joint models of summary statistics data. What are summary statistics data? There is no clear boundary that separates data made available at the summary level from data made available at the observation level.
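As one familiar illustration of the distinction, the sketch below collapses individual-level data (a genotype matrix and a continuous trait) into per-variant summary statistics: effect size, standard error, and z-score. It assumes simple marginal linear regression of the trait on each variant separately, with no covariates, and that every variant varies across samples; the function name `marginal_summary_stats` is hypothetical, and this is not the lab's modelling method.

```python
import numpy as np

def marginal_summary_stats(X, y):
    """Collapse individual-level data into per-variant summary statistics.

    Hypothetical helper. Assumes:
      X: (n, p) genotype matrix (e.g., allele dosages 0/1/2), no constant columns.
      y: (n,) continuous phenotype.
    Fits p separate simple regressions y ~ 1 + x_j and returns (beta, se, z).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    Xc = X - X.mean(axis=0)          # centering absorbs the intercept
    yc = y - y.mean()
    sxx = (Xc ** 2).sum(axis=0)      # per-variant sum of squares
    beta = Xc.T @ yc / sxx           # OLS slope for each variant
    resid = yc[:, None] - Xc * beta  # (n, p): residuals of each regression
    sigma2 = (resid ** 2).sum(axis=0) / (n - 2)  # residual variance, df = n - 2
    se = np.sqrt(sigma2 / sxx)       # standard error of each slope
    z = beta / se                    # per-variant z-score
    return beta, se, z
```

Once only `beta`, `se`, and `z` are shared, the genotypes and phenotypes themselves are no longer visible; this is the summary level that the methods above take as input.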

Recent Posts

Adjusting batch effect in pseudobulk samples is a causal inference problem

Inspiration: Our previous work demonstrates that finding causal differences at the pseudobulk level is easier than dealing with noisy cell-level data (Counterfactual inference for single-cell gene expression analysis). Interestingly, others have argued that identifying batch effects should be treated as a causal effect estimation problem (Batch Effects are Causal Effects: Applications in Human Connectomics).
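For readers new to the term, a pseudobulk sample pools the counts of all cells that come from the same sample (for example, the same individual or library). A minimal sketch, assuming a dense cells-by-genes count matrix and one sample label per cell; the function name `pseudobulk` is hypothetical, and the batch adjustment discussed in the post is not shown.

```python
import numpy as np

def pseudobulk(counts, sample_labels):
    """Sum cell-level counts into one pseudobulk profile per sample.

    Hypothetical helper. Assumes:
      counts: dense (n_cells, n_genes) array of non-negative counts.
      sample_labels: length-n_cells sequence; cells sharing a label are pooled.
    Returns (samples, bulk): bulk has shape (n_samples, n_genes), and
    row i holds the summed counts for samples[i].
    """
    counts = np.asarray(counts)
    labels = np.asarray(sample_labels)
    # Sorted unique labels, plus each cell's row index into that list.
    samples, idx = np.unique(labels, return_inverse=True)
    bulk = np.zeros((len(samples), counts.shape[1]), dtype=counts.dtype)
    # Unbuffered add: each cell's row is accumulated into its sample's row.
    np.add.at(bulk, idx, counts)
    return samples, bulk

counts = np.array([[1, 0, 2],
                   [3, 1, 0],
                   [0, 5, 1],
                   [2, 2, 2]])
samples, bulk = pseudobulk(counts, ["A", "B", "A", "B"])
# samples -> ['A' 'B']; bulk -> [[1 5 3], [5 3 2]]
```

Summing rather than averaging keeps the pooled values as counts, which count-based models expect; after pooling, each sample contributes one observation instead of many correlated cells.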

STAT 548 PhD Qualifying Papers (2022 - 2023)

Introduction: I am interested in almost all problems in computational biology and genomics. I expect students to propose novel statistical approaches that address challenges in the analysis and modelling of high-dimensional, large-volume biological data.

Recipes for GWAS data conversion/extraction

Useful tools: Download PLINK 1.9 from https://www.cog-genomics.org/plink2. Install bcftools from https://samtools.github.io/bcftools/; on macOS, however, it is easier to install UNIX tools via Homebrew. Once Homebrew is installed, simply run brew install bcftools.

On the analysis methods used in the "omnigenic" paper

A recent Quanta Magazine article on the omnigenic hypothesis reignited my interest, so I read the original omnigenic paper again. I am not against the model itself. First, let me make this clear: please do not get me wrong.

People

Graduate Student

Kevin Lam

PhD Student in Statistics

Causal Inference, Computational Biology, Machine Learning, Public Health

Maxwell Douglas

Graduate Student

Computational Biology, Computational Oncology, Open-source software, Causal Inference, Data Science, Translational Science, Public Health

Ming Yuan

PhD Student

Variable selection, Causal inference

Sishir Subedi

PhD Student in Bioinformatics

Cancer genomics, Machine learning, Causal inference, Bayesian statistics

Wanxin Li

PhD Student

Computational biology, Single cell problems

Yaoyue (Yolanda) Feng

PhD Student

Cancer Biology, Bioinformatics

Principal Investigator

Yongjin Park

Assistant Professor of Pathology and Statistics

Bayesian statistics, Causal inference, Computational Biology, Network science, Single-cell genomics

Alumni

Pattie Ye

Master’s student

Computational Biology, Statistical Genomics, Complex Disease, Autoimmune and Inflammatory Disease, Immunometabolism

Sam Khalilitousi

Data Analysis and Scientific Computing Research Co-op

Computational Biology, Cellular Bioengineering

Yichen Zhang

PhD Student

Computational Biology, Bayesian Statistics, Deep Learning, Graphical Models

Contact