STAT 548 PhD Qualifying Papers

Posted on 2025-08-28 :: Tags: stat548

Introduction

I am interested in almost all problems in computational biology and genomics. I expect students to propose novel statistical approaches that address challenges in the analysis and modelling of high-dimensional, large-volume biological data.

Feel free to contact me (ypp@stat.ubc.ca).

Format

You may organize your report into the following sections.

Problem definition (1 page): Extract mathematical/statistical problems from the paper and organize them. What are the input data? What is the expected output?
Significance (1 paragraph): Why is this an interesting problem? What can be learned by studying this problem? Why is it exciting for you?
Author contribution: How did the author(s) find the solution? What was a novel contribution beyond traditional approaches?
Limitations/challenges (1 paragraph): What are the assumptions? Are they realistic? What are the technical limitations, whether or not the authors acknowledge them?
Novel idea/methods (1-2 pages): Propose your idea and statistical methods. You could interpret the underlying problem in a different formulation. What related problems or frameworks did the authors not adopt?
Results (1-2 pages): Include one figure that sketches your approach. Show tables and figures that clearly demonstrate your methods.
Discussion (1 page): Briefly discuss what you have learned and what you would achieve if you were to develop this into a full paper. How would you validate your findings in independent studies, including wet-lab experiments?

Available Papers

Smith, L. H., & VanderWeele, T. J. (2019). Bounding bias due to selection. Epidemiology, 30(4), 509–516.
Brown, B. C., Tokolyi, A., Morris, J. A., Lappalainen, T., & Knowles, D. A. (2025). Large-scale causal discovery using interventional data sheds light on gene network structure in K562 cells. Nature Communications, 16(1), 9628.
Deshpande, S. K. (2025). FlexBART: Flexible Bayesian regression trees with categorical predictors. Journal of Computational and Graphical Statistics, 34(3), 1117–1126.
Marton, S., Lüdtke, S., Bartelt, C., & Stuckenschmidt, H. (2023, October 13). GRANDE: Gradient-Based Decision Tree Ensembles for Tabular Data. The Twelfth International Conference on Learning Representations.
Bakhtiari, M., Bonn, S., Theis, F., Zolotareva, O., & Baumbach, J. (2025). FedscGen: privacy-preserving federated batch effect correction of single-cell RNA sequencing data. Genome Biology, 26(1), 216.
TAKEN Rautenstrauch, P., & Ohler, U. (2025). Shortcomings of silhouette in single-cell integration benchmarking. Nature Biotechnology, 1–5.
TAKEN Zhang, L., Liu, L., Ji, J., Yan, R., Guo, P., Gong, W., Xue, F., Zhou, X., & Yuan, Z. (2025). Efficient Mendelian randomization analysis with self-adaptive determination of sample structure and multiple pleiotropic effects. The American Journal of Human Genetics, 0(0). https://doi.org/10.1016/j.ajhg.2025.06.002
