
Introduction

I am interested in almost all problems in computational biology and genomics. I expect students to propose novel statistical approaches that address challenges in analysing and modelling high-dimensional, large-volume biological data.

Feel free to contact me (ypp@stat.ubc.ca).

Format

You may organize your report into the following sections.

  • Problem definition (1 page): Extract mathematical/statistical problems from the paper and organize them. What are the input data? What is the expected output?

  • Significance (1 paragraph): Why is this an interesting problem? What can be learned by studying this problem? Why is it exciting for you?

  • Author contribution: How did the author(s) find the solution? What was a novel contribution beyond traditional approaches?

  • Limitations/challenges (1 paragraph): What are the assumptions? Are they realistic? What are the technical limitations, whether or not the authors acknowledge them?

  • Novel idea/methods (1-2 pages): Propose your idea and statistical methods. You could reinterpret the underlying problem under a different formulation. What related problems or frameworks exist that the authors did not adopt?

  • Results (1-2 pages): Include one figure that sketches your approaches. Show tables and figures that clearly demonstrate your methods.

  • Discussion (1 page): Briefly discuss what you have learned and what you would achieve if you were to develop this into a full paper. How would you validate your findings in independent studies, including wet-lab experiments?

Available Papers

  1. Deshpande, S. K. (2025). FlexBART: Flexible Bayesian regression trees with categorical predictors. Journal of Computational and Graphical Statistics, 34(3), 1117–1126.

  2. Marton, S., Lüdtke, S., Bartelt, C., & Stuckenschmidt, H. (2023, October 13). GRANDE: Gradient-Based Decision Tree Ensembles for Tabular Data. The Twelfth International Conference on Learning Representations. https://openreview.net/forum?id=XEFWBxi075

  3. Rautenstrauch, P., & Ohler, U. (2025). Shortcomings of silhouette in single-cell integration benchmarking. Nature Biotechnology, 1–5.

  4. Zhang, L., Liu, L., Ji, J., Yan, R., Guo, P., Gong, W., Xue, F., Zhou, X., & Yuan, Z. (2025). Efficient Mendelian randomization analysis with self-adaptive determination of sample structure and multiple pleiotropic effects. The American Journal of Human Genetics. https://doi.org/10.1016/j.ajhg.2025.06.002

  5. Bakhtiari, M., Bonn, S., Theis, F., Zolotareva, O., & Baumbach, J. (2025). FedscGen: privacy-preserving federated batch effect correction of single-cell RNA sequencing data. Genome Biology, 26(1), 216.