Introduction
I am interested in almost all problems in computational biology and genomics. I expect students to propose novel statistical approaches that address challenges in the analysis and modelling of high-dimensional, large-volume biological data.
Feel free to contact me (ypp@stat.ubc.ca).
Format
You may organize your report into the following sections.
- Problem definition (1 page): Extract mathematical/statistical problems from the paper and organize them. What are the input data? What is the expected output?
- Significance (1 paragraph): Why is this an interesting problem? What can be learned by studying this problem? Why is it exciting for you? Author contribution: How did the author(s) find the solution? What was a novel contribution beyond traditional approaches?
- Limitations/challenges (1 paragraph): What are the assumptions? Are they realistic? Which technical limitations do the authors acknowledge, and which do they leave unaddressed?
- Novel idea/methods (1-2 pages): Propose your idea and statistical methods. You could interpret the underlying problem in a different formulation. Which related problems or frameworks did the authors not adopt?
- Results (1-2 pages): Include one figure that sketches your approach. Show tables and figures that clearly demonstrate your methods.
- Discussion (1 page): Briefly discuss what you have learned and what you would achieve if you were to develop this into a full paper. How would you validate your findings in independent studies, including wet-lab experiments?
Available Papers
- Deshpande, S. K. (2025). FlexBART: Flexible Bayesian regression trees with categorical predictors. Journal of Computational and Graphical Statistics, 34(3), 1117–1126.
- Marton, S., Lüdtke, S., Bartelt, C., & Stuckenschmidt, H. (2023, October 13). GRANDE: Gradient-Based Decision Tree Ensembles for Tabular Data. The Twelfth International Conference on Learning Representations. https://openreview.net/forum?id=XEFWBxi075
- Rautenstrauch, P., & Ohler, U. (2025). Shortcomings of silhouette in single-cell integration benchmarking. Nature Biotechnology, 1–5.
- Zhang, L., Liu, L., Ji, J., Yan, R., Guo, P., Gong, W., Xue, F., Zhou, X., & Yuan, Z. (2025). Efficient Mendelian randomization analysis with self-adaptive determination of sample structure and multiple pleiotropic effects. The American Journal of Human Genetics. https://doi.org/10.1016/j.ajhg.2025.06.002
- Bakhtiari, M., Bonn, S., Theis, F., Zolotareva, O., & Baumbach, J. (2025). FedscGen: privacy-preserving federated batch effect correction of single-cell RNA sequencing data. Genome Biology, 26(1), 216.