Variational inference of matrix factorization

fit.fqtl.factorize(
  y,
  k = 1,
  svd.init = TRUE,
  model = c("gaussian", "nb", "logit", "voom", "beta"),
  x.mean = NULL,
  c.mean = NULL,
  x.var = NULL,
  y.loc = NULL,
  y.loc2 = NULL,
  x.mean.loc = NULL,
  cis.dist = 1e+06,
  do.hyper = FALSE,
  tau = NULL,
  pi = NULL,
  tau.lb = -10,
  tau.ub = -4,
  pi.lb = -4,
  pi.ub = -1,
  tol = 1e-04,
  gammax = 1000,
  rate = 0.01,
  decay = 0,
  jitter = 0.1,
  nsample = 10,
  vbiter = 2000,
  verbose = TRUE,
  print.interv = 10,
  nthread = 1,
  rseed = NULL,
  options.mf = list(),
  options.reg = list()
)

Arguments

y: [n x m] response matrix
k: Rank of the factorization (default: 1)
svd.init: Initalize by SVD (default: TRUE)
model: choose an appropriate distribution for the generative model of y matrix from c('gaussian', 'nb', 'logit', 'voom', 'beta') (default: 'gaussian')
x.mean: [n x p] primary covariate matrix for mean change (can specify location)
c.mean: [n x q] secondary covariate matrix for mean change (dense)
x.var: [n x r] covariate marix for variance#'
y.loc: m x 1 genomic location of y variables
y.loc2: m x 1 genomic location of y variables (secondary)
x.mean.loc: p x 1 genomic location of x.mean variables
cis.dist: distance cutoff between x and y
do.hyper: Hyper parameter tuning (default: FALSE)
tau: Fixed value of tau
pi: Fixed value of pi
tau.lb: Lower-bound of tau (default: -10)
tau.ub: Upper-bound of tau (default: -4)
pi.lb: Lower-bound of pi (default: -4)
pi.ub: Upper-bound of pi (default: -1)
tol: Convergence criterion (default: 1e-4)
gammax: Maximum precision (default: 1000)
rate: Update rate (default: 1e-2)
decay: Update rate decay (default: 0)
jitter: SD of random jitter for mediation & factorization (default: 0.01)
nsample: Number of stochastic samples (default: 10)
vbiter: Number of variational Bayes iterations (default: 2000)
verbose: Verbosity (default: TRUE)
print.interv: Printing interval (default: 10)
nthread: Number of threads during calculation (default: 1)
rseed: Random seed
options.mf: A combined list of inference options for matrix factorization.
options.reg: A combined list of inference options for regression effects.

Value

a list of variational inference results

Details

Correct hidden confounders lurking in expression matrix using low-rank matrix factorization including genetic and other biological covariates. We estimate the following model:

mean $$\mathsf{E}[Y] = U V^{\top} + X \theta_{\mathsf{local}} + C \theta_{\mathsf{global}}$$

and variance $$\mathsf{V}[Y] = X_{\mathsf{var}} \theta_{\mathsf{var}}$$

We determined ranks by group-wise spike-slab prior imposed on the columns of U and V.

Author

Yongjin Park, ypp@stat.ubc.ca, ypp.ubc@gmail.com

Examples


 require(fqtl)
 require(Matrix)
 n <- 500
 m <- 100
 k <- 3
 p <- 200
 u <- matrix(rnorm(n * k), n, k)
 v <- matrix(rnorm(m * k), m, k)
 p.true <- 3
 theta.true <- matrix(sign(rnorm(1:p.true)), p.true, 1)
 X <- matrix(rnorm(n * p), n, p)
 y.resid <- X[,1:p.true] %*% theta.true
 y <- u %*% t(v) + 0.5 * matrix(rnorm(n * m), n, m)
 y[,1] <- y[,1] + y.resid
 y <- scale(y)
 x.v <- matrix(1, n, 1)
 xx <- as.matrix(cbind(X, 1))
 mf.opt <- list(tol=1e-8, rate=0.01, pi.ub=0, pi.lb=-2, svd.init = TRUE,
 jitter = 1e-1, vbiter = 1000, gammax=1e4, mf.pretrain = TRUE, k = 10)
 reg.opt <- list(pi.ub=-2, pi.lb=-4, gammax=1e4, vbiter = 1000)

## full t(xx) * y adjacency matrix
 mf.out <- fit.fqtl.factorize(y, x.mean = xx, x.var = x.v, options.mf = mf.opt)
 image(Matrix(y), main = 'Y')

 image(Matrix(mf.out$U$theta), main = 'U')

 image(Matrix(mf.out$V$theta), main = 'V')

 image(Matrix(mf.out$mean$theta[1:20,]))


## sparse t(xx) * y adjacency matrix
 mf.out <- fit.fqtl.factorize(y, x.mean = xx, x.var = x.v, x.mean.loc = 1:(p+1),
                   y.loc = 1:m, cis.dist = 3,
                   options.mf = mf.opt,
                   options.reg = reg.opt)
 image(Matrix(mf.out$U$theta), main = 'U')

 image(Matrix(mf.out$V$theta), main = 'V')

 image(Matrix(mf.out$mean$theta[1:20,]))


## mixed, sparse and dense
 c.m <- matrix(1, n, 1)
 mf.out <- fit.fqtl.factorize(y, x.mean = xx, x.var = x.v, x.mean.loc = 1:(p+1),
                   y.loc = 1:m, cis.dist = 3, c.mean = c.m,
                   options.mf = mf.opt,
                   options.reg = reg.opt)
 image(Matrix(mf.out$U$theta), main = 'U')

 image(Matrix(mf.out$V$theta), main = 'V')