Run ALS. <P>
Example:
val (u,v,errors) = als(input, k).toTuple
ALS runs until (rmse[i-1] - rmse[i]) / rmse[i-1] < convergenceThreshold, or until i == maxIterations, whichever comes first. <P>
row key type of the input (100 is probably more than enough)
The input matrix
required rank of decomposition (number of cols in U and V results)
regularization rate
maximum iterations to run regardless of convergence
stop sooner if (rmse[i-1] - rmse[i]) / rmse[i-1] is less than this value. If <= 0, RMSE is not computed and the convergence test is skipped.
{@link org.apache.mahout.math.drm.decompositions.ALS.Result}
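The stopping rule above can be sketched in plain Scala. This is a minimal illustration of the convergence test only, not Mahout's implementation; the object and method names are hypothetical, and the RMSE sequence is assumed to be given rather than computed from the factorization.

```scala
// Hypothetical sketch of the ALS stopping rule: stop at the first iteration i
// where the relative RMSE improvement (rmse[i-1] - rmse[i]) / rmse[i-1] falls
// below convergenceThreshold, or when maxIterations is reached.
object AlsStopping {
  def iterationsToRun(rmsePerIteration: Seq[Double],
                      convergenceThreshold: Double,
                      maxIterations: Int): Int = {
    var i = 1
    while (i < math.min(rmsePerIteration.length, maxIterations)) {
      val prev = rmsePerIteration(i - 1)
      val cur = rmsePerIteration(i)
      // Relative improvement of this iteration over the previous one.
      if ((prev - cur) / prev < convergenceThreshold) return i
      i += 1
    }
    i
  }
}
```

With convergenceThreshold <= 0 the test is never satisfied, matching the documented behavior of running purely to maxIterations.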
Distributed _thin_ QR. A'A must fit in memory, i.e. if A is m x n, then n should be fairly small (< 5000 or so). <P>
It is recommended to checkpoint A since it does two passes over it. <P>
It also guarantees that Q is partitioned exactly the same way (and in the same key order) as A, so their RDDs should be able to zip successfully.
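The "A'A must fit in memory" requirement comes from the Cholesky-style thin-QR idea: form the small n x n Gramian A'A, factor it as R'R, then recover Q = A * inv(R). The following plain-Scala sketch shows that idea in-core on dense arrays; it is illustrative only (hypothetical names, no pivoting or rank-deficiency handling) and is not Mahout's distributed implementation.

```scala
// Thin QR via the Gramian: G = A'A, Cholesky G = R'R, Q = A * inv(R).
// Illustrative in-core sketch on Array[Array[Double]].
object CholeskyQr {
  type Mat = Array[Array[Double]]

  def thinQr(a: Mat): (Mat, Mat) = {
    val m = a.length; val n = a(0).length
    // Gramian G = A'A (n x n, small by assumption).
    val g = Array.tabulate(n, n)((i, j) =>
      (0 until m).map(r => a(r)(i) * a(r)(j)).sum)
    // Cholesky: G = R'R with R upper-triangular.
    val r = Array.ofDim[Double](n, n)
    for (i <- 0 until n; j <- i until n) {
      val s = g(i)(j) - (0 until i).map(k => r(k)(i) * r(k)(j)).sum
      r(i)(j) = if (i == j) math.sqrt(s) else s / r(i)(i)
    }
    // Q = A * inv(R): solve x * R = row for each row of A by substitution.
    // Each row of Q depends only on the matching row of A, which is why the
    // distributed version can keep Q partitioned exactly like A.
    val q = a.map { row =>
      val x = new Array[Double](n)
      for (j <- 0 until n)
        x(j) = (row(j) - (0 until j).map(k => x(k) * r(k)(j)).sum) / r(j)(j)
      x
    }
    (q, r)
  }
}
```

Note the two passes over A (one for the Gramian, one for Q), which is why checkpointing A first is recommended.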
Distributed Stochastic PCA decomposition algorithm. A logical reflow of the "SSVD-PCA options.pdf" document attached to MAHOUT-817.
input matrix A
request SSVD rank
oversampling parameter
number of power iterations (hint: use either 0 or 1)
(U,V,s). Note that U and V are non-checkpointed matrices (i.e. one needs to actually use them, e.g. save them to MapR-FS, in order to trigger their computation).
Distributed Stochastic Singular Value decomposition algorithm.
input matrix A
request SSVD rank
oversampling parameter
number of power iterations
(U,V,s). Note that U and V are non-checkpointed matrices (i.e. one needs to actually use them, e.g. save them to MapR-FS, in order to trigger their computation).
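A brief note on why "use either 0 or 1" power iterations is a sensible hint: q power iterations effectively replace A by (A A')^q A, which maps each singular value s to s^(2q+1) while leaving the singular vectors unchanged, so trailing singular values are strongly suppressed relative to the leading ones and the random projection captures the dominant subspace more accurately. The plain-Scala snippet below illustrates this effect on the spectrum alone (hypothetical helper, not part of the Mahout API):

```scala
// Effect of q power iterations on spectral decay: singular value s becomes
// s^(2q + 1), so the ratio of a trailing to a leading singular value shrinks
// rapidly — usually one iteration already gives most of the benefit.
object PowerIterationEffect {
  def decayRatio(sLead: Double, sTrail: Double, q: Int): Double =
    math.pow(sTrail, 2 * q + 1) / math.pow(sLead, 2 * q + 1)
}
```

For example, a trailing/leading ratio of 2/3 at q = 0 drops to (2/3)^3 = 8/27 at q = 1, at the cost of one extra pass over A per iteration.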
PCA based on SSVD that runs without forming the always-dense A - colMeans(A) input for SVD. This follows the solution outlined in MAHOUT-817. For the in-core version it is, for the most part, supposed to save some memory on sparse inputs by avoiding direct mean subtraction.<P>
Hint: usually one wants to use AV, which is approximately U * Sigma, i.e. u %*%: diagv(s).
If retaining distances and original scaled variances is not that important, the normalized PCA space is just U.
Important: data points are considered to be rows.
input matrix A
request SSVD rank
oversampling parameter
number of power iterations
(U,V,s)
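The reason the dense centered matrix never needs to be materialized is the identity (A - 1 m') Omega = A Omega - 1 (m' Omega), where m is the column-means vector and 1 is a column of ones: centering commutes with the random projection, so the correction can be applied to the small projected result instead of densifying A. A plain-Scala numeric check of that identity (hypothetical helper names, not the Mahout API):

```scala
// Verifies (A - 1*m') * Omega == A * Omega - 1 * (m' * Omega), the identity
// that lets SSVD-based PCA skip forming the dense centered input.
object MeanTrick {
  type Mat = Array[Array[Double]]

  // Naive dense matrix product, enough for the check below.
  def multiply(a: Mat, b: Mat): Mat =
    a.map(row => Array.tabulate(b(0).length)(j =>
      row.indices.map(k => row(k) * b(k)(j)).sum))

  // Column means of A, i.e. the vector m above.
  def colMeans(a: Mat): Array[Double] =
    Array.tabulate(a(0).length)(j => a.map(_(j)).sum / a.length)
}
```

Since data points are rows here, m' Omega is a single small row vector, so the correction term costs almost nothing compared to densifying a sparse A.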
In-core SSVD algorithm.
input matrix A
request SSVD rank
oversampling parameter
number of power iterations
(U,V,s)
This package holds all decomposition and factorization-like methods; at least all those that we have been able to make distributed-engine-independent so far.