Compare nipals/empca with no missing Compare nipals/empca with missing, equal weight Compare nipals/empca with missing, unequal weight
Complete data example
# Python: Coeff (scores)
[[-2.809 0.097 0.244 0.050]
[-1.834 0.286 0.010 -0.135]
[-0.809 0.963 -0.341 0.078]
[-0.155 -1.129 0.548 0.026]
[0.707 -0.723 -0.736 -0.024]
[1.830 -0.290 -0.157 0.030]
[3.070 0.796 0.431 -0.026]]
# m1e <- empca(x=B1, w=B1wt, ncomp=4)
# Un-sweep the eigenvalues to compare to python results
# R round( sweep( m1e$scores, 2, m1e$eig, "*"), 3)
PC1 PC2 PC3 PC4
G1 -2.809 0.097 -0.244 0.050
G2 -1.834 0.286 -0.010 -0.135
G3 -0.809 0.963 0.341 0.078
G4 -0.155 -1.129 -0.548 0.026
G5 0.707 -0.723 0.736 -0.024
G6 1.830 -0.290 0.157 0.030
G7 3.070 0.796 -0.431 -0.026
# Matlab: P (scores)
0.5590 0.0517 0.2210 0.2910
0.3650 0.1520 0.0095 -0.7840
0.1610 0.5120 -0.3080 0.4530
0.0309 -0.6010 0.4950 0.1510
-0.1410 -0.3850 -0.6640 -0.1380
-0.3650 -0.1540 -0.1420 0.1760
-0.6110 0.4230 0.3890 -0.1490
# R: round(m1e$scores, 3)
PC1 PC2 PC3 PC4
G1 -0.559 -0.052 0.221 -0.291
G2 -0.365 -0.152 0.009 0.784
G3 -0.161 -0.512 -0.308 -0.453
G4 -0.031 0.601 0.495 -0.151
G5 0.141 0.385 -0.664 0.138
G6 0.365 0.154 -0.142 -0.176
G7 0.611 -0.423 0.389 0.149
Missing data example
# Python with initial Identity matrix
[[2.791 0.125 0.325 -0.035]
[1.528 -0.989 -0.211 0.172]
[0.990 -0.651 -0.117 -0.186]
[0.159 1.463 0.530 0.020]
[-0.628 0.862 -0.730 -0.032]
[-1.738 0.406 -0.139 -0.071]
[-2.917 -0.712 0.520 -0.047]]
Eigvec (loadings)
[[-0.309 -0.839 -0.298 0.300]
[-0.502 0.014 0.154 -0.615]
[-0.470 -0.086 0.766 0.219]
[-0.441 0.521 -0.236 0.615]
[-0.487 0.128 -0.496 -0.326]]
# R
R> m2e <- empca(x=B2, w=B2wt, ncomp=4, seed=NULL)
# # Un-sweep the eigenvalues to compare to python results
R> round( sweep( m2e$scores, 2, m2e$eig, "*"), 3)
PC1 PC2 PC3 PC4
G1 -2.791 0.216 -0.356 0.066
G2 -1.528 -0.942 0.187 -0.150
G3 -0.990 -0.620 0.101 0.207
G4 -0.159 1.472 -0.522 -0.032
G5 0.628 0.844 0.744 0.021
G6 1.738 0.351 0.161 0.050
G7 2.917 -0.808 -0.493 0.019
R> round( m2e$loadings, 3)
PC1 PC2 PC3 PC4
E1 0.309 -0.839 0.298 -0.300
E2 0.502 0.014 -0.154 0.615
E3 0.470 -0.086 -0.766 -0.219
E4 0.441 0.521 0.236 -0.615
E5 0.487 0.128 0.496 0.326
Python
Python code by Bailey (2012), retrieved 1 Mar 2019 from https://github.com/sbailey/empca .
The Python code is difficult to read in places for a person [like me] not well-versed with Python. Three examples:
- It is not clear what values
k
takes infor k in range(self.nvec)
. - Gram-Schmidt orthogonalization is accomplished with a pair of nested
for
loops instead of a function. - The
Model
class structure makes it a bit tricky to figure out what objects have actually been modified inside a function.
The Python code iterates these two EM steps:
- Calculate the coefficient matrix C.
- Calculate ALL
ncomp
principal components P simultaneously (iterate each to convergence). Orthogonalize P.
For a complete-data problem, python and R give similar results. Note
the Coeff
matrix in python does NOT have eigenvalues swept
out of the columns.
For the missing-data problem, the python results are somewhat different from R.
Matlab
Matlab code by Vicente Parot, retrieved 1 Mar 2019 from https://www.mathworks.com/matlabcentral/fileexchange/45353-empca.
The Matlab code feels similar to R.
The Matlab code calculates principal components sequentially, one at time. For each principal component, the algorithm iterates between these two steps:
- Calculate C[,h]
- Calculate P[,h]
While this is a type of EM algorithm, it is NOT the algorithm described by Bailey (2012) and is considered further.