Using Matlab “princomp” for Easy Dimension Reduction Using Principal Component Analysis (PCA)

Although I have detailed another way of doing dimension reduction in Matlab, I recently found the command “princomp”, which does everything for you. The following code reads in the .csv files from a directory and reduces each one to a set number of dimensions (“OutputSize” in this case). This is a lot easier than doing it yourself with the eigenvectors and so on:

function ReduceUsingPCA2(DirName,OutputSize)
% Reduce every .csv file in DirName to OutputSize dimensions using princomp

files = dir(fullfile(DirName, '*.csv'));

% create the output directory once, e.g. "20PC"
OutDir = [num2str(OutputSize) 'PC'];
if ~exist(OutDir,'dir')
    mkdir(OutDir);
end

for i=1:length(files)
    % read in csv file from FileName and store in x
    % (one observation per row, one variable per column)
    FileName = fullfile(DirName, files(i).name);
    x = csvread(FileName);

    % calculate PCs and project data onto principal components
    [COEFF,SCORE] = princomp(x);

    % name the output after the input file, minus its extension
    [~, infile] = fileparts(files(i).name);
    outputfilename = fullfile(OutDir, [infile '_' num2str(OutputSize) 'PCs.csv']);
    csvwrite(outputfilename,SCORE(:,1:OutputSize));
end
end

The important call is [COEFF,SCORE] = princomp(x); which takes in your data “x” (one observation per row) and stores its projection into PCA space in “SCORE”, which I then output to csv. I still need to find out how to project back into normal space, but I think it should be just as straightforward as this was. For more info on “princomp”, type “help princomp” into Matlab and have a look at the help files.
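On projecting back into normal space, here is a minimal sketch. It assumes the usual PCA convention that “SCORE” is the mean-centred data multiplied by “COEFF”. Under that assumption, multiplying the kept columns of “SCORE” by the transpose of the matching columns of “COEFF” and adding the column means back gives an approximation of “x”. So that the sketch runs without any toolbox, COEFF and SCORE are computed here with base-Matlab svd on made-up data rather than with princomp, and names such as xApprox and xFull are illustrative only.

```matlab
% Made-up data: 100 observations of 5 variables lying close to a 2-D subspace
rng(0);
n = 100;
x = randn(n,2)*randn(2,5) + 0.01*randn(n,5) + 3;

% Centre the data and get the principal axes from an SVD (base Matlab only).
% Assumption: this plays the role of [COEFF,SCORE] = princomp(x), up to the
% sign of each component, which does not affect the reconstruction.
mu = mean(x,1);
xc = x - repmat(mu,n,1);
[~,~,COEFF] = svd(xc,'econ');   % columns of COEFF are the principal axes
SCORE = xc*COEFF;               % data projected into PCA space

% Project back using only the first k components
k = 2;
xApprox = SCORE(:,1:k)*COEFF(:,1:k)' + repmat(mu,n,1);
err = max(abs(x(:) - xApprox(:)));   % largest error from dropping components

% Using every component recovers the original data up to rounding error
xFull = SCORE*COEFF' + repmat(mu,n,1);
```

With all components kept the reconstruction is exact up to rounding. With fewer, the error reflects whatever variance lay in the discarded components.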

Using Matlab and Principal Component Analysis (PCA) to Reduce Dimensionality of .csv Data

This information is really out of date: I have a much easier method that uses “princomp” and does away with doing everything yourself.

I used Matlab to reduce the number of dimensions in my gesture data. After a bit of experimentation with different numbers of dimensions, I found I could halve the number of dimensions using PCA and still get quite low errors between the original data and the data reconstructed from the reduced dimensions. Some gesturers made such consistent movements that I could use just 2 dimensions to describe almost their entire range of motion.
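This kind of experiment can be sketched as a loop over the number of dimensions kept, measuring the error between the original data and its reconstruction each time. The data below is made up to stand in for a gesture file, and the RMS error measure and the variable names are assumptions for illustration, not necessarily what I used on the real data.

```matlab
% Made-up stand-in for one gesture file: 200 frames, 6 channels, driven
% mostly by 2 underlying movements plus a little noise
rng(1);
n = 200;
x = randn(n,2)*randn(2,6) + 0.05*randn(n,6);

% Centre the data and find the principal axes (base Matlab svd)
mu = mean(x,1);
xc = x - repmat(mu,n,1);
[~,~,V] = svd(xc,'econ');

% RMS reconstruction error for each number of dimensions kept
maxDims = size(x,2);
rmsErr = zeros(1,maxDims);
for k = 1:maxDims
    proj = xc*V(:,1:k);                          % reduce to k dimensions
    xApprox = proj*V(:,1:k)' + repmat(mu,n,1);   % reconstruct from k dimensions
    rmsErr(k) = sqrt(mean((x(:) - xApprox(:)).^2));
end
disp(rmsErr);
```

The error never increases as more dimensions are kept, so the smallest acceptable number can be read straight off rmsErr.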

The method is relatively clear in Matlab, although I am still a bit unsure of the multiple transforms made in the following code. I think I may have performed a few too many, but at least it works! The code “ReduceUsingPCA.m” takes in a directory to perform the conversion on and the number of output dimensions you require. So to convert every .csv in “c:input” to 20 dimensional data, you run it as ReduceUsingPCA('c:input',20) in Matlab.

% DirName is the directory of .csv files to work on, OutputSize is no. of
% dimensions to output after PCA
function ReduceUsingPCA(DirName,OutputSize)

files = dir(fullfile(DirName, '*.csv'));

% create the output directory once, e.g. "20PC"
OutDir = [num2str(OutputSize) 'PC'];
if ~exist(OutDir,'dir')
    mkdir(OutDir);
end

for i=1:length(files)
    % read in csv file from FileName and store in x
    % (one observation per row, one variable per column)
    FileName = fullfile(DirName, files(i).name);
    x = csvread(FileName);

    m=mean(x,1);                        % find mean of each column of input matrix
    y=x-ones(size(x,1),1)*m;            % centre the data by subtracting the mean
    c=cov(y);                           % find covariance matrix
    [V,D]=eig(c);                       % find eigenvectors (V) and eigenvalues (diagonal of D) of covariance matrix
    [D,idx] = sort(diag(D),'descend');  % pull the eigenvalues off the diagonal and sort them in descending order, idx stores order to use when ordering eigenvectors
    V = V(:,idx);                       % put eigenvectors in order to correspond with eigenvalues
    V2d=V(:,1:OutputSize);              % (significant Principal Components we use, OutputSize is input variable)
    final=y*V2d;                        % final is centred data projected onto eigenspace, same result as (V2d'*y')'

    % name the output after the input file, minus its extension
    [~, infile] = fileparts(files(i).name);
    outputfilename = fullfile(OutDir, [infile '_' num2str(OutputSize) 'PCs.csv']);

    csvwrite(outputfilename,final);
end
end

The files are saved in a directory named after the number of components (e.g. “20PC”), created in the current Matlab directory, with names of the form “filename_20PCs.csv”.
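On the worry about performing too many transforms: the small check below, on made-up data, suggests that transposing, multiplying and transposing back gives the same result as a single multiplication of the centred data by the chosen eigenvectors. It also checks that the variance of each projected column matches the corresponding eigenvalue, which is a handy sanity test on the ordering. The data and the names finalTwoStep, finalOneStep, maxDiff and varDiff are illustrative assumptions.

```matlab
% Made-up data with very different variance along each column
rng(2);
x = randn(50,4)*diag([2 1 0.5 0.1]);
OutputSize = 2;

% Same steps as the function: centre, covariance, sorted eigenvectors
m = mean(x,1);
y = x - ones(size(x,1),1)*m;
[V,D] = eig(cov(y));
[evals,idx] = sort(diag(D),'descend');
V2d = V(:,idx(1:OutputSize));

% Two-step projection with transposes
prefinal = V2d'*y';
finalTwoStep = prefinal';

% Equivalent single multiplication
finalOneStep = y*V2d;

% Largest difference between the two results (rounding error only)
maxDiff = max(abs(finalTwoStep(:) - finalOneStep(:)));

% Variance of each projected column compared with its eigenvalue
varDiff = max(abs(var(finalOneStep,0,1) - evals(1:OutputSize)'));
```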