Using Matlab and Principal Component Analysis (PCA) to Reduce Dimensionality of .csv Data

This information is now rather out of date: I have since written up a much easier method here that does away with doing everything yourself.

I used Matlab to reduce the number of dimensions in my gesture data. After experimenting with different numbers of dimensions, I found that PCA could halve the number of dimensions and still give quite low errors between the original data and the data reconstructed from the reduced dimensions. Some gesturers made such consistent movements that just 2 dimensions described almost their entire range of motion.
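One way to put a number on "quite low errors" when choosing how many dimensions to keep is to project onto the top k components, map back to the original space, and compare. The following is a minimal sketch, not part of ReduceUsingPCA.m; the function name pcaReconstructionError is hypothetical, and x is assumed to hold one observation per row, as csvread returns it.

```matlab
function [rmsErr, varKept] = pcaReconstructionError(x, k)
% PCARECONSTRUCTIONERROR  Hypothetical helper (not from the original post):
% projects the rows of x onto the top k principal components, maps them
% back to the original space, and reports how much is lost.
%   rmsErr  - root-mean-square difference between x and its reconstruction
%   varKept - fraction of the total variance captured by the k components

m = mean(x, 1);                          % column means
y = x - ones(size(x, 1), 1) * m;         % centred data
[V, D] = eig(cov(y));                    % eigen-decomposition of the covariance matrix
[evals, idx] = sort(diag(D), 'descend'); % largest eigenvalues first
V = V(:, idx);                           % reorder eigenvectors to match

W = V(:, 1:k);                           % top k principal directions
scores = y * W;                          % reduced data (rows x k)
xHat = scores * W' + ones(size(x, 1), 1) * m;   % reconstruction in the original space

rmsErr  = sqrt(mean((x(:) - xHat(:)).^2));      % overall RMS reconstruction error
varKept = sum(evals(1:k)) / sum(evals);         % share of variance retained
end
```

For example, [err, kept] = pcaReconstructionError(csvread('gesture01.csv'), 20) (the file name is hypothetical) would report the error for 20 components. Running it for several values of k is a quick way to see where the error stops dropping noticeably.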

The method is relatively clear in Matlab, although I am still a bit unsure about the multiple transforms made in the following code. I think I may have performed a few too many, but at least it works! The function "ReduceUsingPCA.m" takes the directory whose files should be converted and the number of output dimensions you require. So to convert every .csv file in "c:input" to 20-dimensional data, run ReduceUsingPCA('c:input', 20) in Matlab.

% DirName is the directory containing the .csv files to work on,
% OutputSize is the no. of dimensions to output after PCA
function ReduceUsingPCA(DirName,OutputSize)

files = dir(fullfile(DirName, '*.csv'));
outdir = [num2str(OutputSize) 'PC'];   % e.g. '20PC', created in the current folder
if ~exist(outdir, 'dir')
    mkdir(outdir);
end

for i=1:length(files)
    % read files(i).name and process
    FileName = fullfile(DirName, files(i).name);
    % read in csv file from FileName and store in x (one observation per row)
    x = csvread(FileName);

    [Rows, Columns] = size(x);  % find size of input matrix
    m=mean(x);                  % find mean of each column
    y=x-ones(Rows,1)*m;         % centre the data by subtracting the column means
    c=cov(y);                   % find covariance matrix (Columns x Columns)
    [V,D]=eig(c);               % find eigenvectors (V) and eigenvalues (D) of covariance matrix
    [D,idx] = sort(diag(D));    % take eigenvalues from the diagonal of D, sort ascending; idx stores the order
    D = D(end:-1:1)';           % reverse to descending order
    V = V(:,idx(end:-1:1));     % put eigenvectors in the same (descending) order as the eigenvalues
    V2d=V(:,1:OutputSize);      % significant Principal Components we use (OutputSize is input variable)
    prefinal=V2d'*y';           % project centred data onto the components (OutputSize x Rows)
    final=prefinal';            % final is centred data projected onto eigenspace (Rows x OutputSize)

    [~, infile] = fileparts(files(i).name);   % file name without path or extension
    outputfilename = fullfile(outdir, [infile '_' num2str(OutputSize) 'PCs.csv']);
    dlmwrite(outputfilename, final, 'precision', '%.10g');  % csvwrite would keep only 5 significant digits
end
end


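The output files are saved in a subfolder named after the number of components (e.g. "20PC") inside Matlab's current folder, with names such as "filename_20PCs.csv". Run the function from the input directory if you want that subfolder to sit alongside the input data.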
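On the question of too many transforms: transposing twice cancels out, because (V2d'*y')' is the same matrix as y*V2d, so the projection can be a single matrix product. Below is a minimal sketch of the same PCA steps with that simplification, also using sort's 'descend' option in place of the manual reversal. The name projectOntoPCs is hypothetical, and x is assumed to hold one observation per row.

```matlab
function final = projectOntoPCs(x, OutputSize)
% PROJECTONTOPCS  Hypothetical sketch of the projection step in
% ReduceUsingPCA.m with the transposes collapsed: (V2d' * y')' == y * V2d.
% x holds one observation per row.

y = x - ones(size(x, 1), 1) * mean(x, 1);   % centre each column
[V, D] = eig(cov(y));                       % eigenvectors of the covariance matrix
[~, idx] = sort(diag(D), 'descend');        % order components by eigenvalue, largest first
V2d = V(:, idx(1:OutputSize));              % keep the top OutputSize components
final = y * V2d;                            % transpose-free projection (rows x OutputSize)
end
```

If the Statistics and Machine Learning Toolbox is available, [coeff, score] = pca(x) should give score(:,1:OutputSize) matching this output up to the sign of each column (for distinct eigenvalues), since eigenvectors are only defined up to sign.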