Using Matlab and Principal Component Analysis (PCA) to Reduce Dimensionality of .csv Data

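This information is really out of date. I have a much easier method, described in “Using Matlab “princomp” for Easy Dimension Reduction Using Principal Component Analysis (PCA)”, that does away with doing everything yourself.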

I used Matlab to reduce the number of dimensions in my gesture data. After a bit of experimentation with different numbers of dimensions, I found I could halve the number of dimensions using PCA and still get quite low errors between the original data and the data reconstructed from the reduced dimensions. Some gesturers made such consistent movements that I could use just 2 dimensions to describe almost their entire range of motion.
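As a rough illustration of that kind of experiment, the sketch below measures the reconstruction error for each possible number of dimensions. The data is a made-up stand-in (two main motions plus a small third one), not the real gesture recordings, and the RMS error is just one reasonable choice of error measure; the variable names are assumptions too.

```matlab
% Hypothetical stand-in for one gesture recording: 200 frames, 6 dimensions.
% The columns are mixtures of two main motions plus a small third one.
t = linspace(0, 2*pi, 200)';
base = [sin(t), cos(t), 0.05*sin(5*t)];
mix  = [1 0 1 2 0 3; 0 1 1 -1 0.5 0; 1 -1 0 1 1 -1];
x = base * mix;

m = mean(x, 1);                          % column means
y = x - ones(size(x,1),1) * m;           % centre the data
[V, D] = eig(cov(y));
[~, idx] = sort(diag(D), 'descend');
V = V(:, idx);                           % eigenvectors, largest eigenvalue first

errs = zeros(1, size(x,2));
for k = 1:size(x,2)
    Vk = V(:, 1:k);                      % keep the first k components
    reduced = y * Vk;                    % k-dimensional data
    recon = reduced * Vk' + ones(size(x,1),1) * m;   % back in the original space
    errs(k) = sqrt(mean((x(:) - recon(:)).^2));      % RMS reconstruction error
    fprintf('%d components: RMS error %.6f\n', k, errs(k));
end
```

With this made-up data the error is already small at 2 components and drops to rounding error at 3, because the data was built from only three underlying signals.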

The method is relatively clear in Matlab, although I am still a bit unsure of the multiple transforms made in the following code. I think I may have performed a few too many, but at least it works! The function in “ReduceUsingPCA.m” takes the directory to convert and the number of output dimensions you require. So to convert every .csv in “c:input” to 20 dimensional data you run ReduceUsingPCA('c:input',20) in Matlab.

% DirName is the directory of .csv files to work on, OutputSize is no. of
% dimensions to output after PCA
function ReduceUsingPCA(DirName,OutputSize)

files = dir(fullfile(DirName, '*.csv'));

% create the output folder once, e.g. "20PC", in the current directory
OutputDir = [num2str(OutputSize) 'PC'];
if ~exist(OutputDir, 'dir')
    mkdir(OutputDir);
end

for i=1:length(files)
    % read files(i).name and process
    FileName = fullfile(DirName, files(i).name);

    x = csvread(FileName);      % read in csv file, one observation per row

    m=mean(x,1);                % find mean of each column of input matrix
    y=x-ones(size(x,1),1)*m;    % centre the data by subtracting the mean
    c=cov(y);                   % find covariance matrix
    [V,D]=eig(c);               % find eigenvectors (V) and eigenvalues (diagonal of D)
    [D,idx] = sort(diag(D),'descend');  % eigenvalues in descending order, idx stores the order
    V = V(:,idx);               % put eigenvectors in order to correspond with eigenvalues
    V2d=V(:,1:OutputSize);      % significant principal components we use
    prefinal=V2d'*y';
    final=prefinal';            % centred data projected onto eigenspace (same as y*V2d)

    [~, infile] = fileparts(FileName);   % file name without path or extension
    outputfilename = fullfile(OutputDir, [infile '_' num2str(OutputSize) 'PCs.csv']);

    csvwrite(outputfilename,final);
end
end

The files are saved in a folder named after the number of components (e.g. “20PC”) inside Matlab's current directory, with names like “filename_20PCs.csv”.
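The sorted eigenvalues can also help when deciding what to pass as the number of output dimensions. The sketch below is only an illustration under assumptions: the small data matrix is made up, and the 95% threshold is an arbitrary choice rather than anything used for the gesture data.

```matlab
% Hypothetical data: 6 observations (rows) of 3 dimensions.
x = [2.5 2.4 0.5; 0.5 0.7 1.9; 2.2 2.9 0.8; 1.9 2.2 1.1; 3.1 3.0 0.2; 2.3 2.7 0.9];

m = mean(x, 1);
y = x - ones(size(x,1),1) * m;           % centre the data
[V, D] = eig(cov(y));
[D, idx] = sort(diag(D), 'descend');     % eigenvalues, largest first
V = V(:, idx);                           % matching eigenvectors

explained = D / sum(D);                  % share of total variance per component
cumulative = cumsum(explained);          % share kept by the first k components
OutputSize = find(cumulative >= 0.95, 1);   % smallest k keeping 95% (arbitrary threshold)

projected = y * V;                       % data expressed in the eigenvector basis
disp(cumulative');
disp(cov(projected));                    % close to diagonal, with D on the diagonal
```

The covariance of the projected data comes out diagonal because the eigenvectors decorrelate the centred data, which is why each eigenvalue can be read as the variance carried by its component.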

21 thoughts on “Using Matlab and Principal Component Analysis (PCA) to Reduce Dimensionality of .csv Data”

  1. Pingback: Using Matlab “princomp” for Easy Dimension Reduction Using Principal Component Analysis (PCA) « James Rossiter

  2. Hi, sorry for my English, I am from Chihuahua, Mexico.
    Let me ask you something: can I reduce a matrix or a vector from 10304×1 to 40×1?
    I have implemented code similar to yours in Matlab in an application for face recognition, and the PCA function works great when I have a matrix of 10304×72 (for example, the result of encoding 72 pictures of 24 people), but when I encode the picture of just one person I get a 10304×1 vector.
    Could you help me work out how to transform it to 40×1?
    Thanks a lot.

    • Hi… I kindly and humbly request that you mail me the source code (.m file) for face recognition using PCA, as I am doing the same project. For a long time I have not been able to write the correct code. Please help me. Thanking you in advance.
      You can mail me at murarkaankit@gmail.com

    • Hi, I am doing a project on dimensionality reduction, and I have decided to use PCA for it. Can you please tell me what function is used to reduce the dimensions using PCA?

    • How did you solve that problem? I am having the same problem, in my case with hand signs. It works fine for a data set containing a number of hand signs in the training set, but it does not work for a matrix from a single image: it just gives all 0s. Can you please tell me how you solved it?
      Thanks in advance.

  3. Hi, I am working on image fusion using PCA and I am using your code to calculate the principal components, but I am not getting the answer.
    I get an error on this line:
    y=x-ones(size(x,1),1)*m;

    I am not using csvread here.
    Thank you in advance.

  4. Hi, I'm working with PCA in PRTools, and I would like to know whether there is a command that can tell me which of the features have been retained. I'm stuck on this part of the work. It would be really nice if someone knows how to do it! :) Thanks

  5. Sir, I’m doing a project on face hallucination and recognition. Please help me find the eigenvalues of the images and do recognition using PCA.

    • Hi Imran, or James

      Please, I need to use PCA for feature extraction with Matlab. Can you send the .m file if you did the extraction of features?

  6. Hello James, could you please explain how I can perform KPCA + LDA on large databases such as MNIST (handwritten digits)? If you could give us some examples I would really appreciate it. Thanks and keep up the good work.

  8. I am applying PCA for feature selection on the normalized NSL KDD data set, and for this purpose I was using your code, but I am getting an error like:
    Input argument “DirName” is undefined.

    Error in ==> ReduceUsingPCA at 10
    FileName= [DirName '/' files(i).name];
    Will you please send me your Matlab code?
    I have been working on PCA for many days, but I am not able to select features properly using PCA. My data set is an intrusion detection data set, i.e. the NSL KDD data set.

    • Hi… I have to reduce the dimensions of the insurance benchmark dataset, and I have run into some problems in my project. Can you please send a copy of your file?

  9. Hi,
    I’m doing a project in BCI. I have data in a .csv file and I just want to capture the fluctuation in the waveform and indicate it with some indicator.
    Can anyone help me with this?
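Several of the comments above ask how to reduce a single vector (one image, one hand sign). One likely cause of the all-zero result is that centring a single sample against its own mean leaves nothing to project. A minimal sketch of one common approach, assuming the mean and components are computed from a larger training set and then reused for the new sample, is below. The data and the names train, newObs and reducedObs are made up for illustration, and observations are stored one per row as in the post's code, so a 10304×72 image matrix would need transposing first.

```matlab
% Hypothetical training set: 6 observations (rows) of 4 dimensions.
train = [1 2 3 4; 2 4 5 8; 3 6 8 12; 4 8 9 16; 5 10 12 20; 6 12 13 24];
OutputSize = 2;

m = mean(train, 1);                      % mean of the training set
y = train - ones(size(train,1),1) * m;   % centred training data
[V, D] = eig(cov(y));
[~, idx] = sort(diag(D), 'descend');
V2d = V(:, idx(1:OutputSize));           % components learned from the whole set

% A single new observation, stored as a column vector as in the comments.
newObs = [3; 6; 7; 12];
reducedObs = V2d' * (newObs - m');       % OutputSize x 1 result

disp(reducedObs);
```

The new vector is centred with the training mean, not its own, and projected with the training eigenvectors, so it ends up in the same reduced space as the training data.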
