Does PCA affect performance?
Dimensionality reduction is an unsupervised technique that aims to reduce the number of features for each input. The question is: does it help to reduce the training time? Does it improve performance?
PCA retains the n components that capture the highest variance - the information content of the data. We can either choose the top n components directly, or retain however many components explain, say, 80% of the variance.
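Both options map to a single parameter in scikit-learn's PCA. Here is a minimal sketch (the random array is just a stand-in for real data shaped like flattened MNIST):

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(1000, 784)  # placeholder data, 784 features per input

# Option 1: keep a fixed number of components.
pca_fixed = PCA(n_components=2)
X_2d = pca_fixed.fit_transform(X)

# Option 2: keep enough components to explain 80% of the variance.
pca_var = PCA(n_components=0.80)
X_reduced = pca_var.fit_transform(X)
print(pca_var.n_components_)  # how many components were actually kept
```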
We will train a simple NN model on the MNIST data set - first using the original data. Then, we will reduce the data to the components that retain 90% of the variance (or information). Finally, we will train using n = 2 (so that the data can be plotted as well).
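A minimal sketch of the experiment, assuming TensorFlow/Keras and scikit-learn (the architecture, epoch count, and the helper `train_and_score` are illustrative choices here, not necessarily what the notebook uses):

```python
import numpy as np
from sklearn.decomposition import PCA
from tensorflow import keras

# Load MNIST and flatten each 28x28 image into a 784-feature vector.
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0

def train_and_score(x_tr, x_te, n_features):
    # Same simple architecture for every run; only the input size changes.
    model = keras.Sequential([
        keras.layers.Dense(128, activation="relu", input_shape=(n_features,)),
        keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(x_tr, y_train, epochs=5, verbose=0)
    return model.evaluate(x_te, y_test, verbose=0)[1]

results = {"Original": train_and_score(x_train, x_test, 784)}
for name, n in [("90% Var.", 0.90), ("n = 2", 2)]:
    pca = PCA(n_components=n).fit(x_train)  # fit on training data only
    results[name] = train_and_score(pca.transform(x_train),
                                    pca.transform(x_test),
                                    pca.n_components_)
print(results)
```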
The results are as follows:
| Data | Test Accuracy |
| --- | --- |
| Original | 96.11% |
| 90% Var. | 94.37% |
| n = 2 | 41.55% |
By retaining 90% of the information, we barely affected the test accuracy even though we dropped around 700 of the 784 features! But by reducing the data to only 2 components, we performed poorly - even though the data can now be visualized on a 2D plot!
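For reference, here is a small sketch of how that 2D plot might be produced with matplotlib (subsampling 5,000 digits is a hypothetical choice to keep the plot readable):

```python
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from tensorflow import keras

(x_train, y_train), _ = keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0

# Project a subsample onto the first two principal components.
proj = PCA(n_components=2).fit_transform(x_train[:5000])
plt.scatter(proj[:, 0], proj[:, 1], c=y_train[:5000], cmap="tab10", s=4)
plt.colorbar(label="digit")
plt.xlabel("PC 1")
plt.ylabel("PC 2")
plt.show()
```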
For a complete working example, please look at the notebook here.