Unsupervised Learning and Linear Dimensionality Reduction

This is an early version of this project write-up. For now it’s largely a placeholder. I’m actively working on it.
Introduction
Unsupervised learning is about seeing the structure that labels hide. Clustering and linear dimensionality reduction give us two complementary lenses: one groups by similarity, the other reshapes the coordinate system to reveal simpler patterns. This post takes a concept-first tour of how PCA, ICA, and Random Projections change the geometry of the data, and how that impacts K-Means, EM, and even a small neural network trained downstream. It outlines the concepts I learned while working through the third assignment for Georgia Tech’s Machine Learning course.
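To make the "two lenses" idea concrete, here is a minimal sketch, assuming scikit-learn and a synthetic blob dataset rather than the assignment's data; the component counts and dataset parameters are illustrative placeholders. Fitting all three transforms on the same standardized data and looking at the variance along the new axes hints at how differently each one redistributes the geometry.

```python
# A hedged sketch (not the assignment code): fit PCA, ICA, and a Gaussian
# Random Projection on the same synthetic data and compare the variance
# along the new axes. All parameters here are illustrative placeholders.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA, FastICA
from sklearn.preprocessing import StandardScaler
from sklearn.random_projection import GaussianRandomProjection

X, _ = make_blobs(n_samples=1000, n_features=20, centers=4, random_state=0)
X = StandardScaler().fit_transform(X)  # put all features on a common scale

reducers = {
    "PCA": PCA(n_components=5, random_state=0),
    "ICA": FastICA(n_components=5, random_state=0, max_iter=1000),
    "RP": GaussianRandomProjection(n_components=5, random_state=0),
}

for name, reducer in reducers.items():
    Z = reducer.fit_transform(X)
    # PCA concentrates variance in the leading axes, RP roughly spreads it,
    # and ICA's scaling depends on its whitening convention; the point is
    # that each transform defines a different coordinate system.
    print(f"{name}: variance per axis = {np.round(Z.var(axis=0), 2)}")
```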
This post sits alongside my broader course summary. For the full arc of topics covered, see: Machine Learning: A Retrospective.
Overview
How does linear dimensionality reduction reshape data geometry, and how does that interact with clustering and downstream learning? This piece looks at the practical interplay:
- Clustering: K-Means and Expectation-Maximization (Gaussian Mixtures)
- Linear DR: PCA, ICA, Random Projections (RP)
- Interaction studies: clustering on raw vs. reduced spaces; using cluster assignments as engineered features; retraining a small NN from A1 on reduced features (a rough sketch of this pipeline follows the list).
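The interaction studies are easiest to see end to end in code. Below is a hedged sketch, assuming scikit-learn and the bundled digits dataset as a stand-in; the component counts, cluster counts, and network size are placeholders rather than the assignment's values, and only PCA is shown to keep it short.

```python
# A hedged sketch of the interaction studies, not the assignment code.
# Stand-in dataset (digits) and placeholder hyperparameters throughout.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
scaler = StandardScaler().fit(X_tr)
X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

# 1) Cluster on the raw space vs. a PCA-reduced space.
pca = PCA(n_components=10, random_state=0).fit(X_tr)
Z_tr, Z_te = pca.transform(X_tr), pca.transform(X_te)
for name, data in [("raw", X_tr), ("pca", Z_tr)]:
    km = KMeans(n_clusters=10, n_init=10, random_state=0).fit(data)
    gm = GaussianMixture(n_components=10, covariance_type="diag",
                         random_state=0).fit(data)
    # The two spaces differ, so these numbers are only indicative.
    print(f"{name}: silhouette={silhouette_score(data, km.labels_):.3f}, "
          f"EM avg log-likelihood={gm.score(data):.1f}")

# 2) Cluster assignments as engineered features: one-hot the k-means
#    labels and append them to the reduced representation.
km = KMeans(n_clusters=10, n_init=10, random_state=0).fit(Z_tr)
F_tr = np.hstack([Z_tr, np.eye(10)[km.labels_]])
F_te = np.hstack([Z_te, np.eye(10)[km.predict(Z_te)]])

# 3) Retrain a small neural network on the reduced + cluster features.
nn = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
nn.fit(F_tr, y_tr)
print(f"NN accuracy on reduced + cluster features: {nn.score(F_te, y_te):.3f}")
```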
In accordance with Georgia Tech’s academic integrity policy and the license for course materials, the source code for this project is kept in a private repository. I believe passionately in sharing knowledge, but I also firmly respect the university’s policies. This document follows Dean Joyner’s advice on sharing projects, focusing not on any particular solution but on an abstract overview of the problem and the underlying concepts I learned.
I would be delighted to discuss the implementation details, architecture, or specific code sections in an interview. Please feel free to reach out to request private access to the repository.