
Unsupervised Learning and Linear Dimensionality Reduction

K-Means and EM clustering on raw and reduced spaces (PCA, ICA, Random Projections), plus the impact on a downstream neural classifier.
machine-learning · python · unsupervised-learning
The Port of Saint-Tropez by Paul Signac
Early Draft

This is an early version of this project write-up. For now it’s largely a placeholder. I’m actively working on it.

Introduction

Unsupervised learning is about seeing the structure that labels hide. Clustering and linear dimensionality reduction give us two complementary lenses: one groups points by similarity, the other reshapes the coordinate system to reveal simpler patterns. This post takes a concept-first tour of how PCA, ICA, and Random Projections change the geometry of a dataset, and how those changes affect K-Means, EM, and even a small neural network trained downstream. It outlines the concepts I learned while working through the third assignment for Georgia Tech’s Machine Learning course.
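To make the three linear transforms concrete, here is a minimal sketch using scikit-learn on synthetic data. The dataset, dimensions, and parameter choices are illustrative assumptions, not taken from the assignment:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA, FastICA
from sklearn.random_projection import GaussianRandomProjection

# Synthetic stand-in data: 500 points in 20 dimensions with 4 latent clusters.
X, _ = make_blobs(n_samples=500, n_features=20, centers=4, random_state=0)

pca = PCA(n_components=5).fit(X)                        # orthogonal axes of maximal variance
ica = FastICA(n_components=5, random_state=0).fit(X)    # statistically independent axes
rp = GaussianRandomProjection(n_components=5, random_state=0).fit(X)  # random linear map

# Each method maps the same points into a different 5-D coordinate system.
for name, model in [("PCA", pca), ("ICA", ica), ("RP", rp)]:
    Z = model.transform(X)
    print(name, Z.shape)
```

All three are linear maps of the same data, so the differences downstream come entirely from *which* directions each method chooses to keep.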

Note

This post sits alongside my broader course summary. For the full arc of topics covered, see: Machine Learning: A Retrospective.

Overview

How does linear dimensionality reduction reshape data geometry, and how does that interact with clustering and downstream learning? This piece looks at the practical interplay:

  • Clustering: K-Means and Expectation-Maximization (Gaussian Mixtures)
  • Linear DR: PCA, ICA, Random Projections (RP)
  • Interaction studies: clustering on raw vs. reduced spaces; using cluster assignments as engineered features; retraining a small NN from A1 on reduced features.
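The interaction studies above can be sketched roughly as follows, assuming scikit-learn. The synthetic dataset, cluster count, and component count are hypothetical stand-ins for the assignment's actual data and tuning:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data: 600 points, 20 features, 4 latent clusters.
X, y = make_blobs(n_samples=600, n_features=20, centers=4, random_state=0)

# 1. Cluster on the raw space vs. a PCA-reduced space.
X_pca = PCA(n_components=5, random_state=0).fit_transform(X)
km_raw = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
km_red = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X_pca)
print("ARI raw:", adjusted_rand_score(y, km_raw.labels_))
print("ARI reduced:", adjusted_rand_score(y, km_red.labels_))

# EM clustering (Gaussian mixture) on the reduced space.
gm = GaussianMixture(n_components=4, random_state=0).fit(X_pca)

# 2. Cluster assignments as engineered features: one-hot labels appended to X.
onehot = np.eye(4)[km_raw.labels_]
X_aug = np.hstack([X, onehot])  # 20 original features + 4 cluster indicators

# 3. A small NN retrained on the reduced features.
mlp = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
mlp.fit(X_pca, y)
print("NN accuracy on reduced features:", mlp.score(X_pca, y))
```

The point of the comparison is not the absolute scores on toy data but the deltas: how much clustering quality and downstream accuracy move when the same pipeline runs on raw versus reduced coordinates.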

A Note on Code Availability

In accordance with Georgia Tech’s academic integrity policy and the license for course materials, the source code for this project is kept in a private repository. I believe passionately in sharing knowledge, but I also firmly respect the university’s policies. This document follows Dean Joyner’s advice on sharing projects: focus not on any particular solution but on an abstract overview of the problem and the underlying concepts I learned.

I would be delighted to discuss the implementation details, architecture, or specific code sections in an interview. Please feel free to reach out to request private access to the repository.

Table of Contents