Dimension Reduction – IsoMap
Introduction
Isomap stands for isometric mapping. Isomap is a non-linear dimensionality reduction method based on spectral theory, which tries to preserve the geodesic distances in the lower dimension. It starts by creating a neighborhood network, then uses graph distance to approximate the geodesic distance between all pairs of points. Finally, it finds the low-dimensional embedding of the dataset through eigenvalue decomposition of the geodesic distance matrix.
In non-linear manifolds, the Euclidean metric holds only if the neighborhood structure can be approximated as linear. If the neighborhood contains holes, Euclidean distances can be misleading. Measuring the distance along the manifold provides a better approximation of how far or near two points are. Let’s explore this concept with a simple 2-D example.
Prerequisites for Dimension Reduction
- Experience with Python code and a basic understanding of Deep Learning.
- Access to powerful machines or GPUs, which can be rented from cloud providers like DigitalOcean GPU Droplets.
- Familiarity with setting up Python environments and running beginner-level tutorials.
Why Geodesic Distances Are Better for Dimension Reduction
Geodesic distances better approximate relationships in non-linear manifolds. Points far apart, like a and b, may be mapped poorly with Euclidean metrics. Geodesic distances, however, maintain local and global neighborhood fidelity. Isomap applies this principle by creating a similarity matrix for eigenvalue decomposition. Unlike other methods, such as LLE or LPP, Isomap uses both local and global information for mapping.
Dimension Reduction: Steps of the IsoMap Algorithm
Neighborhood Graph
Create a neighborhood graph and adjacency matrix from the dataset. Ensure the resulting graph is a single connected component to prevent incoherent results.
Dissimilarity Matrix
Use Spark’s GraphX library to calculate geodesic distances. Iterate over neighborhood parameters to achieve a fully connected graph. Implement shortest path calculations for weighted graphs using a pregel-like API.
def ShortestPath(Verts, Edges, landmarks): ...
Eigenvalue Decomposition
Square the distance matrix and double-center it before eigenvalue decomposition. Select the top K eigenvectors corresponding to the highest eigenvalues.
Landmark Isomap
Landmark Isomap addresses the computational challenges of large datasets by using landmark points and applying classical MDS on them. Remaining points are embedded using triangulation, maintaining efficiency while reducing the computational load.
Drawbacks of Isomap
Isomap struggles when the manifold is not well-sampled or contains holes. Neighborhood graph creation requires careful parameter tuning to avoid poor results.
Conclusion for Dimension Reduction
This article introduced IsoMap, a manifold learning algorithm, and explored its use cases and limitations. We discussed its theoretical basis, implemented it using Spark’s GraphX library, and highlighted its advantages for non-linear dimensionality reduction.