Scalable optimization of neighbor embedding for visualization
International conference on machine learning, 2013•proceedings.mlr.press
Neighbor embedding (NE) methods have found their use in data visualization but are limited
in big data analysis tasks due to their O (n^ 2) complexity for n data samples. We
demonstrate that the obvious approach of subsampling produces inferior results and
propose a generic approximated optimization technique that reduces the NE optimization
cost to O (n log n). The technique is based on realizing that in visualization the embedding
space is necessarily very low-dimensional (2D or 3D), and hence efficient approximations …
in big data analysis tasks due to their O (n^ 2) complexity for n data samples. We
demonstrate that the obvious approach of subsampling produces inferior results and
propose a generic approximated optimization technique that reduces the NE optimization
cost to O (n log n). The technique is based on realizing that in visualization the embedding
space is necessarily very low-dimensional (2D or 3D), and hence efficient approximations …
Abstract
Neighbor embedding (NE) methods have found their use in data visualization but are limited in big data analysis tasks due to their O (n^ 2) complexity for n data samples. We demonstrate that the obvious approach of subsampling produces inferior results and propose a generic approximated optimization technique that reduces the NE optimization cost to O (n log n). The technique is based on realizing that in visualization the embedding space is necessarily very low-dimensional (2D or 3D), and hence efficient approximations developed for n-body force calculations can be applied. In gradient-based NE algorithms the gradient for an individual point decomposes into “forces” exerted by the other points. The contributions of close-by points need to be computed individually but far-away points can be approximated by their “center of mass”, rapidly computable by applying a recursive decomposition of the visualization space into quadrants. The new algorithm brings a significant speed-up for medium-size data, and brings “big data” within reach of visualization.
proceedings.mlr.press