Dimension Reduction with TSNE and Bokeh Scatterplot
What’s in this notebook?
This is a workflow I use often in data exploration. TSNE gives a good representation of high-dimensional data, and Bokeh is helpful in creating a simple interactive plots with contextual info given by colors and tooltips.
This workflow has been extremely helpful for:
- text analytics/NLP tasks if text data is passed through a
TfidfVectorizer
or similar fromscikit-learn
- understanding
word2vec
ordoc2vec
vectors by passing them to TSNE - getting an idea of separability in doing prediction / classification by passing the outcome variable to bokeh
This example uses the Australian atheletes data set, which contains 11 numeric variables. This workflow is even more helpful on larger datsets with higher dimensionality.