jason.phillips

Survey Text Analysis

To present the qualitative findings of a large university-wide survey, I took a large database of comments and transformed them for a clustering analysis. The overall processing consisted of the following steps:

  • privacy filtering and redaction, using a search for named entities along with a few heuristics and searches for specific phrases and topics;
  • sentence embedding: this was before the transformer revolution in NLP, so I used a word-vector-based approach after learning vector representations tuned on the dataset. The specific algorithm for computing sentence-level vectors without too much noise was partly based on https://openreview.net/pdf?id=SyK00v5xx, using TF-IDF weighting of the word vectors followed by removal of the first principal component;
  • k-means clustering for broad topic areas;
  • a custom front-end for searching and visualizing relationships between topics, sentiment, and survey audiences.
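
The embedding and clustering steps above can be sketched roughly as follows. This is a minimal illustration, not the original project code: the function names, the smoothed TF-IDF formula, the toy sentences, and the random stand-in word vectors are all assumptions made for the example. In the real pipeline the word vectors were learned on the survey text.

```python
import math
from collections import Counter

import numpy as np


def tfidf_weights(tokenized_sentences):
    """Return one {token: tf-idf weight} dict per sentence (smoothed IDF, an assumed variant)."""
    n = len(tokenized_sentences)
    doc_freq = Counter(tok for sent in tokenized_sentences for tok in set(sent))
    idf = {tok: math.log((1 + n) / (1 + df)) + 1.0 for tok, df in doc_freq.items()}
    weights = []
    for sent in tokenized_sentences:
        counts = Counter(sent)
        weights.append({tok: (c / len(sent)) * idf[tok] for tok, c in counts.items()})
    return weights


def weighted_average_vectors(tokenized_sentences, word_vectors, dim):
    """TF-IDF-weighted average of the word vectors in each sentence."""
    rows = []
    for sent_weights in tfidf_weights(tokenized_sentences):
        known = [(word_vectors[t], w) for t, w in sent_weights.items() if t in word_vectors]
        if not known:
            rows.append(np.zeros(dim))  # no known tokens: fall back to a zero vector
            continue
        vecs, ws = zip(*known)
        rows.append(np.average(np.stack(vecs), axis=0, weights=np.array(ws)))
    return np.vstack(rows)


def remove_first_component(X):
    """Subtract each row's projection onto the top singular direction (no mean-centering)."""
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    u = vt[0]
    return X - np.outer(X @ u, u)


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Toy usage: made-up comments and random placeholder word vectors.
sentences = [
    ["parking", "is", "expensive"],
    ["parking", "permits", "cost", "too", "much"],
    ["the", "library", "hours", "are", "short"],
    ["library", "closes", "too", "early"],
]
rng = np.random.default_rng(0)
vocab = sorted({tok for sent in sentences for tok in sent})
word_vectors = {tok: rng.normal(size=8) for tok in vocab}

raw = weighted_average_vectors(sentences, word_vectors, dim=8)
embedded = remove_first_component(raw)
labels, centroids = kmeans(embedded, k=2)
```

Removing the top singular direction discards whatever all sentence vectors share, which tends to be dominated by frequent, uninformative words. With random placeholder vectors the resulting cluster labels carry no meaning; the sketch only shows how the pieces fit together.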

For later work in this conceptual space, see machine learning.