jason.phillips

Machine Learning

Recent Production Work

In recent years, I have transitioned fully into architecting, tuning, combining, and training models end-to-end, particularly at the intersection of LLMs / autoregressive modeling, vision, and image generation through diffusion.

Prior Years: a Long Trajectory in NLP

Earlier projects: browser-based NLP for Overlay AI; various survey projects

My work over the past several years increasingly pivoted towards machine learning, both at the lower level of embedding-space and heuristic tasks (text segmentation, tagging, classification, clustering) and in more in-depth transformer-based NLP processing.

My earlier efforts in machine learning and NLP, as an analyst for a large university, centered on building, fine-tuning, clustering, and then serving robust word vectors for topical analysis, using simpler pre-transformer NLP techniques; see this summary. Following up on that work, I was occasionally tasked with taking large survey or other text data (e.g., a database of syllabi) and analyzing it for topics and themes of interest to the university administration.
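The rough shape of that pre-transformer workflow can be sketched as follows; this is an illustrative example using gensim word vectors and scikit-learn clustering, with a toy corpus standing in for the actual survey data:

```python
# Illustrative sketch of a pre-transformer topical-analysis pipeline:
# train word vectors on a corpus, average them into document vectors,
# then cluster documents into rough topics. The corpus and parameters
# are placeholders, not the university project's actual data or code.
import numpy as np
from gensim.models import Word2Vec
from sklearn.cluster import KMeans

docs = [
    "the syllabus covers linear algebra and probability".split(),
    "students write essays on modern european history".split(),
    "the course introduces machine learning and statistics".split(),
]

# Train small word vectors on the tokenized corpus.
w2v = Word2Vec(sentences=docs, vector_size=50, min_count=1, epochs=50)

def doc_vector(tokens):
    """Average the word vectors of a document's tokens."""
    vecs = [w2v.wv[t] for t in tokens if t in w2v.wv]
    return np.mean(vecs, axis=0)

doc_vecs = np.stack([doc_vector(d) for d in docs])

# Cluster the document vectors into candidate topics.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(doc_vecs)
print(kmeans.labels_)
```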

Working in a consulting role for the startup Overlay AI meant diving deep into the latest state of the art in NLP across a number of domains, and efficiently building an entire pipeline in a resource-constrained context: going from tokenization, through tagging, stemming, lemmatization, and dependency parsing, to applying top-level rules or vector-space comparisons, all within a chain of WASM-compiled tools running in a background WebWorker.
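As a rough illustration of those stages (here using spaCy and NLTK in Python, rather than the actual WASM-compiled toolchain that ran in the browser), the chain looks something like this:

```python
# Rough Python illustration of the pipeline's stages; the production
# version chained WASM-compiled tools inside a WebWorker rather than
# these libraries.
import spacy
from nltk.stem import PorterStemmer

nlp = spacy.load("en_core_web_sm")  # tokenizer, tagger, parser, lemmatizer
stemmer = PorterStemmer()

doc = nlp("The quick brown foxes were jumping over the lazy dog.")

for token in doc:
    print(
        token.text,                 # tokenization
        token.pos_,                 # part-of-speech tagging
        stemmer.stem(token.text),   # stemming
        token.lemma_,               # lemmatization
        token.dep_,                 # dependency relation
        token.head.text,            # dependency head
    )

# A top-level rule over the parse, e.g. extracting (subject, verb) pairs:
pairs = [(t.text, t.head.text) for t in doc if t.dep_ == "nsubj"]
print(pairs)
```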

I have also expended a fair amount of effort working with semantic graphs, particularly WordNet. Wanting to find topical connections across words and potentially disambiguate meanings, I wrote code (in Rust) to traverse WordNet's relations and pick the closest neighboring senses. I later moved those efforts into reducing the graph into a vector space (using graph-to-vector learning), gaining useful spatial representations of all WordNet senses, which I was later able to put to use for smart synonyms or word replacements.
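My traversal code was written in Rust; as a rough Python sketch of the same idea, using NLTK's WordNet interface and a simplified path-similarity score, sense selection might look like this:

```python
# Minimal sketch of similarity-based sense selection over WordNet,
# via NLTK's interface rather than my original Rust traversal code.
# Requires: nltk.download("wordnet")
from nltk.corpus import wordnet as wn

def best_sense(word, context_words):
    """Pick the sense of `word` whose WordNet neighborhood sits
    closest to the senses of the surrounding context words."""
    best, best_score = None, -1.0
    for sense in wn.synsets(word):
        score = 0.0
        for ctx in context_words:
            sims = [
                sense.path_similarity(ctx_sense) or 0.0
                for ctx_sense in wn.synsets(ctx)
            ]
            score += max(sims, default=0.0)
        if score > best_score:
            best, best_score = sense, score
    return best

sense = best_sense("bank", ["river", "water"])
print(sense, sense.definition())
# Synonyms / smart replacements fall out of the chosen sense's lemmas:
print([lemma.name() for lemma in sense.lemmas()])
```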

In subsequent years, I adopted the increasingly dominant approach of extending transformer-based, pre-trained models (often BERT-based models, since they obtain excellent low-level vectors) and further tuning them for domain-specific tasks. Given the size of these models and my focus on web-embeddable learning, I was particularly invested in research on model distillation, through teacher-student training and other minification methods.
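The core of the teacher-student approach is a loss that pulls the student's output distribution toward the teacher's temperature-softened one; a minimal PyTorch sketch, with illustrative (not production) temperature and weighting values:

```python
# Minimal teacher-student distillation loss in PyTorch. The student
# matches the teacher's temperature-softened logits (KL term) while
# still fitting the hard labels (CE term). Values are illustrative.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    # Soft targets: KL divergence between softened distributions,
    # scaled by T^2 to keep gradient magnitudes comparable.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hard targets: ordinary cross-entropy against the true labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Usage with dummy tensors:
s = torch.randn(8, 10)           # student logits
t = torch.randn(8, 10)           # teacher logits (detached in practice)
y = torch.randint(0, 10, (8,))   # true labels
print(distillation_loss(s, t, y))
```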

At Slite, I used my expertise to direct the architecture of a complete, AI-powered product feature called Ask, which uses a mixture of models and techniques to power question answering over a workspace of documents.
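The specifics of Ask aren't reproduced here, but the general retrieval-plus-generation pattern behind document Q&A features can be sketched as follows; the embedding model choice, the toy documents, and the `answer_with_llm` placeholder are all assumptions for illustration:

```python
# Generic retrieval-augmented Q&A sketch: embed workspace documents,
# retrieve the most relevant ones for a question, then hand them to a
# generative model. This is a common pattern, not Ask's actual stack;
# `answer_with_llm` is a hypothetical placeholder.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    "Our deployment process uses a staging environment before production.",
    "Expense reports are due by the fifth of each month.",
    "The design team meets every Tuesday at 10am.",
]
doc_vecs = encoder.encode(documents, normalize_embeddings=True)

def retrieve(question, k=2):
    """Return the k documents most similar to the question."""
    q = encoder.encode([question], normalize_embeddings=True)[0]
    scores = doc_vecs @ q  # cosine similarity on normalized vectors
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

def answer_with_llm(question, context):
    # Placeholder: in a real system, prompt a generative model with
    # the retrieved context plus the user's question.
    return f"[LLM answer to {question!r} given {len(context)} documents]"

question = "When are expense reports due?"
print(answer_with_llm(question, retrieve(question)))
```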