activeText

GitHub Repo

Description: With Saki Kuzushima (University of Michigan), Yuki Shiraito (University of Michigan) and Ted Enamorado (Washington University in St. Louis). We develop a semi-supervised active learning algorithm for text classification and show that it reduces the amount of labeled data needed to train a classifier. We also introduce activeText, an R package for active learning in text classification.

Papers:

Keywords: active learning, text classification, semi-supervised learning, machine learning, R package


Language Models for Political Science

Description: With Musashi Jacobs-Harukawa (Princeton University), Alexander Hoyle (University of Maryland) and Hauke Licht (University of Cologne). Given the rapid development of large language models (LLMs), we argue that researchers using LLMs must make three critical decisions: model selection, domain-adaptation strategy, and prompt design. To provide guidance on these choices, we establish a set of benchmarks for a wide range of natural language processing (NLP) tasks in political science.

Papers:

Keywords: language models, BERT, GPT, NLP, political science


Legislative Records from Colonial India, 1919-1947

GitHub Repo

Description: With Thiha Zaw (University of Michigan). We introduce a new dataset of legislative records from colonial India, 1919-1947.

Papers: