Large-Scale Data Collection Using ChatGPT API and R

Aug. 20, 2020

Collecting novel, large-scale data has become essential to social science research. This workshop showed attendees how to use R and the ChatGPT API to automate large-scale data collection and cleaning, substantially reducing the time and cost involved. The workshop discussed how ChatGPT can extract or summarize information from unstructured PDF files and which ChatGPT models (e.g., GPT-3.5, GPT-4, GPT-4 Turbo) are suitable for specific data collection tasks.

View the presentation slides here.


About Kyuwon Lee

I am a Postdoctoral Research Associate at Princeton University’s Center for the Study of Democratic Politics (CSDP). In July 2024, I will be joining the University of Southern California (USC)’s POIR Department as an Assistant Professor. I received my Ph.D. from the Wilf Family Department of Politics at New York University. My research interests are bureaucratic politics and American political institutions. My dissertation is unified by a core question: Does political or interest group influence on government agencies undermine or enhance bureaucratic performance? While there is a widespread perception that agencies subjected to such influence perform worse than those that are not, my dissertation addresses unexplored subtleties regarding this question.