Title: A Dynamic Model of Speech in the Social Sciences
Authors: Knox D, Lucas C
Social scientists increasingly rely on statistical models of text to answer a wide range of questions about speech across many domains. However, humans communicate with more than text alone. Auditory cues convey important information, such as emotion, in many contexts of interest to social scientists. Nonetheless, researchers typically discard this information and work only with transcriptions of audio data. We develop the Structural Speaker Affect Model (SSAM) to classify auditorily distinct “modes” of speech (e.g., emotions, speakers) and the transitions between them. SSAM incorporates ridge-like regularization into a nested hidden Markov model, allowing the use of high-dimensional audio features. We implement a fast estimation procedure that enables a principled approach to uncertainty based on the Bayesian bootstrap. As a validation test, we show that SSAM markedly outperforms existing audio and text approaches in both (a) identifying individual Supreme Court justices and (b) detecting human-labeled “skepticism” in their speech. We extend the analysis by examining the dynamics of expressed emotion in oral arguments.
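The abstract's uncertainty procedure rests on the Bayesian bootstrap, which replaces resampling with Dirichlet-distributed observation weights. A minimal sketch of that general idea, applied here to a simple weighted-mean statistic rather than the paper's SSAM estimator (the function names and the toy data are illustrative assumptions, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(0)

def bayesian_bootstrap(data, statistic, n_draws=1000, rng=rng):
    """Approximate the posterior of `statistic` via the Bayesian bootstrap.

    Instead of resampling observations with replacement (classical bootstrap),
    each draw assigns the n observations random weights from a flat
    Dirichlet(1, ..., 1) distribution and re-evaluates the weighted statistic.
    """
    n = len(data)
    draws = np.empty(n_draws)
    for b in range(n_draws):
        w = rng.dirichlet(np.ones(n))  # weights sum to 1 by construction
        draws[b] = statistic(data, w)
    return draws

# Toy example: posterior uncertainty for the mean of simulated data.
data = rng.normal(loc=2.0, scale=1.0, size=200)
draws = bayesian_bootstrap(data, lambda x, w: np.sum(w * x))
lo, hi = np.percentile(draws, [2.5, 97.5])  # 95% credible interval
```

In the paper's setting the re-weighted statistic would be the full SSAM fit, so each Dirichlet draw re-estimates the model with weighted observations; the loop above only changes which statistic is evaluated per draw.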