Week 9

Computational Sociology

Christopher Barrie

Introduction

  1. Housekeeping
  2. Social media tracking and NLP

Introduction: Social media tracking (Cao, Lindo, and Zhong 2022)

Introduction: Social media tracking (Alshaabi et al. 2021)

Introduction: Social media tracking and NLP (Waller and Anderson 2021)

Social media tracking… why?

  • “always-on” means we can study phenomena otherwise hard to capture:

    • protest

    • culture

    • radicalization

      • many more… ideas?

Social media tracking… why?

  • To provide a historical archive

    • online utterances are consequential

      • and therefore need to be documented… see, e.g., the Library of Congress (LOC)

Social media tracking… why?

  • To measure experimental outcomes

    • we’ve seen how social media can be used as an experimental platform

      • and by tracking we can measure changes over time (the equivalent of a follow-up)

Social media tracking… how?

  1. After the event

  2. Live streaming (w/ API endpoint/scraping tool)

Social media tracking… how?

  1. After the event
  • Using, e.g., the Twitter API

    • Searching for keywords/users
  • Using, e.g., Reddit data dumps

    • Identifying users and posts
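A minimal sketch of the after-the-event route, assuming a dump stored as newline-delimited JSON (one post per line) with `body` and `created_utc` (Unix timestamp) fields, as in commonly circulated Reddit comment dumps. Field names vary and real dumps are often distributed compressed, so check your own files first. The sample lines and `KEYWORDS` are made up. The sketch keeps posts that mention a keyword and counts them per day, producing the kind of over-time series that tracking relies on.

```python
import io
import json
import re
from collections import Counter
from datetime import datetime, timezone

# Toy stand-in for a dump file: one JSON object per line (NDJSON).
# Field names ("author", "body", "created_utc") are assumptions; check your dump.
SAMPLE_DUMP = """\
{"author": "user_a", "body": "Huge protest downtown today", "created_utc": 1672574400}
{"author": "user_b", "body": "Nice weather", "created_utc": 1672578000}
{"author": "user_c", "body": "The protest continues.", "created_utc": 1672660800}
"""

KEYWORDS = {"protest", "march"}  # hypothetical search terms


def matching_posts(lines, keywords):
    """Yield posts whose body contains at least one keyword (case-insensitive)."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        post = json.loads(line)
        tokens = set(re.findall(r"\w+", post.get("body", "").lower()))
        if tokens & keywords:
            yield post


def daily_counts(posts):
    """Count posts per UTC calendar day, keyed by ISO date string."""
    counts = Counter()
    for post in posts:
        day = datetime.fromtimestamp(post["created_utc"], tz=timezone.utc).date()
        counts[day.isoformat()] += 1
    return counts


# io.StringIO mimics iterating over an open file, line by line.
counts = daily_counts(matching_posts(io.StringIO(SAMPLE_DUMP), KEYWORDS))
print(dict(sorted(counts.items())))  # {'2023-01-01': 1, '2023-01-02': 1}
```

Because `matching_posts` is a generator, the same code can run over a multi-gigabyte file without loading it into memory.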

Social media tracking… how?

  2. Live streaming (w/ API endpoint/scraping tool)
  • Using, e.g., the Twitter Streaming API

    • Streaming in content that matches set parameters (e.g., keywords)
  • Using, e.g., a TikTok scraper

    • Collecting content outside an official API (requires ethical clearance!)
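The logic of streaming in content according to parameters can be sketched without any real API. In this minimal sketch, `simulated_stream()` is a hypothetical stand-in for a live connection, and `PARAMS` (a keyword set plus a language code) is an assumed, simplified version of a streaming filter: each incoming post is checked and, if it matches, appended to storage straight away.

```python
import io
import json
import re
from typing import Iterable, Iterator

# Hypothetical filter parameters, loosely analogous to streaming "rules".
PARAMS = {"keywords": {"strike", "walkout"}, "lang": "en"}


def simulated_stream() -> Iterator[dict]:
    """Hypothetical stand-in for a live feed: yields posts one at a time."""
    yield {"id": 1, "lang": "en", "text": "Teachers announce a strike"}
    yield {"id": 2, "lang": "fr", "text": "Grève: strike demain"}
    yield {"id": 3, "lang": "en", "text": "Lunch was great"}
    yield {"id": 4, "lang": "en", "text": "Walkout at the factory!"}


def matches(post: dict, params: dict) -> bool:
    """True if the post is in the target language and contains a keyword."""
    tokens = set(re.findall(r"\w+", post["text"].lower()))
    return post["lang"] == params["lang"] and bool(tokens & params["keywords"])


def collect(stream: Iterable[dict], params: dict, sink) -> int:
    """Append each matching post to `sink` as one JSON line; return the count."""
    kept = 0
    for post in stream:
        if matches(post, params):
            sink.write(json.dumps(post) + "\n")
            sink.flush()  # persist immediately so a dropped connection loses little
            kept += 1
    return kept


buffer = io.StringIO()  # in practice, a file opened in append mode
n_kept = collect(simulated_stream(), PARAMS, buffer)
print(n_kept)  # 2
```

Writing each match as it arrives, rather than holding everything in memory, is a deliberate choice: live collections can run for weeks, and a crash or dropped connection should cost as little data as possible.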

Social media tracking… how?

…but why is NLP relevant here?

Natural Language Processing refers to:

  • The computational restructuring, labelling, and analysis of text

…but why is NLP relevant here?

And most of these data come in the form of… text!

So what can NLP help us do?

  • The construction of similarity scores

So what can NLP help us do?

An example:

  • “We are all very happy to be at a lecture at 11AM” 

  • “We are all even happier that we don’t have a lecture next week”

    • How do these compare?
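Before any comparison, NLP restructures text into numbers. A minimal sketch, assuming a simple bag-of-words representation: each sentence becomes a row of word counts over a shared vocabulary (a document-term matrix). The regular-expression tokenizer is an illustrative choice, not a standard one.

```python
import re
from collections import Counter

# The two example sentences; a straight apostrophe replaces the typographic one
# so that "don't" stays a single token under the simple pattern below.
docs = [
    "We are all very happy to be at a lecture at 11AM",
    "We are all even happier that we don't have a lecture next week",
]


def tokenize(text):
    """Lowercase and split into runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


counts = [Counter(tokenize(d)) for d in docs]

# Shared vocabulary: the union of all tokens, sorted for a stable column order.
vocab = sorted(set().union(*counts))

# Document-term matrix: one row per sentence, one column per vocabulary word.
dtm = [[c[word] for word in vocab] for c in counts]

shared = sorted(set(counts[0]) & set(counts[1]))
print(len(vocab))  # 18
print(shared)      # ['a', 'all', 'are', 'lecture', 'we']
```

Only five words overlap, and "happy" and "happier" sit in separate columns, so a pure count comparison treats them as unrelated words; stemming, lemmatization, or word embeddings are common ways around this.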

Word-level

Document-level

Other measures…

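  1. Euclidean distance: $d(a, b) = \sqrt{\sum_i (a_i - b_i)^2}$

  2. Manhattan (city-block) distance: $d(a, b) = \sum_i |a_i - b_i|$

  3. Cosine similarity: $\cos(a, b) = \frac{\sum_i a_i b_i}{\sqrt{\sum_i a_i^2}\,\sqrt{\sum_i b_i^2}}$

  4. Jaccard similarity: $J(a, b) = \frac{|a \cap b|}{|a \cup b|} = \frac{|a \cap b|}{|a| + |b| - |a \cap b|}$

Here $a$ and $b$ are the two documents' word-count vectors (for Jaccard, their sets of words).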
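The four measures translate directly into code. A minimal sketch using plain lists for word-count vectors and sets of words for Jaccard; the toy vectors are made up for illustration. Euclidean and Manhattan are distances (lower means more alike), whereas cosine and Jaccard are similarities (higher means more alike).

```python
import math


def euclidean(a, b):
    """Square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine(a, b):
    """Dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def jaccard(a, b):
    """Size of the intersection over the size of the union (treated as sets)."""
    a, b = set(a), set(b)
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)


# Toy word-count vectors over a shared four-word vocabulary.
a = [1, 2, 0, 1]
b = [0, 2, 1, 1]
print(round(euclidean(a, b), 3))  # 1.414
print(manhattan(a, b))            # 2
print(round(cosine(a, b), 3))     # 0.833
print(jaccard({"we", "are", "happy"}, {"we", "are", "happier"}))  # 0.5
```

Cosine ignores vector length, which is why it is popular for comparing documents of very different sizes, such as a short post and a long one.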

Back to conspiracy theories

But it’s not just similarity…

  • Sentiment

    • Examples
  • Scaling

    • Examples
  • Clustering

    • Examples
  • Embedding

    • Examples
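As one illustration of the first item, dictionary-based sentiment scoring can be sketched in a few lines. `LEXICON` here is a hypothetical toy dictionary, not a published one, and the score is simply the sum of the values of the words that appear in it.

```python
import re

# Hypothetical toy lexicon; real work would use a validated sentiment dictionary.
LEXICON = {"happy": 1, "happier": 1, "great": 1, "sad": -1, "angry": -1, "terrible": -1}


def lexicon_score(text, lexicon=LEXICON):
    """Sum the lexicon values of every token in the text (unknown words score 0)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(tok, 0) for tok in tokens)


posts = [
    "So happy with the result",
    "Terrible news, really sad",
    "Meeting moved to noon",
]
scores = [lexicon_score(p) for p in posts]
print(scores)  # [1, -2, 0]
```

This approach is transparent but blunt: "not happy" would still score +1, because negation is ignored.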

A note on computational thinking

This week:

  1. Using the “always-on” nature of digital data to document temporal dynamics
  2. Automating the understanding of language… Come to my other course!

References

Alshaabi, Thayer, Jane L. Adams, Michael V. Arnold, Joshua R. Minot, David R. Dewhurst, Andrew J. Reagan, Christopher M. Danforth, and Peter Sheridan Dodds. 2021. “Storywrangler: A Massive Exploratorium for Sociolinguistic, Cultural, Socioeconomic, and Political Timelines Using Twitter.” Science Advances 7 (29): eabe6534. https://doi.org/10.1126/sciadv.abe6534.
Cao, Andy, Jason Lindo, and Jiee Zhong. 2022. “Can Social Media Rhetoric Incite Hate Incidents? Evidence from Trump’s ‘Chinese Virus’ Tweets.” https://doi.org/10.3386/w30588.
Waller, Isaac, and Ashton Anderson. 2021. “Quantifying Social Organization and Political Polarization in Online Platforms.” Nature 600 (7888): 264–68. https://doi.org/10.1038/s41586-021-04167-x.