Computational Sociology
Christopher Barrie
Web tracking is a core concern in the study of digital society
1994: HTTP cookie is introduced
2010: term “zombie cookies” is coined
2020: Apple blocks third-party cookies by default in Safari
And has led to major legislation, e.g.:
GDPR in EU
Draft Online Safety Bill in UK
It is now the core of an enormous marketplace:
And you can view your own profile!
Web-tracking methods make accessible:
Granular, real-time data on information consumption
Amenable to experimental intervention
Complementary data source for survey research
What does it look like?
Comes in many forms that might include:
What does it look like?
And some add-ons can measure even more, such as:
There are many add-ons/extensions to popular browsers such as Chrome and Firefox.
These include:
View and predict user habits:
Kulshrestha et al. (2020) find very predictable patterns of routineness
estimate aggregate patterns of information consumption
Pair with surveys (sketch below) to:
understand variation according to demographic attributes
understand variation according to ideological attributes
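A minimal sketch of what this pairing might look like in practice, using pandas; the panelist IDs, domains, and survey variables are hypothetical, not drawn from any particular study.

```python
import pandas as pd

# One row per page visit from the web-tracking software (hypothetical columns)
visits = pd.DataFrame({
    "panelist_id": [1, 1, 2, 2, 3],
    "domain": ["bbc.co.uk", "dailymail.co.uk", "bbc.co.uk",
               "theguardian.com", "bbc.co.uk"],
})

# One row per panelist from the survey (hypothetical attributes)
survey = pd.DataFrame({
    "panelist_id": [1, 2, 3],
    "age": [34, 58, 41],
    "education": ["degree", "no degree", "degree"],
})

# Count visits per panelist and domain, then attach survey attributes
counts = (visits.groupby(["panelist_id", "domain"])
                .size()
                .reset_index(name="n_visits"))
combined = counts.merge(survey, on="panelist_id", how="left")

# Aggregate information consumption by a demographic attribute
print(combined.groupby("education")["n_visits"].sum())
```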
7,775 participants who have web-tracking software installed
surveyed for attributes often linked to information inequality
surveys + web-tracking behaviour combined
Webpages classified according to a hand-coded list + an ML classifier (sketch below)
Panel structure that controls for within-person characteristics
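A minimal sketch of the two-step classification idea (hand-coded domain list first, text classifier as fallback); the domain list, training titles, and model choice are illustrative, not the study's actual setup.

```python
from urllib.parse import urlparse

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hand-coded list of known news domains (illustrative)
NEWS_DOMAINS = {"bbc.co.uk", "theguardian.com"}

# Toy labelled page titles to train the fallback classifier
titles = ["Election results live updates", "Chocolate cake recipe",
          "Parliament debates new bill", "Best hiking boots reviewed"]
labels = ["news", "other", "news", "other"]
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(titles, labels)

def classify_page(url, title):
    """Label a page 'news' or 'other': list lookup first, classifier as fallback."""
    domain = urlparse(url).netloc.removeprefix("www.")
    if domain in NEWS_DOMAINS:
        return "news"
    return clf.predict([title])[0]

print(classify_page("https://www.bbc.co.uk/news/some-article", "Some headline"))
print(classify_page("https://example.com/post", "Chocolate cake recipe"))
```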
What we do and see online is often organized in the form of interactions or ties, e.g.:
Emailing a friend or colleague
Retweeting a news article
Hyperlinks within domains
Any such interaction or link therefore has two types of data associated with it:
And these elements of network data have their own names: nodes and edges (sketched below)
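A minimal sketch of these two elements using a toy retweet network in networkx; the user names and edges are invented.

```python
import networkx as nx

# Each interaction (here a retweet) is an edge from the retweeter to the author
retweets = [("alice", "bob"), ("carol", "bob"), ("bob", "dana"), ("alice", "dana")]

G = nx.DiGraph()
G.add_edges_from(retweets)

print("Nodes (the actors):", list(G.nodes))
print("Edges (the ties):", list(G.edges))
print("In-degree (times retweeted):", dict(G.in_degree()))
```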
For estimating political preferences (cf. last week’s lecture)
For understanding information consumption, e.g.:
polarization in news exposure
diversity in news consumption
For tracking information diffusion, e.g.:
spread of rumours
spread of fake news
Data: 250,000 tweets collected before the 2010 US midterms
45,000 individual users
Tweets tagged for political content
Political content was tagged via hashtags:
Jaccard coefficient used to identify frequently co-occurring hashtags (sketch below)
Data then filtered according to the appearance of these political hashtags
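A minimal sketch of the Jaccard step, treating each hashtag as the set of tweets it appears in; the hashtags and tweets below are invented, not the study's actual seed terms.

```python
# Toy tweets: each tweet ID maps to the hashtags it contains (invented data)
tweets = {
    1: {"#election", "#vote2010"},
    2: {"#midterms", "#vote2010"},
    3: {"#election", "#midterms"},
    4: {"#recipe", "#baking"},
    5: {"#election", "#vote2010", "#midterms"},
}

def tweets_with(tag):
    """Set of tweet IDs containing a given hashtag."""
    return {tid for tid, tags in tweets.items() if tag in tags}

def jaccard(tag_a, tag_b):
    """J(A, B) = |A ∩ B| / |A ∪ B| over the tweet sets of two hashtags."""
    a, b = tweets_with(tag_a), tweets_with(tag_b)
    return len(a & b) / len(a | b) if a | b else 0.0

# Score candidate hashtags against a political seed hashtag
seed = "#election"
for tag in ["#vote2010", "#midterms", "#baking"]:
    print(tag, round(jaccard(seed, tag), 2))
```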
Generated retweet and mentions node and edge data
Manually annotated 1000 tweets for left-right affiliation
Measured frequency of cross-ideological retweeting/mentioning
Inferred overall community structure from these network statistics
Inferred (left-right) community membership using a community detection algorithm (sketch below)
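A minimal sketch of these last two steps: the share of retweet edges that cross ideological lines, and community detection on the retweet graph. The edges, leanings, and algorithm choice (label propagation in networkx) are illustrative rather than the original study's exact procedure.

```python
import networkx as nx
from networkx.algorithms import community

# Toy retweet edges (retweeter -> retweeted) and hand-annotated leanings (invented)
edges = [("a", "b"), ("a", "c"), ("c", "b"), ("d", "e"), ("e", "d"), ("b", "e")]
leaning = {"a": "left", "b": "left", "c": "left", "d": "right", "e": "right"}

# (1) Share of retweet edges that connect users with different leanings
cross = sum(leaning[u] != leaning[v] for u, v in edges) / len(edges)
print("cross-ideological share:", round(cross, 2))

# (2) Community detection on the undirected version of the retweet graph
G = nx.Graph(edges)
communities = community.label_propagation_communities(G)
print([sorted(c) for c in communities])
```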
Taking digital trace data and:
Asking questions that can be articulated in computational terms
Enriching that data (through computational/non-computational means)
Abstracting that data into an approximate solution