SOLUTIONS
Tutorial 3, Advanced Crime Analysis, BSc Security and Crime Science, UCL
In this tutorial, you will use concepts learned in the lectures to explore a unique dataset of YouTube transcripts extracted from left-leaning and right-leaning news channels. The provided dataset contains the transcripts of 2,000 YouTube videos each from FoxNews (a right-leaning US news channel) and The Young Turks (a left-leaning US news outlet).
Load the original dataframe called media_data from data/media_data.RData.
Make sure you have the following packages installed and loaded into your workspace:
library(sentimentr)
Attaching package: ‘sentimentr’
The following object is masked from ‘package:syuzhet’:
get_sentences
Take a look at the data and identify the column that contains the text data:
#your code
summary(media_data)
head(media_data)
tail(media_data)
Now, before you create a corpus from the text column, make sure that all strings are in the same format. You will see that some are all UPPERCASE. Equally, you can observe that some contain punctuation, while others do not.
Fix this by creating a new column called text_clean that contains the lowercased strings and has the punctuation removed (hint: take a look at the super useful stringr package and this related SO question).
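One possible sketch of this cleaning step, assuming the raw transcripts live in a column called text (the `[[:punct:]]` pattern is just one of the options discussed in the linked SO question):

```r
library(stringr)

# lowercase everything, strip punctuation, and collapse leftover whitespace
media_data$text_clean = str_squish(
  str_replace_all(
    str_to_lower(media_data$text),
    "[[:punct:]]", ""
  )
)
```

str_squish() is optional but tidies up the double spaces that removing punctuation can leave behind.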
Now use the text_clean column to create a corpus object called media_corpus from the quanteda package. Remove the original text column to avoid excessive data structures and then rename the column text_clean to text (this is important for quanteda to know where the text is located):
names(media_data)
[1] "channel_vlog_id" "Filename" "file_id.x"
[4] "file_parent" "nwords" "id"
[7] "vlog_id" "channel_id" "file_id.y"
[10] "url" "view_count" "date_posted"
[13] "landing_url" "upvotes" "downvotes"
[16] "days_until_reference" "view_count_corrected" "pol"
[19] "eng_prop" "ascii" "channel_name"
[22] "text_clean"
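A minimal sketch of the remaining steps, assuming the raw text column has already been dropped (as the names() output above shows):

```r
library(quanteda)

# rename text_clean so quanteda's default text_field ("text") finds it
names(media_data)[names(media_data) == "text_clean"] = "text"

# all remaining columns are kept as document variables (docvars)
media_corpus = corpus(media_data)
```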
Take a look at the summary of the new object media_corpus:
#your code
summary(media_corpus)
Use the summary function (hint: you might want to change the n= argument) on the corpus to create the object corpus_statistics and calculate:
#the average type/token ratio for both channels separately (hint: tapply)
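A sketch of how corpus_statistics might be built; note that summary() on a corpus defaults to the first 100 documents, so n must cover all 4,000 (the TTR column assumed below is derived from the Types and Tokens columns that summary() returns):

```r
# per-document statistics: Text, Types, Tokens, Sentences, plus docvars
corpus_statistics = summary(media_corpus, n = 4000)

# type/token ratio for each document
corpus_statistics$TTR = corpus_statistics$Types / corpus_statistics$Tokens
```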
tapply(corpus_statistics$TTR, corpus_statistics$channel_name, mean)
foxnewschannel theyoungturks
0.4436150 0.3447911
Next, build a TFIDF representation from the corpus using the count TF representation and the inverse DF representation. Use the stemmed tokens but leave the stopwords in when you create a DFM.
Step 1: create the DFM
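One way to do Step 1 in recent quanteda versions (tokenise, stem, then build the DFM; the intermediate name media_tokens is an assumption):

```r
# tokenise the corpus; punctuation was already stripped during cleaning
media_tokens = tokens(media_corpus, remove_punct = TRUE)

# stem the tokens (Porter stemmer)
media_tokens = tokens_wordstem(media_tokens)

# build the DFM, deliberately leaving stopwords in
media_dfm = dfm(media_tokens)
```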
Take a look at the first 10 rows and first 10 columns of your DFM.
#your code
media_dfm[1:10, 1:10]
Document-feature matrix of: 10 documents, 10 features (56% sparse).
10 x 10 sparse Matrix of class "dfm"
features
docs all right and neo nazi leader richard spencer spoke at
text1 13 23 76 3 8 1 11 18 1 4
text2 3 0 41 0 0 0 1 0 0 3
text3 1 4 21 0 0 1 0 0 0 1
text4 1 0 25 0 0 0 0 0 0 2
text5 3 1 9 0 0 0 0 0 0 3
text6 7 10 48 0 0 0 0 0 0 3
text7 11 16 40 0 0 1 0 0 0 3
text8 4 4 14 0 0 0 0 0 0 0
text9 8 4 46 0 0 0 0 0 0 8
text10 2 0 3 0 0 0 0 0 0 0
Step 2: Weigh the DFM to a TFIDF representation
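The weighting itself is a one-liner; count TF and inverse DF happen to be dfm_tfidf()'s defaults, but they are spelled out here to match the exercise:

```r
library(quanteda)

media_tfidf = dfm_tfidf(media_dfm, scheme_tf = "count", scheme_df = "inverse")
```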
Again, take a look at the first 10 rows and first 10 columns of your TFIDF-DFM.
#your code
media_tfidf[1:10, 1:10]
Document-feature matrix of: 10 documents, 10 features (56% sparse).
10 x 10 sparse Matrix of class "dfm"
features
docs all right and neo nazi leader richard
text1 0.78787231 2.3894086 0.041283784 5.002685 11.33895 0.7099654 16.751666
text2 0.18181669 0 0.022271515 0 0 0 1.522879
text3 0.06060556 0.4155493 0.011407361 0 0 0.7099654 0
text4 0.06060556 0 0.013580192 0 0 0 0
text5 0.18181669 0.1038873 0.004888869 0 0 0 0
text6 0.42423894 1.0388733 0.026073969 0 0 0 0
text7 0.66666119 1.6621973 0.021728307 0 0 0.7099654 0
text8 0.24242225 0.4155493 0.007604907 0 0 0 0
text9 0.48484450 0.4155493 0.024987553 0 0 0 0
text10 0.12121113 0 0.001629623 0 0 0 0
features
docs spencer spoke at
text1 42.24217 1.066766 0.17484593
text2 0 0 0.13113445
text3 0 0 0.04371148
text4 0 0 0.08742297
text5 0 0 0.13113445
text6 0 0 0.13113445
text7 0 0 0.13113445
text8 0 0 0
text9 0 0 0.34969186
text10 0 0 0
Retrieve the top 10 features (tokens) according to their TFIDF value for both channel_name values. (Hint: look at the groups argument in the topfeatures function.)
topfeatures(x = media_tfidf, groups = 'channel_name')
$theyoungturks
re gonna she yeah okay money tax her oh
3503.312 3273.507 3004.244 2760.814 2441.523 2057.390 1984.010 1865.957 1860.247
he
1837.587
$foxnewschannel
presid tucker north laura korea kavanaugh fbi greg
1844.192 1678.521 1611.765 1563.885 1486.060 1473.535 1425.031 1361.444
investig she
1319.055 1315.552
Rebuild the TFIDF without stopwords and look at the top features again:
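A sketch of the rebuild; the only change is a tokens_remove() step before creating the DFM (media_tokens is assumed to be the stemmed tokens object used for the original DFM):

```r
library(quanteda)

# drop English stopwords at the token level, then rebuild and reweight
media_tokens_2 = tokens_remove(media_tokens, stopwords("en"))
media_tfidf_2  = dfm_tfidf(dfm(media_tokens_2))
```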
topfeatures(x = media_tfidf_2, groups = 'channel_name')
$theyoungturks
re gonna yeah okay money tax oh trump dollar
3503.312 3273.507 2760.814 2441.523 2057.390 1984.010 1860.247 1769.254 1742.244
guy
1700.337
$foxnewschannel
presid tucker north laura korea kavanaugh fbi greg
1844.192 1678.521 1611.765 1563.885 1486.060 1473.535 1425.031 1361.444
investig democrat
1319.055 1311.602
Now take the above steps a bit further and produce a TF-IDF weighted bi-gram DFM.
Keep in mind that this involves several steps. Once you have created your bi-gram DFM (without the TFIDF weighting), remove those that do not occur in at least 5% of all documents. Stem the tokens.
Step 1: create the bigram DFM and apply the sparsity correction.
# In quanteda >= 3, n-grams are built from a tokens object; the old
# ngrams= argument of dfm() is defunct and produces the warning
# "Argument ngrams not used." (tokens_without_stopwords below is assumed
# to be the stemmed, stopword-free tokens object from the previous step)
bigram_tokens = tokens_ngrams(tokens_without_stopwords, n = 2)
bigram_dfm = dfm(bigram_tokens)

# keep only bigrams that occur in at least 5% of all documents
bigram_dfm_small = dfm_trim(bigram_dfm, min_docfreq = 0.05, docfreq_type = "prop")
What was the original overall sparsity, and how did it change?
bigram_dfm
Document-feature matrix of: 4,000 documents, 861,095 features (99.9% sparse).
bigram_dfm_small
Document-feature matrix of: 4,000 documents, 2,016 features (86.9% sparse).
Step 2: apply TFIDF weighting
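Step 2 reuses the same weighting call as before, applied to the trimmed bigram DFM:

```r
library(quanteda)

# count TF / inverse DF, the same scheme used for the unigram TFIDF
bigram_tfidf = dfm_tfidf(bigram_dfm_small)
```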
What are the top 5 features (per group) before TFIDF weighting, and after TFIDF weighting?
topfeatures(bigram_tfidf, n = 5, groups = 'channel_name')
$theyoungturks
they_re you_re he_s you_know and_and
3.817638 3.629969 3.598328 3.523020 3.005292
$foxnewschannel
the_president president_trump north_korea u_s kim_jong
5.913264 5.472814 5.211625 4.605258 4.135510
Now let’s look at the sentiment of the texts from these news outlets. We’ll start with the sentence-based approach from the sentimentr package. Since only the data from FoxNews was punctuated and hence contained sentences, we’ll focus on those only.
Create a new object called foxnews_only that contains only the transcripts of FoxNews:
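A simple subset on the channel_name factor does the job (the level "foxnewschannel" is taken from the factor levels shown below):

```r
foxnews_only = media_data[media_data$channel_name == "foxnewschannel", ]
```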
media_data$channel_name
[1] theyoungturks theyoungturks theyoungturks theyoungturks theyoungturks
...
[ reached getOption("max.print") -- omitted 3000 entries ]
Levels: foxnewschannel theyoungturks
Now use the sentiment function to retrieve the sentiment of each sentence from this sub-corpus and store the results in a variable called foxnews_sentiments (this will take a while):
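A sketch of the sentimentr workflow; note that get_sentences() splits on punctuation, so this has to run on the original punctuated text rather than text_clean (the column name text is an assumption here):

```r
library(sentimentr)

# split each transcript into sentences, then score sentence by sentence
foxnews_sentiments = sentiment(get_sentences(foxnews_only$text))
```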
foxnews_sentiments
element_id sentence_id word_count sentiment
1: 1 1 3 0.43301270
2: 1 2 1 0.00000000
3: 1 3 6 -0.20412415
4: 1 4 18 -0.08249579
5: 1 5 11 -0.61809826
---
108491: 2000 34 9 -0.16666667
108492: 2000 35 6 -0.20412415
108493: 2000 36 13 0.00000000
108494: 2000 37 12 0.21650635
108495: 2000 38 5 -0.17888544
The object foxnews_sentiments now contains a sentiment value for each sentence in this sub-corpus.
Take a look at the distribution of these sentiments using a histogram:
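A base-R histogram is enough here; with roughly 108,000 sentences, a fairly fine bin count shows the spike at exactly zero:

```r
hist(foxnews_sentiments$sentiment,
     breaks = 100,
     main = "Sentence-level sentiment, FoxNews transcripts",
     xlab = "sentimentr sentiment score")
```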
What is the mean/median/min/max sentiment?
summary(foxnews_sentiments$sentiment)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-2.88871 -0.07217 0.00000 0.02910 0.16499 3.04667
Now let’s use the more advanced dynamic approach that can handle valence shifters as well as unpunctuated data.
Step 1: load the local source to access the ncs (= naive context sentiment) function by running the command below:
You now have access to the sentiment trajectory algorithm developed and introduced in this paper.
The main wrapper function is called ncs_full and asks you to specify the following arguments:
Extract the sentiment trajectories of the first 100 FoxNews and the first 100 The Young Turks transcripts and leave all values in their default state (i.e. only specify the txt_input_col and txt_id_col arguments). Call the resulting data sentiment_trajectories_foxnews and sentiment_trajectories_tyt. Note that ncs_full assumes that your input data is a data.frame (so use the media_data dataframe). Run the analysis on the cleaned text column. This operation will also take a few minutes.
#your code
fn = foxnews_only[1:100, ]
tyt = media_data[media_data$channel_name == 'theyoungturks', ][1:100, ]
sentiment_trajectories_foxnews = ncs_full(txt_input_col = fn$text_clean
, txt_id_col = fn$id)
sentiment_trajectories_tyt = ncs_full(txt_input_col = tyt$text_clean
, txt_id_col = tyt$id)
Now take a look at some shapes of the sentiment trajectories. Compare 2 shapes from FoxNews with 2 shapes from The Young Turks by plotting them:
Finally, let’s explore the psycholinguistic dimension (and additional, deeper linguistic constructs) as retrieved through the Linguistic Inquiry and Word Count software, a.k.a. LIWC.
Load the LIWC output for this corpus (LIWC extraction already done) as a csv file from ./data/media_data_liwc.csv and call the resulting object liwc_data.
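Since the LIWC extraction is already done, this is a plain CSV read (in R >= 4.0 strings are no longer converted to factors by default, which is what we want for the file column):

```r
liwc_data = read.csv("./data/media_data_liwc.csv")
```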
Look at the first ten rows of this object:
#your code
head(liwc_data, 10)
X file WC Analytic Clout Authentic Tone WPS Sixltr Dic
1 1 theyoungturks_3505 2338 56.11 86.05 15.57 62.27 2338 16.00 86.66
2 2 theyoungturks_312 509 30.84 86.87 14.81 4.19 509 14.54 86.25
3 3 theyoungturks_4214 527 37.27 81.37 50.42 4.54 527 14.61 83.49
4 4 theyoungturks_1986 453 23.83 86.04 6.99 16.32 453 13.25 83.89
5 5 theyoungturks_4241 839 59.75 70.84 21.22 31.89 839 15.85 82.60
6 6 theyoungturks_4568 1372 47.56 76.47 25.93 16.40 1372 17.20 82.73
7 7 theyoungturks_1441 1979 25.75 83.91 2.31 43.24 1979 15.66 82.16
8 8 theyoungturks_2201 800 42.56 78.09 11.76 61.33 800 12.00 83.75
9 9 theyoungturks_1094 2052 69.39 78.37 13.13 39.73 2052 22.47 82.36
10 10 theyoungturks_3579 201 31.21 82.78 2.27 80.01 201 13.43 87.06
function. pronoun ppron i we you shehe they ipron article prep auxverb adverb
1 54.79 16.77 8.73 1.97 0.73 2.78 2.18 1.07 8.04 7.14 12.28 7.10 5.77
2 58.55 18.07 10.22 3.73 0.98 0.98 3.14 1.38 7.86 6.68 10.61 4.91 7.07
3 55.60 15.56 9.11 0.76 2.28 0.57 1.33 4.17 6.45 5.69 10.25 10.44 6.07
4 56.73 20.75 11.92 2.43 2.65 0.44 4.64 1.77 8.83 6.18 9.05 5.74 7.06
5 53.16 13.83 6.20 0.83 0.60 1.19 0.36 3.22 7.63 6.56 12.87 9.54 5.36
6 53.64 15.60 8.02 2.19 1.31 2.11 0.29 2.11 7.58 6.92 10.86 7.22 7.43
7 57.71 21.22 12.73 2.53 0.81 2.73 5.66 1.01 8.49 5.71 10.01 8.34 6.22
8 55.12 20.00 10.38 3.00 0.62 2.25 1.50 3.00 9.62 6.50 10.62 8.25 4.62
9 50.05 14.96 7.41 2.14 0.83 1.61 1.66 1.17 7.55 6.48 13.01 5.75 5.02
10 57.71 21.39 11.94 1.99 1.99 2.99 4.98 0.00 9.45 6.47 9.95 8.96 6.47
conj negate verb adj compare interrog number quant affect posemo negemo anx
1 7.44 0.68 14.24 3.64 2.27 2.31 0.64 2.01 4.79 3.34 1.41 0.43
2 13.75 0.98 12.38 2.95 1.57 3.34 1.18 2.95 4.52 1.18 3.34 0.39
3 7.40 1.52 18.03 2.85 1.14 1.14 1.52 0.95 3.98 0.95 3.04 0.76
4 11.26 0.88 15.23 3.09 1.99 1.77 2.21 1.55 4.19 1.77 2.43 0.00
5 6.91 0.83 13.83 5.24 3.58 2.50 1.43 3.93 5.84 3.10 2.74 1.79
6 7.80 1.09 15.60 4.23 3.21 2.26 2.33 2.33 4.88 2.11 2.77 0.15
7 8.29 1.26 15.41 3.28 1.77 1.77 0.40 1.62 4.55 2.73 1.77 0.20
8 6.00 1.38 17.12 3.50 3.38 2.12 1.00 2.00 5.62 3.75 1.88 0.25
9 6.53 0.63 11.99 4.78 2.44 2.19 1.51 3.02 5.85 3.31 2.53 0.29
10 4.98 1.99 17.41 3.48 2.49 1.49 1.49 1.99 5.97 4.48 1.49 0.00
anger sad social family friend female male cogproc insight cause discrep tentat
1 0.56 0.13 12.75 0.34 0.30 0.09 2.44 10.78 2.78 1.45 1.54 2.01
2 1.57 0.20 15.72 0.59 0.20 3.73 1.38 9.43 2.16 2.55 1.38 1.38
3 2.09 0.19 12.52 0.00 0.57 0.00 1.71 11.39 0.95 1.52 1.52 2.85
4 0.88 0.66 14.13 0.00 0.00 3.75 0.88 13.91 4.42 2.21 1.10 3.75
5 0.24 0.24 8.94 0.00 0.12 0.00 0.48 11.44 2.03 3.69 0.95 1.67
6 2.11 0.07 10.20 0.00 0.15 0.15 0.36 12.17 2.62 1.09 1.60 2.84
7 0.61 0.05 14.10 0.00 0.15 0.20 5.81 14.15 2.27 2.02 3.89 3.59
8 0.62 0.25 12.12 0.12 0.12 0.00 1.50 13.38 2.50 2.75 2.88 3.00
9 1.02 0.15 11.35 0.10 0.10 0.58 3.22 11.94 2.78 2.58 1.95 2.73
10 1.49 0.00 11.44 0.00 0.00 0.50 4.98 9.45 1.99 1.00 0.50 0.50
certain differ percept see hear feel bio body health sexual ingest drives
1 1.15 2.74 3.85 1.58 1.92 0.34 0.60 0.38 0.00 0.00 0.04 6.29
2 1.38 1.77 1.77 0.39 1.18 0.20 1.18 0.00 0.39 0.79 0.00 6.29
3 2.28 4.17 3.80 0.95 2.09 0.76 0.38 0.19 0.19 0.00 0.00 7.40
4 0.66 3.09 1.32 0.22 0.88 0.00 0.44 0.00 0.44 0.00 0.00 8.39
5 1.07 3.58 2.15 1.07 0.48 0.48 0.24 0.24 0.00 0.00 0.00 7.39
6 1.75 2.92 2.62 1.46 1.09 0.07 1.09 0.58 0.36 0.00 0.44 6.34
7 1.72 3.84 2.48 0.56 1.52 0.30 0.25 0.15 0.05 0.05 0.00 7.48
8 1.62 2.88 1.50 0.88 0.50 0.12 0.62 0.00 0.00 0.00 0.25 6.50
9 1.17 3.12 1.75 0.73 0.58 0.39 0.88 0.34 0.34 0.15 0.05 7.46
10 2.99 2.49 2.49 1.49 0.00 0.50 1.00 0.50 0.50 0.00 0.00 8.46
affiliation achieve power reward risk focuspast focuspresent focusfuture relativ
1 1.80 0.86 2.27 0.86 0.94 3.85 10.01 0.68 11.16
2 1.96 0.98 2.16 1.38 0.39 5.11 6.29 0.59 11.59
3 2.66 1.14 3.42 0.57 0.95 2.85 11.39 3.61 18.03
4 2.87 1.32 1.77 1.99 0.66 4.86 7.95 0.88 7.95
5 1.19 1.67 3.69 1.67 0.24 1.91 11.56 1.07 11.08
6 2.41 0.73 2.62 0.66 0.29 4.74 8.82 0.66 11.30
7 1.26 0.96 3.89 1.11 0.66 2.12 10.06 1.26 9.90
8 1.50 1.38 2.12 1.50 0.38 4.88 9.50 1.12 9.88
9 1.75 1.36 2.49 1.61 0.88 2.24 9.21 0.49 9.84
10 2.99 0.00 4.48 1.00 0.00 3.48 13.93 1.00 7.96
motion space time work leisure home money relig death informal swear netspeak assent
1 1.41 6.63 3.29 2.91 0.56 0.43 1.07 0.13 0.13 1.11 0.04 0.17 0.68
2 1.96 4.72 5.89 0.59 0.20 0.20 0.20 0.20 0.00 0.79 0.00 0.39 0.20
3 2.66 9.68 6.26 0.95 0.19 0.00 0.00 0.00 0.38 0.38 0.00 0.00 0.38
4 0.66 4.42 3.09 1.99 1.32 0.00 0.66 0.22 0.22 1.55 0.00 0.22 0.66
5 2.15 4.53 4.41 3.58 0.24 0.00 2.86 0.00 0.00 0.48 0.00 0.00 0.24
6 1.24 6.12 4.23 2.11 0.36 0.15 1.02 0.15 0.07 1.24 0.22 0.15 0.51
7 1.11 5.76 2.98 3.44 0.40 0.10 0.40 0.10 0.10 1.97 0.10 0.30 1.06
8 1.75 5.50 2.38 3.25 0.38 0.12 0.12 0.00 0.00 1.62 0.00 0.12 0.88
9 1.12 6.38 2.29 3.70 0.88 0.10 1.36 0.05 0.10 1.32 0.05 0.19 0.68
10 1.99 1.49 3.48 3.48 1.00 0.00 1.49 0.00 0.00 2.99 0.50 0.00 1.00
nonflu filler AllPunc Period Comma Colon SemiC QMark Exclam Dash Quote Apostro
1 0.21 0.00 0 0 0 0 0 0 0 0 0 0
2 0.20 0.00 0 0 0 0 0 0 0 0 0 0
3 0.00 0.00 0 0 0 0 0 0 0 0 0 0
4 0.66 0.00 0 0 0 0 0 0 0 0 0 0
5 0.24 0.00 0 0 0 0 0 0 0 0 0 0
6 0.36 0.00 0 0 0 0 0 0 0 0 0 0
7 0.40 0.10 0 0 0 0 0 0 0 0 0 0
8 0.38 0.25 0 0 0 0 0 0 0 0 0 0
9 0.34 0.05 0 0 0 0 0 0 0 0 0 0
10 1.49 0.00 0 0 0 0 0 0 0 0 0 0
Parenth OtherP
1 0 0
2 0 0
3 0 0
4 0 0
5 0 0
6 0 0
7 0 0
8 0 0
9 0 0
10 0 0
Now calculate the average of the following variables per group:
Note: you will have to create the ‘group’ variable yourself from the file variable.
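One way to derive the group, assuming (as the head() output above suggests) that each file value starts with the channel name, e.g. "theyoungturks_3505":

```r
# label files whose name starts with "foxnews" as FN, everything else as TYT
liwc_data$group = ifelse(grepl("^foxnews", liwc_data$file), "FN", "TYT")
```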
tapply(liwc_data$power, liwc_data$group, mean)
FN TYT
3.909940 2.942015
Explore the data further to understand the LIWC.