---
title: 'Solutions: Getting API access'
author: "B Kleinberg, F Soldner, D Hammocks"
subtitle: 'Homework week 2, Advanced Crime Analysis, UCL'
output:
  pdf_document: default
  html_document: default
---

**SOLUTIONS**

## Overview

In this notebook, you will follow a sequence of steps to get access to [Twitter's Search API](https://developer.twitter.com/en/docs/tweets/search/api-reference/get-search-tweets.html) and [YouTube's Data API](https://developers.google.com/youtube/v3/). You will then be able to use these two APIs to access Twitter and YouTube data from your own account through R.

The external tutorials below are excellent step-by-step guides to obtaining the required access codes. You can think of the access codes as a key with which you unlock access to the data through your own credentials.

**Important:** The process takes some time. However, once you have the access credentials, **do not** share these with anyone. These credentials give others full access to your Twitter/YouTube data and would allow them to send queries on your behalf. The access codes shown in the slides have been revoked and can no longer be used.

## Twitter

Please follow this tutorial to obtain access to Twitter's Search API: [https://towardsdatascience.com/access-data-from-twitter-api-using-r-and-or-python-b8ac342d3efe](https://towardsdatascience.com/access-data-from-twitter-api-using-r-and-or-python-b8ac342d3efe)

The tutorial is an updated version that takes into account that Twitter now requires you to apply for a developer account (this used to be different). Please take into account that each application is checked and **approval can take 1-3 days**. We recommend you start with that application first.

_Return to this part of the tutorial once your application has been approved. In the meantime, it is best to proceed with the steps for the YouTube API [below](#youtube)._

Start initialising the authentication in R with the `twitteR` package and work through the tasks:

1. Install the `twitteR` package
2. Load the package
3. Authenticate by replacing your access credentials below (_you can uncomment the lines by removing the #_):

```{r}
# install (once) and load the twitteR package
# install.packages("twitteR")
library(twitteR)

# replace the placeholders with your own credentials and uncomment the lines
# my_consumer_key = "your_consumer_key_as_a_string"
# my_consumer_secret = "your_consumer_secret_key_as_a_string"
# my_access_token = "your_access_token_as_a_string"
# my_access_secret = "your_access_secret_key_as_a_string"

# setup_twitter_oauth(consumer_key = my_consumer_key
#                     , consumer_secret = my_consumer_secret
#                     , access_token = my_access_token
#                     , access_secret = my_access_secret
#                     )
```

### Task 1

Search for tweets in 2019 that are about gun control.

```{r}
# set parameters
searchString = "gun+control"
TimeFrame = "2019-01-01"

# search for tweets
gunControlTW = searchTwitter(searchString, n = 25, since = TimeFrame)

# display
head(gunControlTW)
```

### Task 2

Filter these tweets to the following locations: NYC, LA, Austin (Texas).

```{r}
# set geolocation (city centre) with a 5 mile radius
locNY = "40.685264,-73.944017,5mi"
locLA = "34.086155,-118.317639,5mi"
locTex = "30.276867,-97.737594,5mi"

# search and display tweets for each location
NY = searchTwitter(searchString, n = 25, since = TimeFrame, geocode = locNY)
head(NY)

LA = searchTwitter(searchString, n = 25, since = TimeFrame, geocode = locLA)
head(LA)

Tex = searchTwitter(searchString, n = 25, since = TimeFrame, geocode = locTex)
head(Tex)
```
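If you want a quick side-by-side comparison of how much each city search returned, the sketch below counts the results and converts one of the lists to a data frame with `twListToDF()` for easier inspection. This is only a minimal sketch: it assumes the `NY`, `LA` and `Tex` objects from the chunk above were created and are non-empty, and that the column names follow the usual `twListToDF()` output for status objects.

```{r}
# compare how many tweets came back per city
# (assumes NY, LA and Tex exist from the previous chunk and are non-empty)
c(NYC = length(NY), LA = length(LA), Austin = length(Tex))

# convert one of the result lists to a data frame for easier inspection
NYDF = twListToDF(NY)
head(NYDF[, c("text", "created", "screenName")])
```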
### Task 3

Expand the search to a wider area and choose the San Diego region. How many tweets are in Spanish and how many in English?

```{r}
# set geolocation (San Diego region) with a 10 mile radius
SD = "32.828101,-117.146665,10mi"

# search and save (note the language setting)
SDT = searchTwitter(searchString, n = 50, since = TimeFrame, geocode = SD, lang = "en")

# convert to data frame
SDDF = twListToDF(SDT)
head(SDDF)

# how many can be found if the language is set to Spanish? (es)
SDT_es = searchTwitter(searchString, n = 50, since = TimeFrame, geocode = SD, lang = "es")

# compare the number of English and Spanish tweets
length(SDT)
length(SDT_es)
```

### Task 4

Find the current trends on Twitter in London.

```{r}
# set the latitude & longitude for London
locLondon = "51.520814,-0.127714,5mi"
LLat = 51.520814
LLong = -0.127714

# get the woeid (where-on-earth ID) of the closest trend location
LTrendTWoid = closestTrendLocations(LLat, LLong)

# pass the woeid to the getTrends() function to obtain the trends
LTrend = getTrends(LTrendTWoid[1, 3])
head(LTrend)
```

### Task 5

Choose the top trend and use it in a search query to retrieve tweets made in the past 2 days.

```{r}
# get the top trend
TopTrend = LTrend[1, 1]

# set the search parameters and search for tweets on the top trend
# (adjust the date to two days before the day you run the search)
TopTrendTweets = searchTwitter(TopTrend, n = 25, since = "2019-01-20")
head(TopTrendTweets)
```

---

### An alternative approach: `rtweet`

In the lecture we briefly covered differences between Twitter's Search API and the Streaming API. In essence, the Search API gives you more control over your search but less scope of the data, while the Streaming API allows for large-scale real-time access with fewer search controls (see [here](http://140dev.com/twitter-api-programming-tutorials/aggregating-tweets-search-api-vs-streaming-api/)).

A nice feature of the `rtweet` package is that it allows for very simple access to Twitter data without the need to apply for a developer account.

**Note:** Most tutorials on the `rtweet` package are outdated and do not yet cover the simple authentication process. Therefore, you may find that they still walk you through the API request steps. This is no longer necessary (but it would still work if you do want to use the API credentials).

Try to follow these steps:

1. Install the `rtweet` package in R
2. Load the package
3. Initialise the authentication process through the `search_tweets(...)` function.
    - this will open a popup in your browser asking you to authorise the `rstats2twitter` app
    - allow this app to be connected to your Twitter account by logging in

For step 3, you can use any search query, but as a guide, search for tweets that have the hashtag #london with a maximum of 100 results.

```{r}
# install and load the package "rtweet"
# install.packages("rtweet")
library(rtweet)

# the first call opens the browser popup to authorise the rstats2twitter app
search_tweets("#london", n = 100)
```

You may also find these materials useful:

- Brief [workshop slides](https://mkearney.github.io/nicar_tworkshop/) by the `rtweet` developer Michael Kearney.
- A tutorial by the University of Colorado's Earth Lab on [Twitter Data in R Using Rtweet](https://www.earthdatascience.org/courses/earth-analytics/get-data-using-apis/use-twitter-api-r/).
- [Steps for API authentication](https://rtweet.info/#api-authorization) for `rtweet`. Note: this, too, is no longer necessary.

Now that you are set to obtain the data, walk through these tasks:

### Task 1

Search (max. 1,000) tweets that are about knife crimes in London. The actual search query is up to you.

```{r}
# your code comes here:
tweets_KnifeCrimeLondon <- search_tweets(q = 'knife OR stabbing AND crime', n = 1000)

# view the data
View(tweets_KnifeCrimeLondon)
```
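As an optional check on what the knife-crime query returned, you can plot how the tweets are distributed over time with rtweet's `ts_plot()`. This is only a minimal sketch: it assumes the `tweets_KnifeCrimeLondon` object from the chunk above exists and contains at least one tweet.

```{r}
# plot the number of matching tweets per day
# (assumes tweets_KnifeCrimeLondon exists from the previous chunk)
ts_plot(tweets_KnifeCrimeLondon, by = "days")
```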
### Task 2

Search for all the users that the Dept. of Security and Crime Science's twitter account `@UCLCrimeScience` is following:

```{r}
# your code comes here:

# get the list of user IDs that @UCLCrimeScience follows
following_UCLDSCS <- get_friends('UCLCrimeScience')

# view the top 6 rows of the data
# note that this shows the user_id
head(following_UCLDSCS)

# use the user_id to look up the accounts' data
following_UCLDSCSData <- lookup_users(following_UCLDSCS$user_id)

# view the data
View(following_UCLDSCSData)
```

Now pick one of these users and sample a number of their tweets (e.g. 100):

```{r}
# your code comes here:

# Breakdown:
# 1. sample.int returns a random integer
#    Input: number of rows (accounts followed), size = 1 (return 1 number)
# 2. Index the data with it to pick a user_id from a random row
# 3. Convert to numeric before passing it to get_timelines()
# 4. Get 100 of this random user's tweets
# We use nrow rather than a fixed number so the code will still work in the future
# if the account follows more or fewer users
timeline_RandomDSCSFollowing <- get_timelines(
  as.numeric(following_UCLDSCSData[sample.int(n = nrow(following_UCLDSCS), size = 1), 'user_id']),
  n = 100)

# view the results
head(timeline_RandomDSCSFollowing)
```

### Task 3

Retrieve a list of all followers of `@UCLCrimeScience`. Hint: `?get_followers`

```{r}
# your code comes here:

# get a list of all followers of @UCLCrimeScience
# first get the user IDs
followers_UCLDSCS <- get_followers('UCLCrimeScience')

# then get their account data
followers_UCLDSCSData <- lookup_users(followers_UCLDSCS$user_id)

# view the results
View(followers_UCLDSCSData)
```
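Before moving on to YouTube, it can be worth caching what you have collected locally, so that re-running the notebook does not have to query the API again (Twitter rate-limits repeated requests). Below is a minimal sketch using base R's `saveRDS()`; the file names are just examples, and the object names assume the chunks above were run.

```{r}
# cache the collected Twitter data locally so repeated runs don't re-query the API
saveRDS(tweets_KnifeCrimeLondon, file = "tweets_knife_crime_london.rds")
saveRDS(followers_UCLDSCSData, file = "followers_uclcrimescience.rds")

# reload later with, e.g.:
# tweets_KnifeCrimeLondon <- readRDS("tweets_knife_crime_london.rds")
```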
---

## YouTube

Access to YouTube's API requires a Google account. If you have a Gmail account, this is sufficient. The steps to get the credentials do not require YouTube's approval (in contrast to Twitter), but getting there is a bit trickier because you will have to navigate the Google Developer's Console.

Please follow this step-by-step guide on the Towards Data Science blog: [https://towardsdatascience.com/what-is-api-and-how-to-use-youtube-api-65525744f520](https://towardsdatascience.com/what-is-api-and-how-to-use-youtube-api-65525744f520). The first part of the blog post is an intro to APIs, which we have largely covered in the lecture, so you could skim through it or skip it. The important part for getting access is where you are directed to another brief guide to obtain the actual credentials.

We noticed that the guide linked to in that tutorial has trouble loading the correct stylesheets (hence the messy layout). If this persists, please have a look at the official docs from [YouTube](https://developers.google.com/youtube/registering_an_application) or this [video](https://www.youtube.com/watch?v=pP4zvduVAqo).

**Note:** The blog post states explicitly that you need the "OAuth" credentials. Other tutorials might show you how to get simple API keys, but these won't suffice for access through the `tuber` package.

Once you have obtained access, you can find an additional guide to the `tuber` package [here](http://soodoku.github.io/tuber/articles/tuber-ex.html).

Now, to access YouTube data through your credentials with the `tuber` package, follow these steps:

1. Install the `tuber` package in R
2. Load the package
3. Initialise the authentication process.

You can run step 3 by uncommenting (= removing the # at the start of the line) and replacing your access credentials here:

```{r}
# install (once) and load the tuber package
# install.packages("tuber")
library(tuber)

# client_secret = 'your_secret_key_as_a_string'
# client_id = 'your_client_id_as_a_string'

# yt_oauth(app_id = client_id, app_secret = client_secret, token = '')
```

Now that you are set to obtain the data, walk through these tasks:

### Task 1

Choose a recent video from a YouTube channel of your choice and retrieve the URL of the high-resolution thumbnail. Hint: `...$snippet$thumbnails$high$url`

```{r}
# example video ID - replace this with a YouTube video ID of your choice
vidID = "VYqWtZTh2LA"

# get the video details and extract the high-resolution thumbnail URL
VDetails = get_video_details(video_id = vidID)
VDetails$items[[1]]$snippet$thumbnails$high$url
```

### Task 2

From the video selected above, retrieve the comments made on that video and get the frequencies of the comments made by each user. Are there some users that comment more than others?

```{r}
# retrieve the comment threads for the video
comments <- get_comment_threads(c(video_id = vidID))
head(comments)

# extract the comment authors
authComm = comments$authorDisplayName

# number of comments vs. number of unique commenters
length(authComm)
length(unique(authComm))

# users who commented more than once
n_occur <- data.frame(table(authComm))
n_occur[n_occur$Freq > 1, ]
```

### Task 3

Choose a channel of your choice (a different one from the one used above) and produce the following outcomes: the total number of views of the whole channel and the number of channel subscribers. Hint: `...$statistics`

```{r}
# example channel ID - replace this with a YouTube channel ID of your choice
chID = "UCZUT79WUUpZlZ-XMF7l4CFg"

# retrieve the channel statistics
CHstats <- list_channel_resources(filter = c(channel_id = chID), part = "statistics")

# extract the total view count and the number of subscribers
viewCount = CHstats$items[[1]]$statistics$viewCount
TotalSub = CHstats$items[[1]]$statistics$subscriberCount

viewCount
TotalSub
```

### Task 4

From that channel, have a look at the channel activity and retrieve all thumbnails used by that channel. Are some re-used?

```{r}
# get the statistics for all videos of the channel chosen above
ChanVid = get_all_channel_video_stats(channel_id = chID, mine = FALSE)

# extract the video IDs
allVidIds = as.vector(ChanVid$id)

# helper function: return the high-resolution thumbnail URL for a given video ID
get_all_thumb <- function(id) {
  x = get_video_details(video_id = id)
  x$items[[1]]$snippet$thumbnails$high$url
}

# retrieve all thumbnail URLs and compare the total with the number of unique ones
AllThum <- lapply(allVidIds, get_all_thumb)
length(AllThum)
length(unique(AllThum))
```

### Task 5

Try to find a list of all subscriptions of that channel.

```{r}
subChannel = get_subscriptions(filter = c(channel_id = chID), part = "snippet")

# note: this will often fail with a permissions error - the API only returns another
# channel's subscriptions if that channel has made its subscriptions public
```

## END

---