Tutorial 2, Advanced Crime Analysis, BSc Security and Crime Science, UCL


Aim of this tutorial

This tutorial will help you consolidate some of the API and web-scraping techniques presented and used earlier in this module. You will also be able to start working on your own web-scraping program, which might be useful for your final project.

Task 1: Geographical variation of public perception on Twitter

Use Twitter’s API to retrieve tweets about “crime” in these cities: (1) New York City, (2) London, (3) Los Angeles, (4) Austin, Texas, and (5) Dublin.

Store all tweets in a single dataframe with a column identifying the city.
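One possible way to organise the result is sketched below. It assumes a hypothetical helper, fetch_tweets_for_city(), that stands in for whichever Twitter API call you use; here it only returns a placeholder row so that the combining step can run without API credentials.

```r
# Cities from the task description
cities <- c("New York City", "London", "Los Angeles", "Austin, Texas", "Dublin")

# Hypothetical stand-in for your real API call (not part of any package).
# Replace the body with your own Twitter query; it should return a data frame.
fetch_tweets_for_city <- function(city) {
  data.frame(text = paste("placeholder tweet about crime in", city),
             stringsAsFactors = FALSE)
}

# Query each city, tag every row with the city name, then stack the results
tweets_by_city <- lapply(cities, function(city) {
  city_tweets <- fetch_tweets_for_city(city)
  city_tweets$city <- rep(city, nrow(city_tweets))
  city_tweets
})
all_tweets <- do.call(rbind, tweets_by_city)

# Quick check: number of tweets stored per city
table(all_tweets$city)
```

Adding the city column before stacking means the origin of each tweet is preserved even if the API response itself carries no usable location field.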

#your code comes here

Task 2: Using YouTube’s API to analyse highly controversial content

Recently, the razor manufacturer Gillette released a video called We Believe: The Best Men Can Be (https://www.youtube.com/watch?v=koPmuEyP3a0). That video has been extremely controversial and has evoked a number of strongly opinionated responses.

Use YouTube’s API to gather information about that video. Note: retrieving all comments will exceed your quota and will take a very long time.

#your code comes here

Task 3: Retrieving the public opinion on that controversial topic

You might note that some comments below that video border on hate speech or even use outright aggressive language.

Let’s look at a different source of opinion about that video. This recent opinion article in the Guardian (https://www.theguardian.com/commentisfree/2019/jan/16/men-masculinity-gillette-advertisement) discusses the video.

Try to access the comments posted below that article and store them in a dataframe along with a unique identifier.
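As a minimal sketch of the storage step only, the block below assumes the comment texts have already been retrieved into a character vector; comment_texts and its contents are placeholders, not real comments.

```r
# Placeholder texts standing in for the comments you retrieved
comment_texts <- c("First placeholder comment",
                   "Second placeholder comment",
                   "First placeholder comment")

# Build the data frame with a running identifier per comment
comments_df <- data.frame(
  comment_id   = sprintf("comment_%04d", seq_along(comment_texts)),
  comment_text = comment_texts,
  stringsAsFactors = FALSE
)

# The identifier stays unique even when two comments have identical text
stopifnot(anyDuplicated(comments_df$comment_id) == 0)
```

A running identifier is used here, not the text itself, because identical comments can occur more than once.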

#your code comes here

Task 4: Recreate the lyrics scraping example with your own artist selection

This blog post (https://towardsdatascience.com/learn-to-create-your-own-datasets-web-scraping-in-r-f934a31748a5) shows step by step how to scrape data on popular music artists and then access the lyrics of their songs.

(Re-)use the code from that blog post and repeat the scraping process with your own selection of artists (charts).

#your code comes here

Task 5: Creating a local database of missing persons

The problem of missing persons in the UK is increasingly recognised in academic research. However, to date no curated database exists that researchers can easily download to query the data of people reported missing.

Note that some of the images show deceased persons and might be distressing to look at.

Your task is to create a local database of missing females on your computer. Use this URL (https://www.missingpersons.police.uk/en-gb/case-search/9444442) as a starting point and retrieve the (1) gender, (2) age, (3) ethnicity and (4) circumstances details for the cases on the first three pages of search results.
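The sketch below illustrates one way to turn individual case pages into rows of a local table. The markup and CSS classes (.gender, .age, .ethnicity, .circumstances) are hypothetical and the values are placeholders; inspect the real pages to find the selectors that apply. The helper parse_case() is likewise only an illustration.

```r
library(rvest)

# Hypothetical, simplified markup for two case pages (placeholder values).
# In practice each element would be the URL of one case page.
case_pages <- c(
  '<div><span class="gender">Female</span>
     <span class="age">placeholder age 1</span>
     <span class="ethnicity">placeholder ethnicity 1</span>
     <p class="circumstances">placeholder circumstances 1</p></div>',
  '<div><span class="gender">Female</span>
     <span class="age">placeholder age 2</span>
     <span class="ethnicity">placeholder ethnicity 2</span>
     <p class="circumstances">placeholder circumstances 2</p></div>'
)

# Turn one case page into a one-row data frame
parse_case <- function(case_html) {
  page <- read_html(case_html)
  # Returns NA if the selector matches nothing on the page
  get_field <- function(css) html_text(html_node(page, css), trim = TRUE)
  data.frame(
    gender        = get_field(".gender"),
    age           = get_field(".age"),
    ethnicity     = get_field(".ethnicity"),
    circumstances = get_field(".circumstances"),
    stringsAsFactors = FALSE
  )
}

# Parse every case and stack the rows into one table
missing_db <- do.call(rbind, lapply(case_pages, parse_case))

# Save a local copy; swap tempdir() for a permanent folder of your choice
db_file <- file.path(tempdir(), "missing_persons.csv")
write.csv(missing_db, db_file, row.names = FALSE)
```

When looping over real pages, consider pausing between requests (for example with Sys.sleep()) so that you do not overload the website.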

#your code comes here

Task 6: Web-scraping for the detection of potentially suspicious items

You can use web-scraping to look for suspicious items on online marketplaces. Let’s take the example of clothing from the “Supreme” brand on Gumtree.

You can access the search results for supreme hoodie here: https://www.gumtree.com/search?featured_filter=false&urgent_filter=false&sort=date&search_scope=false&photos_filter=false&search_category=all&q=supreme+hoodie&tq=%7B%22i%22%3A%22supreme%22%2C%22s%22%3A%22supreme+hoodie%22%2C%22p%22%3A9%2C%22t%22%3A14%7D&search_location=

Your task now is to scrape the (1) price, (2) title, and (3) location of each ad.
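A minimal sketch of the extraction step follows. It parses a small, made-up results page; the class names (listing, listing-title, listing-price, listing-location) and the ads themselves are assumptions for illustration only, so check the actual page source for the real selectors.

```r
library(rvest)

# Hypothetical, simplified stand-in for a search results page
results_html <- '
<ul>
  <li class="listing">
    <h2 class="listing-title">Placeholder hoodie ad 1</h2>
    <span class="listing-price">&pound;80</span>
    <span class="listing-location">Placeholder location 1</span>
  </li>
  <li class="listing">
    <h2 class="listing-title">Placeholder hoodie ad 2</h2>
    <span class="listing-price">&pound;120</span>
    <span class="listing-location">Placeholder location 2</span>
  </li>
</ul>'

page <- read_html(results_html)

# One node per ad, then one field per ad within each node
ads <- html_nodes(page, "li.listing")
ads_df <- data.frame(
  title      = html_text(html_node(ads, ".listing-title"), trim = TRUE),
  price_text = html_text(html_node(ads, ".listing-price"), trim = TRUE),
  location   = html_text(html_node(ads, ".listing-location"), trim = TRUE),
  stringsAsFactors = FALSE
)

# Strip the currency symbol so that prices can be compared numerically
ads_df$price_gbp <- as.numeric(gsub("[^0-9.]", "", ads_df$price_text))
```

Selecting the ad containers first and then calling html_node() on them keeps the three columns aligned, because it returns exactly one result per ad.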

#your code comes here

Task 7: Start with your own web-scraping

Choose a website that you want to scrape (e.g. for your final project for this module). In this task, try to understand the structure of that website and how you can retrieve the content you wish to access.

Write a step-by-step plan of what you need to do to obtain that data.

#write your plan here

#step1:
#step2:
#step3:
#step4:
#step5:

Now start by creating a my_target_url variable and load that URL with the read_html function:
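For orientation, a first access could look like the sketch below. The address is a placeholder (https://example.com), and the tryCatch wrapper is an optional addition that keeps the script running if the page cannot be reached.

```r
library(rvest)

# Placeholder address; replace it with the website you chose
my_target_url <- "https://example.com"

# Download and parse the page; return NULL instead of stopping on failure
target_page <- tryCatch(
  read_html(my_target_url),
  error = function(e) {
    message("Could not read the page: ", conditionMessage(e))
    NULL
  }
)

# If the page loaded, print its title as a quick sanity check
if (!is.null(target_page)) {
  print(html_text(html_node(target_page, "title")))
}
```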

#your code comes here

Task 8: Your custom web-scraper

Start building your own web-scraper here:

#your code comes here