Applied Data Science Project

This assessment is the capstone project of the module. It requires you to address a crime and security science research problem across the full data science workflow (e.g., obtaining the data, processing the data, modelling the data, building predictive models, reporting on the findings, interpreting the outcomes). You will write a brief report on your project (a template will be provided) and you have to submit the R code needed to reproduce your findings. After passing this assessment, you will have demonstrated the skills to solve a problem using data science techniques.

Basic info

  • Weight for final grade: 70%
  • Learning outcomes tested:
    • demonstrating knowledge of a broader range of analytical techniques used in the field of Security and Crime Science
    • performing data science analyses on crime- and/or security-related issues
    • applying the data science pipeline to crime- and/or security-related issues
    • interpreting and effectively reporting the results of said techniques
  • Deadline: 16 April 2019.
  • Feedback deadlines: 8 March 2019 for the peer-feedback, 22 March 2019 for the 1-on-1 feedback (see below)
  • Word count limit: 2000 words (excl. code supplement; do not exceed this word count limit!)

Project requirements

You are free to choose a topic that you want to address using the data science techniques taught in this module. The problem should be crime- or security-related.

We strongly encourage ambitious/daring/new/fancy ways of addressing a problem, and originality is highly valued in this assignment.

There are a few cornerstones for this project that we expect you to include. In this module, we cover three blocks of data science for crime science: (1) obtaining data with webscraping/APIs, (2) using and analysing text data, and (3) applying machine learning models.

In this project you have to use at least 2 of these 3 blocks. The combination of blocks is up to you and you are free to include all three. Some ideas for each block are listed below (this list is of course not exhaustive):

Block 1: Obtaining data with webscraping/APIs

  • retrieving data from APIs that were covered in the module
  • new APIs that have a similar R access pipeline
  • obtaining data through custom-made webscraping scripts

Block 2: Using and analysing text data

  • modelling texts through linguistic representations
  • processing + cleaning texts for further analysis
  • using text metrics to compare groups/clusters of texts

Block 3: Applying machine learning models

  • building predictive models (supervised machine learning)
  • applying clustering methods (unsupervised machine learning)

Grading guidelines

Requirements to be graded:

  • submit the peer-feedback form on time
  • submit the 1-on-1 feedback form on time
  • attend the feedback sessions
  • submit the report + code supplement before the deadline

Grading criteria:

  • Originality of research question/problem – 15%
  • Quality of data science techniques – 20%
  • Quality of analytical methods – 15%
  • Interpretation of findings – 20%
  • Clarity of report – 10%
  • Quality of report (layout, formatting) – 10%
  • Quality of R code (clarity, documentation, reproducibility) – 10%

Feedback sessions

Since a full project is a major step in developing your data science skills, we will hold two feedback sessions to help you in the process.

  • Peer-feedback session: you will exchange an outline of your project idea (i.e. which problem do you want to address and how?) with a fellow student. The purpose of the peer feedback is to get an independent view on your project early in the process. The peer-feedback session will be held at the end of the “Advanced, promises and problems” lecture on 11 March 2019. Use this template for your feedback submission: https://raw.githack.com/ben-aaron188/ucl_aca_20182019/master/assignments/peer_feedback_submission.nb.html
  • 1-on-1 feedback session: you will receive individualised feedback from both Bennett and Felix in a 1-on-1 session where we will help you with questions and give you final advice to fine-tune your project. These sessions will take 10 minutes per student and will be held on 25 March 2019 (timeslots to be arranged). Submit your feedback on Moodle using this template: https://raw.githack.com/ben-aaron188/ucl_aca_20182019/master/assignments/individual_feedback_submission.nb.html

Deliverables

  • Feedback report for peer-feedback
  • Feedback review report for a fellow student
  • Feedback report for 1-on-1 session
  • Report of your project: the report has a word count limit of 2,000 words and should contain the following sections:
    • intro to the problem
    • aim of your project
    • data (i.e. how were the data obtained, which data were used, …)
    • method (i.e. what did you do with the data, which analytical models did you apply, …)
    • analytical plan (i.e. clearly set out which analysis you used)
    • results
    • discussion (incl. limitations of your work + outlook on future research)
  • Code supplement: you must also submit the code needed to fully reproduce your work in an R Notebook
    • the code should be clearly documented
    • the header should contain your student assessment ID (i.e. the code is fully anonymised)
    • the code must be reproducible both for the data collection (in case of webscraping) and for the analysis
  • Data: submit your data in the format that you use for the analysis as well as in the raw format
    • raw format: this can be individual text files, existing spreadsheets, newspaper articles that were scraped, etc.
    • analysis file: this is typically a .csv file or an .RData file that you load in when you analyse the data
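
As an illustration of the anonymised header asked for in the code supplement, an R Notebook's YAML front matter might look like the sketch below. This is only an assumption about a reasonable layout, not a prescribed template: the title is arbitrary and "ABCD1" is a made-up placeholder that stands in for your own student assessment ID.

```yaml
---
# Hypothetical R Notebook header; every value below is a placeholder.
title: "Applied Data Science Project"
author: "ABCD1"   # placeholder: put your student assessment ID here, not your name
output: html_notebook
---
```

Keeping your name out of the header (and out of file names and code comments) is what keeps the submission anonymous.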
