GLM
B Kleinberg, Dept of Security and Crime Science, UCL, 29 January 2019

Tutorial 2, Probability, Statistics and Modelling 2, BSc Security and Crime Science, UCL


Outcomes of this tutorial

In this tutorial, you will:
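
  • build and evaluate your own multiple regression models
  • build and evaluate your own logistic regression models
  • conduct hypothesis testing
      ◦ t-test
      ◦ ANOVA
  • perform t-tests and ANOVAs within the generalised linear model framework
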
Linear regression

Read the csv file located at data/tutorial2/black_friday/sample_black_friday.csv as a dataframe, and name it blackfriday (details on the data: https://www.kaggle.com/mehdidag/black-friday/version/1).

#your code
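A minimal sketch of the reading step that runs without the tutorial files: it writes a tiny made-up CSV to a temporary file and reads it back with read.csv(). The column names (User_ID, Gender, Age, Marital_Status, Purchase) are assumed from the Kaggle Black Friday data and the values are invented; in the tutorial you would pass the real path instead.

```r
# Made-up rows with assumed column names, written to a temporary file
tmp <- tempfile(fileext = ".csv")
writeLines(c("User_ID,Gender,Age,Marital_Status,Purchase",
             "1,F,0-17,0,100",
             "2,M,55+,0,200",
             "3,M,26-35,1,300"), tmp)

# In the tutorial, replace tmp with
# "data/tutorial2/black_friday/sample_black_friday.csv".
# stringsAsFactors = TRUE turns text columns into factors
# (the default changed to FALSE in R 4.0.0).
blackfriday <- read.csv(tmp, stringsAsFactors = TRUE)
blackfriday
```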

Familiarise yourself with the dataset:

#your code
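A few base R functions cover a first look at the data. The sketch below uses a small synthetic stand-in with assumed column names (Gender, Purchase) and random values, so the numbers it prints mean nothing about the real dataset.

```r
# Synthetic stand-in for blackfriday (assumed columns, random values)
set.seed(1)
blackfriday <- data.frame(
  Gender   = factor(sample(c("F", "M"), 100, replace = TRUE)),
  Purchase = round(runif(100, min = 100, max = 20000))
)

str(blackfriday)      # column types and first values
summary(blackfriday)  # counts for factors, quartiles for numeric columns
table(blackfriday$Gender)                               # group sizes
tapply(blackfriday$Purchase, blackfriday$Gender, mean)  # mean purchase per group
```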

Task 1:

Build a linear model that predicts the purchase size based on gender. Use the glm function to do this.

#your code
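A sketch on synthetic data, assuming the columns are called Gender and Purchase. With family = gaussian and the default identity link, glm() fits the same model as lm(): the intercept is the mean purchase of the reference level (here F) and the GenderM coefficient is the difference between the two group means.

```r
# Synthetic stand-in (assumed column names, random values)
set.seed(1)
n <- 200
blackfriday <- data.frame(
  Gender   = factor(sample(c("F", "M"), n, replace = TRUE)),
  Purchase = rnorm(n, mean = 9000, sd = 5000)
)

# Linear model fitted with glm(); the gaussian family gives ordinary least squares
model1 <- glm(Purchase ~ Gender, data = blackfriday, family = gaussian)
summary(model1)
```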

Task 2:

Assess the model fit using a “fit” metric that tells you how far on average the model mis-estimated the purchase size.

#your code
#your code
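One metric that matches the description is the root mean squared error (RMSE) of the residuals; the mean absolute error (MAE) is a more literal "average miss". A sketch on the same kind of synthetic stand-in, with assumed column names:

```r
set.seed(1)
n <- 200
blackfriday <- data.frame(
  Gender   = factor(sample(c("F", "M"), n, replace = TRUE)),
  Purchase = rnorm(n, mean = 9000, sd = 5000)
)
model1 <- glm(Purchase ~ Gender, data = blackfriday, family = gaussian)

# Residuals on the outcome scale: observed minus fitted purchase
res <- residuals(model1, type = "response")

rmse <- sqrt(mean(res^2))  # penalises large misses more strongly
mae  <- mean(abs(res))     # average absolute miss
c(RMSE = rmse, MAE = mae)
```

Both are in the units of Purchase, which makes them easier to interpret than the residual deviance.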

Task 3:

Now build two competing models:

  • one model with the added variable Marital_Status
  • and one model with all possible terms (both main effects and their interaction) of the two variables Marital_Status and Gender
#your code
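A sketch of the three nested models on synthetic data. Marital_Status is assumed to be coded 0/1 and is converted to a factor here; for a binary variable this gives the same fit as leaving it numeric.

```r
set.seed(1)
n <- 200
blackfriday <- data.frame(
  Gender         = factor(sample(c("F", "M"), n, replace = TRUE)),
  Marital_Status = factor(sample(c(0, 1), n, replace = TRUE)),
  Purchase       = rnorm(n, mean = 9000, sd = 5000)
)

model1 <- glm(Purchase ~ Gender, data = blackfriday, family = gaussian)
# Main effects only
model2 <- glm(Purchase ~ Gender + Marital_Status, data = blackfriday, family = gaussian)
# Gender * Marital_Status expands to Gender + Marital_Status + Gender:Marital_Status
model3 <- glm(Purchase ~ Gender * Marital_Status, data = blackfriday, family = gaussian)
```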

Now compare the sum of squared residuals of all three models:

#your code
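A sketch of the comparison on synthetic data. For a gaussian glm, deviance() returns the sum of squared residuals, so either route works. Because the models are nested, the SSR can only stay the same or shrink as terms are added, so a lower SSR on its own does not show that the extra terms are worthwhile.

```r
set.seed(1)
n <- 200
blackfriday <- data.frame(
  Gender         = factor(sample(c("F", "M"), n, replace = TRUE)),
  Marital_Status = factor(sample(c(0, 1), n, replace = TRUE)),
  Purchase       = rnorm(n, mean = 9000, sd = 5000)
)
model1 <- glm(Purchase ~ Gender, data = blackfriday, family = gaussian)
model2 <- glm(Purchase ~ Gender + Marital_Status, data = blackfriday, family = gaussian)
model3 <- glm(Purchase ~ Gender * Marital_Status, data = blackfriday, family = gaussian)

# Sum of squared residuals per model
ssr <- sapply(list(model1 = model1, model2 = model2, model3 = model3),
              function(m) sum(residuals(m, type = "response")^2))
ssr
```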

Now compare the three models statistically. What are your conclusions?

Hint: make sure to check the documentation for the ?anova function. You want to use the “F” test.

#your code
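A sketch of the statistical comparison on synthetic data. Passing several nested glm fits to anova() with test = "F" gives one row per step (model1 to model2, then model2 to model3); the F test suits the gaussian family because its dispersion is estimated, and the denominator is based on the residual variance of the largest model.

```r
set.seed(1)
n <- 200
blackfriday <- data.frame(
  Gender         = factor(sample(c("F", "M"), n, replace = TRUE)),
  Marital_Status = factor(sample(c(0, 1), n, replace = TRUE)),
  Purchase       = rnorm(n, mean = 9000, sd = 5000)
)
model1 <- glm(Purchase ~ Gender, data = blackfriday, family = gaussian)
model2 <- glm(Purchase ~ Gender + Marital_Status, data = blackfriday, family = gaussian)
model3 <- glm(Purchase ~ Gender * Marital_Status, data = blackfriday, family = gaussian)

# Sequential F tests between the nested models
cmp <- anova(model1, model2, model3, test = "F")
cmp
```

A non-significant row suggests the added terms do not reduce the residual deviance by more than chance would.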

Logistic regression

Load the dataframe called stop_search_met from the file data/tutorial2/stop_and_search/stop_and_search_sample.RData.

#your code
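Unlike read.csv(), load() restores objects under the names they were saved with and returns those names invisibly, so you do not assign the data frame yourself. A sketch with a made-up object saved to a temporary file:

```r
# Made-up stand-in saved under the name the tutorial expects
stop_search_met <- data.frame(Outcome = c("Arrest", "A no further action disposal"))
tmp <- tempfile(fileext = ".RData")
save(stop_search_met, file = tmp)
rm(stop_search_met)

# In the tutorial, replace tmp with
# "data/tutorial2/stop_and_search/stop_and_search_sample.RData"
loaded <- load(tmp)  # character vector with the names of the restored objects
loaded
```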

Familiarise yourself with the dataset:

#your code

Task 4:

Subset the data to contain only the two levels “Arrest” and “A no further action disposal” of the variable Outcome.

#your code
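A sketch of the subsetting step on a made-up stand-in (the real Outcome column has more levels and may contain missing values). droplevels() removes the now-empty factor levels, which keeps later tables and models clean.

```r
# Made-up stand-in for the Outcome column
stop_search_met <- data.frame(
  Outcome = factor(c("Arrest", "A no further action disposal",
                     "Community resolution", "Arrest", NA))
)

keep <- c("Arrest", "A no further action disposal")
# %in% returns FALSE for NA, so missing outcomes are dropped as well;
# drop = FALSE keeps a data frame even though there is only one column
ss_sub <- droplevels(stop_search_met[stop_search_met$Outcome %in% keep, , drop = FALSE])
table(ss_sub$Outcome)
```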

Now build a logistic regression model with Outcome as the outcome variable and the suspect’s gender and age as predictors.

#your code
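A sketch on synthetic data, assuming the predictor columns are called Gender and Age.range (as read.csv() would name the police.uk "Age range" column). With a factor response, glm() treats the first level as failure and the others as success, so setting the levels explicitly makes the model describe the log-odds of "Arrest".

```r
set.seed(42)
n <- 300
ss_sub <- data.frame(
  Outcome   = factor(sample(c("A no further action disposal", "Arrest"), n,
                            replace = TRUE, prob = c(0.7, 0.3)),
                     levels = c("A no further action disposal", "Arrest")),
  Gender    = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  Age.range = factor(sample(c("18-24", "25-34", "over 34"), n, replace = TRUE))
)

# Logistic regression: log-odds of "Arrest" as a function of gender and age
logit1 <- glm(Outcome ~ Gender + Age.range, data = ss_sub, family = binomial)
summary(logit1)
```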

What are your conclusions about the effects of either predictor variable?

#your code
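One way to read the effects is to exponentiate the coefficients into odds ratios relative to the reference levels: values above 1 mean higher odds of arrest than the reference group. The sketch below uses the same synthetic stand-in and assumed column names; confint.default() gives Wald intervals, and profile-likelihood intervals via confint() are an alternative.

```r
set.seed(42)
n <- 300
ss_sub <- data.frame(
  Outcome   = factor(sample(c("A no further action disposal", "Arrest"), n,
                            replace = TRUE, prob = c(0.7, 0.3)),
                     levels = c("A no further action disposal", "Arrest")),
  Gender    = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  Age.range = factor(sample(c("18-24", "25-34", "over 34"), n, replace = TRUE))
)
logit1 <- glm(Outcome ~ Gender + Age.range, data = ss_sub, family = binomial)

# Odds ratios with Wald 95% confidence intervals
odds_ratios <- exp(cbind(OR = coef(logit1), confint.default(logit1)))
round(odds_ratios, 3)
```

An interval that contains 1 is consistent with no effect of that predictor.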

Task 5:

Expand the model and include the predictor variable Officer.defined.ethnicity.

#your code

How does this affect the model fit?

#your code
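A sketch of the expanded model and its comparison on synthetic data; the ethnicity categories are assumed. A likelihood-ratio test needs both models fitted to the same rows, so the sketch first keeps only rows that are complete on every variable (with the real data, missing officer-defined ethnicity would otherwise make anova() refuse the comparison). AIC() gives a second, penalised view of fit.

```r
set.seed(42)
n <- 300
ss_sub <- data.frame(
  Outcome   = factor(sample(c("A no further action disposal", "Arrest"), n,
                            replace = TRUE, prob = c(0.7, 0.3)),
                     levels = c("A no further action disposal", "Arrest")),
  Gender    = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  Age.range = factor(sample(c("18-24", "25-34", "over 34"), n, replace = TRUE)),
  Officer.defined.ethnicity = factor(sample(c("Asian", "Black", "Other", "White"),
                                            n, replace = TRUE))
)

# Same rows for both models
vars  <- c("Outcome", "Gender", "Age.range", "Officer.defined.ethnicity")
ss_cc <- ss_sub[complete.cases(ss_sub[, vars]), vars]

logit1 <- glm(Outcome ~ Gender + Age.range, data = ss_cc, family = binomial)
logit2 <- glm(Outcome ~ Gender + Age.range + Officer.defined.ethnicity,
              data = ss_cc, family = binomial)

lrt <- anova(logit1, logit2, test = "Chisq")  # likelihood-ratio test
lrt
AIC(logit1, logit2)  # lower is better
```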

Task 6:

Examine the variable Type of the original dataset:

#your code

Now exclude the level Vehicle search and build a logistic regression model on the two remaining levels. Use it to answer the question of whether race or gender affected the type of stop and search.

#your code
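A sketch on synthetic data, assuming Type has the levels "Person search", "Person and Vehicle search" and "Vehicle search", and that ethnicity is taken from Officer.defined.ethnicity. The levels are set explicitly so the model describes the log-odds of "Person and Vehicle search" against "Person search". drop1() tests each predictor adjusted for the other, which fits the question better than sequential tests whose result depends on term order.

```r
set.seed(3)
n <- 400
type_levels <- c("Person search", "Person and Vehicle search", "Vehicle search")
stop_search_met <- data.frame(
  Type   = factor(sample(type_levels, n, replace = TRUE), levels = type_levels),
  Gender = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  Officer.defined.ethnicity = factor(sample(c("Asian", "Black", "Other", "White"),
                                            n, replace = TRUE))
)

table(stop_search_met$Type, useNA = "ifany")

# Exclude vehicle-only searches and drop the unused level
ss_type <- droplevels(stop_search_met[stop_search_met$Type %in% type_levels[1:2], ])

logit_type <- glm(Type ~ Gender + Officer.defined.ethnicity,
                  data = ss_type, family = binomial)
summary(logit_type)
drop1(logit_type, test = "LRT")  # likelihood-ratio test per predictor
```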

GLM for group comparisons

Use the blackfriday dataset.

Check the documentation in R for the aov(...) function.

An outstanding online help resource is https://www.rdocumentation.org/.

Task 7:

Does the amount of money spent on purchases on Black Friday differ between female and male shoppers?

Show your results with a t-test, an ANOVA and the GLM.

#your code
#your code
#your code
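A sketch of the three routes on synthetic data with assumed column names. The t-test uses var.equal = TRUE (the pooled-variance test) so that it matches the ANOVA and the GLM, which both assume equal variances; with that setting, all three give the same p-value.

```r
set.seed(7)
n <- 200
blackfriday <- data.frame(
  Gender   = factor(sample(c("F", "M"), n, replace = TRUE)),
  Purchase = rnorm(n, mean = 9000, sd = 5000)
)

# 1) Two-sample t-test with pooled variance
tt <- t.test(Purchase ~ Gender, data = blackfriday, var.equal = TRUE)
# 2) One-way ANOVA
av <- aov(Purchase ~ Gender, data = blackfriday)
# 3) GLM, tested against the intercept-only model with an F test
gl <- glm(Purchase ~ Gender, data = blackfriday, family = gaussian)

tt
summary(av)
anova(gl, test = "F")
```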

Show that: F = t^2 (equivalently, |t| = sqrt(F))

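Note that for the GLM implementation, you obtain the F-value by calling anova() on the fitted model itself with test = "F", which compares it against the intercept-only model.
Note also that t.test applies Welch’s correction for unequal variances by default (var.equal = FALSE), which the other two don’t; set var.equal = TRUE for the three results to match exactly.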

#your code
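A sketch of the check on synthetic data with assumed column names. With two groups and the pooled-variance t-test, the squared t statistic equals the F statistic from both aov() and the GLM's F test.

```r
set.seed(7)
n <- 200
blackfriday <- data.frame(
  Gender   = factor(sample(c("F", "M"), n, replace = TRUE)),
  Purchase = rnorm(n, mean = 9000, sd = 5000)
)
tt <- t.test(Purchase ~ Gender, data = blackfriday, var.equal = TRUE)
av <- aov(Purchase ~ Gender, data = blackfriday)
gl <- glm(Purchase ~ Gender, data = blackfriday, family = gaussian)

t_stat <- unname(tt$statistic)
F_aov  <- summary(av)[[1]][["F value"]][1]
F_glm  <- anova(gl, test = "F")[["F"]][2]  # row 1 is the NULL model
c(t_squared = t_stat^2, F_aov = F_aov, F_glm = F_glm)
```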

Task 8:

Use an ANOVA and the GLM to test the hypothesis that age affects the purchase size.

#your code
#your code
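A sketch on synthetic data, assuming Age is stored as the age bands of the Kaggle data. With seven groups, the F test is an omnibus test of whether any group mean differs and there is no single t-test equivalent; TukeyHSD(av_age) is one option for follow-up pairwise comparisons.

```r
set.seed(8)
n <- 300
age_levels <- c("0-17", "18-25", "26-35", "36-45", "46-50", "51-55", "55+")
blackfriday <- data.frame(
  Age      = factor(sample(age_levels, n, replace = TRUE), levels = age_levels),
  Purchase = rnorm(n, mean = 9000, sd = 5000)
)

av_age <- aov(Purchase ~ Age, data = blackfriday)
gl_age <- glm(Purchase ~ Age, data = blackfriday, family = gaussian)

summary(av_age)
anova(gl_age, test = "F")  # same F and p-value as the ANOVA
```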

What is your conclusion?

Task 9:

Bonus question

Use the blackfriday dataset and try to adopt the role of a business analyst. Suppose you are interested in customer profiling and want to target married customers differently from single customers.

Build a model that predicts the customers’ marital status.

#your code
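One possible starting point is a logistic regression, sketched here on synthetic data. It assumes Marital_Status is coded 0/1 (taken here as 0 = single, 1 = married, which is an assumption about the dataset's coding) and that Gender, Age and Purchase are available as predictors; a numeric 0/1 response is accepted by the binomial family and the model describes the log-odds of Marital_Status = 1.

```r
set.seed(9)
n <- 300
blackfriday <- data.frame(
  Marital_Status = sample(c(0, 1), n, replace = TRUE),  # assumed: 1 = married
  Gender   = factor(sample(c("F", "M"), n, replace = TRUE)),
  Age      = factor(sample(c("18-25", "26-35", "36-45"), n, replace = TRUE)),
  Purchase = rnorm(n, mean = 9000, sd = 5000)
)

marital_model <- glm(Marital_Status ~ Gender + Age + Purchase,
                     data = blackfriday, family = binomial)
summary(marital_model)
```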

END
