Tutorial 2, Probability, Statistics and Modelling 2, BSc Security and Crime Science, UCL
Outcomes of this tutorial
In this tutorial, you will:
- build and evaluate your own multiple regression models
- build and evaluate your own logistic regression models
- conduct hypothesis testing
- perform ANOVA- and t-test-style group comparisons with the GLM
Linear regression
Read the csv file located at data/tutorial2/black_friday/sample_black_friday.csv as a dataframe, and name it blackfriday (Details on the data: https://www.kaggle.com/mehdidag/black-friday/version/1).
#your code
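A minimal sketch of reading a CSV file into a dataframe with read.csv. So that it runs anywhere, it first writes a tiny made-up file to a temporary location; the column names and values are invented for illustration and are not taken from the tutorial data.

```r
# Write a tiny, made-up CSV to a temporary file so the example is runnable.
tmp <- tempfile(fileext = ".csv")
writeLines(c("Gender,Purchase", "F,8000", "M,9500", "M,7200"), tmp)

# read.csv returns a data.frame; stringsAsFactors = TRUE turns text columns
# into factors, which is convenient for categorical predictors.
toy <- read.csv(tmp, header = TRUE, stringsAsFactors = TRUE)

# For the tutorial data, the call would point at the path given in the task:
# blackfriday <- read.csv("data/tutorial2/black_friday/sample_black_friday.csv")
str(toy)
```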
Familiarise yourself with the dataset:
#your code
Task 1:
Build a linear model that predicts the purchase size based on gender. Use the glm function to do this.
#your code
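As an illustration on simulated data (not the tutorial dataset), a linear model with one binary predictor could be fitted with glm as sketched below. The dataframe toy and the column name Purchase are assumptions made for this example.

```r
set.seed(1)
# Simulated stand-in for the tutorial data (names and effect sizes are made up).
toy <- data.frame(Gender = factor(sample(c("F", "M"), 200, replace = TRUE)))
toy$Purchase <- 9000 + 600 * (toy$Gender == "M") + rnorm(200, sd = 2000)

# family = gaussian (the default) makes glm fit an ordinary linear model.
m1 <- glm(Purchase ~ Gender, data = toy, family = gaussian)
summary(m1)

# With one binary predictor, the intercept is the mean of the reference
# level ("F") and the slope is the difference between the two group means.
coef(m1)
means <- tapply(toy$Purchase, toy$Gender, mean)
means
```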
Task 2:
Assess the model fit using a “fit” metric that tells you how far on average the model mis-estimated the purchase size.
#your code
#your code
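Two common metrics of this kind are the root mean squared error (RMSE) and the mean absolute error (MAE). The sketch below computes both on simulated data; toy and Purchase are assumed names, and which metric the task intends is left open.

```r
set.seed(2)
# Simulated stand-in data (made up for illustration).
toy <- data.frame(Gender = factor(sample(c("F", "M"), 200, replace = TRUE)))
toy$Purchase <- 9000 + 600 * (toy$Gender == "M") + rnorm(200, sd = 2000)
m1 <- glm(Purchase ~ Gender, data = toy)

# Response residuals: observed value minus fitted value.
res <- residuals(m1, type = "response")

rmse <- sqrt(mean(res^2))  # root mean squared error, in the units of Purchase
mae  <- mean(abs(res))     # mean absolute error, less sensitive to outliers
c(RMSE = rmse, MAE = mae)
```

Both values are on the same scale as the outcome, so they can be read directly as a typical size of the prediction error.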
Task 3:
Now build two competing models:
- one model with the added variable Marital_Status
- and one model with all terms possible from the two variables Marital_Status and Gender
#your code
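The difference between adding a variable and including all possible terms can be sketched with R's formula operators: + adds a main effect, while * expands to the main effects plus their interaction. The data are simulated, and the 0/1 coding of Marital_Status and the column name Purchase are assumptions.

```r
set.seed(3)
n <- 300
# Simulated stand-in data (made up for illustration).
toy <- data.frame(
  Gender = factor(sample(c("F", "M"), n, replace = TRUE)),
  Marital_Status = factor(sample(c(0, 1), n, replace = TRUE))
)
toy$Purchase <- 9000 + 600 * (toy$Gender == "M") + rnorm(n, sd = 2000)

m1 <- glm(Purchase ~ Gender, data = toy)                   # baseline model
m2 <- glm(Purchase ~ Gender + Marital_Status, data = toy)  # main effects only
m3 <- glm(Purchase ~ Gender * Marital_Status, data = toy)  # main effects plus interaction

# The interaction model has one extra coefficient for Gender:Marital_Status.
names(coef(m3))
```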
Now compare the sum of squared residuals of all three models:
#your code
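One way to obtain the sum of squared residuals for several models at once is sketched below on simulated data (assumed names: toy, Purchase, and a 0/1 coded Marital_Status). For a gaussian glm the same number is returned by deviance.

```r
set.seed(3)
n <- 300
# Simulated stand-in data (made up for illustration).
toy <- data.frame(
  Gender = factor(sample(c("F", "M"), n, replace = TRUE)),
  Marital_Status = factor(sample(c(0, 1), n, replace = TRUE))
)
toy$Purchase <- 9000 + 600 * (toy$Gender == "M") + rnorm(n, sd = 2000)

models <- list(
  gender      = glm(Purchase ~ Gender, data = toy),
  additive    = glm(Purchase ~ Gender + Marital_Status, data = toy),
  interaction = glm(Purchase ~ Gender * Marital_Status, data = toy)
)

# Sum of squared residuals per model.
ssr <- sapply(models, function(m) sum(residuals(m, type = "response")^2))
ssr
```

Because the models are nested, the sum of squared residuals cannot increase when terms are added, so a smaller value alone does not show that the larger model is better.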
Now compare the three models statistically. What are your conclusions?
Hint: make sure to check the documentation for the anova function (?anova). You want to use the “F” test.
#your code
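A sketch of a statistical comparison of nested gaussian models with anova and the F test, again on simulated data with assumed column names. Each row after the first tests whether the terms added relative to the previous model reduce the residual deviance by more than chance would suggest.

```r
set.seed(3)
n <- 300
# Simulated stand-in data (made up for illustration).
toy <- data.frame(
  Gender = factor(sample(c("F", "M"), n, replace = TRUE)),
  Marital_Status = factor(sample(c(0, 1), n, replace = TRUE))
)
toy$Purchase <- 9000 + 600 * (toy$Gender == "M") + rnorm(n, sd = 2000)

m1 <- glm(Purchase ~ Gender, data = toy)
m2 <- glm(Purchase ~ Gender + Marital_Status, data = toy)
m3 <- glm(Purchase ~ Gender * Marital_Status, data = toy)

# Sequential comparison: m2 against m1, then m3 against m2.
cmp <- anova(m1, m2, m3, test = "F")
cmp
```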
Logistic regression
Load the dataframe called stop_search_met from the file data/tutorial2/stop_and_search/stop_and_search_sample.RData.
#your code
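Unlike read.csv, the load function restores objects under the names they were saved with, so no assignment is needed. The sketch below demonstrates this with a made-up object (stop_search_toy, a hypothetical name) saved to a temporary file.

```r
# Save a tiny, made-up dataframe to a temporary .RData file.
tmp <- tempfile(fileext = ".RData")
stop_search_toy <- data.frame(Outcome = c("Arrest", "A no further action disposal"))
save(stop_search_toy, file = tmp)
rm(stop_search_toy)

# load() puts the saved objects back into the workspace and
# (invisibly) returns their names as a character vector.
loaded <- load(tmp)
loaded
exists("stop_search_toy")

# For the tutorial data, the call would point at the path given in the task:
# load("data/tutorial2/stop_and_search/stop_and_search_sample.RData")
```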
Familiarise yourself with the dataset:
#your code
Task 4:
Subset the data to contain only the two levels of the variable Outcome “Arrest” and “A no further action disposal”.
#your code
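A sketch of subsetting a dataframe to two levels of a factor, on simulated data. The third level used here ("Community resolution") only serves as an illustrative extra level. Calling droplevels afterwards removes the now-empty levels, which matters for a later logistic regression.

```r
set.seed(4)
# Simulated stand-in data with three outcome levels (made up for illustration).
toy <- data.frame(Outcome = factor(sample(
  c("Arrest", "A no further action disposal", "Community resolution"),
  100, replace = TRUE
)))

keep <- c("Arrest", "A no further action disposal")

# Keep the matching rows, then drop factor levels that no longer occur.
toy2 <- droplevels(toy[toy$Outcome %in% keep, , drop = FALSE])
table(toy2$Outcome)
```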
Now build a logistic regression model with Outcome as the outcome variable and the gender and age of the suspect as predictors.
#your code
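A logistic regression is a glm with family = binomial. The sketch below uses simulated data; the column names Gender and Age.range and their levels are assumptions and may differ from the tutorial dataset.

```r
set.seed(5)
n <- 400
# Simulated stand-in data (made up for illustration).
toy <- data.frame(
  Gender = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  Age.range = factor(sample(c("10-17", "18-24", "25-34", "over 34"), n, replace = TRUE))
)
p_arrest <- plogis(-1 + 0.5 * (toy$Gender == "Male"))
toy$Outcome <- factor(
  ifelse(runif(n) < p_arrest, "Arrest", "A no further action disposal"),
  levels = c("A no further action disposal", "Arrest")
)

# For a factor response, glm treats the first level as "failure" and all
# other levels as "success", so this models the probability of "Arrest".
logit1 <- glm(Outcome ~ Gender + Age.range, data = toy, family = binomial)
summary(logit1)
```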
What are your conclusions about the effects of either predictor variable?
#your code
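Coefficients of a logistic regression are on the log-odds scale, so exponentiating them gives odds ratios that are often easier to interpret. The sketch below uses simulated data with assumed names (toy, arrest, Gender) and Wald confidence intervals from confint.default.

```r
set.seed(5)
n <- 400
# Simulated stand-in data (made up for illustration); arrest is coded 0/1.
toy <- data.frame(Gender = factor(sample(c("Female", "Male"), n, replace = TRUE)))
toy$arrest <- rbinom(n, 1, plogis(-1 + 0.5 * (toy$Gender == "Male")))
fit <- glm(arrest ~ Gender, data = toy, family = binomial)

# Estimates on the log-odds scale with Wald z-tests.
coef(summary(fit))

# Odds ratios with back-transformed Wald 95% confidence intervals.
odds_ratio <- exp(coef(fit))
wald_ci <- exp(confint.default(fit))
cbind(odds_ratio, wald_ci)
```

An odds ratio above 1 for a level means higher odds of the outcome than in the reference level; an interval that includes 1 is consistent with no effect.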
Task 5:
Expand the model and include the predictor variable Officer.defined.ethnicity.
#your code
How does this affect the model fit?
#your code
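One possible way to judge the change in fit is to compare AIC values and to run a likelihood-ratio (chi-squared) test between the nested models. The sketch uses simulated data; all column names other than Officer.defined.ethnicity, and all levels, are assumptions.

```r
set.seed(6)
n <- 500
# Simulated stand-in data (made up for illustration); arrest is coded 0/1.
toy <- data.frame(
  Gender = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  Age.range = factor(sample(c("18-24", "25-34", "over 34"), n, replace = TRUE)),
  Officer.defined.ethnicity = factor(sample(c("Asian", "Black", "White"), n, replace = TRUE))
)
toy$arrest <- rbinom(n, 1, plogis(-1 + 0.5 * (toy$Gender == "Male")))

small <- glm(arrest ~ Gender + Age.range, data = toy, family = binomial)
large <- update(small, . ~ . + Officer.defined.ethnicity)  # add one predictor

# Lower AIC indicates a better trade-off between fit and complexity.
c(AIC_small = AIC(small), AIC_large = AIC(large))

# Likelihood-ratio test of the added predictor.
anova(small, large, test = "Chisq")
```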
Task 6:
Examine the variable Type of the original dataset:
#your code
Now exclude the level Vehicle search and build a logistic regression model on the two remaining levels. Try to answer the question whether race or gender affected the type of stop and search.
#your code
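Excluding one level and modelling the remaining two could look like the sketch below. The data are simulated; the two remaining level names, the Gender column, and the ethnicity levels are assumptions for illustration.

```r
set.seed(9)
n <- 600
# Simulated stand-in data (made up for illustration).
toy <- data.frame(
  Type = factor(sample(
    c("Person search", "Person and Vehicle search", "Vehicle search"),
    n, replace = TRUE
  )),
  Gender = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  Officer.defined.ethnicity = factor(sample(c("Asian", "Black", "White"), n, replace = TRUE))
)

# Remove the unwanted level and drop it from the factor.
two <- droplevels(subset(toy, Type != "Vehicle search"))
levels(two$Type)

# The model estimates the probability of the second level of Type.
fit <- glm(Type ~ Officer.defined.ethnicity + Gender, data = two, family = binomial)
summary(fit)$coefficients
```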
GLM for group comparisons
Use the blackfriday dataset.
Check the documentation in R for the aov(...) function.
Note that an excellent online help resource is https://www.rdocumentation.org/.
Task 7:
Does the amount of money spent on purchases on Black Friday differ between female and male shoppers?
Show your results with a t-test, an ANOVA and the GLM.
#your code
#your code
#your code
Show that: F = t^2 (equivalently, |t| = sqrt(F))
Note that for the GLM implementation, you will have to pass the fitted model on its own to the anova function with the F test to obtain the F-value.
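Note also that t.test corrects for unequal variances by default (Welch's test), which the other two don’t; the identity holds exactly only for the equal-variance t-test (var.equal = TRUE).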
#your code
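For two groups, the F statistic of a one-way ANOVA equals the square of the equal-variance t statistic. The sketch below checks this on simulated data (toy and Purchase are assumed names) using t.test, aov, and a gaussian glm.

```r
set.seed(7)
# Simulated stand-in data (made up for illustration).
toy <- data.frame(Gender = factor(rep(c("F", "M"), each = 50)))
toy$Purchase <- 9000 + 600 * (toy$Gender == "M") + rnorm(100, sd = 2000)

# Classical pooled-variance t-test (the default would be Welch's test).
tt <- t.test(Purchase ~ Gender, data = toy, var.equal = TRUE)
t_val <- unname(tt$statistic)

# One-way ANOVA: F value for the Gender term.
f_aov <- summary(aov(Purchase ~ Gender, data = toy))[[1]][["F value"]][1]

# Gaussian GLM: analysis of deviance table with an F test.
g <- glm(Purchase ~ Gender, data = toy)
f_glm <- anova(g, test = "F")$F[2]

c(t_squared = t_val^2, F_aov = f_aov, F_glm = f_glm)
```

With the default Welch correction, the squared t statistic would generally differ slightly from F.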
Task 8:
Use an ANOVA and the GLM to test the hypothesis that age affects the purchase size.
#your code
#your code
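With more than two groups, a t-test no longer applies, but aov and a gaussian glm still give the same F test. The sketch below uses simulated data; the age bands and the column names Age and Purchase are assumptions.

```r
set.seed(8)
ages <- c("18-25", "26-35", "36-45", "46-50")
# Simulated stand-in data (made up for illustration).
toy <- data.frame(Age = factor(sample(ages, 400, replace = TRUE)))
toy$Purchase <- 9000 + 300 * as.integer(toy$Age) + rnorm(400, sd = 2000)

# One-way ANOVA with a four-level factor.
a1 <- aov(Purchase ~ Age, data = toy)
aov_tab <- summary(a1)[[1]]
aov_tab

# The same test through the GLM.
g_age <- glm(Purchase ~ Age, data = toy)
glm_tab <- anova(g_age, test = "F")
glm_tab
```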
What is your conclusion?
Task 9:
Bonus question
Use the blackfriday dataset and try to adopt the role of a business analyst. Suppose you are interested in customer profiling and want to target married customers differently than single customers.
Build a model that predicts the customers’ marital status.
#your code
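Since marital status is binary, one option is a logistic regression followed by a look at how well the predicted classes match the observed ones. The sketch uses simulated data with assumed column names and a made-up relationship between age and marital status; the choice of predictors and the 0.5 cut-off are illustrative only.

```r
set.seed(10)
n <- 500
# Simulated stand-in data (made up for illustration).
toy <- data.frame(
  Gender = factor(sample(c("F", "M"), n, replace = TRUE)),
  Age = factor(sample(c("18-25", "26-35", "36-45"), n, replace = TRUE)),
  Purchase = round(rnorm(n, 9000, 2000))
)
# Made-up rule: older age bands are more likely to be married (1).
toy$Marital_Status <- rbinom(n, 1, plogis(-1 + 0.8 * (as.integer(toy$Age) - 1)))

# Purchase is rescaled to thousands to keep the coefficients readable.
fit <- glm(Marital_Status ~ Gender + Age + I(Purchase / 1000),
           data = toy, family = binomial)

# Predicted probabilities, classified at a 0.5 cut-off.
p_married <- predict(fit, type = "response")
pred <- as.integer(p_married > 0.5)

table(predicted = pred, observed = toy$Marital_Status)
mean(pred == toy$Marital_Status)  # proportion classified correctly
```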
END