This document is the annotation layer to the ./manipulation/1-aggregator.R
script, which uses the product of ./manipulation/0-greeter.R
to compute counts of conviction events at different aggregation levels, among which we will distinguish:
(4) | person_id | person level
(3) | person_id + offense_group | person level with groups
(2) | person_id + begin_date | sentence level
(1) | person_id + begin_date + offense_group | conviction level, less granular
(0) | person_id + begin_date + offense_arrest_cd | conviction level, more granular
III. Groom
Observations: 120,381
Variables: 6
$ person_id <dbl> 1, 1, 2, 2, 3, 5, 6, 7, 8, 8, 8, 8, 8, 8, 9, 9, 9, 10, 10, 11, 11, 11...
$ begin_date <date> 1974-09-24, 1974-09-24, 1975-01-17, 1975-01-17, 1975-01-08, 1976-10-...
$ offense_arrest_cd <chr> "E01", "E01", "D21", "D21", "B41", "D21", "D11", "E01", "B11", "B31",...
$ offense_group <chr> "E", "E", "D", "D", "B", "D", "D", "E", "B", "B", "B", "B", "B", "H",...
$ offense_count <dbl> 1, 2, 1, 2, 1, 1, 1, 1, 4, 2, 1, 3, 6, 5, 1, 2, 3, 1, 2, 2, 3, 1, 1, ...
$ offense_arrest <chr> "FORGERY 1ST DEGREE", "FORGERY 1ST DEGREE", "THEFT", "THEFT", "ROBBER...
offense_arrest_cd | code for the offense committed | E01
offense_arrest | standardized description of the offense committed | FORGERY 1ST DEGREE
begin_date | date the person began serving the aggregate sentence | 27296
IV. Aggregate
ds0
- starting point
# We will distinguish the following levels of aggregation:
# (4) person_id                                   # person level
# (3) person_id + offense_group                   # person level with groups
# (2) person_id + begin_date                      # sentence level
# (1) person_id + begin_date + offense_group      # conviction level # less granular
# (0) person_id + begin_date + offense_arrest_cd  # conviction level # more granular
# NOTE: one may have multiple convictions on the same date
# compute variables at the lowest level of aggregation (0)
ds0 <- ds %>%
  # dplyr::filter(person_id %in% c(46222, 65392, 50495)) %>% # for testing
  dplyr::arrange(person_id, begin_date, offense_arrest_cd) %>%
  dplyr::group_by(person_id) %>%
  dplyr::mutate(
    conviction_order     = dplyr::row_number() # sequential order of convictions
    ,total_n_convictions = dplyr::n()          # total number of convictions a person has
  ) %>%
  # additional measures, computed within a person
  # note that we are NOT collapsing/aggregating
  dplyr::mutate( # `mutate` NOT `summarize`, same N of rows
    drug_related         = ifelse(offense_group == "C", TRUE, NA)
    ,drug_order          = cumsum(!is.na(drug_related))
    ,drug_order          = ifelse(is.na(drug_related), NA, drug_order)
    ,after_1996          = ifelse(begin_date > "1996-01-01", TRUE, NA)
    ,days_since_previous = begin_date - dplyr::lag(begin_date, 1)
  ) %>%
  dplyr::ungroup()
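The two-step construction of `drug_order` (a running count, then blanking the rows that are not drug-related) is easier to follow on a tiny example. The sketch below uses a made-up data frame, `toy`, whose values are hypothetical and not taken from the source data; it only mirrors the `mutate` logic shown above.

```r
library(dplyr)

# Hypothetical toy data: one person, four convictions, two of them in group "C"
toy <- data.frame(
  person_id     = c(1, 1, 1, 1),
  offense_group = c("D", "C", "B", "C"),
  stringsAsFactors = FALSE
)

toy_out <- toy %>%
  dplyr::group_by(person_id) %>%
  dplyr::mutate(
    drug_related = ifelse(offense_group == "C", TRUE, NA),
    # running count of drug-related rows seen so far within the person
    drug_order   = cumsum(!is.na(drug_related)),
    # keep the counter only on the drug-related rows themselves
    drug_order   = ifelse(is.na(drug_related), NA, drug_order)
  ) %>%
  dplyr::ungroup()

toy_out$drug_order
# NA  1 NA  2
```

Without the second assignment, the non-drug rows would carry the count of the preceding drug-related convictions (0, 1, 1, 2) rather than a missing value.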
# let us examine the data for three individuals
ds0 %>%
  dplyr::filter(person_id %in% c(46222, 65392, 50495)) %>%
  neat(caption = "Grouped by : (PERSON) - (DATE) - (OFFENSE ARREST CODE)")
Grouped by : (PERSON) - (DATE) - (OFFENSE ARREST CODE)
| person_id | begin_date | offense_arrest_cd | offense_group | offense_count | offense_arrest | conviction_order | total_n_convictions | drug_related | drug_order | after_1996 | days_since_previous |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 46222 | 1994-11-17 | D11 | D | 1 | BURGLARY | 1 | 9 | NA | NA | NA | NA |
| 46222 | 1994-11-17 | D11 | D | 1 | BURGLARY | 2 | 9 | NA | NA | NA | 0 days |
| 46222 | 1994-11-17 | D31 | D | 2 | CRIMINAL MISCHIEF | 3 | 9 | NA | NA | NA | 0 days |
| 46222 | 2007-08-06 | B35 | B | 2 | VIOLATION OF PROTECTION ORDER | 4 | 9 | NA | NA | TRUE | 4645 days |
| 46222 | 2007-08-06 | C21 | C | 1 | POS CNTRL SUB EXCEPT MARIJUANA | 5 | 9 | TRUE | 1 | TRUE | 0 days |
| 46222 | 2007-08-06 | L22 | L | 1 | TELECOMMUNICATION VIOLATION | 6 | 9 | NA | NA | TRUE | 0 days |
| 46222 | 2012-08-20 | C21 | C | 1 | POS CNTRL SUB EXCEPT MARIJUANA | 7 | 9 | TRUE | 2 | TRUE | 1841 days |
| 46222 | 2012-08-20 | C21 | C | 1 | POS CNTRL SUB EXCEPT MARIJUANA | 8 | 9 | TRUE | 3 | TRUE | 0 days |
| 46222 | 2012-08-20 | K01 | K | 1 | CARRY/POSS CONCEALED WEAPON | 9 | 9 | NA | NA | TRUE | 0 days |
| 50495 | 1998-01-07 | D12 | D | 1 | POSSESSION OF BURGLARY TOOLS | 1 | 5 | NA | NA | TRUE | NA |
| 50495 | 2004-04-06 | D41 | D | 1 | CRIMINAL TRESPASS | 2 | 5 | NA | NA | TRUE | 2281 days |
| 50495 | 2007-05-04 | D11 | D | 1 | BURGLARY | 3 | 5 | NA | NA | TRUE | 1123 days |
| 50495 | 2007-05-04 | D20 | D | 2 | THEFT BY RECEIVING STOLEN PROP | 4 | 5 | NA | NA | TRUE | 0 days |
| 50495 | 2007-05-04 | D20 | D | 3 | THEFT BY RECEIVING STOLEN PROP | 5 | 5 | NA | NA | TRUE | 0 days |
| 65392 | 2007-01-25 | C21 | C | 1 | POS CNTRL SUB EXCEPT MARIJUANA | 1 | 6 | TRUE | 1 | TRUE | NA |
| 65392 | 2007-01-25 | D11 | D | 1 | BURGLARY | 2 | 6 | NA | NA | TRUE | 0 days |
| 65392 | 2007-01-25 | D20 | D | 1 | THEFT BY RECEIVING STOLEN PROP | 3 | 6 | NA | NA | TRUE | 0 days |
| 65392 | 2010-06-11 | C21 | C | 1 | POS CNTRL SUB EXCEPT MARIJUANA | 4 | 6 | TRUE | 2 | TRUE | 1233 days |
| 65392 | 2010-06-11 | D11 | D | 1 | BURGLARY | 5 | 6 | NA | NA | TRUE | 0 days |
| 65392 | 2014-05-21 | D43 | D | 1 | THEFT BY UNLWFL TAKING OR DISP | 6 | 6 | NA | NA | TRUE | 1440 days |
What to compute
Let us define the function that would govern the computation of variables during aggregation.
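The function itself is not reproduced in this report. What follows is only a sketch of what it might look like, inferred from the column names of the aggregated tables (`n_offense_counts`, `n_convictions`, and so on). The name `compute_counts` and its signature are assumptions, not the script's actual code. The example rows are the six convictions of person 65392 as listed in the ds0 table.

```r
library(dplyr)

# Hypothetical helper: collapse conviction-level rows to the grouping variables in `...`.
# Assumes `d` carries offense_count, drug_related and after_1996 as computed for ds0
# (the two flags are TRUE or NA, never FALSE).
compute_counts <- function(d, ...) {
  d %>%
    dplyr::group_by(...) %>%
    dplyr::summarize(
      n_offense_counts          = sum(offense_count, na.rm = TRUE),
      n_convictions             = dplyr::n(),
      n_drug_related            = sum(drug_related, na.rm = TRUE),
      n_after_1996              = sum(after_1996, na.rm = TRUE),
      n_drug_related_after_1996 = sum(drug_related & after_1996, na.rm = TRUE)
    ) %>%
    dplyr::ungroup()
}

# Rows for person 65392, copied from the ds0 table in this report
rows <- data.frame(
  person_id     = 65392,
  begin_date    = as.Date(c("2007-01-25", "2007-01-25", "2007-01-25",
                            "2010-06-11", "2010-06-11", "2014-05-21")),
  offense_group = c("C", "D", "D", "C", "D", "D"),
  offense_count = 1,
  drug_related  = c(TRUE, NA, NA, TRUE, NA, NA),
  after_1996    = TRUE,
  stringsAsFactors = FALSE
)

by_sentence <- compute_counts(rows, person_id, begin_date)  # level (2)
by_person   <- compute_counts(rows, person_id)              # level (4)
```

Passing the grouping variables through `...` lets one function serve all four aggregation levels; only the set of grouping columns changes between ds1, ds2, ds3 and ds4.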
ds1
Grouped by : (PERSON) - (DATE) - (OFFENSE GROUP)
| person_id | begin_date | offense_group | n_offense_counts | n_convictions | n_drug_related | n_after_1996 | n_drug_related_after_1996 |
|---|---|---|---|---|---|---|---|
| 46222 | 1994-11-17 | D | 4 | 3 | 0 | 0 | 0 |
| 46222 | 2007-08-06 | B | 2 | 1 | 0 | 1 | 0 |
| 46222 | 2007-08-06 | C | 1 | 1 | 1 | 1 | 1 |
| 46222 | 2007-08-06 | L | 1 | 1 | 0 | 1 | 0 |
| 46222 | 2012-08-20 | C | 2 | 2 | 2 | 2 | 2 |
| 46222 | 2012-08-20 | K | 1 | 1 | 0 | 1 | 0 |
| 50495 | 1998-01-07 | D | 1 | 1 | 0 | 1 | 0 |
| 50495 | 2004-04-06 | D | 1 | 1 | 0 | 1 | 0 |
| 50495 | 2007-05-04 | D | 6 | 3 | 0 | 3 | 0 |
| 65392 | 2007-01-25 | C | 1 | 1 | 1 | 1 | 1 |
| 65392 | 2007-01-25 | D | 2 | 2 | 0 | 2 | 0 |
| 65392 | 2010-06-11 | C | 1 | 1 | 1 | 1 | 1 |
| 65392 | 2010-06-11 | D | 1 | 1 | 0 | 1 | 0 |
| 65392 | 2014-05-21 | D | 1 | 1 | 0 | 1 | 0 |
ds2
Grouped by : (PERSON) - (DATE)
| person_id | begin_date | n_offense_counts | n_convictions | n_drug_related | n_after_1996 | n_drug_related_after_1996 |
|---|---|---|---|---|---|---|
| 46222 | 1994-11-17 | 4 | 3 | 0 | 0 | 0 |
| 46222 | 2007-08-06 | 4 | 3 | 1 | 3 | 1 |
| 46222 | 2012-08-20 | 3 | 3 | 2 | 3 | 2 |
| 50495 | 1998-01-07 | 1 | 1 | 0 | 1 | 0 |
| 50495 | 2004-04-06 | 1 | 1 | 0 | 1 | 0 |
| 50495 | 2007-05-04 | 6 | 3 | 0 | 3 | 0 |
| 65392 | 2007-01-25 | 3 | 3 | 1 | 3 | 1 |
| 65392 | 2010-06-11 | 2 | 2 | 1 | 2 | 1 |
| 65392 | 2014-05-21 | 1 | 1 | 0 | 1 | 0 |
ds3
Grouped by : (PERSON) - (OFFENSE GROUP)
| person_id | offense_group | n_offense_counts | n_convictions | n_drug_related | n_after_1996 | n_drug_related_after_1996 |
|---|---|---|---|---|---|---|
| 46222 | B | 2 | 1 | 0 | 1 | 0 |
| 46222 | C | 3 | 3 | 3 | 3 | 3 |
| 46222 | D | 4 | 3 | 0 | 0 | 0 |
| 46222 | K | 1 | 1 | 0 | 1 | 0 |
| 46222 | L | 1 | 1 | 0 | 1 | 0 |
| 50495 | D | 8 | 5 | 0 | 5 | 0 |
| 65392 | C | 2 | 2 | 2 | 2 | 2 |
| 65392 | D | 4 | 4 | 0 | 4 | 0 |
ds4
Grouped by : (PERSON)
| person_id | n_offense_counts | n_convictions | n_drug_related | n_after_1996 | n_drug_related_after_1996 |
|---|---|---|---|---|---|
| 46222 | 11 | 9 | 3 | 6 | 3 |
| 50495 | 8 | 5 | 0 | 5 | 0 |
| 65392 | 6 | 6 | 2 | 6 | 2 |
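Because every measure here is a count, a coarser table can be cross-checked against a finer one by summing. The sketch below rolls the three sentence-level (ds2) rows reported for person 46222 up to the person level and compares the result with that person's ds4 row. The object names `ds2_46222` and `rollup` are illustrative only, not part of the original script.

```r
library(dplyr)

# Sentence-level rows for person 46222, copied from the ds2 table in this report
ds2_46222 <- data.frame(
  person_id                 = 46222,
  begin_date                = as.Date(c("1994-11-17", "2007-08-06", "2012-08-20")),
  n_offense_counts          = c(4, 4, 3),
  n_convictions             = c(3, 3, 3),
  n_drug_related            = c(0, 1, 2),
  n_after_1996              = c(0, 3, 3),
  n_drug_related_after_1996 = c(0, 1, 2)
)

# Summing every n_* column within the person should reproduce the ds4 row
rollup <- ds2_46222 %>%
  dplyr::group_by(person_id) %>%
  dplyr::summarize_at(dplyr::vars(dplyr::starts_with("n_")), sum) %>%
  dplyr::ungroup()

unlist(rollup[, -1])
# 11, 9, 3, 6, 3: the same values as the ds4 row for person 46222
```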
Session Information
For the sake of documentation and reproducibility, the current report was rendered in the following environment.
Environment
- Session info -------------------------------------------------------------------------------------------------------
setting value
version R version 3.5.2 (2018-12-20)
os Windows >= 8 x64
system x86_64, mingw32
ui RStudio
language (EN)
collate English_United States.1252
ctype English_United States.1252
tz America/Los_Angeles
date 2019-05-21
- Packages -----------------------------------------------------------------------------------------------------------
package * version date lib source
assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.5.3)
backports 1.1.4 2019-04-10 [1] CRAN (R 3.5.3)
callr 3.2.0 2019-03-15 [1] CRAN (R 3.5.3)
cli 1.1.0 2019-03-19 [1] CRAN (R 3.5.3)
codetools 0.2-15 2016-10-05 [2] CRAN (R 3.5.2)
colorspace 1.4-1 2019-03-18 [1] CRAN (R 3.5.3)
crayon 1.3.4 2017-09-16 [1] CRAN (R 3.5.3)
desc 1.2.0 2018-05-01 [1] CRAN (R 3.5.3)
devtools 2.0.2 2019-04-08 [1] CRAN (R 3.5.3)
digest 0.6.18 2018-10-10 [1] CRAN (R 3.5.3)
dplyr 0.8.0.1 2019-02-15 [1] CRAN (R 3.5.3)
evaluate 0.13 2019-02-12 [1] CRAN (R 3.5.3)
fansi 0.4.0 2018-10-05 [1] CRAN (R 3.5.3)
fs 1.3.1 2019-05-06 [1] CRAN (R 3.5.3)
ggplot2 * 3.1.1 2019-04-07 [1] CRAN (R 3.5.3)
ggpubr 0.2 2018-11-15 [1] CRAN (R 3.5.3)
glue 1.3.1 2019-03-12 [1] CRAN (R 3.5.3)
gtable 0.3.0 2019-03-25 [1] CRAN (R 3.5.3)
highr 0.8 2019-03-20 [1] CRAN (R 3.5.3)
hms 0.4.2 2018-03-10 [1] CRAN (R 3.5.3)
htmltools 0.3.6 2017-04-28 [1] CRAN (R 3.5.3)
httr 1.4.0 2018-12-11 [1] CRAN (R 3.5.3)
kableExtra 1.1.0 2019-03-16 [1] CRAN (R 3.5.3)
knitr * 1.23 2019-05-18 [1] CRAN (R 3.5.2)
lazyeval 0.2.2 2019-03-15 [1] CRAN (R 3.5.3)
lubridate 1.7.4 2018-04-11 [1] CRAN (R 3.5.3)
magrittr * 1.5 2014-11-22 [1] CRAN (R 3.5.3)
memoise 1.1.0 2017-04-21 [1] CRAN (R 3.5.3)
munsell 0.5.0 2018-06-12 [1] CRAN (R 3.5.3)
pillar 1.3.1 2018-12-15 [1] CRAN (R 3.5.3)
pkgbuild 1.0.3 2019-03-20 [1] CRAN (R 3.5.3)
pkgconfig 2.0.2 2018-08-16 [1] CRAN (R 3.5.3)
pkgload 1.0.2 2018-10-29 [1] CRAN (R 3.5.3)
plyr 1.8.4 2016-06-08 [1] CRAN (R 3.5.3)
prettyunits 1.0.2 2015-07-13 [1] CRAN (R 3.5.3)
processx 3.3.1 2019-05-08 [1] CRAN (R 3.5.2)
pryr 0.1.4 2018-02-18 [1] CRAN (R 3.5.3)
ps 1.3.0 2018-12-21 [1] CRAN (R 3.5.3)
purrr 0.3.2 2019-03-15 [1] CRAN (R 3.5.3)
R6 2.4.0 2019-02-14 [1] CRAN (R 3.5.3)
Rcpp 1.0.1 2019-03-17 [1] CRAN (R 3.5.3)
readr 1.3.1 2018-12-21 [1] CRAN (R 3.5.3)
remotes 2.0.4 2019-04-10 [1] CRAN (R 3.5.3)
rlang 0.3.4 2019-04-07 [1] CRAN (R 3.5.3)
rmarkdown 1.12 2019-03-14 [1] CRAN (R 3.5.3)
rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.5.3)
rstudioapi 0.10 2019-03-19 [1] CRAN (R 3.5.3)
rvest 0.3.3 2019-04-11 [1] CRAN (R 3.5.3)
scales 1.0.0 2018-08-09 [1] CRAN (R 3.5.3)
sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.5.3)
stringi 1.4.3 2019-03-12 [1] CRAN (R 3.5.3)
stringr 1.4.0 2019-02-10 [1] CRAN (R 3.5.3)
tibble 2.1.1 2019-03-16 [1] CRAN (R 3.5.3)
tidyselect 0.2.5 2018-10-11 [1] CRAN (R 3.5.3)
usethis 1.5.0 2019-04-07 [1] CRAN (R 3.5.3)
utf8 1.1.4 2018-05-24 [1] CRAN (R 3.5.3)
viridisLite 0.3.0 2018-02-01 [1] CRAN (R 3.5.3)
webshot 0.5.1 2018-09-28 [1] CRAN (R 3.5.3)
withr 2.1.2 2018-03-15 [1] CRAN (R 3.5.3)
xfun 0.7 2019-05-14 [1] CRAN (R 3.5.3)
xml2 1.2.0 2018-01-24 [1] CRAN (R 3.5.3)
yaml 2.2.0 2018-07-25 [1] CRAN (R 3.5.2)
[1] C:/Users/an499583/Documents/R/win-library/3.5
[2] C:/Program Files/R/R-3.5.2/library
Report rendered by an499583 at 2019-05-21, 10:24 -0700 in 29 seconds.