Data Wrangling UN Votes

In this problem set, we will use data from the unvotes package. It contains the voting history of all countries in the UN General Assembly.

Here is a description of the data set:

variable	description
rcid	The roll call id; An identifier for each vote; used to join with un_votes and un_roll_call_issues
country	Country name, by official English short name
country_code	2-character ISO country code
vote	Vote result as a factor of yes/abstain/no
session	Session number. The UN holds one session per year; these started in 1946
importantvote	Whether the vote was classified as important by the U.S. State Department report “Voting Practices in the United Nations”. These classifications began with session 39
date	Date of the vote, as a Date vector
unres	Resolution code
amend	Whether the vote was on an amendment; coded only until 1985
para	Whether the vote was only on a paragraph and not a resolution; coded only until 1985
short	Short description
descr	Longer description
short_name	Two-letter issue codes
issue	Descriptive issue name, 6 issues

This is panel data (also called time-series cross-sectional data or longitudinal data). That is, we observe our observational units, in this case countries, over time.

For non-political scientists, here is a bit of a longer description of the data:

The UN General Assembly is the main deliberative, policy-making, and representative institution of the UN. Since 1946, the countries of the UN (almost every country in the world) have met every year in so-called “sessions”, which take a couple of months, to vote on recommendations on peace, economic development, disarmament, human rights, etc. Each country has one vote and can vote “yes” or “no”, or it can “abstain” (each vote is a roll call; see rcid). In each year (session-year), multiple issues are voted on, on several occasions (see the variable date). The broader issue category is measured by the variable issue.

Source: tidytuesday.

Loading the Data

In a first step, we import the data. Thanks to the rio package, this is super easy. You don’t have to be afraid of missing anything by sticking to this convenience wrapper. It’s just data importing; you don’t need to memorize five or more packages when you can learn one in the beginning. I also use this package all the time.

Our data is stored as a .parquet file and is not exactly small, with almost 1 million observations. As a .csv it would have taken up 200 MB, while the .parquet file is under 1 MB. The power of compression.

unvotes <- rio::import(here("Data", "unvotes.parquet"))
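
To see how the import()/export() pair picks the file format from the extension, here is a minimal sketch. The toy data frame and the temporary file are made up for illustration, and a .csv is used so that no Parquet backend is assumed:

```r
library(rio)

# Hypothetical toy data, not the UN votes
toy <- data.frame(rcid = c(6L, 8L), vote = c("abstain", "yes"))

# export() and import() choose the format from the file extension
tmp <- tempfile(fileext = ".csv")
export(toy, tmp)
toy_back <- import(tmp)

toy_back
```

The same two calls should work for other extensions that rio supports, as long as the packages it relies on for that format are installed.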

A note on the “here” package:

We use R projects to organize our analysis. This is nice, as clicking on the .rproj file initializes an R project, which sets the working directory to the project root (the folder where the .rproj file lives). We can then use relative file paths to point R to the files we want to import/use in our script.

However, when you work across different operating systems (e.g., Windows and Mac), hand-written relative file paths can break, for example because of different path separators. Moreover, .Rmd files such as this one won’t work with file paths relative to the project root when you knit the document (via the blue button above). This is because knitr, by default, sets the knitting directory to the folder where the .Rmd file is placed. You can reset this default via:

knitr::opts_knit$set(root.dir = normalizePath('../')) # for windows.

Because of all this, the here package exists. It makes relative file paths just work.

So instead of passing a relative file path to the file importing function of your choice, you wrap it in the here() function. Inside the function (starting from the project root, see example above), you simply pass the folder (and sub-folders, if they exist) and the file name as strings separated by commas. Easy and robust!
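
A small sketch of the difference, using the folder and file name from the example above. What here() treats as the project root depends on where you run this, so the printed absolute path will differ between machines:

```r
library(here)

# here() starts at the project root and joins the pieces into one path
path_here <- here("Data", "unvotes.parquet")

# A plain relative path depends on the current working (or knitting) directory
path_rel <- file.path("Data", "unvotes.parquet")

path_here
path_rel
```

Both paths end in the same folder and file name; only here() anchors them at the project root.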

Task 1

Familiarize yourself with the data. What’s each variable’s type/class?

str(unvotes)

## 'data.frame':    857878 obs. of  14 variables:
##  $ rcid         : int  6 6 6 6 6 6 6 6 6 6 ...
##  $ country      : chr  "United States" "Canada" "Cuba" "Dominican Republic" ...
##  $ country_code : chr  "US" "CA" "CU" "DO" ...
##  $ vote         : chr  "no" "no" "yes" "abstain" ...
##  $ session      : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ importantvote: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ date         : Date, format: "1946-01-04" "1946-01-04" ...
##  $ unres        : chr  "R/1/107" "R/1/107" "R/1/107" "R/1/107" ...
##  $ amend        : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ para         : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ short        : chr  "DECLARATION OF HUMAN RIGHTS" "DECLARATION OF HUMAN RIGHTS" "DECLARATION OF HUMAN RIGHTS" "DECLARATION OF HUMAN RIGHTS" ...
##  $ descr        : chr  "TO ADOPT A CUBAN PROPOSAL (A/3-C) THAT AN ITEM ON A DECLARATION OF THE RIGHTS AND DUTIES OF MAN BE TABLED." "TO ADOPT A CUBAN PROPOSAL (A/3-C) THAT AN ITEM ON A DECLARATION OF THE RIGHTS AND DUTIES OF MAN BE TABLED." "TO ADOPT A CUBAN PROPOSAL (A/3-C) THAT AN ITEM ON A DECLARATION OF THE RIGHTS AND DUTIES OF MAN BE TABLED." "TO ADOPT A CUBAN PROPOSAL (A/3-C) THAT AN ITEM ON A DECLARATION OF THE RIGHTS AND DUTIES OF MAN BE TABLED." ...
##  $ short_name   : chr  "hr" "hr" "hr" "hr" ...
##  $ issue        : chr  "Human rights" "Human rights" "Human rights" "Human rights" ...

glimpse(unvotes)

## Rows: 857,878
## Columns: 14
## $ rcid          <int> 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,~
## $ country       <chr> "United States", "Canada", "Cuba", "Dominican Republic",~
## $ country_code  <chr> "US", "CA", "CU", "DO", "MX", "GT", "HN", "SV", "NI", "P~
## $ vote          <chr> "no", "no", "yes", "abstain", "yes", "no", "yes", "absta~
## $ session       <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,~
## $ importantvote <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ date          <date> 1946-01-04, 1946-01-04, 1946-01-04, 1946-01-04, 1946-01~
## $ unres         <chr> "R/1/107", "R/1/107", "R/1/107", "R/1/107", "R/1/107", "~
## $ amend         <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ para          <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ short         <chr> "DECLARATION OF HUMAN RIGHTS", "DECLARATION OF HUMAN RIG~
## $ descr         <chr> "TO ADOPT A CUBAN PROPOSAL (A/3-C) THAT AN ITEM ON A DEC~
## $ short_name    <chr> "hr", "hr", "hr", "hr", "hr", "hr", "hr", "hr", "hr", "h~
## $ issue         <chr> "Human rights", "Human rights", "Human rights", "Human r~

What is the level of observation of this data set?

Your Answer:

Is this data set tidy?

Your Answer:
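
For the level-of-observation question above, one way to probe it is to check which combination of variables identifies each row uniquely. A sketch on made-up data (the toy tibble is an assumption for illustration, not the UN votes):

```r
library(dplyr)

toy <- tibble(
  rcid    = c(1, 1, 1, 1),
  country = c("A", "A", "B", "B"),
  issue   = c("x", "y", "x", "y")
)

# rcid and country alone do not identify a row: each pair occurs twice
dupes_two <- toy %>% count(rcid, country) %>% filter(n > 1)
nrow(dupes_two)

# Adding issue does: no combination occurs more than once
dupes_three <- toy %>% count(rcid, country, issue) %>% filter(n > 1)
nrow(dupes_three)
```

Running the same check on unvotes, with the variables you suspect, tells you whether your answer holds.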

Task 2

a) Pipes

I generate some random data:

# generate 10000 random observations drawn from a normal distribution (mean = 0, sd = 1).
x <- rnorm(10000)

Re-write the following code with a pipe:

# 1: 

# Kernel density plot made via Base R's plotting functions for your understanding.
# Base R's plotting functions are pretty mediocre, though. We will learn a better way soon: ggplot2.
plot(density(x))

# 2:

round(log(sqrt(abs(exp(x)))))

# 3:

# This time, pipe the 1 (the digits argument) instead of x.

round(x, digits = 1)

Your Answer:

# 1: 

x %>% 
  density() %>% 
  plot()

# 2:

x %>% 
  exp() %>% 
  abs() %>% 
  sqrt() %>% 
  log() %>% 
  round() %>% 
  head()

## [1] 0 1 0 1 0 0

# 3: 

1 %>% 
  round(x, digits = .) %>% 
  head()

## [1] -0.6  2.5 -0.8  1.9 -0.8 -0.3
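
The two pipe rules used above can be checked on a fixed vector. This is a minimal sketch with made-up values in place of the random draws:

```r
library(magrittr)

v <- c(-0.64, 2.46, 0.05)   # fixed toy values instead of random draws

# By default, the left-hand side becomes the first argument
nested <- round(abs(v), digits = 1)
piped  <- v %>% abs() %>% round(digits = 1)

# With the dot placeholder, the left-hand side goes wherever the dot is
dotted <- 1 %>% round(v, digits = .)

identical(nested, piped)
identical(dotted, round(v, digits = 1))
```

Both comparisons return TRUE: the pipe only changes how the call is written, not what is computed.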

b) filter()

Filter the unvotes data frame such that you obtain observations only from the US. Bind the resulting object to a new name, unvotes_us. (This is the accurate way of phrasing assignment; for simplicity, you can also read it as “create a new object named unvotes_us”.) Do this with AND without pipes!

Also, make the data.frame a tibble and print it to the console.

Note: If you need help, consult the slides or the help files via ?FUNCTIONNAME. Check if you understood data masking.

unvotes_us <- unvotes %>% 
  filter(country == "United States") %>% 
  as_tibble()

unvotes_us <- as_tibble(filter(unvotes, country == "United States"))

unvotes_us

## # A tibble: 5,718 x 14
##     rcid country country_code vote  session importantvote date       unres amend
##    <int> <chr>   <chr>        <chr>   <int>         <int> <date>     <chr> <int>
##  1     6 United~ US           no          1             0 1946-01-04 R/1/~     0
##  2     8 United~ US           no          1             0 1946-01-05 R/1/~     1
##  3    11 United~ US           yes         1             0 1946-02-05 R/1/~     0
##  4    11 United~ US           yes         1             0 1946-02-05 R/1/~     0
##  5    18 United~ US           no          1             0 1946-02-03 R/1/~     1
##  6    19 United~ US           yes         1             0 1946-02-03 R/1/~     0
##  7    24 United~ US           yes         1             0 1946-12-05 R/1/~     0
##  8    26 United~ US           no          1             0 1946-12-06 R/1/~     0
##  9    27 United~ US           yes         1             0 1946-12-06 R/1/~     0
## 10    28 United~ US           yes         1             0 1946-12-06 R/1/~     0
## # ... with 5,708 more rows, and 5 more variables: para <int>, short <chr>,
## #   descr <chr>, short_name <chr>, issue <chr>
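
Data masking means that, inside filter(), a bare column name is looked up in the data frame first. A minimal sketch on a made-up tibble, compared with the base R version that has to spell out the data frame:

```r
library(dplyr)

toy <- tibble(
  country = c("United States", "Cuba", "United States"),
  vote    = c("no", "yes", "yes")
)

# Data masking: inside filter(), `country` refers to the column toy$country
masked <- toy %>% filter(country == "United States")

# Base R needs the data frame named explicitly
base <- toy[toy$country == "United States", ]

identical(masked$vote, base$vote)
```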

c) summarise()

Using unvotes_us, “collapse” the data to a tibble showing the number of “yes” votes (pooled over all years). For a hint, scroll to the bottom of this document.

unvotes_us %>% 
  filter(vote == "yes") %>% 
  summarise(n_yes = n())

## # A tibble: 1 x 1
##   n_yes
##   <int>
## 1  1184
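
There is more than one way to count rows that meet a condition. A sketch on toy data (the values are made up) comparing the filter() plus n() route with summing a logical vector:

```r
library(dplyr)

toy <- tibble(vote = c("yes", "no", "yes", "abstain"))

# Variant 1: keep only the "yes" rows, then count rows with n()
v1 <- toy %>% filter(vote == "yes") %>% summarise(n_yes = n())

# Variant 2: sum a logical vector (TRUE counts as 1)
v2 <- toy %>% summarise(n_yes = sum(vote == "yes"))

v1$n_yes
v2$n_yes
```

The second variant keeps all rows available, which is handy when several counts are needed in the same summarise() call.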

d) mutate()

“Overwrite” unvotes_us and create a new variable year holding the year of the roll call.

unvotes_us <- unvotes_us %>% 
  mutate(year = lubridate::year(date))
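
What year() does to a Date vector can be seen on two hand-picked dates (the second date is made up for illustration). The base R line is an alternative that avoids lubridate:

```r
library(lubridate)

d <- as.Date(c("1946-01-04", "1985-12-17"))

# year() pulls the calendar year out of a Date as a number
year(d)

# Base R equivalent without lubridate
as.integer(format(d, "%Y"))
```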

e) select() and arrange()

Select the year, the rcid, the descr variable, and the issue variable and sort the tibble by issue.

unvotes_us %>% 
  select(year, rcid, descr, issue) %>% 
  arrange(issue)

## # A tibble: 5,718 x 4
##     year  rcid descr                                          issue             
##    <dbl> <int> <chr>                                          <chr>             
##  1  1948    85 TO ADOPT USSR DRAFT RESOL. (A/C.1/310) POSTPO~ Arms control and ~
##  2  1948    94 TO ADOPT PARAGRAPH 1 OF USSR DRAFT RESOL. (A/~ Arms control and ~
##  3  1948    95 TO ADOPT PARAGRAPH 2 OF USSR DRAFT RESOL. (A/~ Arms control and ~
##  4  1948    96 TO ADOPT PARAGRAPH 3 OF USSR DRAFT RESOL. (A/~ Arms control and ~
##  5  1948    97 TO ADOPT PARAGRAPH 4 OF USSR DRAFT RESOL. (A/~ Arms control and ~
##  6  1948    98 TO ADOPT PARAGRAPH 5 OF THE USSR DRAFT RESOL.~ Arms control and ~
##  7  1948    99 TO ADOPT PARAGRAPH 6 OF USSR DRAFT RESOL. (A/~ Arms control and ~
##  8  1948   100 TO ADOPT PARAGRAPH 7 OF THE USSR DRAFT RESOL.~ Arms control and ~
##  9  1948   101 TO ADOPT PARAGRAPH 8 OF THE USSR DRAFT RESOL.~ Arms control and ~
## 10  1949   198 TO ADOPT PARAGRAPH 1 OF THE USSR DRAFT RESOLU~ Arms control and ~
## # ... with 5,708 more rows

Select everything but the year variable (Hint: read the select help file; there is a very short way to do this!).

unvotes_us %>% 
  select(-year)

## # A tibble: 5,718 x 14
##     rcid country country_code vote  session importantvote date       unres amend
##    <int> <chr>   <chr>        <chr>   <int>         <int> <date>     <chr> <int>
##  1     6 United~ US           no          1             0 1946-01-04 R/1/~     0
##  2     8 United~ US           no          1             0 1946-01-05 R/1/~     1
##  3    11 United~ US           yes         1             0 1946-02-05 R/1/~     0
##  4    11 United~ US           yes         1             0 1946-02-05 R/1/~     0
##  5    18 United~ US           no          1             0 1946-02-03 R/1/~     1
##  6    19 United~ US           yes         1             0 1946-02-03 R/1/~     0
##  7    24 United~ US           yes         1             0 1946-12-05 R/1/~     0
##  8    26 United~ US           no          1             0 1946-12-06 R/1/~     0
##  9    27 United~ US           yes         1             0 1946-12-06 R/1/~     0
## 10    28 United~ US           yes         1             0 1946-12-06 R/1/~     0
## # ... with 5,708 more rows, and 5 more variables: para <int>, short <chr>,
## #   descr <chr>, short_name <chr>, issue <chr>

unvotes_us %>% 
  select(!year)

## # A tibble: 5,718 x 14
##     rcid country country_code vote  session importantvote date       unres amend
##    <int> <chr>   <chr>        <chr>   <int>         <int> <date>     <chr> <int>
##  1     6 United~ US           no          1             0 1946-01-04 R/1/~     0
##  2     8 United~ US           no          1             0 1946-01-05 R/1/~     1
##  3    11 United~ US           yes         1             0 1946-02-05 R/1/~     0
##  4    11 United~ US           yes         1             0 1946-02-05 R/1/~     0
##  5    18 United~ US           no          1             0 1946-02-03 R/1/~     1
##  6    19 United~ US           yes         1             0 1946-02-03 R/1/~     0
##  7    24 United~ US           yes         1             0 1946-12-05 R/1/~     0
##  8    26 United~ US           no          1             0 1946-12-06 R/1/~     0
##  9    27 United~ US           yes         1             0 1946-12-06 R/1/~     0
## 10    28 United~ US           yes         1             0 1946-12-06 R/1/~     0
## # ... with 5,708 more rows, and 5 more variables: para <int>, short <chr>,
## #   descr <chr>, short_name <chr>, issue <chr>

f) group_by()

Take the entire data frame and create a new variable that holds, for each country, the number of yes votes. Tip: to check if this worked fine, arrange() by country.

unvotes <- unvotes %>% 
  group_by(country) %>% 
  mutate(yes_votes = sum(vote == "yes", na.rm = TRUE)) %>% 
  arrange(country)
unvotes

## # A tibble: 857,878 x 15
## # Groups:   country [200]
##     rcid country country_code vote  session importantvote date       unres amend
##    <int> <chr>   <chr>        <chr>   <int>         <int> <date>     <chr> <int>
##  1    24 Afghan~ AF           yes         1             0 1946-12-05 R/1/~     0
##  2    35 Afghan~ AF           yes         1             0 1946-12-07 R/1/~     0
##  3    36 Afghan~ AF           abst~       1             0 1946-12-07 R/1/~     0
##  4    37 Afghan~ AF           abst~       1             0 1946-12-07 R/1/~     1
##  5    37 Afghan~ AF           abst~       1             0 1946-12-07 R/1/~     1
##  6    38 Afghan~ AF           abst~       1             0 1946-12-07 R/1/~     1
##  7    39 Afghan~ AF           abst~       1             0 1946-12-07 R/1/~     0
##  8    41 Afghan~ AF           no          1             0 1946-12-01 R/1/~     1
##  9    41 Afghan~ AF           no          1             0 1946-12-01 R/1/~     1
## 10    42 Afghan~ AF           yes         1             0 1946-12-01 R/1/~     0
## # ... with 857,868 more rows, and 6 more variables: para <int>, short <chr>,
## #   descr <chr>, short_name <chr>, issue <chr>, yes_votes <int>
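
Note the “Groups: country” line in the output: a tibble stays grouped after group_by() and mutate(), and later verbs respect that grouping. A minimal sketch on made-up data of what this means and how ungroup() undoes it:

```r
library(dplyr)

toy <- tibble(
  country = c("A", "A", "B"),
  vote    = c("yes", "no", "yes")
)

# mutate() after group_by() computes per group and keeps the grouping
grouped <- toy %>%
  group_by(country) %>%
  mutate(yes_votes = sum(vote == "yes"))

is_grouped_df(grouped)        # TRUE: still grouped by country
grouped %>% distinct(vote)    # country is kept: one row per country-vote pair

# ungroup() drops the grouping, so later verbs see the whole table again
flat <- grouped %>% ungroup()
is_grouped_df(flat)           # FALSE
flat %>% distinct(vote)       # one row per distinct vote
```

If a later pipeline should work on the whole table, calling ungroup() at its start is a cheap safeguard.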

Task 3

a)

With

datasummary_skim(unvotes$issue, type = "categorical")

data	N	%
Arms control and disarmament	170497	19.9
Colonialism	129708	15.1
Economic development	108759	12.7
Human rights	156623	18.3
Nuclear weapons and nuclear material	133635	15.6
Palestinian conflict	158656	18.5

you get a quick summary for the categorical issue variable.

However, as each rcid occurs multiple times for each country, the issues are, of course, also repeatedly present in our data.

What we want is a sorted table for the distribution of our issue variable over all roll calls.

Create such a table/tibble/data frame using only tidyverse verbs/functions (scroll to the bottom of this document if you need a hint).

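# Caution: after Task 2f, unvotes is still grouped by country. distinct() and count()
# respect that grouping, so the outputs below are computed per country and roll call
# rather than per roll call (note the country column in the first output and the
# counts equal to the raw row counts in the other two).
# Call ungroup() before distinct() to get the distribution over roll calls.

# There are several different ways we can achieve this: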

# 1. E.g. using count() which is a shorthand for group_by(x) %>% summarise(n = n())

# count() returns a tibble of the form

# >        x        n
# >    <chr>    <int>
# >  1     A       35

# So we can just add a column to this computing the relative frequency:
unvotes %>%
  distinct(rcid, issue) %>%
  count(issue) %>%
  mutate(percent = round(100 * n / sum(n), 1)) %>% # sum() of the n vector/variable
  arrange(desc(percent))

## # A tibble: 1,195 x 4
## # Groups:   country [200]
##    country           issue                            n percent
##    <chr>             <chr>                        <int>   <dbl>
##  1 Zanzibar          Economic development             1   100  
##  2 Taiwan            Colonialism                    260    40.1
##  3 South Sudan       Human rights                   117    26.8
##  4 St. Kitts & Nevis Arms control and disarmament   544    26.5
##  5 Tuvalu            Human rights                   333    25.7
##  6 Turkmenistan      Arms control and disarmament   441    25.4
##  7 Palau             Human rights                   388    24.9
##  8 Nauru             Human rights                   298    24.8
##  9 Tajikistan        Arms control and disarmament   530    24.3
## 10 Kiribati          Arms control and disarmament   139    24.2
## # ... with 1,185 more rows

# Another way:

unvotes %>%
  distinct(rcid, issue) %>%
  group_by(issue) %>%
  summarise(n = n()) %>% 
  mutate(percent = round(100 * n / sum(n), 1)) %>%
  arrange(desc(percent))

## # A tibble: 6 x 3
##   issue                                     n percent
##   <chr>                                 <int>   <dbl>
## 1 Arms control and disarmament         170497    19.9
## 2 Palestinian conflict                 158656    18.5
## 3 Human rights                         156623    18.3
## 4 Nuclear weapons and nuclear material 133635    15.6
## 5 Colonialism                          129708    15.1
## 6 Economic development                 108759    12.7

# Yet another way

unvotes %>%
  distinct(rcid, issue) %>%
  group_by(issue) %>%
  transmute(n = n()) %>%
  unique() %>%
  ungroup() %>%
  mutate(percent = round(100 * n / sum(n), 1)) %>%
  arrange(desc(percent))

## # A tibble: 6 x 3
##   issue                                     n percent
##   <chr>                                 <int>   <dbl>
## 1 Arms control and disarmament         170497    19.9
## 2 Palestinian conflict                 158656    18.5
## 3 Human rights                         156623    18.3
## 4 Nuclear weapons and nuclear material 133635    15.6
## 5 Colonialism                          129708    15.1
## 6 Economic development                 108759    12.7
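
The difference between counting rows and counting roll calls is easiest to see on a tiny example. This sketch uses made-up data with three roll calls and two countries:

```r
library(dplyr)

# Three roll calls, each recorded once per country (hypothetical values)
toy <- tibble(
  rcid    = c(1, 1, 2, 2, 3, 3),
  country = c("A", "B", "A", "B", "A", "B"),
  issue   = c("Human rights", "Human rights",
              "Human rights", "Human rights",
              "Colonialism", "Colonialism")
)

# Counting rows counts country-votes: 4 and 2
raw <- toy %>% count(issue)

# Reduce to one row per roll call first, then count: 2 and 1
per_rc <- toy %>%
  distinct(rcid, issue) %>%
  count(issue) %>%
  mutate(percent = round(100 * n / sum(n), 1)) %>%
  arrange(desc(percent))

raw
per_rc
```

Because toy is not grouped, distinct() here really reduces the data to one row per roll call and issue.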

b)

Which issue category has the highest share of important votes?

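# Note: unvotes is still grouped by country (Task 2f), so distinct() keeps one row per
# country, roll call, and issue. Add ungroup() before distinct() to count each roll call once.
unvotes %>%
  distinct(rcid, issue, .keep_all = TRUE) %>%
  group_by(issue) %>%
  summarise(
    votes = n(),
    imp_votes = sum(importantvote, na.rm = TRUE),
    share_imp = sum(importantvote, na.rm = TRUE) / n()
  ) %>%
  arrange(desc(share_imp))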

## # A tibble: 6 x 4
##   issue                                 votes imp_votes share_imp
##   <chr>                                 <int>     <int>     <dbl>
## 1 Human rights                         156623     28844    0.184 
## 2 Economic development                 108759     11309    0.104 
## 3 Palestinian conflict                 158656     14274    0.0900
## 4 Nuclear weapons and nuclear material 133635      7269    0.0544
## 5 Arms control and disarmament         170497      8107    0.0475
## 6 Colonialism                          129708      5443    0.0420

Task 4

a)

Add variables that show, for each country, the number and share of “yes”, “no”, and “abstain” votes, pooled over all years. Additionally, output a tibble/data frame with one row for each country and these new variables.

# First, add the variables (this is a bit cumbersome; we will learn case_when in the next session).
# With no missing votes, mean(vote == "yes") gives the same share as sum(vote == "yes") / n().

unvotes <- unvotes %>%
  group_by(country) %>%
  mutate(yes_votes = sum(vote == "yes"),
         no_votes = sum(vote == "no"),
         abstain_votes = sum(vote == "abstain"),
         pct_yes = sum(vote == "yes", na.rm = TRUE) / n(),
         pct_no = sum(vote == "no", na.rm = TRUE) / n(),
         pct_abs = sum(vote == "abstain", na.rm = TRUE) / n())

# Next, compute a summary table with one row per country

sum_tab_a <- unvotes %>%
  distinct(country, .keep_all = TRUE) %>%
  select(country, yes_votes, no_votes, abstain_votes, pct_yes, pct_no, pct_abs)
sum_tab_a

## # A tibble: 200 x 7
## # Groups:   country [200]
##    country           yes_votes no_votes abstain_votes pct_yes  pct_no pct_abs
##    <chr>                 <int>    <int>         <int>   <dbl>   <dbl>   <dbl>
##  1 Afghanistan            4815      185           289   0.910 0.0350   0.0546
##  2 Albania                2959      599           648   0.704 0.142    0.154 
##  3 Algeria                4854       89           347   0.918 0.0168   0.0656
##  4 Andorra                1602      351           579   0.633 0.139    0.229 
##  5 Angola                 3685       36           242   0.930 0.00908  0.0611
##  6 Antigua & Barbuda      3305       20           265   0.921 0.00557  0.0738
##  7 Argentina              4536      143          1007   0.798 0.0251   0.177 
##  8 Armenia                1886       69           638   0.727 0.0266   0.246 
##  9 Australia              2950     1187          1600   0.514 0.207    0.279 
## 10 Austria                3447      454          1613   0.625 0.0823   0.293 
## # ... with 190 more rows

# OR

sum_tab_a_1 <- unvotes %>%
  group_by(country) %>%
  summarise(
    pct_yes = sum(vote == "yes", na.rm = TRUE) / n(),
    pct_no = sum(vote == "no", na.rm = TRUE) / n(),
    pct_abs = sum(vote == "abstain", na.rm = TRUE) / n()
  )
sum_tab_a_1

## # A tibble: 200 x 4
##    country           pct_yes  pct_no pct_abs
##    <chr>               <dbl>   <dbl>   <dbl>
##  1 Afghanistan         0.910 0.0350   0.0546
##  2 Albania             0.704 0.142    0.154 
##  3 Algeria             0.918 0.0168   0.0656
##  4 Andorra             0.633 0.139    0.229 
##  5 Angola              0.930 0.00908  0.0611
##  6 Antigua & Barbuda   0.921 0.00557  0.0738
##  7 Argentina           0.798 0.0251   0.177 
##  8 Armenia             0.727 0.0266   0.246 
##  9 Australia           0.514 0.207    0.279 
## 10 Austria             0.625 0.0823   0.293 
## # ... with 190 more rows

# OR, easier to read for single countries, but in "long" rather than "wide" format:

sum_tab_a_2 <- unvotes %>%
  count(country, vote) %>% 
  group_by(country) %>% 
  mutate(percent = n / sum(n))
  
sum_tab_a_2

## # A tibble: 598 x 4
## # Groups:   country [200]
##    country     vote        n percent
##    <chr>       <chr>   <int>   <dbl>
##  1 Afghanistan abstain   289  0.0546
##  2 Afghanistan no        185  0.0350
##  3 Afghanistan yes      4815  0.910 
##  4 Albania     abstain   648  0.154 
##  5 Albania     no        599  0.142 
##  6 Albania     yes      2959  0.704 
##  7 Algeria     abstain   347  0.0656
##  8 Algeria     no         89  0.0168
##  9 Algeria     yes      4854  0.918 
## 10 Andorra     abstain   579  0.229 
## # ... with 588 more rows
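
If the long table is easier to compute but the wide one is easier to read, tidyr::pivot_wider() can convert between them. A sketch on made-up data; the toy tibble and the pct_ column prefix are assumptions for illustration:

```r
library(dplyr)
library(tidyr)

toy <- tibble(
  country = c("A", "A", "A", "B", "B"),
  vote    = c("yes", "yes", "no", "yes", "abstain")
)

# Long format: one row per country and vote type
long <- toy %>%
  count(country, vote) %>%
  group_by(country) %>%
  mutate(percent = n / sum(n)) %>%
  ungroup()

# Wide format: one row per country, one column per vote type;
# combinations that never occur are filled with 0
wide <- long %>%
  select(-n) %>%
  pivot_wider(names_from = vote, values_from = percent,
              names_prefix = "pct_", values_fill = 0)

wide
```

The values_fill argument matters here: country A never abstains, so without it that cell would be NA.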

b)

Calculate, for each country and issue, the number and share of “yes” votes, but only for “important votes” and for the permanent members of the UN Security Council. The output should have 30 rows.

str(unvotes$importantvote)

##  int [1:857878] 0 0 0 0 0 0 0 0 0 0 ...

perm_members <- c("United States", "Russia", "France", "United Kingdom", "China")

imp_votes_tab <- unvotes %>%
  filter(importantvote == 1, country %in% perm_members) %>%
  group_by(country, issue) %>%
  summarise(
    n_votes = n(),  # number of important votes; sum(vote == "yes") would count the "yes" votes
    pct_yes = sum(vote == "yes", na.rm = TRUE) / n()
  )
imp_votes_tab

## # A tibble: 30 x 4
## # Groups:   country [5]
##    country issue                                n_votes pct_yes
##    <chr>   <chr>                                  <int>   <dbl>
##  1 China   Arms control and disarmament              45   0.622
##  2 China   Colonialism                               31   0.903
##  3 China   Economic development                      67   0.925
##  4 China   Human rights                             167   0.461
##  5 China   Nuclear weapons and nuclear material      40   0.575
##  6 China   Palestinian conflict                      87   0.989
##  7 France  Arms control and disarmament              48   0.729
##  8 France  Colonialism                               33   0.242
##  9 France  Economic development                      67   0.701
## 10 France  Human rights                             171   0.637
## # ... with 20 more rows

c)

Get the years with the highest and lowest share of “yes” votes for each country (the solution below restricts this to the permanent members of the Security Council).

max_min_tab <- unvotes %>%
  filter(country %in% perm_members) %>%
  group_by(year = lubridate::year(date), country) %>% # lubridate is not loaded by default
  summarise(
    n_votes = n(),
    pct_yes = sum(vote == "yes", na.rm = TRUE) / n()
  ) %>%
  ungroup() %>%
  group_by(country) %>%
  filter(pct_yes == max(pct_yes) | pct_yes == min(pct_yes)) %>%
  arrange(country, desc(pct_yes))
max_min_tab

## # A tibble: 11 x 4
## # Groups:   country [5]
##     year country        n_votes pct_yes
##    <dbl> <chr>            <int>   <dbl>
##  1  1979 China               92  0.967 
##  2  1971 China               33  0.636 
##  3  1951 France               2  1     
##  4  1968 France              26  0.0769
##  5  1989 Russia             125  0.992 
##  6  1951 Russia               2  0     
##  7  1951 United Kingdom       2  1     
##  8  1952 United Kingdom      29  0.172 
##  9  1951 United States        2  1     
## 10  1956 United States       11  1     
## 11  2004 United States       91  0.0659
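
An alternative to filtering on max() and min() is dplyr's slice_max() and slice_min(), which also work per group. A sketch on made-up yearly shares; as in the output above, tied years are all kept:

```r
library(dplyr)

toy <- tibble(
  country = c("A", "A", "A", "B", "B"),
  year    = c(1950, 1951, 1952, 1950, 1951),
  pct_yes = c(0.2, 1, 1, 0.5, 0.7)
)

# Per country: the year(s) with the highest share; with_ties = TRUE is the default
hi <- toy %>% group_by(country) %>% slice_max(pct_yes, n = 1)

# Per country: the year(s) with the lowest share
lo <- toy %>% group_by(country) %>% slice_min(pct_yes, n = 1)

bind_rows(hi, lo) %>% arrange(country, desc(pct_yes))
```

Country A has two years tied at the top, so it contributes three rows in total, just as the United States does in the table above.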

Hints:

Task 2c: You also need filter() and n() for this.

Task 3a: distinct() may be helpful here!

Task 3b: Check out the arguments of distinct()!

Session 3: Problem Set

Data Wrangling I

Solution

18 July 2021