In this exercise, we will work with a dataset of beer reviews from RateBeer (you can download it here), and perform some exploratory data analysis.

Let’s start by exploring the data and familiarizing ourselves with its structure:

  1. How many variables are in the data?

  2. How many rows?
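One possible way to answer these first two questions is sketched below with pandas. The small `toy` table is invented purely for illustration and only borrows a few column names from the exercise; with the real data you would load the downloaded file instead (for example with `pd.read_csv`, assuming it is a CSV file, which is not stated here).

```python
import pandas as pd

# Toy stand-in for the real dataset: the values are invented for illustration.
toy = pd.DataFrame({
    "beer_style": ["IPA", "Stout", "IPA"],
    "beer_ABV": ["6.5", "_", "7.0"],
    "review_time": [1157587200, 1157241600, 958694400],
})

# shape is a (rows, columns) tuple.
n_rows, n_cols = toy.shape
print(f"variables: {n_cols}, rows: {n_rows}")

# dtypes shows how each variable was read in; an "object" dtype on a column
# that should be numeric is a hint that something needs cleaning.
print(toy.dtypes)
```

Looking at `dtypes` early is a design choice rather than a requirement: it tends to surface the issues that the next set of questions asks about.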

Now, let’s focus on some specific variables:

  1. Explore the values of beer_style. Do you see anything strange or unusual?

  2. Explore the beer_ABV variable. Do you see anything unusual?

  3. What does the “_” value mean in the beer_ABV variable?

  4. Explore the rating (review_XXX) variables. Are you able to compute the mean? If not, what is the issue?

  5. Explore the variable review_time. What do you notice? How can you convert it to a date format?

  6. Plot the (ordered) distribution of reviews by beer_style. Which styles are the most and least popular?

  7. Plot the distribution of beer_ABV for all beers. Does it look as you would expect? Are there any outliers? Which beer has the highest ABV?

  8. Plot the distribution of average beer_ABV by beer_style. Which styles have the highest and lowest average ABV?
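The cleaning and summarizing steps behind several of the questions above could look roughly like the following sketch. It runs on invented toy rows and rests on two assumptions that you should check against the real data: that the “_” entries in beer_ABV are placeholders for missing values, and that review_time stores Unix timestamps in seconds. The `review_date` column name is our own choice.

```python
import pandas as pd

# Toy rows invented for illustration; the real file will differ.
toy = pd.DataFrame({
    "beer_style": ["IPA", "Stout", "IPA", "Pilsener"],
    "beer_ABV": ["6.5", "_", "7.5", "4.8"],
    "review_time": [1157587200, 1157241600, 958694400, 1157587200],
})

# Assumption: "_" marks a missing ABV. errors="coerce" turns any
# non-numeric entry into NaN instead of raising an error.
toy["beer_ABV"] = pd.to_numeric(toy["beer_ABV"], errors="coerce")

# Assumption: review_time is a Unix timestamp in seconds.
toy["review_date"] = pd.to_datetime(toy["review_time"], unit="s")

# Number of reviews per style, most reviewed first.
style_counts = toy["beer_style"].value_counts()

# Mean ABV per style (NaN values are skipped), strongest first.
abv_by_style = (
    toy.groupby("beer_style")["beer_ABV"].mean().sort_values(ascending=False)
)

print(style_counts)
print(abv_by_style)
```

If matplotlib is installed, `style_counts.plot(kind="bar")` and `toy["beer_ABV"].plot(kind="hist")` are one way to turn these summaries into the requested plots.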

Next, let’s look at ratings:

  1. Plot the average review_overall by beer_style. Which styles are the highest- and lowest-rated?

  2. Plot the average review_taste by beer_style. Which styles are the highest- and lowest-rated?

  3. Plot the distribution of average review_overall by beer_brewerId. Do you see anything unusual? Among breweries with at least 10 reviews, which are the most and least liked?
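For the brewery question, one approach is to compute the mean and the number of reviews per brewery together, then filter on the count before ranking. The sketch below assumes review_overall has already been converted to a numeric column; the toy values and the `MIN_REVIEWS` name are ours, and the threshold is lowered from 10 to 2 only because the toy table is tiny.

```python
import pandas as pd

# Toy rows invented for illustration; review_overall is assumed numeric here.
toy = pd.DataFrame({
    "beer_brewerId": [1, 1, 1, 2, 2, 3],
    "review_overall": [4.0, 3.0, 5.0, 2.0, 3.0, 5.0],
})

MIN_REVIEWS = 2  # the exercise asks for 10; lowered for the toy data

# Mean rating and number of reviews per brewery in one pass.
by_brewer = toy.groupby("beer_brewerId")["review_overall"].agg(["mean", "count"])

# Keep only breweries with enough reviews, then rank by mean rating.
eligible = by_brewer[by_brewer["count"] >= MIN_REVIEWS].sort_values(
    "mean", ascending=False
)

most_liked = eligible.index[0]
least_liked = eligible.index[-1]
print(eligible)
```

Filtering on the review count matters because a brewery with a single review can sit at either extreme of the distribution by chance; in the toy data, brewery 3 has the highest mean but is excluded for that reason.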

Finally, let’s look at the relationship between beer styles and reviews:

  1. Do breweries with few beer styles or those with many tend to have higher review_overall?

  2. Suppose you were tasked with helping a brewery decide which and how many beers to produce. How can you use this data to answer these questions? You can utilize some of the analyses you performed above and also add new ones (for example, we did not examine time trends). Prepare a brief presentation that discusses the analyses supporting your decisions.
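As a possible starting point for the first question in this part, the sketch below builds one row per brewery with its number of distinct styles and its mean rating, then computes their correlation. The toy rows and the `n_styles`, `mean_overall`, and `n_reviews` column names are invented for illustration, and a single correlation coefficient is only one of several reasonable summaries.

```python
import pandas as pd

# Toy rows invented for illustration; review_overall is assumed numeric here.
toy = pd.DataFrame({
    "beer_brewerId": [1, 1, 1, 2, 2, 3],
    "beer_style": ["IPA", "Stout", "Porter", "IPA", "IPA", "Stout"],
    "review_overall": [4.0, 5.0, 4.5, 3.0, 3.5, 2.0],
})

# One row per brewery: distinct styles, mean rating, and review count.
per_brewer = toy.groupby("beer_brewerId").agg(
    n_styles=("beer_style", "nunique"),
    mean_overall=("review_overall", "mean"),
    n_reviews=("review_overall", "count"),
)

# Pearson correlation between variety and average rating.
corr = per_brewer["n_styles"].corr(per_brewer["mean_overall"])
print(per_brewer)
print(f"correlation: {corr:.2f}")
```

A scatter plot of `n_styles` against `mean_overall` is usually more informative than the coefficient alone, and `n_reviews` can be used to drop breweries with too few reviews before drawing conclusions.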