August 20, 2019

What is StackOverflow?



  • Programming Q/A site launched September 15, 2008, created by Jeff Atwood and Joel Spolsky
  • Built as an open alternative to Q/A tech sites such as Experts Exchange
  • Rigorous, engaged community with appointed and elected moderators
  • Gamified with reputation points and milestone badges

StackExchange Network

  • Its popularity made StackOverflow the flagship site of the StackExchange (SE) network

Core of StackOverflow

  • Users
    • Askers: Usually one-off newcomers who ask few questions
    • Answerers: Usually long-term, advanced members
    • Moderators: Appointed and elected members who manage the site
  • Posts
    • Immediate help to original posters (OPs)
    • Future help to greater community
  • Tags
    • Usually languages (C#, Java, Python, R)
    • Frameworks, tools, and modules (e.g., pandas, dplyr)

R Tag

Best Practices - Askers

Best Practices - Answerers

  • Set aside time to search for questions that show a good attempt and a reproducible example
  • Work within the tags that match your skill set:
    • [r] [dataframe], [r] [plot], [r] [shiny]
  • Provide detailed explanation and in-line code comments
  • Demonstrate code with data output and/or graph
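
As a hypothetical illustration of the last two bullets, an answer might pair commented code with the result it prints. The `scores` data frame and the choice of `aggregate()` are assumptions made for this sketch; the values are borrowed from the sample rows used elsewhere in these slides.

```r
# Small sample in the shape an asker might post (hypothetical object name)
scores <- data.frame(
  group = c("stata", "python", "python", "sas", "r", "r"),
  num   = c(-0.01933983, -0.97016057, 0.01481491,
            -1.23408058, 0.54730127, -1.16625133),
  stringsAsFactors = FALSE
)

# Mean of num within each group; the formula reads "num by group"
group_means <- aggregate(num ~ group, data = scores, FUN = mean)

# Print the result so readers can compare it against their own run
group_means
```

Posting the printed table alongside the code lets the asker confirm they get the same answer before adapting it to their real data.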

Quick Data Build

### READ TABLE FORMATTED DATA
txt <- '   group int         num char  bool       date
1  stata  14 -0.01933983  FcC FALSE "1992-06-13"
2 python  15 -0.97016057  5V9  TRUE "1993-03-11"
3 python   8  0.01481491  llY FALSE "2017-01-28"
4    sas   5 -1.23408058  IXI FALSE "1985-09-03"
5      r  10  0.54730127  IIQ  TRUE "2015-12-16"
6      r   3 -1.16625133  05x  TRUE "1990-07-18"'

df <- read.table(text=txt, header=TRUE)
df
##    group int         num char  bool       date
## 1  stata  14 -0.01933983  FcC FALSE 1992-06-13
## 2 python  15 -0.97016057  5V9  TRUE 1993-03-11
## 3 python   8  0.01481491  llY FALSE 2017-01-28
## 4    sas   5 -1.23408058  IXI FALSE 1985-09-03
## 5      r  10  0.54730127  IIQ  TRUE 2015-12-16
## 6      r   3 -1.16625133  05x  TRUE 1990-07-18
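
One detail worth checking after a quick build like this: `read.table()` brings quoted dates in as text, not as the `Date` class. The sketch below is a shortened variant of the same input, with the hypothetical names `txt_small` and `df_small`, and shows one way to inspect and convert the column.

```r
# Shortened copy of the table above (hypothetical object names)
txt_small <- '   group int         num       date
1  stata  14 -0.01933983 "1992-06-13"
2 python  15 -0.97016057 "1993-03-11"
3      r  10  0.54730127 "2015-12-16"'

# The header has one fewer field than the rows, so the first
# column is used as row names
df_small <- read.table(text = txt_small, header = TRUE,
                       stringsAsFactors = FALSE)

# Quoted dates arrive as character; convert explicitly
df_small$date <- as.Date(df_small$date)

# Inspect the resulting column types
str(df_small)
```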

Data.Frame Example

set.seed(8202019)
alpha <- c(LETTERS, letters, 0:9)
data_tools <- c("sas", "stata", "spss", "python", "r", "julia")

random_df <- data.frame(
  group = factor(sample(data_tools, 500, replace=TRUE)),
  int = sample(1:15, 500, replace=TRUE),
  num = rnorm(500),
  char = replicate(500, paste(sample(alpha, 3, replace=TRUE), collapse="")),
  bool = sample(c(TRUE, FALSE), 500, replace=TRUE),
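  # Fixed end date (the date of these slides) keeps the draw reproducible;
  # Sys.Date() would change the sampling range on every new day
  date = as.Date(sample(1:as.integer(as.Date("2019-08-20")), 500, replace=TRUE), origin="1970-01-01"),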
  stringsAsFactors = FALSE
)
head(random_df)
##    group int        num char  bool       date
## 1  stata  14 -0.5096000  Bl8 FALSE 2019-05-11
## 2      r  11  0.4103947  EIf FALSE 1985-03-06
## 3 python  12 -1.9084805  6t9  TRUE 2015-01-19
## 4  julia   1 -0.5869093  hnu FALSE 2000-07-14
## 5   spss   5  0.3613189  XBY FALSE 1976-12-19
## 6      r   9  0.8342512  iay  TRUE 2014-06-08

Dput Example

### ASSIGN OBJECT TO RESULT OF dput(head(random_df))

reproduced_df <- structure(list(group = structure(c(6L, 2L, 2L, 4L, 3L, 3L), .Label = c("julia", 
"python", "r", "sas", "spss", "stata"), class = "factor"), int = c(14L, 
15L, 8L, 5L, 10L, 3L), num = c(-0.019339832897539, -0.970160572336964, 
0.0148149050692396, -1.23408057869592, 0.547301270279682, -1.16625132915773
), char = c("FcC", "5V9", "llY", "IXI", "IIQ", "05x"), bool = c(FALSE, 
TRUE, FALSE, FALSE, TRUE, TRUE), date = structure(c(8199, 8470, 
17194, 5724, 16785, 7503), class = "Date")), row.names = c(NA, 
6L), class = "data.frame")

reproduced_df
##    group int         num char  bool       date
## 1  stata  14 -0.01933983  FcC FALSE 1992-06-13
## 2 python  15 -0.97016057  5V9  TRUE 1993-03-11
## 3 python   8  0.01481491  llY FALSE 2017-01-28
## 4    sas   5 -1.23408058  IXI FALSE 1985-09-03
## 5      r  10  0.54730127  IIQ  TRUE 2015-12-16
## 6      r   3 -1.16625133  05x  TRUE 1990-07-18
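
A quick way to check that `dput()` output really rebuilds the same object is a round trip: capture the text, evaluate it, and compare. This is a minimal sketch on a small hypothetical data frame (`original`), not the deck's `random_df`.

```r
# Small hypothetical data frame with factor, integer, and Date columns
original <- data.frame(
  group = factor(c("stata", "python", "r")),
  int   = c(14L, 15L, 10L),
  date  = as.Date(c("1992-06-13", "1993-03-11", "2015-12-16")),
  stringsAsFactors = FALSE
)

# Capture what dput() would print, as a single string
dput_text <- paste(capture.output(dput(original)), collapse = "\n")

# Evaluate that string, as a reader pasting it into a console would
rebuilt <- eval(parse(text = dput_text))

# Compare values and attributes of the two objects
all.equal(original, rebuilt)
```

Because the pasted `structure(...)` call carries classes and factor levels with it, askers who share `dput()` output give answerers the exact column types they are working with.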

Challenges

  • Askers
    • Not very invested in the site as a curated repository
    • Often do not care about craft or learning process
    • Passive group who comes and goes
  • Answerers
    • Some over-help for reputation points while ignoring best practices
    • Some over-suggest packages/algorithms (e.g., tidyverse)
    • Some act as bullies and intimidate newcomers

StackOverflow API Data Analytics Workshop

  • Break up into groups of ~5 people
  • Launch R notebooks (RStudio or Jupyter) on a computer
  • Answer guided questions on Posts, Users, and Tags, or find other insights
  • Take turns typing the table/plot solutions
  • Submit notebook (PR or email) for presentation
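
As a possible starting point for the notebooks, the sketch below builds a request URL for the public Stack Exchange API. The helper name `build_questions_url`, the `/2.2/questions` endpoint, and the query parameters are assumptions chosen for illustration and are not part of the workshop materials; the fetch step is left commented out because it needs network access and the `jsonlite` package.

```r
# Hypothetical helper: builds a Stack Exchange API URL for questions
# carrying a given tag, sorted by votes
build_questions_url <- function(tag = "r", pagesize = 10,
                                site = "stackoverflow") {
  params <- c(order = "desc", sort = "votes", tagged = tag,
              pagesize = pagesize, site = site)
  # Percent-encode each value, then join as key=value pairs
  encoded <- vapply(params, utils::URLencode, character(1),
                    reserved = TRUE)
  query <- paste0(names(params), "=", encoded, collapse = "&")
  paste0("https://api.stackexchange.com/2.2/questions?", query)
}

url <- build_questions_url(tag = "r", pagesize = 10)
url

# Assumed fetch step (requires network access and jsonlite):
# posts <- jsonlite::fromJSON(url)$items
```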