Adam Ginensky
January 29, 2020
# use the integers 1 to 100 as example data
x = 1:100
quantile(x) # shows the cut points that divide the data into quartiles
##     0%    25%    50%    75%   100%
##   1.00  25.75  50.50  75.25 100.00
quantile(x, probs = seq(0, 1, .1)) # computes the deciles
##    0%   10%   20%   30%   40%   50%   60%   70%   80%   90%  100%
##   1.0  10.9  20.8  30.7  40.6  50.5  60.4  70.3  80.2  90.1 100.0
quantile(x, probs = seq(0, 1, .25)) # computes the quartiles
##     0%    25%    50%    75%   100%
##   1.00  25.75  50.50  75.25 100.00
The output of the quantile function is a named numeric vector containing all the cut points. Each cut point is labelled with the percentage it corresponds to.
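Because the result is a named vector, a cut point can be looked up by its label as well as by position. The snippet below is a small sketch of that idea; the variable name `q` is only illustrative.

```r
x <- 1:100
q <- quantile(x)      # named numeric vector of quartile cut points

names(q)              # the labels attached to the cut points
q["50%"]              # look up the median by its label (name is kept)
q[["50%"]]            # same value as a bare number
unname(q)             # all cut points with the labels dropped
```

Indexing by label avoids having to remember which position a given percentage occupies in the vector.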
x = c(1:1000)
quantile(x) # displays the quartile cut points
##      0%     25%     50%     75%    100%
##    1.00  250.75  500.50  750.25 1000.00
y = quantile(x,probs = seq(0,1,1/1000)) # divide the data into 1000 bins
y[1] # minimum value
## 0.0%
##    1
y[1000] # start of the last bin
##   99.9%
## 999.001
y[1001] # maximum value
## 100.0%
##   1000
y[501] # the median
## 50.0%
## 500.5
head(names(y))
## [1] "0.0%" "0.1%" "0.2%" "0.3%" "0.4%" "0.5%"
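The value 999.001 for the 99.9% cut point comes from linear interpolation between neighbouring sorted values, which is what `quantile` does under its default `type = 7`. The sketch below reproduces that one number by hand; the names `h`, `lo`, `hi` and `manual` are illustrative and not part of the original code.

```r
x <- 1:1000
p <- 0.999

# Default (type 7) rule: fractional position of the quantile in the sorted data
h  <- (length(x) - 1) * p + 1
lo <- floor(h)
hi <- ceiling(h)

xs <- sort(x)
# Interpolate linearly between the two neighbouring order statistics
manual <- xs[lo] + (h - lo) * (xs[hi] - xs[lo])

manual
unname(quantile(x, probs = p))   # should agree with the manual value
```

Other values of `type` use different interpolation rules, so the cut points can differ slightly from the ones shown here.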
library(dplyr) # we need to load the library
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
v1 = matrix(rnorm(1000000), 1000000) # matrix with 1M rows and 1 column
see1 = ntile(v1, 100) # bin number (1 to 100) of each value
see2 = ntile(v1, 1000) # bin number (1 to 1000) of each value
head(see1) # first 6 bins
## [1] 67 62 63 32 91 51
summary(see1) # as expected
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    1.00   25.75   50.50   50.50   75.25  100.00
head(see2)
## [1] 663 619 630 315 909 505
summary(see2)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##     1.0   250.8   500.5   500.5   750.2  1000.0
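Whereas `quantile` returns the cut points, `ntile` returns a bin number for every observation, with bins assigned by rank so that they hold (as nearly as possible) equal counts. A minimal sketch on a small vector, assuming an arbitrary seed and the illustrative names `v` and `bins`:

```r
library(dplyr)

set.seed(1)              # arbitrary seed, used only for reproducibility
v <- rnorm(100)
bins <- ntile(v, 4)      # bin number (1 to 4) for each element of v

table(bins)              # 100 values in 4 bins: 25 values per bin

# Bins follow the ordering of the data: everything in bin 1
# is no larger than anything in bin 2
max(v[bins == 1]) <= min(v[bins == 2])
```

This is why the summaries of `see1` and `see2` look like summaries of the integers 1 to 100 and 1 to 1000: every bin number occurs equally often.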
pctl = quantile(v1, probs = seq(0,1,.01)) # 101 cut points of v1, the data that ntile bins below
pctl.low = pctl[1:100] # the lower value of all the bins
pctl.high = pctl[2:101] # the upper values
pctl.mean = .5*(pctl.low + pctl.high) # midpoint of each bin
pct.bin = pctl.mean[ntile(v1,100)] # bin midpoint for each value of v1
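The same lookup can be checked on a smaller example. The sketch below uses 1000 random values and 10 bins instead of 100, and computes the cut points from the same vector that `ntile` bins; the names `cuts`, `low`, `high`, `mid`, `bin` and `v.mid` are illustrative.

```r
library(dplyr)

set.seed(42)                                   # arbitrary seed
v <- rnorm(1000)

cuts <- quantile(v, probs = seq(0, 1, 0.1))    # 11 cut points
low  <- cuts[1:10]                             # lower edge of each bin
high <- cuts[2:11]                             # upper edge of each bin
mid  <- 0.5 * (low + high)                     # midpoint of each bin

bin   <- ntile(v, 10)                          # bin number of each value
v.mid <- unname(mid[bin])                      # midpoint of each value's bin

head(data.frame(v, bin, v.mid))

# Each value should lie between the edges of the bin it was assigned to
all(v >= low[bin] & v <= high[bin])
```

Indexing `mid` by the vector of bin numbers is what turns ten midpoints into one midpoint per observation.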