# install.packages(c("httr2", "jsonlite", "usethis"))
Use APIs to source your data in R
Welcome (5min)
- TU Delft R Cafe is an initiative supported by Open Science Community Delft (OSCD)
- We’re organizing a Plot-a-thon (11/10); Next session 17th November
- Ice breaking
What is an API? (10min)
An API (Application Programming Interface) is used for computer programs to communicate with each other.
In the session today, we will focus on the APIs that allow us to upload, download and search for data. In this context, an API is an intermediary between a dataset (usually a very large one) and the rest of the world. APIs provide an accessible way to request a dataset, which is referred to as making a call to the API.
A commonly used type of API, which we will also use today, is a REST (Representational State Transfer) API (also called RESTful). This kind of API uses the HTTP protocol to send a request to a server and receive a standardized response.
REST API methods
There are five HTTP methods that you will commonly use when making an API request:
- GET - This method is used to retrieve data from a database / server.
- POST - This method is used to create a new record.
- PUT - This method is used to modify / replace the record. It replaces the entire record.
- PATCH - This method is used to modify / update the record. It replaces parts of the record.
- DELETE - This method is used to delete the record.
Request structure
Apart from the HTTP methods, you need a few other components to make the API request. The components are:
- HTTP method - to explain what action you want to perform
- endpoint - a URL to find the resource you are trying to reach on the Internet. The endpoint consists of a base URL (or root endpoint), the consistent part of the URL, and a relative URL, a reference to the specific resource you want to access.
- headers - provide information relevant to both the client (us) and the server. They can be used, for example, for authentication or to provide information about the body content. See the full list of valid HTTP headers.
- body - contains data that you want to send to the server.
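As a rough sketch of how these components map onto httr2 calls, the snippet below assembles a request without sending it. The base URL `https://example.org/api/` and the relative URL `items/42` are placeholders invented for illustration, not a real service.

```r
library(httr2)

# Hypothetical endpoint: base URL + relative URL (placeholders, not a real API)
base_url <- "https://example.org/api/"
relative_url <- "items/42"

req <- request(paste0(base_url, relative_url)) |>
  req_method("GET") |>                       # HTTP method
  req_headers(Accept = "application/json")   # header: the response format we would like

# Nothing has been sent yet; the request object only stores the pieces
req$url
req$method
```

A body would be attached in the same pipe, for example with req_body_json() when creating a record with POST.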
HTTP status codes
Once you send the request to the server, you will receive a response with a status code. Here are some responses that you might see:
Status Code | Description |
200 OK | Request has succeeded |
201 Created | Request has succeeded and a new resource has been created as a result |
400 Bad Request | Request could not be understood due to incorrect syntax |
403 Forbidden | Client does not have access rights to the content |
404 Not Found | Server can not find the requested resource |
500 Internal server error | Server encountered an unexpected condition that prevented it from fulfilling the request |
Any status code in the 200s means the request was successful (although this doesn’t necessarily mean it did what you wanted it to do). The 400s mean we did something wrong. The 500s mean something is likely wrong on the other end. We might see 401, which means we either aren’t authorised to access what we are trying to access, or our authentication step went wrong. A 404 means the resource we are looking for was not found (just like for websites).
Please see the full list of HTTP response status codes.
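The grouping of status codes by their first digit can be sketched in a few lines of base R. The helper name `status_class` is made up for this illustration and is not part of any package.

```r
# Hypothetical helper: classify an HTTP status code by its range
status_class <- function(code) {
  if (code >= 200 && code < 300) {
    "success"
  } else if (code >= 400 && code < 500) {
    "client error"   # we did something wrong
  } else if (code >= 500 && code < 600) {
    "server error"   # something is likely wrong on the other end
  } else {
    "other"
  }
}

status_class(200)
status_class(404)
status_class(500)
```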
The APIs that we’re going to deal with today are public APIs: anyone can access them. However, you usually need to authenticate yourself to be able to use them (especially if you’re using a method that alters the database). There are a few ways to do that, but today we will discuss (and use) authentication using a personal access token.
Remember, it’s a secret key that you never want to share with the world.
Never type your token in the console, because it will be saved in the .Rhistory file. Also don’t include it in an R script (that you may share, accidentally or otherwise).
A good way to store your personal access token is to include it in the .Renviron file and then use it by calling Sys.getenv('<ACCESS_TOKEN_NAME>').
For more details, you can have a look at this discussion on storing personal tokens.
Use APIs in R (20min)
First, install and load the {httr2} package.
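A minimal version of this step might look as follows, assuming the package is installed from CRAN.

```r
# install.packages("httr2")  # run once if the package is not installed yet
library(httr2)
```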
Find your target
Our target will be Zenodo. Each API may have a different way of accessing the data, so you will have to read the documentation.
There are a couple of things you can do with the Zenodo API.
- Records
- create
- modify
- delete
- search
- specific record
- query records
- List
- user records
Some APIs require authentication to access the data. For Zenodo, we are required to create an API token. You can do this under Applications > Personal access tokens > +New token
Call it zenodo_api (or whatever you like).
Make sure you copy the access token. You will not be able to access it again once you navigate away from the page.
When you have copied the token, you will need to store it on your computer so that you can access it from R. Your access token is personal, so it’s important to make sure you don’t accidentally publish it.
Some useful ways to store the token are described in this blog post.
We will store it in .Renviron. In the console, type usethis::edit_r_environ(). In the file that opens, add the line ZENODO_TOKEN=<your-access-token> (the same name we pass to Sys.getenv() below), then save the file.
Now restart R.
We can retrieve (and store) the token using Sys.getenv():
token <- Sys.getenv("ZENODO_TOKEN")
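Sys.getenv() returns an empty string when a variable is not set, so a misspelled name fails silently. One way to catch this early is a small wrapper; `get_token` and the variable `DEMO_TOKEN` are hypothetical names used only for this sketch.

```r
# Hypothetical helper: read a token from an environment variable
# and stop with a clear message if it is missing
get_token <- function(var = "ZENODO_TOKEN") {
  value <- Sys.getenv(var)
  if (!nzchar(value)) {
    stop("Environment variable ", var, " is not set. Add it to .Renviron and restart R.")
  }
  value
}

# Demonstration with a throwaway variable, so no real token is involved
Sys.setenv(DEMO_TOKEN = "not-a-real-token")
get_token("DEMO_TOKEN")
```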
Creating an API request
To construct an API call, we first need the base URL. You can find this in the API documentation for whatever site you are using. The base URL for Zenodo is https://zenodo.org/api/. The base URL remains the same for all calls to the Zenodo API, so we can save it as a variable.
zenodo_url <- "https://zenodo.org/api/"
Accessing a Zenodo record
To access a specific record on Zenodo, we need the path to the resource we want to access. We will access a specific repository using the ‘records’ endpoint, records/:id, where :id is replaced with the record number.
The record we will access is here: https://zenodo.org/record/8376658
path <- "records/8376658"
Now we have the path, we need to add our authorisation, the request method, and a header with some additional information. We can do this by combining httr2 functions into a pipe. This will be the request we send to the API endpoint.
If we are retrieving something from the site, the method we use is ‘GET’. If we are creating something, like a new record, we would use ‘POST’, and if we are making modifications, we use ‘PATCH’. Since we are retrieving a record, we’ll be using ‘GET’.
req <- request(paste0(zenodo_url, path))
resp <- req |>
  req_auth_bearer_token(token) |> # authentication
  req_method("GET") |> # request method
  req_headers("Accept" = "application/json") |> # ask for a JSON response
  req_dry_run() # preview the request without sending it
GET /api/records/8376658 HTTP/1.1
Host: zenodo.org
User-Agent: httr2/0.2.3 r-curl/5.0.2 libcurl/7.84.0
Accept-Encoding: deflate, gzip
Authorization: <REDACTED>
Accept: application/json
Our call has been constructed. We can use req_dry_run() to see what httr2 will send with the request. When we like what we see, we can use req_perform() instead of req_dry_run().
req <- request(paste0(zenodo_url, path))
resp <- req |>
  req_auth_bearer_token(token) |> # authentication
  req_method("GET") |> # request method
  req_headers("Accept" = "application/json") |> # ask for a JSON response
  req_perform() # send the request
If successful, we want to see status_code: 200. Let’s see…
status_code: 200 ✅
You can find all the error messages and their meanings in the documentation.
The interesting part of our resp is body. But right now it’s unintelligible. We can extract the content using resp_body_json() (because we used ‘application/json’).
resp_content <- resp_body_json(resp)
Now we have stored information about the Zenodo record as a list in resp_content. For example, we can take a look at the metadata:
resp_content$metadata
[1] "open"
[1] "oscd"
[1] "Delft University of Technology, Open Science Programme"
[1] "Michiel de Jong"
[1] "Delft University of Technology, Open Science Programme"
[1] "Marcell Várkonyi"
[1] "Delft University of Technology, Open Science Programme"
[1] "Tanya Yankelevich"
[1] "Delft University of Technology, Open Science Programme"
[1] "Frederique Belliard"
[1] "Delft University of Technology, Open Science Programme"
[1] "Just de Leeuwe"
[1] "Delft University of Technology, Open Science Programme"
[1] "Meta Keijzer- de Ruijter"
[1] "Delft University of Technology, Open Science Programme"
[1] "Julie Beardsell"
[1] "Delft University of Technology, Open Science Programme"
[1] "Santosh Ilamparuthi"
[1] "Delft University of Technology, Open Science Programme"
[1] "Jerry de Vos"
[1] "Delft University of Technology, Open Science Programme"
[1] "Ymke Bresser"
[1] "Janey Roanna company (graphic designer)"
[1] "Janey de Jong"
[1] "4TU.ResearchData"
[1] "Alessandra Soro"
[1] "Delft University of Technology"
[1] "Francesca Morselli"
[1] "<p>These posters about the TU Delft Open Science Programme, Open Science Community Delft and 4TU.ResearchData describe/provide links to support [services] available for researchers at TU Delft to make it easier/possible to practice Open Science.</p>"
[1] "10.5281/zenodo.8376658"
[1] "Open Science, FAIR Data, FAIR Software, Citizen Science, Open Hardware, Open Education"
[1] "eng"
[1] "CC-BY-4.0"
[1] "10.5281/zenodo.8376658"
[1] 8376658
[1] "2023-09-25"
[1] "10.5281/zenodo.8376657"
[1] "isVersionOf"
[1] "doi"
[1] "OPEN and FAIR community event_posters"
[1] "poster"
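Because the parsed response is a nested list, individual values are reached by chaining `$` and `[[ ]]`. The sketch below uses a hand-made miniature of such a list rather than real API output; the field names (`metadata`, `title`, `creators`, `name`, `affiliation`) are assumptions chosen to mirror the printout above.

```r
# Hand-made miniature of a parsed record (not real API output)
record <- list(
  metadata = list(
    title = "OPEN and FAIR community event_posters",
    creators = list(
      list(name = "Michiel de Jong",
           affiliation = "Delft University of Technology, Open Science Programme"),
      list(name = "Alessandra Soro",
           affiliation = "4TU.ResearchData")
    )
  )
)

# A single field: chain $ through the levels
record$metadata$title

# One value per creator: loop over the list and pull out the same field each time
creator_names <- vapply(record$metadata$creators, function(x) x$name, character(1))
creator_names
```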
You may want to convert some elements of the response into a data frame instead of a list:
json_response <- resp_body_string(resp)
resp_df <- jsonlite::fromJSON(json_response, flatten = TRUE)
resp_df
[1] "10.5281/zenodo.8376657"
[1] "8376657"
[1] "2023-09-25T13:07:27.192656+00:00"
[1] "10.5281/zenodo.8376658"
[1] "https://doi.org/10.5281/zenodo.8376658"
checksum filename
1 80e921ca6d054de0d0047cdd8a433f81 4openscience_flyer.pdf
2 8f66f79c0aead2d274a8460698188ea7 4TU.ResearchData_poster.pdf
3 2b3acbb5f4e31803f9283db8efedad05 Citizen Science_poster.pdf
4 897eb34b6906291ddde91bb86502b9a3 Fair data & Fair software_poster.pdf
5 6331ba7687c398eb072517d4f4832698 Open Education_poster.pdf
6 6f955c027b37a7f5150ee70959c9be65 Open Hardware_poster.pdf
7 94eb93d6d24d9add9774d6c6cb118274 Open Publishing_poster.pdf
8 70622447f9ec16f7d11bb428fdae7246 OSCD Board_poster.pdf
filesize id
1 720199 2d23bb2d-2156-4137-ae36-d1a9d407b36c
2 1655267 09654bb7-c282-4356-ab35-f15974d3d7e9
3 1355262 51224840-0df9-422c-a972-97564a52b6e3
4 813944 42197cb3-500a-4f1b-9b5a-feca4828ae4b
5 890643 853bd4c5-5aa0-4228-a19e-8825fe94727d
6 770902 5492f57d-7874-4043-904d-d9454e1fc792
7 882710 d44c43a2-5781-472d-98a7-61022179f4bf
8 2540194 bdbf9bc7-a2df-4dd8-a57d-a88445ee0bed
1 https://zenodo.org/api/files/f91201b7-43e9-4f36-b229-c7b039bb1842/4openscience_flyer.pdf
2 https://zenodo.org/api/files/f91201b7-43e9-4f36-b229-c7b039bb1842/4TU.ResearchData_poster.pdf
3 https://zenodo.org/api/files/f91201b7-43e9-4f36-b229-c7b039bb1842/Citizen%20Science_poster.pdf
4 https://zenodo.org/api/files/f91201b7-43e9-4f36-b229-c7b039bb1842/Fair%20data%20%26%20Fair%20software_poster.pdf
5 https://zenodo.org/api/files/f91201b7-43e9-4f36-b229-c7b039bb1842/Open%20Education_poster.pdf
6 https://zenodo.org/api/files/f91201b7-43e9-4f36-b229-c7b039bb1842/Open%20Hardware_poster.pdf
7 https://zenodo.org/api/files/f91201b7-43e9-4f36-b229-c7b039bb1842/Open%20Publishing_poster.pdf
8 https://zenodo.org/api/files/f91201b7-43e9-4f36-b229-c7b039bb1842/OSCD%20Board_poster.pdf
1 https://zenodo.org/api/deposit/depositions/8376658/files/2d23bb2d-2156-4137-ae36-d1a9d407b36c
2 https://zenodo.org/api/deposit/depositions/8376658/files/09654bb7-c282-4356-ab35-f15974d3d7e9
3 https://zenodo.org/api/deposit/depositions/8376658/files/51224840-0df9-422c-a972-97564a52b6e3
4 https://zenodo.org/api/deposit/depositions/8376658/files/42197cb3-500a-4f1b-9b5a-feca4828ae4b
5 https://zenodo.org/api/deposit/depositions/8376658/files/853bd4c5-5aa0-4228-a19e-8825fe94727d
6 https://zenodo.org/api/deposit/depositions/8376658/files/5492f57d-7874-4043-904d-d9454e1fc792
7 https://zenodo.org/api/deposit/depositions/8376658/files/d44c43a2-5781-472d-98a7-61022179f4bf
8 https://zenodo.org/api/deposit/depositions/8376658/files/bdbf9bc7-a2df-4dd8-a57d-a88445ee0bed
[1] 8376658
[1] "https://zenodo.org/badge/doi/10.5281/zenodo.8376658.svg"
[1] "https://zenodo.org/api/files/f91201b7-43e9-4f36-b229-c7b039bb1842"
[1] "https://zenodo.org/badge/doi/10.5281/zenodo.8376657.svg"
[1] "https://doi.org/10.5281/zenodo.8376657"
[1] "https://doi.org/10.5281/zenodo.8376658"
[1] "https://zenodo.org/record/8376658"
[1] "https://zenodo.org/api/records/8376658"
[1] "https://zenodo.org/record/8376658"
[1] "https://zenodo.org/api/records/8376658"
[1] "open"
1 oscd
1 Delft University of Technology, Open Science Programme
2 Delft University of Technology, Open Science Programme
3 Delft University of Technology, Open Science Programme
4 Delft University of Technology, Open Science Programme
5 Delft University of Technology, Open Science Programme
6 Delft University of Technology, Open Science Programme
7 Delft University of Technology, Open Science Programme
8 Delft University of Technology, Open Science Programme
9 Delft University of Technology, Open Science Programme
10 Delft University of Technology, Open Science Programme
11 Janey Roanna company (graphic designer)
12 4TU.ResearchData
13 Delft University of Technology
1 Michiel de Jong
2 Marcell Várkonyi
3 Tanya Yankelevich
4 Frederique Belliard
5 Just de Leeuwe
6 Meta Keijzer- de Ruijter
7 Julie Beardsell
8 Santosh Ilamparuthi
9 Jerry de Vos
10 Ymke Bresser
11 Janey de Jong
12 Alessandra Soro
13 Francesca Morselli
[1] "<p>These posters about the TU Delft Open Science Programme, Open Science Community Delft and 4TU.ResearchData describe/provide links to support [services] available for researchers at TU Delft to make it easier/possible to practice Open Science.</p>"
[1] "10.5281/zenodo.8376658"
[1] "Open Science, FAIR Data, FAIR Software, Citizen Science, Open Hardware, Open Education"
[1] "eng"
[1] "CC-BY-4.0"
[1] "10.5281/zenodo.8376658"
[1] 8376658
[1] "2023-09-25"
identifier relation scheme
1 10.5281/zenodo.8376657 isVersionOf doi
[1] "OPEN and FAIR community event_posters"
[1] "poster"
[1] "2023-09-26T02:26:58.782942+00:00"
[1] 397445
[1] 8376658
[1] "done"
[1] TRUE
[1] "OPEN and FAIR community event_posters"
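To see what jsonlite::fromJSON() does without calling an API, here is a small sketch on a made-up JSON string. The keys `id`, `files`, `filename` and `filesize` are illustrative and the values are invented.

```r
library(jsonlite)

# Made-up JSON shaped like a record that lists its files (not real API output)
json_text <- '{
  "id": 1,
  "files": [
    {"filename": "a.pdf", "filesize": 100},
    {"filename": "b.pdf", "filesize": 250}
  ]
}'

parsed <- fromJSON(json_text, flatten = TRUE)

# The array of objects has become a data frame with one row per file
files_df <- parsed$files
files_df
sum(files_df$filesize)
```

Elements that are arrays of similar objects are simplified to data frames, which is why the files of a record print as a table while single values stay as plain vectors.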
As long as we are only creating ‘GET’ calls, there’s not much you can mess up. Once you start working with ‘POST’ and ‘PATCH’ calls, it can get hairy because you are actually uploading and modifying content, so make sure you make use of req_dry_run().
We can also download the file.
file_url <- resp_content$files[[7]]$links$download
file_name <- resp_content$files[[7]]$filename
#file_url <- "https://zenodo.org/api/files/2f1c7e4c-71fb-4f71-9ab8-ca876274323c/DelftBicycleDataViewerAndData.zip"
request(file_url) |>
  #req_auth_bearer_token(token) |> # authentication
  req_method("GET") |> # request method
  req_perform(path = file_name) # write the response body to disk
Status: 200 OK
Content-Type: application/octet-stream
Body: On disk 'body'
API query
We can also use the API to search for records. This is no longer possible from the sandbox, so we need to modify the base URL. Then we can add a query to the API call. All parameters can be found in the documentation.
zenodo_url <- "https://zenodo.org/api/"
path <- "records/"
req <- request(paste0(zenodo_url, path))
resp <- req |>
  req_auth_bearer_token(token) |> # authentication
  req_method("GET") |> # request method
  req_headers("Accept" = "application/json") |> # ask for a JSON response
  req_url_query(q = "Delft Bicycle") |> # our query
  req_perform() # send the request
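To check how a query ends up in the URL without contacting the server, the request can be built and inspected on its own. In this sketch the extra parameter `size` is an assumption for illustration; confirm the parameter names against the API documentation before relying on them.

```r
library(httr2)

req_query <- request("https://zenodo.org/api/records/") |>
  req_url_query(q = "Delft Bicycle", size = 5)  # size is an assumed parameter name

# The query parameters are appended to the URL and special characters are encoded
req_query$url
```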
Translate the body of the response and we have the same as before.
resp2_content <- resp |>
  resp_body_json()
resp2_content[[1]]
[1] "610707"
[1] "2015-06-24T15:20:03+00:00"
[1] "10.5281/zenodo.18862"
[1] "https://doi.org/10.5281/zenodo.18862"
[1] "db2307016b7d736bbb7f6bd18835e24e"
[1] "DelftBicycleDataViewerAndData.zip"
[1] 1027881721
[1] "d013f688-745c-47a5-92ac-d156550eade1"
[1] "https://zenodo.org/api/files/13077bb6-e7fd-4b03-acdf-d615a0c1a87f/DelftBicycleDataViewerAndData.zip"
[1] "https://zenodo.org/api/deposit/depositions/18862/files/d013f688-745c-47a5-92ac-d156550eade1"
[1] 18862
[1] "https://zenodo.org/badge/doi/10.5281/zenodo.18862.svg"
[1] "https://zenodo.org/api/files/13077bb6-e7fd-4b03-acdf-d615a0c1a87f"
[1] "https://doi.org/10.5281/zenodo.18862"
[1] "https://zenodo.org/record/18862"
[1] "https://zenodo.org/api/records/18862"
[1] "https://zenodo.org/record/18862"
[1] "https://zenodo.org/api/records/18862"
[1] "open"
[1] "zenodo"
[1] "TU Delft"
[1] "Moore, Jason K."
[1] "TU Delft"
[1] "Koojiman, J. D. G."
[1] "TU Delft"
[1] "Schwab, A. L"
[1] "<p>This is the data collected and analyzed in the following paper:</p>\n\n<p>Kooijman, J. D. G.; Schwab, A. L. & Moore, J. K. Some Observations on Human Control of a Bicycle Proceedings of the ASME 2009 International Design and Engineering Technical Conferences & Computers and Information in Engineering Conference, 2009</p>\n\n<p>It is in the a form to use with this software:</p>\n\n<p>https://github.com/moorepants/DelftBicycleDataViewer</p>"
[1] "10.5281/zenodo.18862"
[1] "bicycle"
[1] "dynamics"
[1] "control"
[1] "video"
[1] "CC0-1.0"
[1] "10.5281/zenodo.18862"
[1] 18862
[1] "2015-06-23"
[1] "Delft Instrumented Bicycle Data and Videos"
[1] "dataset"
[1] "2020-01-24T19:25:38.855324+00:00"
[1] 6017
[1] 18862
[1] "done"
[1] TRUE
[1] "Delft Instrumented Bicycle Data and Videos"
Please note that we’re using Zenodo as an example for demonstration purposes. You can also access Zenodo.org records via the zen4R package.
Closer look at API documentation (15min)
So far, we’ve been pointing you to specific parts of the documentation. But oftentimes, when you want to use an API to retrieve data, you will need to go through the documentation yourself.
4TU repository
Let’s have a look together at the documentation of the 4TU.ResearchData repository, to learn how to find our way around such documentation.
First, you will need to log in to data.4tu.nl. You can use your netID for this. Your landing page will be your Dashboard, where you can upload a dataset or create a new collection. You can also look at your Sessions and API tokens.
Below this table, there is information about the API documentation.
Your turn! (30min)
Now it’s time for you to practice: