Use APIs to source your data in R

Welcome (5min)

  • TU Delft R Cafe is an initiative supported by Open Science Community Delft (OSCD)
  • We’re organizing a Plot-a-thon (11/10); Next session 17th November
  • Ice breaking

What is an API? (10min)

An API (Application Programming Interface) is used for computer programs to communicate with each other.

In the session today, we will focus on the APIs that allow us to upload, download and search for data. In this context, an API is an intermediary between a dataset (usually a very large one) and the rest of the world. APIs provide an accessible way to request a dataset, which is referred to as making a call to the API.

A commonly used type of API, which we will also use today, is a REST (Representational State Transfer) API (also called RESTful). This kind of API uses the HTTP protocol to send a request to a server and receive a standardized response.

REST API methods

Source: https://www.numpyninja.com/post/rest-api-for-dummies-explained-using-mommies

There are five HTTP methods that you can use when making an API request:

  • GET - This method is used to retrieve data from a database / server.

  • POST - This method is used to create a new record.

  • PUT - This method is used to modify / replace the record. It replaces the entire record.

  • PATCH - This method is used to modify / update the record. It replaces parts of the record.

  • DELETE - This method is used to delete the record.

Request structure

Source: https://www.altexsoft.com/blog/rest-api-design/

Apart from the HTTP methods, you need a few other components to make the API request. The components are:

  • HTTP method - to explain what action you want to perform
  • endpoint - a URL to find the resource you are trying to reach on the Internet. The endpoint consists of a base URL (or root endpoint), the part of the URL that stays the same for every call, and a relative URL, which refers to the specific resource you want to access.
  • headers - provide information relevant to both the client (us) and the server. They can be used, for example, for authentication or to provide information about the body content. See the full list of valid HTTP headers.
  • body - contains data that you want to send to the server.

Passing parameters

  • GET request parameters are usually included in the endpoint URL.
  • PUT and POST methods accept parameters in the request body.
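
As a rough illustration of these two ways of passing parameters, the sketch below builds (but does not send) two httr2 requests. The host example.org, the path and the parameter names are placeholders chosen for this example, not a real API.

```r
library(httr2)

# Placeholder base URL; replace it with the base URL of the API you use.
base_url <- "https://example.org/api"

# GET: parameters become part of the URL as a query string.
get_req <- request(base_url) |>
  req_url_path_append("records") |>
  req_url_query(q = "bicycle", size = 10)

get_req$url

# POST: parameters travel in the request body, here encoded as JSON.
post_req <- request(base_url) |>
  req_url_path_append("records") |>
  req_method("POST") |>
  req_body_json(list(title = "My record"))

post_req$url
post_req$method
```

Printing the two URLs side by side shows that only the GET request carries its parameters in the address.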

HTTP status codes

Once you send the request to the server, you will receive a response with a status code. Here are some responses that you might see:

Status code                 Description
200 OK                      Request has succeeded
201 Created                 Request has succeeded and a new resource has been created as a result
400 Bad Request             Request could not be understood due to incorrect syntax
403 Forbidden               Client does not have access rights to the content
404 Not Found               Server cannot find the requested resource
500 Internal Server Error   Server encountered an unexpected condition that prevented it from fulfilling the request

Any status code in the 200s means the request was successful (although this doesn’t necessarily mean it did what you wanted it to do). The 400s mean we did something wrong. The 500s mean something is likely wrong on the other end. We might see 401, which means we either aren’t authorised to access what we are trying to access, or our authentication step went wrong. A 404 means the resource we are looking for was not found (just like for websites).

Please see the full list of HTTP response status codes.
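
The grouping of status codes described above can be sketched in a few lines of base R. The function name status_class() is made up for this illustration and is not part of any package.

```r
# Hypothetical helper (not part of httr2): map a status code to its class.
status_class <- function(code) {
  if (code >= 200 && code < 300) {
    "success"
  } else if (code >= 400 && code < 500) {
    "client error"
  } else if (code >= 500 && code < 600) {
    "server error"
  } else {
    "other"
  }
}

# Classify the codes listed in the table above.
codes <- c(200, 201, 400, 403, 404, 500)
vapply(codes, status_class, character(1))
```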

Authentication

The APIs that we’re going to deal with today are public APIs: anyone can access them. However, you usually need to authenticate yourself to use them (especially if you’re using a method that alters the database). There are a few ways to do that, but today we will discuss (and use) authentication with a personal access token.

Remember, it’s a secret key that you never want to share with the world.

More Important

Never type your token in the console. It will be saved in the .Rhistory file. Also don’t include it in an R script (that you may share, accidentally or otherwise).

A good way to store your personal access token is to include it in the .Renviron file and then use it by calling Sys.getenv('<ACCESS_TOKEN_NAME>').

For more details, you can have a look at this discussion on storing personal tokens.

Use APIs in R (20min)

Requirements

First, install and load the {httr2} package.

# install.packages(c("httr2", "jsonlite", "usethis"))
library(httr2)

Find your target

Our target will be Zenodo. Each API may have a different way of accessing the data, so you will have to read the documentation.

There are a couple of things you can do with the Zenodo API.

  • Records
    • create
    • modify
    • delete
    • search
      • specific record
      • query records
  • List
    • user records

Authentication

Some APIs require authentication to access the data. For Zenodo, we are required to create an API token. You can do this under Applications > Personal access tokens > +New token

Call it zenodo_api (or whatever you like).

Important

Make sure you copy the access token. You will not be able to access it again once you navigate away from the page.

When you have copied the token, you will need to store it on your computer so that you can access it from R. Your access token is personal, so it’s important to make sure you don’t accidentally publish it.

Some useful ways to store the token are described in this blog post.

We will store it in .Renviron. In the console, type usethis::edit_r_environ(). In the window that pops up, enter ZENODO_API=<your-access-token>.

Now restart R.

We can retrieve (and store) the token using Sys.getenv():

token <- Sys.getenv("ZENODO_API") # same name as the entry in .Renviron
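
Sys.getenv() returns an empty string when a variable is not set, which can lead to confusing authentication errors later on. A minimal sketch of a guard is shown below; the helper get_token() and the variable name DEMO_API_TOKEN are invented for this example.

```r
# Hypothetical helper: read a token from an environment variable and
# stop with a clear message if the variable is not set.
get_token <- function(var_name) {
  token <- Sys.getenv(var_name)
  if (!nzchar(token)) {
    stop("Environment variable ", var_name, " is empty. ",
         "Add it to .Renviron and restart R.", call. = FALSE)
  }
  token
}

# Demonstration only: set a dummy value for a made-up variable name.
Sys.setenv(DEMO_API_TOKEN = "not-a-real-token")
demo_token <- get_token("DEMO_API_TOKEN")
nchar(demo_token)
```

In a real session you would skip the Sys.setenv() line and pass the name you used in .Renviron.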

Creating an API request

To construct an API call, we first need the base URL. You can find this in the API documentation for whatever site you are using. The base URL for Zenodo is https://zenodo.org/api/. The base URL remains the same for all calls to the Zenodo API, so we can save it as a variable.

zenodo_url <- "https://zenodo.org/api/"

Accessing a Zenodo record

To access a specific record on Zenodo, we need the path to the resource we want to access. We will access a specific repository using the ‘records’ endpoint, records/:id, where :id is replaced with the record number.

The record we will access is here: https://zenodo.org/record/8376658

path <- "records/8376658"
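
As an alternative to pasting strings together, httr2 can assemble the URL from its parts. The sketch below uses the same base URL and record number as in this section; the variable name record_id is only for illustration.

```r
library(httr2)

# Base URL and record number as used in this section.
zenodo_url <- "https://zenodo.org/api/"
record_id <- "8376658"

# Append path segments; httr2 takes care of the slashes between them.
req <- request(zenodo_url) |>
  req_url_path_append("records", record_id)

req$url
```

This avoids doubled or missing slashes when the base URL and the path come from different places.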

Now we have the path, we need to add our authorisation, the request method, and a header with some additional information. We can do this by combining httr2 functions into a pipe. This will be the request we send to the API endpoint.

If we are retrieving something from the site, the method we use is ‘GET’. If we are creating something, like a new record, we would use ‘POST’, and if we are making modifications, we use ‘PATCH’. Since we are retrieving a record, we’ll be using ‘GET’.

req <- request(paste0(zenodo_url, path))
resp <- req |>
  req_auth_bearer_token(token) |> # authentication
  req_method("GET") |> # request method
  req_headers("Accept" = "application/json") |>
  req_dry_run()
GET /api/records/8376658 HTTP/1.1
Host: zenodo.org
User-Agent: httr2/0.2.3 r-curl/5.0.2 libcurl/7.84.0
Accept-Encoding: deflate, gzip
Authorization: <REDACTED>
Accept: application/json

Our call has been constructed. We can use req_dry_run() to see what httr2 will send with the request. When we like what we see we can use req_perform() instead of req_dry_run().

req <- request(paste0(zenodo_url, path))
resp <- req |>
  req_auth_bearer_token(token) |> # authentication
  req_method("GET") |> # request method
  req_headers("Accept" = "application/json") |>
  req_perform()

If successful, we want to see status_code: 200. Let’s see…

resp$status_code: 200 ✅

You can find all the error messages and their meanings in the documentation.
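
To get a feel for the status helpers without contacting a server, httr2 lets you create a response object by hand with response(). The sketch below is only a demonstration on artificial responses. Note that req_perform() by default turns 4xx and 5xx responses into R errors, so in practice a failed call usually stops your script.

```r
library(httr2)

# response() creates a response object without contacting a server,
# which is handy for trying out the status helpers.
ok_resp <- response(status_code = 200)
missing_resp <- response(status_code = 404)

resp_status(ok_resp)
resp_is_error(ok_resp)

resp_status(missing_resp)
resp_status_desc(missing_resp)
resp_is_error(missing_resp)
```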

The interesting part of our resp is body. But right now it’s unintelligible. We can extract the content using resp_body_json (because we used ‘application/json’).

resp_content <- resp_body_json(resp)

Now we have stored information about the Zenodo record as a list in resp_content. For example, we can take a look at the metadata:

resp_content$metadata
$access_right
[1] "open"

$communities
$communities[[1]]
$communities[[1]]$identifier
[1] "oscd"



$creators
$creators[[1]]
$creators[[1]]$affiliation
[1] "Delft University of Technology, Open Science Programme"

$creators[[1]]$name
[1] "Michiel de Jong"


$creators[[2]]
$creators[[2]]$affiliation
[1] "Delft University of Technology, Open Science Programme"

$creators[[2]]$name
[1] "Marcell Várkonyi"


$creators[[3]]
$creators[[3]]$affiliation
[1] "Delft University of Technology, Open Science Programme"

$creators[[3]]$name
[1] "Tanya Yankelevich"


$creators[[4]]
$creators[[4]]$affiliation
[1] "Delft University of Technology, Open Science Programme"

$creators[[4]]$name
[1] "Frederique Belliard"


$creators[[5]]
$creators[[5]]$affiliation
[1] "Delft University of Technology, Open Science Programme"

$creators[[5]]$name
[1] "Just de Leeuwe"


$creators[[6]]
$creators[[6]]$affiliation
[1] "Delft University of Technology, Open Science Programme"

$creators[[6]]$name
[1] "Meta Keijzer- de Ruijter"


$creators[[7]]
$creators[[7]]$affiliation
[1] "Delft University of Technology, Open Science Programme"

$creators[[7]]$name
[1] "Julie Beardsell"


$creators[[8]]
$creators[[8]]$affiliation
[1] "Delft University of Technology, Open Science Programme"

$creators[[8]]$name
[1] "Santosh Ilamparuthi"


$creators[[9]]
$creators[[9]]$affiliation
[1] "Delft University of Technology, Open Science Programme"

$creators[[9]]$name
[1] "Jerry de Vos"


$creators[[10]]
$creators[[10]]$affiliation
[1] "Delft University of Technology, Open Science Programme"

$creators[[10]]$name
[1] "Ymke Bresser"


$creators[[11]]
$creators[[11]]$affiliation
[1] "Janey Roanna company (graphic designer)"

$creators[[11]]$name
[1] "Janey de Jong"


$creators[[12]]
$creators[[12]]$affiliation
[1] "4TU.ResearchData"

$creators[[12]]$name
[1] "Alessandra Soro"


$creators[[13]]
$creators[[13]]$affiliation
[1] "Delft University of Technology"

$creators[[13]]$name
[1] "Francesca Morselli"



$description
[1] "<p>These posters about the TU Delft Open Science Programme,&nbsp;Open Science Community Delft and 4TU.ResearchData describe/provide links to support [services] available&nbsp;for researchers at TU Delft to make it easier/possible to practice Open Science.</p>"

$doi
[1] "10.5281/zenodo.8376658"

$keywords
$keywords[[1]]
[1] "Open Science, FAIR Data, FAIR Software, Citizen Science, Open Hardware, Open Education"


$language
[1] "eng"

$license
[1] "CC-BY-4.0"

$prereserve_doi
$prereserve_doi$doi
[1] "10.5281/zenodo.8376658"

$prereserve_doi$recid
[1] 8376658


$publication_date
[1] "2023-09-25"

$related_identifiers
$related_identifiers[[1]]
$related_identifiers[[1]]$identifier
[1] "10.5281/zenodo.8376657"

$related_identifiers[[1]]$relation
[1] "isVersionOf"

$related_identifiers[[1]]$scheme
[1] "doi"



$title
[1] "OPEN and FAIR community event_posters"

$upload_type
[1] "poster"
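
Nested lists like this one are easiest to work with by pulling out one field at a time. The sketch below rebuilds a small part of the metadata by hand (two of the creators from the output above) so that it runs without a call to the API.

```r
# A small hand-built list that mimics part of resp_content$metadata,
# using two of the creators from the output above.
metadata <- list(
  title = "OPEN and FAIR community event_posters",
  creators = list(
    list(
      affiliation = "Delft University of Technology, Open Science Programme",
      name = "Michiel de Jong"
    ),
    list(
      affiliation = "Delft University of Technology, Open Science Programme",
      name = "Tanya Yankelevich"
    )
  )
)

# Pull one field out of every element of the nested list.
creator_names <- vapply(metadata$creators, function(x) x$name, character(1))
creator_names
```

With the real response, the same vapply() call on resp_content$metadata$creators should return all creator names.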

You may want to convert some elements of the response into a data frame instead of a list:

json_response <- resp_body_string(resp)
resp_df <- jsonlite::fromJSON(json_response, flatten = TRUE)
resp_df
$conceptdoi
[1] "10.5281/zenodo.8376657"

$conceptrecid
[1] "8376657"

$created
[1] "2023-09-25T13:07:27.192656+00:00"

$doi
[1] "10.5281/zenodo.8376658"

$doi_url
[1] "https://doi.org/10.5281/zenodo.8376658"

$files
                          checksum                             filename
1 80e921ca6d054de0d0047cdd8a433f81               4openscience_flyer.pdf
2 8f66f79c0aead2d274a8460698188ea7          4TU.ResearchData_poster.pdf
3 2b3acbb5f4e31803f9283db8efedad05           Citizen Science_poster.pdf
4 897eb34b6906291ddde91bb86502b9a3 Fair data & Fair software_poster.pdf
5 6331ba7687c398eb072517d4f4832698            Open Education_poster.pdf
6 6f955c027b37a7f5150ee70959c9be65             Open Hardware_poster.pdf
7 94eb93d6d24d9add9774d6c6cb118274           Open Publishing_poster.pdf
8 70622447f9ec16f7d11bb428fdae7246                OSCD Board_poster.pdf
  filesize                                   id
1   720199 2d23bb2d-2156-4137-ae36-d1a9d407b36c
2  1655267 09654bb7-c282-4356-ab35-f15974d3d7e9
3  1355262 51224840-0df9-422c-a972-97564a52b6e3
4   813944 42197cb3-500a-4f1b-9b5a-feca4828ae4b
5   890643 853bd4c5-5aa0-4228-a19e-8825fe94727d
6   770902 5492f57d-7874-4043-904d-d9454e1fc792
7   882710 d44c43a2-5781-472d-98a7-61022179f4bf
8  2540194 bdbf9bc7-a2df-4dd8-a57d-a88445ee0bed
                                                                                                    links.download
1                         https://zenodo.org/api/files/f91201b7-43e9-4f36-b229-c7b039bb1842/4openscience_flyer.pdf
2                    https://zenodo.org/api/files/f91201b7-43e9-4f36-b229-c7b039bb1842/4TU.ResearchData_poster.pdf
3                   https://zenodo.org/api/files/f91201b7-43e9-4f36-b229-c7b039bb1842/Citizen%20Science_poster.pdf
4 https://zenodo.org/api/files/f91201b7-43e9-4f36-b229-c7b039bb1842/Fair%20data%20%26%20Fair%20software_poster.pdf
5                    https://zenodo.org/api/files/f91201b7-43e9-4f36-b229-c7b039bb1842/Open%20Education_poster.pdf
6                     https://zenodo.org/api/files/f91201b7-43e9-4f36-b229-c7b039bb1842/Open%20Hardware_poster.pdf
7                   https://zenodo.org/api/files/f91201b7-43e9-4f36-b229-c7b039bb1842/Open%20Publishing_poster.pdf
8                        https://zenodo.org/api/files/f91201b7-43e9-4f36-b229-c7b039bb1842/OSCD%20Board_poster.pdf
                                                                                     links.self
1 https://zenodo.org/api/deposit/depositions/8376658/files/2d23bb2d-2156-4137-ae36-d1a9d407b36c
2 https://zenodo.org/api/deposit/depositions/8376658/files/09654bb7-c282-4356-ab35-f15974d3d7e9
3 https://zenodo.org/api/deposit/depositions/8376658/files/51224840-0df9-422c-a972-97564a52b6e3
4 https://zenodo.org/api/deposit/depositions/8376658/files/42197cb3-500a-4f1b-9b5a-feca4828ae4b
5 https://zenodo.org/api/deposit/depositions/8376658/files/853bd4c5-5aa0-4228-a19e-8825fe94727d
6 https://zenodo.org/api/deposit/depositions/8376658/files/5492f57d-7874-4043-904d-d9454e1fc792
7 https://zenodo.org/api/deposit/depositions/8376658/files/d44c43a2-5781-472d-98a7-61022179f4bf
8 https://zenodo.org/api/deposit/depositions/8376658/files/bdbf9bc7-a2df-4dd8-a57d-a88445ee0bed

$id
[1] 8376658

$links
$links$badge
[1] "https://zenodo.org/badge/doi/10.5281/zenodo.8376658.svg"

$links$bucket
[1] "https://zenodo.org/api/files/f91201b7-43e9-4f36-b229-c7b039bb1842"

$links$conceptbadge
[1] "https://zenodo.org/badge/doi/10.5281/zenodo.8376657.svg"

$links$conceptdoi
[1] "https://doi.org/10.5281/zenodo.8376657"

$links$doi
[1] "https://doi.org/10.5281/zenodo.8376658"

$links$html
[1] "https://zenodo.org/record/8376658"

$links$latest
[1] "https://zenodo.org/api/records/8376658"

$links$latest_html
[1] "https://zenodo.org/record/8376658"

$links$self
[1] "https://zenodo.org/api/records/8376658"


$metadata
$metadata$access_right
[1] "open"

$metadata$communities
  identifier
1       oscd

$metadata$creators
                                              affiliation
1  Delft University of Technology, Open Science Programme
2  Delft University of Technology, Open Science Programme
3  Delft University of Technology, Open Science Programme
4  Delft University of Technology, Open Science Programme
5  Delft University of Technology, Open Science Programme
6  Delft University of Technology, Open Science Programme
7  Delft University of Technology, Open Science Programme
8  Delft University of Technology, Open Science Programme
9  Delft University of Technology, Open Science Programme
10 Delft University of Technology, Open Science Programme
11                Janey Roanna company (graphic designer)
12                                       4TU.ResearchData
13                         Delft University of Technology
                       name
1           Michiel de Jong
2          Marcell Várkonyi
3         Tanya Yankelevich
4       Frederique Belliard
5            Just de Leeuwe
6  Meta Keijzer- de Ruijter
7           Julie Beardsell
8       Santosh Ilamparuthi
9              Jerry de Vos
10             Ymke Bresser
11            Janey de Jong
12          Alessandra Soro
13       Francesca Morselli

$metadata$description
[1] "<p>These posters about the TU Delft Open Science Programme,&nbsp;Open Science Community Delft and 4TU.ResearchData describe/provide links to support [services] available&nbsp;for researchers at TU Delft to make it easier/possible to practice Open Science.</p>"

$metadata$doi
[1] "10.5281/zenodo.8376658"

$metadata$keywords
[1] "Open Science, FAIR Data, FAIR Software, Citizen Science, Open Hardware, Open Education"

$metadata$language
[1] "eng"

$metadata$license
[1] "CC-BY-4.0"

$metadata$prereserve_doi
$metadata$prereserve_doi$doi
[1] "10.5281/zenodo.8376658"

$metadata$prereserve_doi$recid
[1] 8376658


$metadata$publication_date
[1] "2023-09-25"

$metadata$related_identifiers
              identifier    relation scheme
1 10.5281/zenodo.8376657 isVersionOf    doi

$metadata$title
[1] "OPEN and FAIR community event_posters"

$metadata$upload_type
[1] "poster"


$modified
[1] "2023-09-26T02:26:58.782942+00:00"

$owner
[1] 397445

$record_id
[1] 8376658

$state
[1] "done"

$submitted
[1] TRUE

$title
[1] "OPEN and FAIR community event_posters"
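
The effect of flatten = TRUE is easier to see on a small example. The JSON below is made up, but it has the same shape as the files element of a record: each file has a nested links object.

```r
library(jsonlite)

# Made-up JSON with the same shape as the "files" part of a record.
json_text <- '{
  "id": 1,
  "files": [
    {"filename": "a.pdf", "links": {"download": "https://example.org/a.pdf"}},
    {"filename": "b.pdf", "links": {"download": "https://example.org/b.pdf"}}
  ]
}'

nested <- fromJSON(json_text)                  # links stays a nested data frame
flat   <- fromJSON(json_text, flatten = TRUE)  # links.download becomes a column

names(nested$files)
names(flat$files)
```

Flattened column names such as links.download are what you see in the files table above.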

As long as we are only creating ‘GET’ calls, there’s not much you can mess up. Once you start working with ‘POST’ and ‘PATCH’ calls, it can get hairy because you are actually uploading and modifying content, so make sure you make use of req_dry_run().

We can also download one of the files attached to the record.

file_url <- resp_content$files[[7]]$links$download
file_name <- resp_content$files[[7]]$filename
#file_url <- "https://zenodo.org/api/files/2f1c7e4c-71fb-4f71-9ab8-ca876274323c/DelftBicycleDataViewerAndData.zip"

request(file_url) |>
  #req_auth_bearer_token(token) |> # authentication
  req_method("GET") |> # request method
  req_perform(path = file_name)
<httr2_response>
GET
https://zenodo.org/api/files/f91201b7-43e9-4f36-b229-c7b039bb1842/Open%20Publishing_poster.pdf
Status: 200 OK
Content-Type: application/octet-stream
Body: On disk 'body'
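
If a response does not include a separate file name, one option is to derive it from the download URL. This is a small base R sketch using the URL from the response above; whether the last part of the URL is a sensible file name depends on the API.

```r
# Download URL taken from the response shown above.
file_url <- "https://zenodo.org/api/files/f91201b7-43e9-4f36-b229-c7b039bb1842/Open%20Publishing_poster.pdf"

# Decode percent-encoded characters (%20 is a space) and keep the last part.
file_name <- basename(utils::URLdecode(file_url))
file_name
```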

API query

We can also use the API to search for records. This is no longer possible from the Zenodo sandbox, so make sure the base URL points to zenodo.org. Then we can add a query to the API call. All parameters can be found in the documentation.

zenodo_url <- "https://zenodo.org/api/"
path <- "records/"

req <- request(paste0(zenodo_url, path))
resp <- req |>
  req_auth_bearer_token(token) |> # authentication
  req_method("GET") |> # request method
  req_headers(
    "Accept" = "application/json") |>
  req_url_query(q = "Delft Bicycle") |> # our query
  req_perform()

Translate the body of the response and we have the same as before.

resp2_content <- resp |>
  resp_body_json()
resp2_content[[1]]
$conceptrecid
[1] "610707"

$created
[1] "2015-06-24T15:20:03+00:00"

$doi
[1] "10.5281/zenodo.18862"

$doi_url
[1] "https://doi.org/10.5281/zenodo.18862"

$files
$files[[1]]
$files[[1]]$checksum
[1] "db2307016b7d736bbb7f6bd18835e24e"

$files[[1]]$filename
[1] "DelftBicycleDataViewerAndData.zip"

$files[[1]]$filesize
[1] 1027881721

$files[[1]]$id
[1] "d013f688-745c-47a5-92ac-d156550eade1"

$files[[1]]$links
$files[[1]]$links$download
[1] "https://zenodo.org/api/files/13077bb6-e7fd-4b03-acdf-d615a0c1a87f/DelftBicycleDataViewerAndData.zip"

$files[[1]]$links$self
[1] "https://zenodo.org/api/deposit/depositions/18862/files/d013f688-745c-47a5-92ac-d156550eade1"




$id
[1] 18862

$links
$links$badge
[1] "https://zenodo.org/badge/doi/10.5281/zenodo.18862.svg"

$links$bucket
[1] "https://zenodo.org/api/files/13077bb6-e7fd-4b03-acdf-d615a0c1a87f"

$links$doi
[1] "https://doi.org/10.5281/zenodo.18862"

$links$html
[1] "https://zenodo.org/record/18862"

$links$latest
[1] "https://zenodo.org/api/records/18862"

$links$latest_html
[1] "https://zenodo.org/record/18862"

$links$self
[1] "https://zenodo.org/api/records/18862"


$metadata
$metadata$access_right
[1] "open"

$metadata$communities
$metadata$communities[[1]]
$metadata$communities[[1]]$identifier
[1] "zenodo"



$metadata$creators
$metadata$creators[[1]]
$metadata$creators[[1]]$affiliation
[1] "TU Delft"

$metadata$creators[[1]]$name
[1] "Moore, Jason K."


$metadata$creators[[2]]
$metadata$creators[[2]]$affiliation
[1] "TU Delft"

$metadata$creators[[2]]$name
[1] "Koojiman, J. D. G."


$metadata$creators[[3]]
$metadata$creators[[3]]$affiliation
[1] "TU Delft"

$metadata$creators[[3]]$name
[1] "Schwab, A. L"



$metadata$description
[1] "<p>This is the data collected and analyzed in the following paper:</p>\n\n<p>Kooijman, J. D. G.; Schwab, A. L. &amp; Moore, J. K. Some Observations on Human Control of a Bicycle Proceedings of the ASME 2009 International Design and Engineering Technical Conferences &amp; Computers and Information in Engineering Conference, 2009</p>\n\n<p>It is in the a form to use with this software:</p>\n\n<p>https://github.com/moorepants/DelftBicycleDataViewer</p>"

$metadata$doi
[1] "10.5281/zenodo.18862"

$metadata$keywords
$metadata$keywords[[1]]
[1] "bicycle"

$metadata$keywords[[2]]
[1] "dynamics"

$metadata$keywords[[3]]
[1] "control"

$metadata$keywords[[4]]
[1] "video"


$metadata$license
[1] "CC0-1.0"

$metadata$prereserve_doi
$metadata$prereserve_doi$doi
[1] "10.5281/zenodo.18862"

$metadata$prereserve_doi$recid
[1] 18862


$metadata$publication_date
[1] "2015-06-23"

$metadata$title
[1] "Delft Instrumented Bicycle Data and Videos"

$metadata$upload_type
[1] "dataset"


$modified
[1] "2020-01-24T19:25:38.855324+00:00"

$owner
[1] 6017

$record_id
[1] 18862

$state
[1] "done"

$submitted
[1] TRUE

$title
[1] "Delft Instrumented Bicycle Data and Videos"

zen4R package

Please note that we’re using Zenodo as an example for demonstration purposes. You can also access Zenodo.org records via the zen4R package.

Closer look at API documentation (15min)

So far, we’ve been pointing you to specific parts of the documentation. But oftentimes, when you want to use an API to retrieve data, you will need to go through the documentation yourself.

4TU repository

Let’s have a look together at the documentation of the 4TU.ResearchData repository, to find our way around such a documentation.

First, you will need to log in to data.4tu.nl. You can use your netID for this. Your landing page will be your Dashboard, where you can upload a dataset or create a new collection. You can also look at your Sessions and API tokens.

Below this table, there is information about the API documentation.

Discrepancies from Figshare documentation

One thing to note here is that the 4TU repository recently went through a major change. Until then, it was hosted on a commercial platform called Figshare. It now runs on Djehuty, open-source software developed in-house. To accommodate everybody who has been using the 4TU.ResearchData API, Djehuty is compatible with version v2 of the Figshare API, with a few important changes:

  • base URL is different than in the Figshare documentation
  • we need to generate a new personal access token
  • the new datasets will only have the uuid number, while id will be empty.

Figshare API documentation

Let’s now look into the Figshare API documentation.

Base URL

First, we see a Base URL (the common base of the web address we will use to request resources) which in our case will be

https://data.4tu.nl/v2

API description

Let’s now go to the API description.

Sending parameters

Here we find out that GET request parameters are included in the endpoint URL, while PUT and POST methods accept parameters in the request body.

One important piece of information is that the API only accepts an application/json body.

Authentication

This section describes how the access token should be included in the request. The preferred option is via HTTP header:

Authorization: token <ACCESS_TOKEN>
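
In httr2, a header of this form can be set with req_headers(). The sketch below only builds the requests and does not send them. The token is a dummy value, and which of the two forms a given server accepts is something to check in its documentation, not something this example can confirm.

```r
library(httr2)

# Dummy value for illustration; in practice read the token with Sys.getenv().
token <- "not-a-real-token"

# Header in the "token <ACCESS_TOKEN>" form quoted above.
req_token <- request("https://data.4tu.nl/v2") |>
  req_headers(Authorization = paste("token", token))

# Header in the "Bearer <ACCESS_TOKEN>" form, as set by httr2's helper.
req_bearer <- request("https://data.4tu.nl/v2") |>
  req_auth_bearer_token(token)
```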

Searching, filtering and pagination

Here you can find some parameters that are applicable to server responses that return lists. You can see that by default the API will retrieve one page with 10 results. If you expect more results, you can increase page and page_size.

You can order your list by a specific field.

You can search for a specific term using the search_for parameter.
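
When a search returns more results than fit on one page, you need to request the pages one by one. The sketch below shows one possible loop. Both collect_pages() and fake_fetch() are hypothetical helpers written for this example; fake_fetch() stands in for a real API call so that the code runs offline.

```r
# Hypothetical helper: call fetch_page(page, page_size) repeatedly and stop
# as soon as a page comes back with fewer than page_size items.
collect_pages <- function(fetch_page, page_size = 10, max_pages = 100) {
  results <- list()
  for (page in seq_len(max_pages)) {
    items <- fetch_page(page, page_size)
    results <- c(results, items)
    if (length(items) < page_size) break
  }
  results
}

# Stand-in for a real API call: serves 25 fake items, page by page.
fake_items <- as.list(seq_len(25))
fake_fetch <- function(page, page_size) {
  start <- (page - 1) * page_size + 1
  if (start > length(fake_items)) return(list())
  fake_items[start:min(start + page_size - 1, length(fake_items))]
}

all_items <- collect_pages(fake_fetch, page_size = 10)
length(all_items)
```

With a real API, fetch_page would build and perform a request using the page and page_size parameters and return the parsed list of results. The max_pages argument is a safety limit so the loop cannot run forever.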

Articles

When looking through the API documentation, you need to get familiar with the vocabulary used in each repository. In Zenodo we worked with ‘depositions’ and ‘records’.

In the Figshare documentation we deal with Articles. There are also other entities like Collections, but we won’t delve into those. When you look through the Articles section, you can see a division between Public articles and Private articles. This is roughly the same as Records and Depositions in Zenodo, respectively.

In the Public article(s) you will find information on how to build requests regarding publicly available records. In the Private article(s), you will see details on how to modify data sets on your own account, that have not been published.

For example, when we go to the Public articles search we will see that, unlike in the case of Zenodo, we need to use the POST method to perform the search. We can also find the endpoint here, which is the base URL + /articles/search.

As we found out earlier we need to:

  • use the search_for parameter to look for specific terms
  • include it in the body of the request.

So the request would be something like

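base_url <- "https://data.4tu.nl/v2" # base URL from the section above
req <- request(paste0(base_url, "/articles/search"))

resp <- req |>
  req_method("POST") |> # the search endpoint expects POST
  req_body_json(list(search_for = "search term")) |> # parameters go in the body
  req_perform()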

Your turn! (30min)

Now it’s time for you to practice:

Option 1: 4TU

Find all resources related to ‘cycling’

  • Go to data.4tu.nl to see the documentation
  • Log in and create an API access token
  • Safely store it in your environment
  • Search for the key term
  • Convert the json file to a data.frame()
Code
# base URL:
api4tu_url <- "https://data.4tu.nl/v2"
path <- "/articles/search"
api4tu_token <- Sys.getenv("API_TOKEN_4TU")


req <- request(paste0(api4tu_url, path))
resp <- req |>
  req_auth_bearer_token(api4tu_token) |> 
  req_method("POST") |> 
  req_body_json(list(search_for = "cycling", 
                     order = "title", 
                     order_direction = "asc",
                     page = 1,
                     page_size = 1000
                     )) |>
  req_perform()

resp_content <- resp_body_json(resp, simplifyVector = TRUE) 

json_response <- resp_body_string(resp)
resp_df <- jsonlite::fromJSON(json_response, flatten = TRUE)

Additional exercise

  • Select one article from the list and download files pertaining to the article
Code
# retrieve the UUID of the first article in the search results
article_id <- resp_df$uuid[1]

path <- paste0("/articles/", article_id)

req <- request(paste0(api4tu_url, path))
resp <- req |>
  req_auth_bearer_token(api4tu_token) |> 
  req_method("GET") |>
  req_perform()

resp_content <- resp_body_json(resp, simplifyVector = TRUE) 

file_url <- resp_content$files$download_url[1]
file_name <- resp_content$files$name[1]

# Download file
req <- request(file_url)
resp <- req |>
  req_auth_bearer_token(api4tu_token) |>
  req_method("GET") |>
  req_perform(path = file_name)

Option 2: Zenodo

Find all resources related to ‘beetroot’

  • Go to https://developers.zenodo.org/ to see documentation
  • Log in and create an API access token
  • Safely store it in your environment
  • Look for data about your favorite artist
  • Convert the json file to a data.frame()
Code
# base URL:
zenodo_url <- "https://zenodo.org/api/"
path <- "records"
zenodo_token <- Sys.getenv("ZENODO_API")


req <- request(paste0(zenodo_url, path))
resp <- req |>
  req_auth_bearer_token(zenodo_token) |> # authentication
  req_method("GET") |> # request method
  req_headers("Accept" = "application/json") |>
  req_url_query(q = "beetroot", 
                     size = "100", 
                     sort = "bestmatch" 
                     ) |>
  req_perform()

resp_content <- resp_body_json(resp, simplifyVector = TRUE) 


json_response <- resp_body_string(resp)
resp_df <- jsonlite::fromJSON(json_response, flatten = TRUE)

Additional exercise

  • Select one article from the list and download files pertaining to the article
Code
# Get the download link
# (native pipe, so no extra package needs to be attached)
flattened_data <- resp_df |>
    tidyr::unnest(cols = c(files), names_sep = ".")

file_link <- flattened_data$files.links.download[3] # resp_df$files[[1]]$links.download
file_name <- flattened_data$files.filename[3]

# Download - doesn't work!
req <- request(file_link)
resp_rec <- req |>
  req_method("GET") |>
  req_perform(path = file_name)

Option 3: The API of your choice

Use an API you’re interested in