How to fetch entities from an API endpoint with pagination using HTTP Subscriber

Introduction

Pagination is a common technique that keeps a system from loading too much data at once. It is usually implemented on both the front end and the back end (API).

When using Alumio, you may need to fetch all the data/entities from a system's API endpoint in one go. But how do we deal with this when the entities are paginated?

Alumio provides functionality for handling the pagination of an (HTTP) API endpoint. The option is available in the HTTP Subscriber when you create an Incoming configuration.

Use Cases

API provides a URL for the next page in the response body

Some systems use this approach to tell clients where to fetch the next page of entities. For example, some Microsoft Dynamics systems expose an OData API, and the OData specification defines a property that holds the URL of the next page (nextLink).

In this case, select “Follow next links” in the “Follow pagination” options. Then fill in the pattern that points to the property holding the URL of the next page.
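To make the idea concrete, here is a minimal Python sketch of what following next links in the response body amounts to. It is not Alumio's implementation. The function name, the in-memory pages, and the example.test URLs are hypothetical; the `@odata.nextLink` and `value` property names follow the OData JSON convention.

```python
from typing import Callable, Iterator, Optional


def follow_next_links(
    first_url: str,
    fetch: Callable[[str], dict],
    next_link_key: str = "@odata.nextLink",  # property holding the next page URL
    items_key: str = "value",                # property holding the entities
    max_pages: int = 100,
) -> Iterator[dict]:
    """Yield entities page by page, following the next-page URL in the body."""
    url: Optional[str] = first_url
    pages = 0
    while url and pages < max_pages:
        body = fetch(url)
        yield from body.get(items_key, [])
        url = body.get(next_link_key)  # absent on the last page, which ends the loop
        pages += 1


# Hypothetical in-memory "API" standing in for real HTTP calls.
PAGES = {
    "https://example.test/orders": {
        "value": [{"id": 1}, {"id": 2}],
        "@odata.nextLink": "https://example.test/orders?$skiptoken=2",
    },
    "https://example.test/orders?$skiptoken=2": {"value": [{"id": 3}]},
}

items = list(follow_next_links("https://example.test/orders", PAGES.__getitem__))
print([item["id"] for item in items])
```

The loop stops as soon as a response no longer contains a next link, which is how this style of API signals the last page.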

API provides a URL for the next page in the response header

Some systems, such as Shopify, use this approach, which is defined in RFC 8288. They put the URL of the next page in a header named Link, as shown below.

Link: "<https://{shop}.myshopify.com/admin/api/{api_version}/products.json?page_info=abcdefg&limit=3>; rel=previous, <https://{shop}.myshopify.com/admin/api/{api_version}/products.json?page_info=opqrstu&limit=3>; rel=next"

To handle this type of pagination, select “Follow next links in the HTTP header (RFC8288)” in the “Follow pagination” options.
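For illustration, the following Python sketch extracts the next-page URL from a Link header value like the one above. It is a simplified parser, not a full RFC 8288 implementation: it assumes the rel parameter comes directly after the URL and carries a single relation type. The function name is hypothetical.

```python
import re
from typing import Optional

# Matches `<url>; rel=next` as well as `<url>; rel="next"`.
LINK_PART = re.compile(r'<([^>]*)>\s*;\s*rel="?([^",;\s]+)"?')


def next_url_from_link_header(header_value: str) -> Optional[str]:
    """Return the URL whose relation type is "next", or None if there is none."""
    for url, rel in LINK_PART.findall(header_value):
        if rel.lower() == "next":
            return url
    return None


# Header value taken from the example above (placeholders left as-is).
header = (
    "<https://{shop}.myshopify.com/admin/api/{api_version}/products.json"
    "?page_info=abcdefg&limit=3>; rel=previous, "
    "<https://{shop}.myshopify.com/admin/api/{api_version}/products.json"
    "?page_info=opqrstu&limit=3>; rel=next"
)

next_url = next_url_from_link_header(header)
print(next_url)
```

When the header has no entry with the next relation, the function returns None, which a client would treat as the last page.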

API requires the number of the page/offset in the request query parameter

Another common approach requires us to pass the number of the page we want to fetch entities from. It is used by Magento 2, BigCommerce, WooCommerce, and so on. The query parameters commonly look like this.

GET https://somesystems.com/api/entity?page=1&per_page=100

In this case, you can use the “Increase query parameter” option in “Follow pagination”. Then set “Query parameter to set for paging” to the name of the parameter that holds the page number. You are also required to fill in the “Number of expected items” per page. For the URL above, enter page as the query parameter and 100 as the number of expected items.

This type of pagination has two modes: “Increase page number”, for systems that require a page number, and “Increase offset (number of items)”, for systems that require an offset.

For the pagination to work properly in this case, set the “Response decoder” to “JSON” and set the “Pattern to items” to the path of the array that holds the items/entities in the response body. For example, if the items are located in a property called data in the response body, fill in the field with data.
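The Python sketch below shows one plausible reading of how these settings work together. It is not Alumio's actual code. In particular, stopping when a page returns fewer items than expected is an assumption about why the expected item count and the pattern to items are needed. All names and the fake API are hypothetical.

```python
from typing import Callable, Dict, Iterator, List


def paginate_by_query(
    fetch: Callable[[Dict[str, int]], dict],
    param: str = "page",        # "Query parameter to set for paging"
    expected_items: int = 100,  # "Number of expected items"
    mode: str = "page",         # "page" or "offset"
    start: int = 1,
    items_key: str = "data",    # "Pattern to items"
    max_pages: int = 100,
) -> Iterator[dict]:
    """Yield entities while increasing a page number or an offset."""
    value = start
    for _ in range(max_pages):
        items: List[dict] = fetch({param: value}).get(items_key, [])
        yield from items
        if len(items) < expected_items:
            break  # assumed stop rule: a short page is the last page
        # Page mode moves one page ahead; offset mode skips the items just read.
        value += 1 if mode == "page" else expected_items


# Hypothetical API with 7 entities that serves 3 entities per page.
ENTITIES = [{"id": n} for n in range(1, 8)]


def fake_fetch(query: Dict[str, int]) -> dict:
    first = (query["page"] - 1) * 3
    return {"data": ENTITIES[first:first + 3]}


fetched = list(paginate_by_query(fake_fetch, param="page", expected_items=3))
print(len(fetched))
```

In offset mode the same loop would start at 0 and advance by the expected number of items instead of by one.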

API requires the number of the page/offset in the request body parameter

A less common pagination approach requires us to pass the page number as a parameter in the request body. A sample request is shown below.

POST https://somesystems.com/api/entities
{
  "page": 1,
  "per_page": 100
}

To handle this type of pagination, follow the same approach as in the previous use case, but select “Increase body parameter” in “Follow pagination”. You are also required to fill in the “Body parameter to set for paging” field. For the example request above, fill in the field with page.
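As a rough illustration, the Python sketch below repeats the sample POST request while increasing the page value in the JSON body. It is only a sketch under assumptions: the function names and the fake endpoint are hypothetical, and the stop rule (a page with fewer items than expected is the last one) is assumed rather than taken from Alumio.

```python
import json
from typing import Callable, Iterator


def paginate_by_body(
    post: Callable[[str], dict],
    body: dict,
    param: str = "page",        # "Body parameter to set for paging"
    expected_items: int = 100,  # "Number of expected items"
    items_key: str = "data",    # "Pattern to items"
    max_pages: int = 100,
) -> Iterator[dict]:
    """Yield entities while increasing a page number inside the request body."""
    body = dict(body)  # copy so the caller's dict is left untouched
    for _ in range(max_pages):
        items = post(json.dumps(body)).get(items_key, [])
        yield from items
        if len(items) < expected_items:
            break  # assumed stop rule: a short page is the last page
        body[param] += 1


# Hypothetical endpoint with 5 records that honours "page" and "per_page".
RECORDS = [{"id": n} for n in range(1, 6)]


def fake_post(raw_body: str) -> dict:
    request = json.loads(raw_body)
    first = (request["page"] - 1) * request["per_page"]
    return {"data": RECORDS[first:first + request["per_page"]]}


received = list(paginate_by_body(fake_post, {"page": 1, "per_page": 2}, expected_items=2))
print(len(received))
```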

Additional Configuration

You can optionally limit the number of pages the subscriber fetches. The default value is 100, so Alumio automatically stops fetching once it has fetched 100 pages. You can set it in the “Maximum number of pages to fetch” field.

Sometimes, you don’t want Alumio to fetch all pages in a single run of the incoming configuration. You can configure Alumio to save the state of the next page it has to fetch in the next run by configuring a storage to track the progress.

For example, you set the “Maximum number of pages to fetch” to 100. Once it has fetched 100 pages, Alumio stops the subscriber and saves the state of the page it should fetch in the next run to the selected storage. Then the incoming configuration proceeds with the entity processing. In the next run of the incoming configuration, the subscriber checks whether the storage contains an entity that holds the pagination state. If it does, Alumio uses it to build the HTTP request, including the pagination parameters.
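The resume behaviour can be pictured with the Python sketch below. It is a conceptual model only: a plain dict stands in for the Alumio storage, the names are hypothetical, and clearing the saved state once the last page is reached is an assumption rather than documented behaviour.

```python
from typing import Callable, Dict, List


def run_once(
    fetch_page: Callable[[int], List[dict]],
    storage: Dict[str, int],
    max_pages: int = 100,
    expected_items: int = 100,
    state_key: str = "next_page",  # hypothetical key for the pagination state
) -> List[dict]:
    """Fetch at most max_pages pages, resuming from the page saved in storage."""
    page = storage.get(state_key, 1)  # start at page 1 when no state is saved
    collected: List[dict] = []
    for _ in range(max_pages):
        items = fetch_page(page)
        collected.extend(items)
        if len(items) < expected_items:
            storage.pop(state_key, None)  # assumed: finished, so clear the state
            return collected
        page += 1
    storage[state_key] = page  # limit reached: remember where to continue
    return collected


# Hypothetical source with 7 entities, served 2 per page (4 pages in total).
SOURCE = [{"id": n} for n in range(1, 8)]


def fake_page(page: int) -> List[dict]:
    return SOURCE[(page - 1) * 2:(page - 1) * 2 + 2]


state: Dict[str, int] = {}
first_run = run_once(fake_page, state, max_pages=2, expected_items=2)
state_after_first = dict(state)
second_run = run_once(fake_page, state, max_pages=2, expected_items=2)
print(len(first_run), state_after_first, len(second_run), state)
```

With a limit of two pages per run, the first run covers pages 1 and 2 and saves page 3 as the starting point, and the second run picks up from there.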
