"Pagination Transformer" on HTTP Calls

floris · July 4, 2023, 2:18pm

Hi Alumio,

We’re running into issues where we need to do some transformation before the pagination takes place.

Let’s say we’re calling an API endpoint that uses an ‘offset’ parameter for pagination. Alumio currently supports this, but there’s an issue: Alumio calculates the ‘offset’ param from the expected number of entities. But often, you’ll pre-filter the dataset before it becomes tasks (e.g. by filtering out everything that has no changes).

So what happens is: Alumio expects 10.000 entities, but only 100 are delivered because 9.900 are filtered out. So Alumio assumes it’s the last page and doesn’t request the next page (with an offset of 10.000).
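To make the failure mode concrete, here is a minimal Python model of the offset logic as described in this thread: the next offset is derived from the expected page size, and a page with fewer items than expected counts as the last one. It is not Alumio’s code; `paginate_by_offset`, `keep` (a stand-in for a data filter that runs before counting) and `fake_api` are hypothetical.

```python
from typing import Callable, List

Item = dict


def paginate_by_offset(
    fetch_page: Callable[[int, int], List[Item]],
    expected: int,
    keep: Callable[[Item], bool] = lambda item: True,
) -> List[Item]:
    """Hypothetical model: the page is counted AFTER filtering."""
    results: List[Item] = []
    offset = 0
    while True:
        page = [item for item in fetch_page(offset, expected) if keep(item)]
        results.extend(page)
        # A page with fewer items than expected is treated as the last page.
        if len(page) < expected:
            return results
        offset += expected  # next offset derived from the expected page size


def fake_api(total: int) -> Callable[[int, int], List[Item]]:
    """Hypothetical API that serves `total` items, `limit` at a time."""
    def fetch(offset: int, limit: int) -> List[Item]:
        return [{"id": i} for i in range(offset, min(offset + limit, total))]
    return fetch


all_items = paginate_by_offset(fake_api(25_000), expected=10_000)
filtered = paginate_by_offset(
    fake_api(25_000), expected=10_000, keep=lambda item: item["id"] % 100 == 0
)
print(len(all_items), len(filtered))  # 25000 100
```

The filtered run stops after the first page, even though pages two and three also contain items that pass the filter (250 in total).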

I suppose there are multiple ways around this, but the option I’d love is a transformer that’s ‘in between’ the result coming from the Subscriber and the pagination logic.

So in the example above, I could:

1. Count the number of items in the result array (I expect 10.000, so anything less than that means it’s the ‘last page’).
2. If the count is 10.000, create a node called “nextPage” containing the URL to call, with the added param (offset = 10.000).
3. Set the Subscriber to use Pagination = Next Page.
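A minimal sketch of the proposed transformer in Python, assuming it receives the raw, unfiltered subscriber result as a list plus the offset of the current request. The function name, `base_url` and the output layout are illustrative only, not an existing Alumio feature.

```python
from urllib.parse import urlencode

PAGE_SIZE = 10_000  # the expected number of items per page in the example above


def pagination_transformer(result: list, base_url: str, offset: int) -> dict:
    """Hypothetical transformer that runs on the raw result, before any filtering."""
    output = {"items": result}
    # Only a full page can be followed by another page.
    if len(result) == PAGE_SIZE:
        query = urlencode({"offset": offset + PAGE_SIZE})
        output["nextPage"] = f"{base_url}?{query}"
    return output


full = pagination_transformer([{}] * PAGE_SIZE, "https://api.example.com/stock", 0)
short = pagination_transformer([{}] * 100, "https://api.example.com/stock", 10_000)
print(full["nextPage"])     # https://api.example.com/stock?offset=10000
print("nextPage" in short)  # False
```

With Pagination = Next Page, the subscriber would then follow `nextPage` when it is present and stop when it is absent. Because the count is taken before filtering, filtered-out entities would no longer end the pagination early.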

This would also solve another issue I sometimes encounter. When syncing pricelists, for example, I use one call to get all the active ‘pricelist headers’ (with the code for each pricelist), and then a paginated call for all the prices within each pricelist. Currently there’s no way to paginate on params other than the page number (or offset), but that’s exactly what I’d need in this case.
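Sketched in Python, paginating over pricelist codes could look like this. It assumes the codes come from the first ‘pricelist headers’ call and that the prices endpoint takes a `pricelist` query parameter. The `/getPrices` path, the codes and the helper names are hypothetical, and the page-number pagination within each pricelist is left out.

```python
from typing import List, Optional
from urllib.parse import urlencode


def next_pricelist(codes: List[str], current: Optional[str]) -> Optional[str]:
    """Return the code after `current`, or None when every pricelist is done."""
    index = 0 if current is None else codes.index(current) + 1
    return codes[index] if index < len(codes) else None


def prices_url(base_url: str, code: str) -> str:
    return f"{base_url}?{urlencode({'pricelist': code})}"


codes = ["RETAIL", "WHOLESALE"]  # e.g. from the 'pricelist headers' call
urls = []
code = next_pricelist(codes, None)
while code is not None:
    urls.append(prices_url("/getPrices", code))
    code = next_pricelist(codes, code)
print(urls)  # ['/getPrices?pricelist=RETAIL', '/getPrices?pricelist=WHOLESALE']
```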

Thanks in advance!

Gugi · July 6, 2023, 11:24am

Hi @floris

Thank you for your valuable feedback.

You are correct that Alumio calculates the next offset parameter using the “Expected number of items” field value. You are also correct that Alumio stops fetching the next page when it receives fewer entities than the number set in the “Expected number of items” field.

Therefore, we will pass your feedback on to the team for further discussion of the implementation possibilities. We will be sure to get back to you once we have an update.

Feel free to let us know if you have any other feedback or ideas. Once again, thank you for your feedback.

floris · November 2, 2023, 11:52am

Hi Gugi,

I just read in another forum post that you could also use the “Response Decoder” “JSON” with the “Incremental Read Method”. That way, the result would already be parsed into entities. Do you know whether all created entities are counted for the offset, even if you filter some of them out? If so, most of my issue would be solved.

(Note that there are many other use-cases for my original feature request, including one I just came across that uses ‘versioning’: instead of an ‘offset’, you’d simply pass the last ‘version’ from your previous call.)

Gugi · November 6, 2023, 7:47am

Hi Floris,

I might have misunderstood your initial request a few months ago. To make the “Increase query parameter” pagination work, you should manually select JSON in the “Response Decoder”. If you select “Whole file (for smaller files)”, you can set the “Pattern to items”. This way, Alumio automatically separates the entities based on the pattern. Alumio counts the entities subscribed by the HTTP subscriber; it doesn’t matter whether you filter entities using data filters in the incoming configuration or route.

Could you please clarify whether the “filter” you were referring to is the filter in the web service (API), or the Alumio data filters after the entities returned from HTTP subscriber?

floris · November 6, 2023, 12:00pm

Hi Gugi,

The filter I was referring to is filtering with Alumio data filters. An example use-case: getting all stock from an ERP, then checking in a storage whether each product actually exists in the eCommerce system, and filtering out the products that aren’t there yet.

What I originally tried was not selecting a Response Decoder at all (I believe it defaults to JSON → Whole File) but using a ‘Get branches’ transformer instead. When I tried it back then (I haven’t tried it again in the versions since), it only counted the entities “leaving” the incoming, so if 10.000 entities came in and a few were filtered out, it would think it was the last page.

Gugi · November 7, 2023, 9:07am

Hi Floris,

Please let me know whether I understand correctly. You are referring to the data filters you put here, right?

If that’s the case, the pagination will not take the number of filtered entities into account. The pagination functionality works within the HTTP subscriber and only takes the number of entities (not tasks) created by the HTTP subscriber into account.

Yes, if you don’t manually re-select JSON in the Response Decoder, it should use “Whole file” as the default option. The “Get branches from a pattern” entity transformer separates the entities, but the entities it produces are not counted as entities returned from the HTTP subscriber.

For example, say you get all stock from the ERP with a limit of 10,000 per page. If it returns 10,000 entities, the pagination will still fetch the next page, even if data filters in the incoming configuration filter out 3,000 stock items that belong to products that don’t exist in the eCommerce system. This only applies if you set the “Pattern to items” after selecting “Whole file” in the JSON response decoder.
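A rough Python model of the behaviour described here, assuming (as stated above) that the page count is taken from the entities the subscriber extracts with the “Pattern to items”, and that data filters only run afterwards. The dotted-path lookup is a simplification of Alumio’s pattern syntax, and all names are hypothetical.

```python
from typing import Callable, List, Tuple


def extract_items(body: dict, pattern: str) -> List[dict]:
    """Simplified 'Pattern to items': follow a dotted path such as 'data.items'."""
    node = body
    for key in pattern.split("."):
        node = node[key]
    return node


def process_page(
    body: dict, pattern: str, page_size: int, keep: Callable[[dict], bool]
) -> Tuple[List[dict], bool]:
    entities = extract_items(body, pattern)
    # The pagination decision uses the subscriber's entity count...
    has_next_page = len(entities) == page_size
    # ...while data filters only decide which entities become tasks.
    tasks = [entity for entity in entities if keep(entity)]
    return tasks, has_next_page


# Scaled-down version of the example: 10 stock items, 3 of them not in the shop.
body = {"data": {"items": [{"sku": i, "in_shop": i < 7} for i in range(10)]}}
tasks, has_next_page = process_page(body, "data.items", 10, lambda e: e["in_shop"])
print(len(tasks), has_next_page)  # 7 True
```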

Please let me know if my explanation is not clear to you.

floris · November 7, 2023, 10:28am

Hi Gugi,

Thanks for the explanation!

Because we generally don’t set the ‘Response Decoder’ when the response is JSON, we never really thought of using it, and never realized that it had any impact on the pagination.

I’d suggest adding this info to a tooltip for the pagination.

Note that the original feature request still stands. Although the above solution fixes the most common issue, I have found two other types of pagination that are difficult to handle without the feature requested in this topic:

Pagination by ‘offset-key’: some APIs paginate by saying “after ID=xyz”. If they also offer a ‘nextLink’, that’s fine. However, some APIs don’t offer a nextLink, and because the ‘offset’ in Alumio is determined by numbering, you currently can’t paginate them in Alumio. With a ‘pagination transformer’, we could simply take the last ID from the resulting entities and use it to build the ‘nextLink’.
Pagination by ‘header’: I often have to copy over pricelists. The call would be something like /getPrices?pricelist=XYZ. If you already know you have 50 pricelists, it’d be great if you could paginate over those. With a pagination transformer, you could get the next pricelist from a storage and use it to build the nextLink.
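For the ‘offset-key’ case, a transformer only needs the last ID on the raw page. A minimal Python sketch, assuming the API accepts hypothetical `after` and `limit` query parameters, that each entity has an `id` field, and that results are returned sorted by that ID (keyset pagination relies on a stable order):

```python
from typing import List, Optional
from urllib.parse import urlencode


def next_link_after_id(entities: List[dict], base_url: str, limit: int) -> Optional[str]:
    """Build the next link from the last ID on the raw (unfiltered) page."""
    if len(entities) < limit:
        return None  # short or empty page: nothing left to fetch
    last_id = entities[-1]["id"]
    return f"{base_url}?{urlencode({'after': last_id, 'limit': limit})}"


page = [{"id": "a1"}, {"id": "b7"}]
print(next_link_after_id(page, "https://api.example.com/items", limit=2))
# https://api.example.com/items?after=b7&limit=2
print(next_link_after_id(page[:1], "https://api.example.com/items", limit=2))
# None
```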

Gugi · November 8, 2023, 8:40am

Hi Floris,

Thank you for your clarification. I’ll be sure to add the additional information regarding the pagination types to the feature request.

We appreciate your valuable feedback. Please don’t hesitate to let us know if you have any other feedback for Alumio.

floris · December 10, 2023, 2:13pm

Hi @Gugi,

I’d like to give this a bump. I’ve just encountered a pagination method (PowerBI / Business Central related) that works like this:

You get a header containing a ‘Continuation Token’
The next request should contain the header “Continuation: &{continuation_token}”
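The continuation flow above, as a minimal Python loop. The response header name `Continuation-Token` is an assumption (the exact name isn’t given here); the `Continuation` request header follows the description above. `send` is a hypothetical stand-in for the HTTP call, and the fake API only exists to exercise the loop.

```python
from typing import Callable, Dict, List, Optional, Tuple

# (body items, response headers); plain dicts, so header lookups here are
# case-sensitive, unlike real HTTP headers.
Response = Tuple[List[dict], Dict[str, str]]

TOKEN_HEADER = "Continuation-Token"  # assumed name of the response header


def fetch_all_with_continuation(send: Callable[[Dict[str, str]], Response]) -> List[dict]:
    items: List[dict] = []
    request_headers: Dict[str, str] = {}  # the first request carries no token
    while True:
        body, response_headers = send(request_headers)
        items.extend(body)
        token: Optional[str] = response_headers.get(TOKEN_HEADER)
        if not token:
            return items  # no token: this was the last page
        request_headers = {"Continuation": token}


# Fake API: three pages chained by tokens.
PAGES = {
    None: ([{"id": 1}], {TOKEN_HEADER: "abc"}),
    "abc": ([{"id": 2}], {TOKEN_HEADER: "def"}),
    "def": ([{"id": 3}], {}),
}


def fake_send(headers: Dict[str, str]) -> Response:
    return PAGES[headers.get("Continuation")]


print(fetch_all_with_continuation(fake_send))  # [{'id': 1}, {'id': 2}, {'id': 3}]
```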

As far as I know, this is currently impossible in Alumio. A workaround right now would be to save the continuation token in a storage and use that storage the next time you trigger the incoming. This works for some use-cases, but not for others, e.g. stock updates based on something like an ‘updated_at’ timestamp: usually you wouldn’t need to visit page two, but after a large stock update has happened, you want to fetch all the stock updates immediately.

floris · February 13, 2024, 8:08pm

Hi @Gugi,

I’ve found a partial solution to one of the issues we’ve found.

When connecting to Scayle’s API, we had the issue that we don’t get a page, offset or next link. We do get a cursor (e.g. “SCJSX=”), which you can use to build the URL.

Because the “Pattern to the link for the next page” accepts a JMESPath query, I created this query:
join('', ['/api/admin/v1/products?with=variants.stocks&limit=500&cursor=', cursor.next]) || false

This works because it takes the cursor from the response body (cursor.next) and appends it as a parameter to the hardcoded “normal” URI.

However, the moment the cursor is empty (when the last page has been reached), the query fails and the incoming gets into a failure state (which is annoying for logging purposes). You could use pipes to fall back, but that would basically mean you’d keep iterating forever, which isn’t really ideal.

This seems like a fairly easy fix (compared to my last suggestion) when you think of it this way: just a checkbox saying “Use next link as a parameter in the Request URI”.
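In Python terms, the suggested checkbox would roughly amount to the logic below: take `cursor.next` from the response body, append it to the normal request URI as a `cursor` parameter, and treat a missing or empty cursor as the last page instead of an error. This is a sketch of the suggested behaviour, not of anything Alumio currently does; the URI and parameters are taken from the query above. `urlencode` percent-encodes the cursor, so `SCJSX=` becomes `SCJSX%3D`, which the server decodes back to the same value.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URI = "/api/admin/v1/products"
BASE_PARAMS = {"with": "variants.stocks", "limit": 500}


def next_cursor_uri(body: dict) -> Optional[str]:
    """Next request URI built from cursor.next, or None on the last page."""
    cursor = (body.get("cursor") or {}).get("next")
    if not cursor:
        return None  # empty or missing cursor: stop cleanly instead of failing
    return f"{BASE_URI}?{urlencode({**BASE_PARAMS, 'cursor': cursor})}"


print(next_cursor_uri({"cursor": {"next": "SCJSX="}}))
# /api/admin/v1/products?with=variants.stocks&limit=500&cursor=SCJSX%3D
print(next_cursor_uri({"cursor": {"next": None}}))
# None
```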

Gugi · February 27, 2024, 10:07am

Hi @floris,

Thank you for your replies, and I am sorry for replying late.

We will also pass on your feedback regarding the ability to use the continuation (next page) token in the HTTP subscriber’s pagination options. We will let you know once we have an update.

Gugi · March 6, 2024, 12:48pm

Hi @floris

Would it be possible for you to provide us with the API documentation of systems that use the suggested types of pagination?

floris · April 7, 2025, 8:23am

Hi @Gugi,

Apologies, I didn’t see your response. I don’t have the official documentation of that API (it isn’t public).

Currently, we’re running into a few situations where we have issues:

When the offset/page/etc. has to be set in a Header
When there’s ‘cursor based pagination’ (e.g. Scayle; see the “Pagination” page in SCAYLE’s Getting Started documentation)
When (current issue) you have to set both a ‘start’ and an ‘end’ offset. So for page 2 (with 10 items per page), you’d have to add ‘start=10&end=20’ to the URL.
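The ‘start’ and ‘end’ values are simple arithmetic on the page number. A Python sketch, assuming 1-based page numbers and treating ‘end’ as exclusive so that it matches the ‘start=10&end=20’ example; whether a given API treats ‘end’ as inclusive or exclusive is an assumption to check against its documentation.

```python
from urllib.parse import urlencode


def start_end_params(page: int, page_size: int) -> dict:
    """Offsets for a 1-based page number; 'end' is treated as exclusive here."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    start = (page - 1) * page_size
    return {"start": start, "end": start + page_size}


print(urlencode(start_end_params(2, 10)))  # start=10&end=20
```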

Now, you guys could fix a few of these in a structured manner, but unfortunately there are a million different ways of doing pagination and you’ll never be able to support all of them.

I would really love it if you’d look into building a more generic “Pagination Transformer”.
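One way such a generic transformer could be shaped, sketched in Python: a function that receives the request that was just sent and the raw, unfiltered response, and returns the next request, or `None` to stop. Everything here is hypothetical (the class names, the `X-Offset` header, the page size of 2); the example plugs in the ‘offset set in a header’ case from the list above.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Request:
    url: str
    headers: Dict[str, str] = field(default_factory=dict)


@dataclass(frozen=True)
class Response:
    items: List[dict]
    headers: Dict[str, str] = field(default_factory=dict)


# (request just sent, raw response) -> next request, or None to stop
PaginationTransformer = Callable[[Request, Response], Optional[Request]]


def fetch_all(
    first: Request,
    send: Callable[[Request], Response],
    next_request: PaginationTransformer,
    max_pages: int = 1000,  # safety net against a transformer that never stops
) -> List[dict]:
    items: List[dict] = []
    request: Optional[Request] = first
    for _ in range(max_pages):
        if request is None:
            break
        response = send(request)
        items.extend(response.items)
        request = next_request(request, response)
    return items


PAGE_SIZE = 2  # hypothetical page size


def offset_in_header(request: Request, response: Response) -> Optional[Request]:
    """Example transformer: the offset travels in a hypothetical X-Offset header."""
    if len(response.items) < PAGE_SIZE:
        return None
    offset = int(request.headers.get("X-Offset", "0")) + PAGE_SIZE
    return replace(request, headers={**request.headers, "X-Offset": str(offset)})


DATA = [{"id": i} for i in range(5)]


def fake_send(request: Request) -> Response:
    offset = int(request.headers.get("X-Offset", "0"))
    return Response(items=DATA[offset:offset + PAGE_SIZE])


print([item["id"] for item in fetch_all(Request("/stock"), fake_send, offset_in_header)])
# [0, 1, 2, 3, 4]
```

The other cases from this thread (cursor, continuation token, start/end offsets, pricelist codes) would each be a different `next_request` function plugged into the same loop.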