Hi Alumio,
I often run into issues comparable to the ones discussed in these topics from the past few weeks:
Basically, you want to ‘batch’ entities (or rows), but the input (it doesn’t really matter what it is, it might be JSON or XML) is too large to process at once. You could do it per row, but that means you have to build your own batching mechanism (which is what I do now, using storages and a lot of logic).
I had a case a while back where I needed to run (often, like once or twice per hour) through a CSV with ‘price rules’ of around 250.000 lines. It seems that the “consumer received an entity from subscriber” action adds around 2-3ms of overhead per entity (I can’t really measure that, so it’s just a guess). With 150.000 lines that means 150.000 entities, which adds up to 7-8 minutes of overhead alone.
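As a quick back-of-the-envelope check of that guess (the ~3 ms figure is my own assumption, not something measured inside Alumio):

```python
# Back-of-envelope: an assumed ~3 ms of per-entity overhead (my guess, not a
# measured Alumio figure) multiplied by the number of rows-as-entities.
entities = 150_000
overhead_per_entity_s = 0.003

print(f"{entities * overhead_per_entity_s / 60:.1f} minutes")  # ~7.5 minutes of pure overhead
```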
Then I found that the CSV decoder has a built-in option to ‘batch’ lines (“Items in group” / “The amount of items that are bundled in single entity. 0 means the entire file in one entity.”), up to a maximum of 1440. The time went from 7 minutes to 6 seconds! That’s because the number of entities it had to work through dropped from 150.000 to 104. The actual gains are even bigger: because I was able to remove my own ‘batching’ mechanism (which meant writing to a storage entity for every entity it processed), the total “time to task” went from roughly 15 minutes to 7 seconds.
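Conceptually, I picture that grouping roughly like this minimal Python sketch (not Alumio’s actual implementation; the file name is made up):

```python
import csv
from itertools import islice
from typing import Iterator

def read_csv_in_groups(path: str, group_size: int) -> Iterator[list[dict]]:
    """Yield lists of up to `group_size` rows, so one entity carries many rows."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        while True:
            group = list(islice(reader, group_size))
            if not group:
                break
            yield group

# With ~150.000 price-rule rows and group_size=1440 this yields roughly a
# hundred entities instead of 150.000 single-row ones.
for entity in read_csv_in_groups("price_rules.csv", 1440):  # hypothetical file
    pass  # each `entity` is a list of up to 1440 rows
```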
What I would like to see is that every Response Decoder gets a built-in way to ‘group’/‘batch’ lines, like the CSV decoder already has. So let’s say you use the JSON Response Decoder with the Read Method ‘incremental’ and the path customers[], and you want to work with batches of 1.000: you could just set “Items in group” to 1.000. The first entity you’d get would be an array of the first 1.000 items, and each following entity would contain the next 1.000.
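The nice part is that this grouping is completely generic, so the same idea works for any incrementally decoded stream. A minimal sketch of what “Items in group” could mean for an arbitrary decoder output (plain Python, not Alumio internals):

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

def in_groups(items: Iterable[T], group_size: int) -> Iterator[list[T]]:
    """Group any stream of decoded items into entities of up to `group_size`.
    A group_size of 0 mirrors the CSV decoder: the entire input as one entity."""
    iterator = iter(items)
    if group_size <= 0:
        yield list(iterator)
        return
    while True:
        group = list(islice(iterator, group_size))
        if not group:
            break
        yield group

# Example: ten decoded items, grouped per 4.
print(list(in_groups(range(10), 4)))
# -> [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```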
This would be a very useful addition to our toolkit. When working with large datasets, the main thing you need to watch is memory, and for memory usage it is best to work with as small a chunk as possible. For speed, however, the opposite is often true: when you have to run through 100.000 lines, it is usually fastest to run through them all at once.
Right now, we often only have the option to do it per entity (or row) or as one whole. We need something in between.