I’m wondering about something. Generally I like to use this pattern for getting entities (doesn’t really matter what) from a REST API endpoint:

Input Transformers
- Value Setter to set the datetime in the format accepted by the REST API endpoint, set back by for example 4 hours
- Get From Storage, where I get the datetime value from the storage

Response Decoder
- Left on the default, so JSON / Whole File, without entering a pattern

Entity Transformers
- Value Setter, setting the date to now, or for example 1 minute or a few seconds ago
- Save to Storage with the new datetime
- Get Branches to the entities
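For reference, stripped of the transformer names, the logic of that route amounts to roughly the Python sketch below. The endpoint URL, the `updated_since` parameter, the storage dict and the `process` function are placeholders I’m assuming for illustration, not actual platform configuration.

```python
from datetime import datetime, timedelta, timezone

import requests

ENDPOINT = "https://api.example.com/entities"  # placeholder endpoint
STORAGE_KEY = "last_sync"                      # hypothetical storage key


def process(entity: dict) -> None:
    """Placeholder for whatever the route does with each entity."""
    print(entity.get("id"))


def run(storage: dict) -> None:
    # Input Transformers: default the filter to "now minus 4 hours",
    # then use the value kept in storage when one exists.
    fallback = (datetime.now(timezone.utc) - timedelta(hours=4)).isoformat()
    since = storage.get(STORAGE_KEY, fallback)

    # Response Decoder: JSON / Whole File -> one body containing all entities.
    response = requests.get(ENDPOINT, params={"updated_since": since})
    response.raise_for_status()
    entities = response.json()

    # Entity Transformers: note the new sync moment ("now", or a minute ago
    # to stay on the safe side), branch to the entities, then persist it.
    new_sync = (datetime.now(timezone.utc) - timedelta(minutes=1)).isoformat()
    for entity in entities:
        process(entity)
    storage[STORAGE_KEY] = new_sync  # Save to Storage with the new datetime
```

With Whole File decoding, the Save to Storage effectively runs once per page, which is why this behaves well.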
This pattern works great for most use cases. Putting the ‘Save to Storage’ for the new datetime on the entity-transformer side means that when the other API is unresponsive for a few hours, the next run will still execute using the older date, so you’re not missing any updates.
However, I’ve recently started using the Response Decoder: JSON / Incremental, because it is significantly less memory-hungry. The pattern above no longer works in that case, since it treats each entity as its own separate page. So although it’s still possible, if you get 1,000 entities, you’ll save the datetime to storage 1,000 times. And if processing takes a long while, you’ll save a datetime that is significantly ‘wrong’.
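To make that concrete, here is a minimal sketch of what the incremental case effectively does; the storage dict and key are again just stand-ins:

```python
from datetime import datetime, timezone


def run_incremental(entities, storage: dict) -> None:
    """With JSON / Incremental, each decoded entity behaves like its own page,
    so the 'Save to Storage' step fires once per entity instead of once per run."""
    for entity in entities:  # e.g. 1,000 iterations for 1,000 entities
        ...  # per-entity processing
        # This write now happens 1,000 times, and because processing takes time,
        # the last value written drifts away from when the API was actually called.
        storage["last_sync"] = datetime.now(timezone.utc).isoformat()
```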
So, my question is: what is your suggested best practice for this?
I think your approach is what we would suggest as a best practice. As you already know, it will write the datetime to the storage entity as many times as the number of entities subscribed.
However, if each entity has a last-modified datetime property, we would suggest saving that last-modified datetime instead of a relative datetime (1 minute or a few seconds ago). It also depends on whether the entities can be filtered on the REST API endpoint by adding a last-modified datetime filter (greater than the saved datetime). This approach avoids an invalid datetime filter on the next run, as it no longer depends on processing time.
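In code terms, and assuming the endpoint accepts a filter such as the hypothetical `modified_since` parameter below and the entities expose a field like `last_modified`, the idea would look something like this:

```python
import requests

ENDPOINT = "https://api.example.com/entities"  # placeholder
STORAGE_KEY = "last_modified_seen"             # hypothetical storage key


def run(storage: dict) -> None:
    since = storage.get(STORAGE_KEY, "1970-01-01T00:00:00+00:00")

    # Ask the endpoint only for entities modified after what was seen last time.
    response = requests.get(ENDPOINT, params={"modified_since": since})
    response.raise_for_status()

    newest = since
    for entity in response.json():
        ...  # process the entity
        # Track the entity's own last-modified value instead of "now minus a bit".
        # ISO-8601 strings in the same timezone compare correctly as plain text.
        newest = max(newest, entity["last_modified"])

    # What gets saved no longer depends on how long processing took.
    storage[STORAGE_KEY] = newest
```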
Feel free to let us know if I missed out on anything.
It could be time-consuming. I tried saving a value for 8,000 separate entities, and it took 9 seconds to finish. Of course, doing it many thousands of times will take much longer. It also depends on how good the server’s specification is. Would you please let us know whether you have any feedback for us about it?
I think the best solution would be - but I’m not sure this is architecturally possible - a sort of ‘wrap-around’ transformer which doesn’t interact with the body data anymore (since that would take a lot of memory, etc.).
So the current process is:
- Call system (once)
- Paginate on pages (multiple times)
- Branch to entities (thousands of times)
It would be great if we had an extra “setting” for the HTTP transformer that allows us to write a datetimestamp to a storage (note: it should be the datetimestamp of the first call to the system), but only write it once after all processing (so just before the process ‘finishes’, to make sure you didn’t get any errors).
I can’t be sure, but I think this should be possible on a process level, especially if you don’t read from the data.
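In plain code, the behaviour I’m after would look roughly like the sketch below; this is not existing functionality, and the endpoint, parameter and storage names are just placeholders.

```python
from datetime import datetime, timezone

import requests

ENDPOINT = "https://api.example.com/entities"  # placeholder
STORAGE_KEY = "last_sync"                      # hypothetical storage key


def run(storage: dict) -> None:
    since = storage.get(STORAGE_KEY)
    started_at = datetime.now(timezone.utc).isoformat()  # datetimestamp of the first call

    page = 1
    while True:  # paginate on pages (multiple times)
        response = requests.get(
            ENDPOINT, params={"updated_since": since, "page": page}
        )
        response.raise_for_status()
        entities = response.json()
        if not entities:
            break
        for entity in entities:  # branch to entities (thousands of times)
            ...  # per-entity work; it never touches the stored timestamp
        page += 1

    # Only reached when every page and entity was handled without an error,
    # so the timestamp is written exactly once, just before the process finishes.
    storage[STORAGE_KEY] = started_at
```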
Could you please confirm whether I understood it correctly? You need to subscribe to all the entities and branch to them (automatically, due to the Incremental read method), but only write the timestamp for the last entity, or just before the whole process finishes.
Thank you for confirming. As you may already know, that is not something we currently support. Therefore, we would like to pass this on to the team to see whether it can be implemented in the future.