I’m wondering about something. Generally I like to use this pattern for getting entities (doesn’t really matter what) from a REST API endpoint:

Input Transformers
- Value Setter to set the datetime in the format accepted by the REST API endpoint, set back by for example 4 hours
- Get From Storage, where I get the datetime value from the storage

Response Decoder
- Left on the default, so JSON / Whole File, without entering a pattern

Entity Transformers
- Value Setter, setting the date to now, or for example 1 minute or a few seconds ago
- Save to Storage with the new datetime
- Get Branches to the entities
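For reference, stripped of the transformer names, the logic of that route amounts to roughly the Python sketch below. The endpoint URL, the `updated_since` parameter, the storage dict and the `process` function are placeholders I’m assuming for illustration, not actual platform configuration.

```python
from datetime import datetime, timedelta, timezone

import requests

ENDPOINT = "https://api.example.com/entities"  # placeholder endpoint
STORAGE_KEY = "last_sync"                      # hypothetical storage key


def process(entity: dict) -> None:
    """Placeholder for whatever the route does with each entity."""
    print(entity.get("id"))


def run(storage: dict) -> None:
    # Input Transformers: default the filter to "now minus 4 hours",
    # then use the value kept in storage when one exists.
    fallback = (datetime.now(timezone.utc) - timedelta(hours=4)).isoformat()
    since = storage.get(STORAGE_KEY, fallback)

    # Response Decoder: JSON / Whole File -> one body containing all entities.
    response = requests.get(ENDPOINT, params={"updated_since": since})
    response.raise_for_status()
    entities = response.json()

    # Entity Transformers: note the new sync moment ("now", or a minute ago
    # to stay on the safe side), branch to the entities, then persist it.
    new_sync = (datetime.now(timezone.utc) - timedelta(minutes=1)).isoformat()
    for entity in entities:
        process(entity)
    storage[STORAGE_KEY] = new_sync  # Save to Storage with the new datetime
```

With Whole File decoding, the Save to Storage effectively runs once per page, which is why this behaves well.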
This pattern works great for most use cases. Putting the ‘Save to Storage’ for the new datetime on the entity-transformer side means that when the other API is unresponsive for a few hours, the next run will still execute using the older date, so you’re not missing any updates.
However, I’ve recently started using the Response Decoder: JSON / Incremental, because it is significantly less memory-hungry. The pattern above no longer works in that case, since it treats each entity as its own separate page. So although it’s still possible, if you get 1,000 entities, you’ll save the datetime to storage 1,000 times. And if processing takes a long while, you’ll save a datetime that is significantly ‘wrong’.
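To make that concrete, here is a minimal sketch of what the incremental case effectively does; the storage dict and key are again just stand-ins:

```python
from datetime import datetime, timezone


def run_incremental(entities, storage: dict) -> None:
    """With JSON / Incremental, each decoded entity behaves like its own page,
    so the 'Save to Storage' step fires once per entity instead of once per run."""
    for entity in entities:  # e.g. 1,000 iterations for 1,000 entities
        ...  # per-entity processing
        # This write now happens 1,000 times, and because processing takes time,
        # the last value written drifts away from when the API was actually called.
        storage["last_sync"] = datetime.now(timezone.utc).isoformat()
```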
So, my question is: what is your suggested best practice for this?
I think your approach is what we would suggest as a best practice. As you already know, it will write the datetime to the storage entity as many times as the number of entities subscribed.
However, if each entity has a last-modified datetime property, we would suggest saving that last-modified datetime instead of a relative datetime (1 minute or a few seconds ago). It also depends on whether the entities can be filtered on the REST API endpoint by adding a last-modified datetime filter (greater than the saved datetime). This approach avoids an invalid datetime filter on the next run, as it no longer depends on processing time.
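In code terms, and assuming the endpoint accepts a filter such as the hypothetical `modified_since` parameter below and the entities expose a field like `last_modified`, the idea would look something like this:

```python
import requests

ENDPOINT = "https://api.example.com/entities"  # placeholder
STORAGE_KEY = "last_modified_seen"             # hypothetical storage key


def run(storage: dict) -> None:
    since = storage.get(STORAGE_KEY, "1970-01-01T00:00:00+00:00")

    # Ask the endpoint only for entities modified after what was seen last time.
    response = requests.get(ENDPOINT, params={"modified_since": since})
    response.raise_for_status()

    newest = since
    for entity in response.json():
        ...  # process the entity
        # Track the entity's own last-modified value instead of "now minus a bit".
        # ISO-8601 strings in the same timezone compare correctly as plain text.
        newest = max(newest, entity["last_modified"])

    # What gets saved no longer depends on how long processing took.
    storage[STORAGE_KEY] = newest
```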
Feel free to let us know if I missed out on anything.
It could be time-consuming. I tried saving a value for 8,000 separate entities, and it took 9 seconds to finish. Of course, doing it many thousands of times will take much longer. It also depends on how good the server’s specification is. Would you please let us know whether you have any feedback for us about it?
I think the best solution would be - but I’m not sure this is architecturally possible - a sort of ‘wrap-around’ transformer which doesn’t interact with the body data anymore (since that would take a lot of memory, etc.).
So the current process is:
- Call system (once)
- Paginate on pages (multiple times)
- Branch to entities (thousands of times)
It would be great if we had an extra “setting” for the HTTP transformer that allows us to write a datetimestamp to a storage (note: it should be the datetimestamp of the first call to the system), but only write it once after all processing (so just before the process ‘finishes’, to make sure you didn’t get any errors).
I can’t be sure, but I think this should be possible on a process level, especially if you don’t read from the data.
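In plain code, the behaviour I’m after would look roughly like the sketch below; this is not existing functionality, and the endpoint, parameter and storage names are just placeholders.

```python
from datetime import datetime, timezone

import requests

ENDPOINT = "https://api.example.com/entities"  # placeholder
STORAGE_KEY = "last_sync"                      # hypothetical storage key


def run(storage: dict) -> None:
    since = storage.get(STORAGE_KEY)
    started_at = datetime.now(timezone.utc).isoformat()  # datetimestamp of the first call

    page = 1
    while True:  # paginate on pages (multiple times)
        response = requests.get(
            ENDPOINT, params={"updated_since": since, "page": page}
        )
        response.raise_for_status()
        entities = response.json()
        if not entities:
            break
        for entity in entities:  # branch to entities (thousands of times)
            ...  # per-entity work; it never touches the stored timestamp
        page += 1

    # Only reached when every page and entity was handled without an error,
    # so the timestamp is written exactly once, just before the process finishes.
    storage[STORAGE_KEY] = started_at
```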
Could you please confirm whether I understood it correctly? You need to subscribe to all the entities and branch to them (automatically, due to the Incremental read method), but only write the timestamp for the last entity, or just before the whole process finishes.
Thank you for confirming. As you may already know, that is not something we currently support. Therefore, we would like to pass this on to the team to see whether it can be implemented in the future.