Task Deduplication

floris · January 16, 2024, 8:52am

Hi Alumio,

I’m running into more and more use cases where the body of a task is really just a placeholder saying “this particular entity has to be updated”. This generally happens when the actual content of a task is too big (so you have to place the update in a storage or something similar) or when the ‘update trigger’ (i.e. a particular system saying it has been updated) doesn’t actually give you the entire entity you’re working with.

For those use cases, a ‘task deduplication’ option on the route would be great. Basically, before creating a task, it would check within that route whether there is already a new task with that exact entity identifier and, if so, skip creating another one.
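
To make the request concrete, here is a minimal Python sketch of the behaviour described above. It is not Alumio's API: `Route`, `Task`, `create_task` and the status values are hypothetical names chosen for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Task:
    entity_id: str
    status: str = "new"  # hypothetical statuses: "new", "processing", "finished", "failed"


@dataclass
class Route:
    tasks: List[Task] = field(default_factory=list)

    def create_task(self, entity_id: str) -> Optional[Task]:
        """Create a task unless a 'new' task for the same entity already exists."""
        for task in self.tasks:
            if task.entity_id == entity_id and task.status == "new":
                return None  # a placeholder task is already waiting: skip creation
        task = Task(entity_id)
        self.tasks.append(task)
        return task


route = Route()
first = route.create_task("company-1001")   # created
second = route.create_task("company-1001")  # skipped, a new task already exists
first.status = "finished"
third = route.create_task("company-1001")   # created again, the earlier task is done
```

Only tasks that are still waiting count as duplicates in this sketch, so an entity that changes again after its task has been processed still gets a fresh task.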

Gugi · January 18, 2024, 2:43pm

Hi Floris,

Thank you for your feedback.

It seems that kind of functionality is already covered by “Filter previously stored entities” or the entity filter “Filter by storage entities”, right? Or do you need a handier option within the route itself?

floris · January 22, 2024, 9:46am

Hi Gugi,

Let’s say you have a “customer” entity. This customer entity needs to be sent in full to the receiving system. However, you get the actual data to build it from a few different sources:

An ‘Addresses’ endpoint
A ‘Contacts’ endpoint
A ‘Customer Group’ endpoint
A ‘Company’ endpoint

Each has its own updated_at timestamp, and adding, for example, a Contact to the Company will not ‘update’ the Company itself.

So to make this work, you poll each of these endpoints for changes, and when you find a change, you take the Company Number and save it to a storage (= your queue). You then add an incoming that reads that particular storage and creates tasks for a full “update customer/company entity” route.

The “Filter previously stored entities” filter will not really work in this case, because you don’t actually have the full entity before running the route.
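
The polling approach described above can be sketched roughly as follows. This is plain Python for illustration, not Alumio configuration: the endpoint names, the sample change data and `collect_changed_companies` are all assumptions.

```python
from datetime import datetime

# Hypothetical change feeds: each endpoint reports (company_number, updated_at).
changes_by_endpoint = {
    "addresses": [("C-100", datetime(2024, 1, 22, 9, 0))],
    "contacts": [
        ("C-100", datetime(2024, 1, 22, 9, 5)),
        ("C-200", datetime(2024, 1, 22, 9, 6)),
    ],
    "customer_groups": [],
    "company": [("C-200", datetime(2024, 1, 22, 9, 7))],
}


def collect_changed_companies(changes, since):
    """Return a storage-like dict keyed by company number, one entry per company."""
    queue = {}
    for endpoint, rows in changes.items():
        for company_number, updated_at in rows:
            if updated_at > since:
                # Keying by company number collapses repeated changes into one entry.
                queue.setdefault(company_number, set()).add(endpoint)
    return queue


queue = collect_changed_companies(changes_by_endpoint, since=datetime(2024, 1, 22, 8, 0))

# The "incoming" step: one placeholder task per queued company number.
tasks = [{"company_number": number} for number in sorted(queue)]
```

Note that each placeholder task carries only the company number. The full customer entity is built later, inside the route.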

Gugi · January 23, 2024, 10:06am

Hi Floris,

Thank you for the thorough explanation.

Our existing entity filter “Filter by storage entities” can filter out entities or skip tasks whose identifier already exists in a storage. Please see the example below.

Let’s say we already have a storage filled with entities, such as below.

Each storage entity consists of id and name properties.

You can use “Filter by storage entities” with Condition “An item with the identifier must exist in the storage” in order to filter out new entities coming in.

Even though the incoming entity only has the identifier, that is enough for Alumio to filter out the entity, because it has the same identifier as the one stored in the storage.
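
As a rough illustration of identifier-only matching, here is a plain-Python sketch. It does not reproduce the actual “Filter by storage entities” configuration: the sample storage contents, the `filter_by_storage` helper and the choice to drop matching entities are assumptions made for the example.

```python
# Hypothetical storage contents, keyed by identifier (each entity has id and name).
storage = {
    "1": {"id": "1", "name": "First entity"},
    "2": {"id": "2", "name": "Second entity"},
}


def filter_by_storage(entities, storage, id_key="id"):
    """Drop every incoming entity whose identifier already exists in the storage."""
    return [entity for entity in entities if str(entity[id_key]) not in storage]


# Incoming entities carry only the identifier, not the full entity.
incoming = [{"id": "1"}, {"id": "3"}]
remaining = filter_by_storage(incoming, storage)
```

Only the identifier is compared here, so the incoming entity does not need a name or any other property for the match to work.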

It’s not as handy as the “Task deduplication” option in the route, but this might help you to achieve your objective with the current functionalities.

Feel free to let me know if this is not the case.

floris · January 23, 2024, 12:34pm

Hi @Gugi ,

Wherever possible, I do use comparable techniques.

The thing is that a task is computationally expensive for Alumio, so being able to prevent a task from being created at all would be preferable.

Alumio_Bot · January 24, 2024, 10:41am

@Gugi I do like this feature.
With it, you don’t need to set up any storages.

It would be especially useful with overlapping batch syncs (XML and the like).

Gugi · January 25, 2024, 7:25am

Hi @floris You can put the entity filter in the incoming configuration or in the route so that no task will be created.

@Alumio_Bot Do you mean that the identifier would be compared to the existing tasks, or we would need an “invisible storage” to log all the processed identifiers?

floris · January 25, 2024, 8:47am

Hi @Gugi ,

In many different routes, you don’t actually have the entire entity in the incoming. The entity is basically constructed inside of a task. So that wouldn’t work.

Gugi · January 25, 2024, 11:47am

Hi @floris,

The entity filter only needs the identifier and doesn’t need the entire entity to compare, as I mentioned before.

floris · January 25, 2024, 12:00pm

Hi @Gugi ,

But that’s not an option for the process as I described it.

You could, I suppose, create a ‘save to storage’ where you only place the entity identifier. That storage would then save all “active tasks”. And within the Outgoing you’d delete that entity from the storage. But then what happens when the Outgoing fails? The entity would then never be re-synced! (Note that you could somewhat fix this by utilizing TTLs etc.)
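
The workaround and its failure mode can be sketched in plain Python like this. `ActiveTaskStore`, `try_claim` and `release` are hypothetical names, and the expiry logic only stands in for whatever TTL mechanism a real storage would offer.

```python
import time


class ActiveTaskStore:
    """Tracks identifiers of entities that currently have an active task."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._expires_at = {}

    def try_claim(self, entity_id):
        """Return True if a task may be created, False if one is still active."""
        now = self.clock()
        expires = self._expires_at.get(entity_id)
        if expires is not None and expires > now:
            return False  # an unexpired entry exists: skip the duplicate
        self._expires_at[entity_id] = now + self.ttl
        return True

    def release(self, entity_id):
        """Called from the outgoing step after a successful sync."""
        self._expires_at.pop(entity_id, None)


# A fake clock keeps the example deterministic.
now = [0.0]
store = ActiveTaskStore(ttl_seconds=60, clock=lambda: now[0])

claimed_first = store.try_claim("C-100")      # task created
claimed_again = store.try_claim("C-100")      # duplicate skipped
now[0] = 61.0                                 # outgoing failed, release never ran
claimed_after_ttl = store.try_claim("C-100")  # entry expired, re-sync possible
```

Without the TTL, a failed outgoing would leave the identifier in the storage forever and block every later update for that entity. With it, the entity is only blocked until the entry expires.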

So no, that’s not an option for the situation I’m mentioning.

Gugi · November 26, 2024, 4:38pm

Hi @floris,

We apologize for the delay in responding to your question—it was missed due to an oversight. Here’s the answer to your last reply:

It could be possible by caching the entity identifier in some kind of invisible storage. However, it might increase over time, and it would take more time to query whether an entity identifier was ever created in the past. But I’ll pass your question on to the corresponding team to check further on the possibility. We will let you know once we have an update.

We’ve since improved our system to ensure all questions are addressed promptly in the future. Please let us know if you have any other questions!

Gugi · November 27, 2024, 8:23am

Hi @floris

We’ve carefully reviewed it, and unfortunately, it’s not something we’re planning to build at this time.

While we won’t be implementing this feature, we truly value your feedback and encourage you to share any other ideas you have. Your input helps shape our future improvements.

Thank you for your understanding and for being a valued part of Alumio.

TMourikSH · January 7, 2025, 8:17am

We completely agree that this feature would be a valuable addition to the application. It’s something we would really like to see implemented, as it currently feels like a significant gap in the functionality.

Gugi · January 8, 2025, 11:40am

Hi @TMourikSH

Thank you for sharing your thoughts. We understand why this feature feels important and appreciate your perspective.

While we agree it could add value, implementing it would require significant time and resources, which makes it unfeasible for our current roadmap. Our focus is on improvements that can benefit the majority of users in a timely manner.

We value your feedback and will keep it in mind as we evolve the platform. Please feel free to share any other ideas or questions.