Scheduler: continue scheduled run after one incoming throws an error

Hi all,

I have a question about schedulers, specifically about what happens when one of the jobs (either a Run Incoming or a Run Outgoing job) fails and throws an error.

Our situation

  • We have a scheduler that runs every night.

  • This scheduler contains approximately 10 consecutive jobs.

  • The first job runs an incoming configuration that retrieves a large number of products (around 100,000) from an external system.

  • Unfortunately, the API of this external system is unstable. Roughly half of the time, it returns an HTTP 404 error at a random point during execution.

  • We have contacted the API provider and they are working on performance improvements, but this is not expected to be resolved in the short term.

The problem

Because the error occurs at a random moment, Alumio may have already fetched 50%, 80%, or even 99% of the products before the API fails. However, when the error occurs:

  • The entire scheduler fails.

  • None of the downstream jobs in the scheduler are executed.

  • The data that was successfully fetched before the error does not seem to be passed on.

My question

Is there a way to ensure that the products that were successfully fetched before the API returns a 404 are still processed and passed to the subsequent jobs in the scheduler?

At the moment, a partial failure causes the entire scheduler run to fail, and no data reaches the remaining jobs.

Thanks in advance for any insights!

Hi Ties,

Thank you for explaining the issue thoroughly.

The scheduler is responsible only for triggering the configured jobs according to the schedule. It has no awareness of, nor dependency on, the data being processed within those jobs.

Because of this, the scheduler cannot pass data from one job to the next. If subsequent jobs need to continue processing data from a previous run, especially when an error occurs, the data must be stored persistently (for example, in a storage). Any following job that needs to continue processing should explicitly read the required data from that persistent storage and handle it accordingly.
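
To make that pattern concrete outside Alumio, here is a minimal Python sketch of the idea. The endpoint, the paging scheme, and the local JSON-lines file are assumptions made purely for illustration: the file plays the role of a storage, the first function plays the role of the incoming configuration that fetches and persists the data, and the second function plays the role of a downstream job that reads from that storage.

```python
# Minimal sketch of the "persist first, process later" pattern, outside Alumio.
# The endpoint URL, the paging scheme, and the local JSON-lines file are
# hypothetical; the file simply stands in for a persistent storage.
import json
import requests

API_URL = "https://example.com/api/products"  # hypothetical unstable API
STORAGE_FILE = "products.jsonl"               # stands in for a storage


def fetch_and_store():
    """Fetch products page by page, persisting each page immediately,
    so everything retrieved before a failure is kept."""
    page = 1
    with open(STORAGE_FILE, "a", encoding="utf-8") as storage:
        while True:
            try:
                response = requests.get(API_URL, params={"page": page}, timeout=30)
                response.raise_for_status()
            except requests.RequestException as exc:
                # The unstable API failed (for example a random 404):
                # stop fetching, but keep everything already persisted.
                print(f"Fetch stopped on page {page}: {exc}")
                break
            products = response.json()
            if not products:
                break  # no more pages to fetch
            for product in products:
                storage.write(json.dumps(product) + "\n")
            page += 1


def process_stored_products():
    """A downstream job reads whatever was persisted and processes it,
    independently of whether the fetch run completed."""
    with open(STORAGE_FILE, encoding="utf-8") as storage:
        for line in storage:
            product = json.loads(line)
            # ... transform the product and send it to the target system ...
```

The key point is that the downstream step never depends on the fetch finishing cleanly; it only depends on what has already landed in the storage.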

To better understand the setup, could you also clarify the order in which the scheduler executes the jobs? You don’t need to share the name of the incoming configuration or route if that information is sensitive.

Hi Gugi,

I first run an incoming configuration which fetches data from the source system, and I store the data inside the incoming configuration immediately.

Then I run full routes (incoming + outgoing configurations) which need the data fetched and stored by that first incoming configuration. The problem is that the whole scheduler stops if the first incoming configuration fails at some point.

I have currently set up a workaround by creating two separate schedulers instead of a single one:

  • One scheduler which fetches and stores the data (this is the one that fails sometimes)
  • One scheduler which contains all other configurations

This does seem to be a solid workaround, since processing continues even if the initial incoming fails.
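
Conceptually, and outside Alumio, the setup looks roughly like the sketch below, using the third-party `schedule` package. The times and job bodies are placeholders; the point is only that the two jobs run on independent schedules, so a failing fetch no longer blocks the processing run.

```python
# Conceptual sketch of the two-scheduler workaround, outside Alumio, using the
# third-party `schedule` package. Times and job bodies are placeholders.
import time

import schedule


def fetch_and_store_job():
    # Scheduler 1: the incoming configuration that fetches and stores the
    # products; this is the job that sometimes fails with a 404.
    ...


def processing_jobs():
    # Scheduler 2: the full routes that read the previously stored data.
    # These run regardless of whether last night's fetch completed.
    ...


schedule.every().day.at("01:00").do(fetch_and_store_job)
schedule.every().day.at("03:00").do(processing_jobs)

while True:
    schedule.run_pending()
    time.sleep(60)
```

In the sketch the second schedule starts a couple of hours after the first, so the processing only begins once the fetch window has passed.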

Thank you for the explanation.

Could you please let us know how you store the data inside the incoming configuration? In addition, have you found out what caused the first incoming configuration to fail?

In any case, it is expected behavior for a scheduler to stop executing the remaining scheduled jobs if an earlier job fails.

Agreed. As long as the schedulers are configured to run in sequence, there should be no problem.