Retrieving a large number of items from HTTP in a transformer

o.plokha · July 2, 2025, 1:10pm

Introduction

In some data routes, a transformer needs to fetch a large number of items over HTTP. For example, when working with Microsoft Dynamics, a queue can be used. The first step is to retrieve a queue item, which contains a URL for the actual work items. In Alumio, such a scenario can be implemented by configuring an incoming configuration to get the queue item and a transformer to fetch the work items.

In this guide, a Microsoft Dynamics work queue is used to demonstrate a possible implementation, but the approach also applies to similar scenarios.

Steps to be implemented

Retrieving a queue item

A queue item is fetched from Microsoft Dynamics by calling the /api/connector/dequeue/ API endpoint. The queue item contains a download location, such as:

{
  "DownloadLocation":"https://<client>.dynamics.com:443/api/connector/download/12345678-1234-1234-1234567890ab"
}
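
As an illustration outside Alumio, the lookup of the download location can be sketched in a few lines of Python. This is only a minimal sketch: the function name extract_download_location and the example.dynamics.com host are assumptions made for the example, and only the DownloadLocation key comes from the response shown above.

```python
import json


def extract_download_location(response_body: str) -> str:
    """Return the DownloadLocation URL from a dequeue response body."""
    queue_item = json.loads(response_body)
    location = queue_item.get("DownloadLocation")
    if not location:
        # An empty queue or an unexpected response has no URL to follow.
        raise ValueError("queue item has no DownloadLocation")
    return location


# Hypothetical sample body; the host name is a placeholder.
sample = (
    '{"DownloadLocation": "https://example.dynamics.com:443'
    '/api/connector/download/12345678-1234-1234-1234567890ab"}'
)
print(extract_download_location(sample))
```

Failing early on a missing key keeps the later download step from being called with an empty URL.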

Retrieving a ZIP with XML files

The URL in the download location is fetched. The response is a large ZIP file containing XML files. Each XML file contains a large number of items.

For example, an XML file can look as follows:

<Document>
   <CUSTCUSTOMERV3ENTITY>
       ...
   </CUSTCUSTOMERV3ENTITY>
   <CUSTCUSTOMERV3ENTITY>
       ...
   </CUSTCUSTOMERV3ENTITY>
</Document>

Each CUSTCUSTOMERV3ENTITY tag contains customer data.

Reading the ZIP and parsing the XML files

The ZIP file is read and the XML files in it are parsed. Each item from the XML files is processed separately to limit memory usage.

Retrieving the queue item

An HTTP subscriber can be used to fetch the queue item.

The response is parsed as JSON. Because the response is just a small JSON file, “Whole file” can be chosen as the read method.

Retrieving a ZIP with XML files

The ZIP in the download location is retrieved in a transformer step. The transformer step will receive the contents of the queue item that was retrieved in the subscriber step. A placeholder can be used to pass the download location as the URL to the HTTP transformer.

To process items from the response separately, select “Split up: Get items from HTTP request”. This transformer differs from the “HTTP transformer” in that it can return parsed items one by one.

Reading the ZIP and parsing the XML files

To read the ZIP, the “Archive (zip, tar, gz, bz2)” decoder is configured. This decoder opens the ZIP file and gets a list of the files in it. Multiple file processors can be configured to specify which files should be processed and which parser should be used.
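
To make the idea of selecting files from an archive concrete, here is a rough Python sketch. It is not how Alumio implements the decoder. The function name iter_matching_files, the glob pattern argument, and the in-memory sample archive are assumptions made for the example.

```python
import fnmatch
import io
import zipfile


def iter_matching_files(archive, pattern="*.xml"):
    """Yield (name, open file handle) for each archive member matching pattern."""
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            if fnmatch.fnmatch(info.filename, pattern):
                # Members are opened one at a time, so only one is read at once.
                with zf.open(info) as handle:
                    yield info.filename, handle


# Build a tiny in-memory ZIP to stand in for the downloaded archive.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("customers.xml", "<Document/>")
    zf.writestr("readme.txt", "ignored")
buffer.seek(0)

names = [name for name, _ in iter_matching_files(buffer)]
print(names)  # ['customers.xml']
```

Each matching member is handed to the caller as a stream, so a parser can consume it without the whole archive being extracted first.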

For the pattern, *.xml is used to process all XML files in the ZIP. For the parser, XML is chosen, with “Incremental (for large files)” as the read method. With this read method, the XML file is read incrementally and the parser looks for opening and closing tags in the XML to return items one by one. The path is configured as Document (the name of the root node in the XML) followed by CUSTCUSTOMERV3ENTITY (the tag for each item in the XML).
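
The incremental read method can be illustrated with a short Python sketch based on the standard library's iterparse. This is an approximation of the technique, not Alumio's parser. The function name iter_items and the NAME child element in the sample are hypothetical. Only Document and CUSTCUSTOMERV3ENTITY come from the example XML in this guide.

```python
import io
import xml.etree.ElementTree as ET


def iter_items(source, root_tag="Document", item_tag="CUSTCUSTOMERV3ENTITY"):
    """Yield each item element found at root_tag/item_tag, one at a time."""
    path = []      # tags of the elements that are currently open
    root = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            path.append(elem.tag)
            if root is None:
                root = elem
        else:
            if path == [root_tag, item_tag]:
                yield elem
                # Drop the processed item so memory use stays flat.
                root.clear()
            path.pop()


# Hypothetical sample; NAME is an invented field for the demonstration.
sample = b"""<Document>
  <CUSTCUSTOMERV3ENTITY><NAME>A</NAME></CUSTCUSTOMERV3ENTITY>
  <CUSTCUSTOMERV3ENTITY><NAME>B</NAME></CUSTCUSTOMERV3ENTITY>
</Document>"""

customer_names = [item.findtext("NAME") for item in iter_items(io.BytesIO(sample))]
print(customer_names)  # ['A', 'B']
```

Because each item is released after it has been handed to the caller, the caller should read what it needs from an item before requesting the next one.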

The result of this transformer is a list of customer items. Only one customer item is loaded in memory at a time.
