Remove node when url already exists

Hello,

I’m trying to create a transformer that removes image URLs that already exist in Shopify. I have the following data:

{
  "shopifyProduct": {
    "images": {
      "nodes": [
        {
          "url": "<url>/files/40785087-1.jpg?v=1729606668"
        },
        {
          "url": "<url>/files/40785087-2.jpg?v=1729606668"
        }
      ]
    }
  },
  "media": [
    {
      "originalSource": "<url>/16411820149/40785087-1.jpg"
    },
    {
      "originalSource": "<url>/16411820150/40785087-2.jpg"
    }
  ]
}

I want to remove an entry from the “media” node when its image already exists in “shopifyProduct.images.nodes”. The idea is to match on the filename (for example 40785087-1.jpg). As you can see, the Shopify URL contains a version query parameter (?v=...), which should be ignored when matching.

Does anyone have an idea how to fix this in Alumio? Thanks in advance!

Hi @yoeri,

Please follow the steps below to achieve this.

Suppose, for instance, that you have the following entity:

{
  "shopifyProduct": {
    "images": {
      "nodes": [
        {
          "url": "<url>/files/40785087-1.jpg?v=1729606668"
        },
        {
          "url": "<url>/files/40785087-2.jpg?v=1729606668"
        }
      ]
    }
  },
  "media": [
    {
      "originalSource": "<url>/16411820149/40785087-1.jpg"
    },
    {
      "originalSource": "<url>/16411820150/40785087-3.jpg"
    }
  ]
}

  1. First, copy the URLs of the existing Shopify images into a separate array, e.g. existingMedia, using Copy using a pattern.

  2. Extract only the filename from each URL using the String: File basename mapper, and remove the version parameter using “String: PCRE replace” followed by “String: Replace”.

    Note that the “String: PCRE replace” mapper requires the replacement text to be filled in. As a workaround, replace the match with any rarely used character, and then use “String: Replace” to replace that character with an empty string.

  3. Copy the array of the formatted URLs to each object in the media array using Recursively copy values to children.

  4. Loop over the media array using “Node, transform nodes”. If you can’t find this entity transformer, you are inside a “Data, transform nodes using mappers and conditions” transformer. In that case, use Execute entity transformers first, and then select “Node, transform nodes”, as shown below.

  5. Within each object, create a new variable (file) that holds the filename from the URL (originalSource) using the String: File basename mapper (point 1 in the picture below).

  6. Since each object has the list of media file names that exist in Shopify (as a result of Step 3), you can now check whether the current media file name (file) exists in the existing list (existingMedia) using JMESPath function contains (point 2 in the picture below).

    This yields a boolean: true means the media already exists in the Shopify product and needs to be removed from the media array, while false means it should be kept.

  7. All objects in the media array now have a property exists: true/false. Leave the loop and use a Value Setter with a JMESPath query to filter out the ones with exists: true.
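
For readers who want to check the logic outside Alumio, the steps above can be sketched in plain Python. This is only an illustration of the matching rule, not how Alumio implements its transformers; the function names file_name and remove_existing_media are made up for this example, and the regular expression is an assumption about what the PCRE replace step strips.

```python
import posixpath
import re


def file_name(url: str) -> str:
    """Return the file name of a URL, ignoring any query string such as ?v=..."""
    # Rough equivalent of the PCRE replace step: drop everything from "?" on.
    without_query = re.sub(r"\?.*$", "", url)
    # Rough equivalent of the file basename step: keep the last path segment.
    return posixpath.basename(without_query)


def remove_existing_media(entity: dict) -> dict:
    """Drop media entries whose file name already exists in the Shopify images."""
    # Steps 1-2: collect the file names of the images that are already in Shopify.
    nodes = entity.get("shopifyProduct", {}).get("images", {}).get("nodes", [])
    existing_media = [file_name(node["url"]) for node in nodes]

    # Steps 3-7: keep only the media whose file name is not in that list.
    media = [
        item
        for item in entity.get("media", [])
        if file_name(item["originalSource"]) not in existing_media
    ]
    return {**entity, "media": media, "existingMedia": existing_media}


entity = {
    "shopifyProduct": {"images": {"nodes": [
        {"url": "<url>/files/40785087-1.jpg?v=1729606668"},
        {"url": "<url>/files/40785087-2.jpg?v=1729606668"},
    ]}},
    "media": [
        {"originalSource": "<url>/16411820149/40785087-1.jpg"},
        {"originalSource": "<url>/16411820150/40785087-3.jpg"},
    ],
}

result = remove_existing_media(entity)
print(result["media"])
```

With the sample entity, only the 40785087-3.jpg entry remains in media, because the other file name also appears in the Shopify image list.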

The result of the above transformers is the entity below:

{
  "shopifyProduct": {
    "images": {
      "nodes": [
        {
          "url": "<url>/files/40785087-1.jpg?v=1729606668"
        },
        {
          "url": "<url>/files/40785087-2.jpg?v=1729606668"
        }
      ]
    }
  },
  "media": [
    {
      "originalSource": "<url>/16411820150/40785087-3.jpg"
    }
  ],
  "existingMedia": [
    "40785087-1.jpg",
    "40785087-2.jpg"
  ]
}

We hope this explanation is clear. We have also attached a sample entity transformer that implements the steps above.

export_20241025161824.ndjson (2.2 KB)

Feel free to let us know if you have any questions.

Hello @Gugi,

Thank you for the example transformer and the explanation of each step.

It works exactly as it should. We only added a transformer at the end to remove the “existingMedia” key.
