Pagination issue when using Azure Data Factory ETL

TL;DR

I'm trying to query the list of tenants using the Partner API in Azure Data Factory (ADF). The pagination functionality within ADF relies on there being some sort of reference to the next page in the response to the current page, which isn't the case in this situation. If anyone has any thoughts or (even better) Sophos are able to add something into the API responses referring to the next page, I'd be really grateful.

Detailed explanation

Hi,

I am working on pulling in various bits of data into a data lake from the Sophos Central API. I am using Azure Data Factory (ADF), which is Microsoft's cloud-based ETL tool.

I am querying the partner/v1/tenants endpoint to get a list of customer tenants. This API uses the "pagination by offset" method described in the Sophos docs.

The issue I have is that this method of pagination is not supported by ADF's built-in pagination functionality. Unfortunately, all the methods of pagination within ADF rely on there being some sort of link or reference within a response to the next page, whereas the pagination method used by the Partner API relies on the caller implementing some basic logic to increment a page counter until it hits the total page count supplied.

See here for the relevant docs describing the pagination support within ADF: https://docs.microsoft.com/en-us/azure/data-factory/connector-rest#pagination-support.

I have found some discussions online regarding a convoluted workaround for such scenarios. Effectively, you implement the pagination yourself using variables and loops within ADF. See here for more details: Unable to use the Pagination Rules in REST sources in ADF (microsoft.com).

I am in the process of setting up a test of the workaround at the moment. However, it was mentioned that one significant downside of this method is that you end up creating separate JSON files within the data lake for each page, rather than a single file. These files then need put through some transformation to combine them. None of this is a show-stopper, but is definitely a lot more work - especially if the logic has to be implemented for every API call that uses the "pagination by offset" method.

It seems like this is perhaps a situation where both Sophos's API and ADF are both partly the issue - the Sophos API in the sense the pagination mechanism isn't as good as many other REST APIs, and ADF for having limited pagination support.

It's quite likely there is little option but to go with the workaround, but if anyone has any thoughts on better solutions, that would be great. I appreciate it's perhaps asking a lot, but if Sophos were able to implement a better pagination mechanism at some point, I'm sure that would help me and others in a similar situation.

Thanks in advance.



Error in post title
[edited by: mdi-db at 3:37 PM (GMT -7) on 28 May 2021]