Hi,
We continuously query the Sophos Central API endpoints for alerts and events on a 5-minute basis.
We've noticed that there are missing logs every day.
The next cursor returned in each response is used in the next request to query for new data.
But at the end of the day, when I query for events for the entire day using a start date instead of the next cursor, all of the logs seem to be there.
So it seems events are somehow sneaking in after the response is returned by the API.
Could someone please let us know if anything is missing in the way we query for data?
Thank you
Ravi
The API is differential. If you have gotten a record down in a previous request, it won't return that record again.
Hi Ravi,
Can you show me the request you are making? Specifically how are you handling the pagination of the return?
RichardP
Snr. New Product Introduction Engineer | CISSP | Sophos Technical Support
Support Videos | Product Documentation | @SophosSupport | Sign up for SMS Alerts
If a post solves your question, use the 'Verify Answer' link.
Hi Richard,
I've put together the code below to make it simpler. The initial request does not contain a next_cursor (or contains an invalid cursor value).
From the response, the next cursor is stored in the DB and re-used in the next request, and so on. Hope that explains it.
import requests

url = 'https://api3.central.sophos.com/gateway/siem/v1/events'

def get_header():
    return {
        'Content-Type': 'application/json; charset=utf-8',
        'Accept': 'application/json',
        'X-Locale': 'en',
        'Authorization': 'Basic xxxxxx',
        'x-api-key': 'xxxxxx',
    }

def get_event_response(event_next_cursor):
    response = requests.get(url, headers=get_header(),
                            params={'cursor': event_next_cursor}, timeout=120)
    return response.json()

def get_events(event_next_cursor=None):
    log_count = 0
    has_more_flag = True
    while has_more_flag:
        event_response_json = get_event_response(event_next_cursor)
        events = event_response_json.get('items', [])
        event_next_cursor = event_response_json.get('next_cursor', '')
        log_count += len(events)
        # post-processing on the events is done here
        # store event cursor in the database
        # update_latest_event_cursor(event_next_cursor)
        has_more_flag = event_response_json.get('has_more', False)
    return log_count

get_events()
So, you are using the old SIEM API that we are replacing.
Have you tried out the script we provide to make sure the data set actually exists?
If you want to see the new API, the documentation is here: https://developer.sophos.com/
The data actually exists when you query for 6-7 hours in one go. But we query the API endpoint every 5 minutes, and this approach does not seem to be working, i.e., logs are missing as a result.
Are you seeing only new data in the 5 min intervals and the 'missing' data is the stuff from the previous requests?
Thank you for that Richard.
I understand that logs received in the previous request are not sent again in the next request. What is happening is illustrated in the example below.
Current time : 2020-12-11 08:00
Request 1 (fromDate is 2020-12-11 08:00)-> Response 1 -> 100 logs
Request 2 (use nextCursor from the previous response)-> Response 2 -> 50 logs
and so on until Request 100.
We were then informed that some logs are missing when compared with a dump of logs from Sophos Admin area.
So, I again made a request with the date from the first request, i.e.,
Current time: 2020-12-11 15:00
Request 101 with fromDate 2020-12-11 08:00, and this time I see more logs than were received over the previous 100 requests. For example, if the first 100 requests returned 1000 logs in total, request 101 returned 1050: the original 1000 plus the 50 missing ones.
What I'm trying to say is: when you query every 5 minutes for an entire day, at the end of the day there are missing logs.
But if you don't query every 5 minutes, and instead query at the end of the day for the entire day's logs, you do receive all the logs, including the missing ones, i.e., the ones that were not seen when querying every 5 minutes.
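For anyone wanting to reproduce this comparison, here is a minimal sketch of the reconciliation check described above. It assumes each event carries a stable unique key (shown here as a hypothetical 'id' field; verify the field name against your actual payloads):

```python
# Sketch: compare events gathered via 5-minute cursor polling with a
# single full-day query, to identify which events the cursor run missed.
# Assumes each event dict has a unique 'id' field (check your actual
# payloads; any stable unique key works).

def find_missing_events(cursor_events, full_day_events, key='id'):
    """Return events present in the full-day query but absent from
    the incremental cursor-based collection."""
    seen = {e[key] for e in cursor_events}
    return [e for e in full_day_events if e[key] not in seen]

# Example with toy data:
cursor_run = [{'id': 'a'}, {'id': 'b'}]
full_day = [{'id': 'a'}, {'id': 'b'}, {'id': 'c'}]
missing = find_missing_events(cursor_run, full_day)
print(missing)  # [{'id': 'c'}]
```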
I am looking into this for you and will reply when I have more information.
Thank you. Could you let us know if there is any update on this yet? Could you also confirm when support for the old API that we are currently using will end?
Not at this time. We are also into the holidays, so I don't expect any update until the new year.
Hi Richard. Any update on this yet, please?
Not as of yet, I will chase for you.
Yes please. Thank you
Hi Richard. Any progress on this yet, please?
Not at this time. Apologies.
FYI: the development team is looking at this.
I got the following statement back from development:
The variance in the events data retrieved stems from the difference between the time recorded for event generation and the time of event ingestion. For example, an endpoint could generate an event but not send it immediately due to various scenarios (endpoint is offline, bandwidth constraints, etc.). Events are retrieved based on ingestion time both for the UI and the API, so an event generated earlier may appear later in the list.
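Given that explanation (events are indexed by ingestion time, so late-arriving events surface later), one defensive pattern is to poll with a small look-back overlap and de-duplicate by a unique event key. This is a sketch, not an official Sophos recommendation; the 'id' key, the overlap size, and the fetch_events helper are all assumptions:

```python
# Sketch: poll with an overlapping look-back window and de-duplicate,
# so events ingested late are still picked up. The 'id' key and the
# fetch_events(from_ts) helper are placeholders, not the real Sophos API.
from datetime import datetime, timedelta, timezone

OVERLAP = timedelta(minutes=10)  # look back further than the poll interval
seen_ids = set()                 # in production, persist this (with expiry)

def poll_once(fetch_events, poll_time):
    """Fetch events since (poll_time - OVERLAP), skipping duplicates."""
    window_start = poll_time - OVERLAP
    new_events = []
    for event in fetch_events(window_start):
        if event['id'] not in seen_ids:
            seen_ids.add(event['id'])
            new_events.append(event)
    return new_events

# Toy usage: the second poll re-fetches an overlapping window, but only
# the genuinely new event is returned.
now = datetime.now(timezone.utc)
batch1 = poll_once(lambda ts: [{'id': 1}, {'id': 2}], now)
batch2 = poll_once(lambda ts: [{'id': 2}, {'id': 3}], now + timedelta(minutes=5))
print([e['id'] for e in batch1], [e['id'] for e in batch2])  # [1, 2] [3]
```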
Hi Richard. Could you confirm this? When has_more is False in a response, does that cursor need to be used again and again until the log count reaches 200, which is the default number of logs sent? At the same time, there is the next cursor that needs to be used as well. Does this mean that we have to store both the previous cursor, which returned fewer than 200 logs, and the next cursor?
Currently, where there is a next_cursor value supplied, we discard/ignore the has_more value and always use next_cursor to get the next batch of logs. But it seems that as long as a cursor's log count is not 200, you cannot discard it. Is this right?
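To make the question concrete, here is a sketch of one possible cursor-handling strategy (an assumption awaiting Sophos confirmation, not documented behaviour): always advance to next_cursor within a cycle, and when has_more is False, persist the latest cursor and re-poll it on the next 5-minute cycle. get_event_response and the save/load helpers are placeholders mirroring the script earlier in the thread:

```python
# Sketch of the pagination strategy under discussion: advance to
# next_cursor each page, and when has_more is False, persist the latest
# cursor and resume from it on the next 5-minute cycle.

def drain_events(get_event_response, load_cursor, save_cursor):
    """Fetch all currently-available pages, then persist the resume cursor."""
    cursor = load_cursor()
    events = []
    while True:
        page = get_event_response(cursor)
        events.extend(page.get('items', []))
        cursor = page.get('next_cursor', cursor)  # keep old cursor if absent
        if not page.get('has_more', False):
            break
    save_cursor(cursor)  # next cycle re-polls this cursor for new events
    return events

# Toy usage with two simulated pages:
pages = iter([
    {'items': [1, 2], 'next_cursor': 'c1', 'has_more': True},
    {'items': [3], 'next_cursor': 'c2', 'has_more': False},
])
stored = {}
out = drain_events(lambda c: next(pages),
                   lambda: None,
                   lambda c: stored.update(cursor=c))
print(out, stored['cursor'])  # [1, 2, 3] c2
```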
Hi Richard. Could you also let us know why we are not able to see the Sophos Intercept X logs at all using the above API?