Missing Logs from the API endpoint

Hi,

We query the Sophos Central API endpoints for alerts and events every 5 minutes.

We've noticed that there are missing logs every day.

The next cursor that is returned in every response is used in the next request to query for new data.

But at the end of the day, when I query for events for the entire day using a start date instead of the next cursor, all of the logs are there.

So it seems that events are sneaking in after the API has already returned its response.

Could someone please let us know if we are missing anything in the way we query for data?

Thank you

Ravi

  • Hi Ravi,

    Can you show me the request you are making? Specifically how are you handling the pagination of the return?

    RichardP

    Program Manager, Support Readiness | CISSP | Sophos Technical Support
    Support Videos | Product Documentation | @SophosSupport | Sign up for SMS Alerts
    If a post solves your question use the 'Verify Answer' link.

  • Hi Richard,

    I've put together the code below to keep things simple. The initial request is made without a valid next_cursor value.

    From the response, the next cursor is stored in the db and re-used in the next request and so on. Hope that explains.

    import requests
    
    url = 'https://api3.central.sophos.com/gateway/siem/v1/events'
    
    
    def get_header():
        return {'Content-Type': 'application/json; charset=utf-8',
                'Accept': 'application/json',
                'X-Locale': 'en',
                'Authorization': 'Basic xxxxxx',
                'x-api-key': 'xxxxxx',
                }
    
    
    def get_event_response(event_next_cursor):
        # requests omits params whose value is None, so the first call sends no cursor
        response = requests.get(url, headers=get_header(), params={'cursor': event_next_cursor}, timeout=120)
        # fail loudly on HTTP errors instead of treating an error body as events
        response.raise_for_status()
        return response.json()
    
    
    def get_events(event_next_cursor=None):
        log_count = 0
        has_more_flag = True
    
        while has_more_flag:
            event_response_json = get_event_response(event_next_cursor)
            events = event_response_json.get('items', [])
            # keep the previous cursor if the response carries none
            event_next_cursor = event_response_json.get('next_cursor', event_next_cursor)
            log_count += len(events)
    
            # post processing on the events is done here
    
            # store event cursor in the database
            # update_latest_event_cursor(event_next_cursor)
    
            has_more_flag = event_response_json.get('has_more', False)
    
        return log_count
    
    
    get_events()
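
The update_latest_event_cursor call above is commented out, so how the cursor is stored is not shown. As a rough sketch only, assuming SQLite as the database, the storage side might look like the following. The table name, the helper signatures, and load_latest_event_cursor are hypothetical and not taken from the original code.

```python
import sqlite3

# Hypothetical table and helper names; the post only shows a
# commented-out update_latest_event_cursor() call.


def open_cursor_store(path=":memory:"):
    """Open (or create) a small key/value table holding the latest cursor."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cursors (name TEXT PRIMARY KEY, value TEXT)"
    )
    return conn


def update_latest_event_cursor(conn, cursor_value, name="events"):
    """Overwrite the stored cursor for the given stream name."""
    with conn:  # commits on success, rolls back on exception
        conn.execute(
            "INSERT OR REPLACE INTO cursors (name, value) VALUES (?, ?)",
            (name, cursor_value),
        )


def load_latest_event_cursor(conn, name="events"):
    """Return the stored cursor, or None if nothing has been stored yet."""
    row = conn.execute(
        "SELECT value FROM cursors WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None


store = open_cursor_store()
print(load_latest_event_cursor(store))   # nothing stored yet
update_latest_event_cursor(store, "abc123")
print(load_latest_event_cursor(store))   # the value just stored
```

Writing the cursor only after the batch has been processed means a crash mid-batch replays that batch on restart rather than skipping it.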

  • So, you are using the old SIEM API that we are replacing.

    Have you tried out the script we provide to make sure the data set actually exists?

    If you want to see the new API, the documentation is here: https://developer.sophos.com/

    RichardP


  • The data does exist when you query for 6-7 hours in one go. But we query the API endpoint every 5 minutes, and with this approach some logs go missing.

  • The API is differential. If you have already pulled a record down in a previous request, it won't return that record again.

    Are you seeing only new data in the 5-minute intervals, with the 'missing' data being records already returned by previous requests?

    RichardP


  • Thank you for that Richard.

    I understand that logs received in a previous request are not sent again in the next request. What is happening is illustrated in the example below.

    Current time : 2020-12-11 08:00

    Request 1 (fromDate is 2020-12-11 08:00) -> Response 1 -> 100 logs

    Request 2 (use nextCursor from the previous response) -> Response 2 -> 50 logs

    and so on until Request 100.

    We were then informed that some logs were missing when compared with a dump of logs from the Sophos Admin area.

    So I made another request using the date from the first request, i.e.:

    Current time: 2020-12-11 15:00

    Request 101, with fromDate 2020-12-11 08:00, returned more logs than the previous 100 requests combined. For example, if the first 100 requests returned 1000 logs in total, request 101 returned 1050: the same 1000 plus the 50 missing ones.

    What I'm trying to say is that when you query every 5 minutes for an entire day, some logs are missing at the end of the day.

    But if you don't query every 5 minutes, and instead query at the end of the day for the entire day's logs, you receive all of them, including the ones that were never returned by the 5-minute queries.
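
To narrow down which events are being skipped, the comparison described above can be done by event ID. The following is a minimal sketch, assuming each event is a dict with `id`, `when`, and `created_at` fields holding ISO 8601 timestamps. Those field names are assumptions and should be checked against the actual responses. The helper names are hypothetical.

```python
from datetime import datetime


def parse_ts(value):
    """Parse an ISO 8601 timestamp such as 2020-12-11T08:03:00.000Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def find_missing(polled_events, full_day_events):
    """Return events present in the full-day result but absent from the polls."""
    seen_ids = {event["id"] for event in polled_events}
    missing = [e for e in full_day_events if e["id"] not in seen_ids]
    # Assumes all timestamps share one format, so string order is time order.
    return sorted(missing, key=lambda e: e["created_at"])


def ingestion_lag_seconds(event):
    """Seconds between when the event occurred and when it was recorded."""
    delta = parse_ts(event["created_at"]) - parse_ts(event["when"])
    return delta.total_seconds()


# Made-up sample data, for illustration only.
polled = [
    {"id": "a", "when": "2020-12-11T08:01:00.000Z",
     "created_at": "2020-12-11T08:01:30.000Z"},
]
full_day = polled + [
    {"id": "b", "when": "2020-12-11T08:02:00.000Z",
     "created_at": "2020-12-11T08:20:00.000Z"},
]

for event in find_missing(polled, full_day):
    print(event["id"], ingestion_lag_seconds(event))
```

If the missing events consistently show a large gap between the two timestamps, that would be consistent with events being recorded after the cursor had already moved past them. In that case, one possible workaround is to periodically re-query with a start date that overlaps the previous window and drop any IDs already seen.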
