Sophos NDR "bootlooping"

Hello Community,

I have two NDR VMs active at two locations.

Now one of them works just fine, capturing packets from our network switches and uploading them to Sophos Central.

The other one also captures packets just fine, but doesn't want to start after a reboot.

Booting into the Ubuntu OS works fine, but the VA fails to start: it shows the startup checklist, tries to reinstall the Machine Learning and NDR detection Pod, and jumps back to the Ubuntu startup screen, only to return to the startup checklist and do it all over again.

I left it running for 30 minutes but it keeps running in circles.

This already happened with a previous iteration of the VM, after which I deleted and rebuilt the NDR integration for that site.

I doubt it's a hardware problem, as both sites' VMs are running on the same hardware configuration on VMware ESXi 6.7.0 hosts.

  • Hi Thorben,

    Thanks for reaching out to the Sophos Community Forum.  I've moved your post over to the NDR Community channel as it is a better fit for your question. 

    I suggest checking the logs at "podLogs/cloud-agent-<container id>.txt". Let me know if there are any authentication errors present.
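
A minimal sketch of how such a log could be scanned for authentication-related lines. The keyword pattern is an assumption, since the actual wording of the cloud-agent log messages is not given in this thread, and the function names `find_auth_errors` and `scan_file` are hypothetical helpers. The pattern is deliberately broad, so expect some false positives.

```python
import re

# ASSUMPTION: these keywords are guesses at what an authentication failure
# might look like; the real cloud-agent log format is not documented here.
AUTH_PATTERN = re.compile(
    r"auth|unauthori[sz]ed|forbidden|\b40[13]\b|token|certificate",
    re.IGNORECASE,
)


def find_auth_errors(lines, pattern=AUTH_PATTERN):
    """Return (line_number, text) pairs for lines matching the pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if pattern.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_file(path, pattern=AUTH_PATTERN):
    """Scan one log file; undecodable bytes are replaced, not fatal."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_auth_errors(handle, pattern)
```

Calling `scan_file` with the path of the cloud-agent log file returns the matching lines together with their line numbers, which makes it easier to quote them in a reply.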

    You can also try verifying that the following protocols are not being blocked on the system/network.
    - DHCP (if this was selected in Central when creating the VM) 
    - NTP 
    - DNS 
    - HTTP 
    - HTTPS 

    Please also ensure that web filtering is not being performed on the following domains.
    a. sophos.com
    b. archive.ubuntu.com
    c. ntp.ubuntu.com
    d. baltocdn.com
    e. sophossecops.jfrog.io
    f. docker.io
    g. amazon.com, amazonaws.com
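
The checks above can be sketched as a small script run from a machine behind the same firewall rule as the NDR VM. This is only a rough probe, not a Sophos tool: it tests DNS resolution and a TCP connection on ports 80 and 443 for the domains listed above. The helper names `check_endpoint` and `check_all` are hypothetical, and testing the bare domain names is an assumption, since the appliance may contact subdomains instead. A failed connection to a bare domain is therefore not proof that the firewall is blocking it.

```python
import socket

# Domains copied from the list above. Probing the bare names on 80/443 is an
# ASSUMPTION; the appliance may actually talk to subdomains of these.
DOMAINS = [
    "sophos.com",
    "archive.ubuntu.com",
    "ntp.ubuntu.com",
    "baltocdn.com",
    "sophossecops.jfrog.io",
    "docker.io",
    "amazon.com",
    "amazonaws.com",
]
PORTS = (80, 443)


def check_endpoint(host, port, timeout=5.0, connect=socket.create_connection):
    """Try one TCP connection and return (ok, detail)."""
    try:
        with connect((host, port), timeout=timeout):
            return True, "connected"
    except socket.gaierror as exc:
        # Name resolution failed, which points at DNS rather than HTTP/HTTPS.
        return False, f"DNS lookup failed: {exc}"
    except OSError as exc:
        # Covers refused connections, timeouts, and unreachable networks.
        return False, f"connection failed: {exc}"


def check_all(domains=DOMAINS, ports=PORTS, connect=socket.create_connection):
    """Probe every (domain, port) pair and return a dict of results."""
    results = {}
    for host in domains:
        for port in ports:
            results[(host, port)] = check_endpoint(host, port, connect=connect)
    return results
```

Iterating over `check_all().items()` and printing each entry gives a quick table of which destinations resolve and accept connections. NTP and DHCP use UDP and are not covered by this probe.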

    Kushal Lakhan
    Team Lead, Global Community Support
  • Hello Kushal,

    Thanks for pointing me in the right direction. I checked the firewall rule for the NDR machine's outbound access, and it turns out I hadn't allowed all the domains needed for the Sophos Central connection, as per https://docs.sophos.com/central/customer/help/en-us/PeopleAndDevices/ProtectDevices/DomainsPorts/index.html

    All of the domains (*.sophos.com and *.amazonaws.com) and ports (80, 443) are now allowed for the NDR VM, and it boots up normally again.

    What I notice now is that after starting up, the NDR doesn't seem to capture any packets. It only does when I set the firewall rule to allow ports 80 and 443 to "any".

    I logged the traffic matching that rule while access was limited to *.sophos.com and *.amazonaws.com, and I can see packets successfully going out to AWS servers, with none getting blocked.

    Are there any other FQDN ranges I have to allow, or is it something with the NAT? (I doubt it, since traffic using "any" as a target should use the same rule.) I want to narrow down the FW rule as much as possible.

    Kind regards

    Thorben

    Update: It just started counting packets, about 10-15 minutes after bootup.

    Update 2: After rebooting again, it immediately began counting packets. Might the counter not growing just be a case of the counter initializing later?
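
One way to look for missing FQDNs when narrowing the rule is to compare the hostnames seen in the firewall log against the allowed wildcards. The sketch below is only an illustration: the helper names `is_allowed` and `uncovered` are hypothetical, and shell-style wildcard matching is an assumption that may differ from how a given firewall evaluates FQDN objects. Under this matching, domains such as docker.io or archive.ubuntu.com from the list earlier in the thread are not covered by *.sophos.com and *.amazonaws.com, and neither is the bare name sophos.com.

```python
from fnmatch import fnmatchcase

# The two wildcards mentioned in the post above.
ALLOWED_PATTERNS = ("*.sophos.com", "*.amazonaws.com")


def is_allowed(hostname, patterns=ALLOWED_PATTERNS):
    """Return True if the hostname matches any wildcard pattern.

    ASSUMPTION: shell-style matching, where '*' may span several labels.
    A real firewall may evaluate FQDN wildcards differently.
    """
    host = hostname.strip().rstrip(".").lower()
    return any(fnmatchcase(host, pattern.lower()) for pattern in patterns)


def uncovered(hostnames, patterns=ALLOWED_PATTERNS):
    """Return the sorted, de-duplicated hostnames that match no pattern."""
    return sorted({h for h in hostnames if not is_allowed(h, patterns)})
```

Feeding `uncovered` the destination hostnames from a log taken while the rule is set to "any" would list the candidates that the narrowed rule still needs to permit.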
