High CPU usage since 2:20 last night

Hello,

I already contacted Sophos Support and am now waiting for a callback from the senior engineers.

However, I thought it wouldn't hurt to ask here as well.

I am seeing 100% CPU usage whenever I enable the internet connection. We are using an LTE modem (a modem, not a router). While the connection is up, the whole GUI is extremely laggy and sometimes takes 1-2 minutes to switch between pages. Only when I disable the WAN interface does the webadmin interface become responsive again, almost instantly. The 100% CPU usage remains for a while and then goes down by itself.

I called my ISP and asked whether there are any known issues, and they told me they "see something, but can't tell me exactly what". They basically told me to wait until tomorrow and see if it's better.

I am also ruling out a firewall overload. We have around 10-15 SSL remote access users, a site-to-site connection and a RED. Firewall usage is usually between 30% and 50%. The logs reflect that too.

Sophos Support said it might be the ISP, but it might also be hardware, or maybe something else entirely. They are now consulting with senior engineers.

Is there something I can do on the firewall to ascertain the cause of the issue?

I already checked top and atop, and the only odd entries are processes with user "nobody" and command "HTTPD". Each of them takes 10% CPU or more, and there are several of them. Here are screenshots of those.

Can you make something of this?

Thank you
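
A rough sketch of how the top/atop observation above could be turned into numbers: it sums CPU percentage per user and command from captured process-list text. It assumes output in the three-column layout produced by `ps -eo user,comm,pcpu` on a procps-based system; whether that command and Python are available on the firewall shell is not confirmed by this thread, so the text may need to be captured and analyzed on another machine. The function name `cpu_by_user_command` is made up for illustration.

```python
from collections import defaultdict

def cpu_by_user_command(ps_output: str):
    """Sum %CPU per (user, command) from `ps -eo user,comm,pcpu`-style text.

    Assumes the first line is a header and the last column is the CPU percentage.
    Returns a list of ((user, command), total_cpu) sorted by total_cpu, highest first.
    """
    totals = defaultdict(float)
    for line in ps_output.strip().splitlines()[1:]:  # skip the header line
        parts = line.split()
        if len(parts) < 3:
            continue  # not a complete user/command/cpu row
        user, pcpu = parts[0], parts[-1]
        command = " ".join(parts[1:-1])  # command names may contain spaces
        try:
            totals[(user, command)] += float(pcpu)
        except ValueError:
            continue  # last column was not a number
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Several "nobody"/"httpd" rows at 10% each would show up here as a single large total, which makes it easier to see how much of the load belongs to one service.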



This thread was automatically locked due to age.
  • It was terrible to troubleshoot since the firewall was so unresponsive, but I finally managed to spot the cause:

    It apparently is N-Central, which we use to manage some of our customers. We have it running over a custom port (call it xxxx).

    Currently it is configured via WAF, since I am using an LE certificate there. As soon as I enable the virtual server for the port the agents use, things start to go south.

    It also explains why it looked like a problem with the LTE connection: NC traffic goes over that LTE link, as do many other things.

    I spotted it in Interfaces & Routing, where I saw the concurrent connections rising from about 21:50 yesterday evening, from a nominal value of about 800 connections to 9800 at the peak.

    Now, the weird thing is that it doesn't matter whether NC is running or not: as soon as I enable the WAF entry, the firewall CPU goes up and it becomes sluggish. That port also has by far the highest traffic in the last day and the highest number of packets, some 28 million of them.

    This does sound very suspicious to me.

    For now I have stopped NC and WAF completely until I can ascertain what is really going on. I can enable them to test, but I am unsure what to do next.

    First and foremost, I'd like to be able to determine for certain whether this is some kind of system error on the SolarWinds / N-Central side, or whether we got hacked.

    According to the graph, the connection count started going up at about 22:15 yesterday evening.

    So I looked under Network Usage, Top clients by service, for the port reported above with 28 million packets.

    I am seeing some differences. My internal servers, which are also monitored internally, used to have about 500-5000 connections daily, for instance. Yesterday and today, however, those numbers climbed. I see a consistent number of computers connecting, mostly 65-70. According to Sophos, all connections are from Austria, so I see this as a positive sign.

    Nevertheless, the number of connections per client (agent) has climbed a LOT: from the usual 500-2000 to 5000-17000 per client, a total of 2.6 million according to this list.

    So apparently "Conn" is not the same as "Packet". Comparatively, though, the number of received packets is about 10x higher.

    So am I on the right path when it comes to troubleshooting? As far as I can see, port xxxx is only being accessed by Austrian IPs and internal clients. If I open the firewall, the page is barely able to stay responsive due to the massive number of packets on port xxxx, which are now being blocked since I turned off the WAF.
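
The figures quoted in the post above can be checked with a few lines of arithmetic. This is only a sanity check on the rounded numbers as stated (28 million packets, 2.6 million connections, 800 nominal and 9800 peak concurrent connections); the variable names are illustrative.

```python
# Rounded figures as quoted in the post above.
packets_total = 28_000_000       # packets reported for port xxxx in the last day
connections_total = 2_600_000    # connections in the "Top clients by service" list
nominal_concurrent = 800         # usual number of concurrent connections
peak_concurrent = 9_800          # concurrent connections at the peak

# Average packets per connection, and how far the peak is above normal.
packets_per_connection = packets_total / connections_total
concurrent_growth = peak_concurrent / nominal_concurrent

print(f"packets per connection: {packets_per_connection:.1f}")
print(f"peak vs nominal concurrent connections: {concurrent_growth:.2f}x")
```

With these inputs the packet count works out to roughly 10.8 packets per connection, which matches the "10x" estimate, and the peak in concurrent connections is about 12 times the nominal value.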

  • My next question was going to be whether you were using WAF, because of the HTTPD items in your screenshots. Have you checked the WAF logs? Can you post a snippet of them here? That's a lot of connections, but I don't believe it's necessarily hacking. It could be some type of attack on that port (hopefully you aren't using a common port that is notorious for vulnerabilities).

    OPNSense 64-bit | Intel Xeon 4-core v3 1225 3.20Ghz
    16GB Memory | 500GB SSD HDD | ATT Fiber 1GB
    (Former Sophos UTM Veteran, Former XG Rookie)

  • I actually tried checking the WAF logs. All WAF virtual servers are currently offline.

    Today's WAF log is 75MB! I'm not sure which part you would want me to post, or what I should be looking for in there.

    Right now I am testing with port forwarding only, without WAF. I can use PF for that port, since it doesn't necessarily require a certificate.

    No, NC agents default to port 443; I changed that to my own port. However, some things remain on 443.

  • Small update: instead of using WAF, I have now set up port forwarding only for that agent port, and the firewall is OK. I still see a high number of connections, but the firewall isn't peaking.

    I still don't understand, though, why I suddenly have such a high number. I opened an emergency ticket with N-Able.
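
For a log as large as the 75MB WAF log mentioned above, one way to find out what to look for is to count which values occur most often in a given field, for example which client addresses or URLs dominate. The sketch below is a generic counter, not a Sophos tool. It assumes the log lines contain `key="value"` pairs; that layout and any field name passed in (such as `srcip`) are assumptions that need to be checked against the actual log. The function name `top_values` is made up for illustration.

```python
from __future__ import annotations

import re
from collections import Counter
from typing import Iterable

# Assumed layout: key="value" pairs somewhere in each log line.
FIELD_RE = re.compile(r'(\w+)="([^"]*)"')

def top_values(lines: Iterable[str], field: str, n: int = 10) -> list[tuple[str, int]]:
    """Return the n most frequent values of `field` across all lines."""
    counts: Counter[str] = Counter()
    for line in lines:
        for key, value in FIELD_RE.findall(line):
            if key == field:
                counts[value] += 1
    return counts.most_common(n)

# Example use (file name and field name are placeholders):
# with open("waf.log", encoding="utf-8", errors="replace") as fh:
#     for value, count in top_values(fh, "srcip"):
#         print(count, value)
```

Because the file is read line by line, the whole log never has to fit in memory. If a handful of values account for most lines, a short excerpt filtered on those values is probably the most useful snippet to post.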