UTM CPU SPIKES AND NON-EXISTENT SOPHOS SUPPORT

I am seeing CPU spikes every hour, between 43 and 45 minutes past the hour.  atop shows confd.plx and mdw.plx spiking.  I have a case open with Sophos Support, but last I heard they recently outsourced to India, which is horribly disappointing.  When the CPU spikes, random tunnels drop, REDs drop, SSL VPNs drop and the web interface goes unresponsive!  I cannot get anyone at Sophos on the phone to do any DB rebuilds or anything, so I am out on a wire here.  Users are getting very frustrated, as am I.

SG 430 Gen1

9.705-3



  • Additional: I am running HA Active-Passive, and even my HA standby node spikes at the same time, though obviously not as much, since it is just sitting there.

  • Looks like an issue with the config.

    Check the confd-debug log and see whether a job starts in this timeframe.

    Also cross-reference the cron jobs to see whether one starts then.

    This is likely caused by something within the configuration that is refreshed every hour.
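For the log check suggested above, a rough sketch along these lines might help narrow things down. It only keeps log lines whose minute-of-hour falls inside the spike window. It assumes each line starts with a `YYYY:MM:DD-HH:MM:SS` timestamp; the function name, the 40 to 47 minute default window and the sample lines are illustrative, not taken from a real confd-debug log.

```python
import re

# Assumption: each log line begins with a "YYYY:MM:DD-HH:MM:SS" timestamp.
# Adjust the pattern if your log uses a different prefix.
TIMESTAMP = re.compile(r"^\d{4}:\d{2}:\d{2}-\d{2}:(\d{2}):\d{2}")


def lines_in_window(lines, start_minute=40, end_minute=47):
    """Return the lines whose minute-of-hour is within [start_minute, end_minute]."""
    hits = []
    for line in lines:
        match = TIMESTAMP.match(line)
        if match and start_minute <= int(match.group(1)) <= end_minute:
            hits.append(line.rstrip("\n"))
    return hits


# Made-up sample lines, only to show the call.
sample = [
    "2020:11:05-10:43:01 utm confd[1234]: example line A",
    "2020:11:05-10:12:30 utm confd[1234]: example line B",
]
for hit in lines_in_window(sample):
    print(hit)
```

Anything that shows up in every hour's window, and rarely outside it, is a reasonable candidate to compare against the cron jobs.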


  • Here is what is currently on my crontab.  I looked again: the spike starts at 40-43 past the hour and recovers at about 45-47 past the hour.
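To cross-reference a crontab against that window without eyeballing it, something like the following sketch could work. It expands only the minute field of standard five-field cron entries and lists the jobs that can start between minutes 40 and 47. The function names and the sample entries are hypothetical, and the hour and day fields are ignored on purpose, since the spike happens every hour.

```python
def expand_minutes(field):
    """Expand a cron minute field such as '*/15', '40-46/2' or '3,43' into a set."""
    minutes = set()
    for part in field.split(","):
        base, _, step = part.partition("/")
        step = int(step) if step else 1
        if base == "*":
            low, high = 0, 59
        elif "-" in base:
            low, high = (int(x) for x in base.split("-"))
        else:
            low = high = int(base)
        minutes.update(range(low, high + 1, step))
    return minutes


def jobs_in_window(crontab_text, start_minute=40, end_minute=47):
    """Return crontab lines whose minute field can fire inside the window."""
    window = set(range(start_minute, end_minute + 1))
    hits = []
    for line in crontab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        fields = line.split()
        # Skip environment assignments (PATH=...) and '@hourly'-style shortcuts.
        if len(fields) < 6 or not (fields[0][0].isdigit() or fields[0][0] == "*"):
            continue
        if expand_minutes(fields[0]) & window:
            hits.append(line)
    return hits


# Hypothetical entries, only to show the call.
example = """
*/15 * * * * /usr/local/bin/example-quarter-hourly
5 3 * * * /usr/local/bin/example-nightly
"""
for job in jobs_in_window(example):
    print(job)
```

Keep in mind that a job started a few minutes before the window can still be the one burning CPU inside it, so it may be worth widening `start_minute` a little.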

  • Hi Dan - a long-time lurker and your first post - welcome to the UTM Community!

    First, you should insist on escalation of your case - your reseller should be able to do that for you.

    I agree that this is likely caused by a PostgreSQL database problem and that doing a rebuild is the next thing to try.  I suspect that one needs to rebuild both on the Master and Slave, but I haven't seen a recommendation on how to do this.  Out of an abundance of caution, my approach would be first to disable HA, thus doing a Factory Reset and shutdown of the Slave.  After that, do the rebuild on the Master and then re-enable HA.

    To re-initialize all PostgreSQL databases (deletes all graphs and data, but does not affect the logs):

    # /etc/init.d/postgresql92 rebuild

    To re-enable HA:

    1. On the current Master, on the 'Configuration' tab of 'High Availability':
         a. Enable Hot-Standby
         b. Select eth3 as the Sync NIC
         c. Configure it as Node_1
         d. Enter an encryption key (I've never found a need to remember it)
         e. Select 'Enable automatic configuration of new devices'
         f. I prefer to use 'Preferred Master: None' and 'Backup interface: Internal'
    2. Power up the Slave and wait for the good news.

    Cheers - Bob

    PS In the future, please post text, if available, instead of pictures.

     
    Sophos UTM Community Moderator
    Sophos Certified Architect - UTM
    Sophos Certified Engineer - XG
    Gold Solution Partner since 2005
    MediaSoft, Inc. USA
  • Did do the escalation...  They are still working on it, but I am on a Rev1 SG430 and, sadly, they just realized the limit on the number of REDs attached to the appliance is 70, and I am at 82 because we did not know about the limit. :-)  So yeah...  That might just be it.

  • I think I'd try the rebuild before spending $25K+ to upgrade.  I would also get rid of hardware compression if it's in use with the RED tunnels.  Let us know your results.

    Cheers - Bob

     
  • I will push that again, but wouldn't the DB have been rebuilt when we factory reset and reloaded the backup config?  Asking for a friend!  ;-)

    Failed to mention that in the above.

  • If you did a Factory Reset of both nodes and then went through the steps above to re-enable HA, then you should be fine.  If you restored the backup to both units, you likely damaged the databases - at least on the Slave.

    Cheers - Bob

     