Weird UTM freezes randomly approximately once a day ...

I have experienced a strange lockup on my "new" UTM box, but I checked log files and they don't reveal anything, just a bunch of weird characters ...

2023:03:16-01:32:01 escape75 /usr/sbin/cron[25494]: (root) CMD (  nice -n19 /usr/local/bin/gen_inline_reporting_data.plx)
2023:03:16-01:35:01 escape75 /usr/sbin/cron[25649]: (root) CMD (   /usr/local/bin/reporter/system-reporter.pl)
�����������������������������������������������������������������������������������������������������������
2023:03:16-09:03:10 escape75 syslog-ng[4942]: syslog-ng starting up; version='3.4.7'
2023:03:16-09:03:12 escape75 ddclient[5361]: WARNING: cannot connect to checkip.dyndns.org:80 socket: IO::Socket::INET: Bad hostname 'checkip.dyndns.org'
2023:03:16-09:03:24 escape75 system: System was restarted
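
One way to narrow down when a freeze started is to look for the last timestamp before a long silence in the log. The following is only a minimal sketch: it assumes the `YYYY:MM:DD-HH:MM:SS` prefix seen in the excerpt above, and the `find_gaps` name and the 30-minute threshold are illustrative choices, not anything provided by UTM.

```python
import re
from datetime import datetime, timedelta

# Timestamp prefix as it appears in the excerpt above, e.g. "2023:03:16-01:35:01".
STAMP = re.compile(r"^\s*(\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2}) ")


def find_gaps(lines, threshold=timedelta(minutes=30)):
    """Return (last_seen, next_seen) pairs that are more than `threshold` apart.

    The threshold is an assumption; cron entries in the excerpt appear every
    few minutes, so a much longer silence is worth a closer look.
    """
    gaps = []
    previous = None
    for line in lines:
        match = STAMP.match(line)
        if not match:
            continue  # skip runs of unprintable bytes and other unparseable lines
        current = datetime.strptime(match.group(1), "%Y:%m:%d-%H:%M:%S")
        if previous is not None and current - previous > threshold:
            gaps.append((previous, current))
        previous = current
    return gaps


# Lines taken from the excerpt above; the third entry stands in for the garbage run.
sample = [
    "2023:03:16-01:32:01 escape75 /usr/sbin/cron[25494]: (root) CMD (  nice -n19 /usr/local/bin/gen_inline_reporting_data.plx)",
    "2023:03:16-01:35:01 escape75 /usr/sbin/cron[25649]: (root) CMD (   /usr/local/bin/reporter/system-reporter.pl)",
    "\x00" * 40,
    "2023:03:16-09:03:10 escape75 syslog-ng[4942]: syslog-ng starting up; version='3.4.7'",
]

for last_seen, next_seen in find_gaps(sample):
    print(f"no log entries between {last_seen} and {next_seen} ({next_seen - last_seen})")
```

Applied to the excerpt, this reports a silence from 01:35:01 until the restart at 09:03:10, which suggests the unit stopped logging shortly after 01:35.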



I've been running the software version of UTM (9.714) on my old unit (an XG115 r2) for a couple of years without any issues.
I recently migrated my saved config over to a new unit (XG115 r3), and a few hours after setting up the new unit (at night)
it froze up and the LAN interfaces were not pingable, so I powered it down and rebooted. It's working again ...

Just wondering if there's something more I can look at to see what the issue was .. I have a hunch maybe it was DHCP related,
as my devices on the LAN were renewing the IP addresses and they were not in the table on the new unit, but it's a wild guess,
so if this doesn't happen again then maybe it's nothing to worry about.

I don't know if there would be an issue moving the config file (and license) from the old unit, but I wouldn't think so.

The new unit was installed the same way as the old unit, using the ssi-9.714-4.1.iso file and removing the /etc/asg file, and it runs with a software license.
The old unit hasn't experienced any weird issues in years, and the ethernet ports and devices are set up in an identical way; nothing changed.

Just looking for thoughts and ideas ...

Stats from top:

top - 11:32:20 up 2:31, 1 user, load average: 0.09, 0.29, 0.25
Tasks: 163 total, 1 running, 160 sleeping, 0 stopped, 2 zombie
Cpu(s): 0.6%us, 0.5%sy, 0.0%ni, 98.5%id, 0.1%wa, 0.0%hi, 0.3%si, 0.0%st
Mem: 3898468k total, 3558768k used, 339700k free, 111124k buffers
Swap: 4194300k total, 112k used, 4194188k free, 1352808k cached

Zombies:

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 18256 0.0 0.0 0 0 ? Z 11:30 0:00 [aua.bin] <defunct>
root 18595 0.6 0.0 0 0 ? Z 11:32 0:00 [confd.plx] <defunct>
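
A zombie is a process that has exited but has not yet been reaped by its parent. Both entries above started within about two minutes of the `top` snapshot at 11:32:20, so they may simply be short-lived children. A rough sketch for pulling such rows out of `ps`-style output on repeated runs, to see whether the same PIDs linger, is below; the `zombie_rows` helper is hypothetical and assumes the column header shown above.

```python
def zombie_rows(ps_output):
    """Return (pid, command) for every row whose STAT column starts with 'Z'."""
    lines = ps_output.strip().splitlines()
    header = lines[0].split()
    pid_idx = header.index("PID")
    stat_idx = header.index("STAT")
    cmd_idx = header.index("COMMAND")  # last column; may itself contain spaces
    rows = []
    for line in lines[1:]:
        fields = line.split(None, cmd_idx)
        if len(fields) <= cmd_idx:
            continue  # malformed or truncated row
        if fields[stat_idx].startswith("Z"):
            rows.append((int(fields[pid_idx]), fields[cmd_idx]))
    return rows


# Output copied from the post above.
ps_text = """
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 18256 0.0 0.0 0 0 ? Z 11:30 0:00 [aua.bin] <defunct>
root 18595 0.6 0.0 0 0 ? Z 11:32 0:00 [confd.plx] <defunct>
"""

for pid, command in zombie_rows(ps_text):
    print(pid, command)
```

If the same PID shows up across snapshots taken minutes apart, the parent is not reaping it; if the PIDs keep changing, the zombies are transient.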



This thread was automatically locked due to age.
  • Hello there!

    Good day and thanks for reaching out to Sophos Community and hope you are well. 

    Wanted to check: does this freeze happen often at night, or was it just a single occurrence?

    If this happens often, I'd recommend opening a support ticket to have it checked further. If it was just a single occurrence, we may want to observe and see whether it persists.

    Have a nice day ahead and thank you for choosing Sophos

    Cheers,

    Raphael Alganes
    Community Support Engineer | Sophos Technical Support

  • I have just had another freeze but during the day, again logs show no strange information that I could find ...

    This was not happening with XG115 r2, and only with XG115 r3, and they are running the exact same configuration.
    The unit wasn't very active at the moment it happened; in fact, there were no major downloads/uploads.

    It again did not respond to a single push of the power button to initiate a power down, I had to hold the button in.

    I'm wondering if it's somehow related to using configuration from the r2 unit.

    In the meantime I have run a disk check via smartctl and it passed, and badblocks did not find any issues either.
    I have also performed a postgresql rebuild just in case. After doing all that, the unit still locked up ...

    Thank you!

  • I could re-install using the asg-9.714-4.1.iso instead of ssi-9.714-4.1.iso and removing the asg file,
    but I'm guessing the installation would be identical at least as far as this problem is concerned ...

    I have also just updated to 9.715-3 ...

    Next step would be to re-create my configuration from scratch and run on a more basic config to see if that helps,
    unless there's some further debugging that could be achieved by tech support personnel :)

  • I'd run a disk and memory diag on the hardware before reinstalling again.

  • I have temporarily installed SFOS HW Firmware 19.x SF300, and ran both disk and memory checks from sfloader.
    Both tests showed no errors, so I think we can rule out any hardware-related issues.

    I have now loaded the software version of UTM (asg instead of ssi) and am running on my previous config to see what happens ...

  • Unfortunately the reload using asg-9.714-4.1.iso instead of ssi-9.714-4.1.iso did not solve the freezing. I used the 64-bit kernel.

    I have performed a disk check and a memory check, and both pass without any error.
    When the unit is frozen, you can't ping the LAN interfaces, and the USB keyboard doesn't respond, so I can't log in.
    The freezing seems to happen randomly, about once a day ...


    I am including all fail, error, invalid messages from boot log, I do not know if these are useful at all:

    2023:03:19-14:09:12 escape75 kernel: [    0.000000] ACPI BIOS Warning (bug): FADT (revision 6) is longer than ACPI 5.0 version, truncating length 276 to 268 (20130725/tbfadt-311)
    2023:03:19-14:09:12 escape75 kernel: [    0.000000] ACPI Error: Gpe0Block - 32-bit FADT register is too long (32 bytes, 256 bits) to convert to GAS struct - 255 bits max, truncating (20130725/tbfadt-202)

    2023:03:19-14:09:12 escape75 kernel: [    1.785222] TCP: TFO aes cipher alloc error: -2

    2023:03:19-14:09:12 escape75 kernel: [    5.318546] i801_smbus 0000:00:1f.1: can't derive routing for PCI INT A
    2023:03:19-14:09:12 escape75 kernel: [    5.318554] i801_smbus 0000:00:1f.1: PCI INT A: no GSI
    2023:03:19-14:09:12 escape75 kernel: [    5.318591] i801_smbus 0000:00:1f.1: Failed to allocate irq 255: -22
    2023:03:19-14:09:12 escape75 kernel: [    5.318598] i801_smbus: probe of 0000:00:1f.1 failed with error -22

    2023:03:19-14:09:12 escape75 kernel: [    0.004000] tsc: Fast TSC calibration failed
    2023:03:19-14:09:12 escape75 kernel: [    0.012000] tsc: PIT calibration matches HPET. 1 loops
    2023:03:19-14:09:12 escape75 kernel: [    0.012000] tsc: Detected 1592.856 MHz processor

    2023:03:19-14:09:12 escape75 kernel: [    0.199653] acpi PNP0A08:00: ACPI _OSC support notification failed, disabling PCIe ASPM
    2023:03:19-14:09:12 escape75 kernel: [    0.199661] acpi PNP0A08:00: Unable to request _OSC control (_OSC support mask: 0x08)

    2023:03:19-20:09:37 escape75 [daemon:notice] rrdcached[3786]:  queue_thread_main: rrd_update_r (/var/log/reporting/rrd/ips.rrd) failed with status -1. (/var/log/reporting/rrd/ips.rrd: illegal attempt to update using time 1679255404 when last update time is 1679256005 (minimum one second step))
    

    2023:03:19-14:09:12 escape75 kernel: [ 0.199505] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
    2023:03:19-14:09:12 escape75 kernel: [ 0.199579] \_SB_.PCI0:_OSC invalid UUID
    2023:03:19-14:09:12 escape75 kernel: [ 0.199582] _OSC request data:1 8 0
    2023:03:19-14:09:12 escape75 kernel: [ 0.199643] \_SB_.PCI0:_OSC invalid UUID
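
For anyone wanting to repeat this kind of filtering, a case-insensitive keyword match over the log lines is enough. The sketch below is only an illustration: the keyword list mirrors the "fail, error, invalid" wording above, and the `suspicious_lines` name is made up for the example. Comparing the filtered output from the r2 and r3 boot logs would show which messages are new on the r3.

```python
import re

# Keywords taken from the post above; extend as needed.
KEYWORDS = re.compile(r"fail|error|invalid", re.IGNORECASE)


def suspicious_lines(lines):
    """Return the lines that mention any of the keywords, in original order."""
    return [line.rstrip("\n") for line in lines if KEYWORDS.search(line)]


# A few lines copied from the boot log above; the tsc line should be filtered out.
boot_log = [
    "2023:03:19-14:09:12 escape75 kernel: [    1.785222] TCP: TFO aes cipher alloc error: -2",
    "2023:03:19-14:09:12 escape75 kernel: [    0.012000] tsc: Detected 1592.856 MHz processor",
    "2023:03:19-14:09:12 escape75 kernel: [    0.199653] acpi PNP0A08:00: ACPI _OSC support notification failed, disabling PCIe ASPM",
    r"2023:03:19-14:09:12 escape75 kernel: [ 0.199579] \_SB_.PCI0:_OSC invalid UUID",
]

for line in suspicious_lines(boot_log):
    print(line)
```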

  • Are you able to install some kind of Windows or Linux OS on the device?  Maybe run something like prime95 or equivalent to load it up.  Could be cooling issues (or lack thereof) causing instability.

  • I don't know, it's a regular XG115 R3 from Sophos ...

    I would expect that there would be some kind of troubleshooting path, as this possibly appears to be caused by my config ...

    I wonder if I could load SFOS HW Firmware and get a trial license and would be able to tell within a few days if it freezes ...

  • Why would you need a trial license?  Just install it and let it run without importing your config.  I would start from scratch and see how that works, if it's still freezing.  If it's not after a few days, it might be something in the config.  If it freezes, it's probably hardware related. 

    OPNSense 64-bit | Intel Xeon 4-core v3 1225 3.20Ghz
    16GB Memory | 500GB SSD HDD | ATT Fiber 1GB
    (Former Sophos UTM Veteran, Former XG Rookie)

  • I didn't know I could run it without a license ...