This discussion has been locked.

Weird UTM freezes randomly approximately once a day ...

I have experienced a strange lockup on my "new" UTM box, but I checked log files and they don't reveal anything, just a bunch of weird characters ...

2023:03:16-01:32:01 escape75 /usr/sbin/cron[25494]: (root) CMD (  nice -n19 /usr/local/bin/gen_inline_reporting_data.plx)
2023:03:16-01:35:01 escape75 /usr/sbin/cron[25649]: (root) CMD (   /usr/local/bin/reporter/system-reporter.pl)
�����������������������������������������������������������������������������������������������������������
2023:03:16-09:03:10 escape75 syslog-ng[4942]: syslog-ng starting up; version='3.4.7'
2023:03:16-09:03:12 escape75 ddclient[5361]: WARNING: cannot connect to checkip.dyndns.org:80 socket: IO::Socket::INET: Bad hostname 'checkip.dyndns.org'
2023:03:16-09:03:24 escape75 system: System was restarted
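The excerpt above goes quiet after 01:35:01 and resumes at 09:03:10, so the silent stretch in the log is one rough way to bracket when the box stopped working. A minimal Python sketch of that check follows. The helper name `find_gaps` and the 30-minute threshold are illustrative assumptions, not part of UTM, and the sample lines are copied from the excerpt.

```python
import re
from datetime import datetime, timedelta

# UTM-style timestamp at the start of a log line, e.g. 2023:03:16-01:35:01
TS = re.compile(r"^(\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2}) ")


def find_gaps(lines, threshold=timedelta(minutes=30)):
    """Return (before, after, gap) for consecutive timestamps further apart than threshold."""
    gaps = []
    prev = None
    for line in lines:
        m = TS.match(line)
        if not m:
            continue  # skip lines without a timestamp, such as runs of unreadable bytes
        ts = datetime.strptime(m.group(1), "%Y:%m:%d-%H:%M:%S")
        if prev is not None and ts - prev > threshold:
            gaps.append((prev, ts, ts - prev))
        prev = ts
    return gaps


# Sample lines taken from the log excerpt in the post
sample = [
    "2023:03:16-01:32:01 escape75 /usr/sbin/cron[25494]: (root) CMD (  nice -n19 /usr/local/bin/gen_inline_reporting_data.plx)",
    "2023:03:16-01:35:01 escape75 /usr/sbin/cron[25649]: (root) CMD (   /usr/local/bin/reporter/system-reporter.pl)",
    "\ufffd" * 40,
    "2023:03:16-09:03:10 escape75 syslog-ng[4942]: syslog-ng starting up; version='3.4.7'",
]

for before, after, gap in find_gaps(sample):
    print(f"no log entries between {before} and {after} ({gap})")
```

On a busy firewall the last entry before the gap is only an upper bound on how long the unit stayed alive, since cron jobs like the ones above fire every few minutes.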



So, I've been running the software version of UTM (9.714) on my old unit (an XG115 r2) for a couple of years without any issues.
I recently migrated my saved config over to a new unit (XG115 r3), and a few hours after setting up the new unit (at night)
it froze up and its interfaces were not pingable (LAN), so I powered it down and rebooted. It's working again ...

Just wondering if there's something more I can look at to see what the issue was .. I have a hunch maybe it was DHCP related,
as my devices on the LAN were renewing the IP addresses and they were not in the table on the new unit, but it's a wild guess,
so if this doesn't happen again then maybe it's nothing to worry about.

I don't know if there would be an issue moving the config file (and license) from the old unit, but I wouldn't think so.

The new unit was installed the same way as the old unit, using the ssi-9.714-4.1.iso file and removing the /etc/asg with a software license.
The old unit hasn't experienced any weird issues in years, and the ethernet ports and devices are set up in an identical way; nothing changed.

Just looking for thoughts and ideas ...

Stats from top:

top - 11:32:20 up 2:31, 1 user, load average: 0.09, 0.29, 0.25
Tasks: 163 total, 1 running, 160 sleeping, 0 stopped, 2 zombie
Cpu(s): 0.6%us, 0.5%sy, 0.0%ni, 98.5%id, 0.1%wa, 0.0%hi, 0.3%si, 0.0%st
Mem: 3898468k total, 3558768k used, 339700k free, 111124k buffers
Swap: 4194300k total, 112k used, 4194188k free, 1352808k cached

Zombies:

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 18256 0.0 0.0 0 0 ? Z 11:30 0:00 [aua.bin] <defunct>
root 18595 0.6 0.0 0 0 ? Z 11:32 0:00 [confd.plx] <defunct>



  • I think you've put in a good effort to try to get it going. What was the source of this device? If you can return it, I would. No point in wasting more time troubleshooting.

    I've bought used PC equipment on eBay before. If it works, great; if I can't get it going within an evening or 3, it goes back.

  • Yes, it was ebay in fact.

    I have purchased a previous XG115 r2 and it's still running great, but this one has issues from the start ...

    I was just trying to figure out if I'm running into some weird bug, possibly with one of my LAN devices causing a crash.
    I know it would be strange, but I've seen strange things, and the seller claims it was running just fine before it was replaced.

    You know, when I see stuff like this it makes me wonder if it was another bad unit or:

     XG115 Rev 3 freezing sometimes on SFOS 18.5.2 MR-2-Build380 

  • Put it to the test. Attach a single client on the LAN side and see what happens. Looking up the specs, it appears the device uses pretty standard i211 NICs. These are generally very well supported.

    Doesn't matter what the seller claims. Just put defective and call it a day :).

    One never really knows what they're buying on eBay. New, open box, chances are if it's a volume seller, the item is some kind of return. Sellers buy pallets of this crap, then turn around and sell it if it passes basic tests (i.e., it POSTs).

  • I was just trying to figure out if I'm running into some weird bug, possibly with one of my LAN devices causing a crash

    The chances of that are slim to none, leaning more to none.  I think the unit may be faulty in places we can't test or don't want to bother with. Personally, I don't buy these units, as they are usually underpowered for my taste.  The seller can preach that all day long; they aren't looking out for you, they are making money.

    You could buy your own machine that isn't hardware specific for what you paid or less and it will last for years and years.  I have had a SuperMicro 1U forever, and finally just replaced the hard drive because it was failing (old 5400 RPM disk to a new SSD), and I updated to a Xeon quad core processor just because I had a dual core running in it before.

    OPNSense 64-bit | Intel Xeon 4-core v3 1225 3.20Ghz
    16GB Memory | 500GB SSD HDD | ATT Fiber 1GB
    (Former Sophos UTM Veteran, Former XG Rookie)

  • I buy and sell (under different ID's ) on ebay too.  I dread selling anything electronic because it may turn into a headache/nightmare.  But that's how the game goes. Don't sell anything there you're not prepared to lose your shirt on.  Have an old toshiba HD dvd player, still in the box never opened. Do I feel lucky..............

  • Is there a way to log via PuTTY to capture what might be happening?

    Possibly similar to this, or is this only available on SFOS:
    Sudden Reboots and freezing system on an XG 115 Appliance (v17 AND v18)

  • There's  remote logging.

    Why don't you want to return this thing?

  • Thank you, I just want to make sure I did my best to try and resolve the issue :)

    Maybe I can try that if it's easy enough to set up and see what it tells me,
    it would be nice to see if it's like a kernel fault or something, or just a freeze ...

    The fact I don't see anything in the logs is really strange when the event occurs!
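For anyone wanting to try the remote logging idea: if the firewall is pointed at a remote syslog target, a second machine can keep the last messages even when the unit itself locks up before flushing its own logs. Below is a minimal sketch of a UDP receiver in Python. It assumes the firewall forwards plain syslog over UDP to this host; the port 5514, the file name `utm-remote.log`, and the helper `format_record` are all placeholders chosen for illustration (binding the standard port 514 usually needs elevated privileges).

```python
import socketserver
from datetime import datetime

LOG_PATH = "utm-remote.log"  # hypothetical output file on the receiving machine


def format_record(sender_ip, payload, received_at):
    """Build one output line: receiver-side time, sender address, decoded message."""
    text = payload.decode("utf-8", errors="replace").rstrip("\r\n\x00")
    return f"{received_at.isoformat(timespec='seconds')} {sender_ip} {text}"


class SyslogHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # For UDP servers, self.request is a (datagram bytes, socket) pair
        data, _sock = self.request
        line = format_record(self.client_address[0], data, datetime.now())
        # Open and close the file per message so nothing sits in a buffer
        with open(LOG_PATH, "a", encoding="utf-8") as fh:
            fh.write(line + "\n")


def main(host="0.0.0.0", port=5514):
    """Listen until interrupted; port 5514 is an assumed, unprivileged choice."""
    with socketserver.UDPServer((host, port), SyslogHandler) as server:
        server.serve_forever()

# Call main() to start listening, then stop it with Ctrl+C.
```

Stamping each line with the receiver's own clock is deliberate: if the firewall's clock misbehaves, the receiver-side time still shows when each message really arrived.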

  • Did this thing come with pfSense installed by chance?  I've seen a lot of them on eBay selling like that.

  • No, it actually came with SFOS.

    Maybe I will try a 30 day trial of SFOS and see if this issue repeats itself,
    it would be interesting if I could accidentally find a software bug in UTM.

    Like you said, chances are it's hardware, but I just want to make sure :)

  • I misunderstood you - I thought you had already done that. 

  • I tried UTM but the hardware version instead of the software one ...

    I expect both devices (XG115 r2 & r3) should be able to run on either OS of course :)

  • Just another update, I was able to find some weird entries for the last 3 hours before the crash in syslog ...

    192.168.5.1 Mar 22 12:46:18 daemon warning 2023:03:22-19:27:59 escape75 URID[7870] T=7870 ------ 2 - sxl2_internal_get_time: The clock was set back from 1679510202 to 1679509679\n

    192.168.5.1 Mar 22 13:22:48 daemon warning 2023:03:22-19:27:49 escape75 URID[7870] T=7870 ------ 2 - sxl2_internal_get_time: The clock was set back from 1679510204 to 1679509669\n

    That doesn't look very good ... going by the epoch values, the clock jumped back by 523 and 535 seconds, so roughly 9 minutes each time!

    This in turn causes DHCP to do this:

    192.168.5.1 Mar 22 13:04:32 daemon debug 2023:03:22-19:27:52 escape75 dhcpd reuse_lease: lease age -337 (secs) under 25% threshold, reply with unaltered, existing lease for 192.168.5.5
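The size of each jump can be read straight off the two epoch values in the URID lines. A small Python sketch of that arithmetic, using the message text from the two entries quoted above (the function name `clock_jumps` is just an illustrative choice):

```python
import re

# Matches the URID warning text and captures the old and new epoch seconds
SET_BACK = re.compile(r"The clock was set back from (\d+) to (\d+)")


def clock_jumps(lines):
    """Return the backward jump in seconds for every line that reports one."""
    jumps = []
    for line in lines:
        m = SET_BACK.search(line)
        if m:
            jumps.append(int(m.group(1)) - int(m.group(2)))
    return jumps


# Message portions of the two syslog entries quoted in this post
sample = [
    "escape75 URID[7870] T=7870 ------ 2 - sxl2_internal_get_time: The clock was set back from 1679510202 to 1679509679",
    "escape75 URID[7870] T=7870 ------ 2 - sxl2_internal_get_time: The clock was set back from 1679510204 to 1679509669",
]

for seconds in clock_jumps(sample):
    print(f"clock went back {seconds} s (about {seconds / 60:.1f} min)")
```

A negative lease age in the dhcpd line is consistent with this: a lease that appears to start in the future is what a backward clock step would produce.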

  • One more update, the unit also freezes on SFOS hardware version, no access to ports or serial possible.

    I have contacted the seller and got a refund. I ended up paying some duties, he paid a little more for shipping,
    but in the end it's a wash and I'm happy that we were able to resolve this situation as best as we could.

    I will maybe play with it some more. It seems like it's starting to lose track of time (RTC issue),
    but I'm wondering if this could be a CMOS battery issue, even though it's not losing BIOS settings ...

    It's certainly strange, as it was pulled from a working environment, and now that I have excluded the SSD
    as well as memory issues, there's not much that remains ...

    Thanks for everyone's help!

  • Doubt it's the battery.  The battery comes into play when the device is off. When it's on, the power supply provides power for everything. I wonder, however, if the power supply is somehow flaky and causing the issues.  That would make sense for all the instability.