This discussion has been locked.

Weird UTM freezes randomly approximately once a day ...

I have experienced a strange lockup on my "new" UTM box, but I checked log files and they don't reveal anything, just a bunch of weird characters ...

2023:03:16-01:32:01 escape75 /usr/sbin/cron[25494]: (root) CMD (  nice -n19 /usr/local/bin/gen_inline_reporting_data.plx)
2023:03:16-01:35:01 escape75 /usr/sbin/cron[25649]: (root) CMD (   /usr/local/bin/reporter/system-reporter.pl)
�����������������������������������������������������������������������������������������������������������
2023:03:16-09:03:10 escape75 syslog-ng[4942]: syslog-ng starting up; version='3.4.7'
2023:03:16-09:03:12 escape75 ddclient[5361]: WARNING: cannot connect to checkip.dyndns.org:80 socket: IO::Socket::INET: Bad hostname 'checkip.dyndns.org'
2023:03:16-09:03:24 escape75 system: System was restarted
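
One way to pin down when the box actually stopped logging is to scan the log for long silences between consecutive timestamps. The following is a minimal sketch, assuming every entry starts with a timestamp in the `YYYY:MM:DD-HH:MM:SS` layout seen above; the function names and the 30-minute threshold are illustrative choices, not anything UTM provides.

```python
from datetime import datetime, timedelta

TS_FORMAT = "%Y:%m:%d-%H:%M:%S"  # timestamp layout seen in the excerpt above
TS_LENGTH = 19                    # len("2023:03:16-01:35:01")


def parse_timestamp(line):
    """Return the leading timestamp of a log line, or None if it has none."""
    try:
        return datetime.strptime(line[:TS_LENGTH], TS_FORMAT)
    except ValueError:
        return None


def find_gaps(lines, threshold=timedelta(minutes=30)):
    """Return (last_seen, next_seen, gap) for each silence longer than threshold."""
    gaps = []
    previous = None
    for line in lines:
        current = parse_timestamp(line)
        if current is None:
            # Lines without a timestamp (such as the run of unreadable bytes) are skipped.
            continue
        if previous is not None and current - previous > threshold:
            gaps.append((previous, current, current - previous))
        previous = current
    return gaps
```

To run it against a real file, open the log with `open(path, errors="replace")` so the unreadable bytes do not abort the read, then pass the file object to `find_gaps`. For the excerpt above it would report a single silence, from 01:35:01 to 09:03:10, which narrows the freeze to some point after the 01:35 cron job.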



So, I've been running the software version of UTM (9.714) on my old unit (an XG115 r2) for a couple of years without any issues.
I recently migrated my saved config over to a new unit (XG115 r3), and a few hours after setting it up (at night)
it froze and its interfaces were not pingable (LAN), so I powered it down and rebooted. It's working again ...

Just wondering if there's something more I can look at to see what the issue was ... I have a hunch that it may have been DHCP related,
as my devices on the LAN were renewing their IP addresses and they were not in the DHCP table on the new unit, but that's a wild guess,
so if this doesn't happen again then maybe it's nothing to worry about.

I don't know if there would be an issue moving the config file (and license) from the old unit, but I wouldn't think so.

The new unit was installed the same way as the old unit, using the ssi-9.714-4.1.iso file and removing /etc/asg so it runs with a software license.
The old unit hasn't experienced any weird issues in years, and the Ethernet ports and devices are set up in an identical way; nothing changed.

Just looking for thoughts and ideas ...

Stats from top:

top - 11:32:20 up 2:31, 1 user, load average: 0.09, 0.29, 0.25
Tasks: 163 total, 1 running, 160 sleeping, 0 stopped, 2 zombie
Cpu(s): 0.6%us, 0.5%sy, 0.0%ni, 98.5%id, 0.1%wa, 0.0%hi, 0.3%si, 0.0%st
Mem: 3898468k total, 3558768k used, 339700k free, 111124k buffers
Swap: 4194300k total, 112k used, 4194188k free, 1352808k cached

Zombies:

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 18256 0.0 0.0 0 0 ? Z 11:30 0:00 [aua.bin] <defunct>
root 18595 0.6 0.0 0 0 ? Z 11:32 0:00 [confd.plx] <defunct>
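
Both defunct entries above started within about two minutes of the `top` snapshot, so they may simply be short-lived children waiting to be reaped. To tell whether zombies accumulate over time, the `ps` output could be filtered repeatedly and the PIDs compared between runs. This is a minimal sketch, assuming the 11-column `ps aux` layout shown above; `find_zombies` is a hypothetical helper name.

```python
def find_zombies(ps_output):
    """Return (pid, command) for each process whose STAT column starts with 'Z'."""
    zombies = []
    for line in ps_output.splitlines():
        # ps aux prints 11 columns; the last one (COMMAND) may contain spaces.
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # header line or anything that is not a process row
        if fields[7].startswith("Z"):
            zombies.append((int(fields[1]), fields[10]))
    return zombies
```

If the same PIDs are still listed an hour later, the parent process is not reaping them, which would be worth noting; if the PIDs keep changing, they are most likely transient.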



  • I think you've put in a good effort to try to get it going. What was the source of this device? If you can return it, I would. No point in wasting more time troubleshooting.

    I've bought used PC equipment on eBay before. If it works, great; if I can't get it going within an evening or three, it goes back.

  • Yes, it was eBay in fact.

    I previously purchased an XG115 r2 and it's still running great, but this one has had issues from the start ...

    I was just trying to figure out if I'm running into some weird bug, possibly with one of my LAN devices causing a crash.
    I know it would be strange but I've seen strange things, and the seller claims it was running just fine before it was replaced.

    You know, when I see stuff like this it makes me wonder if it was another bad unit or:

     XG115 Rev 3 freezing sometimes on SFOS 18.5.2 MR-2-Build380 

  • Just a quick update, it's been almost 24 hours and still up, too early to tell if it's good.

    I opened up my old XG115 r2 and tested the battery, which is older, and to my surprise it was at 3.2 V ...

    It's also interesting that the old unit has a 2450 vs a 2032, with nearly double the capacity!

  • Unfortunately it went down again, lasted about 24 hours ...

    Oh well, I guess it's a hardware fault after all, so there's nothing I can do :)

  • One more update ...

    I tried with another power supply (higher amperage) but the issue remains.

    However, I have tested Windows 10 on the unit and also run the Intel CPU tests, and it passed!
    I am also now running pfSense on the unit, and hopefully I will get some more answers as to what could be happening.

  • Feels good to have gone through this exercise? :)

    Run some prime95 torture tests on it for a few hours. I'd be surprised if it doesn't lock up sooner rather than later.

    Has *sense been stable on it? I think their igb drivers are more picky than Linux's.

  • I think it feels good, I'm learning more details about what the issue might be :)

    So, the interesting part is that my pfSense is set up in a similar configuration,
    port 1 is WAN, and ports 2,3,4 are just bridged LAN, but so far pfSense has
    not gone down and uptime is at 1 day and 23 hours!

    This is much longer than UTM or SFOS managed, and I've also tried taxing
    the CPU by issuing multiple 'yes > /dev/null &' for each core, and there were no issues.

    So, since I am sure there are other people here on UTM using an XG115 r3 without issues,
    I must be running into a hardware fault that's only exposed by UTM and not pfSense ...

    I don't know how that is possible :)

  • Did you do a prime95 test?

  • I did not, yet.

    I wish there was a way to run prime95 on pfSense. Maybe there is one, I just don't know how.
    I don't want to jump to a different OS every 2 days, as that wouldn't be a reliable test either ...

    I guess I could try: https://www.ultimatebootcd.com/

  • You could set up a Windows To Go USB flash drive (or external SSD) to run it from.

  • Assuming no issues after a few hours of prime95, perhaps the NICs are faulty?

    As this is a multiple-interface device, you could just run one Ethernet cable between 2 ports, then set up iperf3 to generate traffic.

    The server and client will need to be bound to the separate interfaces using -B.  IIRC the format is -B {ip of interface}.

    Let's say you assign 192.168.1.1 to interface 1 and 192.168.1.2 to interface 2.

    -B 192.168.1.1 will bind to interface 1, -B 192.168.1.2 will bind to interface 2.

    Run that for a period of time to see if it causes some sort of failure.

  • Some good points, I will keep working on it ...

    My uptime is now 2 Days 17 Hours.

    I would think that if the issue were related to a faulty NIC, it would also show up in pfSense, hmm!

    I wish I had access to another XG115 r3, could it be different BIOS settings, or something ...
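
For the iperf3 port-to-port test suggested earlier in the thread, the two command lines can be sketched as below, using the example addresses from that reply. This is only an illustration: `iperf3_commands` is a hypothetical helper, and the one-hour duration is an arbitrary choice. One caveat worth checking first: when both addresses live on the same host, the OS may deliver the traffic internally instead of sending it across the cable, so binding with `-B` alone might not exercise the NICs. Separate network namespaces or a second machine may be needed for that.

```python
import shlex


def iperf3_commands(server_ip, client_ip, seconds=3600, port=5201):
    """Build iperf3 server and client argument lists bound to specific addresses."""
    # -s: run as server, -B: bind to this local address, -p: listen port
    server = ["iperf3", "-s", "-B", server_ip, "-p", str(port)]
    # -c: connect to the server, -B: bind the client side, -t: duration in seconds
    client = ["iperf3", "-c", server_ip, "-B", client_ip,
              "-p", str(port), "-t", str(seconds)]
    return server, client


server_cmd, client_cmd = iperf3_commands("192.168.1.1", "192.168.1.2")
print(shlex.join(server_cmd))
print(shlex.join(client_cmd))
```

The printed lines can be pasted into two shells, server first. Watching the interface byte counters while the test runs is a quick way to confirm the traffic is really leaving through the ports.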
