

Weird UTM freezes randomly approximately once a day ...

I have experienced a strange lockup on my "new" UTM box, but I checked log files and they don't reveal anything, just a bunch of weird characters ...

2023:03:16-01:32:01 escape75 /usr/sbin/cron[25494]: (root) CMD (  nice -n19 /usr/local/bin/gen_inline_reporting_data.plx)
2023:03:16-01:35:01 escape75 /usr/sbin/cron[25649]: (root) CMD (   /usr/local/bin/reporter/system-reporter.pl)
�����������������������������������������������������������������������������������������������������������
2023:03:16-09:03:10 escape75 syslog-ng[4942]: syslog-ng starting up; version='3.4.7'
2023:03:16-09:03:12 escape75 ddclient[5361]: WARNING: cannot connect to checkip.dyndns.org:80 socket: IO::Socket::INET: Bad hostname 'checkip.dyndns.org'
2023:03:16-09:03:24 escape75 system: System was restarted
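
A log like the one above can be scanned for the same pattern automatically: a long silence between two timestamped entries, with unreadable lines in between. The following is only a rough sketch, assuming the timestamp layout shown in the excerpt; the function name `find_gaps` and the 30-minute threshold are made up for illustration and are not part of UTM.

```python
import re
from datetime import datetime, timedelta

# Timestamp layout as it appears in the excerpt: 2023:03:16-01:35:01
STAMP = re.compile(r"^(\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2}) ")


def find_gaps(lines, threshold=timedelta(minutes=30)):
    """Return (previous, current, junk_count) for every silent gap above threshold.

    junk_count is the number of non-empty lines without a parsable
    timestamp seen between the two entries (e.g. a run of garbage bytes).
    """
    gaps = []
    previous = None
    junk = 0
    for line in lines:
        match = STAMP.match(line)
        if not match:
            if line.strip():
                junk += 1
            continue
        current = datetime.strptime(match.group(1), "%Y:%m:%d-%H:%M:%S")
        if previous is not None and current - previous > threshold:
            gaps.append((previous, current, junk))
        previous = current
        junk = 0
    return gaps


# Shortened copy of the excerpt; the third entry stands in for the garbled line.
SAMPLE = [
    "2023:03:16-01:32:01 escape75 /usr/sbin/cron[25494]: (root) CMD (...)",
    "2023:03:16-01:35:01 escape75 /usr/sbin/cron[25649]: (root) CMD (...)",
    "\ufffd" * 40,
    "2023:03:16-09:03:10 escape75 syslog-ng[4942]: syslog-ng starting up; version='3.4.7'",
    "2023:03:16-09:03:24 escape75 system: System was restarted",
]

for before, after, junk in find_gaps(SAMPLE):
    print(f"silent from {before} to {after} ({after - before}), {junk} unreadable line(s)")
```

For a real file, opening it with `open(path, errors="replace")` should keep the garbage bytes from raising a decode error. The last timestamp before the gap only bounds when the box stopped logging; it does not say why.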



So, I've been running the software version of UTM (9.714) on my old unit (an XG115 r2) for a couple of years without any issues.
I recently migrated my saved config over to a new unit (XG115 r3), and a few hours after setting it up (at night)
it froze and its LAN interface stopped responding to pings, so I powered it down and rebooted. It's working again ...

Just wondering if there's something more I can look at to see what the issue was. I have a hunch that it may have been DHCP related,
as my devices on the LAN were renewing their IP addresses and were not yet in the lease table on the new unit, but that's a wild guess,
so if this doesn't happen again then maybe it's nothing to worry about.

I don't know if there would be an issue moving the config file (and license) from the old unit, but I wouldn't think so.

The new unit was installed the same way as the old unit, using the ssi-9.714-4.1.iso file and removing the /etc/asg with a software license,
and the old unit hasn't experienced any weird issues in years, and the ethernet ports and devices are set up in an identical way, nothing changed.

Just looking for thoughts and ideas ...

Stats from top:

top - 11:32:20 up 2:31, 1 user, load average: 0.09, 0.29, 0.25
Tasks: 163 total, 1 running, 160 sleeping, 0 stopped, 2 zombie
Cpu(s): 0.6%us, 0.5%sy, 0.0%ni, 98.5%id, 0.1%wa, 0.0%hi, 0.3%si, 0.0%st
Mem: 3898468k total, 3558768k used, 339700k free, 111124k buffers
Swap: 4194300k total, 112k used, 4194188k free, 1352808k cached
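
The "used" figure in this older top layout is simply total minus free (3558768k + 339700k = 3898468k), so it includes buffers and page cache. A rough estimate of what the processes themselves hold can be worked out as below. This is only a back-of-the-envelope sketch using the numbers above; the helper name `app_memory_kb` is made up, and part of "cached" (shared memory, for instance) may not actually be reclaimable.

```python
def app_memory_kb(used_kb, buffers_kb, cached_kb):
    """Approximate memory held by processes: used minus buffers and page cache."""
    return used_kb - buffers_kb - cached_kb


# Figures copied from the top output above (all in kB).
total_kb = 3898468
used_kb = 3558768
free_kb = 339700
buffers_kb = 111124
cached_kb = 1352808

app_kb = app_memory_kb(used_kb, buffers_kb, cached_kb)
print(f"{app_kb} kB held by processes, {app_kb / total_kb:.1%} of total")
```

That comes to about 2094836k, or roughly 54% of RAM, with almost no swap in use, so these stats by themselves do not suggest the box was short on memory at the time they were taken.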

Zombies:

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 18256 0.0 0.0 0 0 ? Z 11:30 0:00 [aua.bin] <defunct>
root 18595 0.6 0.0 0 0 ? Z 11:32 0:00 [confd.plx] <defunct>



  • I think you've put in a good effort to try to get it going. What was the source of this device? If you can return it, I would. No point in wasting more time troubleshooting.

    I've bought used PC equipment on ebay before. If it works, great; if I can't get it going within an evening or 3, it goes back.

  • Yes, it was ebay in fact.

    I have purchased a previous XG115 r2 and it's still running great, but this one has issues from the start ...

    I was just trying to figure out if I'm running into some weird bug, possibly with one of my LAN devices causing a crash,
    I know it would be strange but I've seen strange things, and the seller claims it was running just fine before it was replaced.

    You know, when I see stuff like this it makes me wonder if it was another bad unit or:

     XG115 Rev 3 freezing sometimes on SFOS 18.5.2 MR-2-Build380 

  • Update: 20 hours and the box is still up. I'm getting confused, but will continue testing to get longer uptime ...

    Now I'm beginning to wonder if the stick of RAM is possibly bad after all, but if it is, then why does it work in the r2 unit?

    I clearly need to do more testing, possibly get a VGA display and perform a 24 hour memory test on both units :)

  • If you've got a PC that accepts that RAM, test it there. That will rule out any SG hardware gremlins.

  • Not really, it's a laptop type SODIMM stick, but once I confirm it works with whatever BIOS options,
    then I will reverse the sticks of RAM again and see if the issue comes back. It could be just that stick
    in that box, due to timings, not sure. I've seen weird things and this might be another one of those :)

  • Very weird, after 1 day and 10 hours the unit is still up. Now, after putting the original RAM back in, it works ...

    I wonder if something was just loose and re-inserting the RAM fixed it. I guess time will tell.
    I had never bothered to remove the RAM or the SSD before, as they had that factory security glue on them,
    connecting the sides of the RAM and SSD to the socket they were each plugged in to.

    Oh well, I'm continuing to test ...

  • Testing some old hardware I recently put on ebay.

    Among others, there were 2 old Intel gigabit NICs.  One worked flawlessly, the other was only intermittently recognized by the system.

    Located a #2 pencil and went to town on the contacts. They were somewhat discolored (oxidized?) after sitting for a decade or more.  After cleaning, the card was recognized each and every time.

    Perhaps all the removal/install of the memory cleaned up the contacts enough so there's no more issue?

  • I guess that's one possible explanation, but if that's the case, why did a relatively quick memory test pass, and why did 3rd party OS's work fine ...

    I guess it could be because the tests I performed were too short, and 3rd party OS's didn't access those memory locations ...

    But basically if I didn't witness this myself I'd have a hard time believing someone, as this really doesn't make much sense :)

    Hopefully once I perform some more memory testing and play with things more I can finally declare things stable, time will tell!

  • Update: The box froze up after about 4 days or so ...

    Next I will look at BIOS settings again, as I reset them to defaults while testing.
    I'm guessing it's not the settings, so I will also look at longer memory tests.
    Also, will be swapping the sticks of RAM around.

    This won't be quick, as it seems sometimes the box works for days without issues! :)

  • Still fighting this beast :)

    I have determined that BIOS settings don't matter ...

    I have swapped the RAM sticks again between the units, and the r3 (freezing unit) is now over 4 days uptime on the r2's good stick.

    I am also testing the r2 (good unit) with the possibly bad stick of RAM using memtest86, and it's at about 6 hours right now.

    I'm not discounting the possibility that by swapping the RAM both units will be working fine, but time will tell ...

    Update: 24 hours on the r2 with the possibly bad stick of RAM and all good ...

    Next is a 24 hour test of the good stick in the possibly bad r3 box ... update: done, also passed :)

  • Over 5 days on the r3 with the r2's stick of RAM and so far so good ...

    It's possibly the stick of RAM, but it also seems to work fine in the r2 unit, possibly timing glitches.

  • Since both boxes are now stable I have concluded it was the memory module, but it wasn't faulty, it must be a timing issue.

    The swap between the boxes made both boxes work, so possibly the r3 unit requires tighter memory timings ...

    Further update: The box is still stable. In the meantime I have replaced the 4 GB stick with an 8 GB stick and it's still good.

    I would say the stick of RAM was faulty, but only in this particular unit, as the same stick worked in the older revision :)
