SG 430 (Home Licence) Crashing/Freezing

Hi,

I have a second hand SG 430, running with a home licence. Firmware version = 9.711-5. It has been running for over a year with no issues.

However, since the beginning of May, it has crashed/frozen 3 times (see attached picture of hardware usage; the 3rd time was today). When it freezes, WebAdmin, internet access, VPN, etc. are all unavailable. Even the joystick control on the front of the SG 430 doesn't do anything. The only way to recover is to power off at the wall and then power back on.

There seems to be no regularity to the crashes. I have checked the SMART status of the hard disk, which appears to have passed.

How should I troubleshoot this? Is it more likely to be a hardware or a software issue? I'm thinking of completely re-installing the UTM software and then restoring the configuration from a backup. Any advice would be greatly appreciated.

Many thanks

This thread was automatically locked due to age.

Top Replies

Handel078 over 2 years ago in reply to Vivek Jagad +1

Thanks - have set up. Will report back with details once it crashes again....!

0 Vivek Jagad over 2 years ago

Hello Handel078,

Thank you for reaching out to the community. Check the disk usage from the command-line interface:
> df -kh
> Check the Postgres status with the command: ps -aux | grep postgres
> Additionally, check /var/log/system.log, /var/log/kernel.log and /var/log/fallback.log

Thanks & Regards,
_______________________________________________________________

Vivek Jagad | Team Lead, Technical Support, Global Customer Experience

Log a Support Case | Sophos Service Guide
Best Practices – Support Case | Security Advisories
Compare Sophos next-gen Firewall | Fortune Favors the prepared
Sophos Community | Product Documentation | Sophos Techvids | SMS
If a post solves your question please use the 'Verify Answer' button.

0 Handel078 over 2 years ago in reply to Vivek Jagad

Hi Vivek,

Thank you for helping. It crashed again overnight. Before rebooting, the red LED light on the hard-drive indicator on the front of the box was not on or flashing, and there was no response to any commands executed directly on the console (USB keyboard and VGA screen). However, the joystick did scroll through the menus on the little LCD screen. When I selected the reboot command, it said "rebooting now..." however nothing actually occurred.

I hard rebooted via power socket, and tried your suggestions. Here are the outputs. (I'm not an expert in deciphering the logs, so any help interpreting them would be much appreciated).

Many thanks,

1. Disk Usage:

utm:/root # df -kh
Filesystem                        Size  Used Avail Use% Mounted on
/dev/sda6                         5.2G  3.2G  1.8G  65% /
udev                              7.8G   96K  7.8G   1% /dev
tmpfs                             7.8G  112K  7.8G   1% /dev/shm
/dev/sda1                         331M   16M  295M   5% /boot
/dev/sda5                          84G  2.7G   77G   4% /var/storage
/dev/sda7                         110G  2.4G  101G   3% /var/log
/dev/sda8                         4.6G  9.5M  4.3G   1% /tmp
/dev                              7.8G   96K  7.8G   1% /var/storage/chroot-clientlessvpn/dev
tmpfs                             7.8G     0  7.8G   0% /var/sec/chroot-httpd/dev/shm
/dev                              7.8G   96K  7.8G   1% /var/sec/chroot-openvpn/dev
/dev                              7.8G   96K  7.8G   1% /var/sec/chroot-ppp/dev
/dev                              7.8G   96K  7.8G   1% /var/sec/chroot-pppoe/dev
/dev                              7.8G   96K  7.8G   1% /var/sec/chroot-pptp/dev
/dev                              7.8G   96K  7.8G   1% /var/sec/chroot-pptpc/dev
/dev                              7.8G   96K  7.8G   1% /var/sec/chroot-restd/dev
tmpfs                             7.8G     0  7.8G   0% /var/storage/chroot-reverseproxy/dev/shm
/var/storage/chroot-smtp/spool     84G  2.7G   77G   4% /var/sec/chroot-httpd/var/spx/spool
/var/storage/chroot-smtp/spx       84G  2.7G   77G   4% /var/sec/chroot-httpd/var/spx/public/images/spx
tmpfs                             7.8G  157M  7.7G   2% /var/storage/chroot-http/tmp
/var/sec/chroot-afc/var/run/navl  5.2G  3.2G  1.8G  65% /var/storage/chroot-http/var/run/navl
tmpfs                             7.8G   60K  7.8G   1% /var/storage/chroot-smtp/tmp/ram
/etc/nwd.d/route                  5.2G  3.2G  1.8G  65% /var/sec/chroot-ipsec/etc/nwd.d/route

2. Postgres

utm:/root # ps -aux | grep postgres
Warning: bad ps syntax, perhaps a bogus '-'? See http://procps.sf.net/faq.html
postgres  4274  0.0  0.4 2210548 81504 ?        S    10:45   0:00 /usr/pgsql92-64/bin/postgres -D /var/storage/pgsql92/data
postgres  4285  0.0  0.1 2211732 17276 ?        Ss   10:45   0:00 postgres: checkpointer process
postgres  4286  0.0  0.0 2211576 14148 ?        Ss   10:45   0:00 postgres: writer process
postgres  4287  0.0  0.0 2211576  4900 ?        Ss   10:45   0:00 postgres: wal writer process
postgres  4288  0.0  0.0 2212688  2400 ?        Ss   10:45   0:00 postgres: autovacuum launcher process
postgres  4289  0.0  0.0   26932   620 ?        Ss   10:45   0:00 postgres: archiver process
postgres  4290  0.0  0.0   27212  1104 ?        Ss   10:45   0:00 postgres: stats collector process
postgres  5280  0.0  0.0 2217832 12632 ?        Ss   10:45   0:00 postgres: reporting reporting [local] idle
postgres  5806  0.0  0.0 2214312  5184 ?        Ss   10:45   0:00 postgres: smtp smtp [local] idle
postgres  5807  0.0  0.0 2214312  5120 ?        Ss   10:45   0:00 postgres: smtp smtp [local] idle
postgres  5844  0.0  0.1 2218788 19472 ?        Ss   10:45   0:00 postgres: reporting reporting [local] idle
postgres  5845  0.0  0.0 2214976  4916 ?        Ss   10:45   0:00 postgres: reporting reporting [local] idle
postgres  5856  0.0  0.0 2215072  5912 ?        Ss   10:45   0:00 postgres: hotspot hotspot [local] idle
postgres  5901  0.0  0.0 2215072  5848 ?        Ss   10:45   0:00 postgres: hotspot hotspot [local] idle
postgres  6045  0.0  0.0 2215064  6264 ?        Ss   10:45   0:00 postgres: smtp smtp 127.0.0.1(44532) idle
postgres  6076  0.0  0.0 2214968  5624 ?        Ss   10:45   0:00 postgres: smtp smtp 127.0.0.1(44535) idle
postgres  7165  0.0  0.0 2214852  5148 ?        Ss   10:46   0:00 postgres: sandbox sandbox [local] idle
postgres  7181  0.0  0.0 2214988  5956 ?        Ss   10:46   0:00 postgres: sandbox sandbox [local] idle
postgres  8319  0.0  0.0 2215132  6960 ?        Ss   10:50   0:00 postgres: smtp smtp 127.0.0.1(44630) idle
root      8490  0.0  0.0    5672   748 pts/0    S+   10:52   0:00 grep postgres

3. Logs around the crash time

A) system.log (one of my WAN interfaces is a 4G connection, in the 10.179.x.x range)

2022:05:24-04:34:44 utm dhclient: DHCPREQUEST for 10.179.255.180 on eth1 to 10.179.255.181 port 67
2022:05:24-04:34:44 utm dhclient: DHCPACK of 10.179.255.180 from 10.179.255.181
2022:05:24-04:34:44 utm dhclient: bound to 10.179.255.180 -- renewal in 32 seconds.
2022:05:24-04:34:56 utm dns-resolver[4969]: No change to REF_NetDnsSmtpGmail :: smtp.gmail.com
2022:05:24-04:35:01 utm /usr/sbin/cron[1040]: (root) CMD (   /usr/local/bin/reporter/system-reporter.pl)
2022:05:24-04:35:01 utm /usr/sbin/cron[1041]: (httpproxy) CMD (/var/chroot-http/usr/bin/virus_feedback_uploader)
2022:05:24-04:35:16 utm dhclient: DHCPREQUEST for 10.179.255.180 on eth1 to 10.179.255.181 port 67
2022:05:24-04:35:16 utm dhclient: DHCPACK of 10.179.255.180 from 10.179.255.181
2022:05:24-04:35:16 utm dhclient: bound to 10.179.255.180 -- renewal in 25 seconds.
2022:05:24-04:35:41 utm dhclient: DHCPREQUEST for 10.179.255.180 on eth1 to 10.179.255.181 port 67
2022:05:24-04:35:41 utm dhclient: DHCPACK of 10.179.255.180 from 10.179.255.181
2022:05:24-04:35:41 utm dhclient: bound to 10.179.255.180 -- renewal in 25 seconds.
2022:05:24-04:35:57 utm dns-resolver[4969]: No change to REF_NetDnsAppleNtp :: time.apple.com
2022:05:24-04:36:06 utm dhclient: DHCPREQUEST for 10.179.255.180 on eth1 to 10.179.255.181 port 67
2022:05:24-04:36:06 utm dhclient: DHCPACK of 10.179.255.180 from 10.179.255.181
2022:05:24-04:36:06 utm dhclient: bound to 10.179.255.180 -- renewal in 32 seconds.
2022:05:24-04:36:38 utm dhclient: DHCPREQUEST for 10.179.255.180 on eth1 to 10.179.255.181 port 67
2022:05:24-04:36:38 utm dhclient: DHCPACK of 10.179.255.180 from 10.179.255.181
2022:05:24-04:36:39 utm dhclient: bound to 10.179.255.180 -- renewal in 31 seconds.
2022:05:24-04:36:57 utm dns-resolver[4969]: Updating REF_NetDnsSmtpGmail :: smtp.gmail.com
2022:05:24-04:37:10 utm dhclient: DHCPREQUEST for 10.179.255.180 on eth1 to 10.179.255.181 port 67
2022:05:24-04:37:10 utm dhclient: DHCPACK of 10.179.255.180 from 10.179.255.181
2022:05:24-04:37:10 utm dhclient: bound to 10.179.255.180 -- renewal in 25 seconds.
2022:05:24-10:45:28 utm syslog-ng[5342]: syslog-ng starting up; version='3.4.7'
2022:05:24-10:45:29 utm ntpd[5083]: Listen normally on 12 tun0 10.242.2.1:123
2022:05:24-10:45:29 utm ntpd[5083]: new interface(s) found: waking up resolver
2022:05:24-10:45:33 utm dns-resolver[4982]: DNS server failed to contact!
2022:05:24-10:45:33 utm dns-resolver[4982]: DNS server failed to contact!
2022:05:24-10:45:43 utm dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 6
2022:05:24-10:45:44 utm dhclient: DHCPOFFER of 10.179.255.180 from 10.179.255.181
2022:05:24-10:45:44 utm dhclient: DHCPREQUEST for 10.179.255.180 on eth1 to 255.255.255.255 port 67
2022:05:24-10:45:44 utm dhclient: DHCPACK of 10.179.255.180 from 10.179.255.181
2022:05:24-10:45:44 utm dhclient: bound to 10.179.255.180 -- renewal in 27 seconds.

B) kernel.log only has data from after the reboot.

C) fallback.log

2022:05:24-04:30:57 utm [daemon:info] dhcp_updown[819]:  dhcp_updown: No IPv4 address change, exiting
2022:05:24-04:31:25 utm [daemon:info] dhcp_updown[841]:  eth1 - reason:RENEW
2022:05:24-04:31:25 utm [daemon:info] dhcp_updown[841]:  dhcp_updown: No IPv4 address change, exiting
2022:05:24-04:31:52 utm [daemon:info] dhcp_updown[857]:  eth1 - reason:RENEW
2022:05:24-04:31:52 utm [daemon:info] dhcp_updown[857]:  dhcp_updown: No IPv4 address change, exiting
2022:05:24-04:32:22 utm [daemon:info] dhcp_updown[921]:  eth1 - reason:RENEW
2022:05:24-04:32:22 utm [daemon:info] dhcp_updown[921]:  dhcp_updown: No IPv4 address change, exiting
2022:05:24-04:32:50 utm [daemon:info] dhcp_updown[937]:  eth1 - reason:RENEW
2022:05:24-04:32:50 utm [daemon:info] dhcp_updown[937]:  dhcp_updown: No IPv4 address change, exiting
2022:05:24-04:33:19 utm [daemon:info] dhcp_updown[951]:  eth1 - reason:RENEW
2022:05:24-04:33:19 utm [daemon:info] dhcp_updown[951]:  dhcp_updown: No IPv4 address change, exiting
2022:05:24-04:33:48 utm [daemon:info] dhcp_updown[968]:  eth1 - reason:RENEW
2022:05:24-04:33:48 utm [daemon:info] dhcp_updown[968]:  dhcp_updown: No IPv4 address change, exiting
2022:05:24-04:34:18 utm [daemon:info] dhcp_updown[987]:  eth1 - reason:RENEW
2022:05:24-04:34:18 utm [daemon:info] dhcp_updown[987]:  dhcp_updown: No IPv4 address change, exiting
2022:05:24-04:34:44 utm [daemon:info] dhcp_updown[1004]:  eth1 - reason:RENEW
2022:05:24-04:34:44 utm [daemon:info] dhcp_updown[1004]:  dhcp_updown: No IPv4 address change, exiting
2022:05:24-04:35:16 utm [daemon:info] dhcp_updown[1115]:  eth1 - reason:RENEW
2022:05:24-04:35:16 utm [daemon:info] dhcp_updown[1115]:  dhcp_updown: No IPv4 address change, exiting
2022:05:24-04:35:41 utm [daemon:info] dhcp_updown[1131]:  eth1 - reason:RENEW
2022:05:24-04:35:41 utm [daemon:info] dhcp_updown[1131]:  dhcp_updown: No IPv4 address change, exiting
2022:05:24-04:36:06 utm [daemon:info] dhcp_updown[1158]:  eth1 - reason:RENEW
2022:05:24-04:36:06 utm [daemon:info] dhcp_updown[1158]:  dhcp_updown: No IPv4 address change, exiting
2022:05:24-04:36:39 utm [daemon:info] dhcp_updown[1178]:  eth1 - reason:RENEW
2022:05:24-04:36:39 utm [daemon:info] dhcp_updown[1178]:  dhcp_updown: No IPv4 address change, exiting
2022:05:24-04:37:10 utm [daemon:info] dhcp_updown[1208]:  eth1 - reason:RENEW
2022:05:24-04:37:10 utm [daemon:info] dhcp_updown[1208]:  dhcp_updown: No IPv4 address change, exiting
2022:05:24-10:45:34 utm [daemon:info] irqd[3749]:  received SIGTERM
2022:05:24-10:45:34 utm [daemon:info] irqd[6302]:  getting interface notifications
2022:05:24-10:45:34 utm [daemon:info] irqd[6302]:  lo loopback <loopback,up,running,lowerup> group 0 
2022:05:24-10:45:34 utm [daemon:info] irqd[6302]:  RPS enabled, XPS enabled
2022:05:24-10:45:34 utm [daemon:info] irqd[6302]:  lo: detected 1 queue(s), 'network' cpuset
2022:05:24-10:45:34 utm [daemon:info] irqd[6302]:  lo:0: affinity irq=0x3 rps/xps=0x3
2022:05:24-10:45:34 utm [daemon:info] irqd[6302]:  lo: up
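
When reading excerpts like the ones above, it can help to find the longest silence between consecutive timestamped lines, since that is where a hard freeze would show up. The following is only a minimal Python sketch, not a Sophos tool. It assumes every log line starts with a `YYYY:MM:DD-HH:MM:SS` timestamp as in this thread, and the names `parse_stamp` and `largest_gap` are made up for illustration.

```python
from datetime import datetime

# Timestamp layout at the start of each log line in this thread,
# e.g. "2022:05:24-04:37:10 utm dhclient: ..."
STAMP_FORMAT = "%Y:%m:%d-%H:%M:%S"
STAMP_LENGTH = 19


def parse_stamp(line):
    """Return the leading timestamp of a log line, or None if it has none."""
    try:
        return datetime.strptime(line[:STAMP_LENGTH], STAMP_FORMAT)
    except ValueError:
        return None


def largest_gap(lines):
    """Return (gap, line_before, line_after) for the longest silence.

    Returns None when fewer than two timestamped lines are present.
    """
    best = None
    previous = None  # (timestamp, line) of the last timestamped line seen
    for line in lines:
        stamp = parse_stamp(line)
        if stamp is None:
            continue  # skip lines without a recognisable timestamp
        if previous is not None:
            gap = stamp - previous[0]
            if best is None or gap > best[0]:
                best = (gap, previous[1], line)
        previous = (stamp, line)
    return best


# Four lines copied from the system.log excerpt in this post.
sample = [
    "2022:05:24-04:36:57 utm dns-resolver[4969]: Updating REF_NetDnsSmtpGmail :: smtp.gmail.com",
    "2022:05:24-04:37:10 utm dhclient: bound to 10.179.255.180 -- renewal in 25 seconds.",
    "2022:05:24-10:45:28 utm syslog-ng[5342]: syslog-ng starting up; version='3.4.7'",
    "2022:05:24-10:45:29 utm ntpd[5083]: Listen normally on 12 tun0 10.242.2.1:123",
]

result = largest_gap(sample)
if result is not None:
    gap, before, after = result
    print("Longest silence:", gap)
    print("Last entry before:", before)
    print("First entry after:", after)
```

On these sample lines the longest silence is 6:08:18, between the last dhclient entry at 04:37:10 and syslog-ng starting up at 10:45:28 after the power cycle. Nothing was logged in between, which fits a hard freeze rather than a clean shutdown.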

0 Vivek Jagad over 2 years ago in reply to Handel078

If you read the column headers, you'd see that "Pre-fail" is the type of attribute being collected, not its status. The empty WHEN_FAILED column is also a hint that nothing has failed.

> https://www.linuxjournal.com/article/6983
Each Attribute also has a Threshold value (whose range is 0 to 255) which is printed under the heading "THRESH". If the Normalized value is less than or equal to the Threshold value, then the Attribute is said to have failed. If the Attribute is a pre-failure Attribute, then disk failure is imminent.

So as long as the normalized value is higher than the threshold value, there's nothing to worry about.
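
The rule quoted above can be written down in a few lines. This is a minimal Python sketch, assuming the VALUE, THRESH and TYPE columns have already been read off the smartctl output by hand. `SmartAttribute` and `assess` are hypothetical names, and the sample numbers are invented for illustration, not taken from this appliance.

```python
from dataclasses import dataclass


@dataclass
class SmartAttribute:
    name: str
    value: int       # normalized value (VALUE column)
    threshold: int   # THRESH column
    prefail: bool    # True when TYPE is "Pre-fail", False for "Old_age"


def assess(attr):
    """Apply the quoted rule: an attribute has failed when value <= threshold."""
    if attr.value > attr.threshold:
        return "ok"
    if attr.prefail:
        return "failure imminent"
    return "failed (not a pre-failure attribute)"


# Invented example rows, for illustration only.
examples = [
    SmartAttribute("Example_Prefail_Healthy", value=100, threshold=10, prefail=True),
    SmartAttribute("Example_Prefail_Tripped", value=10, threshold=10, prefail=True),
    SmartAttribute("Example_Old_Age_Tripped", value=5, threshold=10, prefail=False),
]

for attr in examples:
    print(attr.name, "->", assess(attr))
```

Note that "Pre-fail" only changes how a tripped threshold is interpreted. An attribute of that type whose value is still above its threshold is reported as ok.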

0 Vivek Jagad over 2 years ago in reply to Handel078

Handel078, what if you turn off the IPS and web-filtering and then check the results...

0 Handel078 over 2 years ago in reply to Vivek Jagad

Turn off IPS and web filtering and then re-run smartctl? Or wait for it to crash again?

0 Vivek Jagad over 2 years ago in reply to Handel078

Wait for the next crash and then check the following logs during the crash:
/var/log/system.log
/var/log/kernel.log
/var/log/fallback.log

And then re-run the test...

0 Handel078 over 2 years ago in reply to Vivek Jagad

I can't check the logs whilst it has frozen/crashed, and often they are empty (around the crash time) after rebooting, but can try. Is there any way to output those logs live to putty via console cable?

0 Vivek Jagad over 2 years ago in reply to Handel078

You can use PuTTY to monitor the appliance for the next crash after you have turned off IPS and web filtering on the UTM.
In PuTTY, set the following:
Connection type: Serial
Speed: 38400
Logging: All session output
Window > Lines of scrollback: 2000000

0 Handel078 over 2 years ago in reply to Vivek Jagad

Thanks very much - I'll let you know

0 Handel078 over 2 years ago in reply to Handel078

Okay so IPS and web-filter turned off. Server has crashed again - but in a different way, internet (WAN) is still working, but Web-admin, VPN are not.

I ran a live tail of the kernel log on the console and saw an "ata3.00 exception" at 12:30:55. Details are in the picture below. Any input on how to fix this would be greatly appreciated.

0 Handel078 over 2 years ago in reply to Handel078
Also when I try to run a command now in the console, for example atop, I get this

utm:/root # atop
-bash: /usr/bin/atop: No such file or directory

Web-admin has a 404 error. Quote:

"Not Found

The requested URL was not found on this server.

Additionally, a 404 Not Found error was encountered while trying to use an ErrorDocument to handle the request."

0 Jay Jay over 2 years ago in reply to Handel078

Your SMART details lack the model of the SSD. Some of the stats are unidentified. They may or may not be of concern.

The dmesg log suggests problems when writing to the disk, which would partly explain why the system is unstable. If it were me, I'd try a different SSD. The model number would help identify the type of SSD (SATA or PCIe/NVMe). If the board has regular SATA connectors, you could try one of those as well with a different SSD.

Edit: I take that back. The last few lines of the dmesg log identify an ssdsc2bw240h6 SSD. This comes back to a 2.5" 240 GB Intel SSD, which is of course the SATA variety. I would just try a different 2.5" SSD. Even a 120 GB one would work. Reinstall from scratch, then reload your config file. Use a different cable too.

0 Handel078 over 2 years ago in reply to Jay Jay

Jay Jay Thank you for this. I will try to get a new SSD overnighted to me and install tomorrow. Might try and change the cable this evening if I have a spare lying around. Any suggestions on specific ssd? Should it be "enterprise" grade, or would something like this work? https://www.ebuyer.com/822477-crucial-mx500-250gb-ssd-ebuyer-ct250mx500ssd1

0 Jay Jay over 2 years ago in reply to Handel078

Personally I'm partial to Samsung SSDs (still have a few EVO 850/860s lying around). I suppose anything with a high TBW rating would work. That Crucial one is rated at 100 TB, which should be fine.

I should add that cable issues usually show up as UDMA CRC errors (line 16 of your SMART output). Yours is showing 0.

0 Handel078 over 2 years ago in reply to Jay Jay

Thanks I've gone for an evo 870. Could you explain the cable issue.... are you saying I do or don't have one?

0 Jay Jay over 2 years ago in reply to Handel078

Inconclusive. I'd replace it anyway just to be on the safe side.

What was the outcome of the memory diag?

0 Handel078 over 2 years ago in reply to Jay Jay

Didn't know how to investigate memory diag. Could you advise? I found the "ata3.00 exception" in the kernel log which pushed me towards SATA/SSD issue

0 Jay Jay over 2 years ago in reply to Handel078

Isn't there a memory test diag on the ISO? (or when you boot utm).

You can try https://www.memtest.org/ or https://www.memtest86.com/

0 Handel078 over 2 years ago in reply to Jay Jay

Thanks will try

+1 Handel078 over 2 years ago in reply to Handel078

UPDATE:

A brand new SSD seems to have fixed the problem. The SG 430 has now run for 24 hours with seemingly no issues, so I am going to assume the problem was with the Intel SSD. Jay Jay & Vivek Jagad, thank you both for all your help in diagnosing the issue!