
Problems after a hardware change

Hello guys!

I have a big problem after moving my Sophos UTM (Home license) to new hardware, and I could use your help, please.

I have been running a Home license UTM for many years now on a Dell machine with an i3-4130 CPU, 8 GB RAM and a 100 GB 2.5" HDD.

Everything ran smoothly: CPU usage was usually around 1%, RAM usage about 30%, no problems whatsoever.

Recently I acquired a decommissioned firewall appliance (more details later; if the problems are solved, I will post it in the Hardware section for compatible devices).

This appliance, as far as I understand, is quite similar to the Sophos SG 125 (Intel Atom C2358). It has 4 GB of RAM and a new 120 GB Kingston mSATA SSD.

My Dell installation dates back a few years and has been upgraded over time to the latest UTM version (9.703-3 at the moment). For the new appliance, I downloaded the same version, performed a clean installation and booted it with a USB stick containing the latest backup. It came up with all the settings in place, no issues there. I also copied all logs from /var/log to the new disk so that I would keep them (I know they are not really necessary, but I wanted them anyway).

 

The problem is that the new appliance constantly runs at 98% CPU. The following are active on the UTM:

Model: ASG Software
License ID: ********
Subscriptions: Base Functionality, Email Protection, Network Protection, Web Protection, Webserver Protection, Wireless Protection, Endpoint AntiVirus
Uptime: 0d 7h 10m

I knew the Atom CPU would be no match for the i3, but I expected utilization of around 20-30%, not 100%. So I started searching, and the first problem I found was this:

utm postgres[14402]: [3-1] FATAL: database "reporting" does not exist

I searched further and found suggestions to rebuild the database, but that did no good. After digging a bit more, I realized that when PostgreSQL was moved to 64-bit a few years back, I never converted it on my old UTM, so the backup I restored came from a machine still running the 32-bit version (not sure whether this is a problem, though). In any case, I found instructions to create the /var/log/reporting/pgsql92 and /var/storage/pgsql92 folders, changed their permissions and restarted the PostgreSQL service, but I still get the FATAL error above.

I even suspected that I had messed something up when copying the old logs (I had to change some permissions), so I reset the configuration and started again from scratch (restoring the configuration from the backup file) with no logs at all. The problem is still there.

I also disabled IPS and Web Filtering, but CPU usage stays the same (which is strange).

The result of all this:

  1. CPU is always at 98% (I don't know whether usage will drop once this is fixed; I hope it will). The little 40 mm Sunon fan is really annoying at this load.
  2. I also got a few notification emails saying "RRD cache daemon not running - restarted".
  3. Reporting is not working at all.
  4. The system messages log has grown to about 100 MB in just 8-9 hours, which is certainly not good (a lot of writes; SMART also reports the SSD temperature at 50°C).
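To see which process is flooding the system messages log, a quick tally of lines per program can help. This is a minimal sketch in Python, assuming the log lines look like the postgres message quoted above (host, then program[pid]:, then the message); the name top_talkers is hypothetical, and since I'm not sure Python is available on the UTM itself, it is meant to be run against a copy of the log on another machine.

```python
import re
import sys
from collections import Counter

# Assumed line shape, based on the message quoted above:
#   "... utm postgres[14402]: [3-1] FATAL: database "reporting" does not exist"
# i.e. "<host> <program>[<pid>]: <message>". Lines in any other layout are skipped.
LINE_RE = re.compile(r"(?:^|\s)([\w./-]+)\[\d+\]:\s*(.*)")


def top_talkers(lines, n=5):
    """Return the n programs logging the most lines, and those logging the most FATALs."""
    per_program = Counter()
    fatal_per_program = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # not in the assumed "program[pid]:" layout
        program, message = match.groups()
        per_program[program] += 1
        if "FATAL" in message:
            fatal_per_program[program] += 1
    return per_program.most_common(n), fatal_per_program.most_common(n)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 top_talkers.py <copy-of-system-messages-log>
    with open(sys.argv[1], errors="replace") as log:
        busiest, fatals = top_talkers(log)
    print("most lines:", busiest)
    print("most FATALs:", fatals)
```

If one program (postgres, in this case) dominates both lists, that is where to start digging.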

Any help is greatly appreciated, guys! Below is a snapshot of top in case it helps:



  • Welcome Chriz,

     

    You bought that device from me on eBay... :)

    As Bob recommended, try a fresh reinstall and, before restoring your backup, first try a bare "test" configuration.

    I don't think the RAM is the problem.

    I have installed many devices with 4 GB RAM and they perform very smoothly, even one with 2 GB (!), so 4 GB should really be enough for home use.

     

    Regards,

    Andy.

  • Hello guys!

    Thanks to both of you for the answers!

    @BAlfson: Geia sou (hi), Bob! I somehow missed the notification for your post.

    Hello Andy! First of all, thanks again for the pleasant eBay transaction! Yes, I know this specific device is more than capable. (I already had one and it worked very well during my tests, but, stupid me, I managed to brick it before putting it into actual "production". So I knew beforehand it should work without issues.)

    I was actually planning to update this topic today, because I had already tried what Bob suggested on Friday, July 31st, before he even replied, hehe. Long story short, the device is now in "production" and, as expected, CPU utilization averages about 6-7%. But let me elaborate a bit:

    On Friday afternoon (31/7), I put my Dell machine back into production and took the new appliance to my desk to start working on it.

    I reinstalled from scratch and everything ran smoothly. I restored my backup and things continued to run smoothly.

    Then it was time to restore the logs from my old hardware. (I decided to use a USB stick for this, to speed things up a bit.) The process was a bit tricky, though, since from what I saw the UTM will not mount NTFS, so I formatted the USB stick as FAT32. I ran tar cvpzf backup.tgz /var/log/, but then I had to split the .tgz file into several parts, since it was about 7 GB and FAT32 only handles files up to 4 GB, heh. I did that, copied the parts to the new appliance, concatenated them and extracted the archive on the new machine. So far so good.
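For anyone repeating this, the split-and-rejoin step can be sketched in Python. This is a minimal sketch, assuming Python 3.8+ is available on whichever machine does the work (the UTM itself may not have it); the function names and the 3 GiB part size are my own hypothetical choices, not anything UTM provides. FAT32 cannot store a single file of 4 GiB or more, so each part is kept below that, and comparing SHA-256 hashes of the original and the rejoined archive catches a bad copy before anything is extracted.

```python
from __future__ import annotations

import hashlib
from pathlib import Path

# FAT32 cannot hold a single file of 4 GiB or more (limit: 4 GiB minus 1 byte).
FAT32_MAX_FILE = 4 * 1024**3 - 1
# Hypothetical default part size, chosen to stay well under the FAT32 limit.
DEFAULT_PART_SIZE = 3 * 1024**3
BUF_SIZE = 1024 * 1024  # copy in 1 MiB blocks so memory use stays small


def split_file(src: Path, part_size: int = DEFAULT_PART_SIZE) -> list[Path]:
    """Split src into src.000, src.001, ... each at most part_size bytes."""
    if not 0 < part_size <= FAT32_MAX_FILE:
        raise ValueError("part_size must be positive and fit on FAT32")
    parts: list[Path] = []
    with src.open("rb") as fin:
        while True:
            part = src.with_name(f"{src.name}.{len(parts):03d}")
            written = 0
            with part.open("wb") as fout:
                while written < part_size:
                    chunk = fin.read(min(BUF_SIZE, part_size - written))
                    if not chunk:
                        break
                    fout.write(chunk)
                    written += len(chunk)
            if written == 0:
                part.unlink()  # input exhausted: drop the empty trailing part
                break
            parts.append(part)
    return parts


def join_parts(parts: list[Path], dest: Path) -> None:
    """Concatenate the parts, in order, into dest."""
    with dest.open("wb") as fout:
        for part in parts:
            with part.open("rb") as fin:
                while chunk := fin.read(BUF_SIZE):
                    fout.write(chunk)


def sha256_of(path: Path) -> str:
    """Hash a file so the rejoined archive can be compared with the original."""
    digest = hashlib.sha256()
    with path.open("rb") as fin:
        while chunk := fin.read(BUF_SIZE):
            digest.update(chunk)
    return digest.hexdigest()
```

On a Linux box with GNU coreutils, split -b 3G backup.tgz backup.tgz. followed by cat backup.tgz.* > backup.tgz on the target does the same job; the hash comparison is worth doing either way.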

    I even browsed the logs and could see all of them (I have logs going back to 2016, don't ask why, heh). The graphs were there too, and I could even see the yearly graphs.

    After that, I decided to reboot the machine, and this is where the problems started again...

    After the reboot, I noticed two things: 1) CPU was again over 90%, and 2) the data partition (about 40 GB) started filling up rapidly.

    I opened the system log and was greeted with constant messages that the database was in recovery mode (!!!).

    I waited patiently and the database eventually came out of recovery mode, but then I started getting continuous messages like: FATAL: postgresql Could not read critical index 2662

    I tried to stop PostgreSQL gracefully, but it would not stop. After many retries it finally did. I started it again and the FATAL message was still there.

    So I decided to rebuild the database. The first two attempts failed while stopping PostgreSQL, but the third time was the charm and the process completed.

    CPU utilization dropped back to normal. I rebooted the UTM, it came back up, and everything was running normally.

    I left it running for an hour or so and then put it into production, where it has been running nicely ever since. A current screenshot:

     

      

    A weekly CPU usage report (usage before Friday is from the Dell machine):

     

    So my guess is that something in the restored log files was causing the UTM to crap out (perhaps the contents of /var/log/reporting? Maybe the files restored there made the database try to generate new data, and that was simply too much for it? I don't know.)

    The important thing is that everything is running smoothly right now.

    To be honest, I was almost 100% sure from the beginning that the problem was caused by restoring the logs. I say "almost" because there was a small possibility that my new Kingston mSATA disk was acting up, but it turns out it was not.

     

    <offtopic> @Andy: Adding the fan was a nice touch, really handy in my case since the device sits in a cabinet with minimal airflow. It is not loud, but it makes a distinctive, annoying noise, so I think I will replace it. (The others can't hear it, but I can; I am a weirdo, what can I say...)

    Since I am too lazy to shut it down, take it out of the cabinet, unscrew it and check: do you, by any chance, remember whether the Sunon fan is 12 V? I have a second (bricked) F18 on hand, which has no fan, so I think I am going to get a Noctua NF-A4x20 PWM, fix the fan in place with some double-sided tape and transplant the motherboard from your device. (Since the Sunon fan is glued, I figured it is better to leave it as is and use the other F18 case I have.) I also have some 10,000 rpm 40 mm Delta Electronics fans lying around. I might try those first, before the Noctua, so that I can prank my wife and kids for a few hours, hehe</offtopic>

     
    Sophos XG Home Licence.

    Machine: Checkpoint 3100 appliance (Intel Atom C2558 CPU, 6GB Ram, 250GB sata SSD)

  • Nice to hear that the CPU utilization problem is solved now.

    I also once tried to restore log files from an old installation into a new one and ran into problems as well... so, after those massive problems, I decided never to restore them after a reinstallation. ;)

     

    <offtopic> @Andy: Adding the fan was a nice touch, really handy in my case since the device sits in a cabinet with minimal airflow. It is not loud, but it makes a distinctive, annoying noise, so I think I will replace it. (The others can't hear it, but I can; I am a weirdo, what can I say...)

    Since I am too lazy to shut it down, take it out of the cabinet, unscrew it and check: do you, by any chance, remember whether the Sunon fan is 12 V? I have a second (bricked) F18 on hand, which has no fan, so I think I am going to get a Noctua NF-A4x20 PWM, fix the fan in place with some double-sided tape and transplant the motherboard from your device. (Since the Sunon fan is glued, I figured it is better to leave it as is and use the other F18 case I have.) </offtopic>

     

    Sorry, I used that fan because in my opinion it is very quiet, and because SUNON fans (nearly) never fail. That is also why I glued it instead of fiddling with screws, since there are no suitable mounting holes. I imagine it should be possible to cut it loose with a utility knife, because yes, it is a very strong glue. ;)

    In your case, since you have another bricked one and could use its case, yes, you can swap it (less work) and fit a 12 V Noctua fan. The SUNON is also powered from 12 V. In any case, I recommend using a fan. I don't know what Barracuda was thinking when designing that device, but durability was not the goal. :)

     

    Please let me know the temperatures and results once you finish your project.

     

    Regards,

    Andy.

  • Hello again, Andy!

    There is absolutely nothing to be sorry about!!! Especially in my case, the fan is really useful, and the fan itself is not really loud. What annoys me is a bzzzz, almost like an electrical noise (again, the others don't mind or hear it; I am the weird one, hehe).

    Consider that my Dell machine, when it got hot inside that cabinet (fortunately only on really hot days, with temperatures over 30-35°C), actually made a loud noise because its fan spun up to high rpm. That noise, although muffled by the closed cabinet, was much louder than the noise from the Sunon fan. Still, this particular noise from the Sunon annoys me more.

    Yes, I do have another F18, unfortunately. While it was still working, I decided to use a 4-pin-to-SATA power adapter with a SATA cable and an old SSD I had lying around, just to verify that the appliance recognized a SATA SSD without issues. I plugged them in, connected the power cable, and the appliance started beeping continuously; it won't light any LEDs and won't do anything. A few days later I tried that same SSD in a USB enclosure and discovered it was fried (it even smelled a bit). Now I am not sure whether something shorted on the appliance and fried the SSD, or the SSD was already fried and killed the appliance too. A visual inspection of the appliance's motherboard shows nothing obvious, and there are no schematics available anywhere, so even a multimeter is of little use, since I don't know what I am looking for. I hope I will eventually find someone who can identify and fix the issue, but so far the one store I asked wanted around €120 to fix it (if they can, which is a big if). And since I got the other F18 for around €75, it is not worth paying almost double to repair it.

    This is my last week of work before a family vacation, so I will most probably deal with it afterwards, around the last week of August. (It really depends on my free time, hehe.)

    As it sits right now, the appliance is at about 35°C, even inside a closed cabinet with extremely limited air circulation. I don't think the Noctua will get it much lower than that, but I do hope it will be a bit quieter. In any case, I will update this topic and let you know!

    P.S.: The fact that it originally has no fan is not that odd, since this Atom is a 7 W CPU and these devices are meant to sit in a computer room. I actually have a few Checkpoint appliances deployed at some branches at work that are also fanless, and their CPU (Atom C3558) is a bit more power-hungry, but since they are in cooled environments they get no hotter than 35°C.

     