I have a SG-330 running latest firmware 9.601-5. I have a second SG-330, new in box. I am looking at implementing HA (failover, not active-active).
First, are there any downsides of doing HA? Or would I be better to leave the spare in the box (guessing not, but...).
Should I fire up the spare first, not connected to anything, to burn it in?
To do HA, do I just connect the second box's interfaces into the network? (I have five interfaces connected, one of which is a trunk going to a Cisco switch doing VLANs.) Then connect the eth3 HA interface between the boxes, and fire up the spare box? Is it that simple?
The spare box hasn't had any updates done to it - is that a part of the HA process? Or should it be updated first?
How long does the process take? Minutes? Hours?
Any best practices or rulz to follow?
Alex's prescription is the right one. I made the following "cheat sheet" for one of my customers:
1. If needed, do a quick, temporary install so that the new device can download Up2Dates.
2. Apply the desired Up2Dates (if possible, stop at 9.605 today (changed 2019-10-01)), do a factory reset and shutdown.
3. On the current UTM in use, on the 'Configuration' tab of 'High Availability':
   a. Enable Hot-Standby
   b. Select eth3 as the Sync NIC
   c. Configure it as Node_1
   d. Enter an encryption key (I've never found a need to remember it)
   e. Select 'Enable automatic configuration of new devices'
   f. I prefer to use 'Preferred Master: None' and 'Backup interface: Internal'
4. Cable eth3 to eth3 on the new device.
5. Cable all of the other NICs exactly as they are on the original UTM.
6. Power up the new device and wait for the good news. ;)
Cheers - Bob
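Step 5 of the cheat sheet is the easiest one to get wrong by hand. As a rough sketch only, the cabling on both boxes could be written down and compared before powering up the new device. The interface names and network labels below are hypothetical placeholders (only eth3 as the Sync NIC comes from the steps above), and nothing here talks to the UTM itself:

```python
# Minimal sketch: compare hand-recorded cabling of two nodes before enabling HA.
# All interface/network names are illustrative assumptions, not read from a UTM.

def cabling_mismatches(primary, spare, sync_nic="eth3"):
    """Return the NICs whose attached network differs between the nodes.

    The sync NIC is skipped because it is cabled node-to-node, not to a network.
    """
    nics = (set(primary) | set(spare)) - {sync_nic}
    return sorted(n for n in nics if primary.get(n) != spare.get(n))


primary = {
    "eth0": "Internal",
    "eth1": "External",
    "eth2": "DMZ",
    "eth3": "HA sync",
    "eth4": "VLAN trunk",
}

# Spare cabled with two ports swapped by mistake.
spare = dict(primary, eth1="DMZ", eth2="External")

print(cabling_mismatches(primary, primary))  # no differences
print(cabling_mismatches(primary, spare))    # the two swapped ports
```

An empty list means the recorded cabling matches on every NIC except the sync link.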
I've got the second SG-330 connected to a PC and to an unused static IP on our external network connection. When I powered up the box, it shows it's on firmware 9.308.
After connecting to an internet connection, the dashboard shows 41 updates available. When I go to Up2Date, it says the firmware is up to date and no downloads available. I tried both manually, and automatically (letting it sit for several days). I've tried rebooting the box, no change.
From the second SG, I can ping www.google.com, etc. So appears the internet connection and DNS are both working.
Thanks. The current production box is on 9.601-5. I downloaded ssi-9.510-5.1 which was the previous one on the web site. I'll try that tomorrow.
Then should I try and update or let the HA process take care of that?
It should take care of it, John, but I would have just gotten the 9.601 ISO.
The download site has ssi-9.601-5.1, and the online box shows 9.601-5 (without the ".1"). Didn't know if that made any difference or not, or if the version just doesn't show the ".1".
You could take the 9.601-5.1 from the download site. It's the same version.
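For anyone comparing the download file name against what the dashboard shows, here is a small sketch that pulls the version out of both forms. It assumes, based only on the reply above, that the trailing ".1" on the download name can be ignored; the function name is made up for illustration:

```python
import re

def base_version(name):
    """Extract (major, minor, build) from strings like '9.601-5' or 'ssi-9.601-5.1'.

    Assumption: anything after the build number (such as '.1') is ignored.
    """
    match = re.search(r"(\d+)\.(\d+)-(\d+)", name)
    if match is None:
        raise ValueError(f"no version found in {name!r}")
    return tuple(int(part) for part in match.groups())


print(base_version("ssi-9.601-5.1") == base_version("9.601-5"))  # same base version
print(base_version("ssi-9.510-5.1") < base_version("9.601-5"))   # older image
```

Comparing the tuples rather than the raw strings avoids being thrown off by the "ssi-" prefix or the extra suffix.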
Well, best laid plans, … Was working on implementing HA, and got busy with other stuff.
Decided this week would be the time. I got on what is going to be the backup box, reset to factory with the control panel, rebooted and connected.
Asked some setup info, then got to a "your license has expired", and wanted a license key. Tried a factory reset again, and same thing.
I had already updated the backup box to 9.601. The operational box is on 9.605.
Will this present issues? Or just hook the backup box up, connect the HA ports between the two boxes, and go into the on-line box and set it for HA, and let it rip?
I contacted Sophos tech support and they said during the configuration process to not update the backup box, let them come up, then update the backup box? I thought that if they were on different versions, they would update to the same version?
Why not just follow the steps described above? There is no need to connect to the "slave box". Follow Bob's steps and you'll succeed.
Well, the HA now appears to be working. But not without several calls to Sophos support, two remote sessions - one of which was over 1-1/2 hours. And multiple hours, after hours.
Also, per Sophos, the procedure to implement HA seems to differ from the one above, as shown in this KB.
In addition to the steps above, the KB has you log on to the slave box and configure HA on it before things will work. And to be able to get onto it, the box has to have a valid license.
Before seeing this KB, I tried all sorts of things to get it to work following the sage advice of those on this group. After updating to the latest firmware, when attempting to set up HA, the slave box would attempt to sync up, then throw an error, set itself back to factory default and power off.
Then after the online chat with Sophos, they said try a different port for the HA. Same results.
Another call to Sophos, and they emailed back with the KB with the different procedure, and tried that. Set up the slave box so I could log on, and it said the license was expired, even though it had not expired. After 5 or more attempts to get it to take the license, it finally took.
I set up the slave box per the instructions (no change needed to the master), applied, and then plugged in all the cables, identical to the master, along with the HA connection.
It tried to come up, however on the status screen on the master, it showed the slave as being on 9.602 (I think). The master was on 9.605. Previously I had updated the slave to the same version as the master. And the LCD display on the slave box showed 9.605.
On the status it showed the slave as updating, but stuck there. After letting it run 45+ minutes, I again called Sophos support. After listening to happy music for 18 minutes, got someone. He did a remote session, then I got him shell access.
He went through the boxes, initially didn't see anything. We rebooted the slave box, came back up, and still trying to update.
He then connected to the slave device, and after all kinds of looking around, found the database on it was corrupted. He went through a bunch of PostgreSQL commands, deleted the database and rebuilt it.
Rebooted again, and it showed the slave on 9.605, and syncing. After a few minutes it showed all as OK.
As a note, I had a constant ping to the master box, and to two external sites, one on the internet, and one through a VPN, and through the entire process, none ever dropped.
This last call was nearly 2 hours. And the person knew what he had to do. Nothing I would ever even think about doing.
Don't know if this was an anomaly or what.
Some comments on Sophos tech support:
I tried to submit a help request through the Sophos web page. It would never work. Kept coming back to the page where you put in your user info.
I tried the online chat, but that was of little use.
On the phone, I was dropped three different times. The music on hold would stop, and I thought someone was coming on, but just silence. On the first call, when the license was expired, they transferred me to licensing, and after several minutes of music, silence. Called back, pressed buttons, got to licensing, and they were very helpful and sent me a license file, after probably 10 minutes on hold.
When I did get through on the phone, once, took 15 minutes, another 18 minutes.
The persons helping me, the three times I called, were in Asia/Pacific.
After the first call, I got an initial email back saying the case was being handled by the Asia/Pacific region and their work times were 3 pm - 11 pm M-F. And asked if I wanted the case transferred to a different region. Because I didn't want to impact users during the work day, I wanted to do this after 4:30 pm central.
Now, particularly the second person last Friday afternoon, knew what he was doing, and was in the Asia/Pacific region.
John, did you complete the survey about your experience with Support, or should we link upper management to your comments below?
Cheers - Bob
PS I understand that there are now some SGs being delivered with 'Automatic configuration' not selected by default, but that hasn't been the case with 330s, or did you see that it wasn't selected when you first looked? I suspect your entire problem was the broken PostgreSQL database.
Hi oh learned one.
I did complete the survey. The last person certainly knew his stuff. But getting through to Sophos was a pain. And, possibly because it was not a flashing red light emergency, they parked it with one physical area and one time zone.
But, was disconcerting that the on-line "open a ticket" didn't work, and tried with different browsers. And got dropped multiple times on the phone.
When I was doing the updates on the slave box, quite honestly didn't look at HA. In retrospect, should have. But, oh well.
Have had HA running fine for several months.
Yesterday did the first firmware update on the HA setup.
Went very smooth. I had a constant ping to the internal IP of the HA cluster, and to 126.96.36.199.
It did the update to the standby box, then rebooted it, and then switched traffic over to the second box. It missed two pings to the internal address and three to the 188.8.131.52.
Then updated the former main box, rebooted it, synced things and put it into the standby mode.
So, doing updates with the HA went very smooth.
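If you want to put numbers on a failover like this, the ping results can be summarized with a few lines of code. This is only a sketch: it assumes you have already recorded each ping as success or failure (however you collect them), and the sample sequences below are made up to mirror the two and three missed pings mentioned above, not captured data:

```python
def summarize_pings(results):
    """Return (total_missed, longest_consecutive_gap) for a list of booleans.

    True means the ping got a reply, False means it was missed.
    """
    missed = 0
    longest = 0
    current = 0
    for ok in results:
        if ok:
            current = 0
        else:
            missed += 1
            current += 1
            longest = max(longest, current)
    return missed, longest


# Illustrative sequences only: a short outage around the switchover.
internal = [True] * 10 + [False] * 2 + [True] * 10
external = [True] * 10 + [False] * 3 + [True] * 10

print(summarize_pings(internal))
print(summarize_pings(external))
```

The longest consecutive gap is the more useful of the two numbers, since at one ping per second it roughly approximates how long traffic was interrupted during the switchover.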