SG-330 High Availability

Hi All:

I have a SG-330 running latest firmware 9.601-5. I have a second SG-330, new in box. I am looking at implementing HA (failover, not active-active).

First, are there any downsides of doing HA? Or would I be better to leave the spare in the box (guessing not, but...).

Should I fire up the spare first, not connected to anything, to burn it in?

To do HA, do I just connect the second box's interfaces to the network? (I have five interfaces connected, one of which is a trunk going to a Cisco switch doing VLANs.) Then connect the eth3 HA interface between the boxes, and fire up the spare box? Is it that simple?

The spare box hasn't had any updates done to it - is that a part of the HA process? Or should it be updated first?

How long does the process take? Minutes? Hours?

Any best practices or rulz to follow?

Thanks !

John S.

  • Hi John,

    As you described, it is simple: connect the cables and fire up the box.

    There are two things you have to take care of.

    First, you have to make sure HA is configured; it is under Management > High Availability > Configuration. Enable automatic configuration of new devices.

    Second, make sure the version of the new box is not too far away from the existing one.

    The updates will be handled automatically by the UTM, but if the box is on a very old release it’s better to start with a new ISO.

    After powering on, a sync process starts and then you get an HA setup, active/passive or hot standby. I don’t know which wording is preferred by Sophos.

    If something goes wrong, HA won’t be activated and you can solve the problem.

    The sync process takes a couple of minutes, not hours.

    Give it a try. And the community is a very good place for questions.

    Best regards

    Alex

  • Alex's prescription is the right one.  I made the following "cheat sheet" for one of my customers:

    1. If needed, do a quick, temporary install so that the new device can download Up2Dates.
    2. Apply the desired Up2Dates (if possible, stop at 9.605 today (changed 2019-10-01)), do a factory reset and shutdown.
    3. On the current UTM in use, on the 'Configuration' tab of 'High Availability':
       a. Enable Hot-Standby
       b. Select eth3 as the Sync NIC
       c. Configure it as Node_1
       d. Enter an encryption key (I've never found a need to remember it)
       e. Select 'Enable automatic configuration of new devices'
       f. I prefer to use 'Preferred Master: None' and 'Backup interface: Internal'
    4. Cable eth3 to eth3 on the new device.
    5. Cable all of the other NICs exactly as they are on the original UTM.
    6. Power up the new device and wait for the good news. ;-)

    Cheers - Bob
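
For anyone who wants to keep the cheat sheet above as a checklist, here is a minimal sketch in Python. It does not talk to the UTM at all; `HAPlan`, `preflight` and every field name are hypothetical names made up for illustration, and the values are simply what you would note down by hand while working through steps 2 to 5.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HAPlan:
    """Hypothetical checklist record; the field names are ours, not Sophos's."""
    spare_factory_reset: bool            # step 2: spare was reset and shut down
    operation_mode: str                  # step 3a: "hot-standby"
    sync_nic: str                        # step 3b: sync NIC on the running UTM
    node_name: str                       # step 3c: e.g. "Node_1"
    encryption_key: str                  # step 3d: any non-empty key
    auto_configure_new_devices: bool     # step 3e
    spare_sync_port: str                 # step 4: port on the spare cabled to sync_nic
    other_nics_cabled_identically: bool  # step 5


def preflight(plan: HAPlan) -> List[str]:
    """Return a list of human-readable problems; an empty list means go ahead."""
    problems = []
    if not plan.spare_factory_reset:
        problems.append("spare has not been factory reset and shut down")
    if plan.operation_mode != "hot-standby":
        problems.append("operation mode is not hot-standby")
    if not plan.node_name:
        problems.append("node name is empty")
    if not plan.encryption_key:
        problems.append("encryption key is empty")
    if not plan.auto_configure_new_devices:
        problems.append("automatic configuration of new devices is off")
    if plan.spare_sync_port != plan.sync_nic:
        problems.append(
            f"sync NIC {plan.sync_nic} is cabled to {plan.spare_sync_port} on the spare"
        )
    if not plan.other_nics_cabled_identically:
        problems.append("other NICs are not cabled like the original UTM")
    return problems


# Example values taken from the cheat sheet; the key is a placeholder.
plan = HAPlan(True, "hot-standby", "eth3", "Node_1", "s3cret", True, "eth3", True)
print(preflight(plan) or "ready to power up the spare")
```

An empty list only means the written checklist is consistent; it says nothing about firmware versions or licensing on the spare.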

  • In reply to BAlfson:

    Thanks.

     

    I've got the second SG-330 connected to a PC and to an unused static IP on our external network connection. When I powered up the box, it shows it's on firmware 9.308.

    After connecting to an internet connection, the dashboard shows 41 updates available. When I go to Up2Date, it says the firmware is up to date and no downloads available. I tried both manually, and automatically (letting it sit for several days). I've tried rebooting the box, no change.

    From the second SG, I can ping www.google.com, etc. So appears the internet connection and DNS are both working.

     

    Ideas?

    Thanks,

     

    John S.

  • In reply to jskain:

    Instead of Up2Dating from that far back, John, just go to UTM Support Downloads and download the appropriate ssi (hardware) ISO and use that to re-image the device.  Remember that you don't want to have the new 330 at a newer version than your existing box, so you may need to let it download Up2Dates from 9.415.

    Cheers - Bob

  • In reply to BAlfson:

    Thanks. The current production box is on 9.601-5. I downloaded ssi-9.510-5.1 which was the previous one on the web site. I'll try that tomorrow.

     

    Then should I try and update or let the HA process take care of that?

     

    Thanks.

    John S.

  • In reply to jskain:

    It should take care of it, John, but I would have just gotten the 9.601 ISO.

    Cheers - Bob

  • In reply to BAlfson:

    The download site has ssi-9.601-5.1, and the online box shows 9.601-5 (without the ".1"). Didn't know if that made any difference or not, or if the version just doesn't show the ".1".

  • In reply to jskain:

    You could take the 9.601-5.1 from the download site. It's the same version.

    BR
    Alex
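
The two version rules from this part of the thread (the spare should not be newer than the production box, and the ISO name `ssi-9.601-5.1` is the same version as the `9.601-5` shown on the dashboard) can be sketched as a small comparison in Python. This is only an illustration: `parse_utm_version` and `spare_not_newer` are hypothetical helper names, and the assumption that the trailing ".1" can be ignored rests solely on Alex's remark above.

```python
import re
from typing import Tuple

# Matches "9.601-5", "9.605" and the version inside "ssi-9.601-5.1".
_VERSION_RE = re.compile(r"(\d+)\.(\d+)(?:-(\d+))?")


def parse_utm_version(text: str) -> Tuple[int, int, int]:
    """Extract (major, minor, build) from a version string or ISO name.

    Anything after the build number (such as the ISO's trailing ".1")
    is ignored; a missing build number is treated as 0.
    """
    match = _VERSION_RE.search(text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    major, minor, build = match.groups()
    return (int(major), int(minor), int(build or 0))


def spare_not_newer(spare: str, production: str) -> bool:
    """True if the spare's firmware is the same as or older than production's."""
    return parse_utm_version(spare) <= parse_utm_version(production)


print(parse_utm_version("ssi-9.601-5.1") == parse_utm_version("9.601-5"))
print(spare_not_newer("ssi-9.510-5.1", "9.601-5"))
```

The check only tells you the spare is not ahead of production. It does not decide whether an older spare is "too far away"; the thread gives no threshold for that.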

  • In reply to Alexander Busch:

    Hi:

    Well, best laid plans, …  I was working on implementing HA, and got busy with other stuff.

    Decided this week would be the time. I got on what is going to be the backup box, reset to factory with the control panel, rebooted and connected.

    It asked for some setup info, then got to a "your license has expired" message, and wanted a license key. I tried a factory reset again, and got the same thing.

    I had already updated the backup box to 9.601. The operational box is on 9.605.

    Will this present issues? Or do I just hook the backup box up, connect the HA ports between the two boxes, go into the online box and set it for HA, and let it rip?

     

    I contacted Sophos tech support and they said not to update the backup box during the configuration process: let both boxes come up, then update the backup box. I thought that if they were on different versions, they would update to the same version?

     Ideas, suggestions?

     

    Thanks,

     

    John S.

  • In reply to jskain:

    Hi John,

    why not just follow the steps described above?
    There is no need to connect to the "slave box". Follow Bob's steps and you'll succeed.

    Best regards

    Alex

  • In reply to Alexander Busch:

    Well, the HA now appears to be working. But not without several calls to Sophos support and two remote sessions, one of which was over 1-1/2 hours. And multiple hours of work, after hours.

    Also, per Sophos, the procedure to implement HA seems to be different, as shown in this KB:

    community.sophos.com/.../133642  

    This KB has the steps. In addition to the above, you have to log on to the slave box and configure HA on it before things will work. And to be able to get onto it, the box has to have a valid license.

    Before seeing this KB, I tried all sorts of things to get it to work following the sage advice of those on this group. After updating to the latest firmware, when attempting to set up HA, the slave box would attempt to sync up, then throw an error, set itself back to factory default and power off. 

    Then after the online chat with Sophos, they said try a different port for the HA. Same results.

    Another call to Sophos, and they emailed back with the KB with the different procedure, and tried that. Set up the slave box so I could log on, and it said the license was expired, even though it had not expired. After 5 or more attempts to get it to take the license, it finally took. 

    I set up the slave box per the instructions (no change needed to the master), applied, and then plugged in all the cables, identical to the master, along with the HA connection.

    It tried to come up; however, on the status screen on the master, it showed the slave as being on 9.602 (I think). The master was on 9.605. Previously I had updated the slave to the same version as the master, and the LCD display on the slave box showed 9.605.

    On the status it showed the slave as updating, but stuck there. After letting it run 45+ minutes, I again called Sophos support. After listening to happy music for 18 minutes, got someone. He did a remote session, then I got him shell access.

    He went through the boxes, initially didn't see anything. We rebooted the slave box, came back up, and still trying to update. 

    He then connected to the slave device, and after all kinds of looking around, found the database on it was corrupted. He went through a bunch of PostgreSQL commands, deleted the database and rebuilt it. 

    Rebooted again, and it showed the slave on 9.605, and syncing. After a few minutes it showed all as OK.

    As a note, I had a constant ping to the master box, and to two external sites, one on the internet, and one through a VPN, and through the entire process, none ever dropped. 
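
A constant ping like the one described can also be scripted so that any drops are summarized afterwards. The following is a rough Python sketch, not what was actually run here: `ping_once`, `find_outages` and `watch` are hypothetical names, and the `-c 1` flag assumes a Linux or macOS `ping` (Windows uses `-n 1` instead).

```python
import subprocess
import time
from typing import Callable, Dict, Iterable, List, Tuple


def ping_once(host: str, timeout_s: float = 2.0) -> bool:
    """Send one echo request with the system ping; True if it was answered."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", host],  # assumption: Linux/macOS ping syntax
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def find_outages(samples: Iterable[Tuple[float, bool]]) -> List[Tuple[float, float]]:
    """Collapse (timestamp, ok) samples into (first_failure, last_failure) spans."""
    outages = []
    start = None
    last_fail = None
    for timestamp, ok in samples:
        if not ok:
            if start is None:
                start = timestamp
            last_fail = timestamp
        elif start is not None:
            outages.append((start, last_fail))
            start = None
    if start is not None:  # still failing when sampling stopped
        outages.append((start, last_fail))
    return outages


def watch(
    hosts: List[str],
    probe: Callable[[str], bool] = ping_once,
    rounds: int = 10,
    interval_s: float = 1.0,
) -> Dict[str, List[Tuple[float, float]]]:
    """Probe every host once per round and return the outages seen per host."""
    samples: Dict[str, List[Tuple[float, bool]]] = {h: [] for h in hosts}
    for _ in range(rounds):
        now = time.time()
        for host in hosts:
            samples[host].append((now, probe(host)))
        time.sleep(interval_s)
    return {host: find_outages(s) for host, s in samples.items()}
```

Calling `watch` with the master's address plus one internet host and one host behind the VPN, for as many rounds as the failover test lasts, would give a per-host list of outage spans; an empty list for each host corresponds to "none ever dropped".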

    This last call was nearly 2 hours. And the person knew what he had to do. Nothing I would ever even think about doing.

    Don't know if this was an anomaly or what. 

    Anyway. 

    Some comments on Sophos tech support:

    I tried to submit a help request through the Sophos web page. It would never work. Kept coming back to the page where you put in your user info.

    I tried the online chat, but that was of little use. 

    On the phone, I was dropped three different times. The music on hold would stop, and I thought someone was coming on, but there was just silence. On the first call, when the license was expired, they transferred me to licensing, and after several minutes of music, silence. I called back, pressed buttons, got to licensing, and they were very helpful and sent me a license file, after probably 10 minutes on hold.

    When I did get through on the phone, it took 15 minutes once and 18 minutes another time.

    The persons helping me, the three times I called, were in Asia/Pacific.

    After the first call, I got an initial email back saying the case was being handled by the Asia/Pacific region and their work times were 3 pm - 11 pm M-F. It asked if I wanted the case transferred to a different region. Because I didn't want to impact users during the work day, I wanted to do this after 4:30 pm Central.

    Now, the second person in particular, last Friday afternoon, knew what he was doing, and was in the Asia/Pacific region.

     

    John S. 

  • In reply to jskain:

    John, did you complete the survey about your experience with Support, or should we link upper management to your comments below?

    Cheers - Bob
    PS I understand that there are now some SGs being delivered with 'Automatic configuration' not selected by default, but that hasn't been the case with 330s, or did you see that it wasn't selected when you first looked?  I suspect your entire problem was the broken PostgreSQL database.

  • In reply to BAlfson:

    Hi oh learned one.

     

    I did complete the survey. The last person certainly knew his stuff. But getting through to Sophos was a pain. And, possibly because it was not a flashing red light emergency, they parked it with one physical area and one time zone. 

    But it was disconcerting that the online "open a ticket" form didn't work, even though I tried different browsers. And I got dropped multiple times on the phone.

    When I was doing the updates on the slave box, quite honestly didn't look at HA. In retrospect, should have. But, oh well. 

    John S.