New RMA unit in HA pair

We've just received an RMA replacement for an SG430 in an HA pair. Everything I've read says to just remove the old unit, connect all interfaces on the new one and power it on; HA will then automatically resync both units.

I've done that, but the new unit is not being recognised or syncing, and only eth0 shows as physically up. I spoke to Sophos support, who stipulate that the new RMA unit needs HA configuring regardless?

There is a difference in firmware: the current unit is on 9.501 and the RMA unit is on 9.610. So I guess we need to update the current unit to match, but do we then still need to configure the new unit for HA, or should it just resync as suggested?

Any advice would be welcome.



  • Gents,

    I am curious about what happened and, gloriously, I am awaiting a new UTM to replace a dead one in an HA config, so in a couple of days I can try to reproduce what I think happened here.

    As GreggB said, it should be a case of plug it in, stand back and wait for it to work. The only gotchas I know of are:

    1. Different firmware version in the new box.  Must fix first. 
    2. Forgetting to delete the dead node from the running master.  Must remember next time.  
    3. Occasionally, database corruption on the master causes the new node to keep syncing for ever and ever.  This is so rare that I only check for it if the syncing never stops.  Then follow the process for fixing corrupt databases in an HA cluster: stop repctl on both, reset the DB on both, start repctl on both.
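
    For gotcha 1, a quick way to see which box is behind before cabling anything up is to compare the two version strings. This is only a rough sketch: `parse_version` and `firmware_gap` are hypothetical helper names, not part of any Sophos tool, and it assumes versions look like `9.501` or `9.501-5`. The numbers used are the ones from the original question.

```python
import re


def parse_version(text):
    """Split a firmware string such as '9.501' or '9.501-5' into a tuple of ints."""
    return tuple(int(part) for part in re.split(r"[.-]", text.strip()))


def firmware_gap(current, replacement):
    """Say which unit is older; both must match before they are paired."""
    cur, new = parse_version(current), parse_version(replacement)
    if cur == new:
        return "versions match"
    if cur < new:
        return "current unit is older"
    return "replacement unit is older"


# Versions quoted in the original question: current 9.501, RMA unit 9.610.
print(firmware_gap("9.501", "9.610"))
```

    Comparing tuples of integers avoids the trap of comparing the strings directly, where a value like "9.1000" would sort before "9.501".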

    I suspect what happened here was the dead node was not deleted from the running master.  BUT configuring the new box as a master will cause a different conversation that goes like this:

    • New Box: Hello I am a master (with no config or logs) submit to my will.  
    • Old Box:  I too am a master (with lots of config and logs we don't want to lose) submit to my will you scoundrel. 
    • Both:  Okay, let's do the "whose uptime is the longest" check, because the biggest is always better and wins.....
    • Old Box:  Here is my enormous uptime. 
    • New Box: Dang nabbit, I am slave.   Please configure me, master. 
    • Old Box: I am quietly just going to forget about the dead node thing and replace it with this small-uptime upstart.
    • New Box: Syncing
    • You:  Phew that worked. 

    UPTIME is very, very important, because the master with the longest uptime wins.  Biggest is not always best, though: if you reboot the box you want to keep, you chop its uptime to zero and it can lose to the new one.
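
    To make the uptime point concrete, here is a toy model of the conversation above. It only illustrates the rule as I have described it (two masters meet, longest uptime wins); it is not how the UTM actually implements its election. `Node` and `elect_master` are made-up names, and handing a tie to the first box is an arbitrary assumption.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    uptime_seconds: int


def elect_master(a, b):
    """Return (master, slave); the longer uptime wins, ties go to `a` (assumption)."""
    if a.uptime_seconds >= b.uptime_seconds:
        return a, b
    return b, a


DAY = 24 * 3600

# Normal case: the old box has been up for months, the new box for minutes.
old_box = Node("old box", 90 * DAY)
new_box = Node("new box", 10 * 60)
master, slave = elect_master(new_box, old_box)
print(master.name, "is master,", slave.name, "syncs from it")

# Danger case: the old box was rebooted just before the new one was connected.
rebooted_old_box = Node("old box", 60)
master, slave = elect_master(new_box, rebooted_old_box)
print(master.name, "is master,", slave.name, "syncs from it")
```

    In the second case the empty new box comes out as master, which is exactly the outcome you do not want, so leave the box you want to keep running.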

    So when I get the new box, I will do all the other stuff, but:

    1. I will not delete the dead node. 
    2. I will connect the new box without configuring it.  This should fail because dead node is blocking adding it to the cluster.  
    3. Configure it as master. 
    4. Connect it a second time and see if this overrides the dead node block. 

    Will keep you posted.

    All the best, 

    Adrien. 

  • Here's what I give my clients, Adrien.  Note that 3a implies that your malfunctioning UTM should be disconnected.

    1. If needed, do a quick, temporary install so that the new device can download Up2Dates.
    2. Apply Up2Dates until it is on the same version as the current unit, then do a factory reset and shut it down.
    3. On the current UTM in use, on the 'Configuration' tab of 'High Availability':
       a. Disable and then enable Hot-Standby
       b. Select eth3 as the Sync NIC
       c. Configure it as Node_1
       d. Enter an encryption key (I've never found a need to remember it)
       e. Select 'Enable automatic configuration of new devices'
       f. I prefer to use 'Preferred Master: None' and 'Backup interface: Internal'
    4. Cable eth3 to eth3 on the new device.
    5. Cable all of the other NICs exactly as they are on the original UTM.
    6. Power up the new device and wait for the good news. 

     If you did try the approach you mentioned above, please let us know how it went.

    Cheers - Bob

     
    Sophos UTM Community Moderator
    Sophos Certified Architect - UTM
    Sophos Certified Engineer - XG
    Gold Solution Partner since 2005
    MediaSoft, Inc. USA