
Problems with RMA-replaced UTM in HA

Backstory: In our remote office, one of two SG310 appliances had broken down (probably for electrical reasons) and would not boot back into the HA cluster.

On the advice of Sophos support, we tried to boot the firmware from a USB stick. According to the LCD display, this at least looked like a complete boot, but we still did not get HA back. The dead node had previously been deleted from the master, and the master was configured to auto-detect new devices, so the freshly installed node should have been detected and brought into HA. As this did not happen, we initiated a replacement through support.

Today, the replacement arrived. We hooked it up and waited, but again nothing happened. After some hours, we went through the LCD menu and did a factory reset - no change. Before bothering support once again, I'd like to know if someone here has experienced something like this. I feel like we somehow overlooked something totally stupid.

What we do observe (by flipping through the LCD display):

  • Firmware version is 9.705, as is master
  • HA Config is set to "Not a HA device" and apparently this cannot be changed. Not sure if this is expected in this state and would change only after a successful sync.
  • eth0 seems to be set to 192.168.0.1/24, as expected for a new device. However, I cannot ping this address (from a Windows PC configured with a 192.168.0.x address - our standard LAN is in the 10.*.*.* range). It does not even answer ARP requests!
  • The LAN switch sees practically no traffic on the port connected to eth0. It did not even learn a MAC address for that port.
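
As a quick sanity check on the addressing side of this test, the sketch below uses Python's standard `ipaddress` module to confirm that a test PC's address sits in the same /24 as the factory default 192.168.0.1 and does not collide with it. The helper name `can_reach_directly` is made up for illustration, and a passing check says nothing about cabling, VLANs or the appliance itself.

```python
import ipaddress

# Factory-default eth0 address as shown on the LCD (taken from the post above).
UTM_DEFAULT = ipaddress.ip_interface("192.168.0.1/24")


def can_reach_directly(pc_address: str, pc_prefix: int = 24) -> bool:
    """Return True if the PC shares the UTM's default subnet without an IP clash.

    Hypothetical helper: it only validates addressing, nothing on the wire.
    """
    pc = ipaddress.ip_interface(f"{pc_address}/{pc_prefix}")
    same_network = pc.network == UTM_DEFAULT.network
    no_conflict = pc.ip != UTM_DEFAULT.ip
    return same_network and no_conflict


if __name__ == "__main__":
    # A 10.x address (our standard LAN range) cannot reach 192.168.0.1 without routing.
    for candidate in ("192.168.0.2", "192.168.0.1", "10.1.2.3"):
        print(candidate, can_reach_directly(candidate))
```

If the addressing checks out and there is still no ARP reply, the problem is more likely below layer 3 (cable, port, or the appliance not bringing up eth0 at all).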


  • I don't understand the connection problems on port eth0.

    Possibly the switch has some specific port configuration...

    Have you connected a PC/notebook directly to eth0?

    Which port do you use for HA? Autoconfig only works on eth3.


    Dirk

    Systema Gesellschaft für angewandte Datentechnik mbH  // Sophos Platinum Partner
    Sophos Solution Partner since 2003
    If a post solves your question, click the 'Verify Answer' link at this post.

  • Nothing special about the switch port (and it's the same port that was used with the former defective appliance). We may try a notebook directly (is a crossover cable needed?)

    HA is via eth3 (with eth0 as backup). On the master, the high availability log also shows "Netlink: Found link beat on eth3 again!" (but also "Monitoring interfaces for link beat: eth1 eth0")

    A tcpdump on eth0 shows only lots of multicasts to UDP ports 695, 3780, 501, but nothing coming back ... Maybe I'll try to replace the eth3-eth3 cable.

  • You don't need a crossover cable ... but you may use one.


    Dirk


  • Have you gone through the initial configuration on the replacement UTM? In my experience, it won't start syncing until you get past the screen that asks for organization name, location, email etc.

  • No, I think that's not necessary.

    After a factory reset the device should listen on eth3 and start the sync automatically.

    ... if the firmware matches (it may be a little bit older ... but no newer than the running system)
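
    The firmware rule above (same or older than the running system, never newer) can be sketched as a small comparison. This is only an illustration: `parse_version` and `may_join_ha` are hypothetical names, and the code assumes version strings such as the "9.705" read off the LCD, optionally with a hyphenated build suffix. Both strings should be in the same format.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a version string such as '9.705' into a tuple of integers."""
    # Assumption: a build suffix, if present, is separated by a hyphen.
    return tuple(int(part) for part in text.replace("-", ".").split("."))


def may_join_ha(new_node: str, master: str) -> bool:
    """True if the new node's firmware is the same as or older than the master's."""
    return parse_version(new_node) <= parse_version(master)


if __name__ == "__main__":
    print(may_join_ha("9.705", "9.705"))  # same version as the master
    print(may_join_ha("9.706", "9.705"))  # newer than the master
```

    Comparing integer tuples rather than raw strings avoids ordering mistakes when the minor numbers have different lengths.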


    Dirk


  • OK, in the meantime I figured that out as well from the initial config guide.

    Unplugging and replugging the eth3-eth3 cable did not help (though it showed lost and re-found link messages in the master's log, as expected). I have postponed any further tests (with a directly connected laptop, different cables, different switch ports, etc.) for the moment. Our trusted on-site "remote hands" person is currently quarantined, and the replacement who kindly helped us with all attempts so far may have done everything right according to our directions by phone and our checks against photos, but maybe not. Besides, they have their normal job to do first. A regular visit is scheduled for next week anyway and will hopefully allow a quick run-through of all pending suggestions.

  • We had a similar experience before we learned how auto-syncing new devices works.

    Unfortunately, even doing the initial configuration has not been possible yet: IP 192.168.0.1 does not seem to respond, and doing the config via the LCD display is not something I'll attempt by giving phone directions to the only "layperson" available on-site right now. We may try this, however, when one of us admins visits the site next week. Until someone more qualified has checked everything, this may very well turn out to be a very facepalm-y fault.

    I'll keep y'all informed

  • It'll be interesting to hear the end of the story.

    For others that might find this thread, here's what I give to my clients:

    1. If needed, do a quick, temporary install so that the new device can download Up2Dates.
    2. Apply the Up2Dates to reach the same version as the current unit, do a factory reset and shut down.
    3. On the current UTM in use, on the 'Configuration' tab of 'High Availability':
        a. Disable and then enable Hot-Standby
        b. Select eth3 as the Sync NIC
        c. Configure it as Node_1
        d. Enter an encryption key (I've never found a need to remember it)
        e. Select 'Enable automatic configuration of new devices'
        f. I prefer to use 'Preferred Master: None' and 'Backup interface: Internal'
    4. Cable eth3 to eth3 on the new device.
    5. Cable all of the other NICs exactly as they are on the original UTM.
    6. Power up the new device and wait for the good news. ;-)
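
    Step 3 of this checklist can be written down as a small pre-flight check before anyone drives to the site. The sketch below is purely illustrative: `HaSettings` and `checklist_deviations` are hypothetical names that mirror the fields on the 'Configuration' tab by hand. It does not talk to the UTM or use any Sophos API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HaSettings:
    """Hand-copied values from the 'High Availability' Configuration tab (hypothetical model)."""
    mode: str
    sync_nic: str
    node_name: str
    encryption_key: str
    auto_configure_new_devices: bool
    preferred_master: str = "None"       # a preference in the checklist, not checked
    backup_interface: str = "Internal"   # a preference in the checklist, not checked


def checklist_deviations(s: HaSettings) -> List[str]:
    """Return the points where the settings differ from step 3 of the checklist."""
    problems = []
    if s.mode != "Hot-Standby":
        problems.append("operation mode is not Hot-Standby")
    if s.sync_nic != "eth3":
        problems.append("Sync NIC is not eth3")
    if s.node_name != "Node_1":
        problems.append("current unit is not configured as Node_1")
    if not s.encryption_key:
        problems.append("no encryption key entered")
    if not s.auto_configure_new_devices:
        problems.append("automatic configuration of new devices is disabled")
    return problems


if __name__ == "__main__":
    current = HaSettings("Hot-Standby", "eth3", "Node_1", "some-key", True)
    print(checklist_deviations(current) or "matches the checklist")
```

    An empty list only means the master side matches the checklist. Cabling and the state of the new device still have to be verified on site.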

    Cheers - Bob

     
    Sophos UTM Community Moderator
    Sophos Certified Architect - UTM
    Sophos Certified Engineer - XG
    Gold Solution Partner since 2005
    MediaSoft, Inc. USA
  • The story continues:

    It turns out we could not get any further in accessing the UTM, not even via a crossover cable from a laptop (192.168.0.2 -> 192.168.0.1).

    Out of curiosity, we tried the USB stick firmware reinstall that support had previously suggested for the original appliance (see original post). This finally produced something reachable, but it also explained why we were out of luck back then: the image turned the appliance into an XG instead of an SG, and it is hardly surprising that the two won't do HA together.

    Right now, we are trying to bring back an SG image, but that seems to be harder than first thought. Still a work in progress.