WAN downtime after LAG group configuration on Sophos SG230

Hi Community,

We had a pretty strange issue with our WAN connection yesterday.

We configured a new LAG on our Sophos SG230.
Right after we hit the save button in the LAG configuration, our WAN went down.
A few of our externally reachable websites went down with it.

The LAG configuration looks like the following:
    



On the switches the firewall is connected to, we also created an LACP config.
On each of our redundant switches, two ports belong to the LACP group, and they are in a special VLAN.

Yesterday we rebooted one of our Sophos SG230 firewalls. After the reboot, the WAN came up again for a few minutes.
It also seemed as if our Sophos SG230 had some issues with its WebUI: sometimes it did not react to any actions.

The latest firmware version 9.702-1 is installed on our Sophos SG230 cluster.

Please let me know if you have any ideas, hints etc. so we can dig deeper into the root cause analysis.

Many thanks in advance & best regards,
Judith

  • Possibly the LACP configuration on the switches isn't correct.

    With Cisco, for example, we need mode "active".

    Which switch do you use?

  • In reply to dirkkotte:

    Hello Dirk,

    Thanks for the reply.

    We are using Avaya switches, formerly known as Nortel.

    The mode on our switches is also "active"; below I pasted an output of it:

    On another switch where we have the same scenario running, the config looks similar.

     

    Regards,

    Judith

  • In reply to jv-1994:

    Those are the settings ... can you check the LACP status?

    Do you connect both Sophos ports to the same switch/stack?

    Do you use a different LACP channel for each Sophos SG unit? (With Cisco we had problems when all 4 LACP ports were in the same LACP channel.)
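
    One way to check the LACP status from the firewall side, sketched under assumptions: Sophos UTM 9 is Linux-based, and the Linux bonding driver reports 802.3ad state in a text file under /proc/net/bonding/. Whether the SG230 exposes that file for its LAG interfaces, and under which name, is an assumption here, as are the helper names parse_bonding and lacp_healthy and the sample text. The sketch only checks that every member port is up and that all members ended up in the same aggregator, which is the usual sign that the switch negotiated one bundle.

```python
# Illustrative excerpt in the format of the Linux bonding status file.
# The interface names and values are made up for this example.
SAMPLE = """\
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
MII Status: up

Slave Interface: eth4
MII Status: up
Aggregator ID: 1

Slave Interface: eth5
MII Status: up
Aggregator ID: 1
"""


def parse_bonding(text):
    """Collect the bonding mode and per-member link state from status text."""
    info = {"mode": None, "slaves": []}
    current = None  # member port section currently being read, if any
    for raw in text.splitlines():
        line = raw.strip()
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        if key == "Bonding Mode":
            info["mode"] = value
        elif key == "Slave Interface":
            current = {"name": value, "mii": None, "aggregator": None}
            info["slaves"].append(current)
        elif current is not None and key == "MII Status":
            current["mii"] = value
        elif current is not None and key == "Aggregator ID":
            current["aggregator"] = value
    return info


def lacp_healthy(info):
    """True if the bond runs 802.3ad, all members are up, and they share one aggregator."""
    slaves = info["slaves"]
    if not slaves or "802.3ad" not in (info["mode"] or ""):
        return False
    all_up = all(s["mii"] == "up" for s in slaves)
    one_aggregator = len({s["aggregator"] for s in slaves}) == 1
    return all_up and one_aggregator


if __name__ == "__main__":
    print(lacp_healthy(parse_bonding(SAMPLE)))
```

    On a real system the text would come from the status file instead of SAMPLE, for example the contents of /proc/net/bonding/lag0 (a hypothetical interface name). Members with different aggregator IDs usually mean the switch side did not bundle the ports.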

  • In reply to dirkkotte:

    The LACP status only shows that it's active on our Avaya switches.

    We are running a Sophos SG230 cluster; the two firewalls in the cluster are in different datacenters.

     

    Each Sophos is connected to a different switch. The switches aren't connected to each other in a stack or anything similar.

     

    On both switches the two LACP ports have the same channel number, but since the switches themselves aren't connected to each other, I guess they shouldn't care about it, correct?

     

    Thanks in advance & BR.

  • In reply to jv-1994:

    Yes, that all sounds right.

    I'm slowly running out of ideas as well.

    ... When LACP isn't working ... does pulling one cable or shutting down one port on the switch help?

  • In reply to dirkkotte:

    With lag1 you create a new MAC address / port. Depending on where it connects (LAN or WAN), spanning tree, MAC address tables, and the routing table have to reorganize.

    Neighbors and spanning tree can run into trouble.

     

    If you assign the "Default GW" to the interface, all rules have to be checked, including NAT.

     

    ---

    On Cisco, LACP alone is not enough: you must also define whether the port is an access port (VLAN 1) or a trunk (multiple VLANs).

     

     

    My switch has:

    interface GigabitEthernet1/0/9
     description dual-link3 LAN-Sophos1
     switchport mode trunk
     channel-protocol lacp
     channel-group 3 mode active
    !
    interface GigabitEthernet1/0/10
     description dual-link3 LAN-Sophos1
     switchport mode trunk
     channel-protocol lacp
     channel-group 3 mode active

    ---

    !
    interface Port-channel3
     description dual-link3 LAN-Sophos1
     switchport mode trunk

     

  • In reply to Joerg-ST:

    Hi there,

     

    Sorry for the late reply.

    I checked the LACP config on our Avaya switches again.

    On the switches we have an MLT and an LACP configuration:

    MLT:

    ! *** MLT (Phase 1) ***
    !
    mlt 3 name "FW lag1" disable member NONE
    mlt 3 bpdu single-port
    mlt 3 loadbalance advance

    LACP:

    ! *** LACP (Phase 2) ***
    !
    lacp key 3 mlt-id 3

    lacp port-mode advance
    interface Ethernet ALL
    lacp key port 1/11 3
    lacp key port 1/38 3

    lacp aggregation port 1/11,1/38 enable

    The configuration on the switches looks fine to me.

     

    However, we had to reboot the nodes of our Sophos cluster multiple times because of a few strange things that happened.

    Our second node was stuck syncing to node 1. Once the sync eventually finished, we started a firmware upgrade to the latest firmware version on the second node, and it got stuck in the upgrade as well.

    Afterwards we upgraded the Sophos manually as described in this article: https://community.sophos.com/products/unified-threat-management/f/hardware-installation-up2date-licensing/83346/ha-is-stuck-in-up2date

     

    After these issues were fixed, the LAG somehow worked properly. This behaviour of our Sophos cluster was pretty strange to us; it came out of the blue.

  • In reply to jv-1994:

    Hi there,

    I just wanted to update this discussion.

     

    This issue is resolved. As it turned out, our provider had WAN issues on their side, which affected our WAN.

    Furthermore, there was a defective cable on one of our provider's routers, which the provider informed us about.

     

    We also made the following adjustments to our Sophos UTM LAG configuration, as recommended to us by a Sophos support engineer:

    https://community.sophos.com/kb/en-us/132399

     

    Thanks for the support & regards,

    Judith