This discussion has been locked.

No failover on HA cluster if NIC is in error state

Hello,

We are using two SG 310s as an active/passive cluster.

Unfortunately, we have found that HA is not working.

Approximately once a month we lose our internet connection.

The external WAN interface shows "ERROR" under "Link" and "UP" under "State".

Unfortunately, the cluster does not switch to the other node, and the whole office is cut off from the internet.

If we turn the interface off and on again, everything works fine.

Rebooting the affected node works too.

Has anyone experienced similar behaviour?

We need to sort this problem out, and support has not been very helpful in this case.

Tibor



This thread was automatically locked due to age.
  • Which cluster state do you see?

    If the cluster shows an "unlinked" state, an interface going down does not result in a cluster failover.


    Dirk

    Systema Gesellschaft für angewandte Datentechnik mbH  // Sophos Platinum Partner
    Sophos Solution Partner since 2003
    If a post solves your question, click the 'Verify Answer' link at this post.

  • The problem is that there is no "interface down". Instead, there is an "error" state on the WAN interface.

    The cluster itself is working fine. If I reboot the affected node, the second node takes over immediately.

  • We often see the "error" state when the speed and duplex settings are not the same on the router and the SG.

    For example, you cannot have "auto" on one side only.


    Dirk

    Systema Gesellschaft für angewandte Datentechnik mbH  // Sophos Platinum Partner
    Sophos Solution Partner since 2003
    If a post solves your question, click the 'Verify Answer' link at this post.
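
To make the speed/duplex point concrete, here is a minimal Python sketch of the comparison one would do by hand between the router port and the SG interface. It is only an illustration: `PortSettings` and `find_mismatch` are hypothetical names, not part of any Sophos tool, and the values have to be read manually from each device.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PortSettings:
    """Speed/duplex configuration of one end of a link (hypothetical model)."""
    autoneg: bool
    speed_mbps: Optional[int] = None  # only meaningful when autoneg is False
    duplex: Optional[str] = None      # "full" or "half"; only meaningful when autoneg is False


def find_mismatch(a: PortSettings, b: PortSettings) -> Optional[str]:
    """Return a short description of the mismatch, or None if both ends agree."""
    if a.autoneg != b.autoneg:
        return "auto-negotiation enabled on one side only"
    if a.autoneg:
        return None  # both ends negotiate, nothing fixed to compare
    if a.speed_mbps != b.speed_mbps:
        return f"speed differs: {a.speed_mbps} vs {b.speed_mbps} Mbit/s"
    if a.duplex != b.duplex:
        return f"duplex differs: {a.duplex} vs {b.duplex}"
    return None


if __name__ == "__main__":
    router = PortSettings(autoneg=False, speed_mbps=1000, duplex="full")
    sg_wan = PortSettings(autoneg=True)
    print(find_mismatch(router, sg_wan))
```

With the example values, the router port is fixed and the SG side is on "auto", which is exactly the one-sided "auto" case described above.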

  • The true reason for an 'error' state is that the interface failed the 'uplink monitoring' conditions.

    Basically, the UTM cannot reach the hosts it checks, over whichever protocols it uses, to verify that it has internet access. The 'error' state itself is not the problem; whatever caused it is.

    Do you reboot both nodes when this happens, or just one?
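
The idea behind uplink monitoring can be sketched in a few lines of Python. This is not the UTM's actual implementation; it assumes the link is flagged as "ERROR" only when none of the monitored hosts can be reached, and it uses a plain TCP connect as the check. The names `tcp_probe` and `uplink_status` and the example targets are made up for illustration.

```python
import socket
from typing import Callable, Iterable, Tuple

Target = Tuple[str, int]  # (host, TCP port)


def tcp_probe(target: Target, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the target succeeds within the timeout."""
    host, port = target
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def uplink_status(
    targets: Iterable[Target],
    probe: Callable[[Target], bool] = tcp_probe,
) -> str:
    """Return "OK" if at least one target is reachable, otherwise "ERROR".

    Assumption: one reachable host is enough. An empty target list
    also yields "ERROR" because nothing could be confirmed.
    """
    return "OK" if any(probe(t) for t in targets) else "ERROR"


if __name__ == "__main__":
    # Hypothetical monitoring hosts; replace with hosts you actually rely on.
    print(uplink_status([("example.com", 443), ("example.org", 443)]))
```

Running a check like this from a machine behind the affected node while the interface shows "ERROR" can help tell whether the upstream path is really down or whether only the monitored hosts stopped answering.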
