This discussion has been locked.
You can no longer post new replies to this discussion. If you have a question you can start a new discussion

HA slave stuck in UNLINKED on Sophos UTM for no apparent reason

Hello,

First of all, I wish you a happy new year!

I'm writing because our customer has been running a Sophos UTM firewall on version 9.7 for almost two years.

Our customer claims to have tested the SLAVE firewall during that time, and it was working fine.

However, the HA SLAVE firewall is now stuck in "UNLINKED" status and has been for some time. I cannot determine how long it has been like this, because the logs only go back 30 days: the log storage had filled up. (We worked on this firewall 3 months ago, when we tried to rebuild the DB; maybe that is related?)

In any case, the MASTER and SLAVE firewalls have already been restarted several times.
Here are the details I can give you:
- Firmware of both UTMs: 9.707-5
- Last big update: more than 1.5 years ago

Here are the actions taken so far:
- Restarted both firewalls -> NOK
- Checked the cabling -> The customer has not touched anything for a long time
- Verified the HA link: it is physically up according to the CLI (eth3)

customername:/home/login # ip link show
5: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2000 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 7c:5a:1c:48:cb:53 brd ff:ff:ff:ff:ff:ff

- Verified that I can ping the other firewall over the HA link:

customername:/home/login # ping 198.19.250.2
PING 198.19.250.2 (198.19.250.2) 56(84) bytes of data.
64 bytes from 198.19.250.2: icmp_seq=1 ttl=64 time=0.155 ms
64 bytes from 198.19.250.2: icmp_seq=2 ttl=64 time=0.156 ms
64 bytes from 198.19.250.2: icmp_seq=3 ttl=64 time=0.246 ms
^C
--- 198.19.250.2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 0.155/0.185/0.246/0.045 ms

- Checking the HA logs:

2022:01:25-14:24:12 customername-1 ha_daemon[4979]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 251 12.413" name="Executing (nowait) /etc/init.d/ha_mode check"
2022:01:25-14:24:12 customername ha_mode[3002]: calling check
2022:01:25-14:24:12 customername ha_mode[3002]: check: waiting for last ha_mode done
2022:01:25-14:24:12 customername ha_mode[3002]: check_ha() role=MASTER, status=ACTIVE
2022:01:25-14:24:12 customername ha_mode[3002]: check done (started at 14:24:12)
2022:01:25-14:24:16 customername repctl[5075]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:01:25-14:27:25 customername2 repctl[4464]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:01:25-14:29:16 customername repctl[5075]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:01:25-14:32:25 customername2 repctl[4464]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:01:25-14:34:12 customername2 ha_daemon[4416]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 158 12.023" name="Executing (wait) /usr/local/bin/confd-setha mode slave"
2022:01:25-14:34:12 customername2 ha_daemon[4416]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 159 12.117" name="Executing (nowait) /etc/init.d/ha_mode check"
2022:01:25-14:34:12 customername2 ha_mode[23854]: calling check
2022:01:25-14:34:12 customername2 ha_mode[23854]: check: waiting for last ha_mode done
2022:01:25-14:34:12 customername2 ha_mode[23854]: check_ha() role=SLAVE, status=UNLINKED
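
For a longer log, the `check_ha()` lines are the ones that show each node's role and status. Below is a minimal Python sketch that pulls out the last reported role/status per host. It assumes the log lines have the same shape as the excerpt above; the function name `latest_ha_status` and the regular expression are illustrative and not part of any Sophos tooling.

```python
import re

# Assumed line shape, copied from the log excerpt above:
# "<timestamp> <host> ha_mode[<pid>]: check_ha() role=<ROLE>, status=<STATUS>"
CHECK_HA = re.compile(
    r"^(?P<ts>\S+)\s+(?P<host>\S+)\s+ha_mode\[\d+\]:\s+"
    r"check_ha\(\) role=(?P<role>\w+), status=(?P<status>\w+)"
)


def latest_ha_status(lines):
    """Return {host: (timestamp, role, status)} from the last check_ha() line per host."""
    latest = {}
    for line in lines:
        match = CHECK_HA.match(line.strip())
        if match:
            # Later lines overwrite earlier ones, so the last entry wins.
            latest[match["host"]] = (match["ts"], match["role"], match["status"])
    return latest


# Sample lines taken verbatim from the excerpt above.
SAMPLE = [
    "2022:01:25-14:24:12 customername ha_mode[3002]: calling check",
    "2022:01:25-14:24:12 customername ha_mode[3002]: check_ha() role=MASTER, status=ACTIVE",
    "2022:01:25-14:34:12 customername2 ha_mode[23854]: calling check",
    "2022:01:25-14:34:12 customername2 ha_mode[23854]: check_ha() role=SLAVE, status=UNLINKED",
]

if __name__ == "__main__":
    for host, (ts, role, status) in latest_ha_status(SAMPLE).items():
        print(f"{host}: role={role} status={status} (last seen {ts})")
```

Running the same function over the full HA log would show whether the SLAVE ever reported a status other than UNLINKED within the retained 30 days.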

- Checking the status of the HA service:

customername:/home/login # service ha_mode status
ha_mode[12752]: calling status
ha_mode[12752]: Missing HA_* variables, not called by ha_daemon? (exit 2)

- Tested disconnecting the HA on 25.01.2022 between 12:40 and 12:44 -> Nothing came up: no VPN, no access to the portal via the public IP...

Can you help me solve this? It is very worrying, because it means that if the MASTER fails, the customer's production goes down.

Best regards,
Raphaëlle



  • For more information: HA_mode is enabled on each port.

    Here is the HA daemon version:

    <M> user@customername:/home/login > ha_daemon status
    Sophos HA daemon v2.0.0 (c) Sophos Ltd. 2020/03/23 17:25

    Regards,

    Raphaëlle

  • And is the link state up on all ports?

    (I sometimes configure a "management" port for direct local access; for that port I have to disable HA monitoring.)


    Dirk

    Systema Gesellschaft für angewandte Datentechnik mbH  // Sophos Platinum Partner
    Sophos Solution Partner since 2003
    If a post solves your question, click the 'Verify Answer' link at this post.
