
HA SLAVE stuck in UNLINKED on Sophos UTM with no apparent cause

Hello,

First of all, I wish you a happy new year!

I'm writing because our customer has been running a Sophos UTM firewall on version 9.7 for almost two years.

The customer says they tested the SLAVE firewall during that time and it was working fine.

However, the HA SLAVE firewall is now stuck in "UNLINKED" status and has been for some time. I cannot determine how long it has been like this, because the logs only go back 30 days: the log storage had filled up, which was a separate problem. (We worked on this firewall 3 months ago, when we tried to rebuild the DB; that may be related.)

In any case, the MASTER and SLAVE firewalls have already been restarted several times.
Here are the references I can give you:
- Firmware of both UTMs: 9.707-5
- Last big update: more than 1.5 years ago

Here are the actions that have been taken:
- Restart of the two firewalls -> NOK
- Checked the cabling -> The customer has not touched anything for a long time
- Verified the HA link: it is physically up according to the CLI (eth3)

customername:/home/login # ip link show
5: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2000 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 7c:5a:1c:48:cb:53 brd ff:ff:ff:ff:ff

- Verified that I can ping the other firewall over the HA link:

customername:/home/login # ping 198.19.250.2
PING 198.19.250.2 (198.19.250.2) 56(84) bytes of data.
64 bytes from 198.19.250.2: icmp_seq=1 ttl=64 time=0.155 ms
64 bytes from 198.19.250.2: icmp_seq=2 ttl=64 time=0.156 ms
64 bytes from 198.19.250.2: icmp_seq=3 ttl=64 time=0.246 ms
^C
--- 198.19.250.2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 0.155/0.185/0.246/0.045 ms

- Checking the HA logs:

2022:01:25-14:24:12 customername-1 ha_daemon[4979]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 251 12.413" name="Executing (nowait) /etc/init.d/ha_mode check"
2022:01:25-14:24:12 customername ha_mode[3002]: calling check
2022:01:25-14:24:12 customername ha_mode[3002]: check: waiting for last ha_mode done
2022:01:25-14:24:12 customername ha_mode[3002]: check_ha() role=MASTER, status=ACTIVE
2022:01:25-14:24:12 customername ha_mode[3002]: check done (started at 14:24:12)
2022:01:25-14:24:16 customername repctl[5075]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:01:25-14:27:25 customername2 repctl[4464]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:01:25-14:29:16 customername repctl[5075]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:01:25-14:32:25 customername2 repctl[4464]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2022:01:25-14:34:12 customername2 ha_daemon[4416]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 158 12.023" name="Executing (wait) /usr/local/bin/confd-setha mode slave"
2022:01:25-14:34:12 customername2 ha_daemon[4416]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 159 12.117" name="Executing (nowait) /etc/init.d/ha_mode check"
2022:01:25-14:34:12 customername2 ha_mode[23854]: calling check
2022:01:25-14:34:12 customername2 ha_mode[23854]: check: waiting for last ha_mode done
2022:01:25-14:34:12 customername2 ha_mode[23854]: check_ha() role=SLAVE, status=UNLINKED

- Checking the status of the HA service:

customername:/home/login # service ha_mode status
ha_mode[12752]: calling status
ha_mode[12752]: Missing HA_* variables, not called by ha_daemon? (exit 2)

- Tested disconnecting the HA on 25.01.2022 between 12:40 and 12:44 -> Nothing came up: no VPN, no access to the portal via the public IP...
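
To see at a glance what each node last reported, the check_ha() lines from the HA log excerpt above can be pulled out with a short script. This is only a minimal sketch, assuming the log lines keep exactly the format shown above; the function name latest_ha_states and the regular expression are my own illustration and not part of the UTM tooling.

```python
import re

# Matches lines such as:
# 2022:01:25-14:34:12 customername2 ha_mode[23854]: check_ha() role=SLAVE, status=UNLINKED
# (format assumed from the log excerpt above)
CHECK_HA = re.compile(
    r"^(?P<ts>\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) ha_mode\[\d+\]: "
    r"check_ha\(\) role=(?P<role>\w+), status=(?P<status>\w+)"
)


def latest_ha_states(lines):
    """Return {host: (timestamp, role, status)} from the last check_ha() line per host."""
    states = {}
    for line in lines:
        m = CHECK_HA.match(line.strip())
        if m:
            # Later lines overwrite earlier ones, so the last report per host wins.
            states[m["host"]] = (m["ts"], m["role"], m["status"])
    return states


if __name__ == "__main__":
    sample = [
        "2022:01:25-14:24:12 customername ha_mode[3002]: check_ha() role=MASTER, status=ACTIVE",
        "2022:01:25-14:34:12 customername2 ha_mode[23854]: check_ha() role=SLAVE, status=UNLINKED",
    ]
    for host, (ts, role, status) in latest_ha_states(sample).items():
        print(f"{host}: role={role} status={status} at {ts}")
```

Running it over a longer log export would show whether the SLAVE ever reports a status other than UNLINKED within the 30 days of logs that are still available.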

Can you help me solve this? It is very worrying, because it means that if anything fails, the customer's production goes down.

Best regards,
Raphaëlle



  • Hi,

    Thanks for your reply.

    Here is the result:

    <M> user@customername:/home/login > ha_utils status
    - Status -----------------------------------------------------------------------
    Current mode: HA MASTER with id 1 in state ACTIVE
    -- Nodes -----------------------------------------------------------------------
    MASTER: 1 Node1 198.19.250.1 9.707005 ACTIVE since Mon Jan 24 08:35:03 2022
    SLAVE: 2 Node2 198.19.250.2 9.707005 UNLINKED since Tue Jan 25 12:49:12 2022
    -- Load ------------------------------------------------------------------------
    Node  1: [1m] 0.31  [5m] 0.30  [15m] 0.32
    Node  2: [1m] 0.00  [5m] 0.04  [15m] 0.06
    
    - Kernel -----------------------------------------------------------------------
    Current mode: enabled master
    interface: eth3
    Local ID: 198.19.250.1
    debug: off
    verbose: off
    ppp sync: off
    port smtp: XXX
    port pop3: XXXX
    port ftp: XXXX
    
    - PostgreSQL ------------------------------------------------------------------------