Hello,
First of all, I wish you a happy New Year!
I'm writing to you because our customer has been running a Sophos UTM firewall pair in HA, on version 9.7, for almost 2 years.
Our customer says they tested the slave firewall in the meantime and it was working fine.
However, today, and for some time now, the HA slave firewall has remained in "UNLINKED" status. I cannot determine how long it has been like this, because the logs only go back 30 days due to an earlier log storage problem (the storage was full). Three months ago we also worked on this firewall when we tried to rebuild the database. Could that be related?
In any case, both the master and the slave firewalls have already been restarted several times.
Here is the information I can give you:
- Firmware on both UTMs: 9.707-5
- Last major update: more than 1.5 years ago
Here are the actions that have been taken:
- Restarted both firewalls -> NOK
- Checked the cabling -> nobody has touched anything for a long time
- Verified the HA link: it is physically up in the CLI (eth3)
    customername:/home/login # ip link show
    5: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2000 qdisc mq state UP mode DEFAULT group default qlen 1000
        link/ether 7c:5a:1c:48:cb:53 brd ff:ff:ff:ff:ff
- Verified that I can ping the other firewall over the HA link:
    customername:/home/login # ping 198.19.250.2
    PING 198.19.250.2 (198.19.250.2) 56(84) bytes of data.
    64 bytes from 198.19.250.2: icmp_seq=1 ttl=64 time=0.155 ms
    64 bytes from 198.19.250.2: icmp_seq=2 ttl=64 time=0.156 ms
    64 bytes from 198.19.250.2: icmp_seq=3 ttl=64 time=0.246 ms
    ^C
    --- 198.19.250.2 ping statistics ---
    3 packets transmitted, 3 received, 0% packet loss, time 2003ms
    rtt min/avg/max/mdev = 0.155/0.185/0.246/0.045 ms
- Checked the HA logs:
    2022:01:25-14:24:12 customername-1 ha_daemon[4979]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 251 12.413" name="Executing (nowait) /etc/init.d/ha_mode check"
    2022:01:25-14:24:12 customername ha_mode[3002]: calling check
    2022:01:25-14:24:12 customername ha_mode[3002]: check: waiting for last ha_mode done
    2022:01:25-14:24:12 customername ha_mode[3002]: check_ha() role=MASTER, status=ACTIVE
    2022:01:25-14:24:12 customername ha_mode[3002]: check done (started at 14:24:12)
    2022:01:25-14:24:16 customername repctl[5075]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
    2022:01:25-14:27:25 customername2 repctl[4464]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
    2022:01:25-14:29:16 customername repctl[5075]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
    2022:01:25-14:32:25 customername2 repctl[4464]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
    2022:01:25-14:34:12 customername2 ha_daemon[4416]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 158 12.023" name="Executing (wait) /usr/local/bin/confd-setha mode slave"
    2022:01:25-14:34:12 customername2 ha_daemon[4416]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 159 12.117" name="Executing (nowait) /etc/init.d/ha_mode check"
    2022:01:25-14:34:12 customername2 ha_mode[23854]: calling check
    2022:01:25-14:34:12 customername2 ha_mode[23854]: check: waiting for last ha_mode done
    2022:01:25-14:34:12 customername2 ha_mode[23854]: check_ha() role=SLAVE, status=UNLINKED
- Checked the status of the HA service:
    customername:/home/login # service ha_mode status
    ha_mode[12752]: calling status
    ha_mode[12752]: Missing HA_* variables, not called by ha_daemon? (exit 2)
- Tested disconnecting the HA on 25.01.2022 between 12:40 and 12:44 -> nothing came up: no VPN, no access to the portal via the public IP...
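To see at a glance what each node last reported, the `check_ha() role=..., status=...` lines in the HA log excerpt above can be summarised per host. This is only a minimal sketch, assuming the log has been exported as plain text with one entry per line, in exactly the format shown above; the helper name `latest_ha_status` is hypothetical and not part of Sophos UTM.

```python
import re

# Matches the check_ha() lines written by ha_mode in the excerpt above, e.g.
# 2022:01:25-14:24:12 customername ha_mode[3002]: check_ha() role=MASTER, status=ACTIVE
# Assumption: the exported log keeps this exact layout.
CHECK_HA = re.compile(
    r"^(?P<ts>\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) ha_mode\[\d+\]: "
    r"check_ha\(\) role=(?P<role>\w+), status=(?P<status>\w+)"
)


def latest_ha_status(lines):
    """Hypothetical helper: {host: (timestamp, role, status)} from the last check_ha() line per host."""
    result = {}
    for line in lines:
        m = CHECK_HA.match(line.strip())
        if m:
            # Later lines overwrite earlier ones, so the most recent check per host wins.
            result[m["host"]] = (m["ts"], m["role"], m["status"])
    return result


# Two lines copied from the log excerpt above.
SAMPLE = [
    "2022:01:25-14:24:12 customername ha_mode[3002]: check_ha() role=MASTER, status=ACTIVE",
    "2022:01:25-14:34:12 customername2 ha_mode[23854]: check_ha() role=SLAVE, status=UNLINKED",
]

if __name__ == "__main__":
    for host, (ts, role, status) in sorted(latest_ha_status(SAMPLE).items()):
        print(f"{host}: {role} {status} (last check {ts})")
```

On the two sample lines this prints one line per node, showing `customername` as MASTER ACTIVE and `customername2` as SLAVE UNLINKED. For a real export you would pass an open file instead, e.g. `latest_ha_status(open("ha.log"))`, where `ha.log` is a hypothetical filename.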
Can you help me solve this very worrying problem? It means that if anything goes wrong, the customer loses production.
Best regards,
Raphaëlle