EAP 3: Quick HA problems

Hi all,

I have two SG 210 rev. 3 devices.

I waited for EAP 3 to test the new HA version (Quick HA).

My primary device was upgraded from v17 to v18 EAP 1, then EAP 2, and now successfully to EAP 3.

My secondary device was installed from the EAP 3 ISO.

 

I set up both with Quick HA, and everything went fine and "green". I then did a failover to the secondary device; all went well and the primary rebooted.

But it never came back up.

 

Looking in ctsyncd.log on the secondary device, all syncs are successful except for the dhcp6.lease file, as I do not use an IPv6 DHCP server.

 

Also, the LCD on the old primary is just blank, with a "-" sign on the top line.

 

I attached a screen, and it had gone into XG Failsafe.

 

The new master (the secondary) shows the heartbeat service as Dead, and the UI is VERY slow, even though CPU is at just 6%.

 

I am now reinstalling both SG 210s from the EAP 3 ISO, restoring from backup (without HA), and will then try the Quick HA setup again.

I will keep you posted :-)

  • Another thing:

     

    PortC (E2) is the DMZ on SG devices.

    It is not possible to change it to PortD (E3), which is labeled HA on the front. Why?

     

    -----

    Best regards
    Martin

    Sophos XGS 2100 @ Home | Sophos v20 Architect

  • Follow-up: this time I rebooted the master, and after 6-10 minutes the HA was synced again. However, the new master now shows this:

    And the old master shows this:

     

    Of course they are down, as the slave is not supposed to handle these.

    Are there some checks that were forgotten and are not done after a master turns into a slave?

    -----

    Best regards
    Martin

    Sophos XGS 2100 @ Home | Sophos v20 Architect

  • Now I pressed "Switch to Auxiliary device".

    The master rebooted and the old master took over again, but it is now very slow; SSH commands take 5-10 seconds:

     

    SG210_WP03_SFOS 18.0.0 EAP3# tail -f ctsyncd.log
    [Fri Dec 20 08:46:17 2019] (pid=2507) [ERROR] no dedicated links available!
    [Fri Dec 20 08:46:17 2019] (pid=2507) [ERROR] no dedicated links available!
    [Fri Dec 20 08:52:40 2019] (pid=2507) [notice] committing all external caches
    [Fri Dec 20 08:52:40 2019] (pid=2507) [notice] Committed 56 new entries
    [Fri Dec 20 08:52:40 2019] (pid=2507) [notice] commit has taken 0.009839 seconds
    [Fri Dec 20 08:52:40 2019] (pid=2507) [notice] flushing external cache
    [Fri Dec 20 08:52:43 2019] (pid=2507) [ERROR] no dedicated links available!
    [Fri Dec 20 08:52:43 2019] (pid=2507) [ERROR] no dedicated links available!
    [Fri Dec 20 08:52:47 2019] (pid=2507) [ERROR] no dedicated links available!
    [Fri Dec 20 08:53:09 2019] (pid=2507) [ERROR] no dedicated links available!
    ^C
    SG210_WP03_SFOS 18.0.0 EAP3# tail -f msync.log
    Fri Dec 20 08:57:24 2019:870024:1418:BACK:MAST:DEBUG:worker.c:587 idle workers 10.
    Fri Dec 20 08:57:24 2019:870066:1418:BACK:MAST:DEBUG:worker.c:628 worker_num 14.
    Fri Dec 20 08:57:40 2019:468113:1452:BACK:MAST:INFO:vrrp.c:1111 no event set for event: MAST
    Fri Dec 20 08:57:40 2019:468131:1452:BACK:MAST:INFO:vrrp.c:1119 flags 2e event tracking stopped for last 5 minutes!!!(GTM:BACK)
    Fri Dec 20 08:57:44 2019:889281:1418:BACK:MAST:DEBUG:worker.c:587 idle workers 10.
    Fri Dec 20 08:57:44 2019:889300:1418:BACK:MAST:DEBUG:worker.c:628 worker_num 14.
    Fri Dec 20 08:58:04 2019:907287:1418:BACK:MAST:DEBUG:worker.c:587 idle workers 10.
    Fri Dec 20 08:58:04 2019:907308:1418:BACK:MAST:DEBUG:worker.c:628 worker_num 14.
    Fri Dec 20 08:58:24 2019:923398:1418:BACK:MAST:DEBUG:worker.c:587 idle workers 10.
    Fri Dec 20 08:58:24 2019:923419:1418:BACK:MAST:DEBUG:worker.c:628 worker_num 14.
    Fri Dec 20 08:58:40 2019:282294:1452:BACK:MAST:INFO:vrrp.c:1111 no event set for event: MAST
    Fri Dec 20 08:58:40 2019:282313:1452:BACK:MAST:INFO:vrrp.c:1119 flags 2e event tracking stopped for last 6 minutes!!!(GTM:BACK)
    Fri Dec 20 08:58:44 2019:941532:1418:BACK:MAST:DEBUG:worker.c:587 idle workers 10.
    Fri Dec 20 08:58:44 2019:941551:1418:BACK:MAST:DEBUG:worker.c:628 worker_num 14.
    Fri Dec 20 08:59:04 2019:957527:1418:BACK:MAST:DEBUG:worker.c:587 idle workers 10.
    Fri Dec 20 08:59:04 2019:957545:1418:BACK:MAST:DEBUG:worker.c:628 worker_num 14.

     

    The old master is now dead, just as happened at the start of this thread.

    I think the new HA is very unstable. With UTM it was a piece of cake to make this work #zeroconfrules

    Has anyone else tried it out, switching back and forth?

    -----

    Best regards
    Martin

    Sophos XGS 2100 @ Home | Sophos v20 Architect

  • The broken slave shows this on screen:

     

    Booting '18_0_0_255'

    [    0.000000] [Firmware Bug]: TSC_DEADLINE disabled due to Errata: please update microcode to version: 0xb2 (or later)
    Password:

     

    But wasn't this a bug in EAP 2 that was supposed to be fixed in EAP 3?

     

    https://community.sophos.com/products/xg-firewall/sfos-eap/sfos-v18-early-access-program/f/feedback-and-issues/117071/firmware-bug-on-xg210

    -----

    Best regards
    Martin

    Sophos XGS 2100 @ Home | Sophos v20 Architect
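
For anyone comparing notes on the "no dedicated links available!" errors in the ctsyncd.log excerpt above, here is a minimal sketch for summarizing such a log once it has been copied off the firewall to a workstation with Python. It assumes every line has the same shape as the pasted excerpt and that the timestamps use English day and month names; the names `summarize`, `LINE_RE` and `SAMPLE` are made up for illustration and are not part of SFOS. It counts each distinct message and records when it was first and last seen, which makes it easier to tell whether the errors line up with a failover.

```python
import re
from collections import Counter
from datetime import datetime

# Line shape assumed from the ctsyncd.log excerpt in this thread, e.g.
# [Fri Dec 20 08:46:17 2019] (pid=2507) [ERROR] no dedicated links available!
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \(pid=(?P<pid>\d+)\) \[(?P<level>\w+)\] (?P<msg>.*)$"
)


def summarize(lines):
    """Count each (level, message) pair and track its first/last timestamp."""
    counts = Counter()
    first_seen = {}
    last_seen = {}
    for raw in lines:
        m = LINE_RE.match(raw.strip())
        if not m:
            continue  # skip anything that does not look like a log line
        ts = datetime.strptime(m["ts"], "%a %b %d %H:%M:%S %Y")
        key = (m["level"], m["msg"])
        counts[key] += 1
        first_seen.setdefault(key, ts)
        last_seen[key] = ts
    return counts, first_seen, last_seen


# Sample input: the ctsyncd.log lines pasted earlier in this thread.
SAMPLE = """
[Fri Dec 20 08:46:17 2019] (pid=2507) [ERROR] no dedicated links available!
[Fri Dec 20 08:46:17 2019] (pid=2507) [ERROR] no dedicated links available!
[Fri Dec 20 08:52:40 2019] (pid=2507) [notice] committing all external caches
[Fri Dec 20 08:52:40 2019] (pid=2507) [notice] Committed 56 new entries
[Fri Dec 20 08:52:40 2019] (pid=2507) [notice] commit has taken 0.009839 seconds
[Fri Dec 20 08:52:40 2019] (pid=2507) [notice] flushing external cache
[Fri Dec 20 08:52:43 2019] (pid=2507) [ERROR] no dedicated links available!
[Fri Dec 20 08:52:43 2019] (pid=2507) [ERROR] no dedicated links available!
[Fri Dec 20 08:52:47 2019] (pid=2507) [ERROR] no dedicated links available!
[Fri Dec 20 08:53:09 2019] (pid=2507) [ERROR] no dedicated links available!
"""

counts, first_seen, last_seen = summarize(SAMPLE.splitlines())
for (level, msg), n in counts.most_common():
    key = (level, msg)
    print(f"{n:3d}  [{level}] {msg}  ({first_seen[key]} .. {last_seen[key]})")
```

To run it against a real file, replace `SAMPLE.splitlines()` with an open file handle for the copied log.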
