
How to solve? - NUTM-14133 [Cluster, HA] After upgrading from 9.714-4 to 9.715-3 HA breaks

Once again the Sophos developers have broken the cluster upgrade process.
Is there a simple way to solve the problem?
Possibly as a preventive measure, and without SSH?

The effort needed to fix this by hand is not insignificant:
copy xxxx715003.tgz.gpg to the master
scp the file to the slave
ssh to the slave
copy the file to the update directory
install manually
...
Is there a simpler way?

Or could the Up2Date developers intervene?



This thread was automatically locked due to age.
  • Hello Dirk,

    Good day, and thanks for reaching out to the Sophos Community.

    Apologies that you have encountered this issue.

    There should be a respun 9.715 firmware, because HA broke after updating, as described in NUTM-14133.

    Could you verify that your firmware packages match the guidance below?

    If you are updating from 9.7 MR15 (original version), the following .gpg package can be used: u2d-sys-9.715-9.715-3.4.5.tgz.gpg

    If you are updating from 9.7 MR14, use the new .gpg package: u2d-sys-9.714-9.715-4.4.1.tgz.gpg

    Many thanks for your time and patience and thank you for choosing Sophos.

    Cheers,

    Raphael Alganes
    Community Support Engineer | Sophos Technical Support
    Sophos Support Videos Product Documentation  |  @SophosSupport  | Sign up for SMS Alerts
    If a post solves your question use the 'Verify Answer' link.

  • Hi,

    I only have the versions from after the first installation steps.

    The device that was updated first (the old slave, now the current master) runs version 9.715-3.

    The current slave tries to install the update from 9.714-4 to 9.715-4:

    2023:06:01-14:53:42 kob03fw210-2 audld[27963]: running on HA slave system or cluster node
    2023:06:01-14:53:42 kob03fw210-2 audld[27963]: patch up2date possible
    2023:06:01-14:53:42 kob03fw210-2 audld[27963]: Starting Secured Up2Date Package Downloader
    2023:06:01-14:53:43 kob03fw210-2 audld[27963]: Using static update server list in HA mode
    2023:06:01-14:53:43 kob03fw210-2 audld[27963]: Secured Up2date Authentication
    2023:06:01-14:53:44 kob03fw210-2 audld[27963]: id="3701" severity="info" sys="system" sub="up2date" name="Authentication successful"
    2023:06:01-14:53:44 kob03fw210-2 audld[27963]: Using static download server list in HA mode
    2023:06:01-14:53:45 kob03fw210-2 audld[27963]: package u2d-sys-9.714004-715004.tgz.gpg needs to be unpacked
    2023:06:01-14:53:45 kob03fw210-2 audld[27963]: id="3707" severity="info" sys="system" sub="up2date" name="Successfully synchronized fileset" status="success" action="download" package="sys"
    2023:06:01-14:53:45 kob03fw210-2 auisys[28059]: running on HA slave system or cluster node
    2023:06:01-14:53:45 kob03fw210-2 auisys[28059]: running on slave/cluster node, skipping license check
    2023:06:01-14:53:45 kob03fw210-2 auisys[28059]: waiting for db_verify to return (30 seconds max)
    2023:06:01-14:53:46 kob03fw210-2 auisys[28059]: removing '/var/up2date/sys-install'
    2023:06:01-14:53:46 kob03fw210-2 auisys[28059]: Starting Up2Date Package Installer
    2023:06:01-14:53:46 kob03fw210-2 auisys[28059]: version of package '/var/up2date/sys/u2d-sys-9.714004-715004.tgz.gpg' doesn't fit, skipping
    2023:06:01-14:53:46 kob03fw210-2 auisys[28059]: No suitable packages of type <sys> found, skipping
    2023:06:01-14:53:47 kob03fw210-2 auisys[28059]: Up2Date Package Installer finished, exiting
    2023:06:01-14:53:47 kob03fw210-2 auisys[28059]: id="3716" severity="info" sys="system" sub="up2date" name="Up2Date Package Installer finished, exiting"
    2023:06:01-14:54:01 kob03fw210-1 audld[27755]: running on HA master system or cluster node
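
When many nodes are affected, it can help to pull the skipped packages out of the Up2Date log automatically. The following is a minimal sketch, not a Sophos tool. It assumes the file naming scheme that the log above suggests (`u2d-sys-<major>.<from>-<to>.tgz.gpg`, where `<from>` and `<to>` are a three-digit minor version followed by a three-digit build number). The function names `decode_package` and `skipped_packages` are made up for this illustration, and the pattern only covers file names in the form shown in the log.

```python
import re

# Assumed naming scheme, inferred from the log above:
# u2d-sys-<major>.<from: 3-digit minor + 3-digit build>-<to: same layout>.tgz.gpg
PACKAGE_RE = re.compile(r"u2d-sys-(\d+)\.(\d{3})(\d{3})-(\d{3})(\d{3})\.tgz\.gpg")
# Message text copied from the auisys log line in this thread.
SKIP_RE = re.compile(r"version of package '([^']+)' doesn't fit, skipping")


def decode_package(name):
    """Return (from_version, to_version), e.g. ('9.714-4', '9.715-4'), or None."""
    match = PACKAGE_RE.search(name)
    if match is None:
        return None
    major, from_minor, from_build, to_minor, to_build = match.groups()
    return (
        f"{major}.{from_minor}-{int(from_build)}",
        f"{major}.{to_minor}-{int(to_build)}",
    )


def skipped_packages(log_lines):
    """Collect (path, decoded versions) for every package the installer skipped."""
    results = []
    for line in log_lines:
        hit = SKIP_RE.search(line)
        if hit:
            results.append((hit.group(1), decode_package(hit.group(1))))
    return results


sample = [
    "2023:06:01-14:53:46 kob03fw210-2 auisys[28059]: version of package "
    "'/var/up2date/sys/u2d-sys-9.714004-715004.tgz.gpg' doesn't fit, skipping",
]
for path, versions in skipped_packages(sample):
    print(path, versions)
```

For the sample line, the decoded pair is `('9.714-4', '9.715-4')`, which makes it easy to compare the package's target version with the version the other node is running.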

    And a new, big problem: I am unable to "ha_utils ssh" to the slave (on multiple clusters).

    I am using the same loginuser password as before, the one used to access the master:

    <M> kob01fw210:/root # ha_utils ssh
    Connecting to slave 198.19.250.1
    loginuser@198.19.250.1's password:
    Permission denied, please try again.
    loginuser@198.19.250.1's password:
    Permission denied ().


    Dirk

    Systema Gesellschaft für angewandte Datentechnik mbH  // Sophos Platinum Partner
    Sophos Solution Partner since 2003
    If a post solves your question, click the 'Verify Answer' link at this post.

  • Hi Dirk,

    Thanks for the update, and again apologies for the inconvenience you have faced. Could you open a support ticket for this, and please share the case ID with us by replying to this thread?

    Many thanks for your time and patience and thank you for choosing Sophos.

    Cheers,

    Raphael Alganes

  • Same problem here:

    version of package '/var/up2date/sys/u2d-sys-9.714004-715004.tgz.gpg' doesn't fit, skipping
    No suitable packages of type <sys> found, skipping

    Our SLAVE was updated to 9.715-3 (not 9.715-4) and is now MASTER.

    The former MASTER is now SLAVE, in status U2DATE with 9.714-4.

    The images seem to work but we need the best way to finish the U2DATE on SLAVE.

    Deleting the sys-package and rebooting doesn't work.

    Any suggestions or procedures?

    Best regards,

    OZ

    You have to copy the 9.715-3 update to the current slave and start the update manually.

    Or, the way I have done it several times:

    - Unplug all NICs from the slave (HA link last)

    - Reboot the device. It should now come up as MASTER/primary as well

    - Connect a notebook to the device and copy the 9.715-3 update file to it

    - Update to 9.715-3 via the GUI. Now both devices should run the same version again

    - Shut down, reconnect the NICs, start


    Dirk


  • Thank you!

    Thanks, that's how we'll do it :-)

    I'll report the result here.

    Best regards
    Oliver

    If a longer interruption (5 minutes) is acceptable:

    A colleague tested the crowbar approach today.

    In WebAdmin, set Up2Date to manual, upload the 9.715-3 -> 9.715-4 update file manually, and start the update without regard for the slave.

    The MASTER is then updated to 9.715-4 and reboots; the slave does NOT take over.  -- everything DOWN for 5 minutes --

    Once the master comes back up, its version matches what the slave wants to install, and the process continues.

    Thanks to Thomas!!


    Dirk


  • My colleagues found an even gentler option: on the stuck slave, they replaced the package /var/up2date/sys/u2d-sys-9.715003-715004.tgz.gpg, which kept being downloaded automatically, with u2d-sys-9.715003-715003.tgz.gpg and rebooted the slave, whereupon it updated itself to 9.715-3. That was the same version as the previously updated master, so the two could synchronize and eventually return to their original master/slave roles :-) Tonight we will update to 9.715-4, which will hopefully work normally (at night, because there are few L2TP/IPsec VPN connections here then that would be interrupted).

    So here is my theory of what went wrong (this should hit everyone who updates from a firmware below 9.715-3):

    1) The update is incremental, not cumulative, so the cluster first updates to version 9.715-3. The slave gets 9.715-3 successfully and becomes master.
    2) The new slave then does NOT download 9.715-3 but, for some reason, 9.715-4 instead. Maybe Sophos declares 9.715-4 as the preferred version?
    3) 9.715-4 does not match 9.715-3, which is already installed on the other node, so the new slave no longer updates.
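
To make the theory concrete, here is a small sketch of the mismatch. It only models the reasoning in the three points above; it is not the real installer logic, which is not documented in this thread. The `Node` class and the `slave_can_catch_up` function are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    role: str       # "master" or "slave"
    installed: str  # firmware currently running, e.g. "9.715-3"


def slave_can_catch_up(master, slave, package_from, package_to):
    """Hypothetical check mirroring the theory above, not Sophos's real logic.

    The slave can rejoin only if the offered package starts at its installed
    version and ends at exactly the version the master already runs.
    """
    return package_from == slave.installed and package_to == master.installed


master = Node("master", "9.715-3")  # former slave, updated first
slave = Node("slave", "9.714-4")    # former master, still on the old firmware

# Package the stuck slave downloaded, according to the logs in this thread.
print(slave_can_catch_up(master, slave, "9.714-4", "9.715-4"))  # False
# Package that would bring it level with the master.
print(slave_can_catch_up(master, slave, "9.714-4", "9.715-3"))  # True
```

Under this model, every workaround in the thread comes down to making the two versions line up again: either the slave gets a package that ends at the master's version, or the master is moved to the version the slave's package targets.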

    The solutions are the ones that have already been described here:

    1) Isolate the slave temporarily and update it
    2) Isolate the master temporarily and update it (service downtime, but you end up on the latest version)
    3) Swap the wrong package version on the slave for the right one and let it update within HA

    Big thanks to Dirk Kotte who gave us options and a better understanding of the problem!

  • That procedure is the one I described first.


    Dirk


  • True. What you wrote first was already a solution #-)