Upgrade failed, need HELP

Greetings to all out there from Munich :)

After several hours of testing, and even contacting Astaro support, I have not succeeded in upgrading our second ASG. It failed when we upgraded from 7.011 to 7.101 in an HA configuration.

So here are the details and what I've done and experienced:

We have two ASG320s running in an HA configuration (erm, had... since now we have just one on 7.101 and one still on 7.011 :rolleyes:). The master node downloaded the patches automatically. On the day of the planned upgrade we started the installation process. That went as smoothly as hoped. It installed, rebooted, installed the other one and then started the Up2Date process on the slave. And there it got stuck.

After waiting around 30 minutes we checked the log files and found plenty of this:

2008:02:13-00:05:08 (none) auisys[11210]: Starting up2date package installation

2008:02:13-00:05:08 (none) auisys[11210]: >=========================================================================

2008:02:13-00:05:08 (none) auisys[11210]: Failed testing RPM installation (command: 'rpm --test -U /var/up2date//auav-install/u2d-auav-7.492/rpms/u2d-auav-7-492.i686.rpm')

2008:02:13-00:05:08 (none) auisys[11210]: 

2008:02:13-00:05:08 (none) auisys[11210]:  1. Internal::Systemstep::real_installation:2205() auisys.pl

2008:02:13-00:05:08 (none) auisys[11210]:  2. main::perform_work:990() auisys.pl

2008:02:13-00:05:08 (none) auisys[11210]:  3. main::auisys_prepare_and_work:517() auisys.pl

2008:02:13-00:05:08 (none) auisys[11210]:  4. main::top-level:34() auisys.pl

2008:02:13-00:05:08 (none) auisys[11210]: |=========================================================================

2008:02:13-00:05:08 (none) auisys[11210]: id="371O" severity="error" system="System" sub="up2date" name="Fatal: Up2Date package installation failed: An error occured during the RPM pre-installation test (1)" status="failed" action="install" code="1" package="auav"

2008:02:13-00:05:08 (none) auisys[11210]: 

2008:02:13-00:05:08 (none) auisys[11210]:  1. main::alf:72() auisys.pl

2008:02:13-00:05:08 (none) auisys[11210]:  2. main::perform_work:1036() auisys.pl

2008:02:13-00:05:08 (none) auisys[11210]:  3. main::auisys_prepare_and_work:517() auisys.pl

2008:02:13-00:05:08 (none) auisys[11210]:  4. main::top-level:34() auisys.pl

2008:02:13-00:19:01 (none) audld[16499]: Starting Up2Date Package Downloader (Version 1.57)

2008:02:13-00:19:02 (none) audld[16499]: id="3701" severity="info" system="System" sub="up2date" name="Authentication successful"

2008:02:13-00:19:06 (none) audld[16499]: id="3707" severity="info" system="System" sub="up2date" name="Successfully synchronized fileset" status="success" action="download" package="sys"

2008:02:13-00:19:07 (none) audld[16499]: id="3707" severity="info" system="System" sub="up2date" name="Successfully synchronized fileset" status="success" action="download" package="man-app"

2008:02:13-00:19:07 (none) audld[16499]: id="3707" severity="info" system="System" sub="up2date" name="Successfully synchronized fileset" status="success" action="download" package="ohelp"

2008:02:13-00:19:07 (none) audld[16499]: id="3707" severity="info" system="System" sub="up2date" name="Successfully synchronized fileset" status="success" action="download" package="auav"

2008:02:13-00:19:07 (none) audld[16499]: id="3707" severity="info" system="System" sub="up2date" name="Successfully synchronized fileset" status="success" action="download" package="clam"

2008:02:13-00:20:02 (none) auisys[17072]: Starting Up2Date Package Installer (Version 1.65)

2008:02:13-00:20:03 (none) auisys[17072]: Searching for available up2date packages for type 'hwaccel'

2008:02:13-00:20:03 (none) auisys[17072]: id="371B" severity="info" system="System" sub="up2date" name="up2date package is already installed, skipping" status="failed" file="/var/up2date//hwaccel/u2d-hwaccel-7.1860.tgz.gpg" action="install" package="hwaccel"

2008:02:13-00:20:03 (none) auisys[17072]: id="371D" severity="info" system="System" sub="up2date" name="No up2date packages available for installation" status="failed" action="install" package="hwaccel"

2008:02:13-00:20:03 (none) auisys[17072]: Searching for available up2date packages for type 'auav'

2008:02:13-00:20:03 (none) auisys[17072]: Installing up2date package file '/var/up2date//auav/u2d-auav-7.492.tgz.gpg'

2008:02:13-00:20:03 (none) auisys[17072]: Verifying up2date package signature

2008:02:13-00:20:04 (none) auisys[17072]: Unpacking installation instructions

2008:02:13-00:20:04 (none) auisys[17072]: Unpacking up2date package container

2008:02:13-00:20:04 (none) auisys[17072]: Running pre-installation checks

2008:02:13-00:20:04 (none) auisys[17072]: Starting up2date package installation

2008:02:13-00:20:05 (none) auisys[17072]: >=========================================================================

2008:02:13-00:20:05 (none) auisys[17072]: Failed testing RPM installation (command: 'rpm --test -U /var/up2date//auav-install/u2d-auav-7.492/rpms/u2d-auav-7-492.i686.rpm')

2008:02:13-00:20:05 (none) auisys[17072]: 

2008:02:13-00:20:05 (none) auisys[17072]:  1. Internal::Systemstep::real_installation:2205() auisys.pl

2008:02:13-00:20:05 (none) auisys[17072]:  2. main::perform_work:990() auisys.pl

2008:02:13-00:20:05 (none) auisys[17072]:  3. main::auisys_prepare_and_work:517() auisys.pl

2008:02:13-00:20:05 (none) auisys[17072]:  4. main::top-level:34() auisys.pl

2008:02:13-00:20:05 (none) auisys[17072]: |=========================================================================
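
In a log excerpt this long, the one record that names the failing package is easy to miss. As a rough sketch only: the helper below reads up2date log lines on standard input and prints the timestamp, package, and message of every record marked severity="error". The function name up2date_errors is made up for illustration and is not part of the ASG tooling; it assumes the key="value" record layout seen in the excerpt above.

```shell
# Hypothetical helper (not part of the ASG tooling): read up2date log lines on
# stdin and print "timestamp package message" for each error-level record.
# Assumes records carry severity="...", name="..." and package="..." fields,
# in that order, as in the excerpt above.
up2date_errors() {
    grep 'severity="error"' |
        sed -E 's/^([^ ]+) .*name="([^"]*)".*package="([^"]*)".*/\1 \3 \2/'
}
```

The thread does not say which file holds these messages, so pipe in whichever log you are reading, for example with cat or tail.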

OK, then we tried to download the patches manually from the Astaro FTP, uploaded them through the WebAdmin function, and started the Up2Date from WebAdmin again. But to no avail: same messages, same disappointment, no success whatsoever.

So we disabled the HA config and tried to update it manually with the auisys.plx command and several flags like --nodeps and --force... With the --force flag the RPM pre-checks pass (which is somewhat odd, since without --force they fail and stop the command completely), but the installation of the majority of the RPM packages gave the FAILED status, and the whole procedure stopped with error code 10...

So I am at a loss what to do... I don't want to reinstall completely from scratch... and since we have no fallback solution right now, the case is pressing for action...

Any ideas what I should do?

This thread was automatically locked due to age.

0 FedError over 16 years ago

Hi mnebeling,

we had the same problem with one of our customers.

Are the two ASG320s in an Active/Active cluster or in Failover?

Did you already try to update the second box to 7.100, and then to 7.10x?

Which support level do you have (Gold or Platinum)?

But honestly, it will be faster for you to start from scratch than to wait for an answer :)

Regards,

Frederic Salzmann
E3T IT Systems
Astaro Distri Luxembourg
0 mnebeling over 16 years ago in reply to FedError

Well, right now we just have the node formerly designated as master working, which patches without problems. I have separated the slave for now and tried my things on it.

I guess we have Gold...

The slave is still 7.011 and does not get upgraded because of the described errors.
0 FedError over 16 years ago in reply to mnebeling

"When I use the --force flag I have the success that the RPM prechecks are OK (which is somehow stupid, since without the "force" it fails... and stops the command completely)"

This is because, during your attempt to update the second box, part of the software was already changed to the new version (7.101). The command stops because those packages have already been updated, so only --force works.

If you want to update the box separately, follow these steps:
1. Shut down the worker node.
2. Disconnect all network connections on the worker.
3. Boot the machine as a single unit.
4. Connect via crossover cable to one of the allowed interfaces, e.g. the Internal interface.
5. Open an SSH connection (e.g. with PuTTY).
6. Log in (loginuser -> root).
7. Run the auidl.plx command to install the Up2Date.
8. Shut down the machine.
9. Re-cable the box.
10. Boot it up.

Regards,

Frederic Salzmann
E3T IT Systems
Astaro Distri Luxembourg
0 mnebeling over 16 years ago in reply to FedError

Greetings Fred,

first, thanks for your feedback so far.

But I guess I have somehow confused some of the readers... The actual status is as follows: after the upgrade process (Up2Date in WebAdmin) of the second node (SLAVE) failed in the HA environment, I separated it and tried the things I already described.

Do you mean that the failure happens now because the upgrade procedure in an HA environment is somehow different from the one without HA? If so, how can I properly erase those changes and try again to upgrade manually on the now separated node?

Also, right now it says it's unlinked... Should I disable HA on the now separated (formerly SLAVE) node for the time being? If I got you right, it somehow "thinks" it is still a "slave" and is "checking with the master"?

Best regards & thanks for your help so far :)
Marcel
0 Ma10 over 16 years ago in reply to mnebeling

Trying to solve it separately on the slave after resetting it and reloading the pre-cluster configuration, I am getting a GPG error:

Installing up2date package version 7.103
Verifying up2date package signature
CoopIS:/root # FATAL: Could not extract tar from gpg: 'Error in GPG verification (return code: 512)'

Tried to redownload twice - no good... yet.
0 FedError over 16 years ago in reply to Ma10

Hi Ma10,

normally in a Failover cluster (Active/Passive) you don't need to reload the config on the slave, because of the synchronisation. You just have to plug it into your cluster interface and the config will also be saved on the slave.

@mnebeling,
as described: just shut down the slave, unplug all your interfaces, restart it as a single unit, log in using SSH, and run auidl.plx. When the update finishes, shut down, reconnect all the interfaces (including the one for the cluster), and restart the ASG.

The config should be transferred from the current master to this new slave. It could take some time.

But please contact your reseller or distributor; they have to open a support case with Astaro for this kind of problem :)

Regards

Frederic Salzmann
E3T IT Systems
Astaro Distri Luxembourg
f.salzmann@e3t.eu
0 mnebeling over 16 years ago in reply to FedError

Greetz Fred,

well, again no luck... We have a Gold contract, but our distributor will not open a support ticket until he can get access to the machine, which we can't grant: we work with very sensitive data and our clients are banks, so our security requirements are very high.

Seems I have to start from scratch... To say the least, this s_cks ;)
0 BrucekConvergent over 16 years ago in reply to mnebeling

Astaro support, as well, will likely need access to the system to figure out what's wrong. What we normally do in these cases is to create an admin account for them to use, kill it when they're done. We also reset the SSH and root passwords before allowing them access to the shell (if needed), and set them back afterwards. We also limit what address(es) they access the system from.

It's hard for them to fix something they can't see; sort of like calling your mechanic and expecting him to fix the car without letting him take a peek at it in the shop.

CTO, Convergent Information Security Solutions, LLC

https://www.convergesecurity.com

Sophos Platinum Partner

--------------------------------------

Advice given as posted on this forum does not construe a support relationship or other relationship with Convergent Information Security Solutions, LLC or its subsidiaries. Use the advice given at your own risk.
0 Ma10 over 16 years ago in reply to FedError

"normally in a Failover cluster (Active/Passive) you don't need to reload the config on the slave, because of the synchronisation. You just have to plug it into your cluster interface and the config will also be saved on the slave."

Frederic,

Well, of course... if it works. The master node was upgraded, but the slave was stuck in U2D status...
Broke the cluster to try to upgrade it as a standalone --> factory reset --> basic configuration --> U2D through the shell several times after cleaning files from /var/up2date/sys and /var/up2date/sys-install --> GPG verification error 512.

Still no success... (Opened a support ticket)

0 m.fischer over 16 years ago in reply to Ma10

"Frederic,

Well, of course... if it works. The master node was upgraded, but the slave was stuck in U2D status...
Broke the cluster to try to upgrade it as a standalone --> factory reset --> basic configuration --> U2D through the shell several times after cleaning files from /var/up2date/sys and /var/up2date/sys-install --> GPG verification error 512.

Still no success... (Opened a support ticket)"

It is exactly the same here with a cluster of two ASG 320s. I'll wait a little to see if a quick solution comes up here...

Regards,
Manuel
0 da_merlin over 16 years ago in reply to m.fischer

A GPG verification error looks like a corrupt up2date package.

Can you verify the file size and md5 checksum of /var/up2date//sys/u2d-sys-7.103.tgz.gpg on the failed node?

Should be:
MD5sum: d7c5bb6eda6e4ff5c5f1824cd8b7ec71
Size : 5,386,126 bytes

If it is corrupt, delete it and download it again, e.g.:
audld.plx --ha-override --types=sys

Try to install it with:
auisys.plx --types=sys --upto 7.103

If it succeeds, reboot the node and everything should be fine again.

Note: You can access other HA/Cluster Nodes with the command "ha_utils ssh" from the Master Node.
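
The size and checksum comparison suggested above can be done in one step. This is only a minimal sketch, assuming md5sum and wc are available on the node (md5sum is mentioned later in the thread as being present on the ASG); the function name verify_pkg is hypothetical and not an Astaro command.

```shell
# Hypothetical helper (not an Astaro command): compare a downloaded package
# against a published MD5 sum and byte size.
# Usage: verify_pkg FILE EXPECTED_MD5 EXPECTED_SIZE_IN_BYTES
verify_pkg() {
    file=$1
    want_md5=$2
    want_size=$3
    [ -f "$file" ] || { echo "missing: $file"; return 2; }
    got_md5=$(md5sum "$file" | cut -d' ' -f1)
    got_size=$(wc -c < "$file" | tr -d ' ')
    if [ "$got_md5" = "$want_md5" ] && [ "$got_size" = "$want_size" ]; then
        echo "OK"
    else
        echo "MISMATCH: md5=$got_md5 size=$got_size"
        return 1
    fi
}

# Example call with the values quoted above (size written without commas):
# verify_pkg /var/up2date//sys/u2d-sys-7.103.tgz.gpg d7c5bb6eda6e4ff5c5f1824cd8b7ec71 5386126
```

A result of "OK" only rules out a damaged download; it says nothing about whether the GPG keys needed for signature verification are present on the node.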
0 m.fischer over 16 years ago in reply to da_merlin

"A GPG verification error looks like a corrupt up2date package.

Can you verify the file size and md5 checksum of /var/up2date//sys/u2d-sys-7.103.tgz.gpg on the failed node?

Should be:
MD5sum: d7c5bb6eda6e4ff5c5f1824cd8b7ec71
Size : 5,386,126 bytes"

The MD5 sum (tested with md5sum on the ASG and with MD5summer under Windows) is always identical, and matches the one mentioned above, whether

- the ASG downloaded the firmware automatically itself
- the firmware was uploaded manually via the web form
- the firmware was copied via SCP and tested via the SSH console

"Try to install it with:
auisys.plx --types=sys --upto 7.103"

This fails every time with the message:

Installing up2date package version 7.103
Verifying up2date package signature
FATAL: Could not extract tar from gpg: 'Error in GPG verification (return code: 512)'

Regards,
Manuel
0 andreas over 16 years ago in reply to m.fischer
Oh my, we just found out what happened: The up2date 7.102 contains a new factory reset script, which unfortunately removes the GPG keys required to use the up2date process.

If you are affected (you can tell when the up2date process fails with "FATAL: Could not extract tar from gpg: 'Error in GPG verification (return code: 512)'"), you can re-install the ep-up2date RPM package containing those GPG keys like this:

mount /opt/inst/
rpm -ivh --force /opt/inst/rpm/ep-up2date-7.1-104.i686.rpm
umount /opt/inst

You should also run the netselector process afterwards (will be done every couple of hours anyway, but speeds up things), and restart the middleware so that a new up2date configuration file is written:

/usr/local/bin/netselector.plx --infile /etc/up2date/authservers.ini --outfile /etc/up2date/servers.sorted
/etc/init.d/mdw restart

This should get things working again!
Cheers,
andreas
0 m.fischer over 16 years ago in reply to andreas

"Oh my, we just found out what happened: The up2date 7.102 contains a new factory reset script, which unfortunately removes the GPG keys required to use the up2date process."

Yes, the file /root/.gnupg/pubring.gpg was empty. We copied it from another ASG and all works again. Your solution is easier :-)

Now the up2date-mechanism works again, both ASG 320 have installed 7.103 without further problems.
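
The empty keyring described above is quick to test for before trying another Up2Date run. A minimal sketch, assuming the keyring path is the /root/.gnupg/pubring.gpg reported in this post; the function name keyring_ok is made up for illustration.

```shell
# Hypothetical check: report whether the GPG public keyring exists and is
# non-empty. The default path is the one reported in this thread; pass another
# path as the first argument if your system differs.
keyring_ok() {
    keyring=${1:-/root/.gnupg/pubring.gpg}
    if [ -s "$keyring" ]; then
        echo "keyring present: $keyring"
    else
        echo "keyring missing or empty: $keyring"
        return 1
    fi
}
```

A non-empty file does not prove that the right keys are in it, so a failure here is a strong hint while a pass is only a first check.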

Thanks for your quick and competent answers!

Regards,
Manuel
0 BrucekConvergent over 16 years ago in reply to m.fischer

Just had the same problem on a new, clean install of 7.100 ... up2dated to 7.103, now it won't load a pattern update... gonna try the GPG key solution... this really should be fixed...

0 mnebeling over 16 years ago in reply to BrucekConvergent

OK, I'm a little frustrated and at a loss right now.

I thought, OK, I'll start from scratch. I did the hardware reset (with the button), and since then I have been unable to connect to the Astaro. I can't get a connection via WebAdmin or SSH, nor through the console port... I thought that after a reset the IP address would be 192.168.0.1, or is that not the case? The old address we used isn't working either.

So please help... I need my frigging Astaro back ;)

Greetz,
Marcel
0 BrucekConvergent over 16 years ago in reply to mnebeling

Do you mean you just reset the box, or are you running an Astaro Hardware Appliance, and chose the Factory Reset Option? If it's the latter, make sure you powered off / powered back on the unit after the Reset was done (the "beeps" indicate when the Factory Reset is complete). If you are running Version 7, the Factory reset defaults the unit back to https://192.168.0.1:4444 for the Webadmin.

If the system's "toast," you will need to reload the system with the "sai" Astaro image (I would use 7.100) from a USB CD-ROM drive... you can't use the public "standard" ISO, as your ASG appliance license won't work with it, along with other issues. You can get this ISO from your reseller (or possibly support would provide a download link to it, if you started a case).

0 BrucekConvergent over 16 years ago in reply to BrucekConvergent

Oh, BTW, the manual reinstallation of the up2date package mentioned a couple of posts up did fix my up2date problem... it's pretty frustrating to have to do this on a clean install, though.

0 mnebeling over 16 years ago in reply to BrucekConvergent

Yes, we have two "320er", so it was the hard reset. But I just heard from my support contact that the IP address should be 192.168.1.1... which is fine in itself, but when I try to connect nothing works, and it is also not pingable. We found out that the MAC address is "0000", so I would say this is a glitch from the old HA config, which I had disabled before doing the reset.

Any idea how to get my ASG320 back...?