
postgres LOG: archive command failed with exit code 3

The Postgres WAL archive_command runs approximately every minute and is attempted three times in succession. All three attempts fail (I can see this in /var/log/system.log). If I run the failing command manually (in my case, /usr/local/bin/postgres_cloud_backup store pg_xlog/000000010000000000000005 000000010000000000000005), I get the following output:

 

I, [2017-10-24T12:34:15.497969 #2846]  INFO -- : Celluloid 0.17.3 is running in BACKPORTED mode. [ http://git.io/vJf3J ]
2017:10:24-12:34:15.539 INFO root: Application postgres_backup started
root  ............................................   *info      -T
- <Appenders::Stderr name="stderr">
- <Appenders::Syslog name="postgres_backup">
  AWS  ...........................................    info  +A  -T
  Celluloid  .....................................    info  +A  -T
  Logging  .......................................    *off  -A  -T
2017:10:24-12:34:15.547 INFO CLI::PostgresCloudBackup: HA/AWS Call failed (retry later): Failed to complete request
2017:10:24-12:34:15.547 INFO root: Application postgres_backup ended
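
The output above does not show the exit status that Postgres reports in its log. A minimal Python sketch for running the command by hand and printing its exit code next to its output follows. The helper name run_and_report is my own, not part of the Sophos tooling, and the script only wraps whatever command line is passed to it.

```python
import subprocess
import sys


def run_and_report(argv):
    """Run a command and return (exit_code, combined stdout/stderr text)."""
    result = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout, as seen in a terminal
        universal_newlines=True,   # decode bytes to str
    )
    return result.returncode, result.stdout


if __name__ == "__main__":
    # Hypothetical usage (save as run_and_report.py):
    #   python3 run_and_report.py /usr/local/bin/postgres_cloud_backup store <path> <name>
    if len(sys.argv) > 1:
        code, output = run_and_report(sys.argv[1:])
        print(output, end="")
        print("exit code:", code)
    else:
        print("usage: run_and_report.py COMMAND [ARGS...]")
```

If the printed exit code matches the one in the "archive command failed" log line, the manual run is reproducing the same failure that Postgres sees.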

 

To trace the error, I modified /usr/lib/ruby/gems/2.2.0/gems/sophos-iaas-1.0.0/lib/sophos/iaas/cli/postgres_cloud_backup.rb, replacing the calls to logger.debug with logger.info, and then re-ran the command. That gives me the following output:

 

I, [2017-10-24T12:39:40.987493 #3882]  INFO -- : Celluloid 0.17.3 is running in BACKPORTED mode. [ http://git.io/vJf3J ]
2017:10:24-12:39:41.050 INFO root: Application postgres_backup started
root  ............................................   *info      -T
- <Appenders::Stderr name="stderr">
- <Appenders::Syslog name="postgres_backup">
  AWS  ...........................................    info  +A  -T
  Celluloid  .....................................    info  +A  -T
  Logging  .......................................    *off  -A  -T
2017:10:24-12:39:41.059 INFO CLI::PostgresCloudBackup: JSON RPC -> func_store_wal("/var/storage/pgsql92/data/pg_xlog/pg_xlog/000000010000000000000005", "000000010000000000000005")
2017:10:24-12:39:41.061 INFO CLI::PostgresCloudBackup: JSON RPC <- {"error"=>{"code"=>-32901, "message"=>"Service temporary unavailable. Readonly mode"}, "jsonrpc"=>"2.0", "id"=>6281}
2017:10:24-12:39:41.061 INFO CLI::PostgresCloudBackup: HA/AWS Call failed (retry later): Failed to complete request
2017:10:24-12:39:41.061 INFO root: Application postgres_backup ended
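
To pick the error code and message out of output like this without reading it by eye, a small Python sketch along these lines should work. It assumes the "JSON RPC <-" lines are being logged (which needed the debug-to-info change described above) and that the payload keeps the Ruby hash format shown. The function name extract_rpc_errors is my own.

```python
import re

# Matches the Ruby-hash-style error payload printed on "JSON RPC <-" lines.
ERROR_RE = re.compile(r'"code"=>(-?\d+),\s*"message"=>"([^"]*)"')


def extract_rpc_errors(log_text):
    """Return a list of (code, message) tuples found on 'JSON RPC <-' lines."""
    errors = []
    for line in log_text.splitlines():
        if "JSON RPC <-" not in line:
            continue  # only response lines carry the error payload
        match = ERROR_RE.search(line)
        if match:
            errors.append((int(match.group(1)), match.group(2)))
    return errors


# Response line copied from the output above.
sample = (
    '2017:10:24-12:39:41.061 INFO CLI::PostgresCloudBackup: JSON RPC <- '
    '{"error"=>{"code"=>-32901, "message"=>"Service temporary unavailable. '
    'Readonly mode"}, "jsonrpc"=>"2.0", "id"=>6281}'
)
print(extract_rpc_errors(sample))
# -> [(-32901, 'Service temporary unavailable. Readonly mode')]
```

Running this over a longer capture would show whether every failed attempt returns the same -32901 "Readonly mode" error or whether other codes appear as well.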

 

As far as I can see, only two places in the Sophos code trigger error code -32901, and both live in /usr/lib/ruby/gems/2.2.0/gems/sophos-iaas-1.0.0/lib/sophos/iaas/cloud-manager/infrastructure/data_service/postgres_wal.rb. Only one of them has the message "Service temporary unavailable", and that is on line 45.

 

Looking at this section of code, I spy an @readonly variable, which apparently is set to true.

 

Looking through the interface, nothing stands out to me as incorrect. This was happening in 9.411 and continues in 9.501. Can someone nudge me in the right direction to fix this backup failure in my logs, so that I can stop thinking about it?


