Postgres Archival Logging generating error after database rebuild to clear log generated disk space

Dear Community:
 
I am running a UTM-HA in AWS with a warm standby. I have 100GB of space dedicated to the primary EC2 UTM instance. I noticed it was over 90% full, and I don't know why, as this is a very new deployment. Through Sophos community support, I determined that the proper step would be to run the Postgres rebuild script. After the rebuild, I am seeing the system messages below. The UTM is functioning properly and generating new pg_xlog files, but I haven't been able to fix this issue despite trying different tactics. It's looking for a 16MB pg_xlog file that doesn't exist anymore, and I want to get rid of this error so I can turn on archival logs to AWS S3 and CloudWatch. I can access the database as well, so any help getting this resolved would be highly appreciated.
 
Thanks,
 
Scott
 
Postgres is stuck in a loop, unless I turn the archival feature off in the configuration file.
 
 
Details:
2017:06:28-00:56:39 tcp-sophos-utm-ha-1 postgres[339]: [3486-1] LOG: archive command failed with exit code 2
2017:06:28-00:56:39 tcp-sophos-utm-ha-1 postgres[339]: [3486-2] DETAIL: The failed archive command was: /usr/local/bin/postgres_cloud_backup store pg_xlog/000000010000000000000012 000000010000000000000012
2017:06:28-00:56:41 tcp-sophos-utm-ha-1 postgres[339]: [3487-1] LOG: archive command failed with exit code 2
2017:06:28-00:56:41 tcp-sophos-utm-ha-1 postgres[339]: [3487-2] DETAIL: The failed archive command was: /usr/local/bin/postgres_cloud_backup store pg_xlog/000000010000000000000012 000000010000000000000012
2017:06:28-00:56:42 tcp-sophos-utm-ha-1 postgres[339]: [3488-1] LOG: archive command failed with exit code 2
2017:06:28-00:56:42 tcp-sophos-utm-ha-1 postgres[339]: [3488-2] DETAIL: The failed archive command was: /usr/local/bin/postgres_cloud_backup store pg_xlog/000000010000000000000012 000000010000000000000012
2017:06:28-00:56:42 tcp-sophos-utm-ha-1 postgres[339]: [3489-1] WARNING: transaction log file "000000010000000000000012" could not be archived: too many failures
2017:06:28-00:57:01 tcp-sophos-utm-ha-1 /usr/sbin/cron[20038]: (root) CMD (/var/awslogs/bin/awslogs-nanny.sh > /dev/null 2>&1)
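To confirm that every failure in an excerpt like the one above refers to the same segment, the log lines can be tallied per WAL file name. The following is a minimal sketch, not a Sophos tool: the function names and the SAMPLE list are hypothetical, and it assumes the standard PostgreSQL WAL file naming of 24 hex digits (8 for the timeline, 8 for the log number, 8 for the segment).

```python
import re
from collections import Counter

# Matches the DETAIL line PostgreSQL writes after a failed archive command
# and captures the last token, which is the WAL segment file name.
FAILED_CMD = re.compile(r"The failed archive command was: .*\s([0-9A-F]{24})\s*$")


def count_archive_failures(lines):
    """Count failed archive attempts per WAL segment name."""
    counts = Counter()
    for line in lines:
        match = FAILED_CMD.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def decode_segment(name):
    """Split a 24-hex-digit WAL file name into (timeline, log, segment)."""
    if not re.fullmatch(r"[0-9A-F]{24}", name):
        raise ValueError(f"not a WAL segment file name: {name!r}")
    return int(name[0:8], 16), int(name[8:16], 16), int(name[16:24], 16)


# Hypothetical sample: lines copied from the log excerpt in this post.
SAMPLE = [
    "2017:06:28-00:56:39 tcp-sophos-utm-ha-1 postgres[339]: [3486-1] LOG: archive command failed with exit code 2",
    "2017:06:28-00:56:39 tcp-sophos-utm-ha-1 postgres[339]: [3486-2] DETAIL: The failed archive command was: /usr/local/bin/postgres_cloud_backup store pg_xlog/000000010000000000000012 000000010000000000000012",
    "2017:06:28-00:56:41 tcp-sophos-utm-ha-1 postgres[339]: [3487-2] DETAIL: The failed archive command was: /usr/local/bin/postgres_cloud_backup store pg_xlog/000000010000000000000012 000000010000000000000012",
    "2017:06:28-00:56:42 tcp-sophos-utm-ha-1 postgres[339]: [3488-2] DETAIL: The failed archive command was: /usr/local/bin/postgres_cloud_backup store pg_xlog/000000010000000000000012 000000010000000000000012",
]

if __name__ == "__main__":
    for segment, failures in count_archive_failures(SAMPLE).items():
        timeline, log, seg = decode_segment(segment)
        print(f"{segment}: {failures} failures (timeline {timeline}, log {log}, segment {seg})")
```

If only one segment name shows up in the tally, the archiver is most likely retrying a single missing file rather than failing on every new segment.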
 
 
 


  • I'm a little confused, Scott - are you trying to archive log files or PostgreSQL databases?  What version are you running?  What was the installed version if not your current version?  What command did you use to repair your PostgreSQL databases?

    Cheers - Bob

     
    Sophos UTM Community Moderator
    Sophos Certified Architect - UTM
    Sophos Certified Engineer - XG
    Gold Solution Partner since 2005
    MediaSoft, Inc. USA
  • Hi Bob,

    I resolved the problem on my own last week. I recently deployed the Sophos HA-UTM 9 platform in AWS in two separate VPCs: one with Linux-based EC2 instances (web application servers) and a Windows VPC with the corporate domain controller, VPNs, and RDP services.

    In the Linux VPC where this problem originated, I have the Sophos UTM with one EIP as my main entry point. Behind it, I have multiple internal AWS ELBs listening on ports 443/80 and monitoring EC2 instance pairs within AWS Auto Scaling Groups for each website, with AWS RDS DBs on the backend.

    On the Sophos UTM in the Linux VPC, I have a 100GB EBS volume. After performing the initial Sophos HA-UTM deployment through AWS CloudFormation, I noticed after a week or so that my disk space usage was over 90%. Then I got under the hood by SSHing into the UTM.

    To make a long story short, it appears that I initially ran the Postgres rebuild script without running "killall REPCTL" before executing "/etc/init.d/postgres -rebuild" and then REPCTL afterward. As a result, the rebuild removed a WAL log file, and Postgres was stuck in a loop looking for that specific 16MB log file.

    To resolve the Postgres DB corruption, I first terminated the Warm Standby EC2 instance, and AWS automatically built a new one from the CloudFormation template or AMI. Then I performed the same process on the Master instance. Once the Master booted and synced up with the Warm Standby, everything was fine and has been ever since.

    All I had to do then was upload my Sophos UTM backup configuration, and everything has been "Hunky Dory" ever since.

    Thanks for reaching out to me and offering assistance. The Sophos UTM 9 platform is new territory to me from a security appliance perspective, but I am catching on fast, as I have a lot of Cisco/Snort IDS/IPS expertise.

    I noticed you are the most dedicated individual in these Sophos forums, and I look forward to collaborating more with you in the near future, as I am going into production with 5 corporate websites and switching DNS in GoDaddy.

    Best Regards,

    Scott Spangler

     

  • Hi Scott,

    Thanks for the details. I am only starting to use Sophos, so I got stuck at the "postgres rebuild" step. Can you please point me to the docs on how to do that properly?

    Thanks and regards,

    Efren

  • Hi Efren and welcome to the UTM Community!

    I'm not sure what your question is.  Are you looking for the following command?

    /etc/init.d/postgresql92 rebuild

    Cheers - Bob

     
  • Thanks Bob,

     

    That was the one. Unfortunately, I cannot find any documentation explaining what it will do or what to consider when running it.

    I also contacted support and they suggested the same, but sadly, the issue remains. It looks like the rebuild worked, but it still cannot log, and the disk usage keeps growing.

  • Efren, if first-level Support can't help you, you should request escalation - someone needs to take a closer look.

    The rebuild just deletes the history in graphs and Reporting, rebuilds some databases like the SMTP Quarantine and Queue, and does not touch the log files.

    Cheers - Bob

     