Postgres issue?

I seem to have this happening in my UTM logs:

2017:04:24-11:55:30 gw01-2 repctl[14683]: [e] db_connect(2171): error while connecting to database(DBI:Pg:dbname=repmgr;host=198.19.250.2): could not connect to server: Connection refused

2017:04:24-11:55:30 gw01-2 repctl[14683]: [e] db_connect(2171): Is the server running on host "198.19.250.2" and accepting

2017:04:24-11:55:30 gw01-2 repctl[14683]: [e] db_connect(2171): TCP/IP connections on port 5432?

2017:04:24-11:55:30 gw01-2 repctl[14683]: [e] master_connection(2010): could not connect to server: Connection refused

2017:04:24-11:55:30 gw01-2 repctl[14683]: [e] master_connection(2010): Is the server running on host "198.19.250.2" and accepting

2017:04:24-11:55:30 gw01-2 repctl[14683]: [e] master_connection(2010): TCP/IP connections on port 5432?

2017:04:24-11:55:30 gw01-2 repctl[14683]: [i] main(188): cannot connect to postgres on master, retry after 1024 seconds

2017:04:24-12:04:08 gw01-1 repctl[4357]: [e] db_connect(2171): error while connecting to database(DBI:Pg:dbname=repmgr;host=198.19.250.2): could not connect to server: Connection refused

2017:04:24-12:04:08 gw01-1 repctl[4357]: [e] db_connect(2171): Is the server running on host "198.19.250.2" and accepting

2017:04:24-12:04:08 gw01-1 repctl[4357]: [e] db_connect(2171): TCP/IP connections on port 5432?

2017:04:24-12:04:08 gw01-1 repctl[4357]: [e] master_connection(2010): could not connect to server: Connection refused

2017:04:24-12:04:08 gw01-1 repctl[4357]: [e] master_connection(2010): Is the server running on host "198.19.250.2" and accepting

2017:04:24-12:04:08 gw01-1 repctl[4357]: [e] master_connection(2010): TCP/IP connections on port 5432?

2017:04:24-12:04:08 gw01-1 repctl[4357]: [i] main(188): cannot connect to postgres on master, retry after 1024 seconds

It's slightly concerning to say the least. Anybody know the solution?

0 DavidFinnegan over 7 years ago

Hi Louis,

This looks like a connectivity issue on the High Availability link between your Master and Slave UTMs. You can check connectivity by logging into the UTM shell and running "ping 198.19.250.2". You should see something like this:

<M> utm-cpc:/home/login # ping 198.19.250.2
PING 198.19.250.2 (198.19.250.2) 56(84) bytes of data.
64 bytes from 198.19.250.2: icmp_seq=1 ttl=64 time=0.784 ms
64 bytes from 198.19.250.2: icmp_seq=2 ttl=64 time=0.805 ms
64 bytes from 198.19.250.2: icmp_seq=3 ttl=64 time=0.468 ms
64 bytes from 198.19.250.2: icmp_seq=4 ttl=64 time=0.724 ms
64 bytes from 198.19.250.2: icmp_seq=5 ttl=64 time=0.524 ms
64 bytes from 198.19.250.2: icmp_seq=6 ttl=64 time=0.682 ms
64 bytes from 198.19.250.2: icmp_seq=7 ttl=64 time=0.430 ms
64 bytes from 198.19.250.2: icmp_seq=8 ttl=64 time=2.31 ms
64 bytes from 198.19.250.2: icmp_seq=9 ttl=64 time=0.346 ms
^C
--- 198.19.250.2 ping statistics ---
9 packets transmitted, 9 received, 0% packet loss, time 8000ms
rtt min/avg/max/mdev = 0.346/0.786/2.316/0.563 ms
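
A successful ping only confirms that the host answers ICMP. The log above reports "Connection refused" on TCP port 5432, which usually means the host is reachable but nothing is listening on that port. A minimal sketch of a TCP-level check follows, assuming Python is available on a machine that can reach the HA interface (this is not confirmed for the UTM shell itself); the function name `tcp_port_status` is hypothetical.

```python
import socket


def tcp_port_status(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt a TCP connection and classify the result.

    Returns "open" if the connection succeeds, "refused" if the host
    actively rejects it (nothing listening on the port), and
    "unreachable" for timeouts, routing failures, and other socket errors.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except OSError:
        # Covers timeouts, "no route to host", and similar failures.
        return "unreachable"


# Example call using the address and port from the log above:
# tcp_port_status("198.19.250.2", 5432)
```

If ping succeeds but this reports "refused", the link itself is probably fine and the problem is more likely that Postgres is not running on the other node.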

0 Louis-M over 7 years ago in reply to DavidFinnegan

Well, it's not good.

The first symptom we had was users complaining that their internet connection was slow, and it was. We also struggled to log on to the UTM. The master's swap usage was at 27%, which we hadn't seen before on our SG310.

So, we rebooted the master to see if that had any effect and yes it did. The slave came on and everything ran fine.

After the reboot, we switched back to the master and it appeared to work (swap % looked fine and everything was running, or so we thought).

We were, and still are, getting out to the internet, but we've lost all mail filtering and Postgres is complaining and won't start at all.

Now, we had a nightmare with Sophos support last time (we have premium support and we did it via the portal).

So this time I called them directly and, to their credit, they tried a few things to get it going, but to no avail.

It's now been escalated to the next level, so we will see where that goes.
Hopefully it's fairly quick. Last time it took 8 weeks or so to semi-sort something, so we aren't going to be in the mood to wait another week for a fix.

In the meantime, I've redirected our mail to our other UTM active/passive cluster which is behaving itself.

So we have internet access out of the UTM that's semi-broken, and email out of the one on the other site that is OK.

Not sure what happened, as no config changes were going on at the time.

0 DavidFinnegan over 7 years ago in reply to Louis-M

If you shut down the Slave, does the Master's performance improve? That would indicate problems with the HA synchronization, which can impact Postgres and cause those types of symptoms.

0 Louis-M over 7 years ago in reply to DavidFinnegan

Not sure what went wrong, to be honest. The master slowed down and swap went to 27%. Switching to the slave improved it. We rebooted the master, swap was 0%, we switched back over to the master, things went OK, and then performance went down again over the next hour.

Sophos sorted it within a day, although it did require a db rebuild.
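
For anyone wanting to track the swap figure mentioned above from the shell rather than the dashboard, here is a minimal sketch that computes swap usage from the contents of `/proc/meminfo` on a Linux system. It assumes usage is defined as (SwapTotal - SwapFree) / SwapTotal; whether the UTM dashboard calculates its percentage the same way is an assumption, and the function name `swap_used_percent` is hypothetical.

```python
def swap_used_percent(meminfo_text: str) -> float:
    """Compute swap usage in percent from the text of /proc/meminfo."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            # The first token is the numeric value (kB for memory fields).
            fields[key.strip()] = int(parts[0])

    total = fields.get("SwapTotal", 0)
    free = fields.get("SwapFree", 0)
    if total == 0:
        # No swap configured, so nothing can be in use.
        return 0.0
    return 100.0 * (total - free) / total


# Example usage on a Linux host:
# with open("/proc/meminfo") as f:
#     print(swap_used_percent(f.read()))
```

Logging this value periodically would show whether swap climbs steadily after a failover, as described in this thread, or jumps suddenly.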

0 sachingurung over 7 years ago in reply to Louis-M

Hi Louis,

In such instances, a DB rebuild is one of the solutions. Please PM me the case number.

Thanks

Sachin Gurung
Team Lead | Sophos Technical Support

0 Louis-M over 7 years ago in reply to sachingurung

Hi Sachin,

The issue was resolved by a db rebuild. Sophos support did this for us and their service was first class.