High CPU load after update to UTM 9.700-5

Since the Up2Date update to 9.700-5, CPU usage on our master SG210 has been running above 94%.

Top is showing this: 

Cpu(s): 41.7%us, 47.9%sy,  0.0%ni,  5.0%id,  0.0%wa,  0.0%hi,  5.3%si,  0.0%st
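
As a quick cross-check of that figure, here is a minimal Python sketch that splits the summary line above into its fields and adds up the non-idle share. The function name parse_cpu_line is made up for illustration; only the line format is taken from the output above.

```python
import re

def parse_cpu_line(line):
    """Map top's CPU field suffixes (us, sy, id, ...) to their percentages."""
    # Each field looks like "41.7%us"; the "Cpu(s):" prefix does not match.
    return {name: float(value) for value, name in re.findall(r"([\d.]+)%(\w+)", line)}

line = "Cpu(s): 41.7%us, 47.9%sy,  0.0%ni,  5.0%id,  0.0%wa,  0.0%hi,  5.3%si,  0.0%st"
fields = parse_cpu_line(line)

# Everything that is not idle counts as busy.
busy = 100.0 - fields["id"]
print(f"busy: {busy:.1f}% (user {fields['us']}%, system {fields['sy']}%, softirq {fields['si']}%)")
```

Read this way, roughly half of the load is system time rather than user time, which may help narrow down which processes to look at.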

The HA log says:

2020:01:07-16:57:51 repctl[10837]: [c] standby_clone(936): 'module' => 'postgres-default'
2020:01:07-16:57:51 repctl[10837]: [c] standby_clone(936): };
2020:01:07-16:57:51 repctl[10837]: [c] standby_clone(936): (Attempt #:10)
2020:01:07-16:58:01 repctl[10837]: [w] master_connection(2015): check_dbh: -1
2020:01:07-16:58:03 repctl[10837]: [i] stop_backup_mode(765): stopped backup mode at 00000001000010B0000000DD
2020:01:07-16:58:03 repctl[10837]: [c] standby_clone(950): standby_clone failed: sync aborted (never executed successfully)
2020:01:07-16:58:03 repctl[10837]: [e] prepare_secondary(346): prepare_secondary: clone failed
2020:01:07-16:58:03 repctl[10837]: [c] prepare_secondary(360): failed to get database up, waiting for retry
2020:01:07-16:58:03 repctl[10837]: [c] setup_replication(274): setup_replication was not properly executed
2020:01:07-16:58:03 repctl[10837]: [i] setup_replication(278): checkinterval 300
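
To see at a glance which repctl steps fail most often, a log excerpt like the one above can be tallied by marker and function name. This is only a sketch: the regular expression is derived from the lines shown here, the helper name summarize is hypothetical, and reading the bracketed letters as severity levels (for example [c] critical, [e] error, [w] warning, [i] info) is my assumption, not something the log states.

```python
import re
from collections import Counter

# Matches e.g. "repctl[10837]: [c] standby_clone(950):" and captures
# the bracketed marker and the function name.
LINE_RE = re.compile(r"repctl\[\d+\]: \[(\w)\] (\w+)\(\d+\):")

def summarize(lines):
    """Count log lines per (marker, function) pair; non-matching lines are skipped."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts

sample = [
    "2020:01:07-16:57:51 repctl[10837]: [c] standby_clone(936): (Attempt #:10)",
    "2020:01:07-16:58:01 repctl[10837]: [w] master_connection(2015): check_dbh: -1",
    "2020:01:07-16:58:03 repctl[10837]: [c] standby_clone(950): standby_clone failed: sync aborted (never executed successfully)",
    "2020:01:07-16:58:03 repctl[10837]: [e] prepare_secondary(346): prepare_secondary: clone failed",
]

for (marker, func), n in summarize(sample).most_common():
    print(f"[{marker}] {func}: {n}")
```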

Any remediation suggestions?

Regards,

Koen



  • It seems there are HA/replication problems.

    Are the devices directly connected (without a switch between them)?

    First, I would try rebooting the slave.


    Dirk

    Systema Gesellschaft für angewandte Datentechnik mbH  // Sophos Platinum Partner
    Sophos Solution Partner since 2003
    If a post solves your question, click the 'Verify Answer' link at this post.

  • I just rebooted the slave; it stayed in the SYNCING state for at least 8 minutes before getting back to READY.

    >master_ip 198.19.250.2

    >slave_ip 198.19.250.1

    2020:01:07-17:09:10 <SLAVE > ha_daemon[4249]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 47 10.995" name="Initial synchronization finished!"
    2020:01:07-17:09:10 <SLAVE > ha_daemon[4249]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 48 10.995" name="state change SYNCING(2) -> ACTIVE(0)"
    2020:01:07-17:09:11 <MASTER> ha_daemon[4172]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 963 11.459" name="Node 1 changed state: SYNCING(2) -> ACTIVE(0)"
    2020:01:07-17:12:53 <MASTER> repctl[4309]: [e] db_connect(2206): error while connecting to database(DBI:Pg:dbname=repmgr;host=198.19.250.2): could not connect to server: Connection refused
    2020:01:07-17:12:53 <MASTER> repctl[4309]: [e] db_connect(2206): Is the server running on host "198.19.250.2" and accepting
    2020:01:07-17:12:53 <MASTER> repctl[4309]: [e] db_connect(2206): TCP/IP connections on port 5432?
    2020:01:07-17:12:53 <MASTER> repctl[4309]: [e] master_connection(2045): could not connect to server: Connection refused
    2020:01:07-17:12:53 <MASTER> repctl[4309]: [e] master_connection(2045): Is the server running on host "198.19.250.2" and accepting
    2020:01:07-17:12:53 <MASTER> repctl[4309]: [e] master_connection(2045): TCP/IP connections on port 5432?
    2020:01:07-17:12:53 <MASTER> repctl[4309]: [i] main(188): cannot connect to postgres on master, retry after 512 seconds
    2020:01:07-17:13:02 <SLAVE > repctl[10837]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 0
    2020:01:07-17:13:02 <SLAVE > repctl[10837]: [i] execute(1768): pg_ctl: no server running
    2020:01:07-17:13:02 <SLAVE > repctl[10837]: [e] db_connect(2206): error while connecting to database(DBI:Pg:dbname=repmgr;host=198.19.250.1): could not connect to server: Connection refused
    2020:01:07-17:13:02 <SLAVE > repctl[10837]: [e] db_connect(2206): Is the server running on host "198.19.250.1" and accepting
    2020:01:07-17:13:02 <SLAVE > repctl[10837]: [e] db_connect(2206): TCP/IP connections on port 5432?
    2020:01:07-17:13:02 <SLAVE > repctl[10837]: [e] master_connection(2045): could not connect to server: Connection refused
    2020:01:07-17:13:02 <SLAVE > repctl[10837]: [e] master_connection(2045): Is the server running on host "198.19.250.1" and accepting
    2020:01:07-17:13:02 <SLAVE > repctl[10837]: [e] master_connection(2045): TCP/IP connections on port 5432?
    2020:01:07-17:13:05 <SLAVE > repctl[10837]: [e] db_connect(2206): error while connecting to database(DBI:Pg:dbname=repmgr;host=198.19.250.1): could not connect to server: Connection refused
    2020:01:07-17:13:05 <SLAVE > repctl[10837]: [e] db_connect(2206): Is the server running on host "198.19.250.1" and accepting
    2020:01:07-17:13:05 <SLAVE > repctl[10837]: [e] db_connect(2206): TCP/IP connections on port 5432?
    2020:01:07-17:13:05 <SLAVE > repctl[10837]: [e] master_connection(2045): could not connect to server: Connection refused
    2020:01:07-17:13:05 <SLAVE > repctl[10837]: [e] master_connection(2045): Is the server running on host "198.19.250.1" and accepting
    2020:01:07-17:13:05 <SLAVE > repctl[10837]: [e] master_connection(2045): TCP/IP connections on port 5432?
    2020:01:07-17:13:08 <SLAVE > repctl[10837]: [e] db_connect(2206): error while connecting to database(DBI:Pg:dbname=repmgr;host=198.19.250.1): could not connect to server: Connection refused
    2020:01:07-17:13:08 <SLAVE > repctl[10837]: [e] db_connect(2206): Is the server running on host "198.19.250.1" and accepting
    2020:01:07-17:13:08 <SLAVE > repctl[10837]: [e] db_connect(2206): TCP/IP connections on port 5432?
    2020:01:07-17:13:08 <SLAVE > repctl[10837]: [e] master_connection(2045): could not connect to server: Connection refused
    2020:01:07-17:13:08 <SLAVE > repctl[10837]: [e] master_connection(2045): Is the server running on host "198.19.250.1" and accepting
    2020:01:07-17:13:08 <SLAVE > repctl[10837]: [e] master_connection(2045): TCP/IP connections on port 5432?
    2020:01:07-17:13:12 <SLAVE > repctl[10837]: [e] db_connect(2206): error while connecting to database(DBI:Pg:dbname=repmgr;host=198.19.250.1): could not connect to server: Connection refused
    2020:01:07-17:13:12 <SLAVE > repctl[10837]: [e] db_connect(2206): Is the server running on host "198.19.250.1" and accepting
    2020:01:07-17:13:12 <SLAVE > repctl[10837]: [e] db_connect(2206): TCP/IP connections on port 5432?
    2020:01:07-17:13:12 <SLAVE > repctl[10837]: [e] master_connection(2045): could not connect to server: Connection refused
    2020:01:07-17:13:12 <SLAVE > repctl[10837]: [e] master_connection(2045): Is the server running on host "198.19.250.1" and accepting
    2020:01:07-17:13:12 <SLAVE > repctl[10837]: [e] master_connection(2045): TCP/IP connections on port 5432?
    2020:01:07-17:13:15 <SLAVE > repctl[10837]: [c] prepare_secondary(315): prepare_secondary failed because master db's status can't be determined! Maybe unreachable?
    2020:01:07-17:13:15 <SLAVE > repctl[10837]: [c] setup_replication(274): setup_replication was not properly executed
    2020:01:07-17:13:15 <SLAVE > repctl[10837]: [i] setup_replication(278): checkinterval 300
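
The repeated "Connection refused" messages mean that nothing was accepting TCP connections on port 5432 at those addresses when repctl tried. A minimal sketch of the same check in Python follows; the helper names port_open and check_nodes are hypothetical, and whether a Python interpreter is available on the UTM nodes themselves is an assumption, so this may have to run from another host that can reach the HA link.

```python
import socket

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds, False otherwise."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False

def check_nodes(nodes, port=5432):
    """Return a dict mapping each node label to the result of port_open()."""
    return {label: port_open(host, port) for label, host in nodes.items()}
```

With the addresses from the log above, the call would be check_nodes({"master": "198.19.250.2", "slave": "198.19.250.1"}). A False result only says the port is not reachable; it does not tell whether PostgreSQL is stopped, still starting, or blocked by a firewall.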

     

  • 2020:01:07-17:13:15 <SLAVE > repctl[10837]: [c] prepare_secondary(315): prepare_secondary failed because master db's status can't be determined! Maybe unreachable?

    This looks as if the PostgreSQL database on the master is defective.

    If the slave reaches the READY state, I would fail over to the slave node (by rebooting the master)

    ... and then repair/rebuild the database or reinstall the slave.

     


    Dirk
