HA Slave routing in BGP configuration

As discussed in previous posts (This one and This one), we have a successful eBGP installation running on our UTM (I will post a how-to in this forum), with thanks to BAlfson.

There is one detail that I'd like to figure out first. We have an Active-Passive HA configuration. The slave reports the '[WARN-129] Spam Filter cannot query database servers' error every hour or so.

Not a really big deal, but annoying :)

We have a second Active-Passive HA configuration without BGP which doesn't have this problem.

I (think I) found that the check performed by the UTM that causes this error is a script run by selfmonng on the slave: /usr/bin/ctasd_connect_check.sh

Running this script manually on the Master reports no problems. Running it on the slave of the BGP configuration reports 'Can't reach any server'. Running it on the slave of the 'normal' HA configuration reports no problems.

Trying a ping/traceroute from the slave to one of the servers checked by ctasd_connect_check.sh reports 'Network is unreachable'. Ping/traceroute from the slave of the other (non-BGP) configuration works fine. I tried a tcpdump -i any -vv port 80 on the slave of the BGP configuration, but this shows no results (as expected). The same tcpdump on the slave of the other configuration shows successful connections.
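
For what it's worth, the 'Network is unreachable' symptom can be reproduced with a minimal probe like the sketch below. 203.0.113.10 is a placeholder documentation address, not one of the actual ctasd servers the script checks:

```shell
#!/bin/sh
# Minimal reachability probe: does the kernel have any route
# toward an outside host? On the BGP slave this fails because
# no default gateway (and no BGP-learned route) is present.
# TARGET is a placeholder (TEST-NET-3), not a real ctasd server.
TARGET=203.0.113.10

if ip route get "$TARGET" >/dev/null 2>&1; then
    echo "route to $TARGET exists"
else
    # Matches the error seen when pinging from the BGP slave
    echo "Network is unreachable"
fi
```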

One significant difference between the BGP configuration and the 'normal' configuration is that there is no default gateway defined on the BGP configuration. Routing is done with SNAT rules. So I was thinking that a simple SNAT on the HA network range would do the trick.

Unfortunately it doesn't, with the result that the slave in the BGP configuration can only communicate with the master and the networks directly attached to the configuration. As soon as routing is required it's a dead end.
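
For illustration, the SNAT idea corresponds roughly to a rule like this, expressed in raw iptables syntax (a sketch only; on UTM this would be configured through WebAdmin, and the interface name and addresses below are placeholders, with 198.19.250.0/24 standing in for the HA-internal link network):

```
# Sketch: rewrite the source of traffic originating from the
# HA link network so it leaves via the external interface.
# eth0 and 192.0.2.1 are placeholders for the actual setup.
iptables -t nat -A POSTROUTING -s 198.19.250.0/24 -o eth0 \
         -j SNAT --to-source 192.0.2.1
```

One possible explanation for why this alone doesn't help: SNAT only rewrites the source address after the kernel has already selected a route for the packet, so without a default gateway (or a BGP-learned route) on the slave, the packet is dropped with 'Network is unreachable' before the NAT rule is ever consulted.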

As I am not sure exactly how the HA slave communicates with other networks, it seems I am missing something.

Any thoughts / suggestions?

 

Regards,

 

Karl-Heinz

  • Hi  

    This is an interesting situation and honestly, something I've never come across. I'm going to check with our experts and see if there is something we can do about this.

  • In reply to Jaydeep:

    Hi Jaydeep,

     

    Thanks! Some thoughts that occurred to me in the past week, hope this helps to clarify even more:

    • It seems that the script failure did not start immediately after implementing eBGP, but only after we added eBGP multipathing to the solution. Before that we had route maps enabled so that specific ranges would go through specific paths. Enabling eBGP multipathing (and disabling the route maps) seemed more efficient, as this basically divides all available bandwidth across all available paths. I want to test this again (re-enable the route maps and disable eBGP multipathing), but as this system is running production I cannot just change this without permission :)
    • For a slave node this specific check seems unnecessary. As far as I understand, spam database updates are done on the Master, which then replicates them to the slave, so the slave doesn't have to check the connection itself. That's why we think it's annoying but not a real problem.
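
    For context, the two approaches from the first point contrast roughly as follows in Quagga-style bgpd configuration (the AS numbers, neighbor addresses, and route-map name are placeholders, not our actual config):

    ```
    ! Sketch of the two modes discussed above (placeholder values)
    router bgp 64512
     neighbor 192.0.2.1 remote-as 64513
     neighbor 198.51.100.1 remote-as 64514
     ! Route-map approach: steer specific prefixes over one neighbor
     neighbor 192.0.2.1 route-map PREFER-PATH-A out
     ! Multipath approach: install up to 2 equal-cost eBGP paths
     maximum-paths 2
    ```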

     

    Best Regards,

     

    Karl-Heinz