Strange drops

We have a customer with a phone switchboard application that periodically freezes: either the whole application stops responding (can't click anything), or it just stops showing incoming calls. In both cases it sometimes unfreezes on its own, and then all the calls that came in during the freeze suddenly flash on the screen. We've ruled out AV as the cause and are now investigating whether the problem is at the network layer.

drop-packet-capture shows this at the time of freezing:

2017-05-23 08:58:14 0101021 IP 10.10.90.2.8779 > 10.10.10.112.43470 : proto TCP: P 3007061919:3007062115(196) win 330 checksum : 55314
0x0000:  4500 00ec 18b4 4000 3f06 a9d2 0a0a 5a02  E.....@.?.....Z.
0x0010:  <remainder of the packet redacted>
Date=2017-05-23 Time=08:58:14 log_id=0101021 log_type=Firewall log_component=Firewall_Rule log_subtype=Denied log_status=N/A log_priority=Alert duration=N/A in_dev=Lag.90 out_dev=Lag.10 inzone_id=1 outzone_id=8 source_mac=00:1a:e8:8b:15:b4 dest_mac=00:e0:20:11:08:fc l3_protocol=IP source_ip=10.10.90.2 dest_ip=10.10.10.112 l4_protocol=TCP source_port=8779 dest_port=43470 fw_rule_id=0 policytype=1 live_userid=0 userid=0 user_gp=0 ips_id=0 sslvpn_id=0 web_filter_id=0 hotspot_id=0 hotspotuser_id=0 hb_src=0 hb_dst=0 dnat_done=0 proxy_flags=0 icap_id=0 app_filter_id=0 app_category_id=0 app_id=0 category_id=0 bandwidth_id=0 up_classid=0 dn_classid=0 source_nat_id=0 cluster_node=0 inmark=0x0 nfqueue=101 scanflags=0 gateway_offset=0 max_session_bytes=0 drop_fix=0 ctflags=33554472 connid=2341170016 masterid=0 status=398 state=3 sent_pkts=N/A recv_pkts=N/A sent_bytes=N/A recv_bytes=N/A tran_src_ip=N/A tran_src_port=N/A tran_dst_ip=N/A tran_dst_port=N/A
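
For anyone comparing several of these drop entries, the log line is just space-separated key=value pairs, so it can be split into a dictionary and filtered down to the fields of interest. This is only a rough sketch: `parse_log_line` is a name I made up, and the sample string is an abbreviated copy of the line above, not the full entry.

```python
def parse_log_line(line):
    """Split a space-separated key=value log line into a dict of strings."""
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:  # ignore any token that has no '='
            fields[key] = value
    return fields


# Abbreviated copy of the drop log entry (fields taken verbatim from the post).
sample = (
    "Date=2017-05-23 Time=08:58:14 log_id=0101021 log_type=Firewall "
    "log_component=Firewall_Rule log_subtype=Denied in_dev=Lag.90 out_dev=Lag.10 "
    "source_ip=10.10.90.2 dest_ip=10.10.10.112 source_port=8779 dest_port=43470 "
    "fw_rule_id=0 connid=2341170016 state=3"
)

fields = parse_log_line(sample)

# Pull out the handful of fields that are most useful when comparing drops.
keys_of_interest = ("Time", "log_subtype", "in_dev", "out_dev", "fw_rule_id", "connid")
summary = {k: fields[k] for k in keys_of_interest}
print(summary)
```

Running the same function over both drop entries and comparing `connid` and `fw_rule_id` would show whether the two drops belong to the same connection and whether any rule was matched at all.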

Then the same again exactly 2 minutes later (even the checksum is the same).
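
To compare the two captures field by field rather than by eye, the first 16 bytes of the hex dump can be decoded as the start of an IPv4 header. A minimal sketch, assuming the dump begins at the IP header (the leading `45` suggests it does); `decode_ipv4_prefix` is a made-up helper name, and the hex string is the unredacted first row from the capture above.

```python
import socket
import struct


def decode_ipv4_prefix(hex_words):
    """Decode the first 16 bytes of an IPv4 header given as hex text."""
    raw = bytes.fromhex(hex_words.replace(" ", ""))
    (ver_ihl, _tos, total_len, ident, flags_frag,
     ttl, proto, hdr_csum, src) = struct.unpack("!BBHHHBBH4s", raw[:16])
    return {
        "version": ver_ihl >> 4,
        "header_bytes": (ver_ihl & 0x0F) * 4,
        "total_length": total_len,
        "id": ident,
        "dont_fragment": bool(flags_frag & 0x4000),
        "ttl": ttl,
        "protocol": proto,  # 6 is TCP
        "header_checksum": hex(hdr_csum),
        "src": socket.inet_ntoa(src),
    }


# First row of the hex dump from the capture.
hdr = decode_ipv4_prefix("4500 00ec 18b4 4000 3f06 a9d2 0a0a 5a02")
print(hdr)

# Total length minus the IP header and the 196-byte payload reported by the
# capture leaves the size of the TCP header.
tcp_header_bytes = hdr["total_length"] - hdr["header_bytes"] - 196
print(tcp_header_bytes)
```

An identical sequence range and TCP checksum in both captures would be consistent with the sender retransmitting the same segment. Comparing the `id` field between the two captures may help confirm that, since a retransmission is typically sent in a new IP packet, though that is an assumption about the sender's stack rather than something the log shows.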

The connection came good another minute later.

Any idea where to look next?

thanks

James

  • In reply to MichaelBolton:

    I am so glad to stumble upon this thread.  You are describing my issue to a T!  My users were ready to throw IT out the window and were demanding we get rid of the XG firewall.  We also had days where it worked perfectly such as two days this past week.

    I have also disabled it.

    I am severely frustrated with support.  The reasons I have gotten are: not enough bandwidth, cable bad, switch bad, network issues and just flat out it isn't our problem but yours.

    Thank you gentlemen!

  • In reply to MichaelBolton:

    FWIW, firewall acceleration is enabled on the AP HA setup I've been working with:

    console> system firewall-acceleration show
    Firewall Acceleration is Enabled.

  • My Installation Engineer is telling me that firmware version SFOS 16.05.5 MR5 will fix this issue.  Have any of you installed it?

     

    April

  • In reply to April Beachy:

    Hi April,

    The two bugs associated with my case, NC-19062 and NC-19219, are not listed as fixed in the release notes. The engineer on my case did not tell me MR5 has the fix either. I have not installed MR5 yet to confirm either way, though. I will probably wait until next week just to make sure other bugs don't pop up.

    Mike

  • I can verify that the update SFOS 16.05.5 MR-5 fixed our connection dropping issue.

     

    Now on to fixing the numerous other issues associated with STAS, the particular brand of user authentication that the XG firewall uses. My users call the side effect the white screen of death.

  • In reply to April Beachy:

    Just saw this thread and read through.  Had a customer with the same issues.  HA setup, LAG with VLANs, drops every few minutes, etc.

    The strange drops were fixed for us by updating STAS (we were still running a v15 variant) and SFOS to v16.05.5 MR5.

    Like April noted, we have some other funky issues but nothing as maddening as the drops.

  • In reply to axsom1:

    DHCP relay issues have been reported with the newer firmware versions, so I haven't been game to update yet.

  • In reply to jamesharper:

    I've had the same issues.  Updated yesterday to v17 and DHCP relay stopped functioning.  Support had to manually configure the relays through the console to get it back up.

    On the drops, I currently have STAS disabled, as it seemed to wreak havoc on the network.  No user rules created, yet every hour or so, it seemed to unauthenticate the users for about a minute, even after increasing the interval.  Seems pretty useless if it's going to unauthenticate the user prior to checking if the user is disconnected.  I feel it should keep the authentication, poll the user and act after the poll.

    Not to mention the DCOM error messages in the event logs of every system, firewalled or not.  I'll be re-enabling STAS today to see if anything changed after the update.