Strange drops

We have a customer whose phone switchboard application periodically freezes, either at the application level (nothing can be clicked) or by simply not showing incoming calls. In both cases it sometimes unfreezes, and then all the calls that came in during the freeze suddenly flash on the screen. We've ruled out AV as the cause and are now looking at whether the problem is at the network layer.

drop-packet-capture shows this at the time of freezing:

2017-05-23 08:58:14 0101021 IP 10.10.90.2.8779 > 10.10.10.112.43470 : proto TCP: P 3007061919:3007062115(196) win 330 checksum : 55314
0x0000:  4500 00ec 18b4 4000 3f06 a9d2 0a0a 5a02  E.....@.?.....Z.
0x0010:  <remainder of the packet redacted>
Date=2017-05-23 Time=08:58:14 log_id=0101021 log_type=Firewall log_component=Firewall_Rule log_subtype=Denied log_status=N/A log_priority=Alert duration=N/A in_dev=Lag.90 out_dev=Lag.10 inzone_id=1 outzone_id=8 source_mac=00:1a:e8:8b:15:b4 dest_mac=00:e0:20:11:08:fc l3_protocol=IP source_ip=10.10.90.2 dest_ip=10.10.10.112 l4_protocol=TCP source_port=8779 dest_port=43470 fw_rule_id=0 policytype=1 live_userid=0 userid=0 user_gp=0 ips_id=0 sslvpn_id=0 web_filter_id=0 hotspot_id=0 hotspotuser_id=0 hb_src=0 hb_dst=0 dnat_done=0 proxy_flags=0 icap_id=0 app_filter_id=0 app_category_id=0 app_id=0 category_id=0 bandwidth_id=0 up_classid=0 dn_classid=0 source_nat_id=0 cluster_node=0 inmark=0x0 nfqueue=101 scanflags=0 gateway_offset=0 max_session_bytes=0 drop_fix=0 ctflags=33554472 connid=2341170016 masterid=0 status=398 state=3 sent_pkts=N/A recv_pkts=N/A sent_bytes=N/A recv_bytes=N/A tran_src_ip=N/A tran_src_port=N/A tran_dst_ip=N/A tran_dst_port=N/A

Then the same again exactly 2 minutes later (even the checksum is the same).

The connection came good another minute later.
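
To compare the two drop entries field by field rather than by eye, a small parser helps. This is only a sketch: the function names are my own, the first line is abbreviated to a handful of fields from the entry above, and the second line is reconstructed on the assumption that only the time changed (08:58:14 plus two minutes).

```python
def parse_drop_log(line: str) -> dict:
    """Split a 'key=value key=value ...' log line into a dict of strings."""
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens that are not key=value pairs
            fields[key] = value
    return fields


def differing_fields(a: dict, b: dict) -> dict:
    """Return {key: (value_in_a, value_in_b)} for every field that differs."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Abbreviated copy of the entry above (only some of its fields).
first = parse_drop_log(
    "Date=2017-05-23 Time=08:58:14 log_subtype=Denied in_dev=Lag.90 "
    "out_dev=Lag.10 source_port=8779 dest_port=43470 fw_rule_id=0 "
    "connid=2341170016 status=398 state=3"
)
# Assumed second entry: identical apart from the time, two minutes later.
second = parse_drop_log(
    "Date=2017-05-23 Time=09:00:14 log_subtype=Denied in_dev=Lag.90 "
    "out_dev=Lag.10 source_port=8779 dest_port=43470 fw_rule_id=0 "
    "connid=2341170016 status=398 state=3"
)

print(differing_fields(first, second))
```

If the only field that differs is the time, the second drop is most likely the same segment being retransmitted on the same connection, not a new connection being refused.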

Any idea where to look next?

thanks

James

  • In reply to MichaelBolton:

    I am so glad I stumbled upon this thread. You are describing my issue to a T! My users were ready to throw IT out the window and were demanding we get rid of the XG firewall. We also had days where it worked perfectly, such as two days this past week.

    I have also disabled it.

    I am severely frustrated with support.  The reasons I have gotten are: not enough bandwidth, cable bad, switch bad, network issues and just flat out it isn't our problem but yours.

    Thank you gentlemen!

  • In reply to MichaelBolton:

    FWIW, firewall acceleration is enabled on the AP HA setup I've been working with:

    console> system firewall-acceleration show
    Firewall Acceleration is Enabled.

  • My Installation Engineer is telling me that firmware version SFOS 16.05.5 MR5 will fix this issue.  Have any of you installed it?

    April

  • In reply to April Beachy:

    Hi April,

    The 2 bugs associated with my case, NC-19062 and NC-19219, are not listed as fixed in the release notes. The engineer on my case did not tell me MR5 has the fix either. I have not installed MR5 yet to confirm either way. I will probably wait until next week, just to make sure other bugs don't pop up.

    Mike

  • I can verify that the update SFOS 16.05.5 MR-5 fixed our connection dropping issue.

    Now on to fixing the numerous other issues associated with the particular brand of user authentication that the XG firewall uses, STAS. My users call the side effect the white screen of death.

  • In reply to April Beachy:

    Just saw this thread and read through it. Had a customer with the same issues: HA setup, LAG with VLANs, drops every few minutes, etc.

    The strange drops were fixed for us by updating STAS (we were still running a v15 variant) and SFOS to v16.05.5 MR5.

    Like April noted, we have some other funky issues but nothing as maddening as the drops.

  • In reply to axsom1:

    DHCP relay issues have been reported with the newer firmware versions, so I haven't been game to update yet.

  • In reply to jamesharper:

    I've had the same issues.  Updated yesterday to v17 and DHCP relay stopped functioning.  Support had to manually config the relays through the console to get it back up.

    On the drops, I currently have STAS disabled, as it seemed to wreak havoc on the network.  No user rules created, yet every hour or so, it seemed to unauthenticate the users for about a minute, even after increasing the interval.  Seems pretty useless if it's going to unauthenticate the user prior to checking if the user is disconnected.  I feel it should keep the authentication, poll the user and act after the poll.

    Not to mention the DCOM error messages on the event logs of every system, firewalled or not.  I'll be reenabling STAS today to see if anything changed after the update.

  • Hi,

    I've just found this thread, as I'm suffering the exact same issue with STAS on our XG310 running the latest 17.1 firmware. So it seems the STAS issue still exists and hasn't been fixed.

    Like others here, I have no user-based rules but have been getting strange connection dropouts, and turning STAS off fixes them.

    However, we had STAS switched on so that the XG could identify users and help attribute traffic to them, and without STAS turned on the UTQ doesn't appear to report anything.

    Is there another option I can switch on to identify users and get the UTQ working, without causing the dropouts?

    I also have a support ticket open, #8082675. It has been open since a few days after we purchased the device, and nobody in support has seemed able to help or to know about the issue!

    Matthew

  • In reply to M Robinson:

    Hi Matthew,

    The issue still exists, although on our unit it has not been as bad since we set the system auth cta unauth-traffic drop-period to 0. Even so, some of our traffic goes through a layer 3 switch for routing, to keep this from causing issues with inter-VLAN routing. From what product management tells me, 17.2 will have an option to turn off the learning period completely, which will prevent this from happening. I was originally told maybe July for 17.2, but with 17.1 being so delayed, I bet we won't see 17.2 until October.

    Mike

  • In reply to MichaelBolton:

    Hi,

    Short question (sorry, I am not able to read everything): which features do you need authentication for, if you are not using user-based policies?

    I am aware of such an issue, but most of the time I am able to disable STAS for the XG in question because, as mentioned before, no authentication is needed.

    Cheers

  • In reply to MichaelBolton:

    Not sure if people are still seeing this. We sure are. After a whole lot of digging, I found that when I used the troubleshooting tools on the Advanced tab of the STAS suite application to test the IPs of users seeing problems/drops, I would get an error: either "RPC server unavailable" or a similar notification. After a crash course in WMI, I found that the PTR records on our DNS servers were not updating, and that was causing trouble with the WMI verification. I am in the process of correcting and updating our PTR records in DNS and turning STAS back on at our smaller sites to test things out. I am not sure yet whether this is the fix for us, but I wanted to put it out there in case it helps others.
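
    For anyone wanting to check their own PTR records in bulk, here is a rough sketch of the kind of check I mean. It is not part of STAS; the function name and message wording are made up for illustration. It does a reverse lookup on an IP and then a forward lookup on the returned name, and flags the record as stale when the name no longer resolves back to the same IP. The resolver functions are parameters so the logic can be tried without touching real DNS.

```python
import socket


def check_ptr(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Report whether the PTR record for ip agrees with forward DNS.

    reverse and forward default to the standard library resolvers; both
    return a (hostname, aliases, addresses) tuple.
    """
    try:
        hostname, _aliases, _addrs = reverse(ip)
    except OSError as exc:  # socket.herror and socket.gaierror are OSError subclasses
        return f"{ip}: no PTR record ({exc})"
    try:
        _name, _aliases, addresses = forward(hostname)
    except OSError as exc:
        return f"{ip}: PTR -> {hostname}, but forward lookup failed ({exc})"
    if ip in addresses:
        return f"{ip}: OK ({hostname})"
    return f"{ip}: STALE, PTR -> {hostname}, which resolves to {addresses}"
```

    Run from a machine that uses the same DNS servers as the STAS collector, something like `check_ptr("10.10.10.112")` for each affected workstation should show quickly which records need fixing. A stale result means the name in the PTR record now belongs to a different address.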