
QOS recommendations to combat bufferbloat

I have a home network with 3 VLANs, wired into an Atom-based appliance running Sophos XG Home. The traffic on the network is a mixture of IoT, Windows 10, Server 2022 and such, plus Netflix, Amazon Prime, etc. for family internet usage.

The connection is a Virgin 100/10 cable connection. Are there general QoS recommendations for applying to rules, etc.? Bufferbloat is a problem on the connection, but traffic shaping rules haven't been enabled yet.

Speed isn't the issue; it's latency.



This thread was automatically locked due to age.
  • The documentation isn't as clear as it could be and I made some observations based on my experience in another thread that you might find helpful: https://community.sophos.com/sophos-xg-firewall/f/discussions/133871/traffic-shaping-q-about-total-bandwidth-etc

    It's also not clear to me that QoS on your firewall can mitigate download bufferbloat very well, depending on your ISP and the task at hand. With IPv4 there are only indirect mechanisms for throttling the far end, which don't work under all circumstances. Also, you may be trading off maximum download speed for lower/more-consistent latency. In my case, that's a bad tradeoff. If I were in a household with 4 people streaming high-def video while I'm trying to play a high-twitch game, it would matter. But it would also slow me down during the day doing large uploads/downloads -- as far as my experiments indicate.

    Nonetheless, I have set things up using clientless users, as Prism recommended above -- and even if you're not doing QoS, it's the way to organize lots of stuff in SFOS -- and in my case giving priority to certain streaming clients and applications, and I get an A on the bufferbloat test. It's much higher (in +ms) than Prism's screenshot above, but I get checkmarks on everything except low-latency gaming (which I don't do).

  • Thank you for this information. Cake, in particular, tries to make this simple, but since most networks are very asymmetric, it really helps to be able to control the download and upload separately. Usually it's the upload that's the biggest problem. If you can't do those things separately, you are not going to see much benefit.

    cake essentially only needs a "bandwidth" parameter to make things work better (up and down). It does per-host fair queuing, which makes detailed rules less necessary, and diffserv (or other) forms of prioritization for the few other things that might need it. It is a successor to the htb+fq_codel-based sqm-scripts.
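    For anyone curious what "per-host fair queuing" buys you, here's a toy sketch (my own illustration, not cake's actual code) of deficit round robin over per-host queues -- each host gets an equal byte budget per round, so a greedy downloader can't starve a small VoIP flow:

```python
from collections import deque

def fair_schedule(queues, quantum=1500):
    """Toy deficit round robin over per-host queues: each host gets an
    equal byte budget (quantum) per round, so one greedy host cannot
    starve the others. A loose sketch of per-host fairness, not real cake."""
    deficits = {host: 0 for host in queues}
    sent = []
    while any(queues.values()):
        for host, q in queues.items():
            if not q:
                continue
            deficits[host] += quantum
            # Send packets while this host still has byte budget left.
            while q and q[0] <= deficits[host]:
                pkt = q.popleft()
                deficits[host] -= pkt
                sent.append((host, pkt))
    return sent

# One host with big bulk packets, one with small VoIP packets:
hosts = {"laptop": deque([1500] * 4), "voip": deque([200] * 4)}
order = fair_schedule(hosts)
# The small voip packets drain early instead of waiting behind the bulk queue.
```

    With real cake this fairness falls out of the scheduler automatically once you set the bandwidth parameter, rather than needing hand-written rules.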

  • According to @Prism's post you can set up and down separately. htb+fq_codel is pretty good, but his report is sooooo good, it looks to me like it's cake. :) I am one of the authors of both subsystems. Anyway, some documentation on cake: https://arxiv.org/abs/1804.07617

  • You can set the limits in the policies separately. However, my experiments indicate that the overall bandwidth number you specify in the main settings is used, and it is a single number, not an up and a down. And if I set it low enough to get Prism-like bufferbloat, it seriously degrades my maximum download speed.

    This could be because I'm leaning on a default somewhere instead of specifying every possible thing explicitly, but it is what I found. As I said, I get an A on that test and also get max download speeds. In my use case it works.

    I'm also a bit skeptical of bufferbloat tests which use ping, since pings are often dropped or given lowest priority, so "bufferbloat" could simply be your system working properly and prioritizing actual traffic over pings. At least I wonder about that being the case.
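    One way around that ping skepticism is to time a TCP handshake instead of ICMP: the SYN/SYN-ACK exchange rides the same queues (and usually the same QoS treatment) as real traffic, so it can't be deprioritized the way ping often is. A minimal sketch -- the host and port in the comment are placeholders, nothing SFOS-specific:

```python
import socket
import time

def tcp_rtt_ms(host, port, timeout=2.0):
    """Estimate round-trip time by timing a TCP handshake instead of an
    ICMP ping. connect() returns once the SYN/SYN-ACK completes, so the
    elapsed time approximates one RTT through the real traffic path."""
    t0 = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        return (time.perf_counter() - t0) * 1000.0

# e.g. compare idle vs. loaded latency while a big download saturates the link:
# tcp_rtt_ms("example.com", 443)
```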

  • However, my experiments indicate that the overall bandwidth number that you specify in the main settings is used and it is a single number not an up and a down

    If you're talking about the main QoS settings, then don't use it: set it at the highest limit possible ("2560000") and only use custom QoS policies. Doing this fixed almost all the issues I've had with traffic shaping.


    If a post solves your question use the 'Verify Answer' button.

    Ryzen 5600U + I226-V (KVM) v20 GA @ Home

    XG 115w Rev.3 8GB RAM v19.5 MR3 @ Travel Firewall

  • It sounds to me like you turned off actual throttling by setting Total Available WAN Bandwidth so high that it never has an effect -- i.e. the queuing never actually makes any tradeoffs -- and then used lots of limits. My experiments seemed to indicate that having Enforce Guaranteed Bandwidth disabled results in no guaranteed bandwidth even when specified in a QoS policy somewhere else. But it's really hard to tell if guarantees are working -- unlike limits, which are easy to test.

    I have something more like:

    where 36000 is more realistically the bandwidth I actually have for download, plus a little more. (I should probably decrease it to what I actually have. But it really does seem to apply to combined upload/download, which is crazy.)

  • Wayne Folta said:
    i.e. the queuing never actually does any tradeoffs

    it does work as expected, but you need to rely on QoS policies to get what you want.

    Wayne Folta said:
    My experiments seemed to indicate that having Enforce Guaranteed Bandwidth disabled results in no guaranteed bandwidth even when specified in a QoS policy somewhere else.

    You need to use the guarantee option within the QoS policy, not on the main QoS setting, then be sure to check the priorities of each QoS policy, here's a snippet from the Docs.

    The default settings offer best-effort bandwidth *and place an upper limit on the QoS*. The default settings apply to traffic to which no traffic shaping policy applies.

    I've tested this multiple times, currently on my setup there's only four QoS policies, each one used for a certain clientless group.

    • The main one for WiFi devices such as laptops and phones. It has a priority (0) with guarantee bandwidth using 90% of my download/upload speeds separately.
    • A server that I have, mostly used for storage and VM/Containers. It has a priority (1) while limiting the bandwidth.
    • A policy for containers/services with priority (2) while limiting the bandwidth.
    • A policy for P2P (A torrent container with clientless auth) with priority (7) while limiting the bandwidth.

    All those policies work as expected, if I start a speedtest on my laptop while downloading something over P2P over the torrent container, I can see the traffic immediately being limited on the container and being guaranteed to my laptop.

    *From personal experience, you will have issues with guaranteed traffic shaping if you have dozens of QoS policies or multiple guarantee policies with the same priority, but I don't expect any home user to go that far.

    EDIT: If you can, please do more testing too with QoS policies, if you find anything weird or that isn't working as expected please post in the community.



  • It feels like your use case is what I was imagining: one large class of items with (shared) reserved bandwidth, and then all of your other QoS rules are limits and down-prioritizing for secondary chunks of items. So your use case doesn't really need a Default since your Default is laptops with reserved bandwidth. In my case I feel like there are competing high-priority classes and what I'm doing seems to work well -- though not perfectly.

    My use case is: VoIP should have highest priority and guaranteed (but low) bandwidth. Secondarily movie streaming and video conferencing apps should have high quality (movies more than video conferencing) but not unlimited bandwidth. Third, if neither of those two are operating, everything is general shared bandwidth and a single laptop should be able to use essentially 100% bandwidth for upload or download.

    So mentally I have a few classes:

    • Laptops. High priority, should be able to use 100% of bandwidth as long as it doesn't conflict with other high priority classes.
    • Streaming Video. High priority. An AppleTV, mainly, which should be able to use up to good-quality 4K video (down), but not crazy-high bitrates.
    • VoIP phones. Highest priority, reserved bandwidth, but it's not much compared to everything else.
    • Video Conference apps. High priority. Could run on laptops, phones, tablets. Should have reserved bandwidth and also limits so it's reliable, but no need for anything more than HD for conferencing.
    • Maybe "everything else", whatever that is. Nothing is particularly low-priority or low-bandwidth.

    Which seems different from your use case. But maybe I just don't understand reserved bandwidth and priority. So maybe I can have the AppleTV be highest priority with, say, 20% of my bandwidth reserved, and my laptops have second-highest priority with 100% of my bandwidth reserved; and if I'm downloading a 50GB file on my laptop but the AppleTV isn't streaming a movie, the laptop gets as close to 100% as it can? But if the AppleTV is streaming a movie that truly does take 20% of my bandwidth and I download on the laptop, the laptop will at most get 80% of my bandwidth?

    That is, I'm "reserving" 110 (or maybe more like 150) percent of my actual bandwidth but prioritization means that not everyone gets their reservation? That might be what I'm missing here, and I've been assuming I need a Default to make it happen.
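    That "reserve more than 100%, let priority break ties" model can be sanity-checked with a toy allocator. To be clear, this is only my reading of how priority and guarantees might interact -- not SFOS's documented algorithm -- using the hypothetical 20%/100% numbers above:

```python
def allocate(capacity, classes):
    """Toy strict-priority reservation: walk classes from highest priority
    (lowest number), give each min(reservation, demand, remaining capacity),
    then hand leftovers back out in priority order. A sketch of one possible
    semantics, not SFOS's documented behavior.
    classes: list of (name, priority, reserved, demand) tuples."""
    remaining = capacity
    alloc = {}
    ordered = sorted(classes, key=lambda c: c[1])
    for name, prio, reserved, demand in ordered:
        give = min(reserved, demand, remaining)
        alloc[name] = give
        remaining -= give
    for name, prio, reserved, demand in ordered:
        extra = min(demand - alloc[name], remaining)
        alloc[name] += extra
        remaining -= extra
    return alloc

# 100 Mbit link; AppleTV reserves 20 at priority 0, laptop "reserves" 100 at priority 1.
both = allocate(100, [("appletv", 0, 20, 20), ("laptop", 1, 100, 100)])
idle = allocate(100, [("appletv", 0, 20, 0), ("laptop", 1, 100, 100)])
```

    Under those (assumed) semantics the laptop gets 80 while the AppleTV is streaming and the full 100 when it's idle, which is exactly the behavior described above -- over-reserving is fine because priority decides who actually collects.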

    Also, I think this depends on some fancy footwork with User Groups. For example, I'd probably need to break my current "Clientless Open Group" into three parts: a) "Clientless Streaming Group", the AppleTV; b) "Clientless High Bandwidth Group", the laptops; and c) "Clientless Other Group". That way I could reserve 100% bandwidth for the "Clientless High Bandwidth Group" at a lower priority than the 20% reserved for the "Clientless Streaming Group". Then, as Prism mentioned, application-based rules are higher priority than these user (group) based rules, so video conferencing apps on tablets and phones should work too.

    I think I'll try this.

  • Nearly everything you just described is exactly how cake diffserv4 operates by default. https://man7.org/linux/man-pages/man8/tc-cake.8.html :)
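    For reference, here is my reading of the diffserv4 tin layout from that man page, written as a small table in code form. The thresholds and DSCP examples are from memory of the page, so double-check it before relying on the exact numbers:

```python
# Approximate cake diffserv4 tin layout, per my reading of the tc-cake
# man page (verify against the page itself before relying on it).
# Each tin keeps its priority boost up to a threshold fraction of the
# shaped rate, then competes as best-effort, so "reservations" degrade
# gracefully instead of hard-starving other traffic.
DIFFSERV4_TINS = {
    "Bulk":        {"dscp_examples": "CS1",          "threshold": 1 / 16},
    "Best Effort": {"dscp_examples": "CS0/default",  "threshold": 1.0},
    "Video":       {"dscp_examples": "AF4x, CS3...", "threshold": 1 / 2},
    "Voice":       {"dscp_examples": "EF, VA, CS6",  "threshold": 1 / 4},
}

def tin_priority_share(tin, rate_mbit):
    """Bandwidth (in Mbit) up to which a tin keeps its priority boost."""
    return DIFFSERV4_TINS[tin]["threshold"] * rate_mbit
```

    So on a 100 Mbit shaped link, Voice keeps priority up to ~25 Mbit and Bulk only up to ~6 Mbit -- roughly the VoIP-first, bulk-last shape described in the post above, with no per-class rules written at all.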

  • Is cake used by SFOS currently? With the approach I'm currently using (one user-based QoS rule, two rule-based, two application-based, plus the Default) I have bufferbloat that's not good enough for high-twitch online games but is good enough for anything else, and it doesn't appear to interfere with full-bore laptop domination whenever there's nothing else that requires the bandwidth.

    Though it's extremely difficult to test guarantees, as opposed to limits which are obvious with just a simple speed test.

  • I have no idea. Cake went into the Linux kernel 4 years ago and was available out of tree for 8. It is certainly possible to construct all the rules you want with it, but unless those rules have the same kind of quality fq throughout that cake does, it's not as possible to get where you want to be with twitch games. The other Sophos product just did fq_codel. It would be nice to know what lies underneath...

  • It is very hard, using just web tools, to measure what's going on. We have a professional tool called flent (flent.org) that can take very fine-grained measurements, which, when accompanied by packet captures (Wireshark) and tcptrace/xplot.org, can tell you loads about what's going on.

  • OK, I reworked everything to work, I think, the way @Prism does it. I am getting a lower maximum laptop download speed -- I think because Streaming Video's (priority 0) reserved bandwidth trumps High Bandwidth's (priority 1) reservation, and so 20 Mbps (Streaming Video's reserved download) is being left on the table. The other way didn't do this, but to be honest maybe what I thought was reserved for streaming video wasn't actually reserved. Or maybe priorities were equal (though done in a much different manner)?

    The good news is that I did lower loaded latency considerably. The bad news is that download loaded latency is +10 ms (was +29 ms), which is still not low enough for an A+, but it's quite nice. Thanks @Prism!