Consider this a blog post of sorts, serving both as notes to myself and for people who wonder about this: I am currently working on integrating more compression options. Since NDI and Teleport already give us two lossy options, I prioritized lossless with Beam to fill that gap; however, there will also be a lossy JPEG option in the next release, and maybe others in future releases.
I am sure that for most people the important point is that the sender, which is usually the gaming PC, has the smallest possible performance impact. When the whole point is to offload the resource usage of stream/recording encoding to a secondary machine, then obviously the resource usage of sending the feed there should be as small as possible; otherwise it would eat up all the benefits of such a setup. This is why I always look at resource usage first when trying new compression methods.
Unlike what people seem to think, I am not really knowledgeable in this area. I only started looking into this whole topic when I began working on Beam 7 weeks or so ago, and since then I have just experimented a lot to see what works and what doesn't, like a kid with a new toy. My preliminary impression is that QOI already gets me quite close to the optimum for lossless image compression while keeping CPU usage low at the same time.
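For anyone wondering why QOI manages to stay so cheap on the CPU: it is a single-pass format built from a handful of byte-level operations, mainly runs of identical pixels, a small hash-indexed cache of recently seen colors, and small per-channel diffs. A much-simplified sketch of the idea in Python (not the actual QOI spec and not Beam's C# implementation; real QOI also has diff/luma ops, alpha handling, and a proper byte encoding) might look like this:

```python
def encode(pixels):
    """Simplified QOI-style encoder: emits (op, payload) tuples instead of bytes.

    pixels: list of (r, g, b) tuples.
    Only three ops are sketched here: RUN (repeat previous pixel),
    INDEX (hit in the 64-entry rolling color cache), and raw RGB.
    """
    index = [(0, 0, 0)] * 64  # hash-indexed cache of recently seen colors
    out = []
    prev = (0, 0, 0)
    run = 0
    for px in pixels:
        if px == prev:
            run += 1
            if run == 62:  # QOI caps run length at 62
                out.append(("RUN", run))
                run = 0
            continue
        if run:
            out.append(("RUN", run))
            run = 0
        r, g, b = px
        h = (r * 3 + g * 5 + b * 7) % 64  # QOI's color hash (RGB-only variant)
        if index[h] == px:
            out.append(("INDEX", h))  # 1 byte in the real format
        else:
            index[h] = px
            out.append(("RGB", px))   # 4 bytes in the real format
        prev = px
    if run:
        out.append(("RUN", run))
    return out
```

Every operation is a couple of comparisons and one small hash, with no entropy coding stage at all, which is why it can keep up with a live video feed where heavier formats cannot.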
I recently tried WebP, and while it peaked at 170 Mbps bandwidth usage compared to QOI's 270 Mbps in my test video loop, the CPU usage was so insanely high that I couldn't even measure it properly; it just couldn't keep up and dropped frames all the time. The use case for WebP would probably be "don't care about CPU usage, but need binary lossless and the lowest possible bandwidth", but I don't see where that would be sensible in practice. Therefore WebP will not make it into a release; feel free to try to change my mind by showing a scenario where this could actually make sense.
The next thing I tried was lossless JPEG, using the same libjpeg-turbo library that Teleport uses, but in the newer 3.0 beta version, which introduces support for lossless compression. The CPU usage is better than with WebP, albeit still too high to be useful, and the bandwidth usage is even worse than QOI's, peaking at a little over 300 Mbps in the test loop, so it loses against QOI in all relevant aspects.
The only reason it will probably still be in future releases is that I want to offer lossy JPEG anyway (for BGRA on Windows it's already implemented), and offering lossless JPEG in addition is literally just one more checkbox in the settings and a handful of extra code lines; it's more effort to remove it again than to just keep it as it is now. I might reconsider if people start using it the wrong way or with wrong expectations and keep annoying me about it.
Also, QOI is known to compress technical/artificial content like my gaming test loop especially well, so my comparison might be totally unfair; there might be content where lossless JPEG produces much better results.
Here's how the settings window currently looks:
JPEG is mutually exclusive with the other options: selecting it will force-uncheck the other options and vice versa, since it just doesn't make sense to combine them.
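The force-uncheck rule can be sketched like this (hypothetical function and option names; Beam's actual settings code is C#, this only illustrates the described behavior):

```python
def apply_selection(settings, option, checked):
    """Enforce that JPEG cannot be combined with the other compression options.

    settings: dict mapping option name -> checked state, e.g. {"jpeg": False, "qoi": True}.
    Returns a new settings dict with the mutual exclusion applied.
    """
    settings = dict(settings, **{option: checked})
    if checked:
        if option == "jpeg":
            # checking JPEG force-unchecks every other option
            for other in settings:
                if other != "jpeg":
                    settings[other] = False
        else:
            # checking any other option force-unchecks JPEG
            settings["jpeg"] = False
    return settings
```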
The "QOI compression level" logic has a breaking change in the next release. Currently, level 1 skips compression for 50% of frames (and you can't skip more than that). With the new logic, level 5 skips 50%, level 1 compresses only 10% of frames, and level 10 stays the same and compresses all of them. This gives an even wider range of control over the CPU vs. bandwidth usage trade-off; e.g., if you can almost do raw but just lack that tiny bit of extra bandwidth, level 1 might solve that now.
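One way to sketch the new level mapping (my own illustration of the described behavior, not Beam's actual code): the level directly determines the fraction of frames that get compressed, spread evenly across the stream instead of in bursts.

```python
def should_compress(frame_index: int, level: int) -> bool:
    """New logic: level N compresses N out of every 10 frames, evenly spread.

    Level 10 compresses every frame, level 5 every 2nd frame (50%),
    level 1 only 1 in 10 frames (10%); skipped frames are sent raw.
    """
    return (frame_index * level) % 10 < level
```

With the old logic the lowest level could still only skip 50% of frames; extending the range down to 10% compressed is what makes the "almost raw, just shave off a bit of bandwidth" scenario possible.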
This makes sense for QOI because, whether compressed or not, all of the frames arriving at the receiver will ultimately be exactly the same as the original frames from the sender, meaning there is no way to tell the difference between frames that were originally sent compressed and raw frames.
However, with all of this being the toy for the kid that I am, I did something funny and also left this logic in place for lossy JPEG. It's an entirely different story there, because it means that, e.g., at level 5 half of the frames will show visual degradation from the lossy compression while every 2nd frame retains the original quality. At 30 fps this means the stream alternates between full and lowered quality from frame to frame, 15 times per second. My assumption would be that there is a 95% chance that this doesn't make sense for configuring the CPU/bandwidth usage trade-off compared to just using the JPEG quality option, but I still wanted to play with it just for fun, to try and see whether it's actually visible to the human eye, especially combined with lower JPEG quality, and how the human eye interprets it. I also wonder what this means for further processing of the stream, e.g. for x264/x265 encoding. Do the encoders perform badly with this, because it creates a difference between two frames that otherwise wouldn't even have been that different, or does it maybe not matter too much, since these encoders also apply lossy logic, so the two frames end up very similar anyway? I guess I will find out.
Looking at other lossless compression options, there are indeed still a few more interesting candidates. First and foremost there is QOIR, an enhanced version of QOI that is better in all aspects, producing a better compression ratio with lower CPU usage, at least in theory. Swapping QOI for it would be an obvious choice. Unfortunately, it's also a lot more complex than QOI, so a pure managed C# implementation like the one I have for QOI right now doesn't seem feasible; I would probably have to include it as a separate native library, making it harder to use cross-platform.
The QOIR author has also produced some benchmarks that are interesting for me here, because from that table fpng(e) and LZ4PNG could provide a bit more throughput while sacrificing some compression ratio for it, so I definitely want to have a look at those.
Losslessly compressing on the GPU (e.g. with nvCOMP) might also be interesting, but I have absolutely zero experience with GPU coding, so I don't know whether I really want to go down that route or what to expect from it. I could also try to utilize NVENC (H.264/H.265/AV1) for a lossy GPU-based option.
I have yet to decide whether, after finishing all these experiments, only a few algorithms will survive (there's still always the compression level option to adapt them to specific needs) or whether I keep this many of them. It will depend on whether I get the impression that they perform very differently based on the content they are applied to or whether that doesn't matter much. My favorite option would be to eventually keep only one lossy and one lossless compression (The Best™ of each for the most common scenarios), because that also means less maintenance work in the future.
Another thing I have been working on, and continue to work on right now, is enhanced frame buffering/sorting logic. My current code is already a bit improved compared to 0.6.0, which can still produce frames in the wrong order when not synced to the OBS render thread, causing issues in OBS.
And of course I also want to implement a filter solution that can be applied to a single source. I realize this is probably something many people want more than all the other things mentioned here, but for me it makes sense to do it later, to save me from having to do things twice for the output and the filter.