Experimental zero-copy screen capture on Linux

w23

New Member
I was having performance issues when streaming livecoding sessions with heavy GLSL raymarching and pathtracing shaders, so instead of optimizing my shaders I read about libdrm, KMS, DMA-BUF and EGL. As a result, I made a very experimental zero-copy screen capture OBS plugin for Linux based on DMA-BUF fds and EGLImages.
It does solve most of my performance woes, and you can see this screen capture method in action here: https://youtu.be/L2Y7_vBfWm8 or here https://www.twitch.tv/videos/389112056
Notice how even for heavy shaders it still maintains real-time fps. Vanilla OBS with XSHM struggles to capture this, barely keeping ~10 fps, even though the shader itself can run at 40-60 fps.

Background
There's this DRM (Direct Rendering Manager) infrastructure that is used to talk to GPUs in a vendor-agnostic way on Linux. Among other things it incorporates:
  • Kernel modesetting (KMS), a technology to enumerate and control video outputs, their video modes and framebuffers
  • DMA-BUF objects, basically handles to GPU-side memory such as framebuffers and textures, exposed to userspace as file descriptors.
Then there's the EGL interface that can be used to manage OpenGL contexts and resources. It is analogous to GLX and WGL, but is newer, more portable, and has more extensions.
Among these extensions we can find:
  • EGL_EXT_image_dma_buf_import, which allows creating EGLImage objects bound to an existing DMA-BUF object (using its fd)
  • GL_OES_EGL_image, which allows binding GL textures to EGLImage objects. Note that while it is technically an OpenGL ES extension, it is exposed in Mesa's implementation of desktop OpenGL, and works there just fine with the open-source amdgpu and intel drivers.

I bet you can see that all of these fit together perfectly. Moreover, this can be used to capture not only X11, but everything else that is backed by KMS, including Wayland and even bare terminals!
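
Roughly, the whole chain fits together like this. This is only a minimal sketch with error handling omitted; the XRGB8888 fourcc is my assumption based on the bpp=32/depth=24 framebuffers shown later, and the extension entry points are resolved at runtime as the specs require:
Code:
/* Sketch: KMS framebuffer -> DMA-BUF fd -> EGLImage -> GL texture.
 * Assumes an EGL context is current and the extensions above are present. */
#include <stdint.h>
#include <xf86drm.h>
#include <xf86drmMode.h>
#include <drm_fourcc.h>
#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <GLES2/gl2.h>
#include <GLES2/gl2ext.h>

GLuint texture_from_framebuffer(int drm_fd, uint32_t fb_id, EGLDisplay dpy)
{
    /* extension entry points must be resolved at runtime */
    PFNEGLCREATEIMAGEKHRPROC create_image =
        (PFNEGLCREATEIMAGEKHRPROC)eglGetProcAddress("eglCreateImageKHR");
    PFNGLEGLIMAGETARGETTEXTURE2DOESPROC image_target_texture =
        (PFNGLEGLIMAGETARGETTEXTURE2DOESPROC)eglGetProcAddress(
            "glEGLImageTargetTexture2DOES");

    /* KMS: look up the framebuffer; this is the CAP_SYS_ADMIN part */
    drmModeFBPtr fb = drmModeGetFB(drm_fd, fb_id);

    /* DMA-BUF: export the framebuffer's GEM handle as a file descriptor */
    int dmabuf_fd = -1;
    drmPrimeHandleToFD(drm_fd, fb->handle, DRM_CLOEXEC, &dmabuf_fd);

    /* EGL_EXT_image_dma_buf_import: wrap the fd in an EGLImage */
    const EGLint attrs[] = {
        EGL_WIDTH, (EGLint)fb->width,
        EGL_HEIGHT, (EGLint)fb->height,
        EGL_LINUX_DRM_FOURCC_EXT, DRM_FORMAT_XRGB8888, /* assumption */
        EGL_DMA_BUF_PLANE0_FD_EXT, dmabuf_fd,
        EGL_DMA_BUF_PLANE0_OFFSET_EXT, 0,
        EGL_DMA_BUF_PLANE0_PITCH_EXT, (EGLint)fb->pitch,
        EGL_NONE,
    };
    EGLImageKHR image = create_image(dpy, EGL_NO_CONTEXT,
                                     EGL_LINUX_DMA_BUF_EXT, NULL, attrs);

    /* GL_OES_EGL_image: bind the EGLImage to a GL texture */
    GLuint tex = 0;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    image_target_texture(GL_TEXTURE_2D, (GLeglImageOES)image);

    drmModeFreeFB(fb);
    return tex;
}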

One caveat though is that getting a framebuffer DMA-BUF fd requires CAP_SYS_ADMIN, and running OBS with what basically amounts to root privileges is a YOLO practice.
However, recall such things as UNIX sockets, the sendmsg() function and the SCM_RIGHTS ancillary message type. These can be used to transfer open file descriptors between processes. This also works on DMA-BUF fds, so we can build a small binary with `setcap cap_sys_admin+ep` whose only purpose is to get the framebuffer fd and transfer it to whoever asks.
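
The fd-passing part is plain POSIX and looks roughly like this (a sketch with no error handling; function name is illustrative):
Code:
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send one open fd (e.g. a DMA-BUF fd) over a connected UNIX socket. */
ssize_t send_fd(int sock, int fd)
{
    char dummy = '!'; /* must send at least one byte of real data */
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };

    char cmsg_buf[CMSG_SPACE(sizeof(int))];
    struct msghdr msg = {
        .msg_iov = &iov,
        .msg_iovlen = 1,
        .msg_control = cmsg_buf,
        .msg_controllen = sizeof(cmsg_buf),
    };

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS; /* kernel duplicates fd into the receiver */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0);
}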

Now we can make it work.

Overview
Necessary changes to OBS are:
  • Use EGL instead of GLX to initialize the OpenGL context. This is a change to the libobs-opengl library.
  • Create a new source plugin that creates a GL texture from an EGLImage backed by a DMA-BUF fd received over a UNIX socket.

You can find these changes in this branch: https://github.com/w23/obs-studio/tree/linux-libdrm-grab

A setuid/cap_sys_admin+ep utility is also needed to send the DMA-BUF fd.
It can be found in a separate repository: https://github.com/w23/drmtoy/tree/wip-drm-send-recv

How to

Now, as I only got it to work yesterday, this is highly experimental and painful to set up at the time of writing.

  1. It is assumed that you have all the necessary development libraries and tools installed on your system
  2. Get drmtoy's enum and drmsend tools
    Code:
    git clone https://github.com/w23/drmtoy.git
    cd drmtoy
    git checkout wip-drm-send-recv
    make enum drmsend
  3. Find the right framebuffer id for your screen
    Code:
    ./build/rel/enum
    It will output lots of lines. You're interested in the last few lines, which will look something like this (a libdrm sketch of this enumeration appears after these steps):
    Code:
    count_fbs = 2
           0: 0x56
               width=3200 height=1800 pitch=12800 bpp=32 depth=24 handle=0
           1: 0x55
               width=256 height=256 pitch=1024 bpp=32 depth=32 handle=0

    Here you can see that there's a framebuffer 0x56 sized 3200x1800. This looks like the main screen.
    0x55 is 256x256 and doesn't look like anything to me (probably the hardware cursor plane).
  4. (a) Run drmsend with elevated privileges. Replace 0x56 with your framebuffer id
    Code:
    sudo ./build/rel/drmsend 0x56 drmsend.sock &
    # chown the socket so that your user can access it
    sudo chown "$(whoami)" drmsend.sock

    (b) Alternatively you can set the right caps on drmsend and run it under a regular user. Note that your fs must not be mounted with the nosuid flag.
    Code:
    # chown first: changing the owner afterwards would clear file capabilities
    sudo chown root ./build/rel/drmsend
    sudo setcap cap_sys_admin+ep ./build/rel/drmsend
    ./build/rel/drmsend 0x56 drmsend.sock &
  5. Get patched OBS
    Code:
    git clone https://github.com/w23/obs-studio.git
    cd obs-studio
    git checkout linux-libdrm-grab
  6. Build patched OBS
    Code:
    mkdir build
    cd build
    cmake .. -DUNIX_STRUCTURE=0 -DUSE_EGL=1 -GNinja
    ninja
  7. Run it.
    Note that -DUSE_EGL=1 forcibly replaces GLX with EGL, and none of the GLX-dependent things are patched to support EGL. E.g. the XSHM and Xcomposite screen capture modules won't work, and the entire linux-capture plugin won't even load due to missing symbols.
    Code:
    cd rundir/RelWithDebInfo/bin/64bit/

    The -p flag is for portable mode, so it doesn't mess with your existing OBS configuration.
    If you're feeling adventurous (like I am), back up your OBS config and run it without the -p flag.
    Code:
    ./obs -p
  8. Now you can add "DMA-BUF source" to your scene as you would with any other regular source.
    In the configuration dialog you need to specify the drmsend socket:
    1. Click "Browse" and navigate to the directory with drmsend.sock, e.g. where you checked out drmtoy and ran build/rel/drmsend
    2. QFileDialog will be unhelpful enough to not show UNIX sockets, so you need to type the filename manually, e.g. drmsend.sock, and press Enter.
    3. Now you should have a preview of your framebuffer screen. Note that there's no cursor capture yet.
    4. PROFIT!
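
For the curious, the framebuffer discovery in step 3 boils down to walking KMS planes with libdrm. Here's a rough sketch (error handling omitted; enum's actual implementation may differ):
Code:
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

int main(void)
{
    int fd = open("/dev/dri/card0", O_RDWR);

    /* without this, cursor and overlay planes stay hidden */
    drmSetClientCap(fd, DRM_CLIENT_CAP_UNIVERSAL_PLANES, 1);

    drmModePlaneResPtr planes = drmModeGetPlaneResources(fd);
    for (uint32_t i = 0; i < planes->count_planes; i++) {
        drmModePlanePtr plane = drmModeGetPlane(fd, planes->planes[i]);
        if (plane->fb_id) {
            /* without DRM master or CAP_SYS_ADMIN, handle comes back as 0 */
            drmModeFBPtr fb = drmModeGetFB(fd, plane->fb_id);
            printf("0x%x: width=%u height=%u pitch=%u bpp=%u depth=%u handle=%u\n",
                   fb->fb_id, fb->width, fb->height,
                   fb->pitch, fb->bpp, fb->depth, fb->handle);
            drmModeFreeFB(fb);
        }
        drmModeFreePlane(plane);
    }
    drmModeFreePlaneResources(planes);
    return 0;
}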

Please test and enjoy!

Notes and questions
There are several things that I want to point out and discuss with experienced OBS developers.

  1. The current experimental implementation makes EGL vs GLX a compile-time choice via the -DUSE_EGL=1 cmake argument. It is not that hard to make a separate libobs-opengl-egl.so plugin and make choosing between it and the legacy GLX-based libobs-opengl.so a runtime choice based on user preference, like the DX vs GL choice on Windows.
    XSHM, Xcomposite and DMABUF sources will need to be able to detect GLX vs EGL at runtime.
    Is there a better way than `obs_get_video_info()` and then checking the `graphics_module` name? (See the sketch after this list.)
    There's also the issue of one of GLX or EGL not being available on some systems (e.g. on Wayland, or on older/weird X, respectively).
    I haven't looked at how libglad works, but it will likely require splitting libglad into libglad-gl, libglad-glx and libglad-egl. Also, missing symbols will need to be handled gracefully by the linux-capture plugin.
  2. The linux-dmabuf plugin is a temporary thing. I feel it needs to be integrated into linux-capture with xcursor support added. But see the note above about the dynamic EGL/GLX decision and symbols.
  3. The dmabuf_source module requires an EGLDisplay handle, which lives deep inside the libobs-opengl/EGL plugin. Is there a recommended way to get handles internal to some graphics implementation? I couldn't find one.
    Currently dmabuf_source includes graphics-internal.h, declares struct gl_platform itself (copied from gl-x11-egl.c), calls gs_get_context() and chases some pointers (see https://github.com/w23/obs-studio/b...c3cd0f0ba7/plugins/linux-dmabuf/dmabuf.c#L145). This is obviously not sustainable.
  4. Mode changes aren't handled. At the moment I have no idea how to do that, nor how the current implementation would behave in such a case.
  5. I believe that sampling this DMA-BUF-backed texture will read framebuffer memory directly. No synchronization is implemented. This may or may not be a problem.
  6. Obviously this scheme with manually running enum and drmsend (from another repo!) is not very user-friendly. However, making it user-friendly in the general case is hard.
    One feasible-looking approach is:
    • integrate drmsend into the obs repo
    • make drmsend perform framebuffer discovery
    • dmabuf_source would have a framebuffer picker in its config dialog
    • dmabuf_source would spawn drmsend with the right arguments itself

    Later we could make drmsend even smarter:
    • make it not framebuffer-centric, but CRTC (monitor)-centric
    • make it listen for mode change events
    • make dmabuf_source listen for drmsend events

    This would almost handle mode changes, but not quite. Xorg creates one huge framebuffer for multiple monitors, so mode changes would affect monitor positions and cropping coordinates. I don't think these rules can be expressed generically with OBS transformations.
  7. Privileged drmsend itself exposes too much:
    • CAP_SYS_ADMIN is too broad; we'd need to patch the kernel with something like CAP_DRM_CAPTURE as a more fine-grained capability
    • control over which users in the system can capture screens should be left in distro packagers' hands (they might, e.g., add a video-capture group or whatever).
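
For illustration, the crude runtime check mentioned in note 1 could look like this. It's only a sketch; the idea that the EGL variant would be distinguishable by module name (e.g. a hypothetical "libobs-opengl-egl") is my assumption:
Code:
#include <stdbool.h>
#include <string.h>
#include <obs-module.h>

/* Crude backend detection from the active graphics module name.
 * Assumes a split EGL module with a hypothetical "egl" in its name. */
static bool using_egl_backend(void)
{
    struct obs_video_info ovi;
    if (!obs_get_video_info(&ovi))
        return false;
    return strstr(ovi.graphics_module, "egl") != NULL;
}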
 

w23

New Member
I've committed a change that allows user selection between GLX and EGL. So now one needs to go to Settings -> Advanced -> Video and select "OpenGL EGL" in order to enable the DMABUF source.
linux-capture/XSHM works under both GLX and EGL.
linux-capture/Xcomposite crashes under EGL, so it is disabled in EGL mode for now.
 

c3r1c3

Member
Very interesting, and good talking to you in Discord. I'm excited to see where this goes and hopefully it will lead to a PR for OBS.

So is XSHM more performant when using EGL?
 

w23

New Member
Very interesting, and good talking to you in Discord. I'm excited to see where this goes and hopefully it will lead to a PR for OBS.
Thanks! It was also nice talking to you and other developers. I appreciate the help.

I'm slowly working towards making it a PR, and hope to be able to submit it by this weekend.

So is XSHM more performant when using EGL?
I haven't profiled that, but I'd expect XSHM under EGL to perform basically the same as under GLX. It's still essentially the same two steps: (1) instruct XCB/Xorg to copy image data from the GPU into RAM, (2) upload that data back into a GL texture on the GPU.
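
For context, that two-step path looks roughly like this (a sketch; the XCB connection and SHM segment setup are assumed to exist already, and the function name is illustrative):
Code:
#include <stdlib.h>
#include <xcb/xcb.h>
#include <xcb/shm.h>
#include <GL/gl.h>

/* One XSHM frame: the X server copies pixels into shared memory,
 * then we upload them right back to the GPU as a texture. */
void xshm_grab_frame(xcb_connection_t *c, xcb_window_t root,
                     xcb_shm_seg_t seg, const void *shm_data,
                     GLuint tex, uint16_t w, uint16_t h)
{
    /* (1) GPU -> RAM: Xorg writes the image into the SHM segment */
    xcb_shm_get_image_cookie_t cookie = xcb_shm_get_image(
        c, root, 0, 0, w, h, ~0u,
        XCB_IMAGE_FORMAT_Z_PIXMAP, seg, 0);
    free(xcb_shm_get_image_reply(c, cookie, NULL)); /* wait for completion */

    /* (2) RAM -> GPU: upload the pixels into a GL texture */
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, w, h,
                    GL_BGRA, GL_UNSIGNED_BYTE, shm_data);
}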

I've also added the drmsend util into my obs-studio fork, so it is now built as an obs-drmsend binary as part of the obs build process. Note that adding the necessary capabilities is commented out for now, as it requires building as root. I will address this later, or just leave it in developers' or packagers' hands.

Next I plan to add libdrm framebuffer enumeration into the dmabuf settings, but will likely only have time for that tomorrow.
 

w23

New Member
I just pushed an update that adds a GUI selector for framebuffers. It works by calling the obs-drmsend binary, which enumerates libdrm resources, gets fds for all available framebuffers, and sends these along with metadata to the master obs process via a unix socket.

obs-drmsend is integrated into the repo and build system, so there's no need to get the drmtoy repo or build anything manually anymore.

Updated instructions:
  1. Get the linux-libdrm-grab branch from https://github.com/w23/obs-studio
  2. Build it as usual: mkdir build && cd build && cmake .. -DUNIX_STRUCTURE=0 -GNinja && ninja. It should pick up EGL if it's present on your system.
  3. cd rundir/RelWithDebInfo/bin/64bit and manually assign CAP_SYS_ADMIN to obs-drmsend: sudo setcap cap_sys_admin+ep ./obs-drmsend. Make sure the filesystem you're running obs from isn't mounted nosuid.
  4. Run obs while being in the same dir: ./obs -p. Go to Settings -> Advanced -> Video and select "OpenGL EGL".
  5. Restart OBS
  6. Add "DMABUF source"
  7. A properties screen should appear where you can pick your framebuffer.
  8. Have a great optimized stream!

Note that some paths are still hardcoded:
  • GPU is expected to be accessible at /dev/dri/card0
  • ./obs-drmsend is run from current directory
  • Unix socket will be at /tmp/drmsend.sock

There's a technical difficulty in getting the device filename of the GPU used by the current EGL context; it may not be accessible at all. Also, it might be possible to grab framebuffers from another GPU, but I don't have a way to test that.

What would be the right path for obs temp stuff?
 

c3r1c3

Member
Depends on what exactly the temp material is... but if I had to direct you to one safe place, it'd be the user's home directory, maybe in the hidden obs-studio profile/settings folder:
~/.config/obs-studio

and maybe make a temp folder in there... but really, tmp files in Linux-land are placed depending on what they are for.
 

bandafas

New Member
OBS performs much the same work as a compositor, so interfaces that let it access the hardware buffers directly, much like the compositor does, would be preferred. Currently this isn't possible with the X server, so everything relies on shared memory segments. Still, this seems a better approach than PipeWire, which (from only a brief glance) might require copying entire video buffers back and forth. Ideally, a Wayland solution would provide access to the underlying GPU buffer or memory buffer for individual windows as well as for the entire composited screen.
 

gnif

New Member
Hi guys... I am the developer behind Looking Glass (https://looking-glass.hostfission.com) and what you have done here is very interesting to me. One of the primary performance issues of LG is copy performance between buffers, and I believe that DMA-BUF may help with one particular configuration, but before I do a deep dive into implementing support in a kernel module to export a DMA-BUF fd, I would like to know if the idea I have is even feasible.

Here is the scenario. We have two Virtual Machines, and each has a special virtual device called IVSHMEM which provides a block of contiguous shared memory shared across the VMs. It is allocated on the host in system RAM, but the VM sees it as device RAM, and as such it has to be mapped into user space. In one VM (usually Windows) we capture the desktop and copy the frame into the shared buffer, and in the 2nd VM we copy the frame from the shared buffer into a GPU texture using EGL. It's this 2nd memcpy where I can see DMA-BUF potentially being viable.

Currently I am alternating between two PBOs using `glBufferSubData` to update the texture, but when using `callgrind` it seems that most of the CPU time is spent in `memcpy` inside `radeonsi_dri`.
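
For reference, the double-PBO rotation described above in sketch form (names and the RGBA format are illustrative, not LG's actual code):
Code:
#include <GLES3/gl3.h>

/* Ping-pong between two PBOs: fill one while the GPU may still be
 * sourcing the previous texture update from the other. */
static GLuint pbo[2]; /* created elsewhere with glGenBuffers + glBufferData */
static int cur;

void upload_frame(GLuint tex, const void *frame, GLsizeiptr size,
                  GLsizei w, GLsizei h)
{
    cur ^= 1;
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo[cur]);
    /* the copy callgrind attributes to memcpy inside the driver */
    glBufferSubData(GL_PIXEL_UNPACK_BUFFER, 0, size, frame);

    glBindTexture(GL_TEXTURE_2D, tex);
    /* with a PBO bound, the last argument is an offset, not a pointer */
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, w, h,
                    GL_RGBA, GL_UNSIGNED_BYTE, (const void *)0);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
}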

So finally, my questions are:
* Should DMA-BUF be viable for this configuration?
* Are you interested in lending a hand to get this working?

Thank you kindly.
 

edrex

New Member
Looks like CPU overhead is about 5-7% of one core just previewing a single 1080p DMABUF source, with an additional ~30% when recording. That's a huge improvement over the wlrobs source under sway.

For comparison, ffmpeg directly encoding from kmsgrab uses about 3% of one core:
Code:
LIBVA_DRIVER_NAME=iHD ffmpeg -device /dev/dri/card0 -f kmsgrab -i - -vf 'hwmap=derive_device=vaapi,scale_vaapi=w=1920:h=1080:format=nv12' -c:v h264_vaapi -vstats output.mkv

I'm assuming the order-of-magnitude difference between ffmpeg and obs has to do with the compositing that obs does, which requires copying frames around more. I'm curious whether there are further opportunities for low-hanging optimization of the number of copies between capture and encoding, or does the need for compositing preclude that?

(moved from https://github.com/obsproject/obs-studio/pull/1758#issuecomment-524664079)
 

w23

New Member
So finally, my questions are:
* Should DMA-BUF be viable for this configuration?
* Are you interested in lending a hand to get this working?

First of all, thank you for making LG!
Second:
* It depends
* Yes!
Sent you a discord DM.

I'm assuming the order-of-magnitude difference between ffmpeg and obs has to do with the compositing that obs does, which requires copying frames around more. I'm curious whether there are further opportunities for low-hanging optimization of the number of copies between capture and encoding, or does the need for compositing preclude that?

Thank you for testing!
There aren't any obvious optimizations left; the DMA-BUF capture itself is as bare as possible.
However, it would be interesting to throw a profiler at it, e.g. system-wide `perf` with segmented flamegraphs. I will try to do this on a slower machine, but not very soon.
 

Leopard1907

New Member
First of all, thanks for your effort.

How are things going on this project? Is it possible that we'll see this as a PR in the near future? And is it for the VAAPI backend only, or more of a general thing?
 

w23

New Member
I've been using this fork for regular streaming since it was written, and it is updated to track upstream OBS master semi-regularly. However, I have unfortunately made barely any progress towards merging it upstream. The problem is that too many things need to be cleaned up and productized before I'd consider it a proper PR and not a draft.

I will try to find time today or tomorrow to fix the most annoying/blocking issues and try to get it merged, or at least evaluated by the community. Maybe we can make incremental progress on less major things after it's in master and people actually start using it.

It's kind of unrelated to VAAPI. With VAAPI there's still a GPU->RAM->GPU transfer AFAIK; that would need to be addressed separately. We'll see whether I'll be able to look into that at some point, can't promise anything now.
 

Leopard1907

New Member
Thanks for the reply. I hope you can manage it. Since we lack stuff like the new NVENC in OBS Linux builds, the performance hit is very bad.
 

CNLohr

New Member
Is there any mechanism to do this within the existing OpenGL framework on Linux? Right now I'm trying to find some way of carefully marshalling OpenGL textures from another application into OBS without the copies. More notably, in VR it's difficult to perform synchronization, so I'm handling that on the consuming side of the textures; conveniently, it means I can own both sides of the pipe. Do you have any relatively concise mechanisms or examples of transferring OpenGL textures from a producer to the consumer (OBS)?
 

scaled

New Member
Sorry for necroposting, but after 2 years of failed attempts, I managed to make it work. This is a fork of OBS with a KMSGrab plugin. I tested it on Debian Testing, GNOME on X11. It's based on w23's code.


The reason it's a fork and not a plugin is that the new functions in OBS for binding textures from video memory don't work for capturing the entire screen. Also, the code contains some hacking into OBS internals. If you have an idea how to separate it into a dedicated plugin, you're welcome!

Also, the plugin requires setting special caps on the OBS executable (`sudo setcap cap_sys_admin+ep /usr/local/bin/obs`). w23's version used a separate executable to get the framebuffer id, but it turns out the framebuffer id changes every frame, so you need to refresh it as fast as possible. That's why I merged it into the plugin itself.
 

scaled

New Member
I heard that OBS Studio has had some changes regarding zero-copy capture and encoding. PipeWire capture still doesn't work right for me (it consumes too much CPU and GPU), so I made another version of the same concept, based on the vkcapture plugin from nowrep. It doesn't require running OBS as root, but the companion utility does.

Also, this version is much simpler (just 163 lines of code), and it tracks every screen update, so the image in OBS will always be in sync with the screen framerate. Most of the actual capture and transfer is done by nowrep's software.

I probably should have posted this much sooner, but better late than never. I made it a year ago and it has worked fine for my streams since then.

Install instructions are on the main page.
 

diddums

New Member
Just wanted to say huge thanks for this!

I'm running Nobara 40 under a wayland/sway setup, and while most games are captured just fine with the packaged game capture setup, I run some games through the browser using the NVIDIA cloud service, which obviously can't be captured the same way.

The default options were really frustrating: scpy and dmabuf are too resource-heavy, causing the renderer to drop frames, and xcomposite seems to be capped at 30 fps.

However, after pulling your tool and syncing it with upstream (I'll lodge a PR), I was back to native full-screen capture. The only issue I'm seeing now is that, because I run 2 monitors off the same GPU, the capture splices in the second monitor for a frame every few minutes or so. I'm not sure if there's additional configuration I'm missing or if I need to update the tool to force certain bounds.

I'm being verbose here to help with SEO and maybe save time for people googling in the future, which is how I found it.
 

scaled

New Member
Just wanted to say huge thanks for this!

I'm running Nobara 40 under a wayland/sway setup, and while most games are captured just fine with the packaged game capture setup, I run some games through the browser using the NVIDIA cloud service, which obviously can't be captured the same way.

The default options were really frustrating: scpy and dmabuf are too resource-heavy, causing the renderer to drop frames, and xcomposite seems to be capped at 30 fps.

However, after pulling your tool and syncing it with upstream (I'll lodge a PR), I was back to native full-screen capture. The only issue I'm seeing now is that, because I run 2 monitors off the same GPU, the capture splices in the second monitor for a frame every few minutes or so. I'm not sure if there's additional configuration I'm missing or if I need to update the tool to force certain bounds.

I'm being verbose here to help with SEO and maybe save time for people googling in the future, which is how I found it.
If you're using Sway, there's a capture tool just for you! It's called wlrobs. Speaking of my plugin, I designed it for a single-monitor setup, so it captures every screen update from every monitor. I wanted to upgrade it and add a monitor selection screen, but then my life priorities changed and I abandoned most of my projects. Please try wlrobs, and if it doesn't work, reach out to me in DMs and maybe I'll find time to fix this and update the plugin.
 

diddums

New Member
Thanks for the quick response. I do actually have that installed; it shows up as the scpy and dmabuf recording options. However, they introduce significant overhead in the OBS render times, causing dropped frames. I'll spend some time looking into that project before bothering you.
 