OBS branch with AMD VCE support.

jackun

Developer
Oh wow theres a lot of choices to download... Does it matter which one I get? I have a 7950 and that means vce 1, will any of them work or are there some on the front page I shouldn't download?
Get the latest. The others are there just in case a new version breaks something for someone. Which reminds me to upload v0.64.
 

dping

Active Member
Oh wow theres a lot of choices to download... Does it matter which one I get? I have a 7950 and that means vce 1, will any of them work or are there some on the front page I shouldn't download?
Any will. Just don't set B-frames to 1. Even if you do, it'll just throw an error and continue streaming. I'd get the latest if I were you. It's the most up to date, since jackun ports over the builds from the main OBS and then adds in his changes.

The problem you're having is nothing to do with VCE and everything to do with either how you capture or other settings.
Who are you referring to? I'm a bit confused...
 

Balrogos

New Member
Well, instead of using AMD VCE on my AMD Radeon graphics card, it is heavily using my CPU to decode, which is less efficient than x264 [CPU].

This is very, very bad: Intel Quick Sync is not working, and neither is AMD VCE ;/
 

jackun

Developer
nice! Only 2 months late. Oh and what's that app SDK 3.0 beta thing?

App SDK has the OpenCL, C++AMP/Bolt stuff. But the "ExtraData" property that returns initial SPS/PPS NALs returns nothing with v1.1, ugh.

Some new settings but of course they are missing from headers/undocumented.

20:00:04: AspectRatio = <rate>1/1
20:00:04: BPicturesDeltaQP = 4
20:00:04: BPicturesPattern = 3
20:00:04: BReferenceEnable = <bool>1
20:00:04: CABACEnable = <bool>1
20:00:04: CodecId = 5
20:00:04: ConstraintSetFlags = 0
20:00:04: DeBlockingFilter = <bool>1
20:00:04: EnforceHRD = <bool>1
20:00:04: EngineType = 0
20:00:04: ExtraData = <empty>
20:00:04: FillerDataEnable = <bool>1
20:00:04: FrameRate = <rate>60/1
20:00:04: FrameSize = <size>1920x1080
20:00:04: GOPSize = 30
20:00:04: HalfPixel = <bool>1
20:00:04: HeaderInsertionSpacing = 120
20:00:04: IDRPeriod = 120
20:00:04: InitialVBVBufferFullness = 64
20:00:04: InstanceID = -1
20:00:04: IntraRefreshMBsNumberPerSlot = 0
20:00:04: LowLatencyInternal = <bool>0
20:00:04: MGSKeyPicturePeriod = 0
20:00:04: MGSVector0 = 0
20:00:04: MGSVector1 = 0
20:00:04: MGSVector2 = 0
20:00:04: MGSVector3 = 0
20:00:04: MGSVectorMode = <bool>0
20:00:04: MaxAUSize = 0
20:00:04: MaxMBPerSec = 616680
20:00:04: MaxNumRefFrames = 4
20:00:04: MaxOfLTRFrames = 0
20:00:04: MaxQP = 51
20:00:04: MaxSliceSize = 2147483647
20:00:04: MinQP = 5
20:00:04: NominalRange = <bool>0
20:00:04: NumOfQualityLayers = 0
20:00:04: NumOfTemporalEnhancmentLayers = 0
20:00:04: PeakBitrate = 3400000
20:00:04: Profile = 100
20:00:04: ProfileLevel = 41
20:00:04: QPB = 20
20:00:04: QPI = 20
20:00:04: QPP = 20
20:00:04: QualityEnhancementMode = 0
20:00:04: QualityPreset = 0
20:00:04: QuarterPixel = <bool>1
20:00:04: RateControlMethod = 1
20:00:04: RateControlSkipFrameEnable = <bool>0
20:00:04: ReferenceBPicturesDeltaQP = 4
20:00:04: ScanType = 0
20:00:04: SliceControlMode = 0
20:00:04: SliceControlSize = 0
20:00:04: SliceMode = 1
20:00:04: SlicesPerFrame = 1
20:00:04: TL0.QL0.BPicturesDeltaQP = 4
20:00:04: TL0.QL0.EnforceHRD = <bool>1
20:00:04: TL0.QL0.FillerDataEnable = <bool>0
20:00:04: TL0.QL0.FrameRate = <rate>30/1
20:00:04: TL0.QL0.GOPSize = 60
20:00:04: TL0.QL0.InitialVBVBufferFullness = 64
20:00:04: TL0.QL0.MaxAUSize = 0
20:00:04: TL0.QL0.MaxQP = 51
20:00:04: TL0.QL0.MinQP = 22
20:00:04: TL0.QL0.PeakBitrate = 10000000
20:00:04: TL0.QL0.QPB = 22
20:00:04: TL0.QL0.QPI = 22
20:00:04: TL0.QL0.QPP = 22
20:00:04: TL0.QL0.RateControlMethod = 2
20:00:04: TL0.QL0.RateControlSkipFrameEnable = <bool>1
20:00:04: TL0.QL0.ReferenceBPicturesDeltaQP = 2
20:00:04: TL0.QL0.TargetBitrate = 10000000
20:00:04: TL0.QL0.VBVBufferSize = 1000000
20:00:04: TL0.QL1.BPicturesDeltaQP = 4
20:00:04: TL0.QL1.EnforceHRD = <bool>1
20:00:04: TL0.QL1.FillerDataEnable = <bool>0
20:00:04: TL0.QL1.FrameRate = <rate>30/1
20:00:04: TL0.QL1.GOPSize = 60
20:00:04: TL0.QL1.InitialVBVBufferFullness = 64
20:00:04: TL0.QL1.MaxAUSize = 0
20:00:04: TL0.QL1.MaxQP = 51
20:00:04: TL0.QL1.MinQP = 22
20:00:04: TL0.QL1.PeakBitrate = 10000000
20:00:04: TL0.QL1.QPB = 22
20:00:04: TL0.QL1.QPI = 22
20:00:04: TL0.QL1.QPP = 22
20:00:04: TL0.QL1.RateControlMethod = 2
20:00:04: TL0.QL1.RateControlSkipFrameEnable = <bool>1
20:00:04: TL0.QL1.ReferenceBPicturesDeltaQP = 2
20:00:04: TL0.QL1.TargetBitrate = 10000000
20:00:04: TL0.QL1.VBVBufferSize = 1000000
20:00:04: TL1.QL0.BPicturesDeltaQP = 4
20:00:04: TL1.QL0.EnforceHRD = <bool>1
20:00:04: TL1.QL0.FillerDataEnable = <bool>0
20:00:04: TL1.QL0.FrameRate = <rate>30/1
20:00:04: TL1.QL0.GOPSize = 60
20:00:04: TL1.QL0.InitialVBVBufferFullness = 64
20:00:04: TL1.QL0.MaxAUSize = 0
20:00:04: TL1.QL0.MaxQP = 51
20:00:04: TL1.QL0.MinQP = 22
20:00:04: TL1.QL0.PeakBitrate = 10000000
20:00:04: TL1.QL0.QPB = 22
20:00:04: TL1.QL0.QPI = 22
20:00:04: TL1.QL0.QPP = 22
20:00:04: TL1.QL0.RateControlMethod = 2
20:00:04: TL1.QL0.RateControlSkipFrameEnable = <bool>1
20:00:04: TL1.QL0.ReferenceBPicturesDeltaQP = 2
20:00:04: TL1.QL0.TargetBitrate = 10000000
20:00:04: TL1.QL0.VBVBufferSize = 1000000
20:00:04: TL1.QL1.BPicturesDeltaQP = 4
20:00:04: TL1.QL1.EnforceHRD = <bool>1
20:00:04: TL1.QL1.FillerDataEnable = <bool>0
20:00:04: TL1.QL1.FrameRate = <rate>30/1
20:00:04: TL1.QL1.GOPSize = 60
20:00:04: TL1.QL1.InitialVBVBufferFullness = 64
20:00:04: TL1.QL1.MaxAUSize = 0
20:00:04: TL1.QL1.MaxQP = 51
20:00:04: TL1.QL1.MinQP = 22
20:00:04: TL1.QL1.PeakBitrate = 10000000
20:00:04: TL1.QL1.QPB = 22
20:00:04: TL1.QL1.QPI = 22
20:00:04: TL1.QL1.QPP = 22
20:00:04: TL1.QL1.RateControlMethod = 2
20:00:04: TL1.QL1.RateControlSkipFrameEnable = <bool>1
20:00:04: TL1.QL1.ReferenceBPicturesDeltaQP = 2
20:00:04: TL1.QL1.TargetBitrate = 10000000
20:00:04: TL1.QL1.VBVBufferSize = 1000000
20:00:04: TL2.QL0.BPicturesDeltaQP = 4
20:00:04: TL2.QL0.EnforceHRD = <bool>1
20:00:04: TL2.QL0.FillerDataEnable = <bool>0
20:00:04: TL2.QL0.FrameRate = <rate>30/1
20:00:04: TL2.QL0.GOPSize = 60
20:00:04: TL2.QL0.InitialVBVBufferFullness = 64
20:00:04: TL2.QL0.MaxAUSize = 0
20:00:04: TL2.QL0.MaxQP = 51
20:00:04: TL2.QL0.MinQP = 22
20:00:04: TL2.QL0.PeakBitrate = 10000000
20:00:04: TL2.QL0.QPB = 22
20:00:04: TL2.QL0.QPI = 22
20:00:04: TL2.QL0.QPP = 22
20:00:04: TL2.QL0.RateControlMethod = 2
20:00:04: TL2.QL0.RateControlSkipFrameEnable = <bool>1
20:00:04: TL2.QL0.ReferenceBPicturesDeltaQP = 2
20:00:04: TL2.QL0.TargetBitrate = 10000000
20:00:04: TL2.QL0.VBVBufferSize = 1000000
20:00:04: TL2.QL1.BPicturesDeltaQP = 4
20:00:04: TL2.QL1.EnforceHRD = <bool>1
20:00:04: TL2.QL1.FillerDataEnable = <bool>0
20:00:04: TL2.QL1.FrameRate = <rate>30/1
20:00:04: TL2.QL1.GOPSize = 60
20:00:04: TL2.QL1.InitialVBVBufferFullness = 64
20:00:04: TL2.QL1.MaxAUSize = 0
20:00:04: TL2.QL1.MaxQP = 51
20:00:04: TL2.QL1.MinQP = 22
20:00:04: TL2.QL1.PeakBitrate = 10000000
20:00:04: TL2.QL1.QPB = 22
20:00:04: TL2.QL1.QPI = 22
20:00:04: TL2.QL1.QPP = 22
20:00:04: TL2.QL1.RateControlMethod = 2
20:00:04: TL2.QL1.RateControlSkipFrameEnable = <bool>1
20:00:04: TL2.QL1.ReferenceBPicturesDeltaQP = 2
20:00:04: TL2.QL1.TargetBitrate = 10000000
20:00:04: TL2.QL1.VBVBufferSize = 1000000
20:00:04: TL3.QL0.BPicturesDeltaQP = 4
20:00:04: TL3.QL0.EnforceHRD = <bool>1
20:00:04: TL3.QL0.FillerDataEnable = <bool>0
20:00:04: TL3.QL0.FrameRate = <rate>30/1
20:00:04: TL3.QL0.GOPSize = 60
20:00:04: TL3.QL0.InitialVBVBufferFullness = 64
20:00:04: TL3.QL0.MaxAUSize = 0
20:00:04: TL3.QL0.MaxQP = 51
20:00:04: TL3.QL0.MinQP = 22
20:00:04: TL3.QL0.PeakBitrate = 10000000
20:00:04: TL3.QL0.QPB = 22
20:00:04: TL3.QL0.QPI = 22
20:00:04: TL3.QL0.QPP = 22
20:00:04: TL3.QL0.RateControlMethod = 2
20:00:04: TL3.QL0.RateControlSkipFrameEnable = <bool>1
20:00:04: TL3.QL0.ReferenceBPicturesDeltaQP = 2
20:00:04: TL3.QL0.TargetBitrate = 10000000
20:00:04: TL3.QL0.VBVBufferSize = 1000000
20:00:04: TL3.QL1.BPicturesDeltaQP = 4
20:00:04: TL3.QL1.EnforceHRD = <bool>1
20:00:04: TL3.QL1.FillerDataEnable = <bool>0
20:00:04: TL3.QL1.FrameRate = <rate>30/1
20:00:04: TL3.QL1.GOPSize = 60
20:00:04: TL3.QL1.InitialVBVBufferFullness = 64
20:00:04: TL3.QL1.MaxAUSize = 0
20:00:04: TL3.QL1.MaxQP = 51
20:00:04: TL3.QL1.MinQP = 22
20:00:04: TL3.QL1.PeakBitrate = 10000000
20:00:04: TL3.QL1.QPB = 22
20:00:04: TL3.QL1.QPI = 22
20:00:04: TL3.QL1.QPP = 22
20:00:04: TL3.QL1.RateControlMethod = 2
20:00:04: TL3.QL1.RateControlSkipFrameEnable = <bool>1
20:00:04: TL3.QL1.ReferenceBPicturesDeltaQP = 2
20:00:04: TL3.QL1.TargetBitrate = 10000000
20:00:04: TL3.QL1.VBVBufferSize = 1000000
20:00:04: TargetBitrate = 3400000
20:00:04: Usage = 0
20:00:04: VBVBufferSize = 3400000
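A dump like the one above is easier to compare between SDK versions once it is parsed. The following is a minimal sketch, assuming every line has the form `HH:MM:SS: Name = <type>value` with an optional `<type>` tag, as in the log above. The function names `parse_value` and `parse_dump` are made up for illustration and are not part of OBS or AMF.

```python
import re

# Matches lines such as "20:00:04: FrameRate = <rate>60/1".
# Group 1: property name, group 2: optional type tag, group 3: raw value.
LINE_RE = re.compile(r"^\d{2}:\d{2}:\d{2}: (\S+) = (?:<(\w+)>)?(.*)$")


def parse_value(type_tag, raw):
    """Convert the raw text to a Python value based on the optional type tag."""
    if type_tag == "empty":
        return None
    if type_tag == "bool":
        return raw == "1"
    if type_tag == "rate":
        num, den = raw.split("/")
        return (int(num), int(den))
    if type_tag == "size":
        width, height = raw.split("x")
        return (int(width), int(height))
    try:
        return int(raw)
    except ValueError:
        return raw  # keep anything unexpected as plain text


def parse_dump(text):
    """Return a dict mapping property names to parsed values."""
    props = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            name, type_tag, raw = match.groups()
            props[name] = parse_value(type_tag, raw)
    return props


if __name__ == "__main__":
    sample = "20:00:04: FrameRate = <rate>60/1\n20:00:04: ExtraData = <empty>"
    print(parse_dump(sample))
```

Per-layer settings keep their full dotted name (for example `TL0.QL0.TargetBitrate`), so they can be grouped by prefix afterwards if needed.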


Updated a little: DX11 buffers were not unlocked.
 

dping

Active Member
App SDK has the OpenCL, C++AMP/Bolt stuff. But the "ExtraData" property that returns initial SPS/PPS NALs returns nothing with v1.1, ugh.

Some new settings but of course they are missing from headers/undocumented.




Updated a little: DX11 buffers were not unlocked.
I'll have to dig in a bit. Hopefully they will publish documentation updates on it.

Also, I thought the max for level 4.1 was 1088@30? Level 4.2 can do 1088@64 max. Could you include a level selector with the next version?
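The frame rate ceiling for an H.264 level can be worked out from the level's macroblock limits. This is a minimal sketch, assuming the MaxMBPS and MaxFS values from Table A-1 of the H.264 specification for levels 4.1 and 4.2, and a frame coded in 16x16 macroblocks (so 1080 rows are coded as 1088). The helper names are made up for illustration.

```python
# MaxMBPS (macroblocks per second) and MaxFS (macroblocks per frame)
# from Table A-1 of the H.264 specification, for the two levels discussed.
LEVEL_LIMITS = {
    "4.1": {"max_mbps": 245760, "max_fs": 8192},
    "4.2": {"max_mbps": 522240, "max_fs": 8704},
}


def macroblocks(width, height):
    """Number of 16x16 macroblocks needed to cover the frame."""
    return ((width + 15) // 16) * ((height + 15) // 16)


def max_fps(level, width, height):
    """Highest frame rate the level allows at this size (0.0 if the frame is too big)."""
    limits = LEVEL_LIMITS[level]
    mbs = macroblocks(width, height)
    if mbs > limits["max_fs"]:
        return 0.0
    return limits["max_mbps"] / mbs


if __name__ == "__main__":
    for level in ("4.1", "4.2"):
        print(level, round(max_fps(level, 1920, 1080), 2))
```

By this arithmetic 1920x1080 needs 8160 macroblocks per frame, which gives roughly 30 fps at level 4.1 and 64 fps at level 4.2. The dumped combination of FrameSize 1920x1080, FrameRate 60/1 and ProfileLevel = 41 would need 489600 macroblocks per second, which is above the level 4.1 limit. Whether the encoder enforces that limit is not stated in the thread.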
 

Deleted member 30350

Just curious - and it doesn't even have anything to do with OBS - I thought the VCE encoding was done by a specific part of the GPU which is separate from all the rendering stuff, but when I tried to record some game footage with Afterburner, using VCE as the encoding method, my fps dropped by like 25. How does it really work then?
 

dping

Active Member
Just curious - and it doesn't even have anything to do with OBS - I thought the VCE encoding was done by a specific part of the GPU which is separate from all the rendering stuff, but when I tried to record some game footage with Afterburner, using VCE as the encoding method, my fps dropped by like 25. How does it really work then?

It still uses up GPU memory bandwidth if DX11 is selected. Host/DX9 uses system memory instead, which is already shared and not as reliant on game rendering.
 

jackun

Developer
Random googling. TLDR: perf is best when the CPU and GPU work asynchronously. Asking for front buffer data to capture it makes the CPU wait for the GPU to finish rendering, so the CPU can't schedule new commands for the GPU, and so the GPU probably waits for the CPU again, etc.

Behind the scenes stuff roughly:
  1. game renders to front buffer
  2. copy FB to separate texture that CPU has access to
  3. copy pixels from texture to RAM
  4. convert BGRA to NV12 with CPU (OBS does it with 2 threads and from YUV444 (32bits))
  5. copy NV12 buffer back to GPU
  6. VCE is told to get next frame from NV12 buffer
With OpenCL interop, OBS removes steps 3 and 5 and does step 4 on the GPU too, but then there's the rendering/compute scheduling issue.
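Step 4 in the list above can be sketched as follows. This is a simplified illustration, not OBS's actual code: it converts straight from BGRA to NV12 in a single thread (the post says OBS goes via YUV444 and uses 2 threads), and it assumes BT.601 limited-range integer coefficients, even frame dimensions and a tightly packed buffer. The function name `bgra_to_nv12` is made up for illustration.

```python
def bgra_to_nv12(bgra, width, height):
    """Convert a tightly packed BGRA frame (bytes) to NV12 (bytes).

    NV12 layout: a full-resolution Y plane, followed by one plane of
    interleaved U,V pairs, one pair per 2x2 pixel block.
    Assumes BT.601 limited-range coefficients (an assumption, see above).
    """
    if width % 2 or height % 2:
        raise ValueError("NV12 needs even dimensions")
    if len(bgra) != width * height * 4:
        raise ValueError("unexpected buffer size")

    out = bytearray(width * height * 3 // 2)
    uv_base = width * height  # chroma plane starts right after the luma plane

    def clamp(value):
        return max(0, min(255, value))

    # Luma plane: one Y byte per pixel.
    for i in range(width * height):
        b, g, r = bgra[4 * i], bgra[4 * i + 1], bgra[4 * i + 2]
        out[i] = clamp(((66 * r + 129 * g + 25 * b + 128) >> 8) + 16)

    # Chroma plane: average each 2x2 block, then store U followed by V.
    for by in range(0, height, 2):
        for bx in range(0, width, 2):
            r = g = b = 0
            for dy in (0, 1):
                for dx in (0, 1):
                    p = 4 * ((by + dy) * width + bx + dx)
                    b += bgra[p]
                    g += bgra[p + 1]
                    r += bgra[p + 2]
            r, g, b = r // 4, g // 4, b // 4
            u = ((-38 * r - 74 * g + 112 * b + 128) >> 8) + 128
            v = ((112 * r - 94 * g - 18 * b + 128) >> 8) + 128
            o = uv_base + (by // 2) * width + bx
            out[o] = clamp(u)
            out[o + 1] = clamp(v)
    return bytes(out)
```

The output is 1.5 bytes per pixel instead of 4, which is also why copying the NV12 buffer back to the GPU (step 5) moves less data than the readback in step 3.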
 

dping

Active Member
Random googling. TLDR: perf is best when the CPU and GPU work asynchronously. Asking for front buffer data to capture it makes the CPU wait for the GPU to finish rendering, so the CPU can't schedule new commands for the GPU, and so the GPU probably waits for the CPU again, etc.

Behind the scenes stuff roughly:
  1. game renders to front buffer
  2. copy FB to separate texture that CPU has access to
  3. copy pixels from texture to RAM
  4. convert BGRA to NV12 with CPU (OBS does it with 2 threads and from YUV444 (32bits))
  5. copy NV12 buffer back to GPU
  6. VCE is told to get next frame from NV12 buffer
With OpenCL interop, OBS removes steps 3 and 5 and does step 4 on the GPU too, but then there's the rendering/compute scheduling issue.

Well, that somewhat makes sense. I am no programmer, but I don't know when the last time was that an older brother waited on a younger one :)
 

Lucil

Member
It still uses up GPU memory bandwidth if DX11 is selected. Host/DX9 uses system memory instead, which is already shared and not as reliant on game rendering.

So on games that are highly VRAM-intensive, it's better to go host/DX9 when you are only using some of your system RAM or have large amounts of system RAM?

E.g. a 290X with 4x4 GB 1600 MHz (16 GB RAM) on a Windows 8.1 x64 machine.
 

dping

Active Member
So on games that are highly VRAM-intensive, it's better to go host/DX9 when you are only using some of your system RAM or have large amounts of system RAM?

E.g. a 290X with 4x4 GB 1600 MHz (16 GB RAM) on a Windows 8.1 x64 machine.
Jackun's description is better. Basically, with DX10/11 the CPU and GPU pass stuff back and forth and wait on each other. When it's in host, that doesn't happen?
 

Scyna

New Member
Is B-frames a shared setting? I didn't see a messed-up video when using x264 and 1 B-frame. If you have any B-frame test builds, I can test with my 290.
 

jackun

Developer
Jackun's description is better. Basically, with DX10/11 the CPU and GPU pass stuff back and forth and wait on each other. When it's in host, that doesn't happen?
It happens in every mode except when using OpenCL interop (unless drivers or AMF do something screwy in the background).

@Scyna B-frames are supported by AMF only, and streaming may not work. Otherwise, any new build that has the setting will do.
 

halsoy

New Member
Hey. First off, love this thing. I read on the first few pages that some people had issues with a green line at the bottom. I also read that there was a fix for it, but said file is no longer available. My problem is that I have the green line on the vertical axis, as seen here (the solid green line to the right):

kfSOfqW.png


The thing is, this line only happens when I'm streaming. If I use OBS with the exact same settings to record to a local file, this doesn't happen. As can be seen here (this is all recorded):

JLSN2BN.png


So I'm not entirely sure where the issue is here. It's not a super big deal, but if there was a fix for it, that would be super dandy.

I may add that I play on a 21:9 monitor (2560x1080) that gets downscaled to 1706x720 when streaming.
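For reference, the 1706x720 figure follows from keeping the 2560x1080 aspect ratio at 720 rows. The sketch below works through that arithmetic and also shows the width rounded up to a multiple of 16, since H.264 codes frames in 16x16 macroblocks. The helper names are made up for illustration, and nothing in the thread establishes that this padding is related to the green line.

```python
def scaled_width(src_w, src_h, dst_h):
    """Width that keeps the source aspect ratio at dst_h rows, truncated to an integer."""
    return src_w * dst_h // src_h


def align_up(value, multiple):
    """Round value up to the next multiple (e.g. the 16-pixel macroblock size)."""
    return (value + multiple - 1) // multiple * multiple


if __name__ == "__main__":
    width = scaled_width(2560, 1080, 720)
    print(width, align_up(width, 16), align_up(720, 16))
```

The exact ratio gives 1706.67 columns, so 1706 is a truncation, and a 1706-wide frame is coded as 1712 columns with the extra 6 cropped on decode. 720 rows is already a multiple of 16.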
 