# Detours based injection in 27.1.0 and compatibility with third party hooks



## Unwinder (Oct 21, 2021)

Hi

Migrating to Detours from your own custom hooking implementation in 27.1.0 is a step in the right direction. However, if you use Detours as is, without additional tweaks, it will result in a compatibility regression with third party hooks. There are two main reasons behind it:


First, Detours builds hook chains in a rather specific way, which most third party hook engines will detect as unsafe. Detours always overwrites the top level JMP to the hook handler that was previously installed by another hook. Yes, Detours performs the overwrite correctly and correctly relocates the other hook’s JMP to its own trampoline (adjusting the relative jump address properly). However, many other hooks are protected against this kind of relocation and do not tolerate modification of a JMP they previously installed. Depending on the third party hook’s implementation and architecture, it can do one of the following when a Detour is installed on top of its previously installed hook:


1. Deadlock
2. Crash
3. Restore their own jump

For example RTSS is the third case: it will restore the original JMP inside the hook handler if it detects that it was relocated. So it will effectively kick the OBS Detour out of the hook chain if you inject it after RTSS. To prevent such issues, many hook engines use the idea of safe hook chains, which allows hooks of different architectures to be combined with no problems. The idea is pretty simple: when you’re installing a new hook, instead of overwriting the very first instructions of the target function with a JMP (that’s what Detours does), you simply unwind the whole JMP chain and inject the hook into the body of the function after the very last jump. This way you achieve one important thing: you NEVER overwrite a jump previously installed by any other hook, so you’re always compatible with any hook engine, independently of hook architecture.

Happily, Detours can easily be extended with such functionality. Microsoft probably realized that overwriting another hook’s JMP is a bad idea, so there is DetourCodeFromPointer, which is called internally by DetourAttach and which is meant to skip JMPs and inject into the function body. However, it only supports the 2-byte hotpatch jump and it doesn’t support chained jumps. But you can add your own function that extends DetourCodeFromPointer with JMP chain unwinding:




```
// Follows a chain of JMPs starting at pbCode and returns the address of the
// first instruction that is not a jump, so the new hook never overwrites a
// JMP installed by another hook.
PBYTE detour_skip_jmp_chain(PBYTE pbCode)
{
    if (pbCode[0] == 0xe9)
    {   // jmp +imm32 (relative to the end of the 5-byte instruction)
        PBYTE pbNew = pbCode + 5 + *(UNALIGNED INT32 *)&pbCode[1];
        return detour_skip_jmp_chain(pbNew);
    }
#ifdef _WIN64
    if (pbCode[0] == 0xff && pbCode[1] == 0x25)
    {   // jmp [+imm32] (RIP-relative pointer to the target address)
        PBYTE pbTarget = pbCode + 6 + *(UNALIGNED INT32 *)&pbCode[2];
        PBYTE pbNew = *(UNALIGNED PBYTE *)pbTarget;
        return detour_skip_jmp_chain(pbNew);
    }
#else
    if (pbCode[0] == 0xff && pbCode[1] == 0x25)
    {   // jmp [imm32] (absolute pointer to the target address)
        PBYTE pbTarget = *(UNALIGNED PBYTE *)&pbCode[2];
        PBYTE pbNew = *(UNALIGNED PBYTE *)pbTarget;
        return detour_skip_jmp_chain(pbNew);
    }
#endif
    return pbCode;
}

//////////////////////////////////////////////////////////////////////
PVOID DetourCodeFromPointerEx(PVOID pPointer, PVOID *ppGlobals)
{
    // Let Detours skip the IAT/hotpatch jump first, then unwind any remaining chain.
    return detour_skip_jmp_chain((PBYTE)DetourCodeFromPointer(pPointer, ppGlobals));
}
//////////////////////////////////////////////////////////////////////
```

Then simply use it to adjust the target pointer before calling DetourAttach:


```
lpTargetFn = DetourCodeFromPointerEx(lpTargetFn, NULL);
DetourAttach(&(PVOID&)lpTargetFn,DetourFn);
```
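The rel32 arithmetic that detour_skip_jmp_chain relies on can be checked without touching live code. The following is a minimal, portable sketch and not part of Detours or RTSS: it assumes the code is modelled as a byte buffer with offsets standing in for addresses, it handles only `jmp +imm32` (E9), and it adds a hop limit that the snippet above does not have. The names `skip_jmp_chain`, `write_jmp` and `max_hops` are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Test helper (hypothetical): encodes `jmp rel32` at offset `at`, targeting `to`.
void write_jmp(std::vector<std::uint8_t>& code, std::size_t at, std::size_t to)
{
    assert(at + 5 <= code.size());
    // rel32 is relative to the end of the 5-byte instruction; unsigned
    // wraparound yields the two's complement encoding for backward jumps.
    std::uint32_t rel = static_cast<std::uint32_t>(to - (at + 5));
    code[at] = 0xE9;
    for (int i = 0; i < 4; ++i)   // x86 immediates are little-endian
        code[at + 1 + i] = static_cast<std::uint8_t>(rel >> (8 * i));
}

// Follows consecutive `jmp rel32` instructions and returns the offset of the
// first instruction that is not such a jump. max_hops is an added safety
// limit (an assumption of this sketch) so that a jump loop cannot hang the walk.
std::size_t skip_jmp_chain(const std::vector<std::uint8_t>& code,
                           std::size_t offset, int max_hops = 16)
{
    while (max_hops-- > 0 && offset + 5 <= code.size() && code[offset] == 0xE9) {
        std::uint32_t raw = 0;
        for (int i = 3; i >= 0; --i)
            raw = (raw << 8) | code[offset + 1 + i];
        std::int64_t next = static_cast<std::int64_t>(offset) + 5 +
                            static_cast<std::int32_t>(raw);
        if (next < 0 || next >= static_cast<std::int64_t>(code.size()))
            break;                // target lies outside the model buffer
        offset = static_cast<std::size_t>(next);
    }
    return offset;
}
```

With two chained jumps written at offsets 0 and 20, `skip_jmp_chain(code, 0)` lands on the target of the second jump, which is where a chain-end hook would be installed in this model.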



The second possible source of Detours related incompatibilities is the lack of a built-in rehooking mechanism in Detours. If multiple hooks are installed by different apps and one hook owner application closes, the whole hook chain can be lost and won’t be restored. This can become troublesome, considering that some hooks can be dynamically uninstalled even while the hook owner application is still running. As an example, RTSS hooks ID3D12CommandQueue::ExecuteCommandLists only for a short period of time: the hook is only needed to get the initial D3D12 command queue / swapchain mapping during the first few rendered frames, then the hook is removed, and hooks chained to it can be lost along with it. So there should be some logic handling such cases.
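One possible building block for such logic is a reachability check: follow the JMP chain from the function entry and see whether it still arrives at a given handler. The sketch below is only an assumption about how that could look, on a byte-buffer model where offsets stand in for addresses and only `jmp +imm32` (E9) is decoded. `hook_reachable` and `write_jmp` are hypothetical names, and what to do when the check fails (reattach, and how to do that safely across threads) is deliberately left open.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Test helper (hypothetical): encodes `jmp rel32` at offset `at`, targeting `to`.
void write_jmp(std::vector<std::uint8_t>& code, std::size_t at, std::size_t to)
{
    assert(at + 5 <= code.size());
    std::uint32_t rel = static_cast<std::uint32_t>(to - (at + 5));
    code[at] = 0xE9;
    for (int i = 0; i < 4; ++i)   // little-endian immediate
        code[at + 1 + i] = static_cast<std::uint8_t>(rel >> (8 * i));
}

// Returns true if following `jmp rel32` instructions from `entry` arrives at
// `handler` within max_hops jumps. A false result means the hook has dropped
// out of the chain, e.g. because another hook restored the original bytes.
bool hook_reachable(const std::vector<std::uint8_t>& code,
                    std::size_t entry, std::size_t handler, int max_hops = 16)
{
    std::size_t at = entry;
    for (int hop = 0; hop <= max_hops; ++hop) {
        if (at == handler)
            return true;
        if (at + 5 > code.size() || code[at] != 0xE9)
            return false;         // chain ended before reaching the handler
        std::uint32_t raw = 0;
        for (int i = 3; i >= 0; --i)
            raw = (raw << 8) | code[at + 1 + i];
        std::int64_t next = static_cast<std::int64_t>(at) + 5 +
                            static_cast<std::int32_t>(raw);
        if (next < 0 || next >= static_cast<std::int64_t>(code.size()))
            return false;         // jump leaves the model buffer
        at = static_cast<std::size_t>(next);
    }
    return false;
}
```

In this model, overwriting the first five bytes at the entry with the original instructions makes `hook_reachable` return false for every handler further down the chain, which matches the "whole chain is lost" case described above.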


----------



## Jim (Oct 21, 2021)

Thanks for letting us know, will verify.


----------



## Jim (Oct 22, 2021)

In my old (inferior) hook code, what I originally tried to do that seemed effective, was unhook, call the function, and then rehook. The problem with reading/writing every pre-call and post-call is that it's of course very unsafe in a multithreaded environment like D3D12, which I thought was one of the many reasons we switched to detours. But I assume that the old inferior method allowed our hook to bypass the issue you're describing -- because we had a rehooking mechanism.

With detours, it no longer does that. The reason why I mention my original code above is because I specifically remember some other hooks that would erase our hook (just as you describe), and it prevented that. It seems like detours does not follow the same approach, thus that situation is once again occurring. So it seems as though detours is designed specifically expecting everyone else to use detours or at least the same methodology, presumably with no uninstalling.

Would that be an accurate assessment of the situation?

If I'm reading you correctly, you're saying that we should follow the hook chain to the end of the chain, and install our hook there to prevent issues -- but you're also saying that if there are hooks that are designed to be uninstallable, then parts of the chain may just vanish, thus undoing our changes anyway?

I guess this is kind of a no-win situation, because if there are hooks that will just erase other hooks or uninstall for whatever reason, then we have no choice but to go back to a rehooking mechanism of some sort. Because if I am reading you correctly (and feel free to correct me if I'm not), without a pre-call and post-call rehooking mechanism, then either everyone follows some expected methodology, or the whole process will inevitably fail at some point. The only problem is that a rehooking mechanism is specifically unsafe for functions that are thread safe, thus it's sort of a paradoxical no-win situation.
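The unhook/call/rehook pattern described above, and the window it leaves open, can be illustrated with a small single-threaded model. This is a hedged sketch, not the old OBS hook code: `PatchedFunction` and all of its members are hypothetical, a `bool` stands in for the patched bytes, and a nested call stands in for a second thread that calls the function while it is temporarily unhooked. Torn reads of half-written bytes, the other hazard of this pattern, are not modelled.

```cpp
#include <cassert>

// Hypothetical model of one hooked function.
struct PatchedFunction {
    bool patched = false;                 // true while the JMP is in place
    bool concurrent_call_pending = false; // simulate one call from "another thread"
    int intercepted = 0;                  // calls seen by the hook handler
    int executed = 0;                     // calls that reached the real function

    // What any caller does: lands in the hook only while the patch is present.
    void call() { if (patched) hook(); else original(); }

    void original() {
        ++executed;
        if (concurrent_call_pending) {    // second caller arrives mid-call
            concurrent_call_pending = false;
            call();
        }
    }

    void hook() {
        assert(patched);
        ++intercepted;
        patched = false;                  // pre-call: restore original bytes
        original();                       // call the real function
        patched = true;                   // post-call: write the JMP again
    }
};
```

Starting with `patched = true` and `concurrent_call_pending = true`, one `call()` results in two executions of the real function but only one interception: the call that arrived inside the unhooked window bypassed the hook.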


----------



## Unwinder (Oct 22, 2021)

Hi Jim,

Thanks for replying!



Jim said:


> So it seems as though detours is designed specifically expecting everyone else to use detours or at least the same methodology, presumably with no uninstalling.
> 
> Would that be an accurate assessment of the situation?



Yep, you summarized the situation precisely and absolutely correctly. Microsoft Detours is designed to coexist only with other hooks based on MS Detours or on the MS Detours methodology. Combining it with other hook architectures (e.g. runtime unpatch/call/repatch based, which you used before) results in unpredictable effects (one possible outcome is that the detour handler is not called).



Jim said:


> If I'm reading you correctly, you're saying that we should follow the hook chain to the end of the chain, and install our hook there to prevent issues -- but you're also saying that if there are hooks that are designed to be uninstallable, then parts of the chain may just vanish, thus undoing our changes anyway?



Those are two different and unrelated issues. My suggestion with DetourCodeFromPointerEx, which unwinds the whole JMP chain and installs the Detour at the very end of it, solves the first issue and allows different hook architectures to be safely combined with Detours (as long as they stay installed). Uninstallable hooks are a different story with no proper automatic solution, and the problem exists even if both hooks are purely Detours based. Take the following example:



```
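// Assumed setup: g_lpMessageBox1 and g_lpMessageBox2 both initially point to MessageBoxA.

// Another Detours based app (e.g. an overlay) hooks first
DetourTransactionBegin();
DetourAttach(&(PVOID&)g_lpMessageBox1, MessageBoxDetour1);
DetourTransactionCommit();

// OBS hooks second, on top of the first hook
DetourTransactionBegin();
DetourAttach(&(PVOID&)g_lpMessageBox2, MessageBoxDetour2);
DetourTransactionCommit();

MessageBoxA(NULL, "111", "", MB_OK); // intercepted by both detours

// The first hook is uninstalled
DetourTransactionBegin();
DetourDetach(&(PVOID&)g_lpMessageBox1, MessageBoxDetour1);
DetourTransactionCommit();

MessageBoxA(NULL, "222", "", MB_OK); // MessageBoxDetour2 is no longer called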
```

We’re just installing two MessageBoxA hooks with Detours, simulating OBS hooking DirectX (MessageBoxDetour2) and some other app, also Detours based (e.g. an overlay), which hooked DirectX before OBS (MessageBoxDetour1).

The “111” message box will be correctly intercepted by both Detours. However, uninstalling MessageBoxDetour1 wipes out the whole chain, so MessageBoxDetour2 will not get called for the “222” message box.

That’s just one more possible incompatibility source specific to the Detours architecture, which you need to keep in mind after migrating to Detours. And nothing can be done about it without implementing some additional rehooking logic.
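The effect in the example above can be reproduced in a simplified, portable model. This is a sketch of the behaviour as described in this post, not of the Detours internals: `Target::entry` stands in for the first bytes of the function, `ModelDetour::saved` stands in for the bytes a hook remembers at attach time, and all names are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

using Log = std::vector<std::string>;
using Fn = std::function<void(Log&)>;

// Models the hooked function: `entry` is whatever a caller currently reaches.
struct Target { Fn entry; };

// Models one "install and forget" hook. Instances must outlive their
// attachment and must not be copied or moved while attached.
struct ModelDetour {
    std::string name;
    Fn saved;   // what `entry` held when this hook was attached

    void attach(Target& t) {
        assert(t.entry);
        saved = t.entry;
        t.entry = [this](Log& log) { log.push_back(name); saved(log); };
    }

    // Restores what was saved at attach time, without checking whether
    // another hook has been installed on top in the meantime.
    void detach(Target& t) { t.entry = saved; }
};
```

Attaching `Detour1`, then `Detour2`, and calling `entry` logs `Detour2`, `Detour1`, and the original function, in that order. After `Detour1` is detached, `entry` is the original function again and `Detour2` is silently gone from the chain.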


----------



## Jim (Oct 22, 2021)

Why would someone uninstall a hook rather than just do a passthrough? It feels like it'd be dangerous to uninstall a hook versus just leaving it and performing a straight passthrough.


----------



## Unwinder (Oct 22, 2021)

It depends on the call rate of the hooked function. For example, if we're talking about D3D9 graphics overlays and the user wants to hide the overlay, then the IDirect3DDevice9::Present hook can safely be passed through instead of being uninstalled, because it is called just once per frame. However, if you also hook IDirect3DDevice9::Release (for example to implement proper resource reference counting), some 3D applications can call it much, much more frequently (AFAIR 3D Mark 06 was the most significant example, calling ::Release up to a million times per frame). In this case even a simple passthrough is an additional performance hit, so runtime hook uninstall is preferable. The previously mentioned ID3D12CommandQueue::ExecuteCommandLists is another example of a hooked function with a rather high call rate in some 3D applications.


----------



## rcdrone (Oct 23, 2021)

Well, I originally tried Detours for D3D12 as a guess fix for random hook failures specific to that API, which would occur even with no other hooks present on the system. Ultimately, we don't know why this fixed the problem, but we're happy it did.

It had worked so well for D3D12, that we decided to make it the only hook for all platforms. I didn't realize other hooks would attempt to remove hooks, which as Jim mentions, was a benefit of the previous hook rehooking on every function call.

I'm not sure if there's a true benefit to walking the chain and setting up our JMP last. What's to stop a hook from collapsing the entire chain and making themselves the only hook? This might band-aid the current situation, but may not do anything long term except complicate our code.

Not to speak for anyone involved with OBS, but my personal preference would be for your hook to stop booting other hooks, rather than us trying to implement a subversion to your hook-booting in a way that may not even prevent hook-booting from determined hook booters. If we look at Vulkan layers, which is admittedly a debatable idea, there's a trust that hooks won't go out of their way to sabotage each other, and try to play nice. Maybe this is more idealist than practical.

Off-topic stuff:
I don't know about D3D9, but we've been foiled in our attempt to do proper ref counting for DXGI swap chain Release on D3D12 because MS seems to inflate the swap chain ref count in a way that makes it never drop to 0 if the application releases it when the hook is active. The same pattern works on D3D11 though, so we can clean up better for that API.

Are there really applications that call ID3D12CommandQueue::ExecuteCommandLists frequently? I can imagine command lists containing many commands, but applications are shooting themselves in the foot if they are performance-critical, and doing a ton of submits. Similarly, resource ref counts should rarely change on a per-frame basis for a properly written application. Was 3D Mark 06 designed to test the overhead of COM virtual functions? Heh.


----------



## Unwinder (Oct 23, 2021)

rcdrone said:


> I'm not sure if there's a true benefit to walking the chain and setting up our JMP last. What's to stop a hook from collapsing the entire chain and making themselves the only hook? This might band-aid the current situation, but may not do anything long term except complicate our code.
> 
> Not to speak for anyone involved with OBS, but my personal preference would be for your hook to stop booting other hooks, rather than us trying to implement a subversion to your hook-booting in a way that may not even prevent hook-booting from determined hook booters.



Well, “my hook booting other hooks” is not the case; the situation is the opposite in reality. RTSS is able to attach to a properly working hook chain with _any_ combination of other hooks installed before it, and the architecture of the previously installed hooks is absolutely not important in this case, thanks to this approach of unwinding the JMP chain and injecting the hook at the very end of it. The problems start when you’re trying to get into the chain and your new Detours based hook tries to install itself on top of an existing hook, which fails due to Detours’ attempt to perform an unsafe JMP overwrite.

When you’re setting a new hook, you cannot assume that the previously installed hook is also based on the Detours library. That’s the reality: there are dozens of hook engines used in different software products on the market, and you need to coexist with them somehow. For that reason you should _never_ overwrite a JMP injected by another hook installed into the chain before you, because you know nothing about the architecture of the previously installed hook. Detours uses an “install and forget about it” paradigm for hooks, but many third party hook engines use more advanced approaches with hook integrity control: they use runtime disassembling to verify that the hook they installed is still in the chain, use protective mechanisms aimed at preventing race conditions in case their hook is overwritten by a third party hook, etc. Overwriting another hook’s JMP entry point on a non-Detours based hook, like you do with Detours now, will cause unpredictable results, trigger such safety mechanisms, or just crash/deadlock the application.

So the benefits of the JMP chain unwinding approach are pretty important to me: it guarantees that you never conflict with a hook of unknown architecture that was installed into the target function before you. I have used this JMP chain unwinding approach since the first versions of RTSS in 2005, and most complex hook engines use the same technique to build safe hook chains.

Anyway, if you don’t find compatibility with third party hooks important and prefer to have compact code with the default Detours implementation, that’s entirely your choice. But by implementing such a JMP chain unwinding mode you could really shorten the list of conflicting applications, if not make the list completely empty.

Migrating to Detours for the hook implementation is a trivial task that doesn’t take much time, so if you have no interest in implementing compatibility tweaks like that, then as a solution for my users who wish to use both RTSS and OBS I’ll simply add a compatibility option to RTSS allowing it to use either Detours or its own hooking engine, depending on user preference.




rcdrone said:


> Are there really applications that call ID3D12CommandQueue::ExecuteCommandLists frequently? I can imagine command lists containing many commands, but applications are shooting themselves in the foot if they are performance-critical, and doing a ton of submits.




GPU vendors recommend that DX12 applications target 5-10 submits (ID3D12CommandQueue::ExecuteCommandLists calls) per frame. Properly written apps follow that recommendation: I tried a few modern D3D12 games that I currently have installed and see something close to it, 5-30 submits per frame. But I’ve seen some of the first DX12 games launched in 2016 that performed a few hundred calls on each frame. I cannot remember which one it was, probably Hitman DX12.


----------



## rcdrone (Oct 25, 2021)

> Well, “my hook booting other hooks” is not the case; the situation is the opposite in reality.


I'm referring to this, which is your own explanation of the issue. I'm not trying to paint your hook in an unfair light.


> For example RTSS is the third case: it will restore the original JMP inside the hook handler if it detects that it was relocated. So it will effectively kick the OBS Detour out of the hook chain if you inject it after RTSS.


I think we would appreciate this.


> Migrating to Detours for the hook implementation is a trivial task that doesn’t take much time, so if you have no interest in implementing compatibility tweaks like that, then as a solution for my users who wish to use both RTSS and OBS I’ll simply add a compatibility option to RTSS allowing it to use either Detours or its own hooking engine, depending on user preference.



If the provided code snippet really is helpful though, maybe you can convince Microsoft to take it rather than hook novices like us? Their GitHub seems to respond to issues and accept pull requests:








GitHub - microsoft/Detours (github.com): Detours is a software package for monitoring and instrumenting API calls on Windows. It is distributed in source code form.




I'd feel better about incorporating a different hook algorithm if it was vetted by other hook experts, and made part of the library.

Btw, I looked at the documentation for DetourCodeFromPointer, and it's not about JMP chains. Their explanation is this.


> When a binary is statically linked against a DLL, pointers to functions from the DLL often point not to the code from the DLL, but to a jump statement in the binary's import table. DetourCodeFromPointer returns the address of the actual target function, not the jump statement.





> But I’ve seen some of the first DX12 games launched in 2016 that performed a few hundred calls on each frame. I cannot remember which one it was, probably Hitman DX12.



Free GDC Vault has video/slides of the Hitman DX12 porting talk, and it sucks that the "Multithreading" slide is skipped in the video. I wonder if they increased submit granularity to try to reduce latency.


----------



## Unwinder (Oct 25, 2021)

rcdrone said:


> I'm referring to this, which is your own explanation of the issue. I'm not trying to paint your hook in an unfair light.



No problem, I just clarified the nature of issue.



rcdrone said:


> I think we would appreciate this.



Already done. Enabling this compatibility option and switching RTSS to the Detours hooking model allows both applications’ hooks to coexist.



rcdrone said:


> If the provided code snippet really is helpful though, maybe you can convince Microsoft to take it rather than hook novices like us? Their GitHub seems to respond to issues and accept pull requests:
> 
> 
> 
> ...



Good idea, I'll try that. But even in case of success, Detours updates are not released very frequently, so it may take months if not years for it to get published.



rcdrone said:


> Btw, I looked at the documentation for DetourCodeFromPointer, and it's not about JMP chains. Their explanation is this.



It does a bit more than the documentation says; check the DetourCodeFromPointer implementation in detours.cpp. It performs two things:

1. It skips the IAT jump, as the documentation says: jmp [(+)imm32] (FF 25 XX XX XX XX opcode).
2. Then it skips the JMP chain specific to a hotpatch hook: a sequence of jmp +imm8 (EB XX opcode) -> jmp +imm32 (E9 XX XX XX XX opcode) or jmp [(+)imm32] (FF 25 XX XX XX XX).

So step 2 does indeed try to skip another hook's jump chain, but it does so for a single installed hook only, and only for a hotpatch hook starting with a short jump.

DetourCodeFromPointerEx, which I offered, simply extends step 2 by making it recursive and supporting hooks that start with long jumps. So in addition to the original DetourCodeFromPointer functionality, it also recursively skips any sequence of jmp +imm32 (E9 XX XX XX XX opcode) and jmp [(+)imm32] (FF 25 XX XX XX XX).
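The hotpatch case from step 2 can be traced on a byte buffer to see why a hook that starts with a long jump is not covered. This is a hedged sketch of the behaviour as described in this post, not a copy of detours.cpp: offsets stand in for addresses, only the `EB` -> `E9` sequence is modelled (the `FF 25` variants are left out), and `read_rel32` and `skip_hotpatch_jmp` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Decodes a little-endian signed 32-bit immediate starting at offset `at`.
std::int32_t read_rel32(const std::vector<std::uint8_t>& code, std::size_t at)
{
    assert(at + 4 <= code.size());
    std::uint32_t raw = 0;
    for (int i = 3; i >= 0; --i)
        raw = (raw << 8) | code[at + i];
    return static_cast<std::int32_t>(raw);
}

// If `offset` holds a short `jmp +imm8` (EB), follows it; if that lands on a
// `jmp +imm32` (E9), follows that one jump as well. Anything that does not
// start with a short jump is returned unchanged.
std::size_t skip_hotpatch_jmp(const std::vector<std::uint8_t>& code, std::size_t offset)
{
    const std::int64_t size = static_cast<std::int64_t>(code.size());
    if (offset + 2 > code.size() || code[offset] != 0xEB)
        return offset;
    // rel8 is signed and relative to the end of the 2-byte instruction.
    std::int64_t mid = static_cast<std::int64_t>(offset) + 2 +
                       static_cast<std::int8_t>(code[offset + 1]);
    if (mid < 0 || mid >= size)
        return offset;            // short jump leaves the model buffer
    std::size_t m = static_cast<std::size_t>(mid);
    if (m + 5 > code.size() || code[m] != 0xE9)
        return m;                 // short jump only, no long jump behind it
    std::int64_t end = mid + 5 + read_rel32(code, m + 1);
    if (end < 0 || end >= size)
        return m;
    return static_cast<std::size_t>(end);
}
```

In a buffer laid out like a hotpatched function (a long jump in the five padding bytes at offset 0, and `EB F9` at the entry at offset 5), starting at offset 5 follows both jumps, while starting directly at the long jump at offset 0 returns 0 unchanged. That second case is the gap the recursive variant is meant to close.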


----------

