Implementation of the lock keyword in C# / 秋梦无痕

From: Flier's Sky@blogcn

I just came across a Microsoft-internal discussion of how the C# lock keyword is implemented, quoted in the post "How is lock keyword of C# implemented?".

Subject: RE: How is lock keyword of C# implemented?
At the core, it’s typically one “lock cmpxchg” instruction (for x86) for entry, and one for exit, plus a couple dozen other instructions, all in user mode. The lock prefix is replaced with a nop on uniprocessor machines.
The “lock cmpxchg” instruction basically stores the locking thread’s id in the object header, so another thread that tries to lock the same object can see that it’s already locked.
The actual implementation is a lot more complicated, of course – we use the object header for other purposes, for example, so this must be detected and dealt with, plus when a thread leaves the lock, we must detect whether other threads are waiting and so on…
Thanks

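This reminded me of the critical section implementation I analyzed a couple of days ago, so I took a look at the corresponding code in rotor and found that it follows essentially the same approach as critical sections in Windows.
In rotor, every reference object is internally an instance of the Object class (sscli\clr\src\vm\object.h:126). Object synchronization is implemented through the SyncBlock structure, which is reached via the ObjHeader object (sscli\clr\src\vm\syncblk.h:539) bound to each Object. The layout resembles the way Delphi handles its VMT: rotor stores the bound ObjHeader at offset -4 from the Object pointer, while Delphi keeps the VMT table at negative offsets.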

class Object
{
    //...

    // Access the ObjHeader which is at a negative offset on the object (because of
    // cache lines)
    ObjHeader *GetHeader()
    {
        return ((ObjHeader *) this) - 1;
    }

    // retrieve or allocate a sync block for this object
    SyncBlock *GetSyncBlock()
    {
        return GetHeader()->GetSyncBlock();
    }

    //...
};
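
The negative-offset trick can be tried outside the runtime. The following is only a minimal sketch under stated assumptions: DemoHeader, DemoObject, AllocateDemoObject and FreeDemoObject are hypothetical names that do not exist in rotor, and the real ObjHeader carries more than a single 32-bit field. It allocates a 4-byte header immediately in front of the object and recovers it from the object pointer, which is the same idea as Object::GetHeader().

```cpp
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical stand-in for ObjHeader: one 32-bit word in front of the object.
struct DemoHeader {
    std::uint32_t bits = 0;
};

// Hypothetical stand-in for Object.
struct DemoObject {
    std::int32_t payload = 0;

    // Step back sizeof(DemoHeader) bytes from the object pointer,
    // mirroring the "- 1" pointer arithmetic in Object::GetHeader().
    DemoHeader* GetHeader() {
        unsigned char* self = reinterpret_cast<unsigned char*>(this);
        return reinterpret_cast<DemoHeader*>(self - sizeof(DemoHeader));
    }
};

static_assert(sizeof(DemoHeader) == 4, "the sketch assumes a 4-byte header");
static_assert(sizeof(DemoHeader) % alignof(DemoObject) == 0,
              "the object must stay aligned when placed right after the header");

// Allocate header and object in one contiguous block; return the object pointer.
inline DemoObject* AllocateDemoObject() {
    void* raw = ::operator new(sizeof(DemoHeader) + sizeof(DemoObject));
    DemoHeader* header = new (raw) DemoHeader();
    return new (header + 1) DemoObject();
}

// Release the block through the header address, which is where it starts.
inline void FreeDemoObject(DemoObject* obj) {
    DemoHeader* header = obj->GetHeader();
    obj->~DemoObject();
    header->~DemoHeader();
    ::operator delete(header);
}
```

Callers only ever see the DemoObject pointer; the header is reachable from it but never passed around on its own, which is what lets the runtime attach locking state to any object without changing the object's visible layout.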

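The ObjHeader::GetSyncBlock method (syncblk.cpp:1206) fetches a SyncBlock object from the cache or creates a new one. SyncBlock is a lazily created, cacheable structure; the actual locking of the object is done by calling into its Monitor.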

// this is a lazily created additional block for an object which contains
// synchronization information and other "kitchen sink" data
class SyncBlock
{
    //...

    AwareLock m_Monitor; // the actual monitor

    void EnterMonitor()
    {
        m_Monitor.Enter();
    }

    //...
};
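
To make the "lazily created" part concrete, here is a rough sketch of a get-or-create operation. It is a simplification and not rotor's scheme: DemoSyncBlock and DemoLazyHeader are hypothetical names, and the header here holds a direct pointer, with no cache or recycling of blocks. Only the idea that an object pays for its synchronization block on first use is taken from the text above.

```cpp
#include <atomic>

// Hypothetical stand-in for SyncBlock; the field is only a placeholder.
struct DemoSyncBlock {
    int placeholder = 0;
};

// Hypothetical header that creates its sync block on first request.
class DemoLazyHeader {
public:
    ~DemoLazyHeader() { delete block_.load(); }

    // Return the existing block, or publish a new one with a single CAS.
    DemoSyncBlock* GetSyncBlock() {
        DemoSyncBlock* existing = block_.load();
        if (existing != nullptr)
            return existing;

        DemoSyncBlock* fresh = new DemoSyncBlock();
        // existing is nullptr here; on failure it receives the winner's pointer.
        if (block_.compare_exchange_strong(existing, fresh))
            return fresh;

        delete fresh;      // another thread published first; use its block
        return existing;
    }

    bool HasSyncBlock() const { return block_.load() != nullptr; }

private:
    std::atomic<DemoSyncBlock*> block_{nullptr};
};
```

Objects that are never locked never allocate a block in this sketch, which is the point of creating it lazily.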

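AwareLock is a lightweight synchronization object much like a critical section. Its Enter method (syncblk.cpp:1413) uses FastInterlockCompareExchange to try to acquire the monitor. If the monitor cannot be acquired, it checks whether the owning thread is the current thread: if so, it takes the recursive-lock path; otherwise it registers itself as a waiter by adding 2 to m_MonitorHeld and waits for the lock state of the object to change.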

Thread *pCurThread = GetThread();
for (;;) {
    // Read existing lock state.
    LONG state = m_MonitorHeld;
    if (state == 0) {
        // Common case: lock not held, no waiters. Attempt to acquire lock by
        // switching lock bit.
        if (FastInterlockCompareExchange((LONG*)&m_MonitorHeld, 1, 0) == 0)
            break;
    } else {
        // It's possible to get here with waiters but no lock held, but in this
        // case a signal is about to be fired which will wake up a waiter. So
        // for fairness sake we should wait too.
        // Check first for recursive lock attempts on the same thread.
        if (m_HoldingThread == pCurThread)
            goto Recursion;
        // Attempt to increment this count of waiters then goto contention
        // handling code.
        if (FastInterlockCompareExchange((LONG*)&m_MonitorHeld, state + 2, state) == state)
            goto MustWait;
    }
}
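
The state encoding in this loop (bit 0 is the lock bit, each waiter adds 2) can be reproduced with std::atomic. The sketch below is an approximation under assumptions, not rotor's code: DemoAwareLock and its members are hypothetical names, the goto labels become ordinary control flow, and a waiting thread simply yields and retries, whereas the excerpt's comments indicate that the real implementation signals a waiter.

```cpp
#include <atomic>
#include <thread>

// Hypothetical re-creation of the m_MonitorHeld protocol:
// bit 0 = lock held, remaining bits = 2 * number of waiters.
class DemoAwareLock {
public:
    void Enter() {
        const std::thread::id self = std::this_thread::get_id();
        for (;;) {
            long state = state_.load();
            if (state == 0) {
                // Common case: not held, no waiters. Try to set the lock bit.
                long expected = 0;
                if (state_.compare_exchange_strong(expected, 1))
                    break;
            } else {
                // Recursive acquisition by the current owner.
                if (holder_.load() == self) {
                    ++recursion_;
                    return;
                }
                // Register as a waiter (+2), then go to the contention path.
                if (state_.compare_exchange_strong(state, state + 2)) {
                    WaitForLock();
                    break;
                }
            }
        }
        holder_.store(self);
        recursion_ = 1;
    }

    void Leave() {
        if (--recursion_ > 0)
            return;                        // still held recursively
        holder_.store(std::thread::id());  // clear the owner before releasing
        state_.fetch_sub(1);               // clear the lock bit, keep waiter count
    }

    long RawState() const { return state_.load(); }

private:
    // Assumption: spin with yield instead of blocking on an event.
    void WaitForLock() {
        for (;;) {
            long state = state_.load();
            if ((state & 1) == 0) {
                // Trade our waiter count (-2) for the lock bit (+1) in one CAS.
                if (state_.compare_exchange_weak(state, state - 1))
                    return;
            } else {
                std::this_thread::yield();
            }
        }
    }

    std::atomic<long> state_{0};
    std::atomic<std::thread::id> holder_{std::thread::id()};
    int recursion_ = 0;  // only touched by the thread that holds the lock
};
```

Because the lock bit and the waiter count share one word, a single compare-and-exchange can both observe contention and record it, which is why the uncontended path costs one interlocked instruction on entry and one on exit.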

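As you can see, the approach here is essentially the same as that of a critical section.
FastInterlockCompareExchange (util.hpp:66) is where the lock cmpxchg instruction mentioned in the Microsoft discussion is actually issued. Depending on build options, this function is mapped to CompareExchangeUP or CompareExchangeMP, which handle the uniprocessor and multiprocessor cases respectively; see InitFastInterlockOps (cgenx86.cpp:2106) in vm\i386\cgenx86.cpp, which switches to the MP version when more than one processor is present. On the 386 platform both functions are written entirely in assembly (i386\asmhelpers.asm:366, 440).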

CmpXchgOps FastInterlockCompareExchange = (CmpXchgOps)CompareExchangeUP;

// Adjust the generic interlocked operations for any platform specific ones we
// might have.
void InitFastInterlockOps()
{
    _ASSERTE(g_SystemInfo.dwNumberOfProcessors != 0);
    if (g_SystemInfo.dwNumberOfProcessors != 1)
    {
        //...
        FastInterlockCompareExchange = (CmpXchgOps)CompareExchangeMP;
        //...
    }
}

FASTCALL_FUNC CompareExchangeUP,12
        _ASSERT_ALIGNED_4_X86 ecx
        mov     eax, [esp+4]        ; Comparand
        cmpxchg [ecx], edx
        retn    4                   ; result in EAX
FASTCALL_ENDFUNC CompareExchangeUP

FASTCALL_FUNC CompareExchangeMP,12
        _ASSERT_ALIGNED_4_X86 ecx
        mov     eax, [esp+4]        ; Comparand
        lock cmpxchg [ecx], edx
        retn    4                   ; result in EAX
FASTCALL_ENDFUNC CompareExchangeMP
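
For readers who do not read x86 assembly, the contract of the two routines and the function-pointer dispatch can be sketched in portable C++. Everything named Demo... and SelectCmpXchg is hypothetical. Portable C++ cannot drop the lock prefix, so this sketch makes one assumption that differs from rotor: both variants use the same std::atomic operation, and only the selection logic and the return-value convention (the previous value of the destination, as left in EAX by cmpxchg) are illustrated.

```cpp
#include <atomic>

// Hypothetical pointer type, playing the role of CmpXchgOps.
using DemoCmpXchg = long (*)(std::atomic<long>* dest, long exchange, long comparand);

// If *dest == comparand, store exchange. Always return the value *dest held
// before the call, which is what cmpxchg leaves in EAX.
inline long DemoCompareExchange(std::atomic<long>* dest, long exchange, long comparand) {
    long observed = comparand;
    // On failure compare_exchange_strong writes the current value into observed;
    // on success observed keeps the comparand, which was the old value.
    dest->compare_exchange_strong(observed, exchange);
    return observed;
}

// Stand-ins for the UP and MP entry points (identical bodies in this sketch).
inline long DemoCompareExchangeUP(std::atomic<long>* dest, long exchange, long comparand) {
    return DemoCompareExchange(dest, exchange, comparand);
}

inline long DemoCompareExchangeMP(std::atomic<long>* dest, long exchange, long comparand) {
    return DemoCompareExchange(dest, exchange, comparand);
}

// Same decision as InitFastInterlockOps: use MP unless exactly one processor.
inline DemoCmpXchg SelectCmpXchg(unsigned processorCount) {
    return processorCount != 1 ? &DemoCompareExchangeMP : &DemoCompareExchangeUP;
}
```

A caller detects success by comparing the return value with the comparand it passed in, which is how the Enter loop tests `== 0` and `== state`.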

It is worth noting the remark in that discussion that "The lock prefix is replaced with a nop on uniprocessor machines"; according to rain's analysis, the core NT DLLs apply a similar optimization.