2.4.0.1 hang: memory dump, analysis, and suggestions
[b]1. Environment:[/b] TW: multilingual 2.4.0.1, zip version
OS: Windows Server 2003 English edition, 5.2.3790, SP2
IE 7.0.5730.13 (128-bit cipher)
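System memory > 4 GB
[b]2. Symptoms[/b]
The browser stops responding while opening or browsing pages. There are usually 30 to 50 pages open, and no particular page triggers it. I tried a fresh install directory, re-registering the IE-related DLLs, and so on, but the problem still shows up at unpredictable times. In the end the only option is to kill the process.
[b]3. Memory dump[/b]
When the problem recurred, I attached WinDbg to the process and ran .dump /ma. The relevant output is roughly as follows:[code]...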
0:000> .ecxr
...
0:000> !analyze -v -hang
...
FAULTING_IP:
+0
00000000 ?? ???
EXCEPTION_RECORD: ffffffff -- (.exr 0xffffffffffffffff)
ExceptionAddress: 00000000
ExceptionCode: 80000007 (Wake debugger)
ExceptionFlags: 00000000
NumberParameters: 0
BUGCHECK_STR: 80000007
PROCESS_NAME: TheWorld.exe
ERROR_CODE: (NTSTATUS) 0x80000007 - {Kernel Debugger Awakened} the system debugger was awakened by an interrupt.
EXCEPTION_CODE: (HRESULT) 0x80000007 (2147483655) - Operation aborted
NTGLOBALFLAG: 0
APPLICATION_VERIFIER_FLAGS: 0
LOADERLOCK_BLOCKED_API: GetModuleFileNameW:LdrLockLoaderLock:
DERIVED_WAIT_CHAIN:
Dl Eid Cid WaitType
-- --- ------- --------------------------
5 e60.e70 Critical Section
WAIT_CHAIN_COMMAND: ~5s;k;;
BLOCKING_THREAD: 00000e70
DEFAULT_BUCKET_ID: APPLICATION_HANG_Orphaned_CriticalSection
PRIMARY_PROBLEM_CLASS: APPLICATION_HANG_Orphaned_CriticalSection
LAST_CONTROL_TRANSFER: from 7c827d0b to 7c8285ec
FAULTING_THREAD: 00000005
STACK_TEXT:
03a4f268 7c827d0b 7c83d236 00000148 00000000 ntdll!KiFastSystemCallRet
03a4f26c 7c83d236 00000148 00000000 00000000 ntdll!NtWaitForSingleObject+0xc
03a4f2a8 7c83d281 00000148 00000004 00000001 ntdll!RtlpWaitOnCriticalSection+0x1a3
03a4f2c8 7c82d243 7c8877a0 4b3c0000 7c8877ec ntdll!RtlEnterCriticalSection+0xa8
03a4f2fc 77e63cd8 00000001 00000000 03a4f338 ntdll!LdrLockLoaderLock+0xe4
03a4f35c 77390f3d 4b3c0000 03a4f3b0 00000104 kernel32!GetModuleFileNameW+0x77
03a4f370 77390f13 4b3c0000 03a4f3b0 00000104 user32!WowGetModuleFileName+0x14
03a4f5bc 7738fe09 4b3c0000 00000160 00000003 user32!LoadIcoCur+0x76
03a4f5e4 773a0bc7 4b3c0000 00000160 00000001 user32!LoadImageW+0x7c
03a4f604 4b3f7906 4b3c0000 00000160 00000001 user32!LoadImageA+0x6c
03a4f624 4b3c99b8 1b9ff500 03a4f660 03a95700 MSCTF!CLBarItemWin32IME::GetIcon+0x24
03a4f658 4b3c9a0d 0d89c540 04de0278 03a4f8d8 MSCTF!CStubITfLangBarItemButton::stub_GetIcon+0x1b
03a4f668 4b3c93b0 04de0278 0001028c 00000002 MSCTF!CStubITfLangBarItemButton::Invoke+0x14
03a4f8d8 4b3c922d 00000ab8 00000028 03a4fa7c MSCTF!HandleSendReceiveMsg+0x171
03a4fa04 7739b6e3 0001028c 0000c0ae 00000ab8 MSCTF!CicMarshalWndProc+0x161
03a4fa30 7739b874 4b3c6b73 0001028c 0000c0ae user32!InternalCallWinProc+0x28
03a4faa8 7739ba92 00000000 4b3c6b73 0001028c user32!UserCallWinProcCheckWow+0x151
03a4fb10 7739bad0 03a4ff58 00000000 03a4ff84 user32!DispatchMessageWorker+0x327
03a4fb20 00496cb9 03a4ff58 00000000 02da9248 user32!DispatchMessageW+0xf
WARNING: Stack unwind information not available. Following frames may be wrong.
03a4ff84 77bcb530 02dc1478 00000000 00000000 TheWorld+0x96cb9
03a4ffb8 77e64829 014e1308 00000000 00000000 msvcrt!_endthreadex+0xa3
03a4ffec 00000000 77bcb4bc 014e1308 00000000 kernel32!BaseThreadStart+0x34
FOLLOWUP_IP:
MSCTF!CLBarItemWin32IME::GetIcon+24
4b3f7906 8b4d0c mov ecx,dword ptr [ebp+0Ch]
SYMBOL_STACK_INDEX: a
SYMBOL_NAME: MSCTF!CLBarItemWin32IME::GetIcon+24
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: MSCTF
IMAGE_NAME: MSCTF.dll
DEBUG_FLR_IMAGE_TIMESTAMP: 45d70ab2
STACK_COMMAND: ~5s ; kb
BUCKET_ID: 80000007_MSCTF!CLBarItemWin32IME::GetIcon+24
FAILURE_BUCKET_ID: APPLICATION_HANG_Orphaned_CriticalSection_80000007_MSCTF.dll!CLBarItemWin32IME::GetIcon
Followup: MachineOwner[/code]Further lock analysis:[code]0:000> !locks
CritSec ntdll!LdrpLoaderLock+0 at 7c8877a0
WaiterWoken No
LockCount 6
RecursionCount 1
OwningThread d6c
EntryCount 0
ContentionCount 2c9
*** Locked
CritSec mshtml!g_csTimerMan+0 at 43599128
WaiterWoken No
LockCount 0
RecursionCount 1
OwningThread d6c
EntryCount 0
ContentionCount 0
*** Locked
Scanned 3578 critical sections
0:000> ~~[d6c]
^ Illegal thread error in '~~[d6c]'
0:000>~
...[/code]No thread with tid=d6c exists at all. The earlier !analyze -v -hang had already given the result:
[b]
DEFAULT_BUCKET_ID: APPLICATION_HANG_Orphaned_CriticalSection
[/b]
Looking further:[code]0:000> ~5s
eax=fffe0000 ebx=00000000 ecx=00000000 edx=00000000 esi=7c8877a0 edi=00000000
eip=7c8285ec esp=03a4f26c ebp=03a4f2a8 iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246
ntdll!KiFastSystemCallRet:
7c8285ec c3 ret
0:005> kv 100
ChildEBP RetAddr Args to Child
03a4f268 7c827d0b 7c83d236 00000148 00000000 ntdll!KiFastSystemCallRet (FPO: [0,0,0])
03a4f26c 7c83d236 00000148 00000000 00000000 ntdll!NtWaitForSingleObject+0xc (FPO: [3,0,0])
03a4f2a8 7c83d281 00000148 00000004 00000001 ntdll!RtlpWaitOnCriticalSection+0x1a3 (FPO: [2,7,4])
03a4f2c8 7c82d243 7c8877a0 4b3c0000 7c8877ec ntdll!RtlEnterCriticalSection+0xa8 (FPO: [1,1,0])
03a4f2fc 77e63cd8 00000001 00000000 03a4f338 ntdll!LdrLockLoaderLock+0xe4 (FPO: [SEH])
03a4f35c 77390f3d 4b3c0000 03a4f3b0 00000104 kernel32!GetModuleFileNameW+0x77 (FPO: [SEH])
03a4f370 77390f13 4b3c0000 03a4f3b0 00000104 user32!WowGetModuleFileName+0x14 (FPO: [3,0,0])
03a4f5bc 7738fe09 4b3c0000 00000160 00000003 user32!LoadIcoCur+0x76 (FPO: [6,139,4])
03a4f5e4 773a0bc7 4b3c0000 00000160 00000001 user32!LoadImageW+0x7c (FPO: [6,0,0])
03a4f604 4b3f7906 4b3c0000 00000160 00000001 user32!LoadImageA+0x6c (FPO: [6,0,0])
03a4f624 4b3c99b8 1b9ff500 03a4f660 03a95700 MSCTF!CLBarItemWin32IME::GetIcon+0x24 (FPO: [2,0,0])
03a4f658 4b3c9a0d 0d89c540 04de0278 03a4f8d8 MSCTF!CStubITfLangBarItemButton::stub_GetIcon+0x1b (FPO: [2,6,4])
03a4f668 4b3c93b0 04de0278 0001028c 00000002 MSCTF!CStubITfLangBarItemButton::Invoke+0x14 (FPO: [1,0,0])
03a4f8d8 4b3c922d 00000ab8 00000028 03a4fa7c MSCTF!HandleSendReceiveMsg+0x171 (FPO: [SEH])
03a4fa04 7739b6e3 0001028c 0000c0ae 00000ab8 MSCTF!CicMarshalWndProc+0x161 (FPO: [4,68,4])
03a4fa30 7739b874 4b3c6b73 0001028c 0000c0ae user32!InternalCallWinProc+0x28
03a4faa8 7739ba92 00000000 4b3c6b73 0001028c user32!UserCallWinProcCheckWow+0x151 (FPO: [SEH])
03a4fb10 7739bad0 03a4ff58 00000000 03a4ff84 user32!DispatchMessageWorker+0x327 (FPO: [SEH])
03a4fb20 00496cb9 03a4ff58 00000000 02da9248 user32!DispatchMessageW+0xf (FPO: [1,0,0])
WARNING: Stack unwind information not available. Following frames may be wrong.
03a4ff84 77bcb530 02dc1478 00000000 00000000 TheWorld+0x96cb9
03a4ffb8 77e64829 014e1308 00000000 00000000 msvcrt!_endthreadex+0xa3 (FPO: [SEH])
03a4ffec 00000000 77bcb4bc 014e1308 00000000 kernel32!BaseThreadStart+0x34 (FPO: [SEH])[/code]A fairly typical Orphaned Critical Section.
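The reasoning behind that verdict can be written down as a small check: a critical section counts as orphaned when it is still locked but its OwningThread is not among the live threads of the process. The sketch below is only an illustration of that rule. `LockInfo`, `is_orphaned` and `loader_lock_orphaned_in_dump` are made-up names, and the live-thread list contains only the one thread ID (e70) that is visible in the output above, because the full `~` listing is elided.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// The two fields of a !locks entry that matter for this check (hypothetical struct).
struct LockInfo {
    bool locked;                  // "*** Locked" was printed for the entry
    std::uint32_t owning_thread;  // OwningThread, as a thread ID
};

// A lock is orphaned when it is held by a thread ID that no longer exists.
inline bool is_orphaned(const LockInfo& lock,
                        const std::vector<std::uint32_t>& live_tids) {
    if (!lock.locked) return false;
    return std::find(live_tids.begin(), live_tids.end(),
                     lock.owning_thread) == live_tids.end();
}

// Applies the rule to the loader lock from the dump.
inline bool loader_lock_orphaned_in_dump() {
    LockInfo loader_lock{true, 0xd6c};  // ntdll!LdrpLoaderLock, OwningThread d6c
    // Assumption: only e70 (the blocked thread 5) is known from the output;
    // the remaining live thread IDs are not shown in the post.
    std::vector<std::uint32_t> live_tids{0xe70};
    return is_orphaned(loader_lock, live_tids);
}
```

With the values above, `loader_lock_orphaned_in_dump()` reports the loader lock as orphaned, which matches what `~~[d6c]` failing with "Illegal thread error" already suggested.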
[[i] Last edited by backfire on 2009-1-22 13:33 [/i]]
[b]4. Analysis[/b]
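One pattern that could cause this is shown below:[code] ...
CRITICAL_SECTION g_cs;
DWORD WINAPI ThreadProc(LPVOID lpParam)
{
    ...
    try
    {
        EnterCriticalSection(&g_cs);
        3rdPartyCode(lpParam);        // if this throws, the next line is never reached
        LeaveCriticalSection(&g_cs);
    }
    catch(...)
    {
        ...                           // g_cs is still held here
    }
    ...
}[/code]If 3rdPartyCode() (here most likely an IE engine API) fails and exits abnormally, the try/catch that appears to offer protection does nothing for the critical section: LeaveCriticalSection is skipped, the lock stays held, and once the owning thread exits the critical section is orphaned.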
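To make the failure mode concrete, here is a minimal portable sketch of the same shape as the ThreadProc above. It is not TW code. `ToyLock` is a hypothetical stand-in for CRITICAL_SECTION that only records whether it is held, and `third_party_call` is a hypothetical function that always throws.

```cpp
#include <atomic>
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for a CRITICAL_SECTION: it only records whether it is held.
struct ToyLock {
    std::atomic<bool> held{false};
    void enter() { held.store(true); }
    void leave() { held.store(false); }
};

// Hypothetical stand-in for the third-party call; it always fails.
inline void third_party_call() {
    throw std::runtime_error("third-party failure");
}

// Same shape as the ThreadProc in the post: enter, call, leave, all inside try.
inline void unguarded_worker(ToyLock& lock) {
    try {
        lock.enter();
        third_party_call();  // throws, so leave() below is skipped
        lock.leave();
    } catch (...) {
        // The exception is swallowed, but nothing here releases the lock.
    }
}

// Returns true if the lock is still held after the worker has returned.
inline bool lock_leaks_after_exception() {
    ToyLock lock;
    unguarded_worker(lock);
    return lock.held.load();
}
```

In this sketch `lock_leaks_after_exception()` returns true: the worker returns normally, yet the lock is never released. With a real critical section, every other thread that later tries to enter it blocks, as thread 5 does in the stack above.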
[b]5. Suggestions[/b]
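Use a critical-section guard class, as follows:[code]
// RAII guard: enters the critical section on construction, leaves it on destruction.
class CCriticalSection
{
public:
    explicit CCriticalSection(CRITICAL_SECTION* pCs) : m_pCs(pCs)
    {
        if (m_pCs)
            EnterCriticalSection(m_pCs);
    }
    ~CCriticalSection()
    {
        if (m_pCs)
            LeaveCriticalSection(m_pCs);
    }
private:
    // Not copyable: a copy would leave the critical section twice.
    CCriticalSection(const CCriticalSection&);
    CCriticalSection& operator=(const CCriticalSection&);
    CRITICAL_SECTION* m_pCs;
};[/code]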
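The rewritten thread function is as follows:[code] DWORD WINAPI ThreadProc(LPVOID lpParam)
{
    ...
    CCriticalSection cs(&g_cs);   // released when cs goes out of scope, on every exit path
    try
    {
        //EnterCriticalSection(&g_cs);
        3rdPartyCode(lpParam);
        //LeaveCriticalSection(&g_cs);
    }
    catch(...)
    {
        ...
    }
    ...
}[/code]Another situation that can orphan a CS is calling TerminateThread() before the CS has been released; I trust the dev team would not make such a basic mistake.
One thing to note, though: Win32 API functions such as HeapAlloc take a CS internally, so even if the code's own explicit EnterCriticalSection calls are handled correctly, a thread that exits abnormally in the middle of HeapAlloc can still leave an orphaned CS.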
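For the TerminateThread() case, the usual alternative is to ask the thread to stop and let it return on its own, so that every lock it holds (explicit or implicit) is released on the way out. The sketch below illustrates the idea in portable C++ rather than Win32. `std::mutex` stands in for g_cs, `std::lock_guard` plays the role of the CCriticalSection class, and `g_stop`, `g_iterations`, `worker` and `stop_worker_cleanly` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

std::mutex g_lock;                // stand-in for g_cs
std::atomic<bool> g_stop{false};  // hypothetical stop-request flag
std::atomic<int> g_iterations{0}; // counts completed passes through the loop

// Worker loop: takes the lock for each unit of work and checks the stop
// flag only between units, i.e. at a point where no lock is held.
void worker() {
    while (!g_stop.load()) {
        std::lock_guard<std::mutex> guard(g_lock);  // released at end of iteration
        ++g_iterations;
    }
}

// Requests a stop, waits for the worker to return by itself, then takes the lock.
bool stop_worker_cleanly() {
    std::thread t(worker);
    while (g_iterations.load() == 0) {
        std::this_thread::yield();  // wait until the worker has run at least once
    }
    g_stop.store(true);  // ask, do not kill
    t.join();            // the worker returns from its own loop
    // This lock() would block forever if the worker had died while holding g_lock.
    g_lock.lock();
    g_lock.unlock();
    return g_iterations.load() > 0;
}
```

The design point is that the stop flag is checked only where the worker holds nothing, which a forced termination cannot guarantee.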
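Does the dev team use a tool such as Microsoft's Application Verifier (appverif) for checking? Hidden problems like the second kind above can be caught by appverif.
Also: if possible, could the dev team release the PDB files for TW? Or provide them to relevant people on request, under conditions? That would also make for better testing.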
[[i] Last edited by backfire on 2009-1-22 13:31 [/i]]
I didn't follow this. A legendary expert?
The latest version is 2402, did you know?
[quote]Originally posted by [i]nov.six[/i] on 2009-1-22 13:51 [url=http://bbs.ioage.com/cn/redirect.php?goto=findpost&pid=762004&ptid=76721][img]http://bbs.ioage.com/cn/images/common/back.gif[/img][/url]
The latest version is 2402, did you know? [/quote]
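I know. But several versions since 2.3.x have had problems. I kept dumps for all of them but never had time to analyze them carefully (it is also hard without PDBs). It is just that over the past few days 2.4.0.1 has been hanging almost every other day, so I took a rough look at several older dumps and this latest one, and found this problem.
I haven't used 2402 yet, but I have downloaded it.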
If this code problem doesn't exist, or has already been fixed, so much the better.
[quote]Originally posted by [i]hhzxedu[/i] on 2009-1-22 13:47 [url=http://bbs.ioage.com/cn/redirect.php?goto=findpost&pid=761997&ptid=76721][img]http://bbs.ioage.com/cn/images/common/back.gif[/img][/url]
I didn't follow this. A legendary expert? [/quote]
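I really can't claim to be an expert. I have just spent a few more years programming.
I suggest the OP contact the dev team directly.
I'm on 2.4F... it has been hanging a lot lately too.
Could it be related to the ad rules?
Thanks to the OP for the effort. I will check the code to see whether this situation exists. Browser hangs are very often caused by the IE engine and have little to do with the shell layer.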