Heap Handle Mismatches

The heap manager keeps a list of active heaps in a process. Each heap is a separate entity: its internal per-heap state is valid only within the context of that particular heap. Developers working with the heap manager must respect this separation by ensuring that memory is freed on the same heap from which it was allocated. The separation is exposed to the developer through the heap handles passed to the heap API calls. Each heap handle uniquely represents a particular heap in the list of heaps for the process. For example, the GetProcessHeap API returns a handle to the default process heap, and the HeapCreate API returns a unique handle to a newly created heap. If the uniqueness is broken, that is, if a block allocated through one handle is freed through another, heap corruption will ensue. Listing 6.9 illustrates an application that breaks the uniqueness of heaps.

Listing 6.9 Example of heap handle mismatch

#include <windows.h>
#include <stdio.h>
#include <conio.h>

#define MAX_SMALL_BLOCK_SIZE 20000

HANDLE hSmallHeap=0;
HANDLE hLargeHeap=0;

VOID* AllocMem(ULONG ulSize);
VOID FreeMem(VOID* pMem, ULONG ulSize);
BOOL InitHeaps();
VOID FreeHeaps();

int __cdecl wmain (int argc, wchar_t* pArgs[])
{
    printf("Press any key to start\n");
    _getch();

    if(InitHeaps())
    {
        BYTE* pBuffer1=(BYTE*) AllocMem(20);
        BYTE* pBuffer2=(BYTE*) AllocMem(20000);

        //
        // Use allocated memory
        //

        FreeMem(pBuffer1, 20);
        FreeMem(pBuffer2, 20000);
        FreeHeaps();
    }

    printf("Done...exiting application\n");
    return 0;
}

BOOL InitHeaps()
{
    BOOL bRet=TRUE ;

    hSmallHeap = GetProcessHeap();
    hLargeHeap = HeapCreate(0, 0, 0);
    if(!hLargeHeap)
    {
        bRet=FALSE;
    }
    return bRet;
}

VOID FreeHeaps()
{
    if(hLargeHeap)
    {
        HeapDestroy(hLargeHeap);
        hLargeHeap=NULL;
    }
}

VOID* AllocMem(ULONG ulSize)
{
    VOID* pAlloc = NULL ;

    if(ulSize<MAX_SMALL_BLOCK_SIZE)
    {
        pAlloc=HeapAlloc(hSmallHeap, 0, ulSize);
    }
    else
    {
        pAlloc=HeapAlloc(hLargeHeap, 0, ulSize);
    }
    return pAlloc;
}

VOID FreeMem(VOID* pAlloc, ULONG ulSize)
{
    if(ulSize<=MAX_SMALL_BLOCK_SIZE)
    {
        HeapFree(hSmallHeap, 0, pAlloc);
    }
    else
    {
        HeapFree(hLargeHeap, 0, pAlloc);
    }
}

The source code and binary for Listing 6.9 can be found in the following folders:
Source code: C:\AWD\Chapter6\Mismatch
Binary: C:\AWDBIN\WinXP.x86.chk\06Mismatch.exe

The application in Listing 6.9 seems pretty straightforward. The main function requests a couple of allocations using the AllocMem helper function. Once done with the allocations, it calls the FreeMem helper function to free the memory. The helper functions work with memory from either the default process heap (if the allocation is below a certain size) or a private heap (created in the InitHeaps function) if the size is at or above the threshold. If we run the application, we see that it successfully finishes execution:

C:\AWDBIN\WinXP.x86.chk\06Mismatch.exe
Press any key to start
Done...exiting application

We might be tempted to conclude that the application works as expected and sign off on it. However, before we do so, let's use Application Verifier to enable full pageheap on the application and rerun it. This time, the application never finishes. As a matter of fact, judging from the crash dialog that appears, it looks like we have a crash. To get more information on the crash, we run the application under the debugger:

.........
0:000> g
Press any key to start
(118.3c8): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
eax=0006fc54 ebx=00000000 ecx=0211b000 edx=0211b008 esi=021161e0 edi=021161e0
eip=7c96893a esp=0006fbec ebp=0006fc20 iopl=0 nv up ei ng nz ac po cy
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010293
ntdll!RtlpDphIsNormalHeapBlock+0x81:
7c96893a 8039a0 cmp byte ptr [ecx],0A0h ds:0023:0211b000=??
0:000> kb
ChildEBP RetAddr  Args to Child
0006fc20 7c96ac47 00081000 021161e0 0006fc54 ntdll!RtlpDphIsNormalHeapBlock+0x81
0006fc44 7c96ae5a 00081000 01000002 00000007 ntdll!RtlpDphNormalHeapFree+0x1e
0006fc94 7c96defb 00080000 01000002 021161e0 ntdll!RtlpDebugPageHeapFree+0x79
0006fd08 7c94a5d0 00080000 01000002 021161e0 ntdll!RtlDebugFreeHeap+0x2c
0006fdf0 7c9268ad 00080000 01000002 021161e0 ntdll!RtlFreeHeapSlowly+0x37
0006fec0 003ab9eb 00080000 00000000 021161e0 ntdll!RtlFreeHeap+0xf9
0006ff18 010012cf 00080000 00000000 021161e0 vfbasics!AVrfpRtlFreeHeap+0x16b
0006ff2c 010011d3 021161e0 00004e20 021161e0 06mismatch!FreeMem+0x1f
0006ff44 01001416 00000001 02060fd8 020daf80 06mismatch!wmain+0x53
0006ffc0 7c816fd7 00011970 7c9118f1 7ffdc000 06mismatch!wmainCRTStartup+0x12f
0006fff0 00000000 010012e7 00000000 78746341 kernel32!BaseProcessStart+0x23

From the stack trace, we can see that our application was trying to free a block of memory when the heap manager access violated. To find out which of the two memory allocations we were freeing, we unassemble the 06mismatch!wmain function and see which of the calls correlates to the return address located at 06mismatch!wmain+0x53.

0:000> u 06mismatch!wmain+0x53-10
06mismatch!wmain+0x43:
010011c3 0000            add     byte ptr [eax],al
010011c5 68204e0000      push    4E20h
010011ca 8b4df8          mov     ecx,dword ptr [ebp-8]
010011cd 51              push    ecx
010011ce e8dd000000      call    06mismatch!FreeMem (010012b0)
010011d3 e858000000      call    06mismatch!FreeHeaps (01001230)
010011d8 688c100001      push    offset 06mismatch!`string' (0100108c)
010011dd ff1550100001    call    dword ptr [06mismatch!_imp__printf (01001050)]

The return address 010011d3 follows the FreeMem call that immediately precedes the call to 06mismatch!FreeHeaps, and the size pushed for that call is 4E20h (20,000 bytes), so we know that the last FreeMem call in our code is causing the problem. We can now review the code to see if anything is wrong. From Listing 6.9, the FreeMem function frees memory either on the default process heap or on a private heap.
Furthermore, it looks like the decision depends on the size of the block. If the block size is less than or equal to 20,000 bytes (MAX_SMALL_BLOCK_SIZE), it uses the default process heap. Otherwise, the private heap is used. Our allocation was exactly 20,000 bytes, which means that the FreeMem function attempted to free the memory from the default process heap. Is this correct? One easy way to find out is to dump the pageheap block metadata, which contains a handle to the owning heap:

0:000> dt _DPH_BLOCK_INFORMATION 021161e0-0x20
   +0x000 StartStamp       : 0xabcdbbbb
   +0x004 Heap             : 0x02111000
   +0x008 RequestedSize    : 0x4e20
   +0x00c ActualSize       : 0x5000
   +0x010 FreeQueue        : _LIST_ENTRY [ 0x21 - 0x0 ]
   +0x010 TraceIndex       : 0x21
   +0x018 StackTrace       : 0x00287510
   +0x01c EndStamp         : 0xdcbabbbb

The owning heap for this heap block is 0x02111000. Next, we find out what the default process heap is:

0:000> x 06mismatch!hSmallHeap
01002008 06mismatch!hSmallHeap = 0x00080000

The two heaps do not match up, and we are faced with essentially freeing a block of memory owned by heap 0x02111000 on heap 0x00080000. This is also the reason Application Verifier broke execution, because a mismatch in heaps causes serious stability issues. Armed with the knowledge of the reason for the stop, it should now be pretty straightforward to figure out why our application mismatched the two heaps. Because we rely on size to determine which heap to allocate and free the memory on, we can quickly see that the AllocMem function uses the following conditional:

if(ulSize<MAX_SMALL_BLOCK_SIZE)
{
    pAlloc=HeapAlloc(hSmallHeap, 0, ulSize);
}

while the FreeMem function uses:

if(ulSize<=MAX_SMALL_BLOCK_SIZE)
{
    HeapFree(hSmallHeap, 0, pAlloc);
}

The allocating conditional checks that the allocation size is less than the threshold, whereas the freeing conditional checks that it is less than or equal to the threshold. Hence, when freeing an allocation of exactly 20,000 bytes, the FreeMem function incorrectly uses the default process heap, even though AllocMem allocated the block from the private heap.
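
One way to rule out this kind of disagreement is to let the allocate and free paths share a single size check. The following is a minimal sketch under that assumption, not the fix shipped with the sample: HeapKind, SelectHeap, SelectHeapAsFreeMemDid, and CheckFreeTarget are hypothetical names, and the two heaps are modeled as enum tags rather than real HANDLE values so that the decision logic can be examined in isolation.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SMALL_BLOCK_SIZE 20000

/* Hypothetical tags standing in for hSmallHeap and hLargeHeap. */
typedef enum { HEAP_SMALL, HEAP_LARGE } HeapKind;

/* Single source of truth: both the allocate and the free path call this,
   so the two paths cannot disagree about the boundary value. */
HeapKind SelectHeap(size_t size)
{
    return (size < MAX_SMALL_BLOCK_SIZE) ? HEAP_SMALL : HEAP_LARGE;
}

/* The free-side check as written in Listing 6.9 (<= instead of <),
   kept only for comparison. */
HeapKind SelectHeapAsFreeMemDid(size_t size)
{
    return (size <= MAX_SMALL_BLOCK_SIZE) ? HEAP_SMALL : HEAP_LARGE;
}

/* Free-side sanity check: the heap chosen for the free must be the heap
   that owns the block. Aborts in debug builds when it is not. */
void CheckFreeTarget(size_t size, HeapKind owner)
{
    assert(SelectHeap(size) == owner);
}
```

For a size of 20000, SelectHeap picks HEAP_LARGE on both paths, whereas the `<=` check from Listing 6.9 picks HEAP_SMALL. In the real application, AllocMem and FreeMem would map the returned tag to hSmallHeap or hLargeHeap before calling HeapAlloc or HeapFree.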
In addition to being able to analyze and get to the bottom of heap mismatch problems, another very important lesson can be learned from our exercise: Never assume that the application works correctly just because no errors are reported during a normal noninstrumented run. As you have already seen, heap corruption problems do not always surface during tests that are run without any type of debugging help. Only when a debugger is attached and Application Verifier is enabled do the problems surface. The reason is simple. In a nondebugger, non-Application Verifier run, the heap corruption still occurs but might not have enough time to surface in the form of an access violation.

Say that the test runs through scenarios A, B, and C, and the heap corruption occurs in scenario C. After the heap has been corrupted, the application exits without any sign of the heap corruption, and you are led to believe that everything is working correctly. Once the application ships and gets into the hands of customers, they run the same scenarios, albeit in a different order: C, B, and A. The first scenario run is C, which immediately causes the heap corruption, but the application does not exit; rather, it continues running with scenarios B and A, providing a much larger window for the heap corruption to actually affect the application.
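
A mismatch can also be made to surface at the point of the free, even in a noninstrumented run, by recording the owning heap alongside each block and checking it when the block is released. The sketch below illustrates that idea only; it is an assumption-laden model, not how the Windows heap manager or pageheap is implemented. BlockHeader, TaggedAlloc, OwnerOf, and TaggedFree are hypothetical names, heaps are represented by plain integer IDs, and malloc and free stand in for HeapAlloc and HeapFree so that the example is portable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical header placed in front of every block. It is inspired by the
   idea of an owner field in per-block metadata, not by any real layout. */
typedef union {
    struct {
        int    ownerHeapId;  /* ID of the heap the block was allocated from */
        size_t size;         /* size requested by the caller */
    } info;
    max_align_t align;       /* keeps the user pointer suitably aligned */
} BlockHeader;

/* Allocates size bytes and tags the block with heapId.
   malloc is a stand-in for HeapAlloc in this sketch. */
void *TaggedAlloc(int heapId, size_t size)
{
    BlockHeader *header = malloc(sizeof(BlockHeader) + size);
    if (header == NULL) {
        return NULL;
    }
    header->info.ownerHeapId = heapId;
    header->info.size = size;
    return header + 1;       /* user memory starts right after the header */
}

/* Returns the ID of the heap that owns a block returned by TaggedAlloc. */
int OwnerOf(const void *block)
{
    return ((const BlockHeader *)block - 1)->info.ownerHeapId;
}

/* Frees the block and returns 1 when heapId matches the recorded owner.
   On a mismatch the block is left untouched and 0 is returned, so the
   caller can report the error instead of corrupting a heap. */
int TaggedFree(int heapId, void *block)
{
    BlockHeader *header = (BlockHeader *)block - 1;
    if (header->info.ownerHeapId != heapId) {
        return 0;
    }
    free(header);
    return 1;
}
```

With a check like this in place, the 20,000-byte block from Listing 6.9 would be rejected when freed against the wrong heap ID, regardless of the order in which test scenarios run. The cost is a small header per allocation, which is why such checks are often limited to debug builds.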