1 Illustrated tutorial
This is an illustrated tutorial on the PE file format. Many tutorials I have seen are overloaded with detail, covering every possibility and branch of the PE file format [9], and as a result they become hard to follow. This tutorial instead gives an overview focused on the big picture, with a number of illustrations. The interested reader can find more details elsewhere.
1.1 Tools used – Hex Editor ImHex
I am using the hex editor ImHex (freeware) [1], which lets me color sections of files.
1.2 Tools used – PE Viewer "PE-bear"
PE-bear (freeware) [2] is a useful tool for visually analyzing PE files. According to its documentation, it does not cover all variants and flavors of PE files, but it is a great viewer/parser/analyzer for simpler ones. I think it is always easier to study a topic using visual tools.
2 Executable file formats
Before we explore the PE format, note that it belongs to a family of "executable file formats". Around 40 of them, for different operating systems, are listed at [3]. Several of the most popular ones are:
- MZ – For DOS and Windows, extension .exe
- COFF – For UNIX/Linux-like systems, no extension
- ELF – For UNIX/Linux-like systems, no extension or extension .elf
- Mach-O – For macOS and iOS, no extension
- PE – For Windows, extensions .exe, .dll, .sys, etc.
- PE32+ – The 64-bit variant of PE for 64-bit Windows, same extensions as PE
Please see [4] for more details on PE and PE32+ formats.
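Several of these formats can be told apart by the first few bytes of the file. The following is a minimal sketch of that idea, not a complete detector: the function name `detect_format` is hypothetical, only ELF, Mach-O (single-architecture files) and MZ/PE are recognized, and COFF is left out because it has no single fixed marker.

```python
import struct

# Magic bytes of single-architecture Mach-O files, in both byte orders.
MACHO_MAGICS = (
    b"\xfe\xed\xfa\xce", b"\xfe\xed\xfa\xcf",
    b"\xce\xfa\xed\xfe", b"\xcf\xfa\xed\xfe",
)

def detect_format(data: bytes) -> str:
    """Guess the executable format from the leading bytes (hypothetical helper)."""
    if data[:4] == b"\x7fELF":
        return "ELF"
    if data[:4] in MACHO_MAGICS:
        return "Mach-O"
    if data[:2] == b"MZ":
        # A PE file is an MZ file whose field at 0x3C points to a "PE\0\0" signature.
        if len(data) >= 0x40:
            (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
            if data[e_lfanew:e_lfanew + 4] == b"PE\x00\x00":
                return "PE"
        return "MZ"
    return "unknown"

# Synthetic sample: an MZ header whose 0x3C field points to a PE signature at 0x80.
pe_sample = bytearray(0x84)
pe_sample[0:2] = b"MZ"
struct.pack_into("<I", pe_sample, 0x3C, 0x80)
pe_sample[0x80:0x84] = b"PE\x00\x00"

print(detect_format(bytes(pe_sample)))        # PE
print(detect_format(b"\x7fELF" + bytes(12)))  # ELF
print(detect_format(b"MZ" + bytes(0x3E)))     # MZ
```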
3 History of PE file format
PE stands for "Portable Executable". The format was introduced with Windows NT in the early 1990s. The dominant format at the time was the MS-DOS MZ format, which identifies itself with a special marker at the beginning of the file: the letters "MZ", the initials of Mark Zbikowski, one of the MS-DOS developers. The PE format targeted the Windows platform, but its designers preserved backward compatibility with the MZ format (.exe files), so that a PE file (.exe) accidentally run on MS-DOS reports "This program cannot be run in DOS mode". That was an important issue for Microsoft at the time. For this reason, you will still see that a PE file contains an MS-DOS style header: it starts with the magic letters "MZ" and includes a DOS Stub that prints that message. That part is unnecessary today but has become part of the standard.
The PE format is derived from the Unix COFF format.
Today, the PE format has been extended to host .NET code.
4 Technical details
4.1 Problems PE was designed to solve
- Designed to support different programs on different hardware
- The idea was to separate the program from the processor
- Address the move to 64-bit processors
4.2 Loading PE file into memory
- The physical layout of the bytes of a PE file on disk is not the layout they have once loaded into memory.
- A PE file contains several sections, and to avoid wasting space they are packed one after another on disk.
- The sections contain data and code, and each section is loaded into a separate region of memory.
- One reason for that, among others, is that in memory each section needs to be aligned to a page boundary.
- Each section can then be assigned different memory protection: typically code sections get execute/read-only protection, and data sections get no-execute/read-write protection.
- For that reason, most addresses in the file are specified as a Relative Virtual Address (RVA). An RVA is an offset from the address at which the file is loaded into memory (the image base), not an offset within the file on disk.
Here is a picture that illustrates how the sections are laid out on disk ("raw alignment") and how they are loaded into memory ("virtual alignment") at different virtual addresses, resulting in a new address scheme.
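The difference between the two layouts comes down to rounding section sizes up to different alignments. Here is a minimal sketch of that rounding. The two alignment values are assumptions inferred from the addresses used in this tutorial (sections at raw offsets 0x200, 0x1000, 0x1600 and at RVAs 0x2000, 0x4000, 0x6000); the real values are stored in the Optional Header of each file.

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment (must be a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)

# Assumed values, inferred from the example addresses in this tutorial.
FILE_ALIGNMENT = 0x200       # granularity of section data on disk
SECTION_ALIGNMENT = 0x2000   # granularity of sections in memory

text_size = 0xD32  # 0xF32 - 0x200, the span of .text in the example

# Space reserved for the same bytes on disk versus in memory.
raw_size = align_up(text_size, FILE_ALIGNMENT)
virtual_size = align_up(text_size, SECTION_ALIGNMENT)
print(hex(raw_size), hex(virtual_size))  # 0xe00 0x2000
```

With these assumed values, .text occupies 0xE00 bytes on disk starting at 0x200, so the next section starts at raw offset 0x1000, while in memory the next section starts a full 0x2000 later.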
4.3 Converting from "Raw Address" to "Virtual Address" and back
- A task that comes up frequently is converting a "Raw Address" (where a byte sits in the file) to a "Virtual Address" (where that byte sits in memory) and back, using the Relative Virtual Address (RVA).
- For example, in the above picture, you can see that the .text section starts at 0x200 (raw) and ends at 0xF32 (raw). That means the end of the section is at offset 0xF32-0x200=0xD32 from the beginning of the section. When section .text is mapped into memory at 0x2000 (RVA), the end of the section is at 0x2000+0xD32=0x2D32. We say that the address of the end of the section is 0x2D32 (RVA).
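The arithmetic above can be sketched as a pair of helper functions. This is only an illustration: the names `Section`, `rva_to_raw` and `raw_to_rva` are hypothetical, the section start addresses are the ones used in this tutorial, and the raw size of .reloc is an assumed value.

```python
from typing import List, NamedTuple, Optional

class Section(NamedTuple):
    name: str
    virtual_address: int  # RVA at which the section is mapped
    raw_address: int      # file offset at which the section's bytes start
    raw_size: int         # number of bytes the section occupies in the file

# Layout of the example file; the .reloc size (0x200) is an assumption.
SECTIONS = [
    Section(".text", 0x2000, 0x200, 0xE00),
    Section(".rsrc", 0x4000, 0x1000, 0x600),
    Section(".reloc", 0x6000, 0x1600, 0x200),
]

def rva_to_raw(rva: int, sections: List[Section] = SECTIONS) -> Optional[int]:
    """Convert an RVA to a file offset, or None if no section contains it."""
    for s in sections:
        if s.virtual_address <= rva < s.virtual_address + s.raw_size:
            return rva - s.virtual_address + s.raw_address
    return None

def raw_to_rva(raw: int, sections: List[Section] = SECTIONS) -> Optional[int]:
    """Convert a file offset to an RVA, or None if no section contains it."""
    for s in sections:
        if s.raw_address <= raw < s.raw_address + s.raw_size:
            return raw - s.raw_address + s.virtual_address
    return None

print(hex(rva_to_raw(0x2D32)))  # 0xf32
print(hex(raw_to_rva(0xF32)))   # 0x2d32
```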
5 Example program
For our demo, we will use a simple "Hello World" program written in C# 11 and compiled for .NET 7, in which the string "Hello World!" is stored in a resource file. As you will see, .NET assemblies are packaged in the PE file format.
6 PE format definition
The precise definition of the PE format can be found in [5]. For this tutorial, and to follow the outline of the PE-bear tool, we will define it as follows.
A typical PE file consists of the following parts:
- DOS Header (aka "MZ Header")
- DOS Stub
- NT Headers (aka "PE File Header"), which consist of the following:
  - PE signature
  - File Header (aka "COFF Header", "Image File Header")
  - Optional Header (aka "Image Optional Header"), which itself consists of the following:
    - General part
    - Data Directory
- Section Headers (aka "Section Table")
- Multiple sections (aka "Sections")
  - Section 1
  - Section 2
  - …
  - Section n
Here is a look at the file in the hex editor, focused on the headers:
Here is a look at the whole file, to give an idea of how small a part of the file the headers are:
Here is how the PE-bear tool outlines it in its interface:
7 DOS Header (aka "MZ Header")
Here is DOS Header in the Hex editor:
Here is an analysis of the DOS Header by tool PE-bear:
Interpretation
- Note that the header starts with the "magic number" 0x5A4D, which spells "MZ" in ASCII (the initials of Mark Zbikowski, one of the MS-DOS developers) and acts as a file format identifier.
- Note that at offset 0x3C there is the address of the new exe header, which is 0x80 and points to the NT Headers.
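Both observations can be checked with a few lines of code. The following is a minimal sketch that reads only these two fields; the function name `parse_dos_header` is hypothetical, and the input is a synthetic 64-byte header built to match the values above rather than the real example file.

```python
import struct

def parse_dos_header(data: bytes) -> int:
    """Check the MZ magic and return the file offset of the NT Headers."""
    if len(data) < 0x40:
        raise ValueError("file too short for a DOS header")
    # The bytes 4D 5A ("MZ") read as a little-endian 16-bit number give 0x5A4D.
    (magic,) = struct.unpack_from("<H", data, 0)
    if magic != 0x5A4D:
        raise ValueError("missing MZ magic number")
    # The 32-bit field at 0x3C holds the offset of the new exe header.
    (nt_headers_offset,) = struct.unpack_from("<I", data, 0x3C)
    return nt_headers_offset

# Synthetic DOS header: only the two fields discussed above are filled in.
dos_header = bytearray(0x40)
dos_header[0:2] = b"MZ"
struct.pack_into("<I", dos_header, 0x3C, 0x80)

print(hex(parse_dos_header(bytes(dos_header))))  # 0x80
```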
8 DOS Stub
Here is DOS Stub in the Hex editor,
Interpretation
This is a small piece of DOS-compatible code that prints the error message "This program cannot be run in DOS mode" if the program is run under DOS.
9 NT Headers (aka "PE File Header")
Here are NT Headers in Hex editor,
9.1 NT Headers - Signature
Interpretation
That is just 4 bytes, the letters "PE" followed by two zero bytes, indicating that this is the PE format.
9.2 NT Headers - File Header (aka "COFF Header", "Image File Header")
Here is an analysis of the File Header by tool PE-bear:
Interpretation
- Note that at offset 0x86 the header says this file contains 3 sections. Since each section header has a fixed size, the OS can calculate the size of the Section Headers table and how many entries to read from it.
- Note that at offset 0x94 the header gives the size of the Optional Header.
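As an illustration, here is a minimal sketch that checks the PE signature and unpacks the 20-byte File Header that follows it. The class and function names are hypothetical. The input is synthetic: the section count of 3 comes from the text above, while the machine type (0x014C), Optional Header size (0xE0) and characteristics (0x0022) are assumed values typical of a 32-bit image, not values read from the example file.

```python
import struct
from typing import NamedTuple

class FileHeader(NamedTuple):
    machine: int
    number_of_sections: int
    time_date_stamp: int
    pointer_to_symbol_table: int
    number_of_symbols: int
    size_of_optional_header: int
    characteristics: int

FILE_HEADER_FORMAT = "<HHIIIHH"  # little-endian, 20 bytes in total

def parse_file_header(data: bytes, nt_offset: int) -> FileHeader:
    """Verify the 4-byte PE signature and unpack the File Header after it."""
    if data[nt_offset:nt_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    fields = struct.unpack_from(FILE_HEADER_FORMAT, data, nt_offset + 4)
    return FileHeader(*fields)

# Synthetic NT Headers at 0x80; all values except the section count are assumed.
image = bytearray(0x80 + 4 + 20)
image[0x80:0x84] = b"PE\x00\x00"
struct.pack_into(FILE_HEADER_FORMAT, image, 0x84, 0x014C, 3, 0, 0, 0, 0xE0, 0x0022)

header = parse_file_header(bytes(image), 0x80)
print(header.number_of_sections, hex(header.size_of_optional_header))  # 3 0xe0
```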
9.3 NT Headers - Optional Header (aka "Image Optional Header")
Here is an analysis of the Optional Header by tool PE-bear,
Interpretation
- This header contains additional information beyond the basics contained in the File Header.
- Look at file offset 0x98, the so-called Magic. It identifies the image type: 0x10B stands for the PE32 format.
- It is interesting to look at file offset 0xA8, the Entry Point. It says address 0x2D32 (RVA), which we need to convert to a raw address: 0x2D32-0x2000+0x200=0xF32 (raw file). That is in the area of section .text. The entry point is the address where the PE loader will begin execution. For a program image, this is the starting address.
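The two fields discussed above can be read as follows. This is a minimal sketch with a hypothetical function name; it relies on the entry point field being 16 bytes after the Magic (0xA8 - 0x98 = 0x10), and it runs on a synthetic buffer that contains only those two values.

```python
import struct

# Magic values of the Optional Header; 0x20B is the PE32+ counterpart of 0x10B.
IMAGE_TYPES = {0x10B: "PE32", 0x20B: "PE32+"}

def parse_optional_header_start(data: bytes, opt_offset: int):
    """Return (image type, entry point RVA) from the start of the Optional Header."""
    (magic,) = struct.unpack_from("<H", data, opt_offset)
    # The entry point RVA is stored 16 bytes after the Magic field.
    (entry_point_rva,) = struct.unpack_from("<I", data, opt_offset + 16)
    return IMAGE_TYPES.get(magic, "unknown"), entry_point_rva

# Synthetic buffer: Optional Header at 0x98, as in the example file.
image = bytearray(0x98 + 20)
struct.pack_into("<H", image, 0x98, 0x10B)
struct.pack_into("<I", image, 0xA8, 0x2D32)

image_type, entry_rva = parse_optional_header_start(bytes(image), 0x98)
print(image_type, hex(entry_rva))  # PE32 0x2d32

# Same conversion as in the text: .text is at RVA 0x2000 and raw offset 0x200.
print(hex(entry_rva - 0x2000 + 0x200))  # 0xf32
```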
9.3.1 NT Headers - Optional Header – Data Directory
Interpretation
- If you look at file offset 0x100, for the Import Directory, it says the import directory is at address 0x2CDD (RVA), with a size of 0x4F bytes. We need to convert the RVA to a raw offset: 0x2CDD-0x2000+0x200=0xEDD (raw file). That is in the area of section .text, which we will discuss later. What happens here is a bit complicated: there is a table acting as a directory for other entries, see [5] for details. But the point is that in the end you get your import dependency, which in this case is "mscoree.dll, _CorExeMain". You can see it in the following picture:
And PE-bear tool provides a nice interpretation of this.
- If you look at file offset 0x128, for the Debug Directory, it says address 0x2BD4 (RVA). It is resolved similarly to the above case, and again that info is stored in section .text: 0x2BD4-0x2000+0x200=0xDD4 (raw file).
Here it is in the hex editor:
And PE-bear tool provides a nice interpretation of this,
- If you look at file offset 0x168, for the .NET Header, it says address 0x2008 (RVA). We will not analyze this in this tutorial but plan to cover it in another one.
- If you look at file offset 0x108, for the Resource Directory, it says address 0x4000 (RVA). That is the area of section .rsrc. We calculate 0x4000-0x4000+0x1000=0x1000 (raw file). We will examine that below, in the chapter on sections.
- If you look at file offset 0x120, for the Base Relocation Table, it says address 0x6000 (RVA). That is the area of section .reloc. We calculate 0x6000-0x6000+0x1600=0x1600 (raw file). We will examine that below, in the chapter on sections.
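The entries discussed above all live in one table of 8-byte records (a 4-byte RVA followed by a 4-byte size). Here is a minimal sketch that walks such a table. It assumes the PE32 layout, where the table starts 96 bytes into the Optional Header (0x98 + 0x60 = 0xF8 in this file), and a table of 16 entries. The function name is hypothetical and the buffer is synthetic; the RVAs and the import size are taken from the text, while the sizes marked below are assumed.

```python
import struct

DIRECTORY_NAMES = [
    "Export", "Import", "Resource", "Exception", "Security",
    "Base Relocation", "Debug", "Architecture", "Global Ptr", "TLS",
    "Load Config", "Bound Import", "IAT", "Delay Import", ".NET Header",
    "Reserved",
]

def parse_data_directories(data: bytes, opt_offset: int) -> dict:
    """Return {name: (rva, size)} for every non-empty directory entry (PE32 layout)."""
    table_offset = opt_offset + 96  # assumption: PE32, not PE32+
    entries = {}
    for index, name in enumerate(DIRECTORY_NAMES):
        rva, size = struct.unpack_from("<II", data, table_offset + index * 8)
        if rva != 0:
            entries[name] = (rva, size)
    return entries

# Synthetic Optional Header at 0x98 containing only a few directory entries.
image = bytearray(0x98 + 96 + 16 * 8)
struct.pack_into("<II", image, 0x100, 0x2CDD, 0x4F)  # Import, values from the text
struct.pack_into("<II", image, 0x128, 0x2BD4, 0x1C)  # Debug, size is assumed
struct.pack_into("<II", image, 0x168, 0x2008, 0x48)  # .NET Header, size is assumed

for name, (rva, size) in parse_data_directories(bytes(image), 0x98).items():
    print(f"{name}: rva={rva:#x} size={size:#x}")
```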
10 Section Headers (aka "Section Table")
Here are the Section Headers in the Hex editor:
Here is an analysis of the Section Header by tool PE-bear:
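Each entry in the Section Headers table is a fixed 40-byte record, which can be unpacked as in the following minimal sketch. The function name is hypothetical and the table is synthetic: the section names, RVAs and raw offsets are the ones used in this tutorial, while the virtual sizes, the raw size of .reloc and the characteristics flags are assumed, typical values.

```python
import struct

SECTION_FORMAT = "<8sIIIIIIHHI"  # name, sizes, addresses, counts, flags
SECTION_SIZE = struct.calcsize(SECTION_FORMAT)  # 40 bytes per entry

def parse_section_headers(data: bytes, table_offset: int, count: int) -> list:
    """Unpack `count` section headers starting at `table_offset`."""
    sections = []
    for index in range(count):
        fields = struct.unpack_from(
            SECTION_FORMAT, data, table_offset + index * SECTION_SIZE)
        sections.append({
            # Names shorter than 8 bytes are padded with zero bytes.
            "name": fields[0].rstrip(b"\x00").decode("ascii", errors="replace"),
            "virtual_size": fields[1],
            "virtual_address": fields[2],
            "raw_size": fields[3],
            "raw_address": fields[4],
            "characteristics": fields[9],
        })
    return sections

# Synthetic table with three entries (sizes and flags are assumptions).
rows = [
    (b".text", 0xD38, 0x2000, 0xE00, 0x200, 0x60000020),
    (b".rsrc", 0x5A0, 0x4000, 0x600, 0x1000, 0x40000040),
    (b".reloc", 0x00C, 0x6000, 0x200, 0x1600, 0x42000040),
]
table = b"".join(
    struct.pack(SECTION_FORMAT, name, vsize, rva, rsize, raw, 0, 0, 0, 0, flags)
    for name, vsize, rva, rsize, raw, flags in rows
)

for s in parse_section_headers(table, 0, 3):
    print(s["name"], hex(s["virtual_address"]), hex(s["raw_address"]))
```

The RVA and raw offset of each section are exactly the pair of numbers needed for the raw/virtual address conversions used throughout this tutorial.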
11 Multiple sections (aka "Sections")
11.1 Section Names
Section names can be up to 8 characters long and, by convention, start with ".". The usual names are:
- .text – contains executable code and data
- .idata, .rdata – contain import information (.rdata also holds other read-only data)
- .data, .bss – contain initialized and uninitialized data
- .pdata – contains exception info
- .reloc – contains relocation info
- .rsrc – contains resources
- .debug – contains debug information
11.2 Section .text
Here is Section .text in Hex editor:
Interpretation
In our case, since this is a .NET assembly, this section contains
- Metadata
- Managed resources
- IL code
11.3 Section .rsrc
Here is Section .rsrc in Hex editor:
Here is an analysis of Section .rsrc by tool PE-bear:
Interpretation
- You can see both from the hex editor and from the PE-bear analysis (whose resource view seems unfinished here) that this section contains the version information and the application manifest.
- These are unmanaged resources. Managed .NET resources are inside section .text.
11.4 Section .reloc
Here is Section .reloc in Hex editor:
Here is an analysis of Section .reloc by tool PE-bear:
12 Conclusion
We will finish here to keep this tutorial at a manageable size. We gave a basic description of the PE format, sufficient for a good technical overview, without diving into too many details.
More details about PE File Format can be found at [6], [7], [8]. A very interesting slide illustrating different options of PE format can be found at [9].
13 References
- [1] https://github.com/WerWolv/ImHex/releases/tag/v1.27.0
- [2] https://hshrzd.wordpress.com/pe-bear/
- [3] https://en.wikipedia.org/wiki/Comparison_of_executable_file_formats
- [4] https://en.wikipedia.org/wiki/Portable_Executable
- [5] https://learn.microsoft.com/en-us/windows/win32/debug/pe-format?redirectedfrom=MSDN
- [6] https://tech-zealots.com/malware-analysis/pe-portable-executable-structure-malware-analysis-part-2/
- [7] https://resources.infosecinstitute.com/topic/2-malware-researchers-handbook-demystifying-pe-file/
- [8] https://www.red-gate.com/simple-talk/blogs/anatomy-of-a-net-assembly-pe-headers/
- [9] http://2.bp.blogspot.com/-SpKCuFfVJSU/UCL5rJhQ5AI/AAAAAAAAFjo/3TcOoqu-7X4/s1600/AwO4ffCCIAAdANF.png