PE parsing

Mar 27, 2025

A collection of notes on the PE file format and its main headers. Main uses: malware development and malware research.

Considerations

DOS header

We only want to know about the first and last members of this header:

This is what the old MS-DOS loader and the new Windows PE Loader do with this header:
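As a rough illustration of the two members we care about: the first member of IMAGE_DOS_HEADER is e_magic (the "MZ" magic number) and the last one is e_lfanew (the file offset of the NT header). The sketch below reads both from a raw file buffer. The function name read_e_lfanew and the constants are made up for this example, and it assumes a little-endian host and treats e_lfanew as unsigned for simplicity.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DOS_MAGIC       0x5A4Du  /* "MZ" read as a little-endian WORD */
#define E_LFANEW_OFFSET 0x3C     /* offset of e_lfanew inside IMAGE_DOS_HEADER */
#define DOS_HEADER_SIZE 0x40     /* sizeof(IMAGE_DOS_HEADER) */

/* Hypothetical helper: returns 0 and stores e_lfanew in *out on success,
 * -1 on failure. Assumes a little-endian host. */
int read_e_lfanew(const uint8_t *file, size_t size, uint32_t *out)
{
    uint16_t magic;
    uint32_t lfanew;

    if (size < DOS_HEADER_SIZE)
        return -1;                      /* too small to hold a DOS header */
    memcpy(&magic, file, sizeof magic); /* e_magic: first member */
    if (magic != DOS_MAGIC)
        return -1;                      /* not an MZ executable */
    memcpy(&lfanew, file + E_LFANEW_OFFSET, sizeof lfanew); /* e_lfanew: last member */
    if (lfanew > size - 4)
        return -1;                      /* NT signature would fall outside the file */
    *out = lfanew;
    return 0;
}
```

Using memcpy instead of casting the buffer to a struct pointer avoids alignment problems when the buffer comes from an arbitrary read.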

DOS stub

The DOS stub is an MS-DOS program that prints an error message saying that the executable is not compatible with DOS, and then exits. It is what runs when the program is loaded in MS-DOS; modern Windows never executes it. If we copy the bytes of the DOS stub into IDA or any other disassembler, we can see that the routine just prints the string and exits.

NT header (PE header/new executable header)

IMAGE_NT_HEADERS as defined in winnt.h:

typedef struct _IMAGE_NT_HEADERS64 {
    DWORD Signature;
    IMAGE_FILE_HEADER FileHeader;
    IMAGE_OPTIONAL_HEADER64 OptionalHeader;
} IMAGE_NT_HEADERS64, *PIMAGE_NT_HEADERS64;

typedef struct _IMAGE_NT_HEADERS {
    DWORD Signature;
    IMAGE_FILE_HEADER FileHeader;
    IMAGE_OPTIONAL_HEADER32 OptionalHeader;
} IMAGE_NT_HEADERS32, *PIMAGE_NT_HEADERS32;

There is one structure for 32-bit executables and another for 64-bit executables. The only difference between them is the optional header, as can be seen in the structs.

Signature

Fixed value stored on disk as the bytes 50 45 00 00, which translate to PE\0\0 in ASCII. Read as a little-endian DWORD this is 0x00004550 (IMAGE_NT_SIGNATURE in winnt.h). Again, another magic number inside the executable. The loader uses it to confirm that it has reached the right place after following e_lfanew from the DOS header.
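A minimal sketch of that check, assuming we already have the file in a buffer and the e_lfanew value at hand. The helper name has_pe_signature is made up for this example. It compares the four bytes directly, which sidesteps any byte-order question.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the four bytes at nt_offset
 * (the e_lfanew value) are 'P', 'E', 0, 0, and 0 otherwise. */
int has_pe_signature(const uint8_t *file, size_t size, uint32_t nt_offset)
{
    static const uint8_t sig[4] = { 'P', 'E', 0, 0 };

    if (size < 4 || nt_offset > size - 4)
        return 0;  /* signature would fall outside the buffer */
    return memcmp(file + nt_offset, sig, sizeof sig) == 0;
}
```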

File Header

See the official Microsoft docs for this struct. It is another struct that contains information about the PE file, some of which is relevant. Let’s see the struct:

typedef struct _IMAGE_FILE_HEADER {
    WORD    Machine;
    WORD    NumberOfSections;
    DWORD   TimeDateStamp;
    DWORD   PointerToSymbolTable;
    DWORD   NumberOfSymbols;
    WORD    SizeOfOptionalHeader;
    WORD    Characteristics;
} IMAGE_FILE_HEADER, *PIMAGE_FILE_HEADER;

Details about the header:

Optional Header

The name can be confusing: the optional header is one of the most important headers in the PE. The PE loader reads specific information from it in order to load and run the executable. It’s called optional because object files don’t include it, whereas image files such as executables do. It doesn’t have a fixed size, which is why the IMAGE_FILE_HEADER.SizeOfOptionalHeader member exists.
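One practical use of SizeOfOptionalHeader is finding where the section headers begin, since they start right after the optional header. The sketch below does that arithmetic by hand; the function name section_table_offset is made up for this example. With the Windows SDK available, the IMAGE_FIRST_SECTION macro in winnt.h performs the same computation on a pointer to the NT headers.

```c
#include <stdint.h>

#define NT_SIGNATURE_SIZE 4u   /* DWORD Signature */
#define FILE_HEADER_SIZE  20u  /* sizeof(IMAGE_FILE_HEADER) */

/* Hypothetical helper: file offset of the first IMAGE_SECTION_HEADER,
 * given e_lfanew and IMAGE_FILE_HEADER.SizeOfOptionalHeader. */
uint32_t section_table_offset(uint32_t e_lfanew, uint16_t size_of_optional_header)
{
    return e_lfanew + NT_SIGNATURE_SIZE + FILE_HEADER_SIZE + size_of_optional_header;
}
```

For example, with e_lfanew = 0x80 and a 64-bit optional header of 0xF0 bytes, the section headers would start at file offset 0x188.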

As mentioned earlier, there are two versions of the Optional Header, one for 32-bit executables and one for 64-bit executables.
The two versions are different in two aspects:

We will focus on the 64-bit struct, as most of the malware we will create and parse will be of this type:

typedef struct _IMAGE_OPTIONAL_HEADER64 {
    WORD        Magic;
    BYTE        MajorLinkerVersion;
    BYTE        MinorLinkerVersion;
    DWORD       SizeOfCode;
    DWORD       SizeOfInitializedData;
    DWORD       SizeOfUninitializedData;
    DWORD       AddressOfEntryPoint;
    DWORD       BaseOfCode;
    ULONGLONG   ImageBase;
    DWORD       SectionAlignment;
    DWORD       FileAlignment;
    WORD        MajorOperatingSystemVersion;
    WORD        MinorOperatingSystemVersion;
    WORD        MajorImageVersion;
    WORD        MinorImageVersion;
    WORD        MajorSubsystemVersion;
    WORD        MinorSubsystemVersion;
    DWORD       Win32VersionValue;
    DWORD       SizeOfImage;
    DWORD       SizeOfHeaders;
    DWORD       CheckSum;
    WORD        Subsystem;
    WORD        DllCharacteristics;
    ULONGLONG   SizeOfStackReserve;
    ULONGLONG   SizeOfStackCommit;
    ULONGLONG   SizeOfHeapReserve;
    ULONGLONG   SizeOfHeapCommit;
    DWORD       LoaderFlags;
    DWORD       NumberOfRvaAndSizes;
    IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
} IMAGE_OPTIONAL_HEADER64, *PIMAGE_OPTIONAL_HEADER64;

Let’s talk about the elements (information gathered from the official Microsoft docs):

Data Directory

The optional header has an array of IMAGE_DATA_DIRECTORY structures called DataDirectory, with a maximum of 16 entries (specified by the constant IMAGE_NUMBEROF_DIRECTORY_ENTRIES):

    IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];

An IMAGE_DATA_DIRECTORY structure is defined as follows:

typedef struct _IMAGE_DATA_DIRECTORY {
    DWORD   VirtualAddress;
    DWORD   Size;
} IMAGE_DATA_DIRECTORY, *PIMAGE_DATA_DIRECTORY;

It’s a very simple structure with only two members, first one being an RVA pointing to the start of that Data Directory and the second one being the size of that Data Directory.

But, what is a Data Directory? A Data Directory is a piece of data located within one of the sections of the PE file.
Data Directories contain useful information needed by the Windows loader. An example of a very important directory is the Import Directory, a data directory that contains the list of external functions imported from other libraries.

Here’s a list of Data Directories defined in winnt.h. (Each one of these values represents an index in the DataDirectory array):

// Directory Entries
#define IMAGE_DIRECTORY_ENTRY_EXPORT          0   // Export Directory
#define IMAGE_DIRECTORY_ENTRY_IMPORT          1   // Import Directory
#define IMAGE_DIRECTORY_ENTRY_RESOURCE        2   // Resource Directory
#define IMAGE_DIRECTORY_ENTRY_EXCEPTION       3   // Exception Directory
#define IMAGE_DIRECTORY_ENTRY_SECURITY        4   // Security Directory
#define IMAGE_DIRECTORY_ENTRY_BASERELOC       5   // Base Relocation Table
#define IMAGE_DIRECTORY_ENTRY_DEBUG           6   // Debug Directory
//      IMAGE_DIRECTORY_ENTRY_COPYRIGHT       7   // (X86 usage)
#define IMAGE_DIRECTORY_ENTRY_ARCHITECTURE    7   // Architecture Specific Data
#define IMAGE_DIRECTORY_ENTRY_GLOBALPTR       8   // RVA of GP
#define IMAGE_DIRECTORY_ENTRY_TLS             9   // TLS Directory
#define IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG    10   // Load Configuration Directory
#define IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT   11   // Bound Import Directory in headers
#define IMAGE_DIRECTORY_ENTRY_IAT            12   // Import Address Table
#define IMAGE_DIRECTORY_ENTRY_DELAY_IMPORT   13   // Delay Load Import Descriptors
#define IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR 14   // COM Runtime descriptor

So, for example, to access the Import Directory information we will have to:

We will obtain a struct containing the RVA and the size of such Data Directory. With that information, we can access such directory and parse it. Note that each directory will be parsed differently, depending on the information that it contains.
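A minimal sketch of that lookup. The data_directory type is a stand-in for IMAGE_DATA_DIRECTORY so the example compiles without winnt.h, and get_directory is a name made up for this example. It also checks NumberOfRvaAndSizes before indexing and treats an entry with both fields at zero as absent.

```c
#include <stddef.h>
#include <stdint.h>

#define DIR_ENTRY_IMPORT 1u  /* same value as IMAGE_DIRECTORY_ENTRY_IMPORT */

/* Stand-in for IMAGE_DATA_DIRECTORY from winnt.h. */
typedef struct {
    uint32_t VirtualAddress;
    uint32_t Size;
} data_directory;

/* Hypothetical helper: returns a pointer to the requested entry, or NULL
 * when the index is beyond NumberOfRvaAndSizes or the entry is empty. */
const data_directory *get_directory(const data_directory *dirs,
                                    uint32_t number_of_rva_and_sizes,
                                    uint32_t index)
{
    if (index >= number_of_rva_and_sizes)
        return NULL;
    if (dirs[index].VirtualAddress == 0 && dirs[index].Size == 0)
        return NULL;
    return &dirs[index];
}
```

With real headers, dirs would be OptionalHeader.DataDirectory and number_of_rva_and_sizes would be OptionalHeader.NumberOfRvaAndSizes.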

Also note that there can be data directories with no information. If we take a look at the contents of IMAGE_OPTIONAL_HEADER.DataDirectory of an actual PE file, we might see entries where both fields are set to 0:

Important: Data directories live inside the sections (e.g., the Import Directory Table is usually inside the .idata or .rdata section). After the NT header come the section headers.

Section headers

The section headers come right after the NT headers and are the last headers in the PE. A section header is a structure named IMAGE_SECTION_HEADER, defined in winnt.h as follows:

typedef struct _IMAGE_SECTION_HEADER {
    BYTE    Name[IMAGE_SIZEOF_SHORT_NAME];
    union {
            DWORD   PhysicalAddress;
            DWORD   VirtualSize;
    } Misc;
    DWORD   VirtualAddress;
    DWORD   SizeOfRawData;
    DWORD   PointerToRawData;
    DWORD   PointerToRelocations;
    DWORD   PointerToLinenumbers;
    WORD    NumberOfRelocations;
    WORD    NumberOfLinenumbers;
    DWORD   Characteristics;
} IMAGE_SECTION_HEADER, *PIMAGE_SECTION_HEADER;

There will be one section header for each of the sections in the PE. Remember that we can retrieve the number of sections from the file header:

    printf("[NT header][file header] number of sections: %u", this->PEFILE_NT_HEADERS.FileHeader.NumberOfSections);
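Once the number of sections is known, we can walk the headers and read each Name. One detail worth a sketch: Name is an 8-byte, null-padded field, so a name that is exactly eight characters long has no terminator and cannot be passed to printf("%s") directly. The helper name section_name below is made up for this example.

```c
#include <stdint.h>
#include <string.h>

#define SHORT_NAME_SIZE 8  /* same value as IMAGE_SIZEOF_SHORT_NAME */

/* Hypothetical helper: copies an 8-byte section name into out
 * (9 bytes) and always NUL-terminates the result. */
void section_name(const uint8_t name[SHORT_NAME_SIZE], char out[SHORT_NAME_SIZE + 1])
{
    memcpy(out, name, SHORT_NAME_SIZE);
    out[SHORT_NAME_SIZE] = '\0';
}
```

Calling it once per header, NumberOfSections times, yields printable names such as .text or .rdata.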

These are the fields of a section header:

Raw Data can != Virtual size

SizeOfRawData and VirtualSize can be different, and this can happen for several reasons.

SizeOfRawData (the size on disk) must be a multiple of IMAGE_OPTIONAL_HEADER.FileAlignment. If the actual data is not a multiple of that value, the rest is padded up to the next multiple. VirtualSize, however, records the actual, unpadded size of the section once loaded into memory. In this case SizeOfRawData will be greater than VirtualSize.

The opposite can happen as well.

If the section contains uninitialized data, that data takes no space on disk, but when the section gets mapped into memory it expands to reserve space for it.
This means that the section occupies less space on disk than in memory; in this case VirtualSize will be greater than SizeOfRawData.
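The padding on disk is plain round-up arithmetic. A small sketch, assuming the alignment is a power of two (the helper name align_up is made up for this example):

```c
#include <stdint.h>

/* Hypothetical helper: rounds value up to the next multiple of alignment.
 * Assumes alignment is a power of two, e.g. a FileAlignment of 0x200. */
uint32_t align_up(uint32_t value, uint32_t alignment)
{
    return (value + alignment - 1) & ~(alignment - 1);
}
```

For instance, a section with 0x1A4 bytes of actual data and a FileAlignment of 0x200 would get a SizeOfRawData of 0x200, while its VirtualSize stays at 0x1A4.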

Sections

Lastly, the PE has the contents of the sections (.text, .data, .rdata). Some sections have special names that indicate their purpose. We’ll go over some of them; a full list of these names can be found in the official Microsoft documentation under the “Special Sections” section.

Import table

There is no rule that says that the import table must begin at the start of a section named .idata, but that’s how it is typically done, for reasons both traditional and practical.

The first field of the import table’s data directory entry, VirtualAddress, is actually the RVA of the table, that is, the address of the table relative to the base address of the image once it is loaded. The second field gives the size in bytes. The data directories form the last part of the optional header.

Note that the number of directories is not fixed. Before looking for a specific directory, check the NumberOfRvaAndSizes field in the optional header.

Also, do not assume that the RVAs in this table point to the beginning of a section or that the sections that contain specific tables have specific names.

If we navigate to the section headers, we will see that the .rdata section starts before 0x2D0C8:

But we can see that the import directory is not at the start of the section but somewhere in the middle: the .rdata section starts a bit earlier (0x26000), whereas the import directory starts at 0x2D0C8. The tool just says that the Import Directory is inside .rdata, not that it is at the start of it.

We need to translate the Import Directory RVA to a file offset, the place in the binary file where the DLL import information is stored. This can be achieved with the following formula:

Location of the Import Directory = imageBase + section.RawOffset + (importDirectory.RVA − section.VA)

Where:

Let’s think how to obtain all the values:
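A sketch of the translation, under two assumptions. First, imageBase in the formula is taken to mean the address where the file's bytes sit in memory, so the function below returns only the file offset and leaves adding the buffer address to the caller. Second, section.RawOffset is taken to be PointerToRawData from the section header. The section_info type and the name rva_to_offset are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Only the IMAGE_SECTION_HEADER fields needed for the translation. */
typedef struct {
    uint32_t VirtualAddress;    /* section.VA in the formula */
    uint32_t SizeOfRawData;
    uint32_t PointerToRawData;  /* section.RawOffset in the formula */
} section_info;

/* Hypothetical helper: translates an RVA to a file offset. Returns 0 and
 * stores the offset in *offset on success, or -1 if no section's raw
 * data contains the RVA. */
int rva_to_offset(const section_info *sections, size_t count,
                  uint32_t rva, uint32_t *offset)
{
    for (size_t i = 0; i < count; i++) {
        uint32_t start = sections[i].VirtualAddress;
        if (rva >= start && rva - start < sections[i].SizeOfRawData) {
            *offset = sections[i].PointerToRawData + (rva - start);
            return 0;
        }
    }
    return -1;
}
```

With the values from the example above (.rdata at VA 0x26000, import directory at RVA 0x2D0C8) and a purely hypothetical raw offset of 0x24E00 for .rdata, the import directory would sit at file offset 0x24E00 + 0x70C8 = 0x2BEC8.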

The Import Directory Table consists of an array of IMAGE_IMPORT_DESCRIPTOR structures, one for each DLL.
It doesn’t have a fixed size, so the last IMAGE_IMPORT_DESCRIPTOR of the array is zeroed out (NULL-padded) to indicate the end of the table.

IMAGE_IMPORT_DESCRIPTOR is defined as follows:

typedef struct _IMAGE_IMPORT_DESCRIPTOR {
    union {
        DWORD   Characteristics;
        DWORD   OriginalFirstThunk;
    } DUMMYUNIONNAME;
    DWORD   TimeDateStamp;
    DWORD   ForwarderChain;
    DWORD   Name;
    DWORD   FirstThunk;
} IMAGE_IMPORT_DESCRIPTOR;
typedef IMAGE_IMPORT_DESCRIPTOR UNALIGNED *PIMAGE_IMPORT_DESCRIPTOR;
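Walking the array until the zeroed-out terminator could look like the sketch below. The import_descriptor type is a stand-in for IMAGE_IMPORT_DESCRIPTOR so the example compiles without winnt.h, and count_import_descriptors and its max_entries bound are made up for this example; the bound guards against a malformed file with no terminator.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for IMAGE_IMPORT_DESCRIPTOR (five DWORDs, 20 bytes). */
typedef struct {
    uint32_t OriginalFirstThunk;
    uint32_t TimeDateStamp;
    uint32_t ForwarderChain;
    uint32_t Name;
    uint32_t FirstThunk;
} import_descriptor;

/* Hypothetical helper: counts descriptors up to, but not including,
 * the all-zero terminator. Stops at max_entries if none is found. */
size_t count_import_descriptors(const import_descriptor *table, size_t max_entries)
{
    size_t n = 0;
    while (n < max_entries) {
        const import_descriptor *d = &table[n];
        if (d->OriginalFirstThunk == 0 && d->TimeDateStamp == 0 &&
            d->ForwarderChain == 0 && d->Name == 0 && d->FirstThunk == 0)
            break;  /* zeroed-out entry marks the end of the table */
        n++;
    }
    return n;
}
```

A reasonable value for max_entries is the Import Directory's Size divided by 20, the size of one descriptor.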

Get DLL name

We need to convert the Name RVA (Name does not point to the name string, it contains an RVA) to a file offset, using the technique we used earlier, to get the location of the DLL name string. This time the formula we need to use is:

offset = imageBase + text.RawOffset + (nameRVA − text.VA)

Where nameRVA is the Name RVA value for ADVAPI32.dll from the Import Directory and text.VA is the Virtual Address of the .text section.
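Once the Name RVA has been turned into a file offset, the name is just a NUL-terminated ASCII string at that offset. A defensive sketch, assuming the whole file is in a buffer; the helper name dll_name_at is made up for this example:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns a pointer to the NUL-terminated DLL name
 * at name_offset, or NULL if the offset is out of bounds or the string
 * has no terminator before the end of the buffer. */
const char *dll_name_at(const uint8_t *file, size_t size, uint32_t name_offset)
{
    if (name_offset >= size)
        return NULL;
    if (memchr(file + name_offset, '\0', size - name_offset) == NULL)
        return NULL;  /* unterminated string */
    return (const char *)(file + name_offset);
}
```

The terminator check matters when parsing untrusted samples, where a crafted Name RVA could otherwise make us read past the end of the file.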

Get DLL Import Address Table (imported functions)

TBD