mirror of synced 2024-11-23 23:31:02 +01:00
Pattern Language Guide
WerWolv edited this page 2022-08-17 15:53:52 +02:00

Pattern Language

The Pattern Language is ImHex's custom-built programming language used to create binary patterns/templates. These patterns are applied to binary data in order to parse it and display the decoded values neatly in a tree hierarchy. The syntax follows the same style as other C-like languages and is therefore easy to read, understand, learn and use. This document is meant as an overview of all the features the Pattern Language has.

THIS DOCUMENTATION IS OUTDATED AND ONLY KEPT FOR LEGACY REASONS

Check out the new and much more complete Documentation over at https://imhex.werwolv.net/docs


Comments

Comments are a simple way to add documentation or instructions for other developers to your code, or to temporarily remove parts of it. There are two styles of comments available: single-line comments and multi-line comments.

Single-line comments start with // (two forward slashes) and include everything after them up to the end of the line. Multi-line comments start with /* (a forward slash followed by a star) and include everything after them until the next */ (a star followed by a forward slash). Multi-line comments cannot be nested.

/* This is a 
   multi line
   comment
*/

// This is a single line comment

Built-in Types

Built-in types are the fundamental types of the language. They include unsigned and signed integer types, floating point types, and a few special types.

  • Unsigned types: u8, u16, u32, u64, u128
  • Signed types: s8, s16, s32, s64, s128
  • Floating point types: float, double
  • Special types: char, bool

Unsigned and signed types encode their size in bits in the type name: s8 is 1 byte long, u32 is 4 bytes long, and so on. Floating point types use the same sizes and encodings as the host system. In most cases this means 32 bits for float and 64 bits for double, both using the IEEE 754 encoding. The special types char and bool are both one byte long and for the most part behave like s8 and u8, respectively. The only difference is that their values are displayed more meaningfully in the Pattern Data View.
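As a quick cross-check of these sizes, the following Python sketch maps each built-in type to a comparable format code of Python's standard `struct` module and asks `struct.calcsize` for its width. The mapping is this sketch's own assumption, not part of ImHex. u128 and s128 have no `struct` equivalent and are left out.

```python
import struct

# ASSUMPTION: comparable struct format codes for the Pattern Language built-ins.
# The '<' prefix selects standard sizes with no alignment padding.
TYPE_FORMATS = {
    "u8": "<B", "s8": "<b",
    "u16": "<H", "s16": "<h",
    "u32": "<I", "s32": "<i",
    "u64": "<Q", "s64": "<q",
    "float": "<f", "double": "<d",
    "char": "<c", "bool": "<?",
}

# Size in bytes of each type
TYPE_SIZES = {name: struct.calcsize(fmt) for name, fmt in TYPE_FORMATS.items()}

for name, size in TYPE_SIZES.items():
    print(f"{name}: {size} byte(s)")
```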

Endian specification

Every type may be prefixed with either be or le to specify whether the variable is treated as big endian or little endian. Without a prefix, the default endianness is used.

be u32 bigEndianVariable @ 0x00;
le u32 littleEndianVariable @ 0x00;
u32 defaultEndianVariable @ 0x00;
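The Python sketch below shows what the prefix changes. It decodes the same four bytes once as a big endian u32 and once as a little endian u32, using the `struct` byte order prefixes `>` and `<` as stand-ins for be and le. The sample bytes are made up for illustration.

```python
import struct

# Hypothetical sample data: four bytes at offset 0x00
data = bytes([0x12, 0x34, 0x56, 0x78])

big = struct.unpack_from(">I", data, 0x00)[0]     # like: be u32 ... @ 0x00;
little = struct.unpack_from("<I", data, 0x00)[0]  # like: le u32 ... @ 0x00;

print(hex(big))     # 0x12345678
print(hex(little))  # 0x78563412
```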

Variable Placements

To extract values from binary data, variables need to be declared and placed at an offset within that data. This is done using the following syntax:

<type> <variableName> @ <expression>;

// Example
u32 headerMagic @ 0x00;
s8 type @ 0x1234;

This causes the 4 bytes at addresses 0x00 to 0x03 to be parsed as an unsigned 32 bit value and the 1 byte at offset 0x1234 to be parsed as a signed 8 bit value. The results are then displayed in the Pattern Data View within ImHex.
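Conceptually, a placement reads a value of the given type at the given offset. The following Python sketch models the two placements above with `struct.unpack_from`. The helper name `place`, the sample buffer, and the little endian default are assumptions of this sketch, not ImHex internals.

```python
import struct

def place(data, fmt: str, offset: int):
    """Decode one value of struct format `fmt` at `offset` (like `<type> name @ offset;`)."""
    return struct.unpack_from(fmt, data, offset)[0]

# Hypothetical buffer with known values at both offsets
data = bytearray(0x1235)
data[0x00:0x04] = (0x464C457F).to_bytes(4, "little")  # bytes 7F 45 4C 46
data[0x1234] = 0xFF

# ASSUMPTION: little endian default
header_magic = place(data, "<I", 0x00)  # u32 headerMagic @ 0x00;
type_ = place(data, "<b", 0x1234)       # s8 type @ 0x1234;  (0xFF as signed is -1)
```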


Arrays

Arrays are used to parse a list of values that all share the same type and are placed contiguously in memory. To place an array at a specific offset, again the variable placement syntax may be used in combination with the array syntax.

<type> <variableName>[<expression>] @ <expression>;

// Example
u32 ids[0x100] @ 0x50;

This parses 0x100 (256) consecutive u32 values starting at offset 0x50 and adds a new branch node to the Pattern Data View. The node contains the decoded value of every entry in the array.
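Because array entries are contiguous, an array placement amounts to reading `count` values back to back. The following is a minimal Python model. It assumes little endian data and uses a hypothetical helper, `place_array`.

```python
import struct

def place_array(data, fmt_char: str, count: int, offset: int, endian: str = "<"):
    """Decode `count` contiguous values, like `<type> name[count] @ offset;`."""
    return list(struct.unpack_from(f"{endian}{count}{fmt_char}", data, offset))

# Hypothetical buffer: 0x100 u32 values 0, 1, 2, ... starting at offset 0x50
data = bytes(0x50) + b"".join(i.to_bytes(4, "little") for i in range(0x100))

ids = place_array(data, "I", 0x100, 0x50)  # u32 ids[0x100] @ 0x50;
```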


Strings

Strings are a special kind of array that does not necessarily need a size to be specified. They are created by declaring an array of chars:

char sizedString[13];
char unsizedString[];

If no size is specified, the string ends at the next null terminator (0x00).
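The two string forms can be modelled in Python as follows. The helper names are hypothetical. This sketch returns the string without its terminating 0x00 byte and stops at the end of the data if no terminator is found. How ImHex treats those edge cases is not covered here.

```python
def read_sized_string(data: bytes, offset: int, size: int) -> bytes:
    """char s[size] @ offset; reads exactly `size` bytes."""
    return data[offset:offset + size]

def read_unsized_string(data: bytes, offset: int) -> bytes:
    """char s[] @ offset; reads up to (not including) the next 0x00 byte."""
    end = data.find(b"\x00", offset)
    if end == -1:
        end = len(data)  # ASSUMPTION: no terminator means read to the end of the data
    return data[offset:end]

data = b"Hello, World!\x00trailing"
sized = read_sized_string(data, 0, 13)
unsized = read_unsized_string(data, 0)
```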


Structs

Structs group multiple types together to form a new type. All members of a struct are placed directly after each other in memory, with no padding inserted between them. The size of the complete struct is therefore the sum of the sizes of all its members.

struct <typeName> {
  <variableDeclaration>
  ...
};

// Example
struct Header {
  u8 magic[4];
  u32 type;
  bool flag;
};

Header header @ 0x00;

This code will create a new type named Header which again may be placed at any point in memory using the variable placement syntax. Multiple structs can also be nested to create more complex types all of which create a new branch node in the Pattern Data View.
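Since no padding is inserted, the Header layout above corresponds to a packed `struct` format in Python (the `<` prefix disables alignment padding). This sketch assumes little endian data. For brevity, u8 magic[4] is read as a 4-byte `bytes` object, and the sample values are made up.

```python
import struct

# struct Header { u8 magic[4]; u32 type; bool flag; };
# ASSUMPTION: little endian, no padding between members
HEADER_FMT = "<4sI?"
HEADER_SIZE = struct.calcsize(HEADER_FMT)  # 4 + 4 + 1 = 9 bytes

def parse_header(data: bytes, offset: int = 0) -> dict:
    magic, type_, flag = struct.unpack_from(HEADER_FMT, data, offset)
    return {"magic": magic, "type": type_, "flag": flag}

# Header header @ 0x00;
header = parse_header(b"\x7fELF" + (2).to_bytes(4, "little") + b"\x01")
```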


Padding

If padding between members is needed, it may be manually inserted using the padding keyword.

struct PaddedData {
  u8 index;
  padding[7];
  u64 height;
  u32 checksum;
};

This creates a 7 byte gap between the index and height members, which is not displayed in the Pattern Data View.
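Python's `struct` module has an equivalent to padding: the `x` format code skips a byte without producing a value. The following sketches PaddedData under a little endian assumption, with made-up sample values.

```python
import struct

# struct PaddedData { u8 index; padding[7]; u64 height; u32 checksum; };
# '7x' consumes 7 bytes but returns nothing, like padding[7]
PADDED_FMT = "<B7xQI"
PADDED_SIZE = struct.calcsize(PADDED_FMT)  # 1 + 7 + 8 + 4 = 20 bytes

raw = (bytes([5]) + b"\xAA" * 7
       + (1000).to_bytes(8, "little")
       + (0xDEADBEEF).to_bytes(4, "little"))

index, height, checksum = struct.unpack(PADDED_FMT, raw)  # the pad bytes are skipped
```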

Unions

Syntactically, unions look and work exactly like structs. The difference is that all members of a union are placed at the same address, on top of each other, whereas struct members are placed one after another (the same as in C/C++). The size of a union is therefore the size of its largest member.

union <typeName> {
  <variableDeclaration>
  ...
};

// Example
union Color {
  u32 rgba;
  u8 components[4];
};

Color color @ 0x100;
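A union can be modelled by decoding every member from the same start offset. The Python sketch below assumes little endian data and uses a made-up sample color. It shows how rgba and components view the same four bytes.

```python
import struct

# union Color { u32 rgba; u8 components[4]; };
def parse_color(data: bytes, offset: int) -> dict:
    # Every member is decoded from the same start offset
    rgba = struct.unpack_from("<I", data, offset)[0]
    components = list(struct.unpack_from("<4B", data, offset))
    return {"rgba": rgba, "components": components}

# Union size = size of the largest member
COLOR_SIZE = max(struct.calcsize("<I"), struct.calcsize("<4B"))  # 4

data = bytes(0x100) + bytes([0x11, 0x22, 0x33, 0x44])
color = parse_color(data, 0x100)  # Color color @ 0x100;
```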


Pointers

A pointer is a member that points to another place in memory. The value stored at the pointer's address is used as an offset from the start of the current data to locate the value it points to. To define a pointer, specify the type of the value being pointed to, followed by a * (star) and the name of the variable. After the : (colon), the size of the pointer is required. This must be an integral built-in type, and it determines how the data at the pointer's address is read and interpreted as an offset.

<typeName> *<pointerName> : <builtinTypeName>;

// Example
struct Child {
  u32 value;
};

struct Parent {
  Child *child : u16;
};

Parent parent @ 0x200;
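Resolving a pointer takes two reads: first the offset stored in the pointer, then the pointed-to value at that offset. The following Python sketch models Parent at 0x200. It assumes little endian data and offsets measured from the start of the buffer. The sample values are made up.

```python
import struct

# struct Child  { u32 value; };
# struct Parent { Child *child : u16; };
def parse_parent(data, offset: int) -> dict:
    # 1. Read the u16 stored in the pointer: an offset from the start of the data
    child_offset = struct.unpack_from("<H", data, offset)[0]
    # 2. Decode the Child at that offset
    child_value = struct.unpack_from("<I", data, child_offset)[0]
    return {"child_offset": child_offset, "child": {"value": child_value}}

data = bytearray(0x400)
data[0x200:0x202] = (0x300).to_bytes(2, "little")  # pointer: Child lives at 0x300
data[0x300:0x304] = (42).to_bytes(4, "little")     # Child.value = 42

parent = parse_parent(data, 0x200)  # Parent parent @ 0x200;
```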


Enums

Enums are types whose value is restricted to a fixed set of named values. When an enum is placed in memory, the Pattern Data View shows the name of the matching enum entry instead of the numerical value.

Every enum has an underlying type which is used to specify the size of the enum when placed in memory. u32 will create a 4 byte enum, s8 will create a 1 byte enum.

Every enum entry can be set to a specific value using the <identifier> = <expression> syntax, as seen below. If no value is specified for an entry, its value is the value of the previous entry plus one. Counting starts at zero.

enum <typeName> : <builtinTypeName> {
  <enumEntry>
  ...
};

// Example
enum Architecture : u8 {
  x86 = 0x20,
  x64, // Value 0x21
  ARM32 = 0x35,
  ARM64 // Value 0x36
};

Architecture arch @ 0x100;
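The numbering rule can be expressed as a small Python helper. `build_enum` is a hypothetical name used only for this illustration. It reproduces the values shown in the comments of the Architecture example and builds a reverse lookup similar to what the Pattern Data View displays.

```python
def build_enum(entries):
    """entries: list of (name, value or None). Missing values continue from the previous entry + 1."""
    values = {}
    next_value = 0  # counting starts at zero
    for name, value in entries:
        if value is not None:
            next_value = value
        values[name] = next_value
        next_value += 1
    return values

ARCHITECTURE = build_enum([("x86", 0x20), ("x64", None), ("ARM32", 0x35), ("ARM64", None)])

# Reverse lookup: numerical value -> entry name
NAMES = {value: name for name, value in ARCHITECTURE.items()}
```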


Bitfields

If you're trying to parse a region of memory that is not aligned to the usual 8 bit boundaries, or that contains variables smaller than 8 bits (such as bit flags), a bitfield can be used. Bitfields allow variables to be specified with a custom number of bits. This is done using the <identifier> : <expression> syntax. The identifier before the colon specifies the field name, and the expression after the colon specifies the size of the field in bits. No padding is inserted between members; however, the size of the entire bitfield is rounded up to the next 8 bit boundary.

bitfield <typeName> {
  <bitfieldEntry>
  ...
};

// Example
bitfield Permission {
  r : 1;
  w : 1;
  x : 1;
};

Permission perm @ 0x20;
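Extracting bitfield members comes down to shifting and masking. The Python sketch below decodes Permission. It assumes that the first field occupies the least significant bit and that multi-byte bitfields are little endian. Both are assumptions of this sketch, so check the bit order against ImHex's output for your data.

```python
# bitfield Permission { r : 1; w : 1; x : 1; };
# ASSUMPTION: the first field starts at the least significant bit, little endian bytes
FIELDS = [("r", 1), ("w", 1), ("x", 1)]

def parse_bitfield(data: bytes, offset: int, fields):
    total_bits = sum(width for _, width in fields)
    size = (total_bits + 7) // 8  # rounded up to the next 8 bit boundary
    raw = int.from_bytes(data[offset:offset + size], "little")
    result, shift = {}, 0
    for name, width in fields:
        result[name] = (raw >> shift) & ((1 << width) - 1)
        shift += width  # no padding between members
    return result, size

data = bytes(0x20) + bytes([0b101])
perm, size = parse_bitfield(data, 0x20, FIELDS)  # Permission perm @ 0x20;
```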


Type Aliasing

To give an existing type a new name, a using declaration can be used. This does not replace the old name of the type. Instead, it creates a new type with the new name that is identical to the old type, so both names can be used afterwards.

using <newTypeName> = <oldTypeName>;

// Example
using uint32_t = u32;
using Header = ElfHeader;

Attributes

Attributes change additional settings of a variable, such as how it is displayed in the Pattern Data View.

<type> <variableName> [[attributeName("attributeValue")]];

// Example
struct Test {
  u32 magic [[name("Header Magic")]];
  u8 type [[comment("Test type")]];
};

Available attributes are:

  • [[name("New name")]]
    • Overrides the name of the variable displayed in the pattern data view
  • [[color("FF00FFFF")]]
    • Overrides the color of the variable. The value is an RGBA8 color
  • [[comment("Comment")]]
    • Adds a comment to a variable that appears as tooltip when hovered over it in the pattern data view

Mathematical Expressions

Anywhere a numeric value is required, a mathematical expression can be used instead. This can be as simple as 1 + 1, but expressions can also be much more complex, for example by accessing values within structs or enum constants. These expressions work the same way as in most other C-like languages. The following operators are supported:

  • a + b : Addition
  • a - b : Subtraction
  • a * b : Multiplication
  • a / b : Division
  • a % b : Modulus
  • a << b : Bit shift left
  • a >> b : Bit shift right
  • a & b : Bitwise AND
  • a | b : Bitwise OR
  • a ^ b : Bitwise XOR
  • a == b : Equality comparison
  • a != b : Inequality comparison
  • a > b : Greater-than comparison
  • a >= b : Greater-than-or-equals comparison
  • a < b : Less-than comparison
  • a <= b : Less-than-or-equals comparison
  • a && b : Boolean AND
  • a || b : Boolean OR
  • a ^^ b : Boolean XOR
  • a ? b : c : Ternary conditional
  • $ : Current offset

Additionally, variable names and the dot . operator may be used to access the value of variables in these expressions.

struct SubHeader {
  u16 numEntries;
};

struct Entry {
  // ...
};

struct Header {
  u32 magic;
  SubHeader subHeader;
  Entry entries[subHeader.numEntries + 5];
};

To use constants in an expression, the :: scope resolution operator can be used.

enum Offsets : u32 {
  Header = 0x00,
  SectionList = 0x1000,
  StringList = 0x5000
};

Section sections[10] @ Offsets::SectionList;

As seen in the Header struct above, member access can be used to create arrays whose size depends on the values of other members.
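The Header example can be traced with a Python sketch in which the array length is computed from a previously parsed member. Entry's members are elided in the original, so this sketch models each Entry as a single u8 (an assumption). It also assumes little endian data.

```python
import struct

# struct SubHeader { u16 numEntries; };
# struct Header { u32 magic; SubHeader subHeader; Entry entries[subHeader.numEntries + 5]; };
# ASSUMPTION: Entry modelled as one u8; little endian data
def parse_header(data: bytes, offset: int = 0) -> dict:
    magic, num_entries = struct.unpack_from("<IH", data, offset)  # 6 bytes, no padding
    count = num_entries + 5  # the array size expression
    entries = list(struct.unpack_from(f"<{count}B", data, offset + 6))
    return {"magic": magic, "subHeader": {"numEntries": num_entries}, "entries": entries}

data = (0xCAFEBABE).to_bytes(4, "little") + (3).to_bytes(2, "little") + bytes(range(8))
header = parse_header(data)
```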

Built-in Function calls

Built-in functions provide additional functionality for mathematical expressions. They take zero or more parameters and return a numerical value as a result.

ElfHeader header @ findSequence(0, 0x7F, 'E', 'L', 'F');

The following functions are currently supported:

  • u64 findSequence(u32 index, u8 ... bytes)
    • Finds the Nth occurrence (specified by the index parameter) of the list of bytes provided afterwards.
    • The address at which this sequence was found will be returned
  • u(size * 8) readUnsigned(u64 address, u8 size)
    • Reads size bytes at address and returns their unsigned value
    • Allowed sizes are 1, 2, 4, 8 and 16
  • s(size * 8) readSigned(u64 address, u8 size)
    • Reads size bytes at address and returns their signed value
    • Allowed sizes are 1, 2, 4, 8 and 16
  • u64 addressof(string path)
    • Returns the address of a variable
  • u64 nextAfter(string path)
    • Returns the address of the first byte after a variable
  • u64 alignTo(u64 value, u64 alignment)
    • Returns value aligned to alignment
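For illustration, the functions above can be approximated in Python. These are hypothetical re-implementations, not ImHex's code. The index of findSequence is treated as 0-based, matching the `findSequence(0, ...)` ELF example. A return value of -1 signals that nothing was found. Reads are little endian, and alignTo is assumed to round up to the next multiple of alignment.

```python
def find_sequence(data: bytes, index: int, *seq: int) -> int:
    """Address of the index-th (0-based) occurrence of `seq`, or -1 (ASSUMPTION) if absent."""
    needle = bytes(seq)
    pos = -1
    for _ in range(index + 1):
        pos = data.find(needle, pos + 1)
        if pos == -1:
            return -1
    return pos

def read_unsigned(data: bytes, address: int, size: int) -> int:
    assert size in (1, 2, 4, 8, 16)  # allowed sizes
    return int.from_bytes(data[address:address + size], "little")  # ASSUMPTION: little endian

def read_signed(data: bytes, address: int, size: int) -> int:
    assert size in (1, 2, 4, 8, 16)
    return int.from_bytes(data[address:address + size], "little", signed=True)

def align_to(value: int, alignment: int) -> int:
    """ASSUMPTION: rounds value up to the next multiple of alignment."""
    return -(-value // alignment) * alignment

data = b"\x00\x7fELF" + b"\x00" * 3 + b"\x7fELF"
first_elf = find_sequence(data, 0, 0x7F, ord("E"), ord("L"), ord("F"))
```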

Conditionals

Sometimes structs may have different members depending on some condition. This is where if, else and else if statements come into play. Inside structs and unions, these may be used to only evaluate certain members if some condition is met.

enum Type : u8 {
  Height,
  PValue,
  IValue,
  DValue
};

struct Message {
  Type type;

  if (type == Type::Height) {
    u8 index;
    u32 height;
  }
  else if (type == Type::PValue || type == Type::IValue)
    float value;
  else if (type == Type::DValue)
    double value;
};

Message messages[10] @ 0x100;
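Because the members depend on type, each Message can have a different size, and each array element starts where the previous one ended. The following Python sketch models that behaviour. It assumes little endian data and uses made-up sample messages.

```python
import struct

# enum Type : u8 { Height, PValue, IValue, DValue };
HEIGHT, PVALUE, IVALUE, DVALUE = range(4)

def parse_message(data: bytes, offset: int):
    """Return (fields, size); which members exist depends on the value of `type`."""
    msg_type = data[offset]
    pos = offset + 1
    fields = {"type": msg_type}
    if msg_type == HEIGHT:
        fields["index"], fields["height"] = struct.unpack_from("<BI", data, pos)
        pos += 5
    elif msg_type in (PVALUE, IVALUE):
        fields["value"] = struct.unpack_from("<f", data, pos)[0]
        pos += 4
    elif msg_type == DVALUE:
        fields["value"] = struct.unpack_from("<d", data, pos)[0]
        pos += 8
    return fields, pos - offset

def parse_messages(data: bytes, offset: int, count: int):
    """Message messages[count] @ offset; each element starts after the previous one."""
    messages = []
    for _ in range(count):
        msg, size = parse_message(data, offset)
        messages.append(msg)
        offset += size
    return messages

data = (bytes([HEIGHT, 7]) + (100).to_bytes(4, "little")
        + bytes([PVALUE]) + struct.pack("<f", 1.5)
        + bytes([DVALUE]) + struct.pack("<d", 2.25))
messages = parse_messages(data, 0, 3)
```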

Preprocessor

The preprocessor can be used to modify the source code before it is processed by the lexer.

Defines

A define preprocessor instruction replaces a name with something else.

For example, the following statement causes the preprocessor to replace every occurrence of the sequence PI with 3.14159265. The preprocessor is not aware of the language's syntax; this is a simple find and replace.

#define PI 3.14159265

Includes

An include directive takes the content of the file named in the directive and pastes it into the current file.

#include <cstdint.hexpat>
// or
#include "cstdint.hexpat"

It can be used to add the content of files found in the includes folder or any files relative to it.

Pragmas

Pragmas are meta-instructions used to configure the Pattern Language evaluator or ImHex in general. The following pragma directives are available:

  • #pragma endian [little|big|native]
    • Sets the default endianness of all created variables to big, little or native endian
  • #pragma MIME <mime/type>
    • Sets the MIME type of the files this pattern is relevant for.
    • If this file is present in the patterns folder and a file is loaded that matches this MIME type, ImHex will ask the user if they want to load this pattern.