{% extends "konami.html" %}
{% block title %}Packet format{% endblock %}
{% block body %}
<h1>Packet format</h1>
<p>e-Amusement uses XML for its application layer payloads. This XML is either verbatim, or in a custom packed binary
format.</p>
<h2 id="xml">The XML format</h2>
<p>Each tag that contains a value has a <code>__type</code> attribute that identifies what type it is. Array types
have a <code>__count</code> attribute indicating how many items are in the array. Binary blobs additionally have
a <code>__size</code> attribute indicating their length (this is notably not present on strings, however).</p>
<p>It is perhaps simpler to illustrate with an example, so:</p>
<pre>{% highlight 'xml' %}
<?xml version='1.0' encoding='UTF-8'?>
<call model="KFC:J:A:A:2019020600" srcid="1000" tag="b0312077">
    <eventlog method="write">
        <retrycnt __type="u32" />
        <data>
            <eventid __type="str">G_CARDED</eventid>
            <eventorder __type="s32">5</eventorder>
            <pcbtime __type="u64">1639669516779</pcbtime>
            <gamesession __type="s64">1</gamesession>
            <strdata1 __type="str" />
            <strdata2 __type="str" />
            <numdata1 __type="s64">1</numdata1>
            <numdata2 __type="s64" />
            <locationid __type="str">ea</locationid>
        </data>
    </eventlog>
</call>
{% endhighlight %}</pre>
<p>Arrays are encoded by concatenating every value together, with spaces between them. Data types that have multiple
values are serialized similarly.</p>
<p>Therefore, an element storing an array of <code>3u8</code> (<code>[(1, 2, 3), (4, 5, 6)]</code>) would look like
this:</p>
<pre>{% highlight 'xml' %}
<demo __type="3u8" __count="2">1 2 3 4 5 6</demo>
{% endhighlight %}</pre>
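<p>As a rough illustration of reading these attributes back out, here is a minimal Python sketch using the standard
library's <code>xml.etree.ElementTree</code>. The helper name <code>parse_int_element</code> is made up for this
example, and it only handles the integer types (<code>s8</code> through <code>u64</code>, with an optional 2, 3 or 4
prefix); it is not a reconstruction of what libavs does.</p>

```python
import re
import xml.etree.ElementTree as ET

# Matches integer type names such as "u8", "s32" or "3u8" (pattern chosen for this sketch).
INT_TYPE = re.compile(r"([234])?[su](8|16|32|64)")


def parse_int_element(elem):
    """Decode the text of an integer-typed element into Python values."""
    match = INT_TYPE.fullmatch(elem.get("__type", ""))
    if match is None:
        raise ValueError("not an integer type")
    width = int(match.group(1) or 1)  # values per item, e.g. 3 for "3u8"
    values = [int(v) for v in (elem.text or "").split()]
    if width > 1:
        # Regroup the flat, space-separated list into tuples of `width` values.
        items = [tuple(values[i:i + width]) for i in range(0, len(values), width)]
    else:
        items = values
    count = elem.get("__count")
    if count is None:
        # Scalar: an empty element (e.g. <retrycnt __type="u32" />) carries no value.
        return items[0] if items else None
    if len(items) != int(count):
        raise ValueError("__count does not match the number of items")
    return items


demo = ET.fromstring('<demo __type="3u8" __count="2">1 2 3 4 5 6</demo>')
print(parse_int_element(demo))  # [(1, 2, 3), (4, 5, 6)]
```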
<p>Beyond this, the format is otherwise fairly standard XML.</p>
<h2 id="binary">Packed binary overview</h2>
<p>Many packets, rather than using a string-based XML format, use a custom binary packed format instead. While it
can be a little confusing, remembering that this is encoding an XML tree can make it easier to parse.</p>
<p>To start with, let's take a look at the overall structure of the packets.</p>
<table class="code">
<thead>
<tr>
<td>0</td>
<td>1</td>
<td>2</td>
<td>3</td>
<td>4</td>
<td>5</td>
<td>6</td>
<td>7</td>
<td>8</td>
<td>9</td>
<td>10</td>
<td>11</td>
<td>12</td>
<td>13</td>
<td>14</td>
<td>15</td>
</tr>
</thead>
<tr>
<td><i>A0</i></td>
<td>C</td>
<td>E</td>
<td>~E</td>
<td colspan="4">Head length</td>
<td style="border-bottom: none" colspan="8"></td>
</tr>
<tr>
<td style="border-top: none; border-bottom: none;" colspan="16">Schema definition</td>
</tr>
<tr>
<td style="border-top: none;" colspan="12"></td>
<td colspan="1"><i>FF</i></td>
<td colspan="3">Align</td>
</tr>
<tr>
<td colspan="4">Data length</td>
<td style="border-bottom: none" colspan="12"></td>
</tr>
<tr>
<td style="border-top: none; border-bottom: none;" colspan="16">Payload</td>
</tr>
<tr>
<td style="border-top: none;" colspan="13"></td>
<td colspan="3">Align</td>
</tr>
</table>
<p>Every packet starts with the magic byte <code>0xA0</code>. Following this is the content byte, the encoding byte,
and then the bitwise complement (one's complement) of the encoding byte.</p>
<p>Possible values for the content byte are:</p>
<table>
<thead>
<tr>
<td>C</td>
<td>Content</td>
</tr>
</thead>
<tr>
<td><code>0x42</code></td>
<td>Packed names, contains data</td>
</tr>
<tr>
<td><code>0x43</code></td>
<td>Packed names, schema only</td>
</tr>
<tr>
<td><code>0x45</code></td>
<td>Full names, contains data</td>
</tr>
<tr>
<td><code>0x46</code></td>
<td>Full names, schema only</td>
</tr>
</table>
<details>
<summary>Source code details</summary>
<p>Not totally cleaned these up yet, but the general concept of how packets are parsed can be seen fairly clearly.
At a high level, we have a single function that validates the header, parses out the schema, then goes to read
the body of the packet, if we're expecting it. The arguments to <code>parse_packet_header</code> will make more
sense in a moment.</p>
<figure>
<img src="./images/parse_packet.png" />
<figcaption><code>libavs-win32.dll:0x1003483</code></figcaption>
</figure>
<p><code>parse_packet_header</code> has a lot of things going on, so I'm just pulling out a few important snippets
here.</p>
<figure>
<img src="./images/parse_packet_header_a.png" /><br>
<img src="./images/parse_packet_header_b.png" /><br>
<img src="./images/parse_packet_header_c.png" />
<figcaption><code>libavs-win32.dll:0x1003448c</code></figcaption>
</figure>
<p>We first read out four bytes from the start of the packet, and convert that to an integer; nothing especially
magic here. The next block, however, is potentially not what you might have expected to see. Based on
the two flags passed into the function arguments, we are going to subtract a value from this header.
Specifically, the first byte we subtract is always <code>0xa0</code>, then the second byte is one of the
<code>C</code> values in the table above.
</p>
<p>Finally, we mask out the first two bytes, and assert that they're both null. That is, they were exactly equal to
the values we subtracted from them. Of note here is that the caller to this function "decides" what sort of
packet it is expecting.</p>
<p>We can also see the check for <code>~E</code> here. If that check passes, we return the <code>E</code> byte,
otherwise we're going to error.</p>
</details>
<p>The encoding flag indicates the encoding for all string types in the packet (more on those later). Possible
values are:</p>
<table>
<thead>
<tr>
<td><code>E</code></td>
<td><code>~E</code></td>
<td colspan="3">Encoding name</td>
</tr>
</thead>
<tr>
<td><code>0x00</code></td>
<td><code>0xFF</code></td>
<td>None</td>
<td></td>
<td></td>
</tr>
<tr>
<td><code>0x20</code></td>
<td><code>0xDF</code></td>
<td><code>ASCII</code></td>
<td></td>
<td></td>
</tr>
<tr>
<td><code>0x40</code></td>
<td><code>0xBF</code></td>
<td><code>ISO-8859-1</code></td>
<td><code>ISO_8859-1</code></td>
<td></td>
</tr>
<tr>
<td><code>0x60</code></td>
<td><code>0x9F</code></td>
<td><code>EUC-JP</code></td>
<td><code>EUCJP</code></td>
<td><code>EUC_JP</code></td>
</tr>
<tr>
<td><code>0x80</code></td>
<td><code>0x7F</code></td>
<td><code>SHIFT-JIS</code></td>
<td><code>SHIFT_JIS</code></td>
<td><code>SJIS</code></td>
</tr>
<tr>
<td><code>0xA0</code></td>
<td><code>0x5F</code></td>
<td><code>UTF-8</code></td>
<td><code>UTF8</code></td>
<td></td>
</tr>
</table>
<p>Data is assumed by default to be in ISO-8859-1 encoding. That is, for encodings <code>0x00</code> and
<code>0x40</code>, no transformation is performed on the binary data to produce readable text.
</p>
<p>ASCII encoding is true 7-bit ASCII, with the 8th bit always set to 0. This is validated.</p>
<details>
<summary>Source code details</summary>
<p>The full table for these values can be found in libavs.</p>
<figure>
<img src="./images/encoding_table.png">
<figcaption><code>libavs-win32.dll:0x1006b960</code></figcaption>
</figure>
<p>A second table exists just before this one in the source, responsible for the
<code>&lt;?xml version='1.0' encoding='??'?&gt;</code> line in XML files.
</p>
<figure>
<img src="./images/xml_encoding_table.png">
<figcaption><code>libavs-win32.dll:0x1006b940</code></figcaption>
</figure>
<p>This is indexed using the following function, which maps the above encoding IDs to 1, 2, 3, 4 and 5
respectively.</p>
<pre>{% highlight "c" %}char* xml_get_encoding_name(uint encoding_id) {
    return ENCODING_NAME_TABLE[((encoding_id & 0xe0) >> 5) * 4];
}{% endhighlight %}</pre>
</details>
<p>While validating <code>~E</code> isn't technically required, it acts as a useful assertion that the packet being
parsed is valid.</p>
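<p>Putting the four header bytes together, a parser might look something like the following Python sketch. The
function name <code>parse_header</code> and the choice of Python codec names are assumptions made for this example;
in particular, treating encoding <code>0x00</code> as ISO-8859-1 simply follows the default described above.</p>

```python
MAGIC = 0xA0

# Content byte -> (names are packed, packet contains data), from the table above.
CONTENT_TYPES = {
    0x42: (True, True),
    0x43: (True, False),
    0x45: (False, True),
    0x46: (False, False),
}

# Encoding byte -> Python codec name. The codec names are this sketch's choice;
# 0x00 ("None") is treated like ISO-8859-1, following the default described above.
ENCODINGS = {
    0x00: "iso-8859-1",
    0x20: "ascii",
    0x40: "iso-8859-1",
    0x60: "euc_jp",
    0x80: "shift_jis",
    0xA0: "utf-8",
}


def parse_header(packet):
    """Validate the 4 byte header and return (packed_names, has_data, codec)."""
    if len(packet) < 4:
        raise ValueError("packet too short")
    magic, content, encoding, inverted = packet[:4]
    if magic != MAGIC:
        raise ValueError("bad magic byte")
    if content not in CONTENT_TYPES:
        raise ValueError("unknown content byte")
    # ~E must be the bitwise complement of E, so the two XOR to 0xFF.
    if encoding ^ inverted != 0xFF:
        raise ValueError("encoding byte and its complement disagree")
    if encoding not in ENCODINGS:
        raise ValueError("unknown encoding byte")
    packed_names, has_data = CONTENT_TYPES[content]
    return packed_names, has_data, ENCODINGS[encoding]


print(parse_header(bytes([0xA0, 0x42, 0x80, 0x7F])))  # (True, True, 'shift_jis')
```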
<h2 id="schema">The packet schema header</h2>
<p>Following the 4 byte header is a 4 byte integer containing the length of the next part of the header (this is
technically made redundant, as this structure is also terminated).</p>
<p>This part of the header defines the schema that the main payload uses.</p>
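<p>To make the two length fields concrete, here is a hedged Python sketch that slices a packet into its schema and
payload sections. Two things here are assumptions not stated above: that the length fields are big-endian, and
that the head length covers the schema including its <code>FF</code> terminator and alignment padding. The function
name <code>split_sections</code> is likewise made up for this example.</p>

```python
import struct


def split_sections(packet):
    """Return (schema_bytes, payload_bytes) for a packet that contains data."""
    # ASSUMPTION: both length fields are big-endian unsigned 32 bit integers.
    (head_length,) = struct.unpack_from(">I", packet, 4)
    schema_start = 8
    schema_end = schema_start + head_length
    if schema_end + 4 > len(packet):
        raise ValueError("head length runs past the end of the packet")
    # ASSUMPTION: head_length already includes the FF terminator and padding,
    # so the data length field follows immediately.
    (data_length,) = struct.unpack_from(">I", packet, schema_end)
    payload_start = schema_end + 4
    payload_end = payload_start + data_length
    if payload_end > len(packet):
        raise ValueError("data length runs past the end of the packet")
    return packet[schema_start:schema_end], packet[payload_start:payload_end]
```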
<p>A tag definition follows one of the following three formats:</p>
<ul>
<li>
<p>Compressed names:</p>
<table class="code">
<thead>
<tr>
<td>0</td>
<td>1</td>
<td>2</td>
<td>3</td>
<td>4</td>
<td>5</td>
<td>6</td>
<td>7</td>
<td>8</td>
<td>9</td>
<td>10</td>
<td>11</td>
<td>12</td>
<td>13</td>
<td>14</td>
<td>15</td>
</tr>
</thead>
<tr>
<td>Type</td>
<td>nlen</td>
<td colspan="7">Tag name</td>
<td style="border-bottom: none" colspan="7"></td>
</tr>
<tr>
<td style="border-top: none;" colspan="15">Attributes and children</td>
<td colspan="1"><i>FE</i></td>
</tr>
</table>
</li>
<li>
<p>Full names, short length:</p>
<table class="code">
<thead>
<tr>
<td>0</td>
<td>1</td>
<td>2</td>
<td>3</td>
<td>4</td>
<td>5</td>
<td>6</td>
<td>7</td>
<td>8</td>
<td>9</td>
<td>10</td>
<td>11</td>
<td>12</td>
<td>13</td>
<td>14</td>
<td>15</td>
</tr>
</thead>
<tr>
<td>Type</td>
<td>0x40-0x64</td>
<td colspan="7">Tag name</td>
<td style="border-bottom: none" colspan="7"></td>
</tr>
<tr>
<td style="border-top: none;" colspan="15">Attributes and children</td>
<td colspan="1"><i>FE</i></td>
</tr>
</table>
</li>
<li>
<p>Full names, long length:</p>
<table class="code">
<thead>
<tr>
<td>0</td>
<td>1</td>
<td>2</td>
<td>3</td>
<td>4</td>
<td>5</td>
<td>6</td>
<td>7</td>
<td>8</td>
<td>9</td>
<td>10</td>
<td>11</td>
<td>12</td>
<td>13</td>
<td>14</td>
<td>15</td>
</tr>
</thead>
<tr>
<td>Type</td>
<td>0x80-0x8f</td>
<td>0x00-0xff</td>
<td colspan="7">Tag name</td>
<td style="border-bottom: none" colspan="6"></td>
</tr>
<tr>
<td style="border-top: none;" colspan="15">Attributes and children</td>
<td colspan="1"><i>FE</i></td>
</tr>
</table>
</li>
</ul>
<p>The encoding of structure names varies depending on the packet content byte. If the content byte indicates
full names, we first need to check if the value of the first byte exceeds <code>0x7f</code>. If it does, we need to
read an additional byte. In the single byte case, we subtract <code>0x3f</code><sup>1</sup> to get our real length.
In the two byte case we subtract <code>0x7fbf</code><sup>2</sup>. In the latter case, the maximum allowed length is
<code>0x1000</code>.<br>
<small><sup>1</sup> simplified from <code>(length & ~0x40) + 0x01</code></small><br>
<small><sup>2</sup> simplified from <code>(length & ~0x8000) + 0x41</code></small>
</p>
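<p>The length arithmetic for full names can be sketched in Python as follows. The function name
<code>read_full_name_length</code> is hypothetical, and combining the two bytes with the first byte as the high byte
is an assumption inferred from the <code>0x80-0x8f</code> range shown for the first byte.</p>

```python
MAX_LONG_NAME = 0x1000


def read_full_name_length(buf, pos):
    """Return (name_length, position_after_prefix) for a full-name length prefix."""
    first = buf[pos]
    if first > 0x7f:
        # Two byte form. ASSUMPTION: the first byte is the high byte.
        value = (first << 8) | buf[pos + 1]
        length = value - 0x7fbf  # same as (value & ~0x8000) + 0x41
        if length > MAX_LONG_NAME:
            raise ValueError("name too long")
        return length, pos + 2
    # Single byte form.
    return first - 0x3f, pos + 1  # same as (first & ~0x40) + 0x01


print(read_full_name_length(bytes([0x43]), 0))        # (4, 1)
print(read_full_name_length(bytes([0x80, 0x00]), 0))  # (65, 2)
```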
<p>If we are instead parsing packed names, then the names are encoded as densely packed 6 bit values. The length prefix
(<code>nlen</code>) determines the length of the final unpacked string. The acceptable alphabet is
<code>0123456789:ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz</code>, and the packed values are indices
within this alphabet. The maximum length for a name in this mode is 36 bytes (<code>0x24</code>).
</p>
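<p>A minimal Python sketch of the 6 bit packing might look like this. The bit order is not spelled out above, so
this assumes the first character occupies the most significant bits and that the final byte is padded with zero
bits; <code>pack_name</code> and <code>unpack_name</code> are names invented for the example.</p>

```python
ALPHABET = "0123456789:ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz"
MAX_PACKED_NAME = 0x24


def pack_name(name):
    """Pack a name into 6 bit alphabet indices (ASSUMPTION: MSB first)."""
    if len(name) > MAX_PACKED_NAME:
        raise ValueError("name too long")
    bits = 0
    for ch in name:
        bits = (bits << 6) | ALPHABET.index(ch)
    nbytes = (len(name) * 6 + 7) // 8
    bits <<= nbytes * 8 - len(name) * 6  # zero-pad the final byte
    return bits.to_bytes(nbytes, "big")


def unpack_name(packed, nlen):
    """Unpack `nlen` characters from 6 bit packed bytes (ASSUMPTION: MSB first)."""
    if nlen > MAX_PACKED_NAME:
        raise ValueError("name too long")
    total_bits = len(packed) * 8
    if total_bits < nlen * 6:
        raise ValueError("not enough packed data")
    bits = int.from_bytes(packed, "big")
    chars = []
    for i in range(nlen):
        shift = total_bits - 6 * (i + 1)
        chars.append(ALPHABET[(bits >> shift) & 0x3f])
    return "".join(chars)


packed = pack_name("eventlog")
print(len(packed), unpack_name(packed, 8))  # 6 eventlog
```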
<p>The children can be a combination of either attribute names, or child tags. Attribute names are represented by
the byte <code>0x2E</code> followed by a length prefixed name as defined above. Child tags follow the above
format. Type <code>0x2E</code> must therefore be considered reserved as a possible structure type. As they carry
special meaning in text-based XML encoding, attribute names beginning with <code>__</code> are disallowed.</p>
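<p>The nesting described here lends itself to a small recursive reader. The Python sketch below walks a schema that
uses full names with single byte lengths only, and decodes names as ASCII; both are simplifications made for this
example, and <code>parse_node</code> and <code>read_short_name</code> are hypothetical names.</p>

```python
NODE_END = 0xFE
ATTRIBUTE = 0x2E


def read_short_name(buf, pos):
    """Read a full name with a single byte length prefix (simplified)."""
    length = buf[pos] - 0x3f
    start = pos + 1
    # ASSUMPTION: names are plain ASCII in this sketch.
    return buf[start:start + length].decode("ascii"), start + length


def parse_node(buf, pos):
    """Parse one tag definition, returning (node, position_after_FE)."""
    type_id = buf[pos]
    name, pos = read_short_name(buf, pos + 1)
    node = {"type": type_id, "name": name, "attrs": [], "children": []}
    while buf[pos] != NODE_END:
        if buf[pos] == ATTRIBUTE:
            attr, pos = read_short_name(buf, pos + 1)
            if attr.startswith("__"):
                raise ValueError("attribute names may not begin with __")
            node["attrs"].append(attr)
        else:
            child, pos = parse_node(buf, pos)
            node["children"].append(child)
    return node, pos + 1


schema = (bytes([0x01, 0x43]) + b"call"          # void tag "call"
          + bytes([0x2E, 0x44]) + b"model"       # attribute "model"
          + bytes([0x07, 0x47]) + b"retrycnt"    # u32 child "retrycnt"
          + bytes([0xFE, 0xFE, 0xFF]))           # end child, end call, end schema
root, end = parse_node(schema, 0)
print(root["name"], root["attrs"], [c["name"] for c in root["children"]])
# call ['model'] ['retrycnt']
```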
<details>
<summary>Source code details</summary>
<p>I'm not going to labour this one, so if you want to go look yourself:</p>
<ul>
<li>6-packed name reader: <code>libavs-win32.dll:0x10009f90</code></li>
<li>Unpacked name reader: <code>libavs-win32.dll:0x1000a110</code></li>
<li>The call to the above: <code>libavs-win32.dll:0x10034a57</code>, with the <code>__</code> checking starting
at <code>libavs-win32:0x10034cfd</code> for attributes (i.e. the <code>JZ</code> at <code>0x10034a7c</code>)
</li>
</ul>
</details>
<p>Attributes (type <code>0x2E</code>) represent a string attribute. Any other attribute must be defined as a child
tag. It is notable that 0 children is allowable, which is how the majority of values are encoded.</p>
<p>All valid IDs, and their respective type, are listed in the following table. The bucket column here will be
used later when unpacking the main data, so we need not worry about it for now, but be warned it exists and is
possibly the least fun part of this format.</p>
<table class="code">
<thead>
<tr>
<td>ID</td>
<td>Bytes</td>
<td>C type</td>
<td>Bucket</td>
<td colspan="2">XML names</td>
<td></td>
<td>ID</td>
<td>Bytes</td>
<td>C type</td>
<td>Bucket</td>
<td colspan="2">XML names</td>
</tr>
</thead>
<tr>
<td>0x01</td>
<td>0</td>
<td>void</td>
<td>-</td>
<td>void</td>
<td></td>
<td></td>
<td>0x21</td>
<td>24</td>
<td>uint64[3]</td>
<td>int</td>
<td>3u64</td>
<td></td>
</tr>
<tr>
<td>0x02</td>
<td>1</td>
<td>int8</td>
<td>byte</td>
<td>s8</td>
<td></td>
<td></td>
<td>0x22</td>
<td>12</td>
<td>float[3]</td>
<td>int</td>
<td>3f</td>
<td></td>
</tr>
<tr>
<td>0x03</td>
<td>1</td>
<td>uint8</td>
<td>byte</td>
<td>u8</td>
<td></td>
<td></td>
<td>0x23</td>
<td>24</td>
<td>double[3]</td>
<td>int</td>
<td>3d</td>
<td></td>
</tr>
<tr>
<td>0x04</td>
<td>2</td>
<td>int16</td>
<td>short</td>
<td>s16</td>
<td></td>
<td></td>
<td>0x24</td>
<td>4</td>
<td>int8[4]</td>
<td>int</td>
<td>4s8</td>
<td></td>
</tr>
<tr>
<td>0x05</td>
<td>2</td>
<td>uint16</td>
<td>short</td>
<td>u16</td>
<td></td>
<td></td>
<td>0x25</td>
<td>4</td>
<td>uint8[4]</td>
<td>int</td>
<td>4u8</td>
<td></td>
</tr>
<tr>
<td>0x06</td>
<td>4</td>
<td>int32</td>
<td>int</td>
<td>s32</td>
<td></td>
<td></td>
<td>0x26</td>
<td>8</td>
<td>int16[4]</td>
<td>int</td>
<td>4s16</td>
<td></td>
</tr>
<tr>
<td>0x07</td>
<td>4</td>
<td>uint32</td>
<td>int</td>
<td>u32</td>
<td></td>
<td></td>
<td>0x27</td>
<td>8</td>
<td>uint16[4]</td>
<td>int</td>
<td>4u16</td>
<td></td>
</tr>
<tr>
<td>0x08</td>
<td>8</td>
<td>int64</td>
<td>int</td>
<td>s64</td>
<td></td>
<td></td>
<td>0x28</td>
<td>16</td>
<td>int32[4]</td>
<td>int</td>
<td>4s32</td>
<td>vs32</td>
</tr>
<tr>
<td>0x09</td>
<td>8</td>
<td>uint64</td>
<td>int</td>
<td>u64</td>
<td></td>
<td></td>
<td>0x29</td>
<td>16</td>
<td>uint32[4]</td>
<td>int</td>
<td>4u32</td>
<td>vu32</td>
</tr>
<tr>
<td>0x0a</td>
<td><i>prefix</i></td>
<td>char[]</td>
<td>int</td>
<td>bin</td>
<td>binary</td>
<td></td>
<td>0x2a</td>
<td>32</td>
<td>int64[4]</td>
<td>int</td>
<td>4s64</td>
<td></td>
</tr>
<tr>
<td>0x0b</td>
<td><i>prefix</i></td>
<td>char[]</td>
<td>int</td>
<td>str</td>
<td>string</td>
<td></td>
<td>0x2b</td>
<td>32</td>
<td>uint64[4]</td>
<td>int</td>
<td>4u64</td>
<td></td>
</tr>
<tr>
<td>0x0c</td>
<td>4</td>
<td>uint8[4]</td>
<td>int</td>
<td>ip4</td>
<td></td>
<td></td>
<td>0x2c</td>
<td>16</td>
<td>float[4]</td>
<td>int</td>
<td>4f</td>
<td>vf</td>
</tr>
<tr>
<td>0x0d</td>
<td>4</td>
<td>uint32</td>
<td>int</td>
<td>time</td>
<td></td>
<td></td>
<td>0x2d</td>
<td>32</td>
<td>double[4]</td>
<td>int</td>
<td>4d</td>
<td></td>
</tr>
<tr>
<td>0x0e</td>
<td>4</td>
<td>float</td>
<td>int</td>
<td>float</td>
<td>f</td>
<td></td>
<td>0x2e</td>
<td><i>prefix</i></td>
<td>char[]</td>
<td>int</td>
<td>attr</td>
<td></td>
</tr>
<tr>
<td>0x0f</td>
<td>8</td>
<td>double</td>
<td>int</td>
<td>double</td>
<td>d</td>
<td></td>
<td>0x2f</td>
<td>0</td>
<td></td>
<td>-</td>
<td>array</td>
<td></td>
</tr>
<tr>
<td>0x10</td>
<td>2</td>
<td>int8[2]</td>
<td>short</td>
<td>2s8</td>
<td></td>
<td></td>
<td>0x30</td>
<td>16</td>
<td>int8[16]</td>
<td>int</td>
<td>vs8</td>
<td></td>
</tr>
<tr>
<td>0x11</td>
<td>2</td>
<td>uint8[2]</td>
<td>short</td>
<td>2u8</td>
<td></td>
<td></td>
<td>0x31</td>
<td>16</td>
<td>uint8[16]</td>
<td>int</td>
<td>vu8</td>
<td></td>
</tr>
<tr>
<td>0x12</td>
<td>4</td>
<td>int16[2]</td>
<td>int</td>
<td>2s16</td>
<td></td>
<td></td>
<td>0x32</td>
<td>16</td>
<td>int16[8]</td>
<td>int</td>
<td>vs16</td>
<td></td>
</tr>
<tr>
<td>0x13</td>
<td>4</td>
<td>uint16[2]</td>
<td>int</td>
<td>2u16</td>
<td></td>
<td></td>
<td>0x33</td>
<td>16</td>
<td>uint16[8]</td>
<td>int</td>
<td>vu16</td>
<td></td>
</tr>
<tr>
<td>0x14</td>
<td>8</td>
<td>int32[2]</td>
<td>int</td>
<td>2s32</td>
<td></td>
<td></td>
<td>0x34</td>
<td>1</td>
<td>bool</td>
<td>byte</td>
<td>bool</td>
<td>b</td>
</tr>
<tr>
<td>0x15</td>
<td>8</td>
<td>uint32[2]</td>
<td>int</td>
<td>2u32</td>
<td></td>
<td></td>
<td>0x35</td>
<td>2</td>
<td>bool[2]</td>
<td>short</td>
<td>2b</td>
<td></td>
</tr>
<tr>
<td>0x16</td>
<td>16</td>
<td>int64[2]</td>
<td>int</td>
<td>2s64</td>
<td>vs64</td>
<td></td>
<td>0x36</td>
<td>3</td>
<td>bool[3]</td>
<td>int</td>
<td>3b</td>
<td></td>
</tr>
<tr>
<td>0x17</td>
<td>16</td>
<td>uint64[2]</td>
<td>int</td>
<td>2u64</td>
<td>vu64</td>
<td></td>
<td>0x37</td>
<td>4</td>
<td>bool[4]</td>
<td>int</td>
<td>4b</td>
<td></td>
</tr>
<tr>
<td>0x18</td>
<td>8</td>
<td>float[2]</td>
<td>int</td>
<td>2f</td>
<td></td>
<td></td>
<td>0x38</td>
<td>16</td>
<td>bool[16]</td>
<td>int</td>
<td>vb</td>
<td></td>
</tr>
<tr>
<td>0x19</td>
<td>16</td>
<td>double[2]</td>
<td>int</td>
<td>2d</td>
<td>vd</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>0x1a</td>
<td>3</td>
<td>int8[3]</td>
<td>int</td>
<td>3s8</td>
<td></td>
<td></td>
<td>0x39</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>0x1b</td>
<td>3</td>
<td>uint8[3]</td>
<td>int</td>
<td>3u8</td>
<td></td>
<td></td>
<td>0x3a</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>0x1c</td>
<td>6</td>
<td>int16[3]</td>
<td>int</td>
<td>3s16</td>
<td></td>
<td></td>
<td>0x3b</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>0x1d</td>
<td>6</td>
<td>uint16[3]</td>
<td>int</td>
<td>3u16</td>
<td></td>
<td></td>
<td>0x3c</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>0x1e</td>
<td>12</td>
<td>int32[3]</td>
<td>int</td>
<td>3s32</td>
<td></td>
<td></td>
<td>0x3d</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>0x1f</td>
<td>12</td>
<td>uint32[3]</td>
<td>int</td>
<td>3u32</td>
<td></td>
<td></td>
<td>0x3e</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>0x20</td>
<td>24</td>
<td>int64[3]</td>
<td>int</td>
<td>3s64</td>
<td></td>
<td></td>
<td>0x3f</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</table>
<p>Strings should be encoded and decoded according to the encoding specified in the packet header. Null termination is
optional, but any terminator present should be stripped during decoding.</p>
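<p>In Python terms, string handling could be sketched like this. The helper names are made up, the codec argument is
assumed to be a Python codec name chosen from the packet's encoding byte, and always appending a terminator when
encoding is just one valid choice given that termination is optional.</p>

```python
def decode_string(raw, codec):
    """Decode string bytes, dropping a single trailing null terminator if present."""
    if raw.endswith(b"\x00"):
        raw = raw[:-1]
    return raw.decode(codec)


def encode_string(text, codec):
    """Encode a string, appending a null terminator (optional per the format)."""
    return text.encode(codec) + b"\x00"


print(decode_string(b"ea\x00", "ascii"))  # ea
```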
<p>All of these IDs are <code>& 0x3F</code>. Any value can be turned into an array by setting the 7<sup>th</sup> bit
high (<code>| 0x40</code>). Arrays of this form, in the data section, will be an aligned <code>size: u32</code>
immediately followed by <code>size</code> bytes' worth of (unaligned!) values of the unmasked type. Despite being a
<code>u32</code>, the maximum length allowed is <code>0xffffff</code>.
</p>
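<p>As a small worked example of the masking, the Python sketch below splits a type byte into its base ID and array
flag, then works out how many items an array of a given byte size holds. Only a handful of rows from the table are
included, and the function names are hypothetical.</p>

```python
# A small excerpt of the type table above: id -> (XML name, size in bytes).
TYPES = {
    0x02: ("s8", 1),
    0x03: ("u8", 1),
    0x06: ("s32", 4),
    0x07: ("u32", 4),
    0x1b: ("3u8", 3),
}
MAX_ARRAY_BYTES = 0xffffff


def split_type(type_byte):
    """Return (base_id, is_array) for a schema type byte."""
    return type_byte & 0x3f, bool(type_byte & 0x40)


def array_item_count(type_byte, size):
    """Number of items in an array whose data section prefix is `size` bytes."""
    base_id, is_array = split_type(type_byte)
    if not is_array:
        raise ValueError("not an array type")
    if size > MAX_ARRAY_BYTES:
        raise ValueError("array too large")
    _name, item_size = TYPES[base_id]
    if size % item_size:
        raise ValueError("size is not a whole number of items")
    return size // item_size


print(split_type(0x46))              # (6, True)
print(array_item_count(0x5b, 6))     # 2, i.e. two 3u8 items
```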
<details>
<summary>Source code details</summary>
<p>The full table for these values can be found in libavs. This table contains the names of every tag, along
with additional information such as how many bytes that data type requires, and which parsing function
should be used.</p>
<figure>
<img src="./images/types_table.png">
<figcaption><code>libavs-win32.dll:0x100782a8</code></figcaption>
</figure>
</details>
<details>
<summary>Note about the <code>array</code> type:</summary>
<p>While I'm not totally sure, I have a suspicion this type is used internally as a pseudo-type. Trying to
identify its function as a parsable type has some obvious blockers:</p>
<p>All of the types have convenient <code>printf</code>-using helper functions that are used to emit them when
serializing XML. All except one.</p>
<img src="./images/no_array.png">
<p>If we have a look inside the function that populates node sizes (<code>libavs-win32.dll:0x1000cf00</code>),
it has an explicit case, however this is the same fallback as the default case.</p>
<img src="./images/no_array_2.png">
<p>In the same function, however, we can find a second (technically first) check for the array type.</p>
<img src="./images/yes_array.png">
<p>This seems to suggest that internally arrays are represented as a normal node, with the <code>array</code>
type, however when serializing it's converted into the array types we're used to (well, will be after the
next sections) by masking <code>0x40</code> onto the contained type.</p>
<p>Also of interest from this snippet is the fact that <code>void</code>, <code>bin</code>, <code>str</code>,
and <code>attr</code> cannot be arrays. <code>void</code> and <code>attr</code> make sense, however
<code>str</code> and <code>bin</code> are more interesting. I suspect this is because Konami want to be able
to preallocate the memory, which wouldn't be possible with these variable length structures.
</p>
</details>
<h2 id="data">The data section</h2>
<p>This is where all the actual packet data is. For the most part, parsing this is the easy part. We traverse our
schema, and read values out of the packet according to the type indicated in the schema. Unfortunately, Konami
decided all data should be aligned very specifically, and that gaps left during alignment should be backfilled
later. This makes both reading and writing somewhat more complicated, however the system can be fairly easily
understood.</p>
<p>Firstly, we divide the payload up into 4 byte chunks. Each chunk can be allocated to either store individual
bytes, shorts, or ints (these are the buckets in the table above). When reading or writing a value, we first
check if a chunk allocated to the desired type's bucket is available and has free/as-yet-unread space within it.
If so, we will store/read our data to/from there. If there is no such chunk, we claim the next unclaimed chunk
for our bucket.</p>
<p>For example, imagine we write the sequence <code>byte, int, byte, short, byte, int, short</code>. The final output
should look like:</p>
<table class="code">
<thead>
<tr>
<td>0</td>
<td>1</td>
<td>2</td>
<td>3</td>
<td>4</td>
<td>5</td>
<td>6</td>
<td>7</td>
<td>8</td>
<td>9</td>
<td>10</td>
<td>11</td>
<td>12</td>
<td>13</td>
<td>14</td>
<td>15</td>
</tr>
</thead>
<tr>
<td>byte</td>
<td>byte</td>
<td>byte</td>
<td></td>
<td colspan="4">int</td>
<td colspan="2">short</td>
<td colspan="2">short</td>
<td colspan="4">int</td>
</tr>
</table>
<p>While this might seem a silly system compared to just not aligning values, it is at least possible to intuit that it
helps reduce wasted space. It should be noted that any variable-length structure, such as a string or an array,
claims all chunks it encroaches on for the <code>int</code> bucket, disallowing the storage of bytes or shorts
within them.</p>
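<p>The chunk-and-bucket description can be written down almost literally. The following Python sketch is a
deliberately naive version that tracks one open chunk per small bucket; the class name
<code>ChunkAllocator</code> is invented for this example, and it assumes the payload starts at offset 0.</p>

```python
class ChunkAllocator:
    """Naive model of the data section: 4 byte chunks claimed per bucket."""

    def __init__(self):
        self.next_chunk = 0             # offset of the next unclaimed chunk
        self.open = {1: None, 2: None}  # bucket size -> [chunk_offset, bytes_used]

    def allocate(self, size):
        """Return the offset at which a value of `size` bytes is stored."""
        if size in (1, 2):
            slot = self.open[size]
            if slot is None or slot[1] + size > 4:
                # No chunk for this bucket with free space: claim a new one.
                slot = [self.next_chunk, 0]
                self.next_chunk += 4
                self.open[size] = slot
            offset = slot[0] + slot[1]
            slot[1] += size
            return offset
        # Everything else goes in the int bucket and claims whole chunks.
        offset = self.next_chunk
        self.next_chunk += 4 * ((size + 3) // 4)
        return offset


allocator = ChunkAllocator()
# byte, int, byte, short, byte, int, short
print([allocator.allocate(size) for size in (1, 4, 1, 2, 1, 4, 2)])
# [0, 4, 1, 8, 2, 12, 10]
```

<p>The offsets printed match the table above: three bytes share the first chunk, the two shorts share the third,
and each int gets a chunk of its own.</p>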
<details>
<summary>Implementing a packer</summary>
<p>While the intuitive way to understand the packing algorithm is via chunks and buckets, a far more efficient
implementation can be made that uses three pointers. Rather than try to explain in words, hopefully this Python
implementation should suffice as explanation:
<pre>{% highlight "python" %}import math


class Packer:
    def __init__(self, offset=0):
        # One cursor per bucket; all start at the beginning of the payload.
        self._word_cursor = offset
        self._short_cursor = offset
        self._byte_cursor = offset
        self._boundary = offset % 4

    def _next_block(self):
        # Claim the next unclaimed 4 byte chunk and return its offset.
        self._word_cursor += 4
        return self._word_cursor - 4

    def request_allocation(self, size):
        if size == 0:
            return self._word_cursor
        elif size == 1:
            # A cursor sitting on a chunk boundary means its chunk is full.
            if self._byte_cursor % 4 == self._boundary:
                self._byte_cursor = self._next_block() + 1
            else:
                self._byte_cursor += 1
            return self._byte_cursor - 1
        elif size == 2:
            if self._short_cursor % 4 == self._boundary:
                self._short_cursor = self._next_block() + 2
            else:
                self._short_cursor += 2
            return self._short_cursor - 2
        else:
            # Larger values claim as many whole chunks as they touch.
            old_cursor = self._word_cursor
            for _ in range(math.ceil(size / 4)):
                self._word_cursor += 4
            return old_cursor

    def notify_skipped(self, no_bytes):
        for _ in range(math.ceil(no_bytes / 4)):
            self.request_allocation(4){% endhighlight %}</pre>
</p>
</details>
{% endblock %}