mirror of
https://gitea.tendokyu.moe/eamuse/docs.git
synced 2025-01-18 22:24:05 +01:00
1026 lines
31 KiB
HTML
{% extends "konami.html" %}
{% block title %}Packet format{% endblock %}
{% block body %}
<h1>Packet format</h1>

<p>e-Amusement uses XML for its application layer payloads. This XML is either sent as plain text, or encoded in a
custom packed binary format.</p>

<h2 id="xml">The XML format</h2>

<p>Each tag that contains a value has a <code>__type</code> attribute that identifies what type it is. Array types
have a <code>__count</code> attribute indicating how many items are in the array. Binary blobs additionally have
a <code>__size</code> attribute indicating their length (this is notably not present on strings, however).</p>
<p>It is perhaps simpler to illustrate with an example, so:</p>
<pre>{% highlight 'xml' %}
<?xml version='1.0' encoding='UTF-8'?>
<call model="KFC:J:A:A:2019020600" srcid="1000" tag="b0312077">
    <eventlog method="write">
        <retrycnt __type="u32" />
        <data>
            <eventid __type="str">G_CARDED</eventid>
            <eventorder __type="s32">5</eventorder>
            <pcbtime __type="u64">1639669516779</pcbtime>
            <gamesession __type="s64">1</gamesession>
            <strdata1 __type="str" />
            <strdata2 __type="str" />
            <numdata1 __type="s64">1</numdata1>
            <numdata2 __type="s64" />
            <locationid __type="str">ea</locationid>
        </data>
    </eventlog>
</call>
{% endhighlight %}</pre>

<p>Arrays are encoded by concatenating every value together, with spaces between them. Data types that have multiple
values are serialized similarly.</p>
<p>Therefore, an element storing an array of <code>3u8</code> (<code>[(1, 2, 3), (4, 5, 6)]</code>) would look like
this:</p>
<pre>{% highlight 'xml' %}
<demo __type="3u8" __count="2">1 2 3 4 5 6</demo>
{% endhighlight %}</pre>
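<p>As a rough illustration of this text encoding, here is a small Python sketch using the standard library's
<code>xml.etree.ElementTree</code>. The helpers <code>array_element</code> and <code>parse_array</code> are
hypothetical names of my own; they only show the space-separated layout and the <code>__count</code> attribute,
with no type checking.</p>
<pre>{% highlight "python" %}import xml.etree.ElementTree as ET


def array_element(tag, type_name, items):
    """Build an element holding an array; items may be plain values or tuples."""
    flat = []
    for item in items:
        # Multi-value types (e.g. 3u8) contribute every component, in order.
        flat.extend(item if isinstance(item, tuple) else (item,))
    elem = ET.Element(tag, {"__type": type_name, "__count": str(len(items))})
    elem.text = " ".join(str(v) for v in flat)
    return elem


def parse_array(elem, width):
    """Split the element text back into integer tuples of `width` components."""
    values = [int(v) for v in (elem.text or "").split()]
    return [tuple(values[i:i + width]) for i in range(0, len(values), width)]
{% endhighlight %}</pre>
<p>For example, <code>ET.tostring(array_element("demo", "3u8", [(1, 2, 3), (4, 5, 6)]), encoding="unicode")</code>
produces exactly the <code>&lt;demo&gt;</code> element shown above.</p>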

<p>Beyond this, it is otherwise fairly standard XML.</p>

<h2 id="binary">Packed binary overview</h2>

<p>Many packets, rather than using a string-based XML format, use a custom binary packed format instead. While it
can be a little confusing, remembering that this is encoding an XML tree can make it easier to parse.</p>
<p>To start with, let's take a look at the overall structure of the packets.</p>

<table class="code">
    <thead>
        <tr>
            <td>0</td><td>1</td><td>2</td><td>3</td><td>4</td><td>5</td><td>6</td><td>7</td>
            <td>8</td><td>9</td><td>10</td><td>11</td><td>12</td><td>13</td><td>14</td><td>15</td>
        </tr>
    </thead>
    <tr>
        <td><i>A0</i></td>
        <td>C</td>
        <td>E</td>
        <td>~E</td>
        <td colspan="4">Head length</td>
        <td style="border-bottom: none" colspan="8"></td>
    </tr>
    <tr>
        <td style="border-top: none; border-bottom: none;" colspan="16">Schema definition</td>
    </tr>
    <tr>
        <td style="border-top: none;" colspan="12"></td>
        <td colspan="1"><i>FF</i></td>
        <td colspan="3">Align</td>
    </tr>
    <tr>
        <td colspan="4">Data length</td>
        <td style="border-bottom: none" colspan="12"></td>
    </tr>
    <tr>
        <td style="border-top: none; border-bottom: none;" colspan="16">Payload</td>
    </tr>
    <tr>
        <td style="border-top: none;" colspan="13"></td>
        <td colspan="3">Align</td>
    </tr>
</table>

<p>Every packet starts with the magic byte <code>0xA0</code>. Following this is the content byte, the encoding byte,
and then the one's complement (bitwise NOT) of the encoding byte, <code>~E</code>.</p>
<p>Possible values for the content byte are:</p>
<table>
    <thead>
        <tr>
            <td>C</td>
            <td>Content</td>
        </tr>
    </thead>
    <tr>
        <td><code>0x42</code></td>
        <td>Packed names, contains data</td>
    </tr>
    <tr>
        <td><code>0x43</code></td>
        <td>Packed names, schema only</td>
    </tr>
    <tr>
        <td><code>0x45</code></td>
        <td>Full names, contains data</td>
    </tr>
    <tr>
        <td><code>0x46</code></td>
        <td>Full names, schema only</td>
    </tr>
</table>
<details>
    <summary>Source code details</summary>
    <p>I haven't totally cleaned these up yet, but the general concept of how packets are parsed can be seen fairly
        clearly. At a high level, we have a single function that validates the header, parses out the schema, then
        goes on to read the body of the packet, if we're expecting one. The arguments to
        <code>parse_packet_header</code> will make more sense in a moment.</p>
    <figure>
        <img src="./images/parse_packet.png" />
        <figcaption><code>libavs-win32.dll:0x1003483</code></figcaption>
    </figure>
    <p><code>parse_packet_header</code> has a lot going on, so I'm just pulling out a few important snippets
        here.</p>
    <figure>
        <img src="./images/parse_packet_header_a.png" /><br>
        <img src="./images/parse_packet_header_b.png" /><br>
        <img src="./images/parse_packet_header_c.png" />
        <figcaption><code>libavs-win32.dll:0x1003448c</code></figcaption>
    </figure>
    <p>We first read four bytes from the start of the packet and convert them to an integer; nothing especially
        magic here. The next block, however, is potentially not what you might have expected to see. Based on
        the two flags passed in as function arguments, we subtract a value from this header.
        Specifically, the first byte we subtract is always <code>0xa0</code>, and the second byte is one of the
        <code>C</code> values from the table above.</p>
    <p>Finally, we mask out the first two bytes and assert that they're both null; that is, they were exactly equal
        to the values we subtracted from them. Of note here is that the caller of this function "decides" what sort
        of packet it is expecting.</p>
    <p>We can also see the check for <code>~E</code> here. If that check passes, we return the <code>E</code> byte;
        otherwise we error.</p>
</details>

<p>The encoding flag indicates the encoding for all string types in the packet (more on those later). Possible
values are:</p>
<table>
    <thead>
        <tr>
            <td><code>E</code></td>
            <td><code>~E</code></td>
            <td colspan="3">Encoding name</td>
        </tr>
    </thead>
    <tr>
        <td><code>0x00</code></td>
        <td><code>0xFF</code></td>
        <td>None</td>
        <td></td>
        <td></td>
    </tr>
    <tr>
        <td><code>0x20</code></td>
        <td><code>0xDF</code></td>
        <td><code>ASCII</code></td>
        <td></td>
        <td></td>
    </tr>
    <tr>
        <td><code>0x40</code></td>
        <td><code>0xBF</code></td>
        <td><code>ISO-8859-1</code></td>
        <td><code>ISO_8859-1</code></td>
        <td></td>
    </tr>
    <tr>
        <td><code>0x60</code></td>
        <td><code>0x9F</code></td>
        <td><code>EUC-JP</code></td>
        <td><code>EUCJP</code></td>
        <td><code>EUC_JP</code></td>
    </tr>
    <tr>
        <td><code>0x80</code></td>
        <td><code>0x7F</code></td>
        <td><code>SHIFT-JIS</code></td>
        <td><code>SHIFT_JIS</code></td>
        <td><code>SJIS</code></td>
    </tr>
    <tr>
        <td><code>0xA0</code></td>
        <td><code>0x5F</code></td>
        <td><code>UTF-8</code></td>
        <td><code>UTF8</code></td>
        <td></td>
    </tr>
</table>
<p>Data is assumed by default to be in ISO-8859-1 encoding. That is, for encodings <code>0x00</code> and
<code>0x40</code>, no transformation is performed on the binary data to produce readable text.</p>
<p>ASCII encoding is true 7-bit ASCII, with the 8th bit always set to 0. This is validated.</p>
<details>
    <summary>Source code details</summary>
    <p>The full table for these values can be found in libavs.</p>
    <figure>
        <img src="./images/encoding_table.png">
        <figcaption><code>libavs-win32.dll:0x1006b960</code></figcaption>
    </figure>
    <p>A second table exists just before this one in the source, responsible for the
        <code><?xml version='1.0' encoding='??'?></code> line in XML files.</p>
    <figure>
        <img src="./images/xml_encoding_table.png">
        <figcaption><code>libavs-win32.dll:0x1006b940</code></figcaption>
    </figure>
    <p>This is indexed using the following function, which maps the encoding IDs <code>0x20</code>,
        <code>0x40</code>, <code>0x60</code>, <code>0x80</code> and <code>0xA0</code> to indices 1, 2, 3, 4 and 5
        respectively (and <code>0x00</code> to 0).</p>
    <pre>{% highlight "c" %}/* Decompiled as [((encoding_id & 0xe0) >> 5) * 4]; the * 4 is most likely a byte offset into a table of 32-bit pointers. */
extern const char *ENCODING_NAME_TABLE[];
const char *xml_get_encoding_name(unsigned int encoding_id) {
    return ENCODING_NAME_TABLE[(encoding_id & 0xe0) >> 5];  /* top three bits: 0x20 -> 1, ..., 0xA0 -> 5 */
}{% endhighlight %}</pre>
</details>
<p>While validating <code>~E</code> isn't technically required, it acts as a useful assertion that the packet being
parsed is valid.</p>
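<p>Putting the header rules together, a minimal Python sketch of the header check might look like the following.
The function and table names are my own, and mapping the encoding bytes onto Python's built-in codec names
(<code>latin-1</code>, <code>ascii</code>, <code>euc_jp</code>, <code>shift_jis</code>, <code>utf-8</code>) is an
assumption made for illustration.</p>
<pre>{% highlight "python" %}# Content byte -> (names are 6-bit packed, packet carries a data section)
CONTENT_TYPES = {
    0x42: (True, True),
    0x43: (True, False),
    0x45: (False, True),
    0x46: (False, False),
}

# Encoding byte -> Python codec name (assumed mapping). 0x00 ("None") is
# treated like ISO-8859-1, i.e. the bytes are used as-is.
ENCODINGS = {
    0x00: "latin-1",
    0x20: "ascii",
    0x40: "latin-1",
    0x60: "euc_jp",
    0x80: "shift_jis",
    0xA0: "utf-8",
}


def parse_header(data):
    """Validate the 4-byte packet header; return (packed_names, has_data, codec)."""
    if len(data) < 4:
        raise ValueError("packet is shorter than its 4-byte header")
    magic, content, enc, enc_check = data[0], data[1], data[2], data[3]
    if magic != 0xA0:
        raise ValueError(f"bad magic byte {magic:#04x}")
    if content not in CONTENT_TYPES:
        raise ValueError(f"unknown content byte {content:#04x}")
    if enc ^ 0xFF != enc_check:
        raise ValueError("fourth byte is not the bitwise NOT of the encoding byte")
    if enc not in ENCODINGS:
        raise ValueError(f"unknown encoding byte {enc:#04x}")
    packed_names, has_data = CONTENT_TYPES[content]
    return packed_names, has_data, ENCODINGS[enc]
{% endhighlight %}</pre>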

<h2 id="schema">The packet schema header</h2>
<p>Following the 4 byte header is a 4 byte integer containing the length of the next part of the header (this is
technically redundant, as this structure is also terminated by a <code>0xFF</code> byte).</p>
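<p>As a sketch of how the two length fields frame a packet, the following splits a packet into its schema and
payload. It assumes both lengths are big-endian unsigned 32-bit integers (the byte order is not stated above), and
that the head length covers the schema definition, its <code>0xFF</code> terminator and the alignment padding;
<code>split_packet</code> is a hypothetical helper.</p>
<pre>{% highlight "python" %}import struct


def split_packet(data):
    """Return (schema_bytes, payload_bytes) from a full packed-binary packet."""
    # Bytes 0-3 are the A0/C/E/~E header; bytes 4-7 hold the head length.
    (head_len,) = struct.unpack_from(">I", data, 4)
    schema = data[8:8 + head_len]
    # The data length immediately follows the (already aligned) schema block.
    data_len_at = 8 + head_len
    (data_len,) = struct.unpack_from(">I", data, data_len_at)
    payload = data[data_len_at + 4:data_len_at + 4 + data_len]
    if len(schema) != head_len or len(payload) != data_len:
        raise ValueError("packet is truncated")
    return schema, payload
{% endhighlight %}</pre>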

<p>This part of the header defines the schema that the main payload uses.</p>

<p>A tag definition follows one of the following three formats:</p>
<ul>
    <li>
        <p>Compressed names:</p>
        <table class="code">
            <thead>
                <tr>
                    <td>0</td><td>1</td><td>2</td><td>3</td><td>4</td><td>5</td><td>6</td><td>7</td>
                    <td>8</td><td>9</td><td>10</td><td>11</td><td>12</td><td>13</td><td>14</td><td>15</td>
                </tr>
            </thead>
            <tr>
                <td>Type</td>
                <td>nlen</td>
                <td colspan="7">Tag name</td>
                <td style="border-bottom: none" colspan="7"></td>
            </tr>
            <tr>
                <td style="border-top: none;" colspan="15">Attributes and children</td>
                <td colspan="1"><i>FE</i></td>
            </tr>
        </table>
    </li>
    <li>
        <p>Full names, short length:</p>
        <table class="code">
            <thead>
                <tr>
                    <td>0</td><td>1</td><td>2</td><td>3</td><td>4</td><td>5</td><td>6</td><td>7</td>
                    <td>8</td><td>9</td><td>10</td><td>11</td><td>12</td><td>13</td><td>14</td><td>15</td>
                </tr>
            </thead>
            <tr>
                <td>Type</td>
                <td>0x40-0x64</td>
                <td colspan="7">Tag name</td>
                <td style="border-bottom: none" colspan="7"></td>
            </tr>
            <tr>
                <td style="border-top: none;" colspan="15">Attributes and children</td>
                <td colspan="1"><i>FE</i></td>
            </tr>
        </table>
    </li>
    <li>
        <p>Full names, long length:</p>
        <table class="code">
            <thead>
                <tr>
                    <td>0</td><td>1</td><td>2</td><td>3</td><td>4</td><td>5</td><td>6</td><td>7</td>
                    <td>8</td><td>9</td><td>10</td><td>11</td><td>12</td><td>13</td><td>14</td><td>15</td>
                </tr>
            </thead>
            <tr>
                <td>Type</td>
                <td>0x80-0x8f</td>
                <td>0x00-0xff</td>
                <td colspan="7">Tag name</td>
                <td style="border-bottom: none" colspan="6"></td>
            </tr>
            <tr>
                <td style="border-top: none;" colspan="15">Attributes and children</td>
                <td colspan="1"><i>FE</i></td>
            </tr>
        </table>
    </li>
</ul>

<p>The encoding of structure names varies depending on the packet content byte. If the content byte indicates full
names, we first need to check whether the value of the first length byte exceeds <code>0x7f</code>. If it does, we
need to read an additional byte. In the single byte case, we subtract <code>0x3f</code><sup>1</sup> to get our real
length. In the two byte case, we treat both bytes as one 16-bit value and subtract <code>0x7fbf</code><sup>2</sup>.
In the latter case, the maximum allowed length is <code>0x1000</code>.<br>
<small><sup>1</sup> simplified from <code>(length & ~0x40) + 0x01</code></small><br>
<small><sup>2</sup> simplified from <code>(length & ~0x8000) + 0x41</code></small>
</p>
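<p>A sketch of this length decoding in Python, following the arithmetic above. Reading the two byte form as a
big-endian value (first byte high) is my assumption, based on the <code>& ~0x8000</code> in the footnote, and
<code>read_full_name_length</code> is a hypothetical helper name.</p>
<pre>{% highlight "python" %}MAX_FULL_NAME_LENGTH = 0x1000


def read_full_name_length(buf, pos):
    """Decode a full-name length prefix at buf[pos]; return (length, next_pos)."""
    first = buf[pos]
    if first <= 0x7F:
        # Single byte form: 0x40 -> 1, 0x41 -> 2, ...
        length, pos = first - 0x3F, pos + 1
    else:
        # Two byte form, read as a big-endian 16-bit value: 0x8000 -> 0x41.
        value = (first << 8) | buf[pos + 1]
        length, pos = value - 0x7FBF, pos + 2
        if length > MAX_FULL_NAME_LENGTH:
            raise ValueError(f"name length {length:#x} exceeds 0x1000")
    if length < 1:
        raise ValueError("invalid name length prefix")
    return length, pos
{% endhighlight %}</pre>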

<p>If we are instead parsing packed names, then the names are encoded as densely packed 6-bit values. The length
prefix (<code>nlen</code>) determines the length of the final unpacked string. The acceptable alphabet is
<code>0123456789:ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz</code>, and the packed values are indices
within this alphabet. The maximum length for a name in this mode is 36 characters (<code>0x24</code>).
</p>
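<p>The 6-bit packing can be sketched as below. I'm assuming the values are packed most-significant-bit first, with
the final byte zero-padded, which isn't spelled out above; <code>pack_name</code> and <code>unpack_name</code> are
hypothetical helpers.</p>
<pre>{% highlight "python" %}import math

ALPHABET = "0123456789:ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz"
MAX_PACKED_LENGTH = 0x24


def pack_name(name):
    """Encode a name as its nlen byte followed by 6-bit alphabet indices."""
    if len(name) > MAX_PACKED_LENGTH:
        raise ValueError("name is too long for the packed encoding")
    bits = 0
    for ch in name:
        bits = (bits << 6) | ALPHABET.index(ch)  # ValueError if ch is not allowed
    nbytes = math.ceil(len(name) * 6 / 8)
    bits <<= nbytes * 8 - len(name) * 6  # left-align, zero-padding the tail
    return bytes([len(name)]) + bits.to_bytes(nbytes, "big")


def unpack_name(buf, pos=0):
    """Decode a packed name at buf[pos]; return (name, next_pos)."""
    nlen = buf[pos]
    nbytes = math.ceil(nlen * 6 / 8)
    raw = buf[pos + 1:pos + 1 + nbytes]
    bits = int.from_bytes(raw, "big") >> (nbytes * 8 - nlen * 6)
    chars = [ALPHABET[(bits >> (6 * (nlen - 1 - i))) & 0x3F] for i in range(nlen)]
    return "".join(chars), pos + 1 + nbytes
{% endhighlight %}</pre>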

<p>The children can be a combination of either attribute names or child tags. Attribute names are represented by
the byte <code>0x2E</code> followed by a length-prefixed name as defined above. Child tags follow the above
format. Type <code>0x2E</code> must therefore be considered reserved as a possible structure type. As they carry
special meaning in text-based XML encoding, attribute names beginning with <code>__</code> are disallowed.</p>

<details>
    <summary>Source code details</summary>
    <p>I'm not going to labour this one, so if you want to go look yourself:</p>
    <ul>
        <li>6-bit packed name reader: <code>libavs-win32.dll:0x10009f90</code></li>
        <li>Unpacked name reader: <code>libavs-win32.dll:0x1000a110</code></li>
        <li>The call to the above: <code>libavs-win32.dll:0x10034a57</code>, with the <code>__</code> checking starting
            at <code>libavs-win32:0x10034cfd</code> for attributes (i.e. the <code>JZ</code> at <code>0x10034a7c</code>)
        </li>
    </ul>
</details>

<p>Attributes (type <code>0x2E</code>) only ever hold string values; an attribute of any other type must be defined
as a child tag instead. It is notable that a tag with zero children is allowed, which is how the majority of values
are encoded.</p>
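<p>To tie these schema rules together, here is a sketch of a schema walker. It takes the name reader as a parameter
(for example a 6-bit unpacker, or a full-name reader built on the length rules above), and assumes the schema bytes
start directly at the first tag definition; <code>walk_schema</code> and its event tuples are purely my own
illustration, not anything from libavs.</p>
<pre>{% highlight "python" %}ATTRIBUTE = 0x2E
END_OF_NODE = 0xFE
END_OF_SCHEMA = 0xFF
ARRAY_FLAG = 0x40


def walk_schema(buf, read_name):
    """Yield ("node", type_id, is_array, name), ("attr", name) and ("end",) events.

    read_name(buf, pos) must return (name, next_pos).
    """
    pos = 0
    while True:
        marker = buf[pos]
        pos += 1
        if marker == END_OF_SCHEMA:
            return
        if marker == END_OF_NODE:
            # The most recently opened tag has no more attributes or children.
            yield ("end",)
            continue
        name, pos = read_name(buf, pos)
        if marker == ATTRIBUTE:
            if name.startswith("__"):
                raise ValueError(f"attribute name {name!r} is reserved")
            yield ("attr", name)
        else:
            # Low six bits are the type ID; 0x40 marks an array.
            yield ("node", marker & 0x3F, bool(marker & ARRAY_FLAG), name)
{% endhighlight %}</pre>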

<p>All valid IDs, and their respective types, are listed in the following table. The bucket column here will be
used later when unpacking the main data, so we need not worry about it for now, but be warned it exists and is
possibly the least fun part of this format.</p>

<table class="code">
    <thead>
        <tr>
            <td>ID</td><td>Bytes</td><td>C type</td><td>Bucket</td><td colspan="2">XML names</td>
            <td></td>
            <td>ID</td><td>Bytes</td><td>C type</td><td>Bucket</td><td colspan="2">XML names</td>
        </tr>
    </thead>
    <tr><td>0x01</td><td>0</td><td>void</td><td>-</td><td>void</td><td></td><td></td><td>0x21</td><td>24</td><td>uint64[3]</td><td>int</td><td>3u64</td><td></td></tr>
    <tr><td>0x02</td><td>1</td><td>int8</td><td>byte</td><td>s8</td><td></td><td></td><td>0x22</td><td>12</td><td>float[3]</td><td>int</td><td>3f</td><td></td></tr>
    <tr><td>0x03</td><td>1</td><td>uint8</td><td>byte</td><td>u8</td><td></td><td></td><td>0x23</td><td>24</td><td>double[3]</td><td>int</td><td>3d</td><td></td></tr>
    <tr><td>0x04</td><td>2</td><td>int16</td><td>short</td><td>s16</td><td></td><td></td><td>0x24</td><td>4</td><td>int8[4]</td><td>int</td><td>4s8</td><td></td></tr>
    <tr><td>0x05</td><td>2</td><td>uint16</td><td>short</td><td>u16</td><td></td><td></td><td>0x25</td><td>4</td><td>uint8[4]</td><td>int</td><td>4u8</td><td></td></tr>
    <tr><td>0x06</td><td>4</td><td>int32</td><td>int</td><td>s32</td><td></td><td></td><td>0x26</td><td>8</td><td>int16[4]</td><td>int</td><td>4s16</td><td></td></tr>
    <tr><td>0x07</td><td>4</td><td>uint32</td><td>int</td><td>u32</td><td></td><td></td><td>0x27</td><td>8</td><td>uint16[4]</td><td>int</td><td>4u16</td><td></td></tr>
    <tr><td>0x08</td><td>8</td><td>int64</td><td>int</td><td>s64</td><td></td><td></td><td>0x28</td><td>16</td><td>int32[4]</td><td>int</td><td>4s32</td><td>vs32</td></tr>
    <tr><td>0x09</td><td>8</td><td>uint64</td><td>int</td><td>u64</td><td></td><td></td><td>0x29</td><td>16</td><td>uint32[4]</td><td>int</td><td>4u32</td><td>vu32</td></tr>
    <tr><td>0x0a</td><td><i>prefix</i></td><td>char[]</td><td>int</td><td>bin</td><td>binary</td><td></td><td>0x2a</td><td>32</td><td>int64[4]</td><td>int</td><td>4s64</td><td></td></tr>
    <tr><td>0x0b</td><td><i>prefix</i></td><td>char[]</td><td>int</td><td>str</td><td>string</td><td></td><td>0x2b</td><td>32</td><td>uint64[4]</td><td>int</td><td>4u64</td><td></td></tr>
    <tr><td>0x0c</td><td>4</td><td>uint8[4]</td><td>int</td><td>ip4</td><td></td><td></td><td>0x2c</td><td>16</td><td>float[4]</td><td>int</td><td>4f</td><td>vf</td></tr>
    <tr><td>0x0d</td><td>4</td><td>uint32</td><td>int</td><td>time</td><td></td><td></td><td>0x2d</td><td>32</td><td>double[4]</td><td>int</td><td>4d</td><td></td></tr>
    <tr><td>0x0e</td><td>4</td><td>float</td><td>int</td><td>float</td><td>f</td><td></td><td>0x2e</td><td><i>prefix</i></td><td>char[]</td><td>int</td><td>attr</td><td></td></tr>
    <tr><td>0x0f</td><td>8</td><td>double</td><td>int</td><td>double</td><td>d</td><td></td><td>0x2f</td><td>0</td><td></td><td>-</td><td>array</td><td></td></tr>
    <tr><td>0x10</td><td>2</td><td>int8[2]</td><td>short</td><td>2s8</td><td></td><td></td><td>0x30</td><td>16</td><td>int8[16]</td><td>int</td><td>vs8</td><td></td></tr>
    <tr><td>0x11</td><td>2</td><td>uint8[2]</td><td>short</td><td>2u8</td><td></td><td></td><td>0x31</td><td>16</td><td>uint8[16]</td><td>int</td><td>vu8</td><td></td></tr>
    <tr><td>0x12</td><td>4</td><td>int16[2]</td><td>int</td><td>2s16</td><td></td><td></td><td>0x32</td><td>16</td><td>int16[8]</td><td>int</td><td>vs16</td><td></td></tr>
    <tr><td>0x13</td><td>4</td><td>uint16[2]</td><td>int</td><td>2u16</td><td></td><td></td><td>0x33</td><td>16</td><td>uint16[8]</td><td>int</td><td>vu16</td><td></td></tr>
    <tr><td>0x14</td><td>8</td><td>int32[2]</td><td>int</td><td>2s32</td><td></td><td></td><td>0x34</td><td>1</td><td>bool</td><td>byte</td><td>bool</td><td>b</td></tr>
    <tr><td>0x15</td><td>8</td><td>uint32[2]</td><td>int</td><td>2u32</td><td></td><td></td><td>0x35</td><td>2</td><td>bool[2]</td><td>short</td><td>2b</td><td></td></tr>
    <tr><td>0x16</td><td>16</td><td>int64[2]</td><td>int</td><td>2s64</td><td>vs64</td><td></td><td>0x36</td><td>3</td><td>bool[3]</td><td>int</td><td>3b</td><td></td></tr>
    <tr><td>0x17</td><td>16</td><td>uint64[2]</td><td>int</td><td>2u64</td><td>vu64</td><td></td><td>0x37</td><td>4</td><td>bool[4]</td><td>int</td><td>4b</td><td></td></tr>
    <tr><td>0x18</td><td>8</td><td>float[2]</td><td>int</td><td>2f</td><td></td><td></td><td>0x38</td><td>16</td><td>bool[16]</td><td>int</td><td>vb</td><td></td></tr>
    <tr><td>0x19</td><td>16</td><td>double[2]</td><td>int</td><td>2d</td><td>vd</td><td></td><td>0x39</td><td></td><td></td><td></td><td></td><td></td></tr>
    <tr><td>0x1a</td><td>3</td><td>int8[3]</td><td>int</td><td>3s8</td><td></td><td></td><td>0x3a</td><td></td><td></td><td></td><td></td><td></td></tr>
    <tr><td>0x1b</td><td>3</td><td>uint8[3]</td><td>int</td><td>3u8</td><td></td><td></td><td>0x3b</td><td></td><td></td><td></td><td></td><td></td></tr>
    <tr><td>0x1c</td><td>6</td><td>int16[3]</td><td>int</td><td>3s16</td><td></td><td></td><td>0x3c</td><td></td><td></td><td></td><td></td><td></td></tr>
    <tr><td>0x1d</td><td>6</td><td>uint16[3]</td><td>int</td><td>3u16</td><td></td><td></td><td>0x3d</td><td></td><td></td><td></td><td></td><td></td></tr>
    <tr><td>0x1e</td><td>12</td><td>int32[3]</td><td>int</td><td>3s32</td><td></td><td></td><td>0x3e</td><td></td><td></td><td></td><td></td><td></td></tr>
    <tr><td>0x1f</td><td>12</td><td>uint32[3]</td><td>int</td><td>3u32</td><td></td><td></td><td>0x3f</td><td></td><td></td><td></td><td></td><td></td></tr>
    <tr><td>0x20</td><td>24</td><td>int64[3]</td><td>int</td><td>3s64</td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr>
</table>

<p>Strings should be encoded and decoded according to the encoding specified in the packet header. Null termination
is optional, but a trailing null should be stripped when decoding.</p>
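<p>In Python this comes down to very little. In this sketch, <code>codec</code> is assumed to be a Python codec name
chosen from the encoding byte (e.g. <code>"shift_jis"</code> for <code>0x80</code>), and always writing a terminator
in <code>encode_str</code> is a choice of the sketch rather than a requirement; both helper names are hypothetical.</p>
<pre>{% highlight "python" %}def decode_str(raw, codec):
    """Decode a str value, dropping the optional trailing null terminator."""
    if raw.endswith(b"\x00"):
        raw = raw[:-1]
    # Python's "ascii" codec rejects bytes >= 0x80, matching the 7-bit check.
    return raw.decode(codec)


def encode_str(text, codec):
    """Encode a str value; this sketch always appends a null terminator."""
    return text.encode(codec) + b"\x00"
{% endhighlight %}</pre>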

<p>The IDs above occupy the low six bits of the type byte (<code>& 0x3F</code>). Any value can be turned into an
array by setting the 7<sup>th</sup> bit high (<code>| 0x40</code>). Arrays of this form, in the data section, will be
an aligned <code>size: u32</code> immediately followed by <code>size</code> bytes' worth of (unaligned!) values of the
base type (the ID with <code>0x40</code> cleared). Despite being a <code>u32</code>, the maximum length allowed is
<code>0xffffff</code>.
</p>
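<p>The fixed-size part of the type table, plus the array rule, can be sketched in Python with the standard
<code>struct</code> module. A few assumptions: values are big-endian, <code>void</code>, <code>bin</code>,
<code>str</code>, <code>attr</code> and <code>array</code> are left out, and the bucket is derived from the byte size
(1 is <code>byte</code>, 2 is <code>short</code>, anything else <code>int</code>), which matches every row of the table.
<code>TYPES</code>, <code>type_info</code> and <code>read_array</code> are hypothetical names.</p>
<pre>{% highlight "python" %}import struct

# The ten numeric base types in ID order: (XML name, struct format letter).
_BASE = [("s8", "b"), ("u8", "B"), ("s16", "h"), ("u16", "H"), ("s32", "i"),
         ("u32", "I"), ("s64", "q"), ("u64", "Q"), ("f", "f"), ("d", "d")]

# Type ID -> (XML name, components per value, struct format letter).
TYPES = {0x02 + i: (name, 1, code) for i, (name, code) in enumerate(_BASE[:8])}
TYPES.update({0x0C: ("ip4", 4, "B"), 0x0D: ("time", 1, "I"),
              0x0E: ("float", 1, "f"), 0x0F: ("double", 1, "d")})
for count in (2, 3, 4):  # 0x10-0x19, 0x1a-0x23 and 0x24-0x2d follow one pattern
    for i, (name, code) in enumerate(_BASE):
        TYPES[0x10 + (count - 2) * 10 + i] = (f"{count}{name}", count, code)
TYPES.update({0x30: ("vs8", 16, "b"), 0x31: ("vu8", 16, "B"),
              0x32: ("vs16", 8, "h"), 0x33: ("vu16", 8, "H"),
              0x34: ("bool", 1, "?"), 0x35: ("2b", 2, "?"), 0x36: ("3b", 3, "?"),
              0x37: ("4b", 4, "?"), 0x38: ("vb", 16, "?")})

ARRAY_FLAG = 0x40
MAX_ARRAY_BYTES = 0xFFFFFF


def type_info(type_byte):
    """Return (XML name, size in bytes, bucket, is_array) for a type byte."""
    name, count, code = TYPES[type_byte & 0x3F]
    size = struct.calcsize(f">{count}{code}")
    bucket = "byte" if size == 1 else "short" if size == 2 else "int"
    return name, size, bucket, bool(type_byte & ARRAY_FLAG)


def read_array(payload, offset, type_byte):
    """Read an array at an aligned offset; return (items, next chunk offset)."""
    _, size, _, _ = type_info(type_byte)
    _, count, code = TYPES[type_byte & 0x3F]
    (nbytes,) = struct.unpack_from(">I", payload, offset)
    if nbytes > MAX_ARRAY_BYTES:
        raise ValueError("array is longer than 0xffffff bytes")
    total = nbytes // size
    flat = struct.unpack_from(f">{total * count}{code}", payload, offset + 4)
    if count == 1:
        items = list(flat)
    else:
        items = [flat[i * count:(i + 1) * count] for i in range(total)]
    # The array claims every 4-byte chunk it touches, so round up to a chunk boundary.
    return items, offset + 4 + -(-nbytes // 4) * 4
{% endhighlight %}</pre>
<p>Reading the <code>3u8</code> example from earlier, a payload of <code>00 00 00 06 01 02 03 04 05 06 00 00</code>
with type byte <code>0x1b | 0x40</code> gives <code>[(1, 2, 3), (4, 5, 6)]</code>, and the next int chunk starts at
offset 12.</p>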

<details>
    <summary>Source code details</summary>
    <p>The full table for these values can be found in libavs. This table contains the names of every type, along
        with additional information such as how many bytes that data type requires, and which parsing function
        should be used.</p>
    <figure>
        <img src="./images/types_table.png">
        <figcaption><code>libavs-win32.dll:0x100782a8</code></figcaption>
    </figure>
</details>
<details>
    <summary>Note about the <code>array</code> type:</summary>
    <p>While I'm not totally sure, I suspect this type is used internally as a pseudo-type. Trying to identify its
        function as a parsable type runs into some obvious blockers:</p>

    <p>All of the types have convenient <code>printf</code>-using helper functions that are used to emit them when
        serializing XML. All except one.</p>
    <img src="./images/no_array.png">
    <p>If we have a look inside the function that populates node sizes (<code>libavs-win32.dll:0x1000cf00</code>),
        it has an explicit case, but that case falls back to the same handling as the default case.</p>
    <img src="./images/no_array_2.png">

    <p>In the same function, however, we can find a second (technically first) check for the array type.</p>
    <img src="./images/yes_array.png">
    <p>This seems to suggest that internally, arrays are represented as a normal node with the <code>array</code>
        type; when serializing, however, it's converted into the array types we're used to (well, will be after the
        next sections) by ORing <code>0x40</code> onto the contained type.</p>
    <p>Also of interest from this snippet is the fact that <code>void</code>, <code>bin</code>, <code>str</code>,
        and <code>attr</code> cannot be arrays. <code>void</code> and <code>attr</code> make sense, however
        <code>str</code> and <code>bin</code> are more interesting. I suspect this is because Konami want to be able
        to preallocate the memory, which wouldn't be possible with these variable length structures.</p>
</details>

<h2 id="data">The data section</h2>

<p>This is where all the actual packet data is. For the most part, parsing this is the easy part. We traverse our
schema, and read values out of the packet according to the type indicated in the schema. Unfortunately, Konami
decided all data should be aligned very specifically, and that gaps left during alignment should be backfilled
later. This makes both reading and writing somewhat more complicated, but the system can be fairly easily
understood.</p>
<p>Firstly, we divide the payload up into 4 byte chunks. Each chunk can be allocated to store either individual
bytes, shorts, or ints (these are the buckets in the table above). When reading or writing a value, we first
check if a chunk allocated to the desired type's bucket is available and has free/as-yet-unread space within it.
If so, we will store/read our data to/from there. If there is no such chunk, we claim the next unclaimed chunk
for our bucket.</p>
<p>For example, imagine we write the sequence <code>byte, int, byte, short, byte, int, short</code>. The final output
should look like:</p>

<table class="code">
    <thead>
        <tr>
            <td>0</td><td>1</td><td>2</td><td>3</td><td>4</td><td>5</td><td>6</td><td>7</td>
            <td>8</td><td>9</td><td>10</td><td>11</td><td>12</td><td>13</td><td>14</td><td>15</td>
        </tr>
    </thead>
    <tr>
        <td>byte</td>
        <td>byte</td>
        <td>byte</td>
        <td></td>
        <td colspan="4">int</td>
        <td colspan="2">short</td>
        <td colspan="2">short</td>
        <td colspan="4">int</td>
    </tr>
</table>

<p>While this might seem a silly system compared to just not aligning values, it is at least easy to see that it
wastes less space than padding every value out to a full 4-byte chunk. It should be noted that any variable-length
structure, such as a string or an array, claims all chunks it encroaches on for the <code>int</code> bucket,
disallowing the storage of bytes or shorts within them.</p>
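<p>The chunk-and-bucket rule can also be written down directly. This is a sketch of the intuitive model only (the
<code>layout</code> function is my own), assigning a payload offset to each value given its size in bytes; sizes
other than 1 and 2 stand in for ints and for variable-length data, which take whole chunks.</p>
<pre>{% highlight "python" %}def layout(sizes):
    """Return the payload offset assigned to each value size, in order."""
    offsets = []
    next_chunk = 0                   # index of the next unclaimed 4-byte chunk
    open_chunk = {1: None, 2: None}  # bucket -> (chunk start, bytes used) or None
    for size in sizes:
        if size in open_chunk:
            chunk = open_chunk[size]
            if chunk is None or chunk[1] + size > 4:
                # No chunk for this bucket with room left: claim a fresh one.
                chunk = (next_chunk * 4, 0)
                next_chunk += 1
            offsets.append(chunk[0] + chunk[1])
            open_chunk[size] = (chunk[0], chunk[1] + size)
        else:
            # ints and variable-length data claim every chunk they touch.
            offsets.append(next_chunk * 4)
            next_chunk += -(-size // 4)
    return offsets
{% endhighlight %}</pre>
<p>Feeding it the example sequence, <code>layout([1, 4, 1, 2, 1, 4, 2])</code> returns
<code>[0, 4, 1, 8, 2, 12, 10]</code>, matching the table above.</p>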

<details>
    <summary>Implementing a packer</summary>
    <p>While the intuitive way to understand the packing algorithm is via chunks and buckets, a far more efficient
        implementation can be made that uses three pointers. Rather than try to explain in words, hopefully this
        Python implementation should suffice as explanation:</p>
    <pre>{% highlight "python" %}import math


class Packer:
    def __init__(self, offset=0):
        # One cursor per bucket: 4-byte words, shorts and single bytes.
        self._word_cursor = offset
        self._short_cursor = offset
        self._byte_cursor = offset
        # Chunks start at offset, so a cursor sitting on this remainder needs a new chunk.
        self._boundary = offset % 4

    def _next_block(self):
        # Claim the next unclaimed 4-byte chunk and return its start.
        self._word_cursor += 4
        return self._word_cursor - 4

    def request_allocation(self, size):
        """Return the offset at which a value of `size` bytes should be stored."""
        if size == 0:
            return self._word_cursor
        elif size == 1:
            # Reuse the current byte chunk, or claim a new one if there is none or it is full.
            if self._byte_cursor % 4 == self._boundary:
                self._byte_cursor = self._next_block() + 1
            else:
                self._byte_cursor += 1
            return self._byte_cursor - 1
        elif size == 2:
            # Same again for the short bucket.
            if self._short_cursor % 4 == self._boundary:
                self._short_cursor = self._next_block() + 2
            else:
                self._short_cursor += 2
            return self._short_cursor - 2
        else:
            # Anything larger (including strings and arrays) claims whole chunks.
            old_cursor = self._word_cursor
            self._word_cursor += 4 * math.ceil(size / 4)
            return old_cursor

    def notify_skipped(self, no_bytes):
        # Account for bytes written outside the allocator as whole int chunks.
        for _ in range(math.ceil(no_bytes / 4)):
            self.request_allocation(4){% endhighlight %}</pre>
</details>

{% endblock %}