Lots of stuff about names

2024-11-23 22:40:57 +01:00 · 2022-01-17 19:40:43 +00:00 · 34b4cfc27b
commit 34b4cfc27b
parent 6bf6dd92ad
7 changed files with 179 additions and 48 deletions
--- a/images/parse_packet.png
+++ b/images/parse_packet.png
--- a/images/parse_packet_header_a.png
+++ b/images/parse_packet_header_a.png
--- a/images/parse_packet_header_b.png
+++ b/images/parse_packet_header_b.png
--- a/images/parse_packet_header_c.png
+++ b/images/parse_packet_header_c.png
--- a/styles.css
+++ b/styles.css
@@ -96,16 +96,21 @@ pre {
 summary {
    user-select: none;
    cursor: pointer;
+    color: #c7254e;
 }

 details {
-    background: lightblue;
-    border: 1px solid cornflowerblue;
-    padding: 4px;
+    background: #f9f2f4;
+    border: 1px solid #c7b3b8;
+    border-radius: 2px;
+    padding: 4px 8px;
    margin: 4px 0;
    overflow-x: auto;
    max-width: 100%;
 }
+details code {
+    background: #fff;
+}

 table.nav {
    padding-right: 1px;
--- a/templates/base.html
+++ b/templates/base.html
@@ -7,7 +7,7 @@
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>{% block title %}{% endblock %}{% if self.title() %} | {% endif %}e-Amusement API</title>

-    <link rel="stylesheet" href="{{ROOT}}/styles.css?ver=3">
+    <link rel="stylesheet" href="{{ROOT}}/styles.css?ver=4">
    <link rel="stylesheet" href="{{ROOT}}/tango.css">

    <script async src="https://www.googletagmanager.com/gtag/js?id=G-LG6C6HT317"></script>
--- a/templates/pages/packet.html
+++ b/templates/pages/packet.html
@@ -98,7 +98,7 @@
 </table>
 <p>Every packet starts with the magic byte <code>0xA0</code>. Following this is the content byte, the encoding byte,
    and then the one's complement (<code>~E</code>) of the encoding byte.</p>
-<p>Currently known possible values for the content byte are:</p>
+<p>Possible values for the content byte are:</p>
 <table>
    <thead>
        <tr>
@@ -123,7 +123,37 @@
        <td>Full names, schema only</td>
    </tr>
 </table>
-<p><small><i>I haven't seen <code>0x44</code>, so no idea what that one does, before you ask.</i></small></p>
+<details>
+    <summary>Source code details</summary>
+    <p>Not totally cleaned these up yet, but the general concept of how packets are parsed can be seen fairly clearly.
+        At a high level, we have a single function that validates the header, parses out the schema, then goes to read
+        the body of the packet, if we're expecting it. The arguments to <code>parse_packet_header</code> will make more
+        sense in a moment.</p>
+    <figure>
+        <img src="./images/parse_packet.png" />
+        <figcaption><code>libavs-win32.dll:0x1003483</code></figcaption>
+    </figure>
+    <p><code>parse_packet_header</code> has a lot of things going on, so I'm just pulling out a few important snippets
+        here.</p>
+    <figure>
+        <img src="./images/parse_packet_header_a.png" /><br>
+        <img src="./images/parse_packet_header_b.png" /><br>
+        <img src="./images/parse_packet_header_c.png" />
+        <figcaption><code>libavs-win32.dll:0x1003448c</code></figcaption>
+    </figure>
+    <p>We first read out four bytes from the start of the packet, and convert that to an integer; nothing especially
+        magic here. The next block, however, is potentially not what you might have expected to see. Based on
+        the two flags passed into the function arguments, we are going to subtract a value from this header.
+        Specifically, the first byte we subtract is always <code>0xa0</code>, then the second byte is one of the
+        <code>C</code> values in the table above.
+    </p>
+    <p>Finally, we mask out the first two bytes, and assert that they're both null. That is, they were exactly equal to
+        the values we subtracted from them. Of note here is that the caller to this function "decides" what sort of
+        packet it is expecting.</p>
+    <p>We can also see the check for <code>~E</code> here. If that check passes, we return the <code>E</code> byte,
+        otherwise we're going to error.</p>
+</details>
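The header check described above can be sketched roughly as follows. This is a minimal illustration, not the routine in the screenshots: the function name is borrowed from the text, but its signature, the `allowed_content` parameter (standing in for the two flags the caller passes) and the error messages are all assumptions. It also assumes `~E` means the bitwise NOT of the encoding byte, truncated to one byte.

```python
from typing import Collection

MAGIC = 0xA0  # magic byte, from the text above


def parse_packet_header(packet: bytes, allowed_content: Collection[int]) -> int:
    """Validate the four header bytes and return the encoding byte E.

    `allowed_content` is a hypothetical stand-in for the caller "deciding"
    which content byte(s) it is willing to accept.
    """
    if len(packet) < 4:
        raise ValueError("packet too short for a header")

    magic, content, encoding, not_encoding = packet[:4]

    if magic != MAGIC:
        raise ValueError("bad magic byte")
    if content not in allowed_content:
        raise ValueError("unexpected content byte")
    # Assumption: ~E is the bitwise NOT of E, masked to a single byte.
    if not_encoding != (~encoding & 0xFF):
        raise ValueError("encoding byte and its complement do not match")

    return encoding
```

Passing the accepted content bytes in from the caller mirrors the observation that the caller, not the packet, chooses what kind of packet is expected.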
+
 <p>The encoding flag indicates the encoding for all string types in the packet (more on those later). Possible
    values are:</p>
 <table>
@@ -179,7 +209,8 @@
    </tr>
 </table>
 <p>Data is assumed by default to be in ISO 8859 encoding. That is, for encodings <code>0x00</code> and
-    <code>0x40</code>, no transformation is performed on the binary data to produce readable text.</p>
+    <code>0x40</code>, no transformation is performed on the binary data to produce readable text.
+</p>
 <p>ASCII encoding is true 7-bit ASCII, with the 8th bit always set to 0. This is validated.</p>
 <details>
    <summary>Source code details</summary>
@@ -209,57 +240,150 @@
    technically made redundant as this structure is also terminated).</p>
 <p>This part of the header defines the schema that the main payload uses.</p>

-<p>A tag definition looks like:</p>
+<p>A tag definition follows one of the following three formats:</p>
+<ul>
+    <li>
+        <p>Compressed names:</p>
+        <table class="code">
+            <thead>
+                <tr>
+                    <td>0</td>
+                    <td>1</td>
+                    <td>2</td>
+                    <td>3</td>
+                    <td>4</td>
+                    <td>5</td>
+                    <td>6</td>
+                    <td>7</td>
+                    <td>8</td>
+                    <td>9</td>
+                    <td>10</td>
+                    <td>11</td>
+                    <td>12</td>
+                    <td>13</td>
+                    <td>14</td>
+                    <td>15</td>
+                </tr>
+            </thead>
+            <tr>
+                <td>Type</td>
+                <td>nlen</td>
+                <td colspan="7">Tag name</td>
+                <td style="border-bottom: none" colspan="7"></td>
+            </tr>
+            <tr>
+                <td style="border-top: none;" colspan="15">Attributes and children</td>
+                <td colspan="1"><i>FE</i></td>
+            </tr>
+        </table>
+    </li>
+    <li>
+        <p>Full names, short length:</p>
+        <table class="code">
+            <thead>
+                <tr>
+                    <td>0</td>
+                    <td>1</td>
+                    <td>2</td>
+                    <td>3</td>
+                    <td>4</td>
+                    <td>5</td>
+                    <td>6</td>
+                    <td>7</td>
+                    <td>8</td>
+                    <td>9</td>
+                    <td>10</td>
+                    <td>11</td>
+                    <td>12</td>
+                    <td>13</td>
+                    <td>14</td>
+                    <td>15</td>
+                </tr>
+            </thead>
+            <tr>
+                <td>Type</td>
+                <td>0x40-0x64</td>
+                <td colspan="7">Tag name</td>
+                <td style="border-bottom: none" colspan="7"></td>
+            </tr>
+            <tr>
+                <td style="border-top: none;" colspan="15">Attributes and children</td>
+                <td colspan="1"><i>FE</i></td>
+            </tr>
+        </table>
+    </li>
+    <li>
+        <p>Full names, long length:</p>
+        <table class="code">
+            <thead>
+                <tr>
+                    <td>0</td>
+                    <td>1</td>
+                    <td>2</td>
+                    <td>3</td>
+                    <td>4</td>
+                    <td>5</td>
+                    <td>6</td>
+                    <td>7</td>
+                    <td>8</td>
+                    <td>9</td>
+                    <td>10</td>
+                    <td>11</td>
+                    <td>12</td>
+                    <td>13</td>
+                    <td>14</td>
+                    <td>15</td>
+                </tr>
+            </thead>
+            <tr>
+                <td>Type</td>
+                <td>0x80-0x8f</td>
+                <td>0x00-0xff</td>
+                <td colspan="7">Tag name</td>
+                <td style="border-bottom: none" colspan="6"></td>
+            </tr>
+            <tr>
+                <td style="border-top: none;" colspan="15">Attributes and children</td>
+                <td colspan="1"><i>FE</i></td>
+            </tr>
+        </table>
+    </li>
+</ul>

-<table class="code">
-    <thead>
-        <tr>
-            <td>0</td>
-            <td>1</td>
-            <td>2</td>
-            <td>3</td>
-            <td>4</td>
-            <td>5</td>
-            <td>6</td>
-            <td>7</td>
-            <td>8</td>
-            <td>9</td>
-            <td>10</td>
-            <td>11</td>
-            <td>12</td>
-            <td>13</td>
-            <td>14</td>
-            <td>15</td>
-        </tr>
-    </thead>
-    <tr>
-        <td>Type</td>
-        <td>nlen</td>
-        <td colspan="7">Tag name</td>
-        <td style="border-bottom: none" colspan="8"></td>
-    </tr>
-    <tr>
-        <td style="border-top: none;" colspan="15">Attributes and children</td>
-        <td colspan="1"><i>FE</i></td>
-    </tr>
-</table>
-
-<p>The encoding of structure names varies depending on the packet content byte. If the content flag indicates we have
-    full names, then <code>nlen</code> will be masked with <code>0x40</code>. The string length is the unmasked value,
-    +1 (0-length names make no sense anyway). We can then read off the correct number of bytes, and decode accordingly.
+<p>The encoding of structure names varies depending on the packet content byte. If the content flag indicated we have a
+    full string, we first need to check if the value of the first byte exceeds <code>0x7f</code>. If it does, we need to
+    read an additional byte. In the single byte case, we subtract <code>0x3f</code><sup>1</sup> to get our real length.
+    In the two byte case we subtract <code>0x7fbf</code><sup>2</sup>. In the latter case, the maximum allowed length is
+    <code>0x1000</code>.<br>
+    <small><sup>1</sup> simplified from <code>(length & ~0x40) + 0x01</code></small><br>
+    <small><sup>2</sup> simplified from <code>(length & ~0x8000) + 0x41</code></small>
 </p>
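As a rough sketch of the length rules for full names, using the unsimplified forms from the footnotes. The function name and return shape are hypothetical, and treating the two-byte prefix as big-endian (first byte is the high byte) is an assumption based on the layout tables, not something confirmed from the disassembly.

```python
from typing import Tuple

MAX_LONG_NAME = 0x1000  # maximum length in the two-byte case, from the text


def read_full_name_length(data: bytes, offset: int) -> Tuple[int, int]:
    """Return (name_length, offset_of_first_name_byte) for a full name."""
    first = data[offset]

    if first > 0x7F:
        # Two-byte prefix. Assumption: the first byte is the high byte.
        raw = (first << 8) | data[offset + 1]
        length = (raw & ~0x8000) + 0x41  # same as raw - 0x7fbf
        if length > MAX_LONG_NAME:
            raise ValueError("name length exceeds 0x1000")
        return length, offset + 2

    # Single-byte prefix.
    length = (first & ~0x40) + 0x01  # same as first - 0x3f when 0x40 is set
    return length, offset + 1
```

For example, a prefix byte of `0x44` gives a length of 5, and the two bytes `0x80 0x10` give `0x51`.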
 <p>If we are instead parsing packed names, then the names are encoded as densely packed 6 bit values. The length prefix
    (<code>nlen</code>) determines the length of the final unpacked string. The acceptable alphabet is
    <code>0123456789:ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz</code>, and the packed values are indices
-    within this alphabet.
+    within this alphabet. The maximum length for a name in this mode is 36 bytes (<code>0x24</code>).
 </p>
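A minimal sketch of unpacking a 6-bit packed name. The alphabet and the `0x24` limit come from the text; the function name is hypothetical, and the bit order (most significant bit first, with the final byte zero-padded) is an assumption that should be checked against real packets.

```python
from typing import Tuple

ALPHABET = "0123456789:ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz"
MAX_PACKED_NAME = 0x24  # maximum name length in this mode, from the text


def unpack_name(data: bytes, offset: int) -> Tuple[str, int]:
    """Return (name, offset_after_name) for a 6-bit packed name."""
    nlen = data[offset]  # length of the final unpacked string
    if nlen > MAX_PACKED_NAME:
        raise ValueError("packed name too long")

    # nlen characters at 6 bits each, rounded up to whole bytes.
    nbytes = (nlen * 6 + 7) // 8
    packed = data[offset + 1 : offset + 1 + nbytes]
    if len(packed) < nbytes:
        raise ValueError("truncated packed name")

    # Assumption: characters are packed most significant bit first.
    bits = int.from_bytes(packed, "big")
    total_bits = nbytes * 8
    chars = []
    for i in range(nlen):
        shift = total_bits - 6 * (i + 1)
        chars.append(ALPHABET[(bits >> shift) & 0x3F])

    return "".join(chars), offset + 1 + nbytes
```

Under these assumptions the bytes `02 98 00` decode to `a0`: index 38 (`a`) followed by index 0 (`0`), with four bits of padding.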

 <p>The children can be a combination of either attribute names, or child tags. Attribute names are represented by
    the byte <code>0x2E</code> followed by a length prefixed name as defined above. Child tags follow the above
-    format. Type <code>0x2E</code> must therefore be considered reserved as a possible structure type.</p>
+    format. Type <code>0x2E</code> must therefore be considered reserved as a possible structure type. As they carry
+    special meaning in text-based XML encoding, attribute names beginning with <code>__</code> are disallowed.</p>
+
+<details>
+    <summary>Source code details</summary>
+    <p>I'm not going to labour this one, so if you want to go look yourself:</p>
+    <ul>
+        <li>6-packed name reader: <code>libavs-win32.dll:0x10009f90</code></li>
+        <li>Unpacked name reader: <code>libavs-win32.dll:0x1000a110</code></li>
+        <li>The call to the above: <code>libavs-win32.dll:0x10034a57</code>, with the <code>__</code> checking starting
+            at <code>libavs-win32.dll:0x10034cfd</code> for attributes (i.e. the <code>JZ</code> at <code>0x10034a7c</code>)
+        </li>
+    </ul>
+</details>

 <p>Attributes (type <code>0x2E</code>) represent a string attribute. Any other attribute must be defined as a child
    tag. It is notable that having 0 children is allowable, which is how the majority of values are encoded.</p>
+
 <p>All valid IDs, and their respective type, are listed in the following table. The bucket column here will be
    used later when unpacking the main data, so we need not worry about it for now, but be warned it exists and is
    possibly the least fun part of this format.</p>
@@ -766,7 +890,9 @@
    optional, however should be stripped during decoding.</p>
 <p>All of these IDs are <code>& 0x3F</code>. Any value can be turned into an array by setting the 7<sup>th</sup> bit
    high (<code>| 0x40</code>). Arrays of this form, in the data section, will be an aligned <code>size: u32</code>
-    immediately followed by <code>size</code> bytes' worth of (unaligned!) values of the unmasked type.</p>
+    immediately followed by <code>size</code> bytes' worth of (unaligned!) values of the unmasked type. Despite being a
+    <code>u32</code>, the maximum length allowed is <code>0xffffff</code>.
+</p>
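The type masking and array size rules can be sketched as below. Both function names are hypothetical. The sketch assumes "aligned" means a 4-byte boundary and that the `u32` is big-endian; neither is stated in this section, so treat both as placeholders to verify.

```python
import struct
from typing import Tuple

MAX_ARRAY_SIZE = 0xFFFFFF  # maximum allowed array length, from the text


def split_type(type_byte: int) -> Tuple[int, bool]:
    """Split a type byte into (type id, is_array)."""
    return type_byte & 0x3F, bool(type_byte & 0x40)


def read_array(data: bytes, offset: int) -> Tuple[bytes, int]:
    """Return (raw array bytes, offset after the array)."""
    # Assumption: the size field is aligned to a 4-byte boundary.
    offset = (offset + 3) & ~3
    # Assumption: the u32 size is big-endian.
    (size,) = struct.unpack_from(">I", data, offset)
    if size > MAX_ARRAY_SIZE:
        raise ValueError("array size exceeds 0xffffff")

    start = offset + 4
    payload = data[start : start + size]  # the values themselves are unaligned
    if len(payload) != size:
        raise ValueError("truncated array data")
    return payload, start + size
```

Note that `size` counts bytes, not elements, so splitting `payload` into individual values depends on the width of the unmasked type.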

 <details>
    <summary>Source code details</summary>
@@ -794,7 +920,7 @@
    <img src="./images/yes_array.png">
    <p>This seems to suggest that internally arrays are represented as a normal node, with the <code>array</code>
        type, however when serializing it's converted into the array types we're used to (well, will be after the
-        next sections) by masking 0x40 onto the contained type.</p>
+        next sections) by masking <code>0x40</code> onto the contained type.</p>
    <p>Also of interest from this snippet is the fact that <code>void</code>, <code>bin</code>, <code>str</code>,
        and <code>attr</code> cannot be arrays. <code>void</code> and <code>attr</code> make sense, however
        <code>str</code> and <code>bin</code> are more interesting. I suspect this is because Konami want to be able