commit 546df244b0e6fd9aad19880b8af1d7a22406a534
Author: Bottersnike
Date:   Mon Dec 20 03:39:28 2021 +0000

    Initial

diff --git a/images/encoding_table.png b/images/encoding_table.png
new file mode 100644
index 0000000..a1223a9
Binary files /dev/null and b/images/encoding_table.png differ
diff --git a/images/no_array.png b/images/no_array.png
new file mode 100644
index 0000000..b1a73f1
Binary files /dev/null and b/images/no_array.png differ
diff --git a/images/no_array_2.png b/images/no_array_2.png
new file mode 100644
index 0000000..c5e6b63
Binary files /dev/null and b/images/no_array_2.png differ
diff --git a/images/types_table.png b/images/types_table.png
new file mode 100644
index 0000000..ee754bb
Binary files /dev/null and b/images/types_table.png differ
diff --git a/images/unsorted/javaw_DSoqceZKFz.png b/images/unsorted/javaw_DSoqceZKFz.png
new file mode 100644
index 0000000..731d7c1
Binary files /dev/null and b/images/unsorted/javaw_DSoqceZKFz.png differ
diff --git a/images/xml_encoding_table.png b/images/xml_encoding_table.png
new file mode 100644
index 0000000..00cd27e
Binary files /dev/null and b/images/xml_encoding_table.png differ
diff --git a/images/yes_array.png b/images/yes_array.png
new file mode 100644
index 0000000..3ec6bd3
Binary files /dev/null and b/images/yes_array.png differ
diff --git a/index.html b/index.html
new file mode 100644
index 0000000..9d71b14
--- /dev/null
+++ b/index.html
@@ -0,0 +1,58 @@
eAmuse API
ContentsTransport layerPacket format

Bemani/Konami eAmuse API

Why?

I was curious how these APIs work, yet could find little to nothing on Google. There are a number of closed-source projects, with presumably similarly closed-source internal documentation, and a scattering of implementations of things, yet I couldn't find a site that actually just documents how the API works. If I'm going to have to reverse engineer an open source project (or a closed source one, for that matter), I might as well just go reverse engineer an actual game (or its stdlib, which is where most of my time has been spent so far).

These pages are very much a work in progress, and are being written as I reverse engineer parts of the protocol. I've been checking all my assumptions by writing my own implementation as I go; however, it currently isn't of shareable quality and, more importantly, the purpose of these pages is to make implementing one's own code hopefully trivial.

Sharing annotated sources for all of the games' stdlibs would be both impractical and unwise. Where relevant, however, I try to include snippets to illustrate concepts, and have included their locations in the source in case you feel like taking a dive too.

If you're here because you work on one of those aforementioned closed source projects, hello! Feel free to share + knowledge with the rest of the world, or point out corrections. Or don't; you do you.

Contents

1. Transport layer
    1. Packet structure
    2. Types
2. The inner packet structure
    1. XML packets
    2. Binary packed packets
    3. Binary schemas
    4. Binary data

This site intentionally looks not-great. I don't feel like changing that, and honestly quite like the aesthetic.

\ No newline at end of file
diff --git a/packet.html b/packet.html
new file mode 100644
index 0000000..f8a362a
--- /dev/null
+++ b/packet.html
@@ -0,0 +1,893 @@
Packet format | eAmuse API

Packet format

eAmuse uses XML for its application layer payloads*. This XML is sent either verbatim or in a custom packed binary format.
*Newer games use JSON, but this page is about XML.

The XML format

Each tag that contains a value has a __type attribute that identifies what type it is. Array types + have a __count attribute indicating how many items are in the array. Binary blobs additionally have + a __size attribute indicating their length (this is notably not present on strings, however).

It is perhaps simpler to illustrate with an example, so:

<?xml version='1.0' encoding='UTF-8'?>
<call model="KFC:J:A:A:2019020600" srcid="1000" tag="b0312077">
    <eventlog method="write">
        <retrycnt __type="u32" />
        <data>
            <eventid __type="str">G_CARDED</eventid>
            <eventorder __type="s32">5</eventorder>
            <pcbtime __type="u64">1639669516779</pcbtime>
            <gamesession __type="s64">1</gamesession>
            <strdata1 __type="str" />
            <strdata2 __type="str" />
            <numdata1 __type="s64">1</numdata1>
            <numdata2 __type="s64" />
            <locationid __type="str">ea</locationid>
        </data>
    </eventlog>
</call>

Arrays are encoded by concatenating every value together, with spaces between them. Data types that have multiple values are serialized similarly.

Therefore, an element storing an array of 3u8 ([(1, 2, 3), (4, 5, 6)]) would look like this:

<demo __type="3u8" __count="2">1 2 3 4 5 6</demo>
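As a rough illustration of these rules, here is a Python sketch that decodes a typed element back into Python values using the standard library's xml.etree.ElementTree. It only understands a subset of the type names (the integer types, f, d, b and str), the function name parse_value is hypothetical, and returning None for an empty element is an assumption rather than something the format is known to define.

```python
import re
import xml.etree.ElementTree as ET

# Converters for a subset of the scalar XML type names from the type table.
SCALARS = {
    "s8": int, "u8": int, "s16": int, "u16": int,
    "s32": int, "u32": int, "s64": int, "u64": int,
    "f": float, "d": float,
    "b": lambda token: token != "0",
}


def parse_value(elem: ET.Element):
    """Decode the text of a typed eAmuse XML element into Python values."""
    type_name = elem.get("__type")
    if type_name == "str":
        return elem.text or ""
    # Split a name like "3u8" into a component count (3) and a scalar type (u8).
    match = re.fullmatch(r"(\d*)([a-z]+\d*)", type_name or "")
    if match is None or match.group(2) not in SCALARS:
        raise ValueError(f"unsupported __type: {type_name!r}")
    width = int(match.group(1) or 1)
    convert = SCALARS[match.group(2)]
    items = [convert(token) for token in (elem.text or "").split()]
    if width > 1:
        # Regroup the flat, space-separated list into tuples of `width` values.
        items = [tuple(items[i:i + width]) for i in range(0, len(items), width)]
    if elem.get("__count") is not None:
        return items
    # Assumption: an empty element (e.g. <numdata2 __type="s64" />) has no value.
    return items[0] if items else None
```

With the demo element above, parse_value returns [(1, 2, 3), (4, 5, 6)].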

Beyond this, it is otherwise fairly standard XML.

Packed binary overview

Many packets, rather than using a string-based XML format, use a custom binary packed format instead. While it + can be a little confusing, remembering that this is encoding an XML tree can make it easier to parse.

To start with, let's take a look at the overall structure of the packets.

Offset  Field
0       A0 (magic byte)
1       C (content byte)
2       E (encoding byte)
3       ~E
4-7     Head length
8-      Schema definition
        FF, followed by Align (padding)
        Data length
        Payload
        Align (padding)

Every packet starts with the magic byte 0xA0. Following this is the content byte, the encoding byte, and then the bitwise complement (one's complement, i.e. E ^ 0xFF) of the encoding byte.

Currently known possible values for the content byte are:

C     Content
0x42  Compressed data
0x43  Compressed, no data
0x45  Decompressed data
0x46  Decompressed, no data

Decompressed packets contain an XML string. Compressed packets, meaning those in the packed binary format, are what we're interested in here.

+

The encoding flag indicates the encoding for all string types in the packet (more on those later). Possible + values are:

E     ~E    Encoding name(s)
0x20  0xDF  ASCII
0x40  0xBF  ISO-8859-1, ISO_8859-1
0x60  0x9F  EUC-JP, EUCJP, EUC_JP
0x80  0x7F  SHIFT-JIS, SHIFT_JIS, SJIS
0xA0  0x5F  UTF-8, UTF8

Source code details

The full table for these values can be found in libavs.

libavs-win32.dll:0x1006b960

A second table exists just before this one in the source, responsible for the <?xml version='1.0' encoding='??'?> line in XML files.

libavs-win32.dll:0x1006b940

This is indexed using the following function, which maps the above encoding IDs to 1, 2, 3, 4 and 5 + respectively.

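/* Decompiled pseudo-C. (encoding_id & 0xe0) >> 5 maps 0x20..0xA0 to 1..5;
   the * 4 presumably scales that index by the 4-byte size of each table entry. */
char* xml_get_encoding_name(uint encoding_id) {
    return ENCODING_NAME_TABLE[((encoding_id & 0xe0) >> 5) * 4];
}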

While validating ~E isn't technically required, it acts as a useful assertion that the packet being + parsed is valid.
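Putting the header rules together, a header check might look like the following Python sketch. The function name parse_header is hypothetical, and the mapping from encoding bytes to Python codec names is an assumption based on the encoding names in the table above.

```python
# Content byte values from the table above.
CONTENT_TYPES = {
    0x42: "compressed data",
    0x43: "compressed, no data",
    0x45: "decompressed data",
    0x46: "decompressed, no data",
}

# Encoding byte -> Python codec name (the codec choice is an assumption).
ENCODINGS = {
    0x20: "ascii",
    0x40: "iso-8859-1",
    0x60: "euc_jp",
    0x80: "shift_jis",
    0xA0: "utf-8",
}


def parse_header(packet: bytes) -> tuple[int, str]:
    """Validate the 4-byte header and return (content byte, codec name)."""
    if len(packet) < 4:
        raise ValueError("packet shorter than its 4-byte header")
    magic, content, encoding, inverted = packet[:4]
    if magic != 0xA0:
        raise ValueError(f"bad magic byte: {magic:#04x}")
    if content not in CONTENT_TYPES:
        raise ValueError(f"unknown content byte: {content:#04x}")
    # ~E must be the bitwise complement of E; a cheap sanity check.
    if (encoding ^ 0xFF) != inverted:
        raise ValueError("encoding byte and its complement do not match")
    if encoding not in ENCODINGS:
        raise ValueError(f"unknown encoding byte: {encoding:#04x}")
    return content, ENCODINGS[encoding]
```

For example, the header bytes A0 42 A0 5F describe compressed data with UTF-8 strings.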

The packet schema header

Following the 4-byte header is a 4-byte integer containing the length of the next part of the header (this is technically redundant, as this structure is also terminated by FF).

This part of the header defines the schema that the main payload uses.
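As a sketch of how the two length fields split a packet into its schema and payload, assuming both lengths are big-endian u32 values and that the data length follows immediately after the head-length bytes (schema, FF and alignment included). The function name split_packet is hypothetical.

```python
import struct


def split_packet(packet: bytes) -> tuple[bytes, bytes]:
    """Return (schema section, payload section) of a binary packed packet."""
    # Bytes 0-3 are the A0/C/E/~E header; bytes 4-7 hold the head length.
    (head_len,) = struct.unpack_from(">I", packet, 4)
    schema = packet[8:8 + head_len]
    # Assumption: the data length follows directly after the schema section.
    (data_len,) = struct.unpack_from(">I", packet, 8 + head_len)
    start = 12 + head_len
    payload = packet[start:start + data_len]
    if len(schema) != head_len or len(payload) != data_len:
        raise ValueError("packet is truncated")
    return schema, payload
```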

A tag definition looks like:

Offset  Field
0       Type
1       nlen
2-      Tag name
        Attributes and children
        FE

Structure names are encoded as densely packed 6-bit values, prefixed with their length (nlen). The acceptable alphabet is 0123456789:ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz, and the packed values are indices within this alphabet.
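A minimal Python sketch of packing and unpacking names. It assumes that nlen counts characters (not bytes), that the 6-bit values are packed most-significant-bit first, and that the final byte is zero-padded; pack_name and unpack_name are hypothetical names.

```python
ALPHABET = "0123456789:ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz"


def pack_name(name: str) -> bytes:
    """Encode a tag name as nlen followed by densely packed 6-bit indices."""
    bits = 0
    for ch in name:
        bits = (bits << 6) | ALPHABET.index(ch)  # raises ValueError if not allowed
    nbits = 6 * len(name)
    nbytes = (nbits + 7) // 8
    bits <<= nbytes * 8 - nbits  # left-align; the last byte is zero-padded
    return bytes([len(name)]) + bits.to_bytes(nbytes, "big")


def unpack_name(data: bytes, offset: int = 0) -> tuple[str, int]:
    """Decode a packed name at `offset`; return (name, offset just past it)."""
    length = data[offset]
    nbytes = (length * 6 + 7) // 8
    bits = int.from_bytes(data[offset + 1:offset + 1 + nbytes], "big")
    bits >>= nbytes * 8 - length * 6  # drop the padding bits
    chars = [ALPHABET[(bits >> (6 * (length - 1 - i))) & 0x3F] for i in range(length)]
    return "".join(chars), offset + 1 + nbytes
```

Under these assumptions the name "a" (index 38) packs to the two bytes 01 98.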

The children can be any combination of attribute names and child tags. Attribute names are represented by the byte 0x2E followed by a length-prefixed name as defined above. Child tags follow the above format. Type 0x2E must therefore be considered reserved, and cannot be used as a structure type.

Attributes (type 0x2E) are always strings; a value of any other type must be defined as a child tag instead. It is notable that a tag with 0 children is allowed, which is how the majority of values are encoded.
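The schema grammar described so far can be walked recursively. This Python sketch assumes the same MSB-first name packing as the name sketch (so it repeats unpack_name to stay self-contained), that an attribute is just 0x2E plus a name with no FE of its own, and that the array bit (0x40) may appear on a tag's type byte; SchemaNode and parse_schema are hypothetical names.

```python
from dataclasses import dataclass, field

ALPHABET = "0123456789:ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz"
ATTR_TYPE = 0x2E      # marks an attribute name
END_OF_NODE = 0xFE    # closes a tag definition
END_OF_SCHEMA = 0xFF  # closes the whole schema section


@dataclass
class SchemaNode:
    name: str
    type_id: int      # base type ID (masked with 0x3F)
    is_array: bool    # whether the 0x40 array bit was set
    attributes: list = field(default_factory=list)
    children: list = field(default_factory=list)


def unpack_name(data: bytes, offset: int) -> tuple[str, int]:
    """Decode a length-prefixed 6-bit name (MSB-first packing assumed)."""
    length = data[offset]
    nbytes = (length * 6 + 7) // 8
    bits = int.from_bytes(data[offset + 1:offset + 1 + nbytes], "big")
    bits >>= nbytes * 8 - length * 6  # drop the zero padding
    name = "".join(ALPHABET[(bits >> (6 * (length - 1 - i))) & 0x3F] for i in range(length))
    return name, offset + 1 + nbytes


def _parse_node(data: bytes, offset: int) -> tuple[SchemaNode, int]:
    type_byte = data[offset]
    name, offset = unpack_name(data, offset + 1)
    node = SchemaNode(name, type_byte & 0x3F, bool(type_byte & 0x40))
    while True:
        marker = data[offset]
        if marker == END_OF_NODE:
            return node, offset + 1
        if marker == ATTR_TYPE:
            # Attributes are just a name; they have no children and no FE.
            attr, offset = unpack_name(data, offset + 1)
            node.attributes.append(attr)
        else:
            child, offset = _parse_node(data, offset)
            node.children.append(child)


def parse_schema(data: bytes) -> SchemaNode:
    """Parse the schema section into a tree of SchemaNode objects."""
    root, offset = _parse_node(data, 0)
    if data[offset] != END_OF_SCHEMA:
        raise ValueError("schema is not terminated by 0xFF")
    return root
```

Under these assumptions, the bytes 01 01 98 2E 01 9C 06 01 F4 FE FE FF describe a void tag a with a string attribute b and a single s32 child x that has no children.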

All valid IDs, and their respective type, are listed in the following table. The bucket column here will be + used later when unpacking the main data, so we need not worry about it for now, but be warned it exists and is + possibly the least fun part of this format.

ID    Bytes   C type       Bucket  XML names
0x01  0       void         -       void
0x02  1       int8         byte    s8
0x03  1       uint8        byte    u8
0x04  2       int16        short   s16
0x05  2       uint16       short   u16
0x06  4       int32        int     s32
0x07  4       uint32       int     u32
0x08  8       int64        int     s64
0x09  8       uint64       int     u64
0x0a  prefix  char[]       int     bin, binary
0x0b  prefix  char[]       int     str, string
0x0c  4       uint8[4]     int     ip4
0x0d  4       uint32       int     time
0x0e  4       float        int     float, f
0x0f  8       double       int     double, d
0x10  2       int8[2]      short   2s8
0x11  2       uint8[2]     short   2u8
0x12  4       int16[2]     int     2s16
0x13  4       uint16[2]    int     2u16
0x14  8       int32[2]     int     2s32
0x15  8       uint32[2]    int     2u32
0x16  16      int64[2]     int     2s64, vs64
0x17  16      uint64[2]    int     2u64, vu64
0x18  8       float[2]     int     2f
0x19  16      double[2]    int     2d, vd
0x1a  3       int8[3]      int     3s8
0x1b  3       uint8[3]     int     3u8
0x1c  6       int16[3]     int     3s16
0x1d  6       uint16[3]    int     3u16
0x1e  12      int32[3]     int     3s32
0x1f  12      uint32[3]    int     3u32
0x20  24      int64[3]     int     3s64
0x21  24      uint64[3]    int     3u64
0x22  12      float[3]     int     3f
0x23  24      double[3]    int     3d
0x24  4       int8[4]      int     4s8
0x25  4       uint8[4]     int     4u8
0x26  8       int16[4]     int     4s16
0x27  8       uint16[4]    int     4u16
0x28  16      int32[4]     int     4s32, vs32
0x29  16      uint32[4]    int     4u32, vu32
0x2a  32      int64[4]     int     4s64
0x2b  32      uint64[4]    int     4u64
0x2c  16      float[4]     int     4f, vf
0x2d  32      double[4]    int     4d
0x2e  prefix  char[]       int     attr
0x2f  0                    -       array
0x30  16      int8[16]     int     vs8
0x31  16      uint8[16]    int     vu8
0x32  16      int16[8]     int     vs16
0x33  16      uint16[8]    int     vu16
0x34  1       bool         byte    bool, b
0x35  2       bool[2]      short   2b
0x36  3       bool[3]      int     3b
0x37  4       bool[4]      int     4b
0x38  16      bool[16]     int     vb
0x39-0x3f: no known type

Strings should be encoded and decoded according to the encoding specified in the packet header. Null termination is optional, but any terminator should be stripped during decoding.

All of these IDs fit within 0x3F (that is, id & 0x3F == id). Any value (with a few exceptions, noted below) can be turned into an array by setting the 7th bit high (| 0x40). In the data section, arrays of this form are stored as an aligned u32 size, immediately followed by size bytes' worth of (unaligned!) values of the unmasked type.
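A small Python sketch of the array rules, assuming big-endian values and an array that starts on a 4-byte chunk boundary. ELEMENT_FORMATS covers only the single-value integer types, and split_type and read_array are hypothetical names.

```python
import struct

ARRAY_FLAG = 0x40

# struct format characters for the single-value integer types only.
ELEMENT_FORMATS = {
    0x02: "b", 0x03: "B",  # s8, u8
    0x04: "h", 0x05: "H",  # s16, u16
    0x06: "i", 0x07: "I",  # s32, u32
    0x08: "q", 0x09: "Q",  # s64, u64
}


def split_type(type_byte: int) -> tuple[int, bool]:
    """Split a schema type byte into (base type ID, is-array flag)."""
    return type_byte & 0x3F, bool(type_byte & ARRAY_FLAG)


def read_array(data: bytes, offset: int, type_byte: int) -> tuple[list, int]:
    """Read an array value; return (values, offset of the next free chunk)."""
    base, is_array = split_type(type_byte)
    if not is_array:
        raise ValueError("type byte does not have the array flag set")
    fmt = ELEMENT_FORMATS[base]
    (size,) = struct.unpack_from(">I", data, offset)
    count = size // struct.calcsize(">" + fmt)
    # The values are packed back to back, with no per-element alignment.
    values = list(struct.unpack_from(f">{count}{fmt}", data, offset + 4))
    # The array claims every 4-byte chunk it touches, so round up.
    end = offset + 4 + size
    return values, (end + 3) // 4 * 4
```

For example, type 0x44 is an s16 array; a size of 6 followed by three shorts occupies 10 bytes, so the next free chunk starts at offset 12.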

Source code details

The full table for these values can be found in libavs. This table contains the names of every tag, along + with additional information such as how many bytes that data type requires, and which parsing function + should be used.

libavs-win32.dll:0x100782a8

Note about the array type:

While I'm not totally sure, I have a suspicion this type is used internally as a pseudo-type. Trying to + identify its function as a parsable type has some obvious blockers:

All of the types have convenient printf-using helper functions that are used to emit them when + serializing XML. All except one.

If we have a look inside the function that populates node sizes (libavs-win32.dll:0x1000cf00), it has an explicit case for array, but that case uses the same fallback as the default case.

In the same function, however, we can find a second (technically first) check for the array type.

This seems to suggest that internally, arrays are represented as a normal node with the array type, but when serializing they are converted into the array types we're used to (well, will be after the next sections) by OR-ing 0x40 onto the contained type.

Also of interest from this snippet is the fact that void, bin, str, and attr cannot be arrays. void and attr make sense; str and bin, however, are more interesting. I suspect this is because Konami want to be able to preallocate the memory, which wouldn't be possible with these variable-length structures.

The data section

This is where all the actual packet data is. For the most part, parsing this is the easy part: we traverse our schema, and read values out of the packet according to the type indicated in the schema. Unfortunately, Konami decided all data should be aligned very specifically, and that gaps left during alignment should be backfilled later. This makes both reading and writing somewhat more complicated, however the system can be fairly easily understood.

Firstly, we divide the payload up into 4 byte chunks. Each chunk can be allocated to either store individual + bytes, shorts, or ints (these are the buckets in the table above). When reading or writing a value, we first + check if a chunk allocated to the desired type's bucket is available and has free/as-yet-unread space within it. + If so, we will store/read our data to/from there. If there is no such chunk, we claim the next unclaimed chunk + for our bucket.

For example, imagine we write the sequence byte, int, byte, short, byte, int, short. The final output should look like:

Offset  0     1     2     3     4-7   8-9    10-11  12-15
Value   byte  byte  byte        int   short  short  int

While this might seem a silly system compared to just not aligning values, it is at least possible to intuit that it helps reduce wasted space. It should be noted that any variable-length structure, such as a string or an array, claims all chunks it encroaches on for the int bucket, disallowing the storage of bytes or shorts within them.
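To make the chunk and bucket model concrete on the reading side, here is a deliberately naive Python sketch that follows the description directly (rather than the three-pointer packer). It assumes big-endian, signed values and a data section that starts at offset 0 of the buffer; BucketReader is a hypothetical name.

```python
import struct


class BucketReader:
    """Reads bytes, shorts and ints using the chunk/bucket model."""

    def __init__(self, data: bytes):
        self.data = data
        self.next_chunk = 0     # start of the next unclaimed 4-byte chunk
        self.byte_pos = None    # next unread byte in the current byte chunk
        self.short_pos = None   # next unread short in the current short chunk

    def _claim(self, size: int) -> int:
        """Claim enough whole chunks for `size` bytes and return the start."""
        start = self.next_chunk
        self.next_chunk += (size + 3) // 4 * 4
        return start

    def read_byte(self) -> int:
        # No byte chunk yet, or the current one is used up: claim a new one.
        if self.byte_pos is None or self.byte_pos % 4 == 0:
            self.byte_pos = self._claim(1)
        value = self.data[self.byte_pos]
        self.byte_pos += 1
        return value

    def read_short(self) -> int:
        if self.short_pos is None or self.short_pos % 4 == 0:
            self.short_pos = self._claim(2)
        (value,) = struct.unpack_from(">h", self.data, self.short_pos)
        self.short_pos += 2
        return value

    def read_int(self) -> int:
        # Ints always take a fresh chunk of their own.
        (value,) = struct.unpack_from(">i", self.data, self._claim(4))
        return value
```

Reading the example sequence (byte, int, byte, short, byte, int, short) from a buffer laid out like the table above visits offsets 0, 4, 1, 8, 2, 12 and 10 in that order.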

Implementing a packer

While the intuitive way to understand the packing algorithm is via chunks and buckets, a far more efficient implementation can be made that uses three pointers. Rather than try to explain it in words, hopefully this Python implementation will suffice:

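import math


class Packer:
    """Allocates offsets in the data section using one cursor per bucket."""

    def __init__(self, offset=0):
        self._word_cursor = offset   # next unclaimed 4-byte chunk (int bucket)
        self._short_cursor = offset  # next free short slot in the current short chunk
        self._byte_cursor = offset   # next free byte slot in the current byte chunk
        self._boundary = offset % 4  # chunk boundaries are relative to `offset`

    def _next_block(self):
        # Claim the next unclaimed chunk and return its start offset.
        self._word_cursor += 4
        return self._word_cursor - 4

    def request_allocation(self, size):
        """Return the offset at which a value of `size` bytes should be stored."""
        if size == 0:
            return self._word_cursor
        elif size == 1:
            # No byte chunk yet, or the current one is full: claim a new chunk.
            if self._byte_cursor % 4 == self._boundary:
                self._byte_cursor = self._next_block() + 1
            else:
                self._byte_cursor += 1
            return self._byte_cursor - 1
        elif size == 2:
            if self._short_cursor % 4 == self._boundary:
                self._short_cursor = self._next_block() + 2
            else:
                self._short_cursor += 2
            return self._short_cursor - 2
        else:
            # Anything larger claims whole chunks in the int bucket.
            old_cursor = self._word_cursor
            for _ in range(math.ceil(size / 4)):
                self._word_cursor += 4
            return old_cursor

    def notify_skipped(self, no_bytes):
        # Mark `no_bytes` of variable-length data as claimed, whole chunks at a time.
        for _ in range(math.ceil(no_bytes / 4)):
            self.request_allocation(4)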
\ No newline at end of file
diff --git a/styles.css b/styles.css
new file mode 100644
index 0000000..1bdf5fb
--- /dev/null
+++ b/styles.css
@@ -0,0 +1,57 @@
+body {
+    /* font-family: sans-serif; */
+}
+
+table {
+    border-collapse: collapse;
+    font-family: monospace;
+    letter-spacing: .02em;
+}
+
+thead {
+    font-weight: bold;
+    border-bottom: 2px solid #000;
+}
+
+td {
+    border: 1px solid #111;
+    padding: 2px;
+    text-align: center;
+    min-width: 32px;
+}
+
+td a {
+    display: block;
+    padding: 4px 8px;
+}
+
+code {
+    display: inline-block;
+    letter-spacing: .02em;
+    padding: 2px 4px;
+    font-size: 90%;
+    color: #c7254e;
+    background-color: #f9f2f4;
+    border-radius: 4px;
+}
+pre > code {
+    border-radius: 4px;
+    background: #f8f8f8;
+    border: 1px solid #ccc;
+    padding: 4px;
+    color: #333;
+    padding: 9.5px;
+    line-height: 1.4;
+}
+
+summary {
+    user-select: none;
+    cursor: pointer;
+}
+
+details {
+    background: lightblue;
+    border: 1px solid cornflowerblue;
+    padding: 4px;
+    margin: 4px 0;
+}
diff --git a/transport.html b/transport.html
new file mode 100644
index 0000000..da81e7c
--- /dev/null
+++ b/transport.html
@@ -0,0 +1,43 @@
Transport | eAmuse API

Network format

eAmuse packets are sent and received over HTTP (no S), with requests being in the body of POST requests, and replies being in the, well, reply.

The packets are typically both encrypted and compressed. The compression format used is indicated by the X-Compress header, and valid values are

Encryption is performed after compression, and uses RC4. RC4 is symmetric, so decryption is performed the same way as encryption. That is, packet = encrypt(compress(data)) and data = decompress(decrypt(packet)).
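Because RC4 is a standard cipher, a pure-Python version is short enough to show the symmetry directly. This is a generic RC4 sketch (in practice an existing library may be preferable), and the function name rc4 is just illustrative.

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """RC4: the same call both encrypts and decrypts."""
    # Key-scheduling algorithm (KSA).
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    # Pseudo-random generation algorithm (PRGA), XORed with the input.
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)
```

rc4(b"Key", b"Plaintext") produces bbf316e8d940af0ad3 (a well-known RC4 test vector), and running rc4 with the same key over that output gives b"Plaintext" back.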

Encryption keys

Encryption is not performed using a single static key. Instead, each request and response has its own generated key.

These keys are generated based on the X-Eamuse-Info header.

This header loosely follows the format 1-[0-9a-f]{8}-[0-9a-f]{4}. This corresponds to [version]-[serial]-[salt]. TODO: Confirm this

Our per-packet key is then generated using md5(serial | salt | KEY), where | denotes concatenation. Identifying KEY is left as an exercise for the reader, but should not be especially challenging.
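A sketch of the key derivation in Python, with KEY passed in as a placeholder `secret` argument since it is deliberately not given here. It assumes the serial and salt are hex-decoded to raw bytes before hashing (rather than hashed as ASCII text) and that the header matches the loose format above exactly; derive_packet_key is a hypothetical name.

```python
import hashlib
import re

# [version]-[serial]-[salt], following the loose format described above.
EAMUSE_INFO = re.compile(r"1-([0-9a-f]{8})-([0-9a-f]{4})")


def derive_packet_key(eamuse_info: str, secret: bytes) -> bytes:
    """Derive a per-packet RC4 key from an X-Eamuse-Info header value.

    `secret` stands in for KEY, which is deliberately not given here.
    """
    match = EAMUSE_INFO.fullmatch(eamuse_info)
    if match is None:
        raise ValueError(f"unexpected X-Eamuse-Info format: {eamuse_info!r}")
    serial, salt = match.groups()
    # Assumption: serial and salt are hex-decoded to raw bytes before hashing.
    return hashlib.md5(bytes.fromhex(serial + salt) + secret).digest()
```

The resulting 16-byte digest would then be used as the RC4 key for that one request or response.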

LZ77

Packets are compressed using LZSS. The compressed data structure is a repeating cycle of an 8-bit flags byte followed by 8 values. Each value is either a single literal byte, if the corresponding bit in the preceding flags byte is high, or a two-byte lookup into the window.

The lookup bytes are structured as pppppppp ppppllll, where p is a 12-bit index into the window and l is a 4-bit integer that determines the length of the run copied from that position in the window.
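Here is a sketch of a decompressor for this layout in Python. Several details are assumptions rather than things established above: flag bits are consumed least-significant bit first, p is treated as a distance back from the end of the output, the copied length is l + 3 (MIN_MATCH, a common LZSS convention), and a lookup with p == 0 ends the stream. lz_decompress is a hypothetical name.

```python
MIN_MATCH = 3  # assumption: the stored length l is offset by 3


def lz_decompress(data: bytes) -> bytes:
    """Decode the flags-byte + 8 values layout (see assumptions above)."""
    out = bytearray()
    pos = 0
    while pos < len(data):
        flags = data[pos]
        pos += 1
        for bit in range(8):  # assumption: LSB of the flags byte comes first
            if pos >= len(data):
                break
            if flags & (1 << bit):
                out.append(data[pos])  # literal byte
                pos += 1
                continue
            if pos + 1 >= len(data):
                raise ValueError("truncated lookup")
            hi, lo = data[pos], data[pos + 1]
            pos += 2
            distance = (hi << 4) | (lo >> 4)  # the 12 p bits
            length = (lo & 0x0F) + MIN_MATCH  # the 4 l bits
            if distance == 0:
                return bytes(out)  # assumption: p == 0 marks end of stream
            if distance > len(out):
                raise ValueError("lookup points before the start of the output")
            for _ in range(length):
                # Copy one byte at a time so overlapping runs work.
                out.append(out[-distance])
    return bytes(out)
```

Under these assumptions, 07 'a' 'b' 'c' 00 33 decodes to abcabcabc: three literals, then a lookup with p = 3 and l = 3 that copies 6 bytes starting 3 bytes back.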

The exact algorithm used for compression is not especially important, as long as it follows this format. One can feasibly perform no compression at all, and instead insert 0xFF every 8 bytes (starting at index 0), to indicate that all values are literals. While obviously poor for compression, this is an easy way to test without first implementing a compressor.
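The literal-only approach can be sketched in a few lines of Python; lz_compress_literal is a hypothetical name. A final group shorter than 8 bytes still gets a full 0xFF flags byte. Whether a game's decoder also expects an explicit end marker isn't covered above, so treat this as a testing aid.

```python
def lz_compress_literal(data: bytes) -> bytes:
    """'Compress' without any lookups: every value is a literal."""
    out = bytearray()
    for start in range(0, len(data), 8):
        out.append(0xFF)  # all 8 flag bits high: the next 8 values are literals
        out += data[start:start + 8]
    return bytes(out)
```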

\ No newline at end of file