commit 546df244b0e6fd9aad19880b8af1d7a22406a534 Author: Bottersnike Date: Mon Dec 20 03:39:28 2021 +0000 Initial diff --git a/images/encoding_table.png b/images/encoding_table.png new file mode 100644 index 0000000..a1223a9 Binary files /dev/null and b/images/encoding_table.png differ diff --git a/images/no_array.png b/images/no_array.png new file mode 100644 index 0000000..b1a73f1 Binary files /dev/null and b/images/no_array.png differ diff --git a/images/no_array_2.png b/images/no_array_2.png new file mode 100644 index 0000000..c5e6b63 Binary files /dev/null and b/images/no_array_2.png differ diff --git a/images/types_table.png b/images/types_table.png new file mode 100644 index 0000000..ee754bb Binary files /dev/null and b/images/types_table.png differ diff --git a/images/unsorted/javaw_DSoqceZKFz.png b/images/unsorted/javaw_DSoqceZKFz.png new file mode 100644 index 0000000..731d7c1 Binary files /dev/null and b/images/unsorted/javaw_DSoqceZKFz.png differ diff --git a/images/xml_encoding_table.png b/images/xml_encoding_table.png new file mode 100644 index 0000000..00cd27e Binary files /dev/null and b/images/xml_encoding_table.png differ diff --git a/images/yes_array.png b/images/yes_array.png new file mode 100644 index 0000000..3ec6bd3 Binary files /dev/null and b/images/yes_array.png differ diff --git a/index.html b/index.html new file mode 100644 index 0000000..9d71b14 --- /dev/null +++ b/index.html @@ -0,0 +1,58 @@ + + + + + + + + eAmuse API + + + + + + + + + + + +

+ +

Bemani/Konami eAmuse API

Why?

I was curious how these APIs work, yet could find little to nothing on Google. There are a number of closed-source projects, with presumably similarly closed-source internal documentation, and a scattering of implementations of things, yet I couldn't find a site that actually just documents how the API works. If I'm going to have to reverse engineer an open source project (or a closed source one, for that matter), I might as well just go reverse engineer an actual game (or its stdlib, which is where most of my time has gone so far).

These pages are very much a work in progress, and are being written as I reverse engineer parts of the protocol. I've been checking my assumptions by writing my own implementation as I go; however, it currently isn't code of shareable quality and, more importantly, the purpose of these pages is to make implementing one's own code hopefully trivial.

Sharing annotated sources for all of the games' stdlibs would be both impractical and unwise. Where relevant, however, I try to include snippets to illustrate concepts, and have included their locations in the source in case you feel like taking a dive too.

If you're here because you work on one of those aforementioned closed source projects, hello! Feel free to share knowledge with the rest of the world, or point out corrections. Or don't; you do you.

+ +

Transport layer

Packet structure
Types

The inner packet structure

+ +

This site intentionally looks not-great. I don't feel like changing that, and honestly quite like the aesthetic.

+ + + \ No newline at end of file diff --git a/packet.html b/packet.html new file mode 100644 index 0000000..f8a362a --- /dev/null +++ b/packet.html @@ -0,0 +1,893 @@ + + + + + + + + Packet format | eAmuse API + + + + + + + + + + + + +

Contents

Transport layer

Packet format

+ +

Packet format

+ +

eAmuse uses XML for its application layer payloads*. This XML is either verbatim, or in a custom packed binary format.
*Newer games use JSON, but this page is about XML.

+ + +

The XML format

+ +

Each tag that contains a value has a __type attribute that identifies what type it is. Array types have a __count attribute indicating how many items are in the array. Binary blobs additionally have a __size attribute indicating their length (this is notably not present on strings, however).

It is perhaps simpler to illustrate with an example, so:

<?xml version='1.0' encoding='UTF-8'?>
<call model="KFC:J:A:A:2019020600" srcid="1000" tag="b0312077">
    <eventlog method="write">
        <retrycnt __type="u32" />
        <data>
            <eventid __type="str">G_CARDED</eventid>
            <eventorder __type="s32">5</eventorder>
            <pcbtime __type="u64">1639669516779</pcbtime>
            <gamesession __type="s64">1</gamesession>
            <strdata1 __type="str" />
            <strdata2 __type="str" />
            <numdata1 __type="s64">1</numdata1>
            <numdata2 __type="s64" />
            <locationid __type="str">ea</locationid>
        </data>
    </eventlog>
</call>

Arrays are encoded by concatenating every value together, with spaces between them. Data types that have multiple values are serialized similarly.

Therefore, an element storing an array of 3u8 ([(1, 2, 3), (4, 5, 6)]) would look like this:

<demo __type="3u8" __count="2">1 2 3 4 5 6</demo>

Beyond this, the format is otherwise fairly standard XML.
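To make the array rules concrete, here is a minimal Python sketch that reads one element back into tuples. The function name `parse_values` is my own, it only handles integer types, and it assumes the per-value width is the leading digit of `__type` (so it does not understand the `v`-prefixed aliases).

```python
import re
import xml.etree.ElementTree as ET


def parse_values(xml_text):
    """Parse a single integer-typed element into a list of tuples."""
    elem = ET.fromstring(xml_text)
    type_name = elem.get("__type")
    # Assumption: a leading number in the type name is the values-per-item width
    match = re.match(r"(\d+)", type_name)
    width = int(match.group(1)) if match else 1
    # Non-array elements carry no __count, so treat them as one item
    count = int(elem.get("__count", "1"))
    values = [int(token) for token in (elem.text or "").split()]
    if len(values) != width * count:
        raise ValueError("value count does not match __type and __count")
    return [tuple(values[i:i + width]) for i in range(0, len(values), width)]


print(parse_values('<demo __type="3u8" __count="2">1 2 3 4 5 6</demo>'))
```

For the demo element above this gives back the two 3u8 items as `(1, 2, 3)` and `(4, 5, 6)`.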

+ +

Packed binary overview

+ +

Many packets, rather than using a string-based XML format, use a custom binary packed format instead. While it can be a little confusing, remembering that this is encoding an XML tree can make it easier to parse.

To start with, let's take a look at the overall structure of the packets.


0	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15
A0	C	E	~E	Head length
Schema definition
												FF	Align
Data length
Payload
													Align

Every packet starts with the magic byte 0xA0. Following this is the content byte, the encoding byte, and then the bitwise (one's) complement of the encoding byte.

Currently known possible values for the content byte are:


C	Content
0x42	Compressed data
0x43	Compressed, no data
0x45	Decompressed data
0x46	Decompressed, no data

Decompressed packets contain an XML string. Compressed packets are what we're interested in here.

The encoding flag indicates the encoding for all string types in the packet (more on those later). Possible values are:


E	~E	Encoding name
0x20	0xDF	ASCII
0x40	0xBF	ISO-8859-1	ISO_8859-1
0x60	0x9F	EUC-JP	EUCJP	EUC_JP
0x80	0x7F	SHIFT-JIS	SHIFT_JIS	SJIS
0xA0	0x5F	UTF-8	UTF8

Source code details

The full table for these values can be found in libavs.

A second table exists just before this one in the source, responsible for the <?xml version='1.0' encoding='??'?> line in XML files.

This is indexed using the following function, which maps the above encoding IDs to 1, 2, 3, 4 and 5 respectively.

char* xml_get_encoding_name(uint encoding_id) {
    return ENCODING_NAME_TABLE[((encoding_id & 0xe0) >> 5) * 4];
}

While validating ~E isn't technically required, it acts as a useful assertion that the packet being parsed is valid.
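Putting the first four bytes together, a header check might look like the sketch below. The names (`parse_header`, `CONTENT`, `ENCODINGS`) are mine rather than anything from libavs, and the codec strings are simply Python's names for the encodings in the table above.

```python
MAGIC = 0xA0

# content byte -> (compressed, has_data), as per the content table
CONTENT = {
    0x42: (True, True),
    0x43: (True, False),
    0x45: (False, True),
    0x46: (False, False),
}

# encoding byte -> Python codec name for the encodings listed above
ENCODINGS = {
    0x20: "ascii",
    0x40: "iso-8859-1",
    0x60: "euc_jp",
    0x80: "shift_jis",
    0xA0: "utf-8",
}


def parse_header(packet):
    """Validate the 4 byte header and return (compressed, has_data, codec)."""
    if len(packet) < 4:
        raise ValueError("packet too short")
    magic, content, encoding, inverse = packet[:4]
    if magic != MAGIC:
        raise ValueError("bad magic byte")
    if content not in CONTENT:
        raise ValueError("unknown content byte")
    # ~E is every bit of E flipped, so E xor ~E must be 0xFF
    if encoding ^ inverse != 0xFF:
        raise ValueError("encoding byte and its complement disagree")
    if encoding not in ENCODINGS:
        raise ValueError("unknown encoding byte")
    compressed, has_data = CONTENT[content]
    return compressed, has_data, ENCODINGS[encoding]


print(parse_header(bytes([0xA0, 0x42, 0x80, 0x7F])))
```

Checking `E ^ ~E == 0xFF` is equivalent to comparing against the ~E column of the table, without having to hardcode that column.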

+ +

The packet schema header

Following the 4 byte header is a 4 byte integer containing the length of the next part of the header (this is technically redundant, as this structure is also terminated).

This part of the header defines the schema that the main payload uses.

+ +

A tag definition looks like:


0	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15
Type	nlen	Tag name
Attributes and children															FE

+ +

Structure names are encoded as densely packed 6 bit values, length prefixed (nlen). The acceptable alphabet is 0123456789:ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz, and the packed values are indices within this alphabet.
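As a rough illustration of the name packing, here is a Python sketch. Several details here are assumptions on my part rather than things stated above: that nlen is a single byte counting characters, that the 6 bit values are packed most significant bit first, and that the final byte is padded with zero bits. The function names are hypothetical.

```python
ALPHABET = "0123456789:ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz"


def pack_name(name):
    """Encode a name as a length byte followed by packed 6 bit indices."""
    bits = 0
    for char in name:
        bits = (bits << 6) | ALPHABET.index(char)
    bit_count = 6 * len(name)
    padding = (-bit_count) % 8  # assumption: zero-pad up to a whole byte
    bits <<= padding
    return bytes([len(name)]) + bits.to_bytes((bit_count + padding) // 8, "big")


def unpack_name(data):
    """Decode a packed name, returning (name, bytes_consumed)."""
    nlen = data[0]
    byte_count = (6 * nlen + 7) // 8
    bits = int.from_bytes(data[1:1 + byte_count], "big")
    bits >>= byte_count * 8 - 6 * nlen  # drop the padding bits
    chars = [ALPHABET[(bits >> (6 * (nlen - 1 - i))) & 0x3F] for i in range(nlen)]
    return "".join(chars), 1 + byte_count


packed = pack_name("eventlog")
print(packed.hex(), unpack_name(packed))
```

The alphabet has exactly 64 entries, which is why every index fits in 6 bits.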

+ +

The children can be a combination of either attribute names, or child tags. Attribute names are represented by the byte 0x2E followed by a length prefixed name as defined above. Child tags follow the above format. Type 0x2E must therefore be considered reserved as a possible structure type.

+ +

Attributes (type 0x2E) represent a string attribute. Any other attribute must be defined as a child tag. It is notable that 0 children is allowable, which is how the majority of values are encoded.

All valid IDs, and their respective type, are listed in the following table. The bucket column here will be used later when unpacking the main data, so we need not worry about it for now, but be warned it exists and is possibly the least fun part of this format.


ID	Bytes	C type	Bucket	XML names		ID	Bytes	C type	Bucket	XML names
0x01	0	void	-	void		0x21	24	uint64[3]	int	3u64
0x02	1	int8	byte	s8		0x22	12	float[3]	int	3f
0x03	1	uint8	byte	u8		0x23	24	double[3]	int	3d
0x04	2	int16	short	s16		0x24	4	int8[4]	int	4s8
0x05	2	uint16	short	u16		0x25	4	uint8[4]	int	4u8
0x06	4	int32	int	s32		0x26	8	int16[4]	int	4s16
0x07	4	uint32	int	u32		0x27	8	uint16[4]	int	4u16
0x08	8	int64	int	s64		0x28	16	int32[4]	int	4s32	vs32
0x09	8	uint64	int	u64		0x29	16	uint32[4]	int	4u32	vu32
0x0a	prefix	char[]	int	bin	binary	0x2a	32	int64[4]	int	4s64
0x0b	prefix	char[]	int	str	string	0x2b	32	uint64[4]	int	4u64
0x0c	4	uint8[4]	int	ip4		0x2c	16	float[4]	int	4f	vf
0x0d	4	uint32	int	time		0x2d	32	double[4]	int	4d
0x0e	4	float	int	float	f	0x2e	prefix	char[]	int	attr
0x0f	8	double	int	double	d	0x2f	0		-	array
0x10	2	int8[2]	short	2s8		0x30	16	int8[16]	int	vs8
0x11	2	uint8[2]	short	2u8		0x31	16	uint8[16]	int	vu8
0x12	4	int16[2]	int	2s16		0x32	16	int16[8]	int	vs16
0x13	4	uint16[2]	int	2u16		0x33	16	uint16[8]	int	vu16
0x14	8	int32[2]	int	2s32		0x34	1	bool	byte	bool	b
0x15	8	uint32[2]	int	2u32		0x35	2	bool[2]	short	2b
0x16	16	int64[2]	int	2s64	vs64	0x36	3	bool[3]	int	3b
0x17	16	uint64[2]	int	2u64	vu64	0x37	4	bool[4]	int	4b
0x18	8	float[2]	int	2f		0x38	16	bool[16]	int	vb
0x19	16	double[2]	int	2d	vd	0x38
0x1a	3	int8[3]	int	3s8		0x39
0x1b	3	uint8[3]	int	3u8		0x3a
0x1c	6	int16[3]	int	3s16		0x3b
0x1d	6	uint16[3]	int	3u16		0x3c
0x1e	12	int32[3]	int	3s32		0x3d
0x1f	12	uint32[3]	int	3u32		0x3e
0x20	24	int64[3]	int	3s64		0x3f

+ +

Strings should be encoded and decoded according to the encoding specified in the packet header. Null termination is optional, however should be stripped during decoding.
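A small Python sketch of the decoding side, assuming the codec name has already been worked out from the packet header. `decode_string` is a hypothetical helper, and I'm assuming at most one terminating null byte needs removing.

```python
def decode_string(raw, codec):
    """Decode a string payload, dropping the optional null terminator."""
    if raw.endswith(b"\x00"):
        raw = raw[:-1]
    return raw.decode(codec)


# The terminator is optional, so both forms should decode identically
print(decode_string(b"ea\x00", "ascii"), decode_string(b"ea", "ascii"))
```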

All of these IDs are masked with & 0x3F. Any type can be turned into an array by setting the 7th bit high (| 0x40). Arrays of this form, in the data section, are stored as an aligned size: u32, immediately followed by size bytes' worth of (unaligned!) values of the unmasked type.
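The masking is simple enough, but for completeness here is a hedged Python sketch. The helper names are mine, and the set of types that refuse to become arrays (void, bin, str and attr) is taken from the libavs notes elsewhere on this page.

```python
TYPE_MASK = 0x3F
ARRAY_FLAG = 0x40

# void, bin, str and attr: the types that libavs does not allow as arrays
NON_ARRAY_TYPES = {0x01, 0x0A, 0x0B, 0x2E}


def split_type(type_byte):
    """Split a schema type byte into (base type ID, is_array)."""
    return type_byte & TYPE_MASK, bool(type_byte & ARRAY_FLAG)


def as_array(type_id):
    """Return the type byte for an array of the given base type."""
    base = type_id & TYPE_MASK
    if base in NON_ARRAY_TYPES:
        raise ValueError("type 0x%02x cannot be an array" % base)
    return base | ARRAY_FLAG


print(hex(as_array(0x03)), split_type(0x43))
```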

+ +

Source code details

The full table for these values can be found in libavs. This table contains the names of every tag, along with additional information such as how many bytes that data type requires, and which parsing function should be used.

Note about the array type:

While I'm not totally sure, I have a suspicion this type is used internally as a pseudo-type. Trying to identify its function as a parsable type has some obvious blockers:

+ +

All of the types have convenient printf-using helper functions that are used to emit them when serializing XML. All except one.

If we have a look inside the function that populates node sizes (libavs-win32.dll:0x1000cf00), it has an explicit case for this type, however that case is the same fallback as the default case.

+ +

In the same function, however, we can find a second (technically first) check for the array type.

This seems to suggest that internally arrays are represented as a normal node, with the array type, however when serializing it's converted into the array types we're used to (well, will be after the next sections) by masking 0x40 onto the contained type.

Also of interest from this snippet is the fact that void, bin, str, and attr cannot be arrays. void and attr make sense, however str and bin are more interesting. I suspect this is because Konami want to be able to preallocate the memory, which wouldn't be possible with these variable length structures.

+ +

The data section

+ +

This is where all the actual packet data is. For the most part, parsing this is the easy part. We traverse our schema, and read values out of the packet according to the type indicated in the schema. Unfortunately, Konami decided all data should be aligned very specifically, and that gaps left during alignment should be backfilled later. This makes both reading and writing somewhat more complicated, however the system can be fairly easily understood.

Firstly, we divide the payload up into 4 byte chunks. Each chunk can be allocated to either store individual bytes, shorts, or ints (these are the buckets in the table above). When reading or writing a value, we first check if a chunk allocated to the desired type's bucket is available and has free/as-yet-unread space within it. If so, we will store/read our data to/from there. If there is no such chunk, we claim the next unclaimed chunk for our bucket.

For example, imagine we write the sequence byte, int, byte, short, byte, int, short. The final output should look like:


0	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15
byte	byte	byte		int				short		short		int

+ +

While this might seem a silly system compared to just not aligning values, it is at least possible to intuit that it helps reduce wasted space. It should be noted that any variable-length structure, such as a string or an array, claims all chunks it encroaches on for the int bucket, disallowing the storage of bytes or shorts within them.
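To check the layout above, here is a deliberately literal Python sketch of the chunk and bucket rules. It keeps an explicit list of chunks rather than doing anything clever, and the function name `layout` is my own.

```python
import math


def layout(sizes):
    """Return the payload offset assigned to each value size, in order."""
    chunks = []  # one [bucket, bytes_used] entry per 4 byte chunk
    offsets = []
    for size in sizes:
        if size in (1, 2):
            # Reuse a chunk already claimed by this bucket if it has space
            for index, chunk in enumerate(chunks):
                if chunk[0] == size and chunk[1] < 4:
                    break
            else:
                # Otherwise claim the next unclaimed chunk for the bucket
                chunks.append([size, 0])
                index = len(chunks) - 1
            chunk = chunks[index]
            offsets.append(index * 4 + chunk[1])
            chunk[1] += size
        else:
            # Everything else lives in the int bucket and takes whole chunks
            offsets.append(len(chunks) * 4)
            chunks.extend([4, 4] for _ in range(math.ceil(size / 4)))
    return offsets


# byte, int, byte, short, byte, int, short
print(layout([1, 4, 1, 2, 1, 4, 2]))
```

For the example sequence this yields offsets 0, 4, 1, 8, 2, 12 and 10, which lines up with the table: three bytes share the first chunk, the two shorts share the third, and each int gets a chunk to itself.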

+ +

Implementing a packer

While the intuitive way to understand the packing algorithm is via chunks and buckets, a far more efficient implementation can be made that uses three pointers. Rather than try to explain in words, hopefully this Python implementation should suffice as explanation:

import math


class Packer:
    """Tracks where the next byte, short and int sized values should go."""

    def __init__(self, offset=0):
        self._word_cursor = offset
        self._short_cursor = offset
        self._byte_cursor = offset
        self._boundary = offset % 4

    def _next_block(self):
        # Claim the next unclaimed 4 byte chunk and return its offset
        self._word_cursor += 4
        return self._word_cursor - 4

    def request_allocation(self, size):
        """Return the offset at which a value of `size` bytes belongs."""
        if size == 0:
            return self._word_cursor
        elif size == 1:
            # Byte bucket: claim a new chunk once the current one is full
            if self._byte_cursor % 4 == self._boundary:
                self._byte_cursor = self._next_block() + 1
            else:
                self._byte_cursor += 1
            return self._byte_cursor - 1
        elif size == 2:
            # Short bucket: same idea, two bytes at a time
            if self._short_cursor % 4 == self._boundary:
                self._short_cursor = self._next_block() + 2
            else:
                self._short_cursor += 2
            return self._short_cursor - 2
        else:
            # Int bucket: anything larger claims whole chunks
            old_cursor = self._word_cursor
            self._word_cursor += 4 * math.ceil(size / 4)
            return old_cursor

    def notify_skipped(self, no_bytes):
        # Mark the chunks covered by no_bytes as claimed by the int bucket
        for _ in range(math.ceil(no_bytes / 4)):
            self.request_allocation(4)

+ + + + \ No newline at end of file diff --git a/styles.css b/styles.css new file mode 100644 index 0000000..1bdf5fb --- /dev/null +++ b/styles.css @@ -0,0 +1,57 @@ +body { + /* font-family: sans-serif; */ +} + +table { + border-collapse: collapse; + font-family: monospace; + letter-spacing: .02em; +} + +thead { + font-weight: bold; + border-bottom: 2px solid #000; +} + +td { + border: 1px solid #111; + padding: 2px; + text-align: center; + min-width: 32px; +} + +td a { + display: block; + padding: 4px 8px; +} + +code { + display: inline-block; + letter-spacing: .02em; + padding: 2px 4px; + font-size: 90%; + color: #c7254e; + background-color: #f9f2f4; + border-radius: 4px; +} +pre > code { + border-radius: 4px; + background: #f8f8f8; + border: 1px solid #ccc; + padding: 4px; + color: #333; + padding: 9.5px; + line-height: 1.4; +} + +summary { + user-select: none; + cursor: pointer; +} + +details { + background: lightblue; + border: 1px solid cornflowerblue; + padding: 4px; + margin: 4px 0; +} diff --git a/transport.html b/transport.html new file mode 100644 index 0000000..da81e7c --- /dev/null +++ b/transport.html @@ -0,0 +1,43 @@ + + + + + + + Transport | eAmuse API + + + + + + + + + + +

Contents

Transport layer

Packet format

+ +

Network format

+ +

eAmuse packets are sent and received over HTTP (no S), with requests being in the body of POST requests, and replies being in the, well, reply.

The packets are typically both encrypted and compressed. The compression format used is indicated by the X-Compress header, and valid values are:

none
lz77

Encryption is performed after compression, and uses RC4. RC4 is symmetric, so decryption is performed the same way as encryption. That is, packet = encrypt(compress(data)) and data = decompress(decrypt(packet)).
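If you don't have an RC4 implementation to hand, the cipher itself is small enough to write out. This is a plain textbook RC4 in Python (nothing eAmuse specific about it), mainly to show that the same function both encrypts and decrypts. The key used at the bottom is a throwaway demo value.

```python
def rc4(key, data):
    """Apply RC4 to data with the given key; encrypts and decrypts alike."""
    # Key scheduling: permute the state array using the key bytes
    state = list(range(256))
    j = 0
    for i in range(256):
        j = (j + state[i] + key[i % len(key)]) % 256
        state[i], state[j] = state[j], state[i]

    # Keystream generation, XORed over the input one byte at a time
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) % 256
        j = (j + state[i]) % 256
        state[i], state[j] = state[j], state[i]
        out.append(byte ^ state[(state[i] + state[j]) % 256])
    return bytes(out)


demo_key = b"not the real key"
packet = rc4(demo_key, b"<call />")
print(rc4(demo_key, packet))
```

Running the ciphertext back through `rc4` with the same key returns the original bytes, which is all "symmetric" means here.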

+ +

Encryption keys

Encryption is not performed using a single static key. Instead, each request and response has its own key that is generated.

These keys are generated based on the X-Eamuse-Info header.

This header loosely follows the format 1-[0-9a-f]{8}-[0-9a-f]{4}. This corresponds to [version]-[serial]-[salt]. TODO: Confirm this

Our per-packet key is then generated using md5(serial | salt | KEY). Identifying KEY is left as an exercise for the reader, however should not be especially challenging.
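A sketch of the derivation in Python follows. I'm making a few assumptions here that should be checked against a real exchange: that `|` means byte concatenation, and that serial and salt are hex-decoded to bytes before hashing. `derive_key` is a hypothetical name, and the `key` argument stands in for KEY, which I'm still not going to hand you.

```python
import hashlib
import re

# Header format as described above: 1-[serial]-[salt]
INFO_RE = re.compile(r"^1-([0-9a-f]{8})-([0-9a-f]{4})$")


def derive_key(info_header, key):
    """Derive the per-packet RC4 key from an X-Eamuse-Info header value."""
    match = INFO_RE.match(info_header)
    if match is None:
        raise ValueError("unexpected X-Eamuse-Info format")
    serial, salt = match.groups()
    # Assumption: the hex fields are decoded to raw bytes before hashing
    material = bytes.fromhex(serial) + bytes.fromhex(salt) + key
    return hashlib.md5(material).digest()


# Placeholder secret purely for demonstration
print(derive_key("1-5e0a3b1c-00ff", b"placeholder").hex())
```

The result is always 16 bytes, being an MD5 digest, and is then used as the RC4 key for that one packet.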

+ +

LZ77

Packets are compressed using LZSS. The compressed data structure is a repeating cycle of an 8 bit flags byte, followed by 8 values. Each value is either a single literal byte, if the corresponding bit in the preceding flags byte is high, or a two byte lookup into the window.

The lookup bytes are structured as pppppppp ppppllll where p is a 12 bit index in the window, and l is a 4 bit integer that determines how many times to repeat the value located at that index in the window.

+ +

The exact algorithm used for compression is not especially important, as long as it follows this format. One can feasibly perform no compression at all, and instead insert 0xFF every 8 bytes (starting at index 0), to indicate that all values are literals. While obviously poor for compression, this is an easy way to test without first implementing a compressor.
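Both the lookup layout and the "no compression" trick fit in a few lines of Python. This is only a sketch with names of my own choosing: `split_lookup` just pulls the two fields apart without interpreting them, and `store_only` emits nothing beyond the flags bytes and literals described above.

```python
def split_lookup(first, second):
    """Split the two lookup bytes (pppppppp ppppllll) into (p, l)."""
    position = (first << 4) | (second >> 4)  # 12 bit window index
    length = second & 0x0F                   # 4 bit repeat field
    return position, length


def store_only(data):
    """Wrap data as all-literal groups: 0xFF followed by up to 8 bytes."""
    out = bytearray()
    for start in range(0, len(data), 8):
        out.append(0xFF)  # all 8 flag bits high: every value is a literal
        out += data[start:start + 8]
    return bytes(out)


print(split_lookup(0x12, 0x34))
print(store_only(b"abcdefghij"))
```

The output of `store_only` is one byte longer for every 8 bytes of input, so it is purely a testing aid.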

+ + + \ No newline at end of file