JS Open Serialization Scheme


Wynn Tee
  
11 Feb 2021

Note: This page documents the specification of JOSS. The reference implementation is hosted on GitHub.


1. Introduction

JavaScript can be run not only in browsers, but also on servers through the use of JavaScript runtime environments, such as Deno and Node.js.

The de facto serialization format for exchanging data between browsers and servers is JavaScript Object Notation (JSON). However, browsers and servers that exchange data in JSON format are limited to the few data structures native to JSON, even when they can both run JavaScript.

This page documents the specification for a serialization format called the JS Open Serialization Scheme (JOSS). The format supports almost all data types and data structures intrinsic to JavaScript. The format also supports some often overlooked features of JavaScript, such as primitive wrapper objects, circular references, sparse arrays, and negative zeros.

2. Serialization

The serialization of a JavaScript data item begins with a single byte, called a marker byte. A few marker bytes are standalone and form the complete serialization by themselves. Otherwise, the marker byte is followed by a sequence of bytes that completes the serialization.

2.1. Standalone

The marker bytes with values 0–31 are listed in the following table. The most significant bit is assigned the bit number 0.

Table 1. Standalone marker bytes and others.
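| Bit | Value | Interpretation |
|-----|-------|----------------|
| 0–2 | 0 | Multipurpose |
| 3–7 | 0 | null |
| 3–7 | 1 | undefined |
| 3–7 | 2 | true as a Boolean value |
| 3–7 | 3 | true as a Boolean object |
| 3–7 | 4 | false as a Boolean value |
| 3–7 | 5 | false as a Boolean object |
| 3–7 | 6 | Infinity as a Number value |
| 3–7 | 7 | Infinity as a Number object |
| 3–7 | 8 | -Infinity as a Number value |
| 3–7 | 9 | -Infinity as a Number object |
| 3–7 | 10 | NaN as a Number value |
| 3–7 | 11 | NaN as a Number object |
| 3–7 | 12 | Hole in an Array |
| 3–7 | 13 | Unsupported data |
| 3–7 | 14 | Marker byte for Date |
| 3–7 | 15 | Marker byte for RegExp |
| 3–7 | 16–28 | Reserved for future extensions |
| 3–7 | 29 | Marker byte for object reference |
| 3–7 | 30 | Marker byte for custom object |
| 3–7 | 31 | Reserved for future extensions |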

The values 0–13 are for standalone marker bytes. The other values are either for marker bytes acting as semantic tags or reserved for future extensions.
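
As an illustration of Table 1, the following sketch maps each value that has a standalone marker byte to that byte. The function name `standaloneMarker` is hypothetical; it is not part of the specification.

```javascript
// Hypothetical helper: returns the standalone marker byte for a value,
// or undefined when the value has no standalone marker byte in Table 1.
function standaloneMarker(value) {
  if (value === null) return 0;
  if (value === undefined) return 1;
  if (value === true) return 2;
  if (value === false) return 4;
  if (value === Infinity) return 6;
  if (value === -Infinity) return 8;
  if (typeof value === "number" && Number.isNaN(value)) return 10;
  // Primitive wrapper objects use the odd-numbered values.
  if (value instanceof Boolean) return value.valueOf() ? 3 : 5;
  if (value instanceof Number) {
    const n = value.valueOf();
    if (n === Infinity) return 7;
    if (n === -Infinity) return 9;
    if (Number.isNaN(n)) return 11;
  }
  return undefined;
}
```

For example, `standaloneMarker(new Boolean(false))` returns 5, while `standaloneMarker(42)` returns `undefined` because finite numbers are serialized with a payload.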

2.2. Numbers

The Number type is used to represent numbers stored in double-precision format. It is serialized by concatenating

  1. The marker byte.
  2. The payload.

The marker byte is defined in the following table. The most significant bit is assigned the bit number 0.

Table 2. Marker byte for numbers.
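| Bit | Value | Interpretation |
|-----|-------|----------------|
| 0–2 | 1 | Number |
| 3 | 0 | Number value |
| 3 | 1 | Number object |
| 4 | 0 | If integer: integer is not negative-valued |
| 4 | 1 | If integer: integer is negative-valued |
| 5–7 | 0 | Payload is 1 byte long |
| 5–7 | 1 | Payload is 2 bytes long |
| 5–7 | 2 | Payload is 3 bytes long |
| 5–7 | 3 | Payload is 4 bytes long |
| 5–7 | 4 | Payload is 5 bytes long |
| 5–7 | 5 | Payload is 6 bytes long |
| 5–7 | 6 | Payload is 7 bytes long |
| 5–7 | 7 | Payload is 8 bytes long |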

If the represented number is not an integer, the payload is the value of the number encoded in double-precision format and little-endian byte ordering. The payload is exactly 64 bits or 8 bytes long in this case.

If the represented number is an integer, the payload is the absolute value of the integer encoded as an unsigned integer in the fewest bytes possible and little-endian byte ordering. The payload is at most 53 bits or 7 bytes long in this case.

Infinity, -Infinity, and NaN are special cases of the Number type serialized using standalone marker bytes.
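
The rules above can be sketched as follows. The names `uintToBytesLE` and `serializeNumber` are hypothetical. The sketch also makes three assumptions that the text does not state: integers whose magnitude exceeds 2^53 − 1 are encoded as doubles, bit 4 is left at 0 for non-integers, and negative zero is encoded as an integer zero with the negative bit set.

```javascript
// Encode a non-negative safe integer in the fewest bytes, little-endian.
function uintToBytesLE(n) {
  const bytes = [];
  do {
    bytes.push(n % 256);
    n = Math.floor(n / 256);
  } while (n > 0);
  return bytes;
}

// Hypothetical sketch of section 2.2 for finite numbers.
function serializeNumber(value) {
  const isObject = value instanceof Number;
  const n = Number(value);
  if (!Number.isFinite(n)) {
    throw new RangeError("Infinity, -Infinity, and NaN use standalone marker bytes");
  }
  let payload;
  let negative = false;
  if (Number.isInteger(n) && Math.abs(n) <= Number.MAX_SAFE_INTEGER) {
    // Assumption: -0 is an integer zero with the negative bit set.
    negative = n < 0 || Object.is(n, -0);
    payload = uintToBytesLE(Math.abs(n));
  } else {
    // Double-precision format, little-endian, always 8 bytes.
    const view = new DataView(new ArrayBuffer(8));
    view.setFloat64(0, n, true);
    payload = Array.from(new Uint8Array(view.buffer));
  }
  // With the most significant bit numbered 0: bits 0-2 are the top three
  // bits, bit 3 is 0x10, bit 4 is 0x08, and bits 5-7 are the low three bits.
  const marker =
    (1 << 5) | (isObject ? 0x10 : 0) | (negative ? 0x08 : 0) | (payload.length - 1);
  return Uint8Array.from([marker, ...payload]);
}
```

Under these assumptions, `serializeNumber(300)` yields the bytes 33, 44, 1 and `serializeNumber(-1)` yields 40, 1.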

2.3. Big Integers

The BigInt type is used to represent arbitrarily big integers. It is serialized by concatenating

  1. The marker byte.
  2. The payload size.
  3. The payload.

The marker byte is defined in the following table. The most significant bit is assigned the bit number 0.

Table 3. Marker byte for big integers.
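| Bit | Value | Interpretation |
|-----|-------|----------------|
| 0–2 | 2 | BigInt |
| 3 | 0 | BigInt value |
| 3 | 1 | BigInt object |
| 4 | 0 | Integer is not negative-valued |
| 4 | 1 | Integer is negative-valued |
| 5–7 | 0 | Payload size is 1 byte long |
| 5–7 | 1 | Payload size is 2 bytes long |
| 5–7 | 2 | Payload size is 3 bytes long |
| 5–7 | 3 | Payload size is 4 bytes long |
| 5–7 | 4 | Payload size is 5 bytes long |
| 5–7 | 5 | Payload size is 6 bytes long |
| 5–7 | 6 | Payload size is 7 bytes long |
| 5–7 | 7 | Payload size is 8 bytes long |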

The payload is the absolute value of the represented integer encoded as an unsigned integer in the fewest bytes possible and little-endian byte ordering.

The payload size is the byte length of the payload encoded as an unsigned integer in the fewest bytes possible and little-endian byte ordering.
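
A minimal sketch of this layout follows. The function names are hypothetical, and the sketch assumes that zero is encoded as a single zero byte.

```javascript
// Encode a non-negative safe integer in the fewest bytes, little-endian.
function uintToBytesLE(n) {
  const bytes = [];
  do {
    bytes.push(n % 256);
    n = Math.floor(n / 256);
  } while (n > 0);
  return bytes;
}

// Same idea for a non-negative BigInt of arbitrary size.
function bigUintToBytesLE(n) {
  const bytes = [];
  do {
    bytes.push(Number(n & 0xffn));
    n >>= 8n;
  } while (n > 0n);
  return bytes;
}

// Hypothetical sketch of section 2.3: marker byte, payload size, payload.
function serializeBigInt(value) {
  const isObject = typeof value === "object"; // e.g. Object(1n)
  const n = value.valueOf();
  const negative = n < 0n;
  const payload = bigUintToBytesLE(negative ? -n : n);
  const size = uintToBytesLE(payload.length);
  const marker =
    (2 << 5) | (isObject ? 0x10 : 0) | (negative ? 0x08 : 0) | (size.length - 1);
  return Uint8Array.from([marker, ...size, ...payload]);
}
```

For example, `serializeBigInt(-255n)` yields the bytes 72, 1, 255: the marker byte with the negative bit set, a payload size of 1, and the one-byte payload.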

2.4. Character and Binary Strings

The String type is used to represent character strings, and the ArrayBuffer and SharedArrayBuffer objects are used to represent binary strings. They are serialized by concatenating

  1. The marker byte.
  2. The payload size.
  3. The payload.

The marker byte is defined in the following table. The most significant bit is assigned the bit number 0.

Table 4. Marker byte for character and binary strings.
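| Bit | Value | Interpretation |
|-----|-------|----------------|
| 0–2 | 3 | String, ArrayBuffer, SharedArrayBuffer |
| 3–4 | 0 | String value |
| 3–4 | 1 | String object |
| 3–4 | 2 | ArrayBuffer |
| 3–4 | 3 | SharedArrayBuffer |
| 5–7 | 0 | Payload size is 1 byte long |
| 5–7 | 1 | Payload size is 2 bytes long |
| 5–7 | 2 | Payload size is 3 bytes long |
| 5–7 | 3 | Payload size is 4 bytes long |
| 5–7 | 4 | Payload size is 5 bytes long |
| 5–7 | 5 | Payload size is 6 bytes long |
| 5–7 | 6 | Payload size is 7 bytes long |
| 5–7 | 7 | Payload size is 8 bytes long |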

The payload is the represented character string encoded in UTF-8 code units or the represented binary string, whichever is applicable.

The payload size is the byte length of the payload encoded as an unsigned integer in the fewest bytes possible and little-endian byte ordering.
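
The following sketch shows one way to assemble these three parts. The name `serializeStringLike` is hypothetical, and the standard `TextEncoder` is used here for the UTF-8 encoding.

```javascript
// Encode a non-negative safe integer in the fewest bytes, little-endian.
function uintToBytesLE(n) {
  const bytes = [];
  do {
    bytes.push(n % 256);
    n = Math.floor(n / 256);
  } while (n > 0);
  return bytes;
}

// Hypothetical sketch of section 2.4: marker byte, payload size, payload.
function serializeStringLike(value) {
  let kind;
  let payload;
  if (typeof value === "string") {
    kind = 0;
    payload = new TextEncoder().encode(value);
  } else if (value instanceof String) {
    kind = 1;
    payload = new TextEncoder().encode(value.valueOf());
  } else if (value instanceof ArrayBuffer) {
    kind = 2;
    payload = new Uint8Array(value);
  } else if (typeof SharedArrayBuffer !== "undefined" && value instanceof SharedArrayBuffer) {
    kind = 3;
    payload = new Uint8Array(value);
  } else {
    throw new TypeError("expected a string, String object, or buffer");
  }
  const size = uintToBytesLE(payload.length);
  // Bits 3-4 form a two-bit field, so the kind is shifted left by 3.
  const marker = (3 << 5) | (kind << 3) | (size.length - 1);
  return Uint8Array.from([marker, ...size, ...payload]);
}
```

Note that the payload size counts bytes, not characters: `serializeStringLike("h\u00e9")` yields 96, 3, 104, 195, 169 because the second character occupies two bytes in UTF-8.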

2.5. Dense Arrays and Collections

The Array, Object, Map, and Set objects are used to represent indexed and keyed collections of data. They are serialized by concatenating

  1. The marker byte.
  2. The payload size.
  3. The payload.

The marker byte is defined in the following table. The most significant bit is assigned the bit number 0.

Table 5. Marker byte for dense arrays and collections.
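| Bit | Value | Interpretation |
|-----|-------|----------------|
| 0–2 | 4 | Dense Array, plain Object, Map, Set |
| 3–4 | 0 | Dense Array |
| 3–4 | 1 | Plain Object |
| 3–4 | 2 | Map |
| 3–4 | 3 | Set |
| 5–7 | 0 | Payload size is 1 byte long |
| 5–7 | 1 | Payload size is 2 bytes long |
| 5–7 | 2 | Payload size is 3 bytes long |
| 5–7 | 3 | Payload size is 4 bytes long |
| 5–7 | 4 | Payload size is 5 bytes long |
| 5–7 | 5 | Payload size is 6 bytes long |
| 5–7 | 6 | Payload size is 7 bytes long |
| 5–7 | 7 | Payload size is 8 bytes long |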

The payload is the serialization of

  • Array: The elements in ascending order of index.
  • Object: The key-value pairs of own enumerable properties keyed by strings, optionally in the order returned by the [[OwnPropertyKeys]] method.
  • Map: The key-value pairs in order of insertion.
  • Set: The values in order of insertion.

The payload size is the number of items in the payload encoded as an unsigned integer in the fewest bytes possible and little-endian byte ordering. Each key-value pair is considered one item.

The aforementioned serialization is not applicable to Array objects with holes. The serialization of such objects is described in the next subsection.
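
To make the item counting concrete, here is a small recursive sketch. The name `serialize` is hypothetical, and to stay short the sketch only accepts non-negative safe integers and strings as leaf values. It returns a plain array of byte values.

```javascript
// Encode a non-negative safe integer in the fewest bytes, little-endian.
function uintToBytesLE(n) {
  const bytes = [];
  do {
    bytes.push(n % 256);
    n = Math.floor(n / 256);
  } while (n > 0);
  return bytes;
}

// Hypothetical sketch of section 2.5 with a deliberately small set of leaves.
function serialize(value) {
  if (typeof value === "number") {
    if (!Number.isSafeInteger(value) || value < 0) {
      throw new RangeError("this sketch handles non-negative integers only");
    }
    const payload = uintToBytesLE(value);
    return [(1 << 5) | (payload.length - 1), ...payload];
  }
  if (typeof value === "string") {
    const payload = new TextEncoder().encode(value);
    const size = uintToBytesLE(payload.length);
    return [(3 << 5) | (size.length - 1), ...size, ...payload];
  }
  let kind;
  let items;
  if (Array.isArray(value)) {
    for (let i = 0; i < value.length; i++) {
      if (!(i in value)) throw new TypeError("arrays with holes are out of scope here");
    }
    kind = 0;
    items = value;
  } else if (value instanceof Map) {
    kind = 2;
    items = [...value].flat(); // key, value, key, value, ... in insertion order
  } else if (value instanceof Set) {
    kind = 3;
    items = [...value];
  } else if (typeof value === "object" && value !== null) {
    kind = 1;
    items = Object.entries(value).flat(); // own enumerable string-keyed properties
  } else {
    throw new TypeError("unsupported in this sketch");
  }
  // Each key-value pair counts as one item.
  const count = kind === 1 || kind === 2 ? items.length / 2 : items.length;
  const size = uintToBytesLE(count);
  return [(4 << 5) | (kind << 3) | (size.length - 1), ...size, ...items.flatMap(serialize)];
}
```

For example, `serialize({ a: 1 })` yields 136, 1, 96, 1, 97, 32, 1: the payload size is 1 even though two serialized items (the key and the value) follow.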

2.6. Sparse Arrays

The previous subsection is not applicable to Array objects with holes. Such objects are serialized by concatenating

  1. The marker byte.
  2. The array size.
  3. The payload size.
  4. The payload.

The marker byte is defined in the following table. The most significant bit is assigned the bit number 0.

Table 6. Marker byte for sparse arrays.
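| Bit | Value | Interpretation |
|-----|-------|----------------|
| 0–2 | 5 | Sparse Array |
| 3 | 0 | Holes are serialized explicitly |
| 3 | 1 | Indices are serialized explicitly |
| 4–5 | 0 | Array size is 1 byte long |
| 4–5 | 1 | Array size is 2 bytes long |
| 4–5 | 2 | Array size is 3 bytes long |
| 4–5 | 3 | Array size is 4 bytes long |
| 6–7 | 0 | Payload size is 1 byte long |
| 6–7 | 1 | Payload size is 2 bytes long |
| 6–7 | 2 | Payload size is 3 bytes long |
| 6–7 | 3 | Payload size is 4 bytes long |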

The payload is the serialization of

  • Method A (holes are serialized explicitly): The holes and elements in ascending order of index, up to and including the last element. Holes after the last element are omitted.
  • Method B (indices are serialized explicitly): The index-element pairs in ascending order of index.

The payload under method A is analogous to the payload of a dense Array in that holes are treated like elements. Holes are serialized explicitly using a standalone marker byte.

The payload under method B is analogous to the payload of an Object in that indices are treated like property keys.

The payload size is the number of items in the payload encoded as an unsigned integer in the fewest bytes possible and little-endian byte ordering. Each index-element pair is considered one item.

The array size is the value of the length property encoded as an unsigned integer in the fewest bytes possible and little-endian byte ordering.
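
The two methods can be compared with a short sketch. The names below are hypothetical, and elements are restricted to non-negative safe integers to keep the example small. Indices are serialized as Number values.

```javascript
const HOLE = 12; // standalone marker byte for a hole (Table 1)

// Encode a non-negative safe integer in the fewest bytes, little-endian.
function uintToBytesLE(n) {
  const bytes = [];
  do {
    bytes.push(n % 256);
    n = Math.floor(n / 256);
  } while (n > 0);
  return bytes;
}

// Serialize a non-negative safe integer as a Number value.
function serializeUint(n) {
  const payload = uintToBytesLE(n);
  return [(1 << 5) | (payload.length - 1), ...payload];
}

// Hypothetical sketch of section 2.6; method is "A" or "B".
function serializeSparseArray(array, method) {
  const pairs = [];
  array.forEach((element, index) => pairs.push([index, element])); // forEach skips holes
  let items;
  if (method === "A") {
    const lastIndex = pairs.length > 0 ? pairs[pairs.length - 1][0] : -1;
    items = [];
    for (let i = 0; i <= lastIndex; i++) {
      items.push(i in array ? serializeUint(array[i]) : [HOLE]);
    }
  } else {
    items = pairs.map(([index, element]) => [...serializeUint(index), ...serializeUint(element)]);
  }
  const arraySize = uintToBytesLE(array.length);
  const payloadSize = uintToBytesLE(items.length);
  const marker =
    (5 << 5) |
    (method === "B" ? 0x10 : 0) |
    ((arraySize.length - 1) << 2) |
    (payloadSize.length - 1);
  return Uint8Array.from([marker, ...arraySize, ...payloadSize, ...items.flat()]);
}

const sample = [7, , 9];
sample.length = 5; // holes at indices 1, 3, and 4
const viaA = serializeSparseArray(sample, "A"); // 160, 5, 3, 32, 7, 12, 32, 9
const viaB = serializeSparseArray(sample, "B"); // 176, 5, 2, 32, 0, 32, 7, 32, 2, 32, 9
```

In both outputs the array size is 5, which lets a decoder restore the trailing holes that the payload omits.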

2.7. Typed Arrays

The DataView and TypedArray objects are used to access ArrayBuffer objects. They are serialized by concatenating

  1. The marker byte.
  2. The payload.

The marker byte is defined in the following table. The most significant bit is assigned the bit number 0.

Table 7. Marker byte for typed arrays.
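| Bit | Value | Interpretation |
|-----|-------|----------------|
| 0–2 | 6 | DataView, TypedArray |
| 3 | 0 | Elements are in little-endian byte ordering |
| 3 | 1 | Elements are in big-endian byte ordering |
| 4–7 | 0 | DataView |
| 4–7 | 1 | Int8Array |
| 4–7 | 2 | Uint8Array |
| 4–7 | 3 | Uint8ClampedArray |
| 4–7 | 4 | Int16Array |
| 4–7 | 5 | Uint16Array |
| 4–7 | 6 | Int32Array |
| 4–7 | 7 | Uint32Array |
| 4–7 | 8 | Float32Array |
| 4–7 | 9 | Float64Array |
| 4–7 | 10 | BigInt64Array |
| 4–7 | 11 | BigUint64Array |
| 4–7 | 12–15 | Reserved for future extensions |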

The payload is the serialization of the binary string returned by the buffer property and segmented by the byteOffset and byteLength properties.
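
One possible reading of this section is sketched below. The names are hypothetical. The sketch assumes that bit 3 records the byte order of the platform that produced the bytes, and it always serializes the underlying bytes with the ArrayBuffer marker from Table 4, including for views over a SharedArrayBuffer; neither point is stated explicitly above.

```javascript
// Type codes for bits 4-7, taken from Table 7.
const TYPE_CODES = new Map([
  [DataView, 0], [Int8Array, 1], [Uint8Array, 2], [Uint8ClampedArray, 3],
  [Int16Array, 4], [Uint16Array, 5], [Int32Array, 6], [Uint32Array, 7],
  [Float32Array, 8], [Float64Array, 9], [BigInt64Array, 10], [BigUint64Array, 11],
]);

// Encode a non-negative safe integer in the fewest bytes, little-endian.
function uintToBytesLE(n) {
  const bytes = [];
  do {
    bytes.push(n % 256);
    n = Math.floor(n / 256);
  } while (n > 0);
  return bytes;
}

// Hypothetical sketch of section 2.7: marker byte, then the serialized bytes.
function serializeView(view) {
  const code = TYPE_CODES.get(view.constructor);
  if (code === undefined) throw new TypeError("expected a DataView or TypedArray");
  // Assumption: bit 3 reflects the byte order of the current platform.
  const littleEndian = new Uint8Array(new Uint16Array([1]).buffer)[0] === 1;
  const marker = (6 << 5) | (littleEndian ? 0 : 0x10) | code;
  // Only the segment selected by byteOffset and byteLength is serialized.
  const bytes = new Uint8Array(view.buffer, view.byteOffset, view.byteLength);
  const size = uintToBytesLE(bytes.length);
  const bufferMarker = (3 << 5) | (2 << 3) | (size.length - 1);
  return Uint8Array.from([marker, bufferMarker, ...size, ...bytes]);
}
```

With this sketch, a two-element `subarray` of a four-element Uint8Array serializes only its own two bytes, not the whole underlying buffer.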

2.8. Dates

The Date object is used to represent dates and times. It is serialized by concatenating

  1. The marker byte for Date.
  2. The serialization of the number returned by the valueOf() method.
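
As a sketch, the marker byte for Date (14 in Table 1) is simply placed in front of the serialized time value. The function names are hypothetical. An invalid Date, whose `valueOf()` returns NaN, is assumed here to use the standalone marker byte for NaN.

```javascript
// Encode a non-negative safe integer in the fewest bytes, little-endian.
function uintToBytesLE(n) {
  const bytes = [];
  do {
    bytes.push(n % 256);
    n = Math.floor(n / 256);
  } while (n > 0);
  return bytes;
}

// Hypothetical sketch of section 2.8.
function serializeDate(date) {
  const time = date.valueOf(); // an integer number of milliseconds, or NaN
  if (Number.isNaN(time)) return Uint8Array.from([14, 10]);
  const payload = uintToBytesLE(Math.abs(time));
  const numberMarker = (1 << 5) | (time < 0 ? 0x08 : 0) | (payload.length - 1);
  return Uint8Array.from([14, numberMarker, ...payload]);
}
```

For example, `serializeDate(new Date(1000))` yields 14, 33, 232, 3.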

2.9. Regular Expressions

The RegExp object is used to represent regular expressions. It is serialized by concatenating

  1. The marker byte for RegExp.
  2. The serialization of the string returned by the toString() method.
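
A short sketch of this tagging follows, together with one way a decoder might rebuild the RegExp from the string. Both function names are hypothetical.

```javascript
// Encode a non-negative safe integer in the fewest bytes, little-endian.
function uintToBytesLE(n) {
  const bytes = [];
  do {
    bytes.push(n % 256);
    n = Math.floor(n / 256);
  } while (n > 0);
  return bytes;
}

// Hypothetical sketch of section 2.9: marker byte 15, then a String value.
function serializeRegExp(regexp) {
  const payload = new TextEncoder().encode(regexp.toString());
  const size = uintToBytesLE(payload.length);
  return Uint8Array.from([15, (3 << 5) | (size.length - 1), ...size, ...payload]);
}

// Hypothetical inverse: split "/pattern/flags" at the last slash,
// which is safe because flags never contain a slash.
function regExpFromString(text) {
  const end = text.lastIndexOf("/");
  return new RegExp(text.slice(1, end), text.slice(end + 1));
}
```

For example, `serializeRegExp(/ab+c/gi)` yields the marker byte 15 followed by the serialization of the 8-character string `/ab+c/gi`.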

2.10. Object References

A reference to an object whose marker byte can be found in the serialized byte stream is serialized by concatenating

  1. The marker byte for object reference.
  2. The serialization of the position of the referenced object's marker byte in the serialized byte stream, where the first byte is at position zero.
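
Object references are what make circular and shared structures serializable. The sketch below, limited to dense arrays of non-negative safe integers, records the position of each array's marker byte and emits a reference on every later encounter. The names are hypothetical.

```javascript
// Encode a non-negative safe integer in the fewest bytes, little-endian.
function uintToBytesLE(n) {
  const bytes = [];
  do {
    bytes.push(n % 256);
    n = Math.floor(n / 256);
  } while (n > 0);
  return bytes;
}

// Hypothetical sketch of section 2.10 for nested dense arrays.
function serializeArrays(root) {
  const out = [];
  const positions = new Map(); // array object -> position of its marker byte

  function visit(value) {
    if (Number.isSafeInteger(value) && value >= 0) {
      const payload = uintToBytesLE(value);
      out.push((1 << 5) | (payload.length - 1), ...payload);
      return;
    }
    if (!Array.isArray(value)) throw new TypeError("unsupported in this sketch");
    if (positions.has(value)) {
      out.push(29); // marker byte for object reference (Table 1)
      visit(positions.get(value)); // the position, serialized as a Number
      return;
    }
    positions.set(value, out.length);
    const size = uintToBytesLE(value.length);
    out.push((4 << 5) | (size.length - 1), ...size);
    for (const element of value) visit(element);
  }

  visit(root);
  return Uint8Array.from(out);
}

const loop = [];
loop.push(loop); // an array that contains itself
const bytes = serializeArrays(loop); // 128, 1, 29, 32, 0
```

The referenced position is registered before the elements are visited, which is why the self-reference above resolves to position zero instead of recursing forever.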

2.11. Custom Objects

A custom object that can be serialized using an external serialization format is serialized by concatenating

  1. The marker byte for custom object.
  2. The serialization of the custom object in accordance with the external serialization format.

2.12. Unsupported Data

Any data type or data structure not covered by the preceding subsections is serialized using a standalone marker byte.

3. Deserialization

The deserialization of a JavaScript data item is accomplished by decoding a serialized byte stream with reference to the serialization format.

The deserialization process should substitute an appropriate Error object in the following scenarios:

  • The marker byte for unsupported data is encountered.
  • The JavaScript engine cannot return the required data.

The deserialization process should stop when the serialized byte stream is malformed as in, but not limited to, the following scenarios:

  • A reserved marker byte is encountered.
  • An invalid data type is encountered, such as
    • An Object key that is not a String value.
    • An Array index that is not a Number value.
  • An invalid value is encountered, such as
    • An Array index that is out of bounds.
    • A duplicate Array index, Object key, Map key, or Set value.
  • The standalone marker byte for a hole is encountered outside the context of a sparse Array.
  • The payload of a Number type encodes an integer longer than 53 bits.
  • The payload of an object reference does not point to a prior object.
  • The serialized byte stream ends before the deserialization process is complete.
  • The deserialization process is complete before the serialized byte stream ends.
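
Several of these checks can be illustrated with a decoder for a byte stream that holds a single Number. The name `deserializeNumber` is hypothetical. The sketch treats an 8-byte payload as a double, which is an inference from section 2.2 (integer payloads are at most 7 bytes long), and it decodes a negative integer zero as -0.

```javascript
// Hypothetical sketch: decode a Uint8Array that holds exactly one Number.
function deserializeNumber(bytes) {
  if (bytes.length === 0) throw new RangeError("empty byte stream");
  const marker = bytes[0];
  if (marker >> 5 !== 1) throw new TypeError("not a Number marker byte");
  const isObject = (marker & 0x10) !== 0;
  const negative = (marker & 0x08) !== 0;
  const length = (marker & 0x07) + 1;
  if (bytes.length < 1 + length) {
    throw new RangeError("byte stream ends before deserialization is complete");
  }
  if (bytes.length > 1 + length) {
    throw new RangeError("deserialization is complete before the byte stream ends");
  }
  let n;
  if (length === 8) {
    // Double-precision format, little-endian.
    n = new DataView(bytes.buffer, bytes.byteOffset + 1, 8).getFloat64(0, true);
  } else {
    // Little-endian unsigned integer, accumulated from the most significant byte.
    n = 0;
    for (let i = length; i >= 1; i--) n = n * 256 + bytes[i];
    if (n > Number.MAX_SAFE_INTEGER) throw new RangeError("integer longer than 53 bits");
    if (negative) n = -n;
  }
  return isObject ? new Number(n) : n;
}
```

For example, the bytes 33, 44, 1 decode to 300, while the truncated stream 33, 44 is rejected.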

4. Limitations

The serialization format does not support certain data types and data structures intrinsic to JavaScript, such as Error, Function, Symbol, and objects that hold weak references like WeakMap, WeakSet, and WeakRef.

The serialization format also does not preserve object properties that are non-enumerable, keyed by symbols, or inherited through the prototype chain, such as the byteOffset property of TypedArray objects and the lastIndex property of RegExp objects.

5. Extensions

The serialization format reserves the marker byte values 224–255, as well as those labelled as reserved in Table 1 and Table 7, for future extensions.
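
A decoder could test for reserved marker bytes with a predicate like the following sketch. The name `isReservedMarker` is hypothetical. The value ranges are those of Table 1 and Table 7, plus the range 224–255, which corresponds to bits 0–2 having the value 7.

```javascript
// Hypothetical helper: true when a marker byte is reserved for future extensions.
function isReservedMarker(byte) {
  const group = byte >> 5; // bits 0-2, with the most significant bit numbered 0
  if (group === 7) return true; // 224-255
  if (group === 0) {
    const value = byte & 0x1f; // bits 3-7 of Table 1
    return (value >= 16 && value <= 28) || value === 31;
  }
  if (group === 6) return (byte & 0x0f) >= 12; // bits 4-7 of Table 7
  return false;
}
```

For example, `isReservedMarker(31)` and `isReservedMarker(240)` return true, whereas `isReservedMarker(29)` returns false because 29 is the marker byte for object reference.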