A semantics for JSON

[Warning: here there be PL geekery.]

In a discussion about Lloyd Hilaiel‘s cool new JSONSelect library, Rob Sayre pointed out that the library shouldn’t permit ordered access to JSON objects. This got me thinking about the fact that the JSON RFC defines a syntax without a semantics. And yet the introduction makes a semantic claim:

An object is an unordered collection of zero or more name/value pairs, where a name is a string and a value is a string, number, boolean, null, object, or array.

This claim from the introduction isn’t specified anywhere in the RFC. But it’s not too hard to provide a semantics that does. To keep things simple, let me just assume an existing semantics for Unicode strings and IEEE754 double-precision floating-point numbers, where UnicodeValue(s) produces the Unicode string for a JSON string literal s, and DoubleValue(n) produces the IEEE754 double-precision floating-point number value for the JSON number literal n.

Here goes!

Values

A JSON value is a member of one of the following disjoint sets:

  • Unicode strings
  • IEEE754 numbers
  • the constants { true, false, null }
  • JSON arrays
  • JSON objects

A JSON array is a finite sequence of JSON values. (I’ll write [] for the empty array, [ v ] for the singleton array containing the JSON value v, and a1a2 for array concatenation.)

A JSON object is a finite set of (Unicode string, JSON value) pairs.

Operations

Selecting the keys of a JSON object:

keys(o) = { s | (s, v)o }

Extending a JSON object:

extend(o, s, v) = (o{ (s, v) }) ∪ { (s, v) }
if (s, v)o
extend(o, s, v) = o{ (s, v) }
if skeys(o)

Looking up properties on a JSON object:

lookup(o, s) = v
if (s, v)o

Note that lookup is partial: lookup(o, s) is unspecified when skeys(o).

Interpretation

Now that we have the data model, we can define the interpretation of the syntax:

Value(string) = UnicodeValue(string)
Value(number) = DoubleValue(number)
Value({}) =
Value({ members }) = ObjectValue(members)
Value([]) = []
Value([ elements ]) = ArrayValue(elements)

ObjectValue(s : v) = { (UnicodeValue(s), Value(v)) }
ObjectValue(members, s : v) = extend(ObjectValue(members), UnicodeValue(s), Value(v))

ArrayValue(v) = [ Value(v) ]
ArrayValue(v, elements) = Value(v) ⋅ ArrayValue(elements)

That’s it!

With this data model, you can now show that the order of object properties doesn’t matter, except insofar as when there are duplicate keys, later ones replace earlier ones.

4 Responses to A semantics for JSON

  1. Heh, I like.

    So I’m wondering, that very last line, doesn’t that on it’s own proof that there is in fact an ordering? And that it is (ok, or could be) important?

    In fact, this goes for the two main data structures in json. Array is an implicitly ordered list (ordered as is), at least when converted to an object in js.

    And for the object order seems to matter as well. And even though the spec clearly and explicitly doesn’t define an ordering for object literals, we all know that the de facto standard deviates from that (albeit not very cross browser).

    Of course, JSON itself has little to do with browsers. But I still wouldn’t say that there’s no order, strictly speaking.

    Oh and talking about strict, what happens if you have two keys of the same name and you try to evaluate it in strict mode? It should reject an object with two keys on the same level with the same (canonical) name. Does that count for JSON.parse as well?

  2. @Peter: Here’s a concrete example to show where order matters and where it doesn’t. Let’s say that parse is the function that takes JSON syntax and produces an AST. You can now easily use the semantics to prove that Value(parse(“{ \”x\” : 1, \”x\” : 2 }”)) ≠ Value(parse(“{ \”x\” : 2, \”x\” : 1 }”)), but Value(parse(“{ \”x\” : 1, \”y\” : 1 }”)) = Value(parse(“{ \”y\” : 1, \”x\” : 1 }”)).

    See the difference? The order of the syntax matters, but in the semantics, objects are unordered. We can now actually prove that many JSON syntaxes that are different on the surface are supposed to be indistinguishable in the semantics. Which is a formal way of demonstrating what Rob Sayre was saying, namely, that you shouldn’t expose the order of properties in a parsed JSON object.

  3. As for strict mode, JSON is not defined in terms of JavaScript. It’s just a data interchange format. It happens to have a syntax based on JavaScript, and it happens to have an implementation specified in the JavaScript standard, but AFAIK the JSON standard is the RFC. So there’s no reason why strict mode should have any bearing on the definition of JSON.

  4. Still a bit curious whether engines apply their strict mode logic to parsing json ;) Wouldn’t be surprised it at least one did.