SJOT: Schemas for JSON Objects
==============================

<p align="right"><i>by Robert van Engelen, September 28, 2016.<br>Updated November 15, 2017.</i></p>

<p><a class="github-button" href="https://github.com/Genivia/SJOT" data-count-href="/Genivia/SJOT/stargazers" data-show-count="true" data-count-aria-label="# stargazers on GitHub" aria-label="Star Genivia/SJOT on GitHub">Star</a> <a class="github-button" href="https://github.com/Genivia/SJOT/archive/master.zip" data-icon="octicon-cloud-download" aria-label="Download Genivia/SJOT on GitHub">Download</a> <a style="vertical-align:top;" href="https://www.npmjs.com/package/sjot"><img src="https://img.shields.io/npm/dm/sjot.svg" alt="npm version" height="18"></a> <a style="vertical-align:top;" href="https://travis-ci.com/Genivia/SJOT"><img src="https://travis-ci.com/Genivia/SJOT.svg?branch=master" alt="build status"></img></a> <a style="vertical-align:top;" target="_blank" href="https://opensource.org/licenses/BSD-3-Clause" rel="nofollow"><img alt="license" src="https://img.shields.io/badge/license-BSD%203--Clause-blue.svg"></a></p>

The [JSON schema](http://json-schema.org) draft was an important move forward to make JSON more useful with APIs and other systems that require JSON content validation. However, working with JSON schema can be daunting and defeats the simplicity of JSON. We created a simpler alternative to JSON schema that is more compact and easier to use. We call it *Schemas for JSON Objects* or simply *SJOT*.

SJOT aims at fast JSON validation and type-checking with lightweight schemas and compact validators. SJOT schemas are valid JSON, just like JSON schema. But SJOT schemas are faster, more compact, and more intuitive. A SJOT schema of an object can be as simple as a *JSON object template*. Because SJOT schemas have the look and feel of a template, SJOT is easy to use.

Not convinced? Try a [live demo](get-sjot.html#demo) of SJOT and snapSJOT in action.

SJOT by example {#example}
---------------

As a first example, say we have a JSON representation of a company product, similar to the [json-schema.org example](http://json-schema.org/example1.html) of a JSON schema (which is over 40 lines long!) that describes a company product API. An example product in this API is:

[json]

    {
      "id": 1,
      "name": "A green door",
      "price": 12.50
    }

The product properties `id`, `name`, and `price` are considered the bare minimum properties of a product and should therefore be required. Other products may contain optional `tags`, `dimensions`, and a `warehouseLocation`:

[json]

    {
      "id": 2,
      "name": "An ice sculpture",
      "price": 12.50,
      "tags": ["cold", "ice"],
      "dimensions": {
        "length": 7.0,
        "width": 12.0,
        "height": 9.5
      },
      "warehouseLocation": {
        "latitude": -78.75,
        "longitude": 20.4
      }
    }

Let's give this a SJOT, pun intended *(comments are added for clarity and are not part of SJOT)*:

[json]

    {
      "@id": "http://example.com/product.json",  ← identify this schema (this is optional!)
      "@note": "A company product",              ← describe what is defined
      "product": {                               ← define a product object that has...
        "id": "number",                          ← a required id number
        "name": "string",                        ← a required name string
        "price": "<0.0..",                       ← a required price in decimal greater than 0.0
        "tags?": "string{1,}",                   ← an optional tags array of unique strings (a non-empty set)
        "dimensions?": {                         ← optional dimensions, when provided has..
          "length": "number",                    ← a required length numeric dimension
          "width": "number",                     ← a required width numeric dimension
          "height": "number"                     ← a required height numeric dimension
        },
        "warehouseLocation?": "http://example.com/geo.json#location"
                                                 ← an optional warehouseLocation with a type defined by another SJOT
      }
    }

It's easy to see that property names ending in `?` are optional. Property types are named (such as `"number"`) or `{}`-objects, `[]`-arrays (see later), and references to other SJOT schemas, such as `"http://example.com/geo.json#location"`.
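
To make the `"<0.0.."` price type in the product schema more concrete, here is a minimal JavaScript sketch of how a single range string could be checked. It is not the SJOT library's code: the function name `inRange` is hypothetical, and the sketch assumes that a range written without a decimal point (such as `"0.."`) admits integers only, that `<` and `>` make the adjacent bound exclusive, and that bounds are plain decimal literals (no exponents, no comma-separated enumerations).

```javascript
// Sketch only: check a number against one SJOT range string such as
// "0..10", "<0.0..", or "<0.0..1.0>". inRange is a hypothetical helper.
function inRange(range, value) {
  const match = /^(<?)(-?\d+(?:\.\d+)?)?\.\.(-?\d+(?:\.\d+)?)?(>?)$/.exec(range);
  if (!match || typeof value !== "number" || !Number.isFinite(value)) return false;
  const [, openLow, low, high, openHigh] = match;

  // Assumption: bounds written without a decimal point describe an integer range.
  const integral = [low, high].every((b) => b === undefined || !b.includes("."));
  if (integral && !Number.isInteger(value)) return false;

  if (low !== undefined) {
    const lo = Number(low);
    if (openLow ? value <= lo : value < lo) return false; // "<" excludes the bound
  }
  if (high !== undefined) {
    const hi = Number(high);
    if (openHigh ? value >= hi : value > hi) return false; // ">" excludes the bound
  }
  return true;
}

console.log(inRange("<0.0..", 12.5)); // true: the price of the green door
console.log(inRange("<0.0..", 0));    // false: the lower bound is exclusive
```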

Similar to the json-schema.org example, the `location` type of `warehouseLocation` is defined in a separate SJOT schema:

[json]

    {
      "@id": "http://example.com/geo.json",
      "location": {                              ← define location that has...
        "latitude": "float",                     ← a required latitude single precision float
        "longitude": "float"                     ← a required longitude single precision float
      }
    }

As a side note, if you don't want to write schemas at all, then consider using the JS `snapSJOT.convert(data)` defined in the `snapSJOT` module of npm package `snapsjot`. This converts JSON data and JS values to a SJOT schema, see npm package [snapsjot](https://www.npmjs.com/package/snapsjot).

SJOT types include the basic JSON types `"string"`, `"number"`, `"boolean"`, `"null"`, `"object"`, and `"array"` but also more specific types, such as `"char[0,6]"`, `"float"`, integer ranges `"0..10"` and float ranges `"0.0..10.0"`, and arrays of these, such as `"string[1,10]"` for an array of 1 to 10 strings and `"1..10[3][4]"` for an array of 4 arrays of 3 integers between 1 and 10.

Object types are just the `{}`-brackets with members, as an "inline" style. To create an array using the "inline style" (without requiring named types in strings), simply use a pair of `[` `]` brackets to enclose the type. For example, `[{"id":"number"}]` is an array of objects with numeric `id` properties.

The json-schema.org example schema actually defines an array of products. In SJOT, the product array type is referred to by a SJOT type reference `"http://example.com/product.json#product[]"`. This reference uses an array annotation and this suffices to describe and validate a JSON array of products.

Type referencing of the form *URI#name* is used to refer to a named type in a schema, such as `http://example.com/geo.json#location` that references the `location` object type defined in the `http://example.com/geo.json` schema. As you can see, a SJOT type reference is very simple and clean.
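
Because a type reference is a single *URI#name* hop, resolving one needs little more than a string split and a dictionary lookup. The JavaScript sketch below illustrates that idea; `parseTypeRef` and `resolveTypeRef` are hypothetical names rather than SJOT library functions, and the sketch assumes the schemas are already loaded into an array and that an empty name refers to the `@root` type.

```javascript
// Sketch only: split "URI#name" plus an optional array/set suffix such as [] or {1,}.
function parseTypeRef(ref) {
  if (ref.startsWith("(")) return null;       // a "(regex)" type, not a reference
  const hash = ref.indexOf("#");
  if (hash < 0) return null;                  // a named built-in type such as "string"
  const rest = ref.slice(hash + 1);
  const m = /^([^\[\]{}]*)((?:\[[^\]]*\]|\{[^}]*\})*)$/.exec(rest);
  if (!m) return null;
  return { uri: ref.slice(0, hash), name: m[1], suffix: m[2] };
}

// Look up the referenced type. An empty URI means the current schema and
// an empty name means the root type (assumed here to be stored under "@root").
function resolveTypeRef(ref, schemas, current) {
  const parsed = parseTypeRef(ref);
  if (!parsed) return undefined;
  const schema = parsed.uri === ""
    ? current
    : schemas.find((s) => s["@id"] === parsed.uri);
  if (!schema) return undefined;
  return parsed.name === "" ? schema["@root"] : schema[parsed.name];
}

const schemas = [
  { "@id": "http://example.com/product.json", "product": { "id": "number" } },
  { "@id": "http://example.com/geo.json",
    "location": { "latitude": "float", "longitude": "float" } }
];

console.log(parseTypeRef("http://example.com/product.json#product[]"));
console.log(resolveTypeRef("http://example.com/geo.json#location", schemas, schemas[0]));
```

Note that no path traversal is involved: the lookup touches only the top-level dictionary of the target schema.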

A type reference string contains a `#` reference to a global type in a schema without requiring deeper multi-hop paths (no JSON pointers or paths). A reference to a type in the current schema (for example, a schema that has no `@id` attribute property) is simply written as *#name* with an empty URI. A reference to the root type in a schema is simply written as *URI#* and *#* for the root type of the current schema.

Multiple schemas can be combined in a list of schemas, each schema with a unique `@id`. Types can be referenced between these schemas and the schemas in the array are used to validate JSON data.

See the [examples](#examples) in this article and check out our live [demo](get-sjot.html#demo) of SJOT in action.

SJOT can be translated to JSON schema draft v4 without loss of details. See the [SJOT to JSON schema converter](get-sjot.html#demo).

SJOT schema basics {#basics}
------------------

A SJOT schema is a dictionary with named types and a `@root` type:

[json]

    {
      "@root": type,
      "SomeType": type,
      "AnotherType": type,
      ...
    }

Each type is either atomic (i.e. a primitive type), an object type, an array type, a reference to a named type, or unions thereof to define alternate choices of types. Types are explained in the next section.

The `@root` property indicates the root type of the JSON document to validate. For example:

[json]

    { "@root": "string[0,999]" }

This schema validates JSON arrays of strings. This array can contain up to 999 items.

If the schema has only one type, then `@root` can be replaced by any name of your choosing:

[json]

    { "mystrings": "string[0,999]" }

However, if the schema has multiple named types, then a `@root` is mandatory to avoid ambiguity.
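
The rule for locating the root type can be sketched in a few lines of JavaScript. This is only an illustration of the rule as stated above, with a hypothetical function name `rootType`; it treats every property that starts with `@` as an attribute rather than a named type, and it does not follow a `@root` that is itself a type reference.

```javascript
// Sketch only: find the root type of a schema according to the rule above.
function rootType(schema) {
  if ("@root" in schema) return schema["@root"];
  // Without @root, the schema must define exactly one named type.
  const names = Object.keys(schema).filter((key) => !key.startsWith("@"));
  if (names.length === 1) return schema[names[0]];
  throw new Error("cannot determine the root type: add an @root property");
}

console.log(rootType({ "@root": "string[0,999]" }));     // string[0,999]
console.log(rootType({ "mystrings": "string[0,999]" })); // string[0,999]
```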

The following example defines a `@root` document type that refers to a `Person` type using the `#Person` type reference, a `Name` string type and a `Person` object with `firstname` and `lastname` properties:

[json]

    {
      "@id": "http://example.com/sjot.json",
      "@root": "#Person",
      "Name": "string",
      "Person": {
        "firstname": "#Name",
        "lastname": "#Name"
      }
    }

A SJOT schema may optionally include an `@id` property to declare a *namespace URI* to identify the schema. Using a URL to identify a schema can be useful when external schemas must be loaded by a validator.

Note that the `firstname` and `lastname` of a `Person` object refer to a `Name` instead of just a string. This is useful, because if we decide later to restrict the string content of names then we only have to do this once, for example by changing the `Name` type as follows:

[json]

    {
      ...
      "Name": "(\\w(\\w|\\s)*)",
      ...
    }

where `\\w` matches a letter or digit and `\\s` matches a space.

SJOT schema types {#types}
-----------------

SJOT has a list of built-in primitive types that are commonly used, besides `"boolean"`, `"number"`, `"string"`, and `"null"`. Objects, arrays, sets, tuples, and unions are simply defined in a SJOT schema using an inline style. It only takes two tables to list all SJOT schema constructs.

A SJOT type is one of:

[json]

    "any"                  any type (wildcard)
    "atom"                 any non-null primitive type (boolean, number, or string)
    "boolean"              Boolean with value true or false
    "true"                 fixed value true
    "false"                fixed value false
    "byte"                 8-bit integer
    "short"                16-bit integer
    "int"                  32-bit integer
    "long"                 64-bit integer
    "ubyte"                8-bit unsigned integer
    "ushort"               16-bit unsigned integer
    "uint"                 32-bit unsigned integer
    "ulong"                64-bit unsigned integer
    "integer"              integer (unconstrained)
    "float"                single precision decimal
    "double"               double precision decimal
    "number"               decimal number (unconstrained)
    "n..m"                 inclusive numeric range (n, m are optional integer/decimal values)
    "<n..m>"               exclusive numeric range (n, m are optional integer/decimal values)
    "n,m..k,l"             numeric enumeration with ranges (choice of integer/decimal values)
    "string"               string
    "base64"               string with base64 content
    "hex"                  string with hexadecimal content
    "uuid"                 string with UUID content, optionally starting with urn:uuid:
    "date"                 string with RFC 3339 date YYYY-MM-DD
    "time"                 string with RFC 3339 time and optional time zone HH:MM:SS[.s][[+|-]HH:MM|Z]
    "datetime"             string with RFC 3339 datetime and optional time zone
    "duration"             string with ISO-8601 duration PnYnMnDTnHnMnS
    "char"                 string with a single character (ASCII, Unicode, UTF-8, etc.)
    "char[n,m]"            string of n to m characters (n, m are optional)
    "(regex)"              string that matches the regex
    "type[]"               array of typed values, shorthand for [ type ]
    "type[n,m]"            array of n to m typed values, shorthand for [ n, type, m ]
    "type{}"               set of atoms (array of unique atoms)
    "type{n,m}"            set of n to m atoms (n, m are optional)
    "#name"                reference to a named type in the current schema
    "URI#name"             reference to a named type in schema "@id": "URI"
    "object"               object, same as {}
    "array"                array, same as []
    "null"                 fixed value null
    [ type ]               array of typed values
    [ n, type, m ]         array of n to m typed values (n, m, type are optional)
    [ type, ..., type ]    tuple of typed values
    [[ type, ..., type ]]  union (choice) of types
    { "name": type, ... }  object with typed properties

The property names of object types can be annotated to make them optional or match a pattern:

[json]

    "name"                 property is required
    "name?"                property is optional
    "name?value"           property with a default value (primitive types only!)
    "(regex)"              property name(s) that match the regex

If the character `?` is to be part of a property name, then we write it as a regex `(who\\?)`, with a double backslash to escape the `?` (a single backslash will be removed by most JSON parsers). Likewise, if a property name starts with a `(` then we write it as a regex.

### Objects with required, optional, and default properties

An example object type with a required, optional, and default property is:

[json]

    {
      "Widget": {                   ← a widget has...
        "id": "string",             ← a required id
        "tags?": "string{1,}",      ← an optional non-empty array of unique string tags
        "counter?1": "ulong"        ← an optional counter with default value 1
      }
    }

To disallow additional properties, add the `"@final": true` attribute property.

To permit optional properties to occur depending on other optional properties, see the SJOT [dependencies](#deps) described further below.

An object with any properties is `"object"` or just `{}`. An empty object that does not permit any properties is `{ "@final": true }`.

### Regex properties and values

Regex anchoring with `^` and `$` is unnecessary (JSON and SJOT are language and regex library neutral: regex patterns match entire strings). For example, this dictionary object maps words to words:

[json]

    { "(\\w+)": "(\\w+)" }

To match strings partially, simply use a `.*` at the ends of the regex.
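
Since SJOT patterns always match the entire string, a validator built on a regex engine that searches for substrings has to add the anchoring itself. The JavaScript sketch below shows one way to do that; `matchesRegexType` is a hypothetical helper, not the SJOT library's implementation, and it interprets the pattern with JavaScript's own regex dialect (where, for instance, `\w` also matches an underscore).

```javascript
// Sketch only: test a string against a SJOT "(regex)" type with whole-string semantics.
function matchesRegexType(type, value) {
  if (typeof value !== "string") return false;
  if (!type.startsWith("(") || !type.endsWith(")")) return false;
  // Wrap the pattern so that it must match from the first to the last character.
  return new RegExp("^(?:" + type + ")$").test(value);
}

console.log(matchesRegexType("(\\w+)", "hello"));       // true
console.log(matchesRegexType("(\\w+)", "hello world")); // false: the space is not matched
console.log(matchesRegexType("(RED.*)", "REDDISH"));    // true: .* permits a partial match
```

Wrapping the pattern in a non-capturing group before anchoring keeps alternations such as `(RED|GREEN)` anchored on both sides.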

Additional types with constraints can be easily added to a SJOT schema, for example the ISO 6709 Annex H latitude and longitude type values (see the Google [JSON Style Guide](https://google.github.io/styleguide/jsoncstyleguide.xml?showone=Latitude/Longitude_Property_Values#Latitude/Longitude_Property_Values)):

[json]

    {
      "@id": "http://example.com/iso-6709.json",
      "@note": "ISO 6709 Annex H latitude and longitude location",
      "LatLon": "([+- ]\\d{2}(.\\d+)?[+- ]\\d{3}(.\\d+)?)"
    }

Special string types such as ID, URI, email, hostname, and so on can be easily defined with a regex and put in a schema for reuse.

### Tuples

A tuple is a fixed-length list of values, such as `[ "point", true ]`, which is defined by the tuple type:

[json]

    [ "string", "boolean" ]

### Arrays and sets

Arrays of named types are simply defined by `"type[]"` without bounds and `"type[n,m]"` with bounds. The lower and upper bounds are optional, so `"type[n,]"` and `"type[,m]"` can be used. Use `"type[n]"` for a fixed-size array.

The inline style for arrays is `[type]` without bounds and `[n, type, m]` with bounds, where `n` and `m` are non-negative integers. The lower and upper bounds are optional, so `[n, type]` and `[type, m]` can be used. The `type` is also optional and is `"any"` when omitted. Thus, `[]` is an array of any type with any length, `[0]` is an empty array, `[2]` is an array with two items, and `[1,3]` is an array of one to three items of any type.

For example, we can extend the Widget object type example to include an array of quantity-price objects:

[json]

    {
      "Widget": {                   ← a widget has...
        "id": "string",             ← a required id
        "tags?": "string{1,}",      ← an optional non-empty array of unique string tags
        "counter?1": "ulong",       ← an optional counter with default value 1
        "pricing?": [               ← an optional array of quantity-price objects
          {
            "quantity": "1..",      ← quantity
            "price": "<0.0.."       ← price per quantity
          }
        ]
      }
    }

Sets of named types `"type{}"` without bounds and `"type{n,m}"` with bounds are essentially arrays of atomic values that are unique. The lower and upper bounds are optional.

Uniqueness of atomic values is well defined. By contrast, object equality is often semantic instead of structural. That is, two objects may still be considered equivalent when structurally different, such as when extra properties are to be ignored. Therefore, SJOT does not admit sets of non-atomic values. This requirement makes sorting stable and validation of sets (with sorting) fast.

### Enumerations

To enumerate numbers for a numeric type, use constants and ranges:

[json]

    "Composite": "4,6,8..10,12,14..16"

To enumerate strings, use regex alternations:

[json]

    "Color": "(RED|GREEN|YELLOW|BLUE)"

Enumerations of mixed types are modeled with a union:

[json]

    "TrueOrColorOrByte": [[ "true", "(RED|GREEN|YELLOW|BLUE)", "byte" ]]

### Unions

A union of types describes the range of possible types that a value may have. For example, this union represents a string or a number value:

[json]

    [[ "string", "number" ]]

Array types and object types in the union must be *distinct*. Objects are distinct if they do not share properties. For example, the following union has two distinct object types:

[json]

    [[ { "a": "number" }, { "b": "string" } ]]

To combine objects that are not distinct in a union, you should define new objects that use a new outer property name that acts as a unique tag:

[json]

    [[ { "t1": { "a": "string", "b": "number" } }, { "t2": { "b": "string" } } ]]

Why is this recommended? The goal of SJOT is to make validation fast and scalable with predictable validation times, similar to XML schema validators for XML data bindings. Therefore, the SJOT validator must be able to determine the type of the value efficiently among the choices in the union, *using constant algorithmic complexity*.
By contrast, JSON schema's "oneOf" and "anyOf" are not always efficient because the validator may have to revisit the data multiple times.

This recommendation also enhances readability of the JSON data by design. Consider a counterexample where we have a choice of two distinct objects:

[json]

    { "data": [ a long array of objects ], "id": 456 }

and

[json]

    { "data": [ a long array of objects ], "date": "01-01-2017" }

Since both objects have a `data` array, they overlap. By just looking at the JSON text, one has to search after the array to find the potentially distinguishing properties. This is not acceptable from a performance point of view. A compounding problem is that JSON does not require properties to be ordered in any way, so there is no guarantee to implement a fast object identification check. A tag is needed to distinguish these objects properly, making them immediately recognizable and distinct:

[json]

    { "locations": { "data": [ a long array of objects ], "id": 456 } }

and

[json]

    { "invoices": { "data": [ a long array of objects ], "date": "01-01-2017" } }

Arrays in a union are distinct if the item types of the arrays are distinct. This takes care of notorious problems with JSON schema when using "oneOf" instead of "anyOf" for type choices. A "oneOf" over *M* arrays of length *N* may require *M* x *N* time to validate while SJOT takes at most *M*+*N* time. Worse, validation with this JSON schema "oneOf" fails for an empty array because it matches all arrays in the "oneOf" (surprise!).

You may have guessed by now that a union is a smart combination of "oneOf" and "anyOf". The validator applies "anyOf" semantics for efficiency, but the restriction on distinct types essentially forces "oneOf" semantics by avoiding ambiguity.

Finally, unions should not be nested, either directly or indirectly via a type reference to another union or array of unions.

### Type references

To refer to a named type we use a SJOT type reference of the form *URI#name* or *#name*.
The first form refers to the named type in the schema identified by its `@id` and URI value and the second form refers to a named type in the current schema. If the reference is to the `@root` type then we use *URI#* and just *#*, respectively.

For example, a linked list of numbers can be very compactly defined as:

[json]

    { "@root": { "value": "number", "next?": "#" } }

Spaghetti references are not allowed: a type reference must refer to a type and that type cannot directly be another referenced type.

SJOT in JSON {#embed}
------------

A SJOT schema can be embedded within a JSON object by using the `@sjot` property. The embedded schema describes and validates that object. For example:

[json]

    {
      "@sjot": {
        "Person": {
          "@note": "Person with a first name and a last name",
          "firstname": "string",
          "lastname": "string"
        }
      },
      "firstname": "Jason",
      "lastname": "Bourne"
    }

When embedded, the SJOT schema should have only one type or define a `@root` object type (if several types are defined) that defines the JSON document content. In this example the `Person` object type describes the content. The JSON content is valid because it includes the required `firstname` and `lastname` properties of a `Person` object type.

An embedded SJOT may refer to an external schema's root using `URL#`. For example, the same object above with a schema reference:

[json]

    {
      "@sjot": "http://example.com/sjot.json#",
      "firstname": "Jason",
      "lastname": "Bourne"
    }

The `@sjot` URL points to a SJOT schema that has a `Person` object type as the root, such as the SJOT schema that we [described earlier](#basics) in this article.

An embedded SJOT may refer to a specific type in a schema:

[json]

    {
      "@sjot": "http://example.com/sjot.json#Person",
      "firstname": "Jason",
      "lastname": "Bourne"
    }

When you invoke the validator with a specific type and schema, then only that type and schema are used to validate the data. Use `null` as a type when invoking the validator to permit an embedded `@sjot` to override the type.
A `@sjot` in a JSON object may occur anywhere in JSON, not just in the root-level object. A `@sjot` may contain an array of schemas, each identified with a unique `@id`.

SJOT attribute properties {#props}
-------------------------

A `@sjot` attribute property of an object in JSON contains an embedded SJOT that defines the JSON object. An embedded `@sjot` value can be a type reference to a SJOT schema. If multiple types are defined in the embedded SJOT schema, the type that defines the JSON object should be named `@root`.

A `@id` attribute property in a SJOT schema identifies the schema by a URI namespace string.

A `@note` attribute property can be added to a SJOT schema and to the object types that the schema defines. The `@note` value should be a string.

A `@root` attribute property refers to the root type of the schema. An embedded SJOT should have a `@root` attribute property or the schema should define only one type.

A `@one`, `@any`, `@all`, or `@dep` attribute property of an object type in a SJOT schema restricts the use of optional object properties. See the SJOT [dependencies](#deps) described further below.

A `@extends` attribute property of an object type in a SJOT schema introduces a derived object type. A derived object type includes the properties of a base object type. We will discuss the use of base and derived object types below.

A `@final` attribute property declares an object type final, so it cannot be extended. Extra properties for this object in JSON are also not permitted.

SJOT base and derived object types {#extend}
----------------------------------

You can extend a base object by adding properties to define a derived object. The `@extends` attribute property in an object type refers to a base object type that is extended.
For example:

[json]

    {
      "@id": "http://www.example.com/sjot.json",
      "@note": "Schema to store personal information",
      "Person": {
        "@note": "Person with a first name and a last name",
        "firstname": "string",
        "lastname": "string"
      },
      "PersonDetails": {
        "@note": "Person with optional age and gender",
        "@extends": "http://www.example.com/sjot.json#Person",
        "age?": "0..",
        "gender?": "(MALE|FEMALE)"
      }
    }

The `age?` property is optional and has a non-negative integer value. The `gender?` property is optional and has one of the two string values `MALE` or `FEMALE`.

When creating derived object types, it is not permitted to override the base properties. Only new properties can be added that are not already in the base object type to create a derived object type. This ensures that a derived object can be used in place of a base object in JSON and will pass validation by ignoring the extra properties in the derived object. This permits upgrading of a JSON API with backward compatibility to a base API.

A derived object type can change a base property from optional to required by using a `@one` singleton propset with that property name.

SJOT final object types {#final}
-----------------------

A `@final` object cannot have any extra properties that are not defined in the schema. Consider the `PersonDetails` example from the previous section but now declared `@final`:

[json]

    {
      "PersonDetails": {
        "@note": "Person with optional age and gender",
        "@extends": "http://www.example.com/sjot.json#Person",
        "@final": true,
        "age?": "0..",
        "gender?": "(MALE|FEMALE)"
      }
    }

Additional properties that are used in a JSON `PersonDetails` object will cause the validator to reject this JSON content.

SJOT any, one, and all dependencies {#deps}
-----------------------------------

When object type properties are optional, you can make their use dependent on the presence of other properties in the object. You can enforce one property of a set of properties to be present.
Or force any property of a set to be present. Or all properties as a group to be present or none of that group. More specific property dependencies can be enforced as well.

### SJOT one

The SJOT `@one` attribute property of an object type is a list of sets of object property names. Each property set defines the properties that should be exclusive, meaning only one of the properties may be present.

For example, the `choices` object type defined below has one of the properties `a`, `b`, or `c`, and one of the properties `x` or `y`:

[json]

    {
      "choices": {
        "a?": "int",
        "b?": "int",
        "c?": "int",
        "x?": "float",
        "y?": "float",
        "@one": [ [ "a", "b", "c" ], [ "x", "y" ] ]
      }
    }

The property sets in the `@one` list should be mutually disjoint and only refer to properties that are optional (without default values) in the schema.

### SJOT any

The SJOT `@any` attribute property of an object type is a list of sets of object property names. Each property set defines the properties of which one or more should be used in this object.

For example, the `anyabc` object type defined below must have at least one of the properties `a`, `b`, and `c` and therefore cannot be empty:

[json]

    {
      "anyabc": {
        "a?": "int",
        "b?": "int",
        "c?": "int",
        "@any": [ [ "a", "b", "c" ] ]
      }
    }

The property sets in the `@any` list should be mutually disjoint and only refer to properties that are optional (without default values) in the schema.

### SJOT all

The SJOT `@all` attribute property of an object type is a list of sets of object property names. Each property set defines which properties should all be included when at least one of them is used, meaning that all properties should be present or none of them at all.

For example, the `allornone` object type defined below must have both of the properties `x` and `y` or none of them:

[json]

    {
      "allornone": {
        "x?": "int",
        "y?": "int",
        "@all": [ [ "x", "y" ] ]
      }
    }

The property sets in the `@all` list should be mutually disjoint and only refer to properties that are optional (without default values) in the schema.

### SJOT dep

The SJOT `@dep` attribute property of an object type enforces properties to be present when a specific property is present.

For example, the `ifxthenyz` object type defined below must have properties `y` and `z` if property `x` is present:

[json]

    {
      "ifxthenyz": {
        "x?": "int",
        "y?": "int",
        "z?": "int",
        "@dep": { "x": [ "y", "z" ] }
      }
    }

To simplify this notation, if a property list has only one property, the property name can be directly used instead of the singleton list.

The property sets in each `@dep` list should only refer to properties that are optional (without default values) in the schema.

Note that the `@all` attribute property enforces the *N* dependencies for a group of *N* properties that are all dependent on each other.

SJOT validation {#validation}
---------------

Validation proceeds recursively over objects, arrays, and tuples. Primitive values (atoms) are verified against the value type constraints that are imposed on a value by using the type information in the SJOT schema.

The property names of an object are matched against the property names of a SJOT object type. For each matching property name the value is recursively validated. If a property is required but is absent, validation fails. If a property is optional and is absent or its value is `null`, validation succeeds, meaning that `null` is equivalent to absent for optional properties. In this case the `null` property can be deleted by the validator. If an optional property has a default value and is absent or its value is `null`, the default value is assumed and the default value can be assigned to this property by the validator.
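
The presence rules in the paragraph above can be sketched as follows. This JavaScript fragment is an illustration only, not the SJOT validator: `splitKey`, `parseDefault`, and `applyPresenceRules` are hypothetical names, value types are not checked at all, and the conversion of the default text after `?` (tried as a JSON literal first, otherwise kept as a string) is an assumption made for the sketch.

```javascript
// Sketch only: split a property key such as "id", "tags?", or "counter?1".
function splitKey(key) {
  const q = key.indexOf("?");
  if (q < 0) return { name: key, optional: false, hasDefault: false };
  const text = key.slice(q + 1);
  return { name: key.slice(0, q), optional: true, hasDefault: text !== "", text };
}

// Assumption: read the default as a JSON literal if possible, else as a plain string.
function parseDefault(text) {
  try { return JSON.parse(text); } catch { return text; }
}

// Apply the required/optional/default rules; attribute (@...) and regex keys are skipped.
function applyPresenceRules(objectType, data) {
  const result = { ...data };
  for (const key of Object.keys(objectType)) {
    if (key.startsWith("@") || key.startsWith("(")) continue;
    const { name, optional, hasDefault, text } = splitKey(key);
    const absent = !(name in result);
    if (!optional) {
      if (absent) return { valid: false, data };   // required but absent
      continue;
    }
    if (absent || result[name] === null) {
      if (hasDefault) result[name] = parseDefault(text); // assume the default value
      else delete result[name];                          // null is equivalent to absent
    }
  }
  return { valid: true, data: result };
}

const widget = { "id": "string", "tags?": "string{1,}", "counter?1": "ulong" };
console.log(applyPresenceRules(widget, { id: "w1", tags: null }));
// { valid: true, data: { id: 'w1', counter: 1 } }
```

The sketch returns a copy of the data instead of modifying the input, which keeps the optional clean-up step (deleting `null` properties and assigning defaults) separate from the validity decision.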

The `@one`, `@any`, `@all`, and `@dep` constraints on object properties are enforced. For the `@one` constraints, exactly one property must occur for each property set specified. For the `@any` constraints, at least one of the properties must occur for each property set specified. For the `@all` constraints, all or none of the properties must occur for each property set specified. For the `@dep` constraints, if an optional property is present then the properties in the specified property set must all be present.

Extra properties of an object are ignored unless the object type is `@final`. Validation fails when extra properties are present in a final object.

An array is validated by checking constraints on its length and the uniqueness of atomic items in case of a set. In case of a set of atoms `atom{}`, it is assumed that integers and floating point values are compared based on their mathematical value, not their type. So a set cannot contain both 0 and 0.0.

A `null` value in an array is converted when validated against a primitive type. The result is `false` for Boolean, `0` for numeric types, and `""` for string types. An array of objects, arrays, or tuples cannot contain `null` values; a `null` item triggers a validation error.

A tuple is validated by validating its members, with the same validation rule for `null` as for arrays stated above. Tuple sizes are fixed. Validation fails when tuples are not of the correct size.

An object that is validated against the types `any` or `object` is validated using its embedded `@sjot` schema, when present.
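
The four dependency constraints described at the start of this section reduce to counting which properties of each set are present. The JavaScript sketch below follows that description; `checkDependencies` is a hypothetical helper and not part of the SJOT library, and it assumes that a `null` value counts as absent, in line with the rule for optional properties.

```javascript
// Sketch only: enforce @one, @any, @all, and @dep of an object type on a JSON object.
function checkDependencies(objectType, data) {
  const present = (name) => name in data && data[name] !== null;
  const count = (set) => set.filter(present).length;

  // @one: exactly one property of each set must occur
  if (!(objectType["@one"] || []).every((set) => count(set) === 1)) return false;
  // @any: at least one property of each set must occur
  if (!(objectType["@any"] || []).every((set) => count(set) >= 1)) return false;
  // @all: all or none of the properties of each set must occur
  if (!(objectType["@all"] || []).every(
    (set) => count(set) === 0 || count(set) === set.length)) return false;
  // @dep: if the key property occurs, then all listed properties must occur
  for (const [name, needs] of Object.entries(objectType["@dep"] || {})) {
    const list = Array.isArray(needs) ? needs : [needs]; // a single name is allowed
    if (present(name) && !list.every(present)) return false;
  }
  return true;
}

const vehicle = {
  "color?": "(WHITE|GRAY|BLACK)", "rgb?": "([0-9a-fA-F]{6})", "make": "string",
  "@one": [["color", "rgb"]]
};
console.log(checkDependencies(vehicle, { rgb: "D71E1E", make: "Honda" }));  // true
console.log(checkDependencies(vehicle, { color: "WHITE", rgb: "D71E1E" })); // false
```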

SJOT examples {#examples}
-------------

### Vehicle data with embedded schema

[json]

    {
      "@sjot": {
        "vehicle": {
          "color?": "(WHITE|GRAY|BLACK)",
          "rgb?": "([0-9a-fA-F]{6})",
          "make": "string",
          "year?": "1970..",
          "@one": [ [ "color", "rgb" ] ]
        }
      },
      "rgb": "D71E1E",
      "make": "Honda",
      "year": 2006
    }

### Product catalog with embedded schemas

[json]

    {
      "@sjot": [
        {
          "@id": "http://example.com/product.json",
          "@note": "Company product catalog",
          "@root": { "products": "http://example.com/product.json#product[]" },
          "product": {
            "@note": "A company product",
            "id": "number",
            "name": "string",
            "price": "<0.0..",
            "tags?": "string{1,}",
            "dimensions?": {
              "length": "number",
              "width": "number",
              "height": "number"
            },
            "warehouseLocation?": "http://example.com/geo.json#location"
          }
        },
        {
          "@id": "http://example.com/geo.json",
          "location": {
            "latitude": "float",
            "longitude": "float"
          }
        }
      ],
      "products": [
        {
          "id": 1,
          "name": "A green door",
          "price": 12.50
        },
        {
          "id": 2,
          "name": "An ice sculpture",
          "price": 12.50,
          "tags": ["cold", "ice"],
          "dimensions": {
            "length": 7.0,
            "width": 12.0,
            "height": 9.5
          },
          "warehouseLocation": {
            "latitude": -78.75,
            "longitude": 20.4
          }
        }
      ]
    }

SJOT chameleon objects: trick or treat? {#trick}
---------------------------------------

A tricky situation arises when a derived object type extends a base object type that is defined in another schema. Assuming that one or more of the base object properties refer to a *type* in the current base schema by using a local *#type* reference, then the scope of these type references changes as the base object properties are literally imported into the derived object. We call this type of base object a *chameleon object*. A chameleon object (ab)uses local type references and tricks its properties into changing shape!

An example chameleon object is the `Base` object type in the top SJOT schema of the following two SJOT schemas:

[json]

    [
      {
        "@id": "http://example.com/base.json",
        "Base": { "id": "#ID" },
        "ID": "any"
      },
      {
        "@id": "http://example.com/derived.json",
        "Derived": { "@extends": "http://example.com/base.json#Base" },
        "ID": "string"
      }
    ]

The `Base` object `id` property changes type, from `"any"` to `"string"`, when imported into `Derived` with the SJOT `@extends` attribute property. To see why, consider the derived object that results after the import and after substituting the `#ID` type reference:

[json]

    {
      "@id": "http://example.com/derived.json",
      "Derived": { "id": "#ID" },
      "ID": "string"
    }

Chameleons allow us to define *type generics* that change shape via local type references. A real treat to the expressiveness of SJOT.

However, danger lurks here! When a JSON API relies on a base object with fixed property types and this base is a chameleon, then the use of a derived object in place of the base object may cause validation failures.

A local *#type* reference should only be used when the current schema has no `@id` so this schema cannot be referenced. If an `@id` is used and the resulting chameleon type generics are extended, then it makes sense that local type references should be generic types, such as `any`, `atom`, or `object`.

SJOT versus JSON schema
-----------------------

- JSON schema is **verbose**, doubling the nesting level compared to the JSON content it describes. By contrast, SJOT schema levels are one-on-one with JSON data.
- JSON schema validation performance is **not scalable**, because validation cost may exceed linear time processing cost (meaning linear in the size of the input), in the worst case taking exponential time or memory to validate constraints, see the [exploding JSON Schema states](#JSON-schema-sucks) examples. By contrast, SJOT validators are very fast and scalable.
The asymptotic running time of JSON validity checking is linear in the size of the given JSON data. - JSON schema permits constraining primitive type value ranges, but offers **few predeclared primitive types** to choose from, even though almost all programming languages offer byte, short, int, float and double precision types. You can use minimum, maximum and multipleOf to constrain the decimal representation in JSON Schema, but we have to keep in mind that floating point values are typically stored in IEEE 754 format and decimals are rounded, so values such as 1234567890123.0099 also validate when multipleOf is 0.01. Fractional constraints are therefore not reliable. By contrast, SJOT offers a wide choice of pre-defined types, and value range constraints work fine and are very simple to use in SJOT. - JSON schema is **non-strict by default**, meaning that all object properties are optional and any additional properties are permitted by default, that is, schemas accept almost anything by default. For example, JSON with typos in property names will not be rejected by a JSON Schema validator by default. By contrast, SJOT is strict by default. - JSON schemas are **not extensible**: you can only add more constraints when combining schemas. There is no easy way to achieve object inheritance. Worse, combining schemas may lead to a schema that rejects too much or even rejects everything. By contrast, SJOT objects are extensible or final. - JSON schema **violates the encapsulation principle** because it permits referencing local schema types via JSON Pointer such as nested objects, which means that you cannot update local types without breaking all the schemas that point to the updated local type structures. By contrast, SJOT groups all types at the top level in the schema as a simple dictionary of named types. - JSON schema design **violates the orthogonality principle** for several constructs.
For example, `[` and `]` can sometimes be used to indicate choices but in other cases they cannot (perhaps oneOf should be used, but that has its own problems). - Checking if a JSON schema's constraints reject everything is an **NP-complete problem**. Worse, constraints may depend on property values in the JSON data, not just property occurrences. By contrast, the SJOT schema checker verifies your schemas and detects blocking constraints. - The **principle of least surprise** does not apply to JSON schema: a construct may work well in one case while the same construct causes problems elsewhere. For example, using oneOf to select among primitive types, say "string" and "number", makes sense, but using oneOf to select schemas may not always work and leads to surprising rejections. Consider the simple case of an empty JSON array, which matches both the "array of strings" and "array of numbers" schemas, so oneOf rejects it! Converting SJOT to JSON schema is easy and automatic with the tools included with SJOT. Try our [live demo](get-sjot.html#demo) to convert SJOT to JSON schema and vice versa. Want to give it a SJOT? {#ps} ----------------------- SJOT for JS is licensed under the BSD 3-Clause license and available for download from GitHub [SJOT](https://github.com/Genivia/SJOT) and npm package [sjot](https://www.npmjs.com/package/sjot). In addition, the snapSJOT converter that creates SJOT schemas for JSON data is available for download from GitHub [snapSJOT](https://github.com/Genivia/SJOT) and npm package [snapsjot](https://www.npmjs.com/package/snapsjot). Try a [live demo](get-sjot.html#demo) of SJOT and snapSJOT in action.
APPENDIX A: Exploding JSON Schema states {#JSON-schema-sucks} ---------------------------------------- The first "ping-pong" JSON schema example randomly alternates between a "ping" and a "pong" schema for nested objects `x` until we find a boolean `y` that is a final "pong": [json] {"x":{"x":{"x":{"x":{"x":{"x":{"y":true}}}}}}} If the nesting level exceeds 16 then JSON schema validators can take minutes (or crash) using the following schema: [json] { "$schema" : "http://json-schema.org/draft-04/schema#", "$ref": "#/definitions/ping", "definitions": { "ping": { "type": "object", "properties": { "x": { "anyOf": [ { "$ref": "#/definitions/ping" }, { "$ref": "#/definitions/pong" } ] } }, "additionalProperties": false }, "pong": { "type": "object", "properties": { "x": { "anyOf": [ { "$ref": "#/definitions/ping" }, { "$ref": "#/definitions/pong" } ] }, "y": { "type": "boolean" } }, "additionalProperties": false } } } For the second example, let's implement a finite state machine in a JSON schema. The JSON Schema has *N* definitions. The "words" we validate with the schema are defined by the regular expression `(a{N}|a(a|b+){0,N-1}b)*x` that describes a sequence of `a` and `b` ending in `x`. The word `abbx` is represented by the JSON pointer `a/b/b/x` which is `{"a":{"b":{"b":{"x":true}}}}`. The first definition for "0" has the following schema: [json] { "$schema": "http://json-schema.org/draft-04/schema#", "$ref": "#/definitions/0", "definitions": { "0": { "type": "object", "properties": { "a": { "$ref": "#/definitions/1" }, "x": { "type": "boolean" } }, "additionalProperties": false }, Then we add *N*-1 definitions `<DEF>` to the schema enumerated "1", "2", "3", ... "*N*-1": [json] "<DEF>": { "type": "object", "properties": { "a": { "$ref": "#/definitions/<DEF>+1" }, "b": { "anyOf": [ { "$ref": "#/definitions/0" }, { "$ref": "#/definitions/<DEF>" } ] } }, "additionalProperties": false }, where "`<DEF>`+1" wraps back to "0" when `<DEF>` is equal to *N*-1. 
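The construction above can be sketched as a small generator. This is only an illustration under stated assumptions: the function names `makeNfaSchema` and `encodeWord` are hypothetical and are not part of SJOT or of any JSON schema tool. The first function builds the schema with *N* definitions ("0" through "*N*-1") exactly as laid out above, and the second encodes a word such as `abbx` as nested JSON objects.

```javascript
// Hypothetical helper (not part of SJOT): build the N-definition
// JSON schema described above. Requires n >= 2 because definition
// "0" refers to definition "1".
function makeNfaSchema(n) {
  if (!Number.isInteger(n) || n < 2) {
    throw new RangeError("n must be an integer >= 2");
  }
  const definitions = {
    "0": {
      type: "object",
      properties: {
        a: { $ref: "#/definitions/1" },
        x: { type: "boolean" }
      },
      additionalProperties: false
    }
  };
  for (let i = 1; i < n; i++) {
    definitions[String(i)] = {
      type: "object",
      properties: {
        // "<DEF>+1" wraps back to "0" when <DEF> equals N-1
        a: { $ref: "#/definitions/" + ((i + 1) % n) },
        b: {
          anyOf: [
            { $ref: "#/definitions/0" },
            { $ref: "#/definitions/" + i }
          ]
        }
      },
      additionalProperties: false
    };
  }
  return {
    $schema: "http://json-schema.org/draft-04/schema#",
    $ref: "#/definitions/0",
    definitions: definitions
  };
}

// Hypothetical helper: encode a word such as "abbx" as nested objects,
// with the last letter mapped to true.
function encodeWord(word) {
  if (word.length === 0) {
    throw new RangeError("word must not be empty");
  }
  const letters = word.split("");
  let value = { [letters.pop()]: true };
  while (letters.length > 0) {
    value = { [letters.pop()]: value };
  }
  return value;
}
```

For instance, `JSON.stringify(makeNfaSchema(16))` yields a schema that can be fed to a JSON schema validator of your choice together with `encodeWord(...)` inputs of increasing length to observe how validation time grows.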
This "NFA" on a two-letter alphabet has *N* states, with only one initial state and one final state. Its equivalent minimal DFA has 2^*N* (2 to the power *N*) states. In the worst case, a validator that uses this JSON schema either takes 2^*N* time or uses 2^*N* memory "cells" to validate the input. [![To top](images/go-up.png) To top](#) APPENDIX B: Tips and tricks {#tricks} --------------------------- ### What does SJOT stand for? <b>S</b>chemas for <b>J</b>SON <b>O</b>bjec<b>t</b>s. <b>To JS</b> spelled backwards. ### How to define a schema for JSON when the JSON content may have alternate types If the alternate types are distinguishable and you must use the same schema for validation, then use a union as the schema root: [json] { "@root": [[ type1, type2, type3, ... ]] } ### How to define a property with a ? in the name Use a regex: [json] "(PropWithA\\?InItsName)": "string", This regex property is optional. To make the property required, see below. Use the same approach when a property name starts with a `(`. ### How to make regex properties required instead of optional Regex properties are optional by design. If the property is required, add an `@any` attribute property to force its presence: [json] "(PropWithA\\?InItsName)": "string", "@any": [ ["PropWithA?InItsName"], ... ] ### How to define a property with a default empty string value Because `null` is converted to an empty string when used as a string type, use `null` as the default value for a property that needs an empty string default value: [json] "name?null": "string" By contrast, `"name?"` is an optional property without a default value. ### How to define a singleton tuple Use unit lower and upper bounds: [json] [1, type, 1] By contrast, `[type]` denotes an array of any length, not a singleton tuple. ### How to define an array of tuples Use an array lower bound and/or upper bound: [json] [0, [type1, type2] ] By contrast, `[[ type1, type2 ]]` denotes a union.
### How to define an object that rejects additional properties Use the `@final` attribute property to restrict the object type: [json] { "@final": true, "name": "string" } This validates objects with a required `"name"` property that is a string and rejects all objects that include other properties. An object type may have regex properties, which means that additional properties are permitted when they match the regex: [json] { "@final": true, "name": "string", "(extra.*)": "any" } This permits additional properties with names that start with `"extra"`. ### How to define an empty object Use the following: [json] { "@final": true } By contrast, `"object"` and `{}` denote extensible object types. ### How to define an empty array Use the following: [json] [0] By contrast, `"array"` and `[]` denote arrays of any type and of any length. [![To top](images/go-up.png) To top](#) <p align="right"><i>Copyright (c) 2016, Robert van Engelen, Genivia Inc. All rights reserved.</i></p>