This StAX parser makes it easy to read JSON directly into application data structures, without building a JSON DOM and while skipping all irrelevant pieces of data in the JSON stream.
Why StAX
Nearly all real-world applications read JSON into their own data structures.
Parsers usually offer one of two approaches to do so:
- Most parsers first read the file into Document Object Model (DOM) data structures and then let the application convert these DOM nodes into application objects. This leads to substantial memory and CPU overhead.
- Other parsers provide the application with a streaming push interface (SAX). In this approach the JSON library becomes just a lexer, and all actual parsing is delegated to the application, which has to implement a hand-written, ingenious state machine that:
  - maps keys to fields,
  - switches contexts and mappings on each object and array start/end,
  - skips all unneeded structures,
  - handles different variants of mappings,
  - converts, checks and normalizes data.
So these approaches are either slow and resource-consuming or painful to use. But there is a third way, free from these downsides: a streaming pull interface (StAX), which is implemented in the Argentum json_Parser class.
- Like SAX, it parses data on the fly without building an intermediate DOM.
- But unlike SAX, the Argentum Parser doesn't feed the application a stream of tokens. Instead, it lets the application query for the data it expects.
- If some parts of the incoming JSON are left unclaimed, they simply get skipped.
- This combines the simplicity of DOM-based parsers with the high efficiency of SAX parsers.
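To make the pull idea concrete, here is a minimal sketch in Python rather than Argentum. It is a toy written for this explanation, not the Argentum implementation: the class ToyPullParser and its methods try_num, try_str, get_arr, get_obj and skip_value are hypothetical names, string escapes are not decoded, and there is no error reporting. It only shows how the application asks for the values it expects while the reader skips everything left unclaimed.

```python
import re

_WS = re.compile(r"\s*")
_NUM = re.compile(r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?")
_STR = re.compile(r'"((?:[^"\\]|\\.)*)"')


class ToyPullParser:
    """Hypothetical toy pull reader; not the Argentum implementation."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def _skip_ws(self):
        self.pos = _WS.match(self.text, self.pos).end()

    def _eat(self, literal):
        # Consume `literal` if it is next in the input.
        self._skip_ws()
        if self.text.startswith(literal, self.pos):
            self.pos += len(literal)
            return True
        return False

    def _try(self, pattern):
        # Consume a token only if it matches; otherwise leave the input intact.
        self._skip_ws()
        m = pattern.match(self.text, self.pos)
        if m is not None:
            self.pos = m.end()
        return m

    def try_num(self):
        m = self._try(_NUM)
        return None if m is None else float(m.group())

    def try_str(self):
        m = self._try(_STR)
        return None if m is None else m.group(1)  # escapes are left undecoded

    def skip_value(self):
        # Skip whatever value sits at the current position, including subtrees.
        if self.try_num() is not None or self.try_str() is not None:
            return
        if any(self._eat(lit) for lit in ("true", "false", "null")):
            return
        if self.text.startswith("[", self.pos):
            self.get_arr(lambda: None)
        elif self.text.startswith("{", self.pos):
            self.get_obj(lambda key: None)

    def get_arr(self, on_item):
        if not self._eat("[") or self._eat("]"):
            return
        while True:
            self._skip_ws()
            start = self.pos
            on_item()
            if self.pos == start:  # the callback claimed nothing: skip the item
                self.skip_value()
            if not self._eat(","):
                break
        self._eat("]")

    def get_obj(self, on_field):
        if not self._eat("{") or self._eat("}"):
            return
        while True:
            key = self.try_str()
            self._eat(":")
            self._skip_ws()
            start = self.pos
            on_field(key)
            if self.pos == start:  # unclaimed field: skip its whole subtree
                self.skip_value()
            if not self._eat(","):
                break
        self._eat("}")


result = {}
p = ToyPullParser('{"name": "Andrew", "extra": {"a": [1, {"b": null}]}, "year": 1972}')


def on_field(key):
    if key == "name":
        result["name"] = p.try_str()
    elif key == "year":
        result["year"] = p.try_num()


p.get_obj(on_field)
print(result)
```

The application code never sees the "extra" subtree: it is consumed inside get_obj because the callback did not move the read position.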
Usage Examples
Create and initialize Parser instance
using json { Parser }
p = Parser.init(jsonText);
Parser p is ready to parse the text from jsonText.
An existing parser can be reset to parse another JSON (or the same JSON data from the beginning) by calling init on the existing parser object. You don't need to recreate parsers every time.
Every standard JSON file contains a single root node that can be:
- a number, boolean value, string, null,
- an array of nodes,
- an object, which is a key-value collection, where key is always a string, and value is a node.
The Argentum json_Parser initially stays on the root node, from which you can get a node of the type your application expects. Usually it's either an object or an array.
Read Arrays from JSON
To read an array from the current position, use getArr, which takes a ()void lambda as a parameter. This lambda will be called for each encountered array item:
p.getArr{
log("an array item!")
};
In practice, this lambda is not meant to just log the fact that an array item was seen, but to extract the array item the way the application expects. So let's extract primitive nodes.
Read primitive data
Numbers
To extract a numeric node, call either:
- getNum(defaultValue double) double - returns either the extracted value, or defaultValue if the current item in the stream is not a number,
- or tryNum() ?double - returns a ?double that tells both whether the current node is actually a number and, if it is, its value.
You can call getNum/tryNum:
- right after Parser.init to check/extract a numeric root node (it's weird but legit),
- or from getArr lambdas to fetch array items,
- or inside getObj lambdas (explained later) to extract object fields:
p.init("[1, 2, 3]");
p.getArr {
p.tryNum() ? log("item {_} ");
};
// it prints: item 1 item 2 item 3
In this example we:
- initialize a parser with the text "[1, 2, 3]", which is an array of numbers,
- parse the root item as an array,
- try to extract numbers out of each array item,
- and if it's a number, print it.
// Or the same as above but using getNum and inline lambda:
p.getArr\log("item {p.getNum(0.0)} ");
BTW, the Argentum JSON Parser always treats numbers as doubles. This follows the interoperability guidance of the JSON standard (RFC 8259): parsers that expect more range or precision from numeric items than an IEEE 754 double provides (a 53-bit significand, of which 52 bits are stored) are not portable and not interoperable.
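A quick way to see the practical limit of double precision is the short Python check below. It is only an illustration of IEEE 754 behavior and has nothing Argentum-specific in it: integers up to 2^53 survive a round trip through a double, while the next odd integer does not.

```python
# IEEE 754 doubles carry a 53-bit significand, so every integer up to
# 2**53 is exact, but 2**53 + 1 has no exact double representation.
exact_limit = 2 ** 53

survives = int(float(exact_limit))       # still 2**53
rounded = int(float(exact_limit + 1))    # rounds back down to 2**53

print(survives == exact_limit)           # True
print(rounded == exact_limit + 1)        # False
```

This is why JSON producers that need larger integers (64-bit IDs, for example) often send them as strings.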
Booleans
To extract boolean values, call either:
- getBool(defaultValue bool) bool - returns either the extracted value or defaultValue,
- or tryBool() ?bool - returns a ?bool that tells both whether the current value in the stream is a bool and, if it is, holds its value.
p.init("[true, 5, false]");
p.getArr{
p.tryBool()
? log("item {_?"true":"false"} ")
: log("not a bool!");
};
// it prints: item true not a bool! item false
Strings
To extract string values, use the tryStr() or getStr(defaultValue str) methods:
p.init("
[
"Baba", " ",
"yetu"
]
");
p.getArr{
p.tryStr() ? log(_);
};
// it prints: Baba yetu
The Parser processes and checks utf-8 runes and handles all single-character escape sequences, like "\n\t" etc. It processes \uFFFF-encoded Unicode codepoints, and validates and combines surrogate pairs into valid utf-8 runes.
Sometimes the application knows in advance that a text is limited to some sane number of characters. In this case, instead of tryStr/getStr, it can call tryStrWithLimit/getStrWithLimit, which extract no more than the given number of utf-8 runes and skip the rest of the string.
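For readers unfamiliar with surrogate pairs, the following Python sketch shows the standard UTF-16 arithmetic that turns two \u escapes into one code point. It illustrates the general technique only; combine_surrogates is a hypothetical helper name, and this is not a description of how the Argentum Parser is written internally.

```python
def combine_surrogates(high, low):
    """Combine a UTF-16 surrogate pair into a single Unicode code point."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError("first unit is not a high surrogate")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError("second unit is not a low surrogate")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# The JSON text "\uD83D\uDE00" encodes one code point, U+1F600.
code_point = combine_surrogates(0xD83D, 0xDE00)
utf8_bytes = chr(code_point).encode("utf-8")

print(hex(code_point))  # 0x1f600
print(utf8_bytes)       # b'\xf0\x9f\x98\x80'
```

A lone high or low surrogate is invalid input, which is why a validating parser has to check both halves before combining them.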
Null
To extract a null value, use the tryNull method, which simply returns a bool indicating whether there was a null in the stream or not.
When to use try* and get*
All try* methods, if they don't see their corresponding data type in the stream, just return optional-nothing and leave the stream intact. This allows multiple attempts to extract data in different ways. For example, if we have to support multiple JSON versions in which certain bool flags are sometimes returned as bools, sometimes as the numbers 0/1, and sometimes as strings like "yes"/"no", we can handle that.
Let's extend the json_Parser with a new method:
class Parser {
    getBoolMyWay(def bool) bool {
        tryNum() ? _ != 0.0 :
        tryStrWithLimit(5) ? _ == "true" || _ == "yes" || _ == "1" :
        getBool(def)
    }
}
log(Parser.init("1").getBoolMyWay(false) ? "aye" : "nope");
// this prints aye
So the rule of thumb is:
- use try* methods:
  - if you need to check for multiple primitive types in one field/array item,
  - or if you need some special handling when the input data is of an unexpected type;
- use get* methods:
  - if you expect only one type
  - and you are OK with a default value.
Read JSON objects
Objects are handled with the getObj method. This method takes a lambda that is called for each object field. This lambda has one parameter, the field name, and it can use all of the Parser's try- or get-methods to extract field values. Example:
p.init("
{
"name": "Andrew",
"unexpected data": false,
"year": 1972
}
");
p.getObj {
_=="year" ? log("field year with number {p.getNum(0.0)}"):
_=="name" ? log("field name with string {p.getStr("unknown")}");
};
// This prints
// field name with string Andrew
// field year with number 1972
Create Arrays/Objects on demand
In most cases, the object whose fields we want to fill from JSON exists regardless of whether the current JSON position contains an object or not. But sometimes we want to create objects and arrays only if the current element is an actual array or object. For this scenario, Parser has two additional predicates:
- isArr() bool - tells if the current element is an array,
- isObj() bool - tells if it's an object.
They are intended to be used this way:
x = json.isArr() ? DoubleArray.{
json.getArr\_.append(json.getNum(0.0))
}
This code checks if this is an array, and only if it is, it creates a DoubleArray instance and fills it with numbers from JSON. Variable x here is of type optional DoubleArray.
Error Handling
Parser has four methods for error handling:
- getErrorMessage() ?str - tells whether the parser is in an error state and returns its error message,
- getErrorPos() ?Cursor - returns the rest of the unparsed text after the error,
- setError(text str) - switches the Parser into an error state (if it's not already in one) and sets an error message,
- success() - checks whether the parser successfully parsed all its text from start to end.
In the error state, the Parser returns false/optional-none/defaultVal to all calls and ends all iterations over array items and object fields. Once the Parser has entered the error state, it can be cleared only with the init method, which starts a new parsing.
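The "sticky" error state described above is a general pattern, and a small Python sketch can show why it keeps calling code simple. This is a toy written for this explanation, assuming a reader over whitespace-separated tokens instead of JSON; StickyErrorReader and its members are hypothetical names, not the Argentum API.

```python
class StickyErrorReader:
    """Toy reader over whitespace-separated numbers with a sticky error state."""

    def __init__(self, text):
        self.init(text)

    def init(self, text):
        # Starting a new parse is the only way to clear the error.
        self.tokens = text.split()
        self.pos = 0
        self.error = None
        return self

    def set_error(self, message):
        # Keep the first error; later ones are usually just consequences.
        if self.error is None:
            self.error = message

    def get_num(self, default):
        # In the error state (or at the end) every call returns the default.
        if self.error is not None or self.pos >= len(self.tokens):
            return default
        token = self.tokens[self.pos]
        try:
            value = float(token)
        except ValueError:
            self.set_error(f"bad number {token!r} at token {self.pos}")
            return default
        self.pos += 1
        return value

    def success(self):
        return self.error is None and self.pos == len(self.tokens)


reader = StickyErrorReader("1 x 3")
values = [reader.get_num(-1.0) for _ in range(3)]
print(values)          # [1.0, -1.0, -1.0]
print(reader.error)    # bad number 'x' at token 1
```

Because every call degrades to a default after the first failure, the application can run its whole extraction code unconditionally and check for success once at the end.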
Complex Example
Let's assume that our application has two classes, a Point and a Polygon:
class Polygon {
    name = "";
    points = Array(Point);
    isActive = false;
}
class Point {
    x = 0f;
    y = 0f;
}
Our application expects JSON to contain an array of polygons, something like this:
[
{
"active": false,
"name": "p1",
"points": [
{"x": 11, "y": 32},
{"y": 23, "x": 12},
{"x": -1, "y": 4}
]
},
{
"points": [
{"x": 10, "y": 0},
{"x": 0, "y": 10},
{"y": 0, "x": 0}
],
"active": true,
"name": "Corner"
}
]
This structure can be parsed in a straightforward way:
fn readPolygonsFromJson(data str) Array(Polygon) {
    Array(Polygon).{ // 1
        json = Parser.init(data);
        json.getArr\_.append(Polygon)-> json.getObj `f ( // 2
            f=="active" ? _.isActive := json.getBool(false) :
            f=="name" ? _.name := json.getStr("") :
            f=="points" ? json.getArr\_.points.append(Point)-> json.getObj `f ( // 3
                f=="x" ? _.x := float(json.getNum(0.0)) :
                f=="y" ? _.y := float(json.getNum(0.0))));
        json.success() : log("parsing error {json.getErrorMessage()}{json.getErrorPos()?" at {_.offset}"}");
    }
}
In this code:
- In line #1 we create and return an array of Polygons, but before returning it, we handle it using the "colombo" operator: result_expression.{ actions }. See details here. So inside the {}-block, the "_" name denotes the resulting array.
- In line #2 we create a Polygon instance for each array item, append it to the array and pass it with the -> operator to the expression json.getObj, so inside the getObj lambda, the "_" name refers to this newly created polygon.
- In line #3 we do the same trick with a new Point instance that is inserted into the current polygon's points array. So inside the second getObj lambda, the "_" name corresponds to a newly created point.
These 12 lines of code handle all the edge cases:
- If input JSON has an unexpected field, this field is skipped with all its subtree.
- This code is tolerant to any order of fields in objects.
- If some field is absent from the JSON, the corresponding object gets a default value.
- If some field, root element or array item in the JSON is of an unexpected type, it will be skipped and replaced with the default value. For example, if json.getBool(true) is called on an array of objects, this array gets skipped and the default result (true) is stored.
- Since all parsing is performed in plain Argentum code, we can easily add validating/transforming/versioning logic without inventing template-driven or string-encoded DSLs.
- The Parser is strict and resilient: it validates its input against the JSON standard, and detects and reports all errors.
Bottom line
- This is the first module entirely written in Argentum.
- It is implemented in 50% fewer lines of code than the C++ version. So Argentum is pretty expressive.
- TODO: add more bragging and yapping
Readiness
The JSON module is available in Argentum when built from sources.
It is not yet included in the playground or the binary demo.