Specification of the Text Dictionary

The text dictionary consists of the following parts:

  • head: contains the dictionary identifier;
  • tags: elements of the body;
  • restrictions: additional rules for constructing the document.

It is recommended to use the file extension ".udic" and UTF-8 text encoding for the text dictionary. One text file can contain several dictionaries with different identifiers. One binary USDS document can contain only one dictionary.

Here is an example of the file with several text dictionaries:

hell_world.udic

USDS DICTIONARY ID=888 v.1.0
{
    1: STRUCT I
    {
        1: UNSIGNED VARINT n;
        2: DOUBLE s;
        3: STRING<UTF-8> g;
        4: LONG t;
        5: BOOLEAN b;
    } RESTRICT {root=false;}

    2: STRUCT S
    {
        1: UNSIGNED VARINT n;
        2: INT m;
        3: LONG s;
        4: LONG e;
        5: ARRAY<I> v;
    }
}

USDS DICTIONARY ID=888 v.1.1
{
    1: STRUCT I
    {
        1: UNSIGNED VARINT n;
        2: DOUBLE s;
        3: STRING<UTF-8> g;
        4: LONG t;
        5: BOOLEAN b;
        6: BOOLEAN c;
    } RESTRICT {root=false;}

    2: STRUCT S
    {
        1: UNSIGNED VARINT n;
        2: INT m;
        3: LONG s;
        4: LONG e;
        5: ARRAY<I> v;
    }
}

Head

The head is written as in the example above, for instance "USDS DICTIONARY ID=888 v.1.0". It includes:

  • Dictionary identifier: integer, from 0 to 2147483647. The values from 1 to 65535 are reserved for public standards (HTML, SCT, etc.);
  • Major version: integer, from 0 to 255;
  • Minor version: integer, from 0 to 255.

The dictionary identifier is used for integrity control when a dictionary and a body are stored separately. If software is compiled using the dictionary, only the head and body are necessary in USDS documents. If you always store a dictionary and body together (e.g. for a configuration file), you can use zero as the dictionary identifier. Using dictionary versions is recommended in any case.
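
The ranges above can be checked mechanically. The following Python sketch parses a head line and validates the identifier and version ranges. The regular expression is inferred from the example head ("USDS DICTIONARY ID=888 v.1.0") rather than from a formal grammar, and the names HEAD_RE and parse_head are hypothetical.

```python
import re

# Pattern inferred from the example head "USDS DICTIONARY ID=888 v.1.0".
# Whitespace handling is an assumption, not part of the specification.
HEAD_RE = re.compile(r"USDS\s+DICTIONARY\s+ID=(\d+)\s+v\.(\d+)\.(\d+)")


def parse_head(line):
    """Return (dictionary_id, major, minor) or raise ValueError."""
    match = HEAD_RE.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"not a dictionary head: {line!r}")
    dict_id, major, minor = (int(group) for group in match.groups())
    if not 0 <= dict_id <= 2147483647:
        raise ValueError("dictionary identifier out of range")
    if not 0 <= major <= 255 or not 0 <= minor <= 255:
        raise ValueError("version component out of range")
    return dict_id, major, minor


print(parse_head("USDS DICTIONARY ID=888 v.1.0"))  # (888, 1, 0)
```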

Tags

The tag format:

The scheme includes:

  • Tag identifier:
    • an integer, from 1 to 2147483647;
    • unique in one dictionary version;
    • it is necessary to number tags sequentially;
    • in the binary body the type "unsigned varint" is used;
  • Tag type: all USDS types represented below;
  • Tag name: at least 1 symbol; can contain Latin letters, digits and the symbol '_'. The first symbol must not be a digit. The binary body doesn't contain the tag names;
  • "RESTRICT": additional rules for tags:
    • "notRoot": identifier for non-root tags. For example, the tag <H1> cannot be a root tag in HTML.
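
The naming and numbering rules above could be checked as in the Python sketch below. The helper names are hypothetical, and reading "sequentially" as "1, 2, 3, ... without gaps, in any order of appearance" is an assumption.

```python
import re

# Latin letters, digits and '_'; the first character must not be a digit.
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_name(name):
    """True if the tag (or field) name follows the naming rule."""
    return NAME_RE.fullmatch(name) is not None


def ids_are_sequential(ids):
    """True if identifiers are unique and form 1, 2, ..., n without gaps.

    Assumption: "sequentially" means starting at 1 with no gaps.
    """
    return sorted(ids) == list(range(1, len(ids) + 1))
```

For example, is_valid_name("BodySubTags") is true, while is_valid_name("1abc") and a name containing non-Latin letters are rejected.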

Data Types

The full list of USDS data types:

Simple tags

The format of the simple tags (from 1 to 25 in the table above):

Signed integers are represented in two's complement in the binary body. Real numbers conform to IEEE 754.

Big-endian numbers are parsed correctly on little-endian platforms.

The data types "VARINT" and "UNSIGNED VARINT" correspond to Little Endian Base 128.
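
For readers unfamiliar with Little Endian Base 128, here is a minimal Python sketch of the unsigned variant: each byte carries 7 bits of the value, least significant group first, and the high bit signals that more bytes follow. The function names are hypothetical, and the signed variant is omitted.

```python
def encode_uleb128(value):
    """Encode a non-negative integer as unsigned LEB128 bytes."""
    if value < 0:
        raise ValueError("unsigned varint cannot be negative")
    out = bytearray()
    while True:
        byte = value & 0x7F          # take the lowest 7 bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)         # last byte has the high bit clear
            return bytes(out)


def decode_uleb128(data, offset=0):
    """Decode unsigned LEB128 from data; return (value, next_offset)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7


encoded = encode_uleb128(624485)
print(encoded.hex())            # e58e26
print(decode_uleb128(encoded))  # (624485, 3)
```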

The following restrictions are available for these types, except "BOOLEAN":

  • "values": valid values for the tag (ranges or list of values);
  • "precision": available for integer and floating-point types.

Structure

The format of the structure:

The scheme includes:

  • Field identifier:
    • an integer, from 1 to 2147483647;
    • unique within one structure;
    • it is necessary to number fields sequentially;
    • in the binary body the type "unsigned varint" is used;
  • Field type:
    • any USDS type except "struct";
    • another tag name with any type;
    • this tag name (recursion is supported): in this case the field must be optional;
  • Field name: at least 1 symbol; can contain Latin letters, digits and the symbol '_'. The first symbol must not be a digit. The binary body doesn't contain the field names;
  • The following restrictions are available:
    • all restrictions of the field's type;
    • "default": the default value for the field, only for the simple types (1-25) and strings. If a field's value is equal to the default value, it won't be included in the binary body;
    • "optional": for any type. Parsers will return NULL if the field was not included in the binary body.
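
The interaction of "default" and "optional" can be illustrated with a small Python sketch. It uses a hypothetical in-memory model (FieldSpec, a dict from field identifier to value standing in for the binary body) and is not the USDS library API. It only shows which fields would be written and what a parser would report for an absent field.

```python
from dataclasses import dataclass
from typing import Any

_MISSING = object()  # marks "no default declared"


@dataclass
class FieldSpec:
    """Hypothetical description of one structure field."""
    field_id: int
    name: str
    default: Any = _MISSING
    optional: bool = False


def fields_to_write(specs, values):
    """Return {field_id: value} for the fields that go into the body."""
    body = {}
    for spec in specs:
        value = values.get(spec.name)
        if value is None:
            if spec.optional or spec.default is not _MISSING:
                continue  # absent optional or defaulted field is skipped
            raise ValueError(f"field {spec.name!r} is required")
        if spec.default is not _MISSING and value == spec.default:
            continue  # equal to the default: omitted from the body
        body[spec.field_id] = value
    return body


def resolve_field(spec, body):
    """Return the value a parser would report for one field."""
    if spec.field_id in body:
        return body[spec.field_id]
    if spec.default is not _MISSING:
        return spec.default
    if spec.optional:
        return None  # NULL in the terms of this specification
    raise ValueError(f"field {spec.name!r} is missing")


specs = [
    FieldSpec(1, "n"),
    FieldSpec(2, "m", default=0),
    FieldSpec(3, "t", optional=True),
]
body = fields_to_write(specs, {"n": 5, "m": 0})
print(body)  # {1: 5}
```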

Array

The format of the array:

The scheme includes:

  • Element's type:
    • the simple types (1-25);
    • "STRING";
    • another tag name with any type;
    • a list of tag names with any type: in this case the array can include all these tags (polymorph array);
  • the following restrictions are available:
    • all restrictions of the element's type;
    • "fixSize": fixed number of elements in the array. The value starts at 1;
    • "minSize" - the minimum size of the array. The value starts at 0. It's incompatible with "fixSize";
    • "maxSize" - the maximum size of the array. The value starts at 1. It's incompatible with "fixSize";
    • "minOccure" - for the polymorph array only: minimum number of the elements in array. The value starts at 0;
    • "maxOccure" - for the polymorph array only: maximum number of the elements in array. The value starts at 1.

Multidimensional arrays are not supported, but you can use "array of arrays".
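
The size restrictions for arrays can be sketched in Python as follows. The function names are hypothetical, and the check that "minSize" must not exceed "maxSize" is an assumption that the text above does not state explicitly.

```python
def check_array_restrictions(fix_size=None, min_size=None, max_size=None):
    """Raise ValueError if the combination of restrictions is invalid."""
    if fix_size is not None:
        if min_size is not None or max_size is not None:
            raise ValueError('"fixSize" is incompatible with "minSize"/"maxSize"')
        if fix_size < 1:
            raise ValueError('"fixSize" starts at 1')
    if min_size is not None and min_size < 0:
        raise ValueError('"minSize" starts at 0')
    if max_size is not None and max_size < 1:
        raise ValueError('"maxSize" starts at 1')
    # Assumption: a minimum above the maximum is treated as an error.
    if min_size is not None and max_size is not None and min_size > max_size:
        raise ValueError('"minSize" exceeds "maxSize"')


def size_is_allowed(length, fix_size=None, min_size=None, max_size=None):
    """True if an array with the given length satisfies the restrictions."""
    if fix_size is not None:
        return length == fix_size
    if min_size is not None and length < min_size:
        return False
    if max_size is not None and length > max_size:
        return False
    return True
```

For example, size_is_allowed(3, fix_size=3) is true, while check_array_restrictions(fix_size=3, min_size=1) raises an error because the two restrictions cannot be combined.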

Strings

The format of the string:

The scheme includes:

  • Text encoding: one of the supported text encodings; the full list is given below;
  • Restrictions:
    • "fixSize": fixed number of symbols in the string. The value starts at 1;
    • "minSize" - the minimum number of the symbols in the string. The value starts at 0. It's incompatible with "fixSize";
    • "maxSize" - the maximum number of the symbols in the string. The value starts at 1. It's incompatible with "fixSize";
    • "values": the regular expression for the string.
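
As a rough illustration, the Python sketch below checks a string value against these restrictions. It rests on several assumptions: a "symbol" is counted as one Unicode code point, the "values" expression must match the whole string, and Python's regular expression dialect stands in for whichever dialect USDS actually uses. The function name is hypothetical.

```python
import re


def string_is_allowed(text, fix_size=None, min_size=None, max_size=None,
                      values=None):
    """Check one string value against the restrictions listed above."""
    length = len(text)  # assumption: a "symbol" is one Unicode code point
    if fix_size is not None and length != fix_size:
        return False
    if min_size is not None and length < min_size:
        return False
    if max_size is not None and length > max_size:
        return False
    # Assumption: the "values" expression must match the whole string.
    if values is not None and re.fullmatch(values, text) is None:
        return False
    return True


print(string_is_allowed("2015-08-11", values=r"\d{4}-\d{2}-\d{2}"))  # True
```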

Lists

The format of the list:

The list is optimized for data arrays of unknown size. The scheme includes:

  • Element's type:
    • the simple types (1-25);
    • "STRING";
    • other tag name with any type;
    • list of the tag names with any type: in this case the list can include all these tags (polymorph list);
  • the following restrictions are available:
    • all restrictions of the element's type;
    • "minSize" - the minimum size of the list. The value starts at 0;
    • "maxSize" - the maximum size of the list. The value starts at 1;
    • "minOccure" - for the polymorph list only: minimum number of the elements in list. The value starts at 0;
    • "maxOccure" - for the polymorph list only: maximum number of the elements in list. The value starts at 1.

The size of the list (the number of elements) is stored in the binary body as the type "distributed unsigned varint".

Maps

The format of the map:

The scheme includes:

  • Key type:
    • simple types (1-25);
    • "STRING";
    • other tag name with any type;
    • list of the tag names with any type: in this case the key can be any of these tags (polymorph map);
  • Value type:
    • simple types (1-25);
    • "STRING";
    • other tag name with any type;
    • list of the tag names with any type: in this case the value can be any of these tags (polymorph map);
  • the following restrictions are available:
    • all restrictions of the element's type;
    • "minSize" - the minimum size of the map. The value starts at 0;
    • "maxSize" - the maximum size of the map. The value starts at 1;
    • "minOccure" - for the polymorph map only: minimum number of the elements in the map. The value starts at 0;
    • "maxOccure" - for the polymorph map only: maximum number of the elements in the map. The value starts at 1.

The size of the map (the number of key-value pairs) is stored in the binary body as the type "distributed unsigned varint".

Polymorph Tags

Polymorph tags are used for the following scenario: the tag <body> (in HTML) can include a list of tags (e.g. <H1>, <H2> and <p>) in any order. The problem can be solved with the following dictionary:

1: STRUCT H1 { ... }
2: STRUCT H2 { ... }
3: POLYMORPH<H1, H2> BodySubTags;
4: STRUCT Body
{
    1: LIST<BodySubTags> subTags;
}

Other formats use type "ANY" for polymorphism, but this method reduces the accuracy of the schema description.

The format of the polymorph:

The scheme includes:

  • Tag names - the list of the tags, which are part of the polymorph;
  • only one restriction is supported:
    • notRoot: identifier for non-root tags.

Dictionary restrictions

The following restrictions are available for the dictionary:
  • "minOccure" - minimum number of the elements in the USDS Documents. The value starts at 0;
  • "maxOccure" - maximum number of the elements in the USDS Documents. The value starts at 1.

2015.08.11 Andrey Abramov
Updated at 2015.11.10
CC BY 4.0
