Data storage
Tarantool operates on data in the form of tuples.
- tuple
A tuple is a group of data values in Tarantool’s memory. Think of it as a “database record” or a “row”. The data values in the tuple are called fields.
When Tarantool returns a tuple value in the console, it uses YAML format by default, for example: [3, 'Ace of Base', 1993]. Internally, Tarantool stores tuples as MsgPack arrays.
- field
Fields are distinct data values, contained in a tuple. They play the same role as “row columns” or “record fields” in relational databases, with a few improvements:
- fields can be composite structures, such as arrays or maps,
- fields don’t need to have names.
A given tuple may have any number of fields, and the fields may be of different types.
The field’s number is the identifier of the field. Numbers are counted from base 1 in Lua and other 1-based languages, or from base 0 in languages like PHP or C/C++. So, 1 or 0 can be used in some contexts to refer to the first field of a tuple.
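A minimal sketch of 1-based field access from Lua, assuming the code runs inside a Tarantool instance. The tuple contents reuse the example above, and the variable name is only illustrative.

```lua
-- Build a standalone tuple; box.tuple.new() does not need a space.
local t = box.tuple.new({3, 'Ace of Base', 1993})

-- In Lua, field numbers start at 1.
print(t[1])   -- first field
print(t[2])   -- second field
print(#t)     -- number of fields in the tuple
```

A connector written in a 0-based language would address the same first field as field 0.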
Tarantool stores tuples in containers called spaces.
- space
In Tarantool, a space is a primary container that stores data. It is analogous to tables in relational databases. Spaces contain tuples – the Tarantool name for database records. The number of tuples in a space is unlimited.
At least one space is required to store data with Tarantool. Each space has the following attributes:
- a unique name specified by the user,
- a unique numeric identifier which can be specified by the user, but usually is assigned automatically by Tarantool,
- an engine: memtx (default) — in-memory engine, fast but limited in size, or vinyl — on-disk engine for huge data sets.
To be functional, a space also needs to have a primary index. It can also have secondary indexes.
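The attributes listed above can be sketched as follows, assuming a Tarantool instance started with default settings. The space name `bands_demo` and the index name `primary` are hypothetical and used only for illustration.

```lua
-- Assumes a Tarantool instance; box.cfg{} configures it with defaults.
box.cfg{}

-- 'bands_demo' is a hypothetical space name used only for this sketch.
local bands = box.schema.space.create('bands_demo', {
    engine = 'memtx',       -- the default engine, shown explicitly
    if_not_exists = true,
})

-- A space needs a primary index before it can store tuples.
bands:create_index('primary', {
    parts = {{field = 1, type = 'unsigned'}},
    if_not_exists = true,
})

bands:replace({3, 'Ace of Base', 1993})
print(bands:get({3}))

-- The numeric identifier was assigned automatically by Tarantool.
print(bands.id, bands.engine)
```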
Tarantool is both a database manager and an application server. Therefore a developer often deals with two type sets: the types of the programming language (such as Lua) and the types of the Tarantool storage format (MsgPack).
Scalar / compound | MsgPack type | Lua type | Example value |
---|---|---|---|
scalar | nil | cdata | box.NULL |
scalar | boolean | boolean | true |
scalar | string | string | 'A B C' |
scalar | integer | number | 12345 |
scalar | integer | cdata | 12345 |
scalar | float64 (double) | number | 1.2345 |
scalar | float64 (double) | cdata | 1.2345 |
scalar | binary | cdata | [!!binary 3t7e] |
scalar | ext (for Tarantool decimal) | cdata | 1.2 |
scalar | ext (for Tarantool datetime) | cdata | '2021-08-20T16:21:25.122999906 Europe/Berlin' |
scalar | ext (for Tarantool interval) | cdata | +1 months, 1 days |
scalar | ext (for Tarantool uuid) | cdata | 12a34b5c-de67-8f90-123g-h4567ab8901 |
compound | map | table (with string keys) | {'a': 5, 'b': 6} |
compound | array | table (with integer keys) | [1, 2, 3, 4, 5] |
compound | array | tuple (cdata) | [12345, 'A B C'] |
Note
MsgPack values have variable lengths. So, for example, the smallest number requires only one byte, but the largest number requires nine bytes.
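The size difference can be observed with Tarantool's built-in `msgpack` module. This is a small sketch that assumes it runs inside Tarantool; the variable names are illustrative.

```lua
local msgpack = require('msgpack')

-- A small unsigned integer fits into a single byte.
local small = msgpack.encode(1)

-- The largest unsigned 64-bit integer needs a type byte plus 8 data bytes.
local large = msgpack.encode(18446744073709551615ULL)

print(#small)
print(#large)
```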
Note
The Lua nil type is encoded as MsgPack nil but decoded as msgpack.NULL.
In Lua, the nil type has only one possible value, also called nil. Tarantool displays it as null when using the default YAML format. Nil may be compared to values of any types with == (is-equal) or ~= (is-not-equal), but other comparison operations will not work. Nil may not be used in Lua tables; the workaround is to use box.NULL because nil == box.NULL is true.
Example: nil.
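The difference between nil and box.NULL inside a Lua table can be sketched like this, assuming the code runs inside Tarantool (box.NULL is not available in plain Lua). The table and key names are illustrative.

```lua
local t = {}
t.a = nil        -- assigning nil stores nothing: the key does not exist
t.b = box.NULL   -- box.NULL is a placeholder value, so the key is kept

-- Count the keys that are actually present in the table.
local keys = 0
for _ in pairs(t) do
    keys = keys + 1
end

print(keys)         -- only the key 'b' is present
print(t.b == nil)   -- box.NULL compares equal to nil
```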
The Tarantool integer type is for integers between -9223372036854775808 and 18446744073709551615, which is about 18 quintillion. This type corresponds to the number type in Lua and to the integer type in MsgPack.
Example: -2^63.
The Tarantool unsigned type is for integers between 0 and 18446744073709551615. So it is a subset of integer.
Example: 123456.
The double field type exists mainly to be equivalent to Tarantool/SQL’s DOUBLE data type. In msgpuck.h (Tarantool’s interface to MsgPack), the storage type is MP_DOUBLE and the size of the encoded value is always 9 bytes. In Lua, fields of the double type can only contain non-integer numeric values and cdata values with double floating-point numbers.
Examples: 1.234, -44, 1.447e+44.
To avoid using the wrong kind of values inadvertently, use ffi.cast() when searching or changing double fields. For example, instead of space_object:insert{value} use ffi = require('ffi') ... space_object:insert({ffi.cast('double', value)}).
Example:
s = box.schema.space.create('s', {format = {{'d', 'double'}}})
s:create_index('ii')
s:insert({1.1})
ffi = require('ffi')
s:insert({ffi.cast('double', 1)})
s:insert({ffi.cast('double', tonumber('123'))})
s:select(1.1)
s:select({ffi.cast('double', 1)})
Arithmetic with cdata double will not work reliably, so for Lua, it is better to use the number type. This warning does not apply to Tarantool/SQL because Tarantool/SQL does implicit casting.
The Tarantool number field may have both integer and floating-point values, although in Lua a number is a double-precision floating-point. Tarantool will try to store a Lua number as floating-point if the value contains a decimal point or is very large (greater than 100 trillion = 1e14), otherwise Tarantool will store it as an integer. To ensure that even very large numbers are stored as integers, use the tonumber64 function, or the LL (Long Long) suffix, or the ULL (Unsigned Long Long) suffix. Here are examples of numbers using regular notation, exponential notation, the ULL suffix and the tonumber64 function: -55, -2.7e+20, 100000000000000ULL, tonumber64('18446744073709551615').
You can also use the ffi module to specify a C type to cast the number to. In this case, the number will be stored as cdata.
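A short sketch of the two ways to keep a very large value as an integer, assuming it runs inside Tarantool (tonumber64 is a Tarantool global, not part of plain Lua). The variable names are illustrative.

```lua
-- Both values exceed what a Lua double can hold exactly or the 1e14 threshold,
-- so they are kept as 64-bit integer cdata instead of Lua numbers.
local big = tonumber64('18446744073709551615')
local also_big = 100000000000000ULL

print(type(big), type(also_big))
print(big == 18446744073709551615ULL)
```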
The Tarantool decimal type is stored as a MsgPack ext (Extension). Values with the decimal type are not floating-point values although they may contain decimal points. They are exact with up to 38 digits of precision.
Example: a value returned by a function in the decimal module.
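To illustrate why decimal values are described as exact, the following sketch compares decimal arithmetic with ordinary double arithmetic. It assumes Tarantool's built-in `decimal` module; the variable names are illustrative.

```lua
local decimal = require('decimal')

local a = decimal.new('0.1')
local b = decimal.new('0.2')

-- Decimal arithmetic is exact for these values.
print(a + b == decimal.new('0.3'))

-- The same sum with Lua numbers (doubles) is subject to rounding.
print(0.1 + 0.2 == 0.3)
```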
Introduced in v. 2.10.0.
The Tarantool datetime type facilitates operations with date and time, accounting for leap years or the varying number of days in a month. It is stored as a MsgPack ext (Extension). Operations with this data type use code from c-dt, a third-party library. For more information, see Module datetime.
Since: v. 2.10.0
The Tarantool interval type represents periods of time. Intervals can be added to or subtracted from datetime values or each other. Operations with this data type use code from c-dt, a third-party library. The type is stored as a MsgPack ext (Extension). For more information, see Module datetime.
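A minimal sketch of combining the two types, assuming Tarantool 2.10 or later with the built-in `datetime` module. The dates and variable names are chosen only for illustration.

```lua
local datetime = require('datetime')

local start = datetime.new({year = 2021, month = 8, day = 20})
local step = datetime.interval.new({month = 1, day = 1})

-- Adding an interval to a datetime yields a new datetime.
local later = start + step
print(later.year, later.month, later.day)

-- Subtracting the same interval returns to the starting date.
local back = later - step
print(back == start)
```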
A string is a variable-length sequence of bytes, usually represented with alphanumeric characters inside single quotes. In both Lua and MsgPack, strings are treated as binary data, with no attempts to determine a string’s character set or to perform any string conversion – unless there is an optional collation. So, usually, string sorting and comparison are done byte-by-byte, without any special collation rules applied. For example, numbers are ordered by their point on the number line, so 2345 is greater than 500; meanwhile, strings are ordered by the encoding of the first byte, then the encoding of the second byte, and so on, so '2345' is less than '500'.
Example: 'A, B, C'.
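The byte-by-byte rule can be sketched in plain Lua. The helper `bytewise_less` below is hypothetical and written only to illustrate the ordering; it is not Tarantool's implementation.

```lua
-- Hypothetical helper: compare two strings byte by byte.
local function bytewise_less(a, b)
    local n = math.min(#a, #b)
    for i = 1, n do
        local x, y = a:byte(i), b:byte(i)
        if x ~= y then
            return x < y   -- the first differing byte decides
        end
    end
    return #a < #b         -- a proper prefix sorts first
end

print(2345 > 500)                      -- numeric comparison
print(bytewise_less('2345', '500'))    -- byte comparison: '2' comes before '5'
```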
A bin (binary) value is not directly supported by Lua but there is a Tarantool type varbinary which is encoded as MsgPack binary. For an (advanced) example showing how to insert varbinary into a database, see the Cookbook Recipe for ffi_varbinary_insert.
Example: "\65 \66 \67".
The Tarantool uuid type is used for Universally Unique Identifiers. Since version 2.4.1 Tarantool stores uuid values as a MsgPack ext (Extension).
Example: 64d22e4d-ac92-4a23-899a-e59f34af5479.
An array is represented in Lua with {...} (braces).
Examples: lists of numbers representing points in geometric figures: {10, 11}, {3, 5, 9, 10}.
Lua tables with string keys are stored as MsgPack maps; Lua tables with integer keys starting with 1 are stored as MsgPack arrays. Nils may not be used in Lua tables; the workaround is to use box.NULL.
Example: a box.space.tester:select() request will return a Lua table.
A tuple is a light reference to a MsgPack array stored in the database. It is a special type (cdata) to avoid conversion to a Lua table on retrieval. A few functions may return tables with multiple tuples. For tuple examples, see box.tuple.
Values in a scalar field can be boolean, integer, unsigned, double, number, decimal, string, uuid, or varbinary; but not array, map, or tuple.
Examples: true, 1, 'xxx'.
Values in a field of the any type can be boolean, integer, unsigned, double, number, decimal, string, uuid, varbinary, array, map, or tuple.
Examples: true, 1, 'xxx', {box.NULL, 0}.
Examples of insert requests with different field types:
tarantool> box.space.K:insert{1,nil,true,'A B C',12345,1.2345}
---
- [1, null, true, 'A B C', 12345, 1.2345]
...
tarantool> box.space.K:insert{2,{['a']=5,['b']=6}}
---
- [2, {'a': 5, 'b': 6}]
...
tarantool> box.space.K:insert{3,{1,2,3,4,5}}
---
- [3, [1, 2, 3, 4, 5]]
...
To learn more about what values can be stored in indexed fields, read the Indexes section.
By default, when Tarantool compares strings, it uses the so-called binary collation. It only considers the numeric value of each byte in a string. For example, the encoding of 'A' (what used to be called the “ASCII value”) is 65, the encoding of 'B' is 66, and the encoding of 'a' is 97. Therefore, if the string is encoded with ASCII or UTF-8, then 'A' < 'B' < 'a'.
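The byte values behind this ordering can be checked in plain Lua. The sketch below sorts three single-character strings by their byte value; the list and comparator are illustrative only.

```lua
-- Byte values of the three letters discussed above.
print(string.byte('A'), string.byte('B'), string.byte('a'))

-- Sort single-character strings by byte value, as binary collation would.
local letters = {'a', 'B', 'A'}
table.sort(letters, function(x, y)
    return x:byte() < y:byte()
end)

print(table.concat(letters, ' '))
```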
Binary collation is the best choice for fast, deterministic, simple maintenance and searching with Tarantool indexes.
But if you want the ordering that you see in phone books and dictionaries, then you need Tarantool’s optional collations, such as unicode and unicode_ci, which allow for 'a' < 'A' < 'B' and 'a' == 'A' < 'B' respectively.
The unicode and unicode_ci optional collations use the ordering according to the Default Unicode Collation Element Table (DUCET) and the rules described in Unicode® Technical Standard #10 Unicode Collation Algorithm (UTS #10 UCA). The only difference between the two collations is about weights:
- unicode collation observes L1, L2, and L3 weights (strength = ‘tertiary’);
- unicode_ci collation observes only L1 weights (strength = ‘primary’), so for example 'a' == 'A' == 'á' == 'Á'.
As an example, take some Russian words:
'ЕЛЕ'
'елейный'
'ёлка'
'еловый'
'елозить'
'Ёлочка'
'ёлочный'
'ЕЛь'
'ель'
…and show the difference in ordering and selecting by index:
with unicode collation:
tarantool> box.space.T:create_index('I', {parts = {{field = 1, type = 'str', collation='unicode'}}})
...
tarantool> box.space.T.index.I:select()
---
- - ['ЕЛЕ']
  - ['елейный']
  - ['ёлка']
  - ['еловый']
  - ['елозить']
  - ['Ёлочка']
  - ['ёлочный']
  - ['ель']
  - ['ЕЛь']
...
tarantool> box.space.T.index.I:select{'ЁлКа'}
---
- []
...
with unicode_ci collation:
tarantool> box.space.T:create_index('I', {parts = {{field = 1, type ='str', collation='unicode_ci'}}})
...
tarantool> box.space.T.index.I:select()
---
- - ['ЕЛЕ']
  - ['елейный']
  - ['ёлка']
  - ['еловый']
  - ['елозить']
  - ['Ёлочка']
  - ['ёлочный']
  - ['ЕЛь']
...
tarantool> box.space.T.index.I:select{'ЁлКа'}
---
- - ['ёлка']
...
In all, collation involves much more than these simple examples of upper case / lower case and accented / unaccented equivalence in alphabets. We also consider variations of the same character, non-alphabetic writing systems, and special rules that apply for combinations of characters.
For English, Russian, and most other languages and use cases, use the “unicode” and “unicode_ci” collations. If you need Cyrillic letters ‘Е’ and ‘Ё’ to have the same level-1 weights, try the Kyrgyz collation.
The tailored optional collations: for other languages, Tarantool supplies tailored collations for every modern language that has more than a million native speakers, and for specialized situations such as the difference between dictionary order and telephone book order. Run box.space._collation:select() to see the complete list.
The tailored collation names have the form unicode_[language code]_[strength], where language code is a standard 2-character or 3-character language abbreviation, and strength is s1 for “primary strength” (level-1 weights), s2 for “secondary”, s3 for “tertiary”. Tarantool uses the same language codes as the ones in the “list of tailorable locales” on man pages of Ubuntu and Fedora. Charts explaining the precise differences from DUCET order are in the Common Locale Data Repository.
For better control over stored data, Tarantool supports constraints – user-defined limitations on the values of certain fields or entire tuples. Together with data types, constraints allow limiting the ranges of available field values both syntactically and semantically.
For example, the field age typically has the number type, so it cannot store strings or boolean values. However, it can still have values that don’t make sense, such as negative numbers. This is where constraints come to help.
There are two types of constraints in Tarantool:
- Field constraints check that the value being assigned to a field satisfies a given condition. For example, age must be non-negative.
- Tuple constraints check complex conditions that can involve all fields of a tuple. For example, a tuple contains a date in three fields: year, month, and day. You can validate day values based on the month value (and even year if you consider leap years).
Field constraints work faster, while tuple constraints allow implementing a wider range of limitations.
Constraints use stored Lua functions or SQL expressions, which must return true when the constraint is satisfied. Other return values (including nil) and exceptions make the check fail and prevent tuple insertion or modification.
To create a constraint function, call box.schema.func.create() with the function definition specified in the body attribute.
Constraint functions take two parameters:
The tuple and the constraint name for tuple constraints.
-- Define a tuple constraint function --
box.schema.func.create('check_person', {
    language = 'LUA',
    is_deterministic = true,
    body = 'function(t, c) return (t.age >= 0 and #(t.name) > 3) end'
})
Warning
Tarantool doesn’t check field names used in tuple constraint functions. If a field referenced in a tuple constraint gets renamed, this constraint will break and prevent further insertions and modifications in the space.
The field value and the constraint name for field constraints.
-- Define a field constraint function --
box.schema.func.create('check_age', {
    language = 'LUA',
    is_deterministic = true,
    body = 'function(f, c) return (f >= 0 and f < 150) end'
})
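The body of a constraint function is ordinary Lua, so its behavior can be tried outside Tarantool first. The sketch below copies the check_age body into a local function and calls it with a passing value, a failing value, and a value that raises an error. The constraint name 'age_check' passed as the second argument is hypothetical.

```lua
-- Same logic as the body of check_age, as a plain Lua function.
local function check_age(f, c)
    return (f >= 0 and f < 150)
end

-- Returns true: the constraint would be satisfied.
local ok_valid = check_age(42, 'age_check')

-- Returns false: the constraint would reject the value.
local ok_negative = check_age(-1, 'age_check')

-- Comparing a string with a number raises an error, which also counts
-- as a failed check; pcall captures the error here.
local ok_call = pcall(check_age, 'abc', 'age_check')

print(ok_valid, ok_negative, ok_call)
```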
To create a constraint in a space, specify the corresponding function’s name in the constraint parameter:
Tuple constraints: when creating or altering a space.
-- Create a space with a tuple constraint --
customers = box.schema.space.create('customers', {constraint = 'check_person'})
Field constraints: when setting up the space format.
-- Specify format with a field constraint --
box.space.customers:format({
    {name = 'id', type = 'number'},
    {name = 'name', type = 'string'},
    {name = 'age', type = 'number', constraint = 'check_age'},
})
In both cases, constraint can contain multiple function names passed as a tuple.
Each constraint can have an optional name:
-- Create one more tuple constraint --
box.schema.func.create('another_constraint',
    {language = 'LUA', is_deterministic = true, body = 'function(t, c) return true end'})
-- Set two constraints with optional names --
box.space.customers:alter{
    constraint = { check1 = 'check_person', check2 = 'another_constraint'}
}
Note
When adding a constraint to an existing space with data, Tarantool checks it against the stored data. If there are fields or tuples that don’t satisfy the constraint, it won’t be applied to the space.
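The following sketch shows a field constraint rejecting a value at write time. It assumes a Tarantool version that supports constraints and a freshly configured instance. The names `check_age_demo` and `people_demo` are hypothetical and chosen so that they do not clash with the examples above.

```lua
box.cfg{}

-- Hypothetical constraint function with the same body as check_age.
box.schema.func.create('check_age_demo', {
    language = 'LUA',
    is_deterministic = true,
    body = 'function(f, c) return (f >= 0 and f < 150) end',
    if_not_exists = true,
})

-- Hypothetical space with the constraint attached to the 'age' field.
local people = box.schema.space.create('people_demo', {if_not_exists = true})
people:format({
    {name = 'id', type = 'number'},
    {name = 'age', type = 'number', constraint = 'check_age_demo'},
})
people:create_index('primary', {
    parts = {{field = 'id', type = 'number'}},
    if_not_exists = true,
})

-- A valid age is accepted.
people:replace({1, 30})

-- A negative age fails the check; pcall captures the resulting error.
local ok, err = pcall(people.replace, people, {2, -5})
print(ok, err)
```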
Foreign keys provide links between related fields, therefore maintaining the referential integrity of the database.
Fields can contain values that exist only in other fields. For example, a shop order always belongs to a customer. Hence, all values of the customer field of the orders space must also exist in the id field of the customers space. In this case, customers is a parent space for orders (its child space).
When two spaces are linked with a foreign key, each time a tuple is inserted or modified in the child space, Tarantool checks that a corresponding value is present in the parent space.
Note
A foreign key can link a field to another field in the same space. In this case, the child field must be nullable. Otherwise, it is impossible to insert the first tuple in such a space because there is no parent tuple to which it can link.
There are two types of foreign keys in Tarantool:
- Field foreign keys check that the value being assigned to a field is present in a particular field of another space. For example, the customer value in a tuple from the orders space must match an id stored in the customers space.
- Tuple foreign keys check that multiple fields of a tuple have a match in another space. For example, if the orders space has fields customer_id and customer_name, a tuple foreign key can check that the customers space contains a tuple with both these values in the corresponding fields.
Field foreign keys work faster, while tuple foreign keys allow implementing stricter references.
Important
For each foreign key, there must exist a parent space index that includes all its fields.
To create a foreign key in a space, specify the parent space and linked fields in the foreign_key parameter.
Parent spaces can be referenced by name or by id. When linking to the same space, the space can be omitted.
Fields can be referenced by name or by number:
Field foreign keys: when setting up the space format.
-- Create a space with a field foreign key --
box.schema.space.create('orders')
box.space.orders:format({
    {name = 'id', type = 'number'},
    {name = 'customer_id', foreign_key = {space = 'customers', field = 'id'}},
    {name = 'price_total', type = 'number'},
})
Tuple foreign keys: when creating or altering a space. Note that for foreign keys with multiple fields there must exist an index that includes all these fields.
-- Create a space with a tuple foreign key --
box.schema.space.create("orders", {
    foreign_key = {
        space = 'customers',
        field = {customer_id = 'id', customer_name = 'name'}
    }
})
box.space.orders:format({
    {name = "id", type = "number"},
    {name = "customer_id"},
    {name = "customer_name"},
    {name = "price_total", type = "number"},
})
Note
Type can be omitted for foreign key fields because it’s defined in the parent space.
Foreign keys can have an optional name.
-- Set a foreign key with an optional name --
box.space.orders:alter{
    foreign_key = {
        customer = {
            space = 'customers',
            field = {customer_id = 'id', customer_name = 'name'}
        }
    }
}
A space can have multiple tuple foreign keys. In this case, they all must have names.
-- Set two foreign keys: names are mandatory --
box.space.orders:alter{
    foreign_key = {
        customer = {
            space = 'customers',
            field = {customer_id = 'id', customer_name = 'name'}
        },
        item = {
            space = 'items',
            field = {item_id = 'id'}
        }
    }
}
Tarantool performs integrity checks upon data modifications in parent spaces. If you try to remove a tuple referenced by a foreign key or an entire parent space, you will get an error.
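Both directions of the check can be sketched as follows, assuming a Tarantool version that supports foreign keys and a freshly configured instance. The space names `customers_demo` and `orders_demo` are hypothetical. The explicit type and the secondary index on customer_id are additions of this sketch, not requirements stated above.

```lua
box.cfg{}

-- Hypothetical parent space with an index on the referenced field.
local customers = box.schema.space.create('customers_demo', {if_not_exists = true})
customers:format({
    {name = 'id', type = 'number'},
    {name = 'name', type = 'string'},
})
customers:create_index('primary', {
    parts = {{field = 'id', type = 'number'}},
    if_not_exists = true,
})

-- Hypothetical child space with a field foreign key to the parent.
local orders = box.schema.space.create('orders_demo', {if_not_exists = true})
orders:format({
    {name = 'id', type = 'number'},
    {name = 'customer_id', type = 'number',
     foreign_key = {space = 'customers_demo', field = 'id'}},
})
orders:create_index('primary', {
    parts = {{field = 'id', type = 'number'}},
    if_not_exists = true,
})
-- Assumption of this sketch: index the child field as well.
orders:create_index('customer', {
    parts = {{field = 'customer_id', type = 'number'}},
    unique = false,
    if_not_exists = true,
})

customers:replace({1, 'Alice'})
orders:replace({100, 1})   -- customer 1 exists, so this is accepted

-- No customer with id 999 exists, so the child write is rejected.
local child_ok = pcall(orders.replace, orders, {101, 999})

-- Customer 1 is still referenced by an order, so it cannot be deleted.
local parent_ok = pcall(customers.delete, customers, {1})

print(child_ok, parent_ok)
```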
Important
Renaming parent spaces or referenced fields may break the corresponding foreign keys and prevent further insertions or modifications in the child spaces.