CrateDB¶

About¶

CrateDB is a distributed and scalable SQL database for storing and analyzing massive amounts of data in near real-time, even with complex queries. It is PostgreSQL-compatible, and based on Lucene.

Details¶

This section summarizes CrateDB’s data model and query interface.

Data model¶

As a time-series/document/OLAP/RDBMS database with an SQL interface, CrateDB stores records into tables. Tables are grouped into schemas, which is equivalent to the concept of hosting multiple databases on the same server instance.

The schema of tables/records can be freely defined using a classic SQL DDL statement, leveraging CrateDB’s multi-modal data types. The tables can be queried also by using classic, standards-compliant SQL DQL statements.

A screenshot example of an SQL query submitted to CrateDB.

Other than the record-based scheme of RDBMS databases, CrateDB also allows you to store and retrieve nested data, by providing container types ARRAY and OBJECT, effectively providing document-oriented capabilities like CouchDB or MongoDB.

On disk, CrateDB stores data into Lucene indexes. By default, all fields are indexed, nested or not, but the indexing can be turned off selectively.

Query interface¶

Languages¶

CrateDB supports SQL as query language. Please inspect the Query examples, as well as the corresponding upstream documentation about how to insert data and query data.

Protocols¶

CrateDB clients communicate to servers using either HTTP, or by using the PostgreSQL wire protocol, version 3.

Key features¶

This section enumerates the key features of CrateDB, as advertised on its documentation.

At a glance¶

Use standard SQL via the PostgreSQL wire protocol or an HTTP API.
Dynamic table schemas and queryable objects provide document-oriented features in addition to the relational features of SQL.
Support for time-series data, realtime full-text search, geospatial data types and search capabilities.
Horizontally scalable, highly available and fault tolerant clusters that run very well in virtualized and containerised environments.
Extremely fast distributed query execution.
Auto-partitioning, auto-sharding, and auto-replication.
Self-healing and auto-rebalancing.
User-defined functions (UDFs) can be used to extend the functionality of CrateDB.

Storage layer¶

Lucene¶

The CrateDB storage layer is based on Lucene. This section enumerates some concepts of Lucene, and the article Indexing and Storage in CrateDB goes into more details by exploring its internal workings.

Lucene offers scalable and high-performance indexing which enables efficient search and aggregations over documents and rapid updates to the existing documents. Solr and Elasticsearch are building upon the same technologies.

Documents

A single record in Lucene is called “document”, which is a unit of information for search and indexing that contains a set of fields, where each field has a name and value. A Lucene index can store an arbitrary number of documents, with an arbitrary number of different fields.
Append-only segments

A Lucene index is composed of one or more sub-indexes. A sub-index is called a segment, it is immutable, and built from a set of documents. When new documents are added to the existing index, they are added to the next segment, while previous segments are never modified. If the number of segments becomes too large, the system may decide to merge some segments and discard the freed ones. This way, adding a new document does not require rebuilding the whole index structure completely.
Column store

For text values, other than storing the row data as-is (and indexing each value by default), each value term is stored into a column-based store by default, which offers performance improvements for global aggregations and groupings, and enables efficient ordering, because the data for one column is packed at one place.

In CrateDB, the column store is enabled by default and can be disabled only for text fields, not for other primitive types. Furthermore, CrateDB does not support storing values for container and geospatial types in the column store.

Data structures¶

This section enumerates the three main Lucene data structures that are used within CrateDB: Inverted indexes for text values, BKD trees for numeric values, and DocValues.

Inverted index

The Lucene indexing strategy for text fields relies on a data structure called inverted index, which is defined as a “data structure storing a mapping from content, such as words and numbers, to its location in the database file, document or set of documents”.

Depending on the configuration of a column, the index can be plain (default) or full-text. An index of type “plain” indexes content of one or more fields without analyzing and tokenizing their values into terms. To create a “full-text” index, the field value is first analyzed and based on the used analyzer, split into smaller units, such as individual words. A full-text index is then created for each text unit separately.

The inverted index enables a very efficient search over textual data.
BKD tree

To optimize numeric range queries, Lucene uses an implementation of the Block KD (BKD) tree data structure. The BKD tree index structure is suitable for indexing large multi-dimensional point data sets. It is an I/O-efficient dynamic data structure based on the KD tree. Contrary to its predecessors, the BKD tree maintains its high space utilization and excellent query and update performance regardless of the number of updates performed on it.

Numeric range queries based on BKD trees can efficiently search numerical fields, including fields defined as TIMESTAMP types, supporting performant date range queries.
DocValues

Because Lucene’s inverted index data structure implementation is not optimal for finding field values by given document identifier, and for performing column-oriented retrieval of data, the DocValues data structure is used for those purposes instead.

DocValues is a column-based data storage built at document index time. They store all field values that are not analyzed as strings in a compact column, making it more effective for sorting and aggregations.

Clustering¶

Overview¶

CrateDB splits tables into shards and replicas, meaning that tables are divided and distributed across the nodes of a cluster. Each shard in CrateDB is a Lucene index broken into segments and stored on the filesystem.

CrateDB has been designed with clustering capabilities from the very beginning. The clustering subsystem, effectively and efficiently distributing data amongst multiple storage nodes, is originally based on prior art technology implementations from Elasticsearch, in turn based on both quorum-based consensus algorithms as well as primary-backup approaches.

Benefits¶

Database clusters are effective for storing and retrieving large amounts of data, in the range of billions of records, and terabytes of data.

On data retrieval, CrateDB’s distributed query execution engine parallelizes query workloads across the whole cluster.

By distributing data to multiple machines, and properly configuring replication parameters, you are also adding redundancy to your data, so it is protected against data-loss resulting from fatal failures of individual storage nodes.

Those concepts implement similar features like RAID drives, for the purposes of data redundancy, performance improvements, or both.

Early distributed systems and databases needed manual operations procedures, for example to initiate node fail-over procedures. With CrateDB, the corresponding steps around partitioning, sharding, replication, and rebalancing, are performed unattended and automatically, effectively providing cluster self-healing capabilities.

Complex queries¶

By using the SQL query language, CrateDB provides an advanced query execution layer, unlocking complex querying capabilities like date range filtering, sub-selects, aggregations, JOINs, UNIONs, and CTEs, all within the same SQL statement.

Query API¶

CrateDB provides an HTTP endpoint that can be used to submit SQL queries. As such, any HTTP client, like curl or Postman, can be used to communicate with CrateDB.
The standards-based PostgreSQL wire protocol interface unlocks compatibility with a wide range of client applications which can talk to PostgreSQL servers.

Ecosystem¶

CrateDB offers a wide range of connectivity options. In general, any PostgreSQL- compatible driver or framework can be used to connect to CrateDB.

User interface¶

CrateDB offers both a graphical-, and a commandline-based user interface. In general, any PostgreSQL-compatible applications and systems can be used to connect to CrateDB.

CrateDB Admin, a graphical, web-based user interface, is built into CrateDB.
crash is a command-line based terminal program, similar to psql, but with a bit more convenience.

Query examples¶

This section demonstrates a few query examples from CrateDB’s documentation.

Typical queries¶

-- Aggregations with date range filtering and
-- time bucketing using specified intervals.
SELECT
    DATE_BIN('5 min'::INTERVAL, time::TIMESTAMPTZ, 0) AS time,
    MEAN(fields['humidity']) AS humidity
FROM
    readings
WHERE
    time > NOW() - '1 hour'::INTERVAL
GROUP BY
    time
ORDER BY
    time;

-- An SQL DDL statement defining a custom schema for holding sensor data.
CREATE TABLE iot_data (
  timestamp TIMESTAMP WITH TIME ZONE,
  sensor_data OBJECT (DYNAMIC) AS (
    temperature FLOAT,
    humidity FLOAT,
    location OBJECT (DYNAMIC) AS (
      latitude DOUBLE PRECISION, longitude DOUBLE PRECISION
    )
  )
);

-- Inserting data into the table defined above.
INSERT INTO iot_data (ts, sensor_data) VALUES
    -- Vienna
    ('2022-01-01T01:00:00', '{"temperature": 20.3, "humidity": 50.5, "location": {"latitude": 48.2082, "longitude": 16.3738}}'),
    -- Stockholm
    ('2022-01-01T02:00:00', '{"temperature": 18.0, "humidity": 55.2, "location": {"latitude": 59.3293, "longitude": 18.0686}}'),
    -- Tokyo
    ('2022-01-01T03:00:00', '{"temperature": 24.5, "humidity": 60.8, "location": {"latitude": 35.6895, "longitude": 139.6917}}'),
    -- Sydney
    ('2022-01-01T04:00:00', '{"temperature": 25.7, "humidity": 65.0, "location": {"latitude": -33.8688, "longitude": 151.2093}}');

-- Create a user-defined function to calculate the distance between two coordinates.
CREATE FUNCTION haversine_distance(
  lat1 DOUBLE PRECISION, lon1 DOUBLE PRECISION,
  lat2 DOUBLE PRECISION, lon2 DOUBLE PRECISION
) RETURNS DOUBLE PRECISION LANGUAGE JAVASCRIPT AS '...';

-- Use the user-defined function with nested data.
SELECT
  id,
  haversine_distance(
    sensor_data[ 'location' ][ 'latitude' ],
    sensor_data[ 'location' ][ 'longitude' ],
    40.7128, -74.0060
  ) AS distance
FROM
  iot_data
ORDER BY
  distance;

Advanced queries¶

Time-series data¶

/**
 * Based on device data, this query returns the average
 * of the battery level for every hour for each `device_id`.
**/

WITH avg_metrics AS (
    SELECT device_id,
        DATE_BIN('1 hour'::INTERVAL, time, 0) AS period,
        AVG(battery_level) AS avg_battery_level
    FROM devices.readings
    GROUP BY 1, 2
    ORDER BY 1, 2
)
SELECT period,
    t.device_id,
    manufacturer,
    avg_battery_level
FROM avg_metrics t, devices.info i
WHERE t.device_id = i.device_id
    AND model = 'mustang'
LIMIT 10;

IoT & sensor data¶

/**
 * Based on data acquisition from power metering devices, this query
 * returns the voltage corresponding to the maximum global active power
 * for each `meter_id`.
**/

SELECT meter_id,
    MAX_BY("Voltage", "Global_active_power") AS voltage_max_global_power
FROM iot.power_consumption
GROUP BY 1
LIMIT 10;

Geospatial tracking¶

/**
 * Based on the location of the International Space Station,
 * this query returns the 10 closest capital cities from
 * the last known position.
**/

SELECT city AS "City Name",
    country AS "Country",
    DISTANCE(i.position, c.location)::LONG / 1000 AS "Distance [km]"
FROM demo.iss i
CROSS JOIN demo.world_cities c
WHERE capital = 'primary'
    AND ts = (SELECT MAX(ts) FROM demo.iss)
ORDER BY 3 ASC
LIMIT 10;

Log analysis¶

/**
 * Based on system event logs, this query calculates:
 * - a filter for specific messages using a full-text index
 * - the number of entries per minute
 * - the average scoring ratio for each matched row
**/

SELECT DATE_TRUNC('minute', receivedat) AS event_time,
    COUNT(*) AS entries,
    AVG(_score) AS avg_score
FROM "syslog"."systemevents"
WHERE MATCH(message, 'authentication failure')
USING most_fields WITH (analyzer = 'whitespace')
    AND MATCH(syslogtag, 'sshd')
GROUP BY 1
ORDER BY 1 DESC
LIMIT 10;

Tracking analytics¶

This complex query executes in under 200 milliseconds on two tables containing 6 million records (pageview), respectively 35_000 records (user_session).

/**
 * An analytics query about user visits and metrics.
 * This SQL DQL statement uses date range filtering,
 * sub-selects, aggregations, JOINs, UNIONs, and CTEs.
**/

WITH sessions AS (
  SELECT
    user_id,
    session_id
  FROM
    af_dev.user_session
  WHERE
    user_session.domain = 'domain.com'
    AND user_session.hostname = 'www.domain.com'
    AND user_session.event_time BETWEEN '2022-12-05' AND '2023-01-05'
    AND user_session.device_type IS NOT NULL
),
pageviews AS (
  SELECT
    pageview.totaltime,
    pageview.user_id,
    pageview.event_time
  FROM
    af_dev.pageview
  WHERE
    pageview.event_time BETWEEN '2022-12-05' AND '2023-01-05'
    AND pageview.domain = 'domain.com'
    AND pageview.host = 'www.domain.com'
),
visits AS (
  SELECT
    MAX(totaltime) AS sess_len,
    COUNT(session_id) AS sess_count,
    COUNT(DISTINCT sessions.user_id) AS visitors,
    MIN(event_time) AS event_time
  FROM
    pageviews
    JOIN sessions ON pageviews.user_id = sessions.user_id
  GROUP BY
    pageviews.user_id,
    sessions.session_id
),
psessions AS (
  SELECT
    user_id,
    session_id
  FROM
    af_dev.user_session
  WHERE
    user_session.domain = 'domain.com'
    AND user_session.hostname = 'www.domain.com'
    AND user_session.event_time BETWEEN '2022-11-05' AND '2022-12-05'
    AND user_session.device_type IS NOT NULL
),
ppageviews AS (
  SELECT
    pageview.totaltime,
    pageview.user_id,
    pageview.event_time
  FROM
    af_dev.pageview
  WHERE
    pageview.event_time BETWEEN '2022-11-05' AND '2022-12-05'
    AND pageview.domain = 'domain.com'
    AND pageview.host = 'www.domain.com'
),
pvisits AS (
  SELECT
    MAX(totaltime) AS sess_len,
    COUNT(session_id) AS sess_count,
    COUNT(DISTINCT psessions.user_id) AS visitors,
    MIN(event_time) AS event_time
  FROM
    ppageviews
    JOIN psessions ON ppageviews.user_id = psessions.user_id
  GROUP BY
    ppageviews.user_id,
    psessions.session_id
)
SELECT
      MIN(event_time) AS event_date,
      SUM(visitors) AS tot_vis,
      SUM(visitors) FILTER (WHERE sess_count = 1) AS new_vis,
      SUM(visitors) FILTER (WHERE sess_count > 1) AS ret_vis,
      AVG(sess_len) AS tot_avg,
      AVG(sess_len) FILTER (WHERE sess_count = 1) AS new_avg,
      AVG(sess_len) FILTER (WHERE sess_count > 1) AS ret_avg
FROM visits
UNION
SELECT
      MIN(event_time) AS event_date,
      SUM(visitors) AS tot_vis,
      SUM(visitors) FILTER (WHERE sess_count = 1) AS new_vis,
      SUM(visitors) FILTER (WHERE sess_count > 1) AS ret_vis,
      AVG(sess_len) AS tot_avg,
      AVG(sess_len) FILTER (WHERE sess_count = 1) AS new_avg,
      AVG(sess_len) FILTER (WHERE sess_count > 1) AS ret_avg
FROM pvisits;

Usage¶

Purpose¶

Kotori uses CrateDB to store timeseries-data of data acquisition channels.

Documentation¶

See CrateDB handbook and the CrateDB reference documentation.

Compatibility¶

Kotori supports data acquisition and export with CrateDB 4.2 and higher.