Stores¶
Data Samples
are represented as semi-structured data, such as JSON documents, and may not have a fixed schema. Therefore, any document database capable of efficiently storing and, more importantly, querying this data format can be used as a Data Samples
store.
Of course, you need some way to insert these Data Samples
into the database, but since Data Samples
are encoded as OpenTelemetry (OTLP) logs, and use the OTLP/gRPC protocol for transport, any pipeline capable of ingesting OTLP logs can be used to store Data Samples
in a database.
You need to keep in ming account that once you have stored them in a database, you want to be able to explore them easily and efficiently. For example, you want to be able to perform searches using expressions that can target semi-structured nested objects without a fixed schema (for example, if your Data Sample
has these contents: {id: "1", product_name: "T-Shirt", price: -10 }
, you should be able to create an expression that targets this data similar to sample.price < 0
). You may think that this is pretty common, but in practice, not many open-source databases allow you to easily explore data in this way.
Therefore, this page shows a list of databases and UIs that provide a good Neblic experience.
Grafana Loki¶
Grafana Loki, quoting their website, is a log aggregation system designed to store and query logs from all your applications and infrastructure
. Data Samples
are similar to logs so Loki's
architecture and design fit well with Neblic goals.
It is the primary open-source option for storing Data Samples
due to its ease of deployment and maintenance. In also integrates perfectly with Grafana, making it easy and efficient to explore data samples.
Concepts¶
Note
For more details in how Grafana Loki
works, refer to its documentation. This page only describes some concepts relevant to how Neblic uses it.
Labels¶
Labels are one of the most important concepts of Grafana Loki
. See its documentation for more details. To simplify, you can think of labels as Loki's equivalent to indexes. To use Neblic efficiently, it is recommended to set the Sampler
name and resource (which uniquely identifies a set of samplers) as labels. This way, queries that span one or more Samplers
are performed fast and efficiently. The default configuration using the Neblic collector stores Data Samples
this way.
Configuration¶
No special configuration is required to use Grafana Loki
as a Neblic store, other than making sure that the correct labels are set when indexing Data Samples
.
Visualization¶
To explore Data Samples
stored in Grafana Loki
, Grafana
is the best option.
First, you need to add Grafana Loki
as a Data Source
, the official documentation provides a guide.
Then, you can use the Explore
tab to perform queries and explore your Data Samples
. You will need to:
- Select
Loki
on the top-level dropdown to explore its contents. - Apply a
label filter
selecting one or moresampler_name
labels to filter whatSamplers
you want to explore. - Optionally, add a
json
filter so it parses the contents of theData Samples
.
There are two ways to filter Data Samples
, line and label filter expressions. A line filter expression
, as described in Loki's
documentation, is a distributed grep over the entire Data Sample
body, decoded as a JSON document. It supports regular expressions.
You can also use a label filter expression
when adding a json
filter. This filter will make Grafana Loki
parse the Data Sample
body and create a label for each field. Then, you can use predicates to filter them. See the documentation for more details.
This image shows the labels that get created when you use the json
filter to parse the Data Sample
. Given that Data Samples
are encoded as OpenTelemetry logs, the contents of the Data Sample
are in the body
key. This makes all labels to be prepended by body_
.