Tag cardinality limit
Limit the cardinality of tags on metrics events as a safeguard against cardinality explosion
Configuration
Example configurations
{
"transforms": {
"my_transform_id": {
"type": "tag_cardinality_limit",
"inputs": [
"my-source-or-transform-id"
],
"limit_exceeded_action": "drop_tag",
"mode": "exact",
"value_limit": 500
}
}
}
[transforms.my_transform_id]
type = "tag_cardinality_limit"
inputs = [ "my-source-or-transform-id" ]
limit_exceeded_action = "drop_tag"
mode = "exact"
value_limit = 500
---
transforms:
my_transform_id:
type: tag_cardinality_limit
inputs:
- my-source-or-transform-id
limit_exceeded_action: drop_tag
mode: exact
value_limit: 500
{
"transforms": {
"my_transform_id": {
"type": "tag_cardinality_limit",
"inputs": [
"my-source-or-transform-id"
],
"cache_size_per_tag": 5120000,
"limit_exceeded_action": "drop_tag",
"mode": "exact",
"value_limit": 500
}
}
}
[transforms.my_transform_id]
type = "tag_cardinality_limit"
inputs = [ "my-source-or-transform-id" ]
cache_size_per_tag = 5_120_000
limit_exceeded_action = "drop_tag"
mode = "exact"
value_limit = 500
---
transforms:
my_transform_id:
type: tag_cardinality_limit
inputs:
- my-source-or-transform-id
cache_size_per_tag: 5120000
limit_exceeded_action: drop_tag
mode: exact
value_limit: 500
cache_size_per_tag
optional uint
The size, in bytes, of the cache kept for each tag key; larger caches lower the chance of falsely admitting a new tag value. Default: 5120000 (bytes). Relevant when: mode = "probabilistic"
inputs
required [string]
A list of upstream source or transform IDs. Wildcards (*) are supported.
See configuration for more info.
limit_exceeded_action
common optional string literal enum

Option | Description
---|---
drop_event | Drop any metric events that contain tags that would exceed the configured limit
drop_tag | Remove tags that would exceed the configured limit from the incoming metric

Default: drop_tag
mode
required string literal enum

Option | Description
---|---
exact | Has higher memory requirements than probabilistic, but never falsely outputs metrics with new tags after the limit has been hit.
probabilistic | Has lower memory requirements than exact, but may occasionally allow metric events to pass through the transform even when they contain new tags that exceed the configured limit. The rate at which this happens can be controlled by changing the value of cache_size_per_tag.
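To make these options concrete, here is a minimal Python sketch of how exact mode behaves conceptually. This is not Vector's implementation; the function and variable names are illustrative, and the drop_tag / drop_event branches mirror the limit_exceeded_action options described above.

```python
# Illustrative sketch only, not Vector's implementation: exact mode keeps
# the set of distinct values seen for each tag key and applies
# limit_exceeded_action once value_limit distinct values have been seen.

value_limit = 1                      # mirrors the example later on this page
limit_exceeded_action = "drop_tag"   # or "drop_event"

seen = {}  # tag key -> set of accepted values

def limit_tags(tags):
    """Return the tags to keep, or None if the whole event should be dropped."""
    kept = {}
    for key, value in tags.items():
        values = seen.setdefault(key, set())
        if value in values or len(values) < value_limit:
            values.add(value)
            kept[key] = value
        elif limit_exceeded_action == "drop_event":
            return None
        # drop_tag: omit this tag but keep the rest of the event
    return kept

print(limit_tags({"user_id": "user_id_1"}))  # {'user_id': 'user_id_1'}
print(limit_tags({"user_id": "user_id_2"}))  # {} (tag dropped, limit reached)
```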
Outputs
<component_id>
Default output stream of the component. Use this component's ID as an input to downstream transforms and sinks.
Telemetry
Metrics
component_received_event_bytes_total (counter)
component_received_events_count (histogram)
component_received_events_total (counter)
component_sent_event_bytes_total (counter)
component_sent_events_total (counter)
events_in_total (counter; deprecated, use component_received_events_total instead)
events_out_total (counter; deprecated, use component_sent_events_total instead)
processed_bytes_total (counter)
processed_events_total (counter; deprecated, use the component_received_events_total and component_sent_events_total metrics instead)
tag_value_limit_exceeded_total (counter; incremented when a tag exceeds the configured value_limit)
utilization (gauge)
value_limit_reached_total (counter)
All of these metrics are tagged with the component_id of this transform.
Examples
Drop high-cardinality tag
Given these events...
[{"metric":{"counter":{"value":2},"kind":"incremental","name":"logins","tags":{"user_id":"user_id_1"}}},{"metric":{"counter":{"value":2},"kind":"incremental","name":"logins","tags":{"user_id":"user_id_2"}}}]
[transforms.my_transform_id]
type = "tag_cardinality_limit"
inputs = [ "my-source-or-transform-id" ]
value_limit = 1
limit_exceeded_action = "drop_tag"
---
transforms:
my_transform_id:
type: tag_cardinality_limit
inputs:
- my-source-or-transform-id
value_limit: 1
limit_exceeded_action: drop_tag
{
"transforms": {
"my_transform_id": {
"type": "tag_cardinality_limit",
"inputs": [
"my-source-or-transform-id"
],
"fields": {
"value_limit": 1,
"limit_exceeded_action": "drop_tag"
}
}
}
}
[{"metric":{"counter":{"value":2},"kind":"incremental","name":"logins","tags":{"user_id":"user_id_1"}}},{"metric":{"counter":{"value":2},"kind":"incremental","name":"logins","tags":{}}}]
How it works
Intended Usage
This transform is intended to be used as a protection mechanism against upstream
mistakes, such as a developer accidentally adding a high-cardinality request_id
tag. When this happens, it is recommended to fix the upstream error as soon as
possible, because Vector's cardinality cache is held in memory and is erased when
Vector is restarted. A restart causes new tag values to pass through until the
cardinality limit is reached again. For normal usage this should not be a common
problem, since Vector processes are normally long-lived.
Memory Utilization Estimates
This transform stores in memory a copy of the key for every tag on every metric
event seen by this transform. In exact mode, a copy of every distinct value for
each key is also kept in memory, until value_limit distinct values have been seen
for a given key, at which point new values for that key will be rejected. To
estimate the memory usage of this transform in exact mode, you can use the
following formula:

(number of distinct field names in the tags for your metrics * average length of
the field names for the tags) + (number of distinct field names in the tags of
your metrics * `value_limit` * average length of the values of tags for your
metrics)
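As a quick back-of-the-envelope illustration of this formula, the following Python sketch plugs in made-up numbers; replace them with values measured from your own metrics.

```python
# Rough memory estimate for exact mode, using the formula above.
# The inputs are hypothetical; substitute numbers observed from your metrics.

distinct_tag_keys = 20   # distinct field names across your metrics' tags
avg_key_length = 12      # average length of those field names, in bytes
avg_value_length = 36    # average length of tag values, in bytes
value_limit = 500        # the configured value_limit

exact_mode_bytes = (
    distinct_tag_keys * avg_key_length
    + distinct_tag_keys * value_limit * avg_value_length
)
print(f"estimated exact-mode memory: {exact_mode_bytes / 1024:.0f} KiB")
# -> roughly 352 KiB for these example numbers
```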
In probabilistic mode, rather than storing all values seen for each key, each
distinct key has a Bloom filter which can probabilistically determine whether a
given value has been seen for that key. The formula for estimating memory usage
in probabilistic mode is:

(number of distinct field names in the tags for your metrics * average length of
the field names for the tags) + (number of distinct field names in the tags of
your metrics * `cache_size_per_tag`)
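The same kind of estimate can be made for probabilistic mode, again with hypothetical inputs and the documented default cache_size_per_tag:

```python
# Rough memory estimate for probabilistic mode, using the formula above.
# Hypothetical inputs; cache_size_per_tag uses the documented default.

distinct_tag_keys = 20           # distinct field names across your metrics' tags
avg_key_length = 12              # average length of those field names, in bytes
cache_size_per_tag = 5_120_000   # bytes per tag key (the default)

probabilistic_mode_bytes = (
    distinct_tag_keys * avg_key_length
    + distinct_tag_keys * cache_size_per_tag
)
print(f"estimated probabilistic-mode memory: "
      f"{probabilistic_mode_bytes / 1024**2:.0f} MiB")
# -> roughly 98 MiB for these example numbers
```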
The cache_size_per_tag option controls the size of the Bloom filter used for
storing the set of acceptable values for any single key. The larger the Bloom
filter, the lower the false positive rate, which in our case means the less
likely we are to allow a new tag value that would otherwise violate a configured
limit. If you want to know the exact false positive rate for a given
cache_size_per_tag and value_limit, there are many free online Bloom filter
calculators that can answer this. The formula is generally presented in terms of
'n', 'p', 'k', and 'm', where 'n' is the number of items in the filter
(value_limit in our case), 'p' is the probability of false positives (what we
want to solve for), 'k' is the number of hash functions used internally, and 'm'
is the number of bits in the Bloom filter. You should be able to provide values
for just 'n' and 'm' and get back the value for 'p' with an optimal 'k' selected
for you. Remember when converting from cache_size_per_tag to the 'm' value to
plug into the calculator that cache_size_per_tag is in bytes, while 'm' is
usually presented in bits (1/8 of a byte).
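If you would rather compute the rate directly than use an online calculator, the standard Bloom filter approximation is p ≈ (1 - e^(-kn/m))^k with an optimal k ≈ (m/n) * ln 2. A small illustrative Python sketch:

```python
import math

# Standard Bloom filter false-positive approximation:
#   p ~= (1 - e^(-k*n/m))^k, with the optimal k ~= (m/n) * ln(2)
# Here n is value_limit (items) and m is cache_size_per_tag converted to bits.

value_limit = 500                # n: distinct values allowed per tag key
cache_size_per_tag = 5_120_000   # bytes (the documented default)

n = value_limit
m = cache_size_per_tag * 8       # bits in the Bloom filter
k = max(1, round((m / n) * math.log(2)))  # optimal number of hash functions

# Work in log space, since p underflows to 0.0 for large filters.
log10_p = k * math.log10(1 - math.exp(-k * n / m))
print(f"k = {k}, false positive rate ~= 10^{log10_p:.0f}")
```

With the default cache size and a value_limit of 500, the false positive rate is vanishingly small; the calculation mainly matters if you shrink cache_size_per_tag to reduce memory usage.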