file source

Ingests data through one or more local files and outputs `log` events.

The file source is in beta. Please see the current enhancements and bugs for known issues. If you run into an issue that is not listed, please add it; this helps shape the roadmap of this component.

Config File

vector.toml (simple)

[sources.my_source_id]
type = "file" # must be: "file"
include = ["/var/log/nginx/*.log"] # one or more paths or glob patterns to tail

Examples

Given the following input:

/var/log/rails.log
2019-02-13T19:48:34+00:00 [info] Started GET "/" for 127.0.0.1

A log event will be emitted with the following structure:

log
{
  "timestamp": <timestamp>, # current time
  "message": "2019-02-13T19:48:34+00:00 [info] Started GET \"/\" for 127.0.0.1",
  "file": "/var/log/rails.log", # original file
  "host": "10.2.22.122" # current hostname
}

The "timestamp", "file", and "host" keys were automatically added as context. You can further parse the "message" key with a transform, such as the regex_parser transform.
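As a rough illustration of parsing the "message" key, the sketch below wires a regex_parser transform to the source from the example above. The transform ID, the option names (inputs, field, regex), and the capture group names are assumptions made for illustration; check the regex_parser reference for the options your Vector version accepts.

```toml
# Hypothetical transform ID; "my_source_id" matches the source defined earlier.
[transforms.my_transform_id]
type = "regex_parser"
inputs = ["my_source_id"]
# Assumed option: the key to parse.
field = "message"
# Named capture groups become keys on the event (group names are illustrative).
regex = '^(?P<timestamp>\S+) \[(?P<level>\w+)\] (?P<message>.*)$'
```

With the sample line above, this pattern would capture "2019-02-13T19:48:34+00:00" as the timestamp group and "info" as the level group.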

How It Works

Auto Discovery

Vector will continually look for new files matching any of your include patterns. The frequency is controlled via the glob_minimum_cooldown option. If a new file is added that matches any of the supplied patterns, Vector will begin tailing it. Vector maintains a unique list of files and will not tail a file more than once, even if it matches multiple patterns. You can read more about how Vector identifies files in the File Identification section.
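A minimal sketch of a discovery setup with two overlapping patterns follows. The paths and the cooldown value are placeholders, and the unit of glob_minimum_cooldown is assumed here to be milliseconds; confirm it against the option reference.

```toml
[sources.my_source_id]
type = "file"
# A file matching both patterns is still tailed only once.
include = ["/var/log/nginx/*.log", "/var/log/nginx/access*.log"]
# Assumed unit: milliseconds between scans for new files.
glob_minimum_cooldown = 1000
```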

Checkpointing

Vector checkpoints the current read position in each file after each successful read. This ensures that Vector resumes where it left off when restarted, so data is not read twice. Checkpoint positions are stored in the data directory, which is specified via the global data_dir option and can be overridden via the data_dir option on the file source itself.
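The relationship between the global and per-source data_dir options might look like the sketch below. Both directory paths are placeholders chosen for illustration.

```toml
# Global option: default location for checkpoint data (placeholder path).
data_dir = "/var/lib/vector"

[sources.my_source_id]
type = "file"
include = ["/var/log/nginx/*.log"]
# Optional override for this source only (placeholder path).
data_dir = "/var/lib/vector/nginx"
```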

Context

By default, the file source adds the "file" and "host" context keys to each event. The names of these keys are controlled via the file_key and host_key options.
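For instance, the context keys could be renamed as sketched below. The replacement key names are hypothetical and only meant to show where the options go.

```toml
[sources.my_source_id]
type = "file"
include = ["/var/log/nginx/*.log"]
# Hypothetical key names; events would carry these instead of "file" and "host".
file_key = "source_file"
host_key = "hostname"
```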

Delivery Guarantee

Due to the nature of this component, it offers a best-effort delivery guarantee.

Environment Variables

Environment variables are supported throughout Vector's configuration. Add ${MY_ENV_VAR} to your Vector configuration file and the variable will be replaced before the configuration is evaluated.

You can learn more in the Environment Variables section.
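As a small illustration, the include path below is built from an environment variable. LOG_DIR is a hypothetical variable name; it would need to be set in the environment Vector is started from.

```toml
[sources.my_source_id]
type = "file"
# ${LOG_DIR} is replaced with the variable's value before the config is evaluated.
include = ["${LOG_DIR}/*.log"]
```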

File Deletion

When a watched file is deleted, Vector will maintain its open file handle and continue reading until it reaches EOF. When a file can no longer be found via the include globs and the reader has reached EOF, that file's reader is discarded.

File Identification

By default, Vector identifies files by computing a cyclic redundancy check (CRC) over the first 256 bytes of the file. This serves as a fingerprint to uniquely identify the file. The number of bytes read can be controlled via the fingerprint_bytes and ignored_header_bytes options.

This strategy avoids the common pitfalls of using device and inode numbers, since inode numbers can be reused across files. This enables Vector to properly tail files across various rotation strategies.
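A sketch of tuning the fingerprint is shown below for files that all begin with the same header. The option names come from the text above, but their placement as top-level keys on the source and the ignored_header_bytes value are assumptions; consult the option reference for your Vector version.

```toml
[sources.my_source_id]
type = "file"
include = ["/var/log/nginx/*.log"]
# 256 is the default described above.
fingerprint_bytes = 256
# Hypothetical value: skip a fixed-size header that is identical across files.
ignored_header_bytes = 64
```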

File Read Order

By default, Vector attempts to allocate its read bandwidth fairly across all of the files it's currently watching. This prevents a single very busy file from starving other independent files of reads. In certain situations, however, this can lead to interleaved reads from files that should be read one after the other.

For example, consider a service that logs to a timestamped file, creating a new one at an interval and leaving the old one as-is. Under normal operation, Vector would follow writes as they happen to each file and there would be no interleaving. In an overload situation, however, Vector may pick up and begin tailing newer files before catching up to the latest writes from older files. This would cause writes from a single logical log stream to be interleaved in time and potentially slow down ingestion as a whole, since the fixed total read bandwidth is allocated across an increasing number of files.

To address this type of situation, Vector provides the oldest_first flag. When set, Vector will not read from any file younger than the oldest file that it hasn't yet caught up to. In other words, Vector will continue reading from older files as long as there is more data to read. Only once it hits the end will it then move on to read from younger files.

Whether or not to use the oldest_first flag depends on the organization of the logs you're configuring Vector to tail. If your include glob contains multiple independent logical log streams (e.g. nginx's access.log and error.log, or logs from multiple services), you are likely better off with the default behavior. If you're dealing with a single logical log stream or if you value per-stream ordering over fairness across streams, consider setting oldest_first to true.
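For the single-stream case described above, a configuration might look like the following sketch. The service name and path are placeholders.

```toml
[sources.my_source_id]
type = "file"
# One logical stream written to a series of timestamped files (placeholder path).
include = ["/var/log/my_service/*.log"]
# Finish older files before reading from younger ones.
oldest_first = true
```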

File Rotation

Vector supports tailing across a number of file rotation strategies. The default behavior of logrotate is simply to move the old log file and create a new one. This requires no special configuration of Vector, as it will maintain its open file handle to the rotated log until it has finished reading and it will find the newly created file normally.

A popular alternative strategy is copytruncate, in which logrotate will copy the old log file to a new location before truncating the original. Vector will also handle this well out of the box, but there are a couple of configuration options that will help reduce the very small chance of missed data in some edge cases. We recommend a combination of delaycompress (if applicable) on the logrotate side and including the first rotated file in Vector's include option. This allows Vector to find the file after rotation, read it uncompressed to identify it, and then ensure it has all of the data, including any written in a gap between Vector's last read and the actual rotation event.
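On the Vector side, the copytruncate recommendation could be sketched as below. The file names are assumptions: the name of the first rotated file depends on your logrotate settings, and ".1" is only a common convention.

```toml
[sources.my_source_id]
type = "file"
include = [
  "/var/log/nginx/access.log",
  # First rotated file, assumed to be kept uncompressed via delaycompress.
  "/var/log/nginx/access.log.1",
]
```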

Globbing

Globbing is supported in all provided file paths. Files are autodiscovered continually at a rate defined by the glob_minimum_cooldown option.

Line Delimiters

Each line is read until a newline delimiter (the 0xA byte) or EOF is found.

Read Position

By default, Vector will read new data only for newly discovered files, similar to the tail command. You can read from the beginning of the file by setting the start_at_beginning option to true.

Previously discovered files will be checkpointed, and the read position will resume from the last checkpoint.
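To read newly discovered files from their first byte rather than from the end, the option can be set as in this sketch (the path is a placeholder):

```toml
[sources.my_source_id]
type = "file"
include = ["/var/log/nginx/*.log"]
# Applies to newly discovered files; checkpointed files resume from their checkpoint.
start_at_beginning = true
```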

Troubleshooting

The best place to start with troubleshooting is to check the Vector logs, which are typically located at /var/log/vector.log. Then proceed to follow the Troubleshooting Guide.

If the Troubleshooting Guide does not resolve your issue, please:

  1. If you encountered a bug, please file a bug report.

  2. If you are missing a feature, please file a feature request.

  3. If you need help, join our chat/forum community. You can post a question and search previous questions.

Resources