When Vector serves as an agent, its purpose is to efficiently and quietly collect data. In this scenario, Vector is typically sharing a host with a more important service. Therefore, it is critically important that Vector is a good citizen, limiting its resource usage and efficiently forwarding data.
If you are not deploying Vector in a platform context, then data collection must be achieved through more generic means, such as journald, a file, or stdin. The method you use depends on your setup. In general, we recommend avoiding stdin unless reducing disk usage is a top priority. Stdin is limiting because it is coupled to a single input stream: you cannot restart Vector independently of the process feeding it, and you cannot accept data from multiple streams at the same time. This makes it more difficult to manage.
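As a rough sketch of the file-based and journald-based approaches, assuming Vector's YAML configuration format, the fragment below declares one source of each kind. The source names `app_logs` and `system_journal` and the log path are hypothetical placeholders. A complete configuration would also need at least one sink.

```yaml
# Illustrative fragment only: component names and paths are hypothetical.
sources:
  # Tail log files written by the primary service on this host.
  app_logs:
    type: file
    include:
      - /var/log/my-app/*.log

  # Read entries from the systemd journal.
  system_journal:
    type: journald
```

Because both sources read from data that lives outside the Vector process, several of them can run side by side, which is not possible with a single stdin stream.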
If you're forwarding data to a downstream Vector service, then you should use the vector sink. The downstream Vector service should use the vector source. This handles communication between Vector instances.
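A minimal sketch of this pairing, assuming YAML configuration on both sides, might look like the following. The component names, the host name `aggregator.example.internal`, and port `6000` are assumptions made for illustration. Check the option names against the reference for your Vector version.

```yaml
# Agent side (fragment): forward events to the downstream Vector service.
sinks:
  to_downstream:
    type: vector
    inputs:
      - app_logs   # hypothetical source defined elsewhere in the agent config
    address: aggregator.example.internal:6000   # hypothetical host and port
```

```yaml
# Downstream side (fragment): accept events sent by agents.
sources:
  from_agents:
    type: vector
    address: 0.0.0.0:6000   # must match the port the agents send to
```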
If you are not forwarding data to a downstream Vector service, then you can choose any sink you'd like, but be mindful of how many sinks you're using, as more sinks usually mean more resource usage. If you find that Vector is consuming too many resources, then you should provision additional resources, or consider a centralized or stream-based topology to push resource usage downstream.
Nothing prevents you from sending data to both a downstream Vector service and another independent service. This often makes the most sense when the independent service is designed for streaming, as it takes load off the downstream Vector service. For example, you might want to take advantage of BigQuery's streaming inserts. This feature is designed for rapid streaming, and it has the added benefit of making data quickly available for querying. To implement this, you can forgo using a centralized Vector service entirely and stream data directly from your client nodes.
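To illustrate sending the same data to two destinations, the sketch below lists one source as the input of two sinks. It is not a BigQuery integration: the generic `http` sink merely stands in for the independent streaming service. The endpoint URL, host name, port, and component names are all hypothetical.

```yaml
# Illustrative fragment: one hypothetical source feeding two sinks.
sinks:
  # Forward to the downstream Vector service.
  to_downstream:
    type: vector
    inputs:
      - app_logs
    address: aggregator.example.internal:6000   # hypothetical host and port

  # Stream the same events directly to an independent service.
  to_streaming_service:
    type: http
    inputs:
      - app_logs
    uri: https://ingest.example.com/events      # hypothetical endpoint
    encoding:
      codec: json
```

Removing the `to_downstream` sink from this fragment corresponds to the fully direct case, where client nodes stream to the independent service without a centralized Vector service.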
Vector is designed to be highly efficient, but this does not preclude it from consuming excessive resources in certain scenarios. This is not ideal for an agent, where priority should be given to the primary service on the host. Therefore, we recommend limiting Vector's resource usage. We strongly believe resource limiting should be achieved at higher levels, and depending on your platform, this can be achieved through a variety of means. For example:
If none of the above links are relevant, please refer to your platform's documentation on limiting resources.
In general, we recommend the following minimum limitations:
CPU >= 5%
Memory >= 256 MB
A good resource manager will allow Vector to use more CPU and memory when available. You should not have to provide any additional limitations.
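As one possible illustration, assuming Kubernetes is the platform, these minimums could be expressed as resource requests on the Vector container. The container name is hypothetical, and the sketch assumes that "5%" means 5% of a single CPU core (`50m`). Adjust the value if you measure against the whole host.

```yaml
# Fragment of a pod spec, not a complete manifest.
containers:
  - name: vector        # hypothetical container name
    resources:
      requests:
        cpu: 50m        # assumption: 5% of one core
        memory: 256Mi   # close to the 256 MB minimum above
```

Requests reserve a minimum for the container. With no explicit limits set, the container can typically use spare capacity when it is available, subject to any defaults your cluster applies.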
Vector can be reloaded to apply configuration changes. This is the recommended strategy and should be used instead of restarting whenever possible.
To update Vector, you'll need to restart the process. This is why we recommend using sources that use the disk as a buffer: doing so decouples the Vector process from the source process, which allows other processes to continue running while Vector restarts. When Vector is restarted, it can resume where it left off.
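A file-based source is one example of this pattern. In the sketch below, the application keeps writing to its log files while Vector is down, and Vector stores its read positions under `data_dir` so it can pick up from them after a restart. The source name and paths are hypothetical.

```yaml
# Directory where Vector persists state, such as file read positions.
data_dir: /var/lib/vector

sources:
  app_logs:               # hypothetical source name
    type: file
    include:
      - /var/log/my-app/*.log   # hypothetical path written by the primary service
```

The `data_dir` location should survive restarts and upgrades. If it is wiped, Vector has no record of where it stopped reading.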