A Helm chart for Running VMCluster on Multiple Availability Zones
Install the following packages: git, kubectl, helm, helm-docs. See this tutorial.
PV support on underlying infrastructure.
Multiple availability zones.
This chart sets up multiple VictoriaMetrics instances (cluster or single-node version; cluster by default) across multiple availability zones and provides both global write and read entrypoints.
The default setup is as shown below:
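For write:
- vmauth-global-write: the global write entrypoint; proxies requests to one of the per-zone vmagent instances with the least_loaded policy.
- vmagent (per zone): remote-writes data to the availability zones that have .Values.availabilityZones[*].write.allow enabled, and buffers data on disk when a zone is unavailable to ingest.
- vmauth-write-balancer (per zone): proxies write requests to the vminsert instances in its zone with the least_loaded policy.

For read:
- vmauth-read-balancer (per zone): proxies query requests to the vmselect instances in its zone with the least_loaded policy.
- vmauth-read-proxy (per zone): uses all the vmauth-read-balancer instances as servers if the zone has .Values.availabilityZones[*].read.allow enabled, and always prefers the "local" vmauth-read-balancer to reduce cross-zone traffic, using the first_available policy.
- vmauth-global-read: the global read entrypoint; proxies requests to the per-zone vmauth-read-proxy instances with the first_available policy.
- Grafana (optional): uses vmauth-global-read as the default datasource.

Note: As the topology above shows, this chart doesn't include components like vmalert, Alertmanager, etc. by default. You can install them using the victoria-metrics-k8s-stack dependency or as a separate release.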
Why use the victoria-metrics-distributed chart?
One of the best practices for running a production Kubernetes cluster is to run it across multiple availability zones. Apart from the Kubernetes control plane components, we also want to spread our application pods across multiple zones, so that they keep serving even if a zone outage happens.
VictoriaMetrics natively supports data replication, which guarantees data availability when some of the vmstorage instances fail. However, it doesn't work well when vmstorage instances are spread across multiple availability zones, since all replicas of the same data could end up in a single availability zone and be lost when that zone goes down. To avoid this, a vmcluster must be installed in each availability zone, each containing a full (100%) copy of the data. As long as one zone is available, both the global write and read entrypoints should keep working without interruption.
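To make the per-zone layout more concrete, the sketch below shows what a values.yaml override declaring the two zones used throughout this README might look like. It is an illustration under stated assumptions, not the chart's defaults: it only uses keys named elsewhere in this README (availabilityZones[*].name, write.allow, read.allow and common.spec.nodeSelector); topology.kubernetes.io/zone is the standard Kubernetes zone label, while the label values eu-west-1a and us-east-1a are hypothetical placeholders for your own zone names.

```yaml
# Sketch of a values.yaml override with two availability zones.
# Each zone gets its own vmcluster holding a full copy of the data.
availabilityZones:
  - name: zone-eu-1
    write:
      allow: true   # this zone receives the data replicated by the per-zone vmagents
    read:
      allow: true   # this zone may serve queries
    common:
      spec:
        nodeSelector:
          # Hypothetical label value; use the zone labels of your own nodes.
          topology.kubernetes.io/zone: eu-west-1a
  - name: zone-us-1
    write:
      allow: true
    read:
      allow: true
    common:
      spec:
        nodeSelector:
          # Hypothetical label value; use the zone labels of your own nodes.
          topology.kubernetes.io/zone: us-east-1a
```

Helm replaces lists instead of merging them, so an availabilityZones override replaces the chart's default list entirely; compare it with the output of helm show values before applying it.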
The chart provides vmauth-global-write as the global write entrypoint. It supports the same push-based data ingestion protocols as VictoriaMetrics.
Optionally, you can push data to any of the per-zone vmagents, and they will replicate the received data across zones.
The chart provides vmauth-global-read as the global read entrypoint. It picks the first available zone as its preferred datasource and automatically switches to the next zone if the first one is unavailable; see the vmauth first_available policy for more details.
If you have services like vmalert or Grafana deployed in each zone, configure them to use the local vmauth-read-proxy. The per-zone vmauth-read-proxy always prefers the "local" vmcluster for querying, which reduces cross-zone traffic.
You can also use other proxies as the global read entrypoint, for example a Kubernetes Service that supports Topology Aware Routing.
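If Grafana is provisioned from files, a datasource that uses the global read entrypoint could be declared roughly as below. The provisioning structure (apiVersion, datasources, type, access, isDefault) is Grafana's own format. The service name is an assumption, inferred by analogy with the vmauth-global-write URL in the vmagent example of this README, and the /select/0/prometheus path (tenant 0 of a VictoriaMetrics cluster) is also an assumption; confirm both with kubectl get svc -n NAMESPACE and the vmauth routing in your release.

```yaml
# Grafana datasource provisioning sketch (a file under provisioning/datasources/).
apiVersion: 1
datasources:
  - name: VictoriaMetrics (global read)
    type: prometheus   # VictoriaMetrics serves a Prometheus-compatible query API
    access: proxy
    # ASSUMPTION: service name inferred from the vmauth-global-write naming pattern
    # with release name "vmd"; the path assumes tenant 0. Verify both.
    url: http://vmauth-vmauth-global-read-vmd-vm-distributed:8427/select/0/prometheus
    isDefault: true
```

For a Grafana instance deployed in a single zone, the same entry would instead point at that zone's vmauth-read-proxy, so that queries stay local.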
If availability zone zone-eu-1 is experiencing an outage, vmauth-global-write and vmauth-global-read keep working without interruption:
- vmauth-global-write automatically stops proxying write requests to zone-eu-1;
- vmauth-global-read and vmauth-read-proxy automatically stop proxying read requests to zone-eu-1;
- vmagent in zone-us-1 fails to send data to zone-eu-1.vmauth-write-balancer and starts to buffer data on disk (unless -remoteWrite.disableOnDiskQueue is specified, which is not recommended for this topology).
To keep data complete in all availability zones, make sure vmagent has enough disk space for the buffer; see this doc for size recommendations. To avoid getting incomplete responses from zone-eu-1 after it recovers from the outage, check vmagent in zone-us-1 to see whether its persistent queue has been drained. If not, remove zone-eu-1 from serving queries by setting .Values.availabilityZones.{zone-eu-1}.read.allow=false, and change it back after confirming that all data has been restored.
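As a sketch, assuming the zones are managed through the availabilityZones list in values.yaml, taking zone-eu-1 out of query serving while it keeps ingesting the replayed data might look like this. The zone names are the ones used in this README; because Helm replaces lists instead of merging them, the excerpt repeats every zone, and any other per-zone settings you already have must stay in the same entries.

```yaml
# values.yaml excerpt: zone-eu-1 keeps receiving writes (including the data
# replayed from vmagent's persistent queue) but stops serving queries.
availabilityZones:
  - name: zone-eu-1
    write:
      allow: true
    read:
      allow: false   # switch back to true once the queue has been drained
  - name: zone-us-1
    write:
      allow: true
    read:
      allow: true
```

Apply the change with the same command used for other configuration updates, e.g. helm upgrade vmd vm/victoria-metrics-distributed -f values.yaml -n NAMESPACE, then set read.allow back to true the same way.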
By default, all data written to vmauth-global-write belongs to tenant 0. To write data to different tenants, set .Values.enableMultitenancy=true and create new tenant users for vmauth-global-write.
For example, to write data to tenant 1088, follow these steps:
1. Create a tenant VMUser for vmauth-global-write to use, and add an extra VMUser selector to the VMAuth vmauth-global-write so that it picks up the new user:
```
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMUser
metadata:
  name: tenant-1088-rw
  labels:
    tenant-test: "true"
spec:
  targetRefs:
---
# Extra VMUser selector in the VMAuth vmauth-global-write
spec:
  userSelector:
    matchLabels:
      tenant-test: "true"
```
2. Send data to vmauth-global-write using the credentials of the VMUser above.
Example command using vmagent:
/path/to/vmagent -remoteWrite.url=http://vmauth-vmauth-global-write-$ReleaseName-vm-distributed:8427/prometheus/api/v1/write -remoteWrite.basicAuth.username=tenant-1088 -remoteWrite.basicAuth.password=secret
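A Prometheus server could push to the same endpoint for tenant 1088. The sketch below reuses the URL and basic-auth credentials from the vmagent command above; remote_write and basic_auth are standard Prometheus configuration keys. Prometheus does not expand $ReleaseName in its config, so the sketch assumes the release name vmd used in this README's installation commands and a Prometheus running in the same namespace.

```yaml
# prometheus.yml excerpt: remote-write samples to tenant 1088 via vmauth-global-write.
remote_write:
  # Assumes release name "vmd" and the same namespace as the chart;
  # otherwise use the fully qualified service name.
  - url: http://vmauth-vmauth-global-write-vmd-vm-distributed:8427/prometheus/api/v1/write
    basic_auth:
      username: tenant-1088
      password: secret   # example credentials from this README; keep real ones in a Secret
```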
Access a Kubernetes cluster.
Add the chart Helm repository with the following commands:
helm repo add vm https://victoriametrics.github.io/helm-charts/
helm repo update
List the versions of the vm/victoria-metrics-distributed chart available for installation:
helm search repo vm/victoria-metrics-distributed -l
Install victoria-metrics-distributed chart
Export the default values of the victoria-metrics-distributed chart to the file values.yaml:
For HTTPS repository
helm show values vm/victoria-metrics-distributed > values.yaml
For OCI repository
helm show values oci://ghcr.io/victoriametrics/helm-charts/victoria-metrics-distributed > values.yaml
Change the values in the values.yaml file according to the needs of your environment.
Test the installation with the command:
For HTTPS repository
helm install vmd vm/victoria-metrics-distributed -f values.yaml -n NAMESPACE --debug --dry-run
For OCI repository
helm install vmd oci://ghcr.io/victoriametrics/helm-charts/victoria-metrics-distributed -f values.yaml -n NAMESPACE --debug --dry-run
Install the chart with the command:
For HTTPS repository
helm install vmd vm/victoria-metrics-distributed -f values.yaml -n NAMESPACE
For OCI repository
helm install vmd oci://ghcr.io/victoriametrics/helm-charts/victoria-metrics-distributed -f values.yaml -n NAMESPACE
Get the list of pods by running this command:
kubectl get pods -A | grep 'vmd'
Get the application by running this command:
helm list -f vmd -n NAMESPACE
See the version history of the vmd application with the command:
helm history vmd -n NAMESPACE
To keep serving queries and ingestion while upgrading component versions or changing configurations, it's recommended to perform maintenance on one availability zone at a time.
First, perform the update on availability zone zone-eu-1:
1. Remove zone-eu-1 from serving queries by setting .Values.availabilityZones.{zone-eu-1}.read.allow=false;
2. Run helm upgrade vmd vm/victoria-metrics-distributed -f values.yaml -n NAMESPACE with the updated configurations for zone-eu-1 in values.yaml;
3. Wait until all the components in zone-eu-1 are running;
4. Wait until the zone-us-1 vmagent persistent queue for zone-eu-1 has been drained, then add zone-eu-1 back to serving queries by setting .Values.availabilityZones.{zone-eu-1}.read.allow=true.

Then perform the update on availability zone zone-us-1 with the same steps 1 to 4.
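For step 2 on zone-eu-1, the values.yaml passed to helm upgrade might look like the sketch below: zone-eu-1 stays out of query serving while its configuration changes, and zone-us-1 is left untouched until its own turn. The retentionPeriod change is only a hypothetical example of a configuration update (vmcluster.spec.retentionPeriod is the per-zone field used in this README's upgrade example). Helm replaces the availabilityZones list as a whole, so keep every zone and all of its existing settings in the file.

```yaml
# values.yaml during maintenance of zone-eu-1 (step 2).
availabilityZones:
  - name: zone-eu-1
    write:
      allow: true
    read:
      allow: false             # set in step 1; switched back to true in step 4
    vmcluster:
      spec:
        retentionPeriod: "30"  # hypothetical change being rolled out to this zone first
  - name: zone-us-1
    write:
      allow: true
    read:
      allow: true              # keeps serving all queries during the maintenance
    vmcluster:
      spec:
        retentionPeriod: "14"  # unchanged until zone-us-1 is updated
```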
The introduction of VMCluster's requestsLoadBalancer made it possible to simplify the distributed chart setup by removing the VMAuth CRs for read and write load balancing. Some parameters are no longer needed and were removed:
- availabilityZones[*].write.vmauth
- availabilityZones[*].read.perZone.vmauth
- zoneTpl.write.vmauth
- zoneTpl.read.perZone.vmauth

Other parameters were moved:
- zoneTpl.read.crossZone.vmauth to zoneTpl.read.vmauth
- availabilityZones[*].read.crossZone.vmauth to availabilityZones[*].read.vmauth
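As an illustration of the moved parameters, a zone template override that previously lived under read.crossZone.vmauth would be relocated as sketched below. The discoverBackendIPs extra argument is borrowed from this README's other upgrade example and stands in for whatever settings you actually had; the two YAML documents are the before and after versions of the same excerpt, not a single values file.

```yaml
# Before the upgrade: cross-zone read vmauth settings in the zone template.
zoneTpl:
  read:
    crossZone:
      vmauth:
        spec:
          extraArgs:
            discoverBackendIPs: "true"
---
# After the upgrade: the same settings live directly under read.vmauth.
# The per-zone read and write vmauth sections are dropped, because VMCluster's
# requestsLoadBalancer now handles load balancing inside each zone.
zoneTpl:
  read:
    vmauth:
      spec:
        extraArgs:
          discoverBackendIPs: "true"
```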
This release was refactored, and the names of the following parameters were changed:
- vmauthIngestGlobal was changed to write.global.vmauth
- vmauthQueryGlobal was changed to read.global.vmauth
- availabilityZones[*].allowIngest was changed to availabilityZones[*].write.allow
- availabilityZones[*].allowRead was changed to availabilityZones[*].read.allow
- availabilityZones[*].nodeSelector was moved to availabilityZones[*].common.spec.nodeSelector
- availabilityZones[*].extraAffinity was moved to availabilityZones[*].common.spec.affinity
- availabilityZones[*].topologySpreadConstraints was moved to availabilityZones[*].common.spec.topologySpreadConstraints
- availabilityZones[*].vmauthIngest was moved to availabilityZones[*].write.vmauth
- availabilityZones[*].vmauthQueryPerZone was moved to availabilityZones[*].read.perZone.vmauth
- availabilityZones[*].vmauthCrossAZQuery was moved to availabilityZones[*].read.crossZone.vmauth
Example:
If you had the following configuration before the upgrade:
vmauthIngestGlobal:
  spec:
    extraArgs:
      discoverBackendIPs: "true"
vmauthQueryGlobal:
  spec:
    extraArgs:
      discoverBackendIPs: "true"
availabilityZones:
  - name: zone-eu-1
    vmauthIngest:
      spec:
        extraArgs:
          discoverBackendIPs: "true"
    vmcluster:
      spec:
        retentionPeriod: "14"
After the upgrade, it will look like this:
write:
  global:
    vmauth:
      spec:
        extraArgs:
          discoverBackendIPs: "true"
read:
  global:
    vmauth:
      spec:
        extraArgs:
          discoverBackendIPs: "true"
availabilityZones:
  - name: zone-eu-1
    write:
      vmauth:
        spec:
          extraArgs:
            discoverBackendIPs: "true"
    vmcluster:
      spec:
        retentionPeriod: "14"
Remove the application with the command:
helm uninstall vmd -n NAMESPACE
Install helm-docs following the instructions in this tutorial.
Generate the docs with the helm-docs command:
cd charts/victoria-metrics-distributed
helm-docs
The markdown generation is entirely Go template driven. The tool parses metadata from charts and generates a number of sub-templates that can be referenced in a template file (by default README.md.gotmpl). If no template file is provided, the tool has a default internal template that will generate a reasonably formatted README.
The following table lists the configurable parameters of the chart and their default values.
Change the values in the victoria-metrics-distributed/values.yaml file according to the needs of your environment.