# Multi-Cluster Monitoring and Alerting with VictoriaMetrics
An earlier article, Prometheus + Thanos for multi-cluster monitoring and alerting, covered how to monitor multiple clusters with Thanos; this one walks through doing the same with VictoriaMetrics.
To start, here is a comparison of VictoriaMetrics and Thanos: https://my.oschina.net/u/4148359/blog/4531605 . It was written by a core VictoriaMetrics developer, so it probably leans towards VictoriaMetrics; still, in my own experience VictoriaMetrics really is lighter: it is simpler to deploy and uses fewer resources. With Thanos, for example, even with only two clusters connected, the gateway component was given 10 GB of memory and was still regularly OOM-killed and restarted.
VictoriaMetrics itself is a high-performance time-series database that natively supports the Prometheus API and comes with a range of companion components. For monitoring small to medium-sized clusters the single-node version is more than enough, and that is what this article uses.
| Cluster | Installed components |
|---|---|
| Cluster A | Prometheus-operator, Prometheus, Alertmanager, Grafana, kube-state-metrics, node-exporter, prometheus-adapter, VictoriaMetrics, vmalert |
| Cluster B | Prometheus-operator, Prometheus, kube-state-metrics, node-exporter, prometheus-adapter |
## Cluster A deployment
### VictoriaMetrics
The easiest way is to install it with helm; see https://github.com/VictoriaMetrics/helm-charts for the details.
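If you go the helm route, a minimal install might look roughly like the following sketch (the repository URL and chart name come from that README; the release name victoria-single and the monitoring namespace are just assumptions for this example):

```shell
# Add the VictoriaMetrics chart repository and install the single-node chart
helm repo add vm https://victoriametrics.github.io/helm-charts/
helm repo update
helm install victoria-single vm/victoria-metrics-single -n monitoring
```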
For reference, the individual resource manifests of the single-node version are listed below.
clusterrole.yaml
```yaml
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: victoria-metrics-single-clusterrole
  namespace: monitoring
  labels:
    helm.sh/chart: victoria-metrics-single-0.6.1
rules:
  - apiGroups: ['extensions']
    resources: ['podsecuritypolicies']
    verbs: ['use']
    resourceNames: [victoria-metrics-single]
```
serviceaccount.yaml
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    helm.sh/chart: victoria-metrics-single-0.6.1
  name: victoria-metrics-single
  namespace: monitoring
```
clusterrolebinding.yaml
```yaml
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: victoria-metrics-single-clusterrolebinding
  namespace: monitoring
  labels:
    helm.sh/chart: victoria-metrics-single-0.6.1
subjects:
  - kind: ServiceAccount
    name: victoria-metrics-single
    namespace: monitoring
roleRef:
  kind: ClusterRole
  name: victoria-metrics-single-clusterrole
  apiGroup: rbac.authorization.k8s.io
```
role.yaml
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: victoria-metrics-single
  namespace: monitoring
  labels:
    helm.sh/chart: victoria-metrics-single-0.6.1
rules:
  - apiGroups: ['extensions']
    resources: ['podsecuritypolicies']
    verbs: ['use']
    resourceNames: [victoria-metrics-single]
```
rolebinding.yaml
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: victoria-metrics-single
  namespace: monitoring
  labels:
    helm.sh/chart: victoria-metrics-single-0.6.1
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: victoria-metrics-single
subjects:
  - kind: ServiceAccount
    name: victoria-metrics-single
    namespace: monitoring
```
podsecuritypolicy.yaml
```yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: victoria-metrics-single
  namespace: monitoring
  labels:
    helm.sh/chart: victoria-metrics-single-0.6.1
  annotations:
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: 'docker/default'
    seccomp.security.alpha.kubernetes.io/defaultProfileName: 'docker/default'
spec:
  privileged: false
  allowPrivilegeEscalation: false
  requiredDropCapabilities:
    - ALL
  volumes:
    - 'configMap'
    - 'emptyDir'
    - 'projected'
    - 'secret'
    - 'downwardAPI'
    - 'persistentVolumeClaim'
  hostNetwork: false
  hostIPC: false
  hostPID: false
  runAsUser:
    rule: 'RunAsAny'
  seLinux:
    rule: 'RunAsAny'
  supplementalGroups:
    rule: 'RunAsAny'
  fsGroup:
    rule: 'RunAsAny'
  readOnlyRootFilesystem: false
```
server-statefulset.yaml
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  namespace: monitoring
  labels:
    app: server
    app.kubernetes.io/name: victoria-metrics-single
    app.kubernetes.io/instance: victoria-single
    helm.sh/chart: victoria-metrics-single-0.6.1
  name: victoria-metrics-single-server
spec:
  serviceName: victoria-metrics-single-server
  selector:
    matchLabels:
      app: server
      app.kubernetes.io/name: victoria-metrics-single
      app.kubernetes.io/instance: victoria-single
  replicas: 1
  podManagementPolicy: OrderedReady
  template:
    metadata:
      labels:
        app: server
        app.kubernetes.io/name: victoria-metrics-single
        app.kubernetes.io/instance: victoria-single
        helm.sh/chart: victoria-metrics-single-0.6.1
    spec:
      automountServiceAccountToken: true
      containers:
        - name: victoria-metrics-single-server
          image: "victoriametrics/victoria-metrics:v1.45.0"
          imagePullPolicy: "IfNotPresent"
          args:
            - "--retentionPeriod=1"
            - "--storageDataPath=/storage"
            - -dedup.minScrapeInterval=10s
            - --envflag.enable=true
            - --envflag.prefix=VM_
            - --loggerFormat=json
          ports:
            - name: http
              containerPort: 8428
          livenessProbe:
            initialDelaySeconds: 5
            periodSeconds: 15
            tcpSocket:
              port: http
            timeoutSeconds: 5
          readinessProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 5
            periodSeconds: 15
            timeoutSeconds: 5
          resources:
            limits:
              cpu: 1000m
              memory: 5000Mi
            requests:
              cpu: 100m
              memory: 512Mi
          volumeMounts:
            - name: server-volume
              mountPath: /storage
              subPath: ""
      serviceAccountName: victoria-metrics-single
      terminationGracePeriodSeconds: 60
  volumeClaimTemplates:
    - metadata:
        name: server-volume
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: "30Gi"
        storageClassName: "nfs-storage"
```
server-service-headless.yaml
```yaml
apiVersion: v1
kind: Service
metadata:
  namespace: monitoring
  labels:
    app: server
    app.kubernetes.io/name: victoria-metrics-single
    app.kubernetes.io/instance: victoria-single
    helm.sh/chart: victoria-metrics-single-0.6.1
  name: victoria-metrics-single-server
spec:
  clusterIP: None
  ports:
    - name: http
      port: 8428
      protocol: TCP
      targetPort: http
  selector:
    app: server
    app.kubernetes.io/name: victoria-metrics-single
    app.kubernetes.io/instance: victoria-single
```
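Once the StatefulSet is running, a quick sanity check can be done through the Prometheus-compatible HTTP API; a sketch, assuming kubectl access to the monitoring namespace and no authentication in front of VictoriaMetrics:

```shell
# Forward the VictoriaMetrics HTTP port locally
kubectl -n monitoring port-forward svc/victoria-metrics-single-server 8428:8428 &

# The health endpoint should return OK
curl -s http://127.0.0.1:8428/health

# A trivial query exercises the Prometheus-compatible query API
curl -s -G 'http://127.0.0.1:8428/api/v1/query' --data-urlencode 'query=1'
```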
### Prometheus-operator
Install it with kube-prometheus, following the earlier prometheus-operator installation article.
The file that needs to be modified is prometheus-prometheus.yaml.
Add the following:
```yaml
externalLabels:
  cluster: cluster-a  # queried data will carry this cluster label, used to tell clusters apart
remoteWrite:
  - url: http://<victoriametrics-addr>:8428/api/v1/write
```
Remove the following:
```yaml
alerting:
  alertmanagers:
    - name: alertmanager-main
      namespace: monitoring
      port: web
```
Prometheus no longer needs to talk to Alertmanager directly, because alerting is handed over to the vmalert component below.
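Once Prometheus has reloaded the new configuration, its samples should start arriving in VictoriaMetrics tagged with the cluster label. A quick way to confirm this (reusing the port-forward from the earlier sanity check, or any other route to port 8428):

```shell
# Samples pushed by cluster A's Prometheus should carry cluster="cluster-a"
curl -s -G 'http://127.0.0.1:8428/api/v1/query' --data-urlencode 'query=up{cluster="cluster-a"}'

# List all values seen for the cluster label
curl -s 'http://127.0.0.1:8428/api/v1/label/cluster/values'
```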
### VMAlert
vmalert is deployed here via the VictoriaMetrics operator; see https://github.com/VictoriaMetrics/operator
1. Download the operator resource files
```shell
export VM_VERSION=`basename $(curl -fs -o/dev/null -w %{redirect_url} https://github.com/VictoriaMetrics/operator/releases/latest)`
wget https://github.com/VictoriaMetrics/operator/releases/download/$VM_VERSION/bundle_crd.zip
unzip bundle_crd.zip
```
By default the resources are installed into the monitoring-system namespace; the following command switches them to a custom namespace:
```shell
sed -i "s/namespace: monitoring-system/namespace: YOUR_NAMESPACE/g" release/operator/*
```
2. Create the CRDs
```shell
kubectl apply -f release/crds
```

3. Create the operator
```shell
kubectl apply -f release/operator/
```

4. Create vmalert.yaml
```yaml
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMAlert
metadata:
  name: vmalert
  namespace: monitoring
spec:
  replicaCount: 1
  image:
    repository: victoriametrics/vmalert
    tag: v1.40.0
    pullPolicy: IfNotPresent
  datasource:
    url: "http://victoria-metrics-single-server:8428"
  notifier:
    url: "http://alertmanager-main:9093"
  remoteWrite:
    url: "http://victoria-metrics-single-server:8428"
    flushInterval: 1m
  remoteRead:
    url: "http://victoria-metrics-single-server:8428"
    lookback: 1h
  evaluationInterval: "30s"
  ruleSelector:
    matchLabels:
      prometheus: k8s
      role: alert-rules
```
Apply the resources above and vmalert is in place.
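The ruleSelector above matches rule objects labelled prometheus: k8s and role: alert-rules; the operator is also able to convert existing PrometheusRule objects carrying those labels. As a sketch only (the rule name and expression below are made up for illustration), a VMRule that this selector would pick up could look like:

```shell
# A hypothetical VMRule carrying the labels matched by the ruleSelector above
cat <<'EOF' | kubectl apply -f -
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMRule
metadata:
  name: example-alert-rules
  namespace: monitoring
  labels:
    prometheus: k8s
    role: alert-rules
spec:
  groups:
    - name: example.rules
      rules:
        - alert: TargetDown
          expr: up == 0
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Target {{ $labels.instance }} in cluster {{ $labels.cluster }} is down"
EOF
```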
### Grafana
Add a data source in Grafana: the type is still Prometheus, and the URL is simply the VictoriaMetrics address: http://victoria-metrics-single-server:8428
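If Grafana is provisioned from files rather than configured through the UI, the same data source can be declared declaratively; a minimal sketch, assuming the stock Grafana provisioning directory (adjust the path, or the ConfigMap/Secret, to match how your Grafana is deployed):

```shell
# Sketch of a Grafana datasource provisioning file pointing at VictoriaMetrics
cat <<'EOF' > /etc/grafana/provisioning/datasources/victoriametrics.yaml
apiVersion: 1
datasources:
  - name: VictoriaMetrics
    type: prometheus
    access: proxy
    url: http://victoria-metrics-single-server:8428
    isDefault: false
EOF
```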
Expose the VictoriaMetrics service through an Ingress so that other clusters can push data to it:
```yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  labels:
    app: victoria-metrics
  name: victoria-metrics-ingress
  namespace: monitoring
spec:
  rules:
    - host: victoria.abc.com
      http:
        paths:
          - backend:
              serviceName: victoria-metrics-single-server
              servicePort: 8428
            path: /
```
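Once DNS (or a hosts entry) for victoria.abc.com points at the ingress controller, the endpoint can be checked from outside the cluster; a sketch, assuming no authentication on the Ingress:

```shell
# The health endpoint should answer through the Ingress
curl -s http://victoria.abc.com/health

# Queries work through the same host, e.g. data already written by cluster A
curl -s -G 'http://victoria.abc.com/api/v1/query' --data-urlencode 'query=up{cluster="cluster-a"}'
```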
## Cluster B deployment
1. Edit prometheus-prometheus.yaml following the same steps as for cluster A (the externalLabels cluster value should identify this cluster, e.g. cluster-b), except that remoteWrite must point at the Ingress address created above:
```yaml
remoteWrite:
  - url: http://victoria.abc.com/api/v1/write
```
2. Add a resolution entry for victoria.abc.com to the CoreDNS configuration:
```shell
kubectl edit cm -n kube-system coredns
```

Add a hosts block to the Corefile:

```
hosts {
    172.16.27.115 victoria.abc.com
    fallthrough
}
```
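To double-check that the name resolves inside cluster B, a throwaway pod can be used (the image and tag here are just an example):

```shell
# Should resolve to the address added to the CoreDNS hosts block
kubectl run dns-test -it --rm --restart=Never --image=busybox:1.28 -- nslookup victoria.abc.com
```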
3. Follow the earlier prometheus-operator installation article; only kube-state-metrics, node-exporter, prometheus, prometheus-adapter, and prometheus-operator need to be installed.
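With cluster B's Prometheus remote-writing through the Ingress, its samples should show up in VictoriaMetrics alongside cluster A's, distinguished by the cluster label. A quick check from anywhere that can reach the Ingress (assuming cluster B uses cluster-b as its external label):

```shell
# Both clusters should now appear as values of the cluster label
curl -s 'http://victoria.abc.com/api/v1/label/cluster/values'

# Per-cluster target counts confirm data is arriving from both sources
curl -s -G 'http://victoria.abc.com/api/v1/query' --data-urlencode 'query=count(up) by (cluster)'
```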
That completes the multi-cluster monitoring setup based on VictoriaMetrics and Prometheus-operator. To bring another cluster in, just repeat the cluster B steps.
Source: https://mp.weixin.qq.com/s?__biz=MzAwNjMyMzYyNg==&mid=2247483931&idx=1&sn=65ba5f3519ca5c7ad1c59b6d1e2894c8&chksm=9b0e6edeac79e7c894879270bb7b841f7abba8ab2ff0a6b3103599760055646f81bcc0d98a92&mpshare=1&scene=1&srcid=03244DxSQTg0xNkG4BzayHaC&sharer_sharetime=1648104370587&sharer_shareid=b37f669367a61860107a8412c2f3cc74#rd