Installation environment
On Rocky Linux 8.9, the following were installed manually:
Prometheus
Alertmanager
Grafana

 

Installing Prometheus/Alertmanager
  • Prometheus/Alertmanager download site

https://prometheus.io/download/

 


  • Prometheus GitHub

https://github.com/prometheus/prometheus

 


 

https://grafana.com/grafana/download

 


 

Creating user accounts

 

# Add users
# useradd -M -r -s /bin/false prometheus
# useradd -M -r -s /bin/false alertmanager
# useradd -M -r -s /bin/false grafana
  • Copy the download URL of the version to install, then install it

 

  • Installation
# cd /root

# wget https://github.com/prometheus/prometheus/releases/download/v2.52.0/prometheus-2.52.0.linux-amd64.tar.gz

# tar xvfz prometheus-2.52.0.linux-amd64.tar.gz

# mv /root/prometheus-2.52.0.linux-amd64 /root/prometheus

# mkdir /root/prometheus_data   # separate directory for the collected data

# cd /root/prometheus

# (Option 1) To keep the collected data in the default ./data directory, run as follows (& runs it in the background)
# ./prometheus --config.file=prometheus.yml  &

# (Option 2) When there is a lot of collected data and separate SAN/NAS storage is used, point to a dedicated directory
# ./prometheus --config.file=prometheus.yml --storage.tsdb.path=/root/prometheus_data &
[1] 7189

# Prometheus is running normally if port 9090 is listed
# netstat -ntpa |grep LISTEN
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      830/sshd
tcp6       0      0 :::22                   :::*                    LISTEN      830/sshd
tcp6       0      0 :::9090                 :::*                    LISTEN      7189/./prometheus
  • Open the server's IP on port 9090 in a browser; if the Prometheus page appears, the installation is working
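
The port check above can also be scripted. The following is a minimal sketch using only the Python standard library; the function name is_port_open, the host 127.0.0.1 and the 2-second timeout are assumptions for illustration and are not part of the original setup.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 127.0.0.1 is an assumption: run this on the Prometheus server itself
    print("prometheus (9090) listening:", is_port_open("127.0.0.1", 9090))
```

The same helper can be reused for Alertmanager (9093) and Grafana (3000) by changing the port argument.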

 

Installing Alertmanager

 

  • Installation
# cd /root/prometheus/
# wget https://github.com/prometheus/alertmanager/releases/download/v0.27.0/alertmanager-0.27.0.linux-amd64.tar.gz

# tar xvfz alertmanager-0.27.0.linux-amd64.tar.gz

# mv alertmanager-0.27.0.linux-amd64 /root/prometheus/alertmanager

# cd /root/prometheus/alertmanager

# ./alertmanager &

# ps -ef |grep alertmanager
root        2611    1403  0 15:35 pts/1    00:00:10 ./alertmanager
root        3387    1121  0 16:14 pts/0    00:00:00 grep --color=auto alertmanager

# Alertmanager is running normally if port 9093 is listening
# netstat -ntpa |grep LISTEN
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      832/sshd
tcp6       0      0 :::22                   :::*                    LISTEN      832/sshd
tcp6       0      0 :::3000                 :::*                    LISTEN      3299/grafana
tcp6       0      0 :::9090                 :::*                    LISTEN      2541/./prometheus
tcp6       0      0 :::9093                 :::*                    LISTEN      2611/./alertmanager
tcp6       0      0 :::9094                 :::*                    LISTEN      2611/./alertmanager

 

  • Accessing Alertmanager (web UI on port 9093)

 

Installing Grafana

  • Installation (Standalone Linux Binaries)
# cd /root/

# wget https://dl.grafana.com/enterprise/release/grafana-enterprise-11.0.0.linux-amd64.tar.gz

# tar -zxvf grafana-enterprise-11.0.0.linux-amd64.tar.gz

# mv grafana-enterprise-11.0.0.linux-amd64 /root/grafana

# cd /root/grafana/bin

# ./grafana-server &

# ps -ef |grep grafana
root        3299    1403  0 15:58 pts/1    00:00:06 grafana server
root        3408    1403  0 16:20 pts/1    00:00:00 grep --color=auto grafana

# Grafana is running normally if port 3000 is listed
# netstat -ntpa |grep LISTEN
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      832/sshd
tcp6       0      0 :::22                   :::*                    LISTEN      832/sshd
tcp6       0      0 :::3000                 :::*                    LISTEN      3299/grafana
tcp6       0      0 :::9090                 :::*                    LISTEN      2541/./prometheus
tcp6       0      0 :::9093                 :::*                    LISTEN      2611/./alertmanager
tcp6       0      0 :::9094                 :::*                    LISTEN      2611/./alertmanager

 

  • Access: open 192.168.56.120:3000 in a browser (the initial Grafana login is admin/admin)

 

 

 

  • Updated 2024.06.07

 


https://prometheus.io/docs/instrumenting/exporters/

 


 

 

https://github.com/mindprince/nvidia_gpu_prometheus_exporter

 


 

 

# Install Go
# apt-get install golang

# go version
go version go1.18.1 linux/amd64

 

 

# Install nvidia-docker
  • Add the GPG key and the repository
# distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
   && curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
   && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
  • Install nvidia-docker
# sudo apt-get update
# sudo apt-get install -y nvidia-docker2

# sudo systemctl restart docker

# nvidia-docker

Usage:  docker [OPTIONS] COMMAND

A self-sufficient runtime for containers

Common Commands:
  run         Create and run a new container from an image
  exec        Execute a command in a running container
  ps          List containers
  build       Build an image from a Dockerfile
  pull        Download an image from a registry
  push        Upload an image to a registry
  images      List images
  login       Log in to a registry
  logout      Log out from a registry
  search      Search Docker Hub for images
  version     Show the Docker version information
  info        Display system-wide information

 

 

 


 

What is Pushgateway?
Batch jobs on a server run on a regular schedule, such as hourly or daily. A batch job starts, does its work, and then exits. Because it does not run continuously, Prometheus cannot reliably scrape metrics from it. That is why Pushgateway is needed.

Pushgateway is a metrics cache for service-level batch jobs.
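
To make the push model concrete, here is a minimal sketch of what a batch job sends to Pushgateway, built with the Python standard library only. Pushgateway accepts metrics in the Prometheus text exposition format on /metrics/job/<job>; the helper name build_push_request, the gauge type, and the gateway address in the usage lines are assumptions for illustration.

```python
import urllib.request


def build_push_request(gateway: str, job: str, metrics: dict) -> urllib.request.Request:
    """Build a POST request that pushes gauge values for one job to Pushgateway."""
    lines = []
    for name, value in metrics.items():
        lines.append(f"# TYPE {name} gauge")  # every metric is treated as a gauge here
        lines.append(f"{name} {value}")
    # The text exposition format requires a trailing newline
    body = ("\n".join(lines) + "\n").encode("utf-8")
    url = f"http://{gateway}/metrics/job/{job}"
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "text/plain; version=0.0.4"},
        method="POST",  # POST replaces only metrics with the same name in the group
    )


if __name__ == "__main__":
    # Hypothetical gateway address; only the request is built, nothing is sent
    req = build_push_request("192.168.56.128:9091", "batch", {"my_job_duration_seconds": 1.5})
    print(req.get_method(), req.full_url)
    # To actually push: urllib.request.urlopen(req, timeout=5)
```

Prometheus then scrapes the cached values from Pushgateway on its normal schedule, even though the batch job itself has already exited.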

 

https://prometheus.io/download/

 


 

# Download and install pushgateway
# pwd
/etc/prometheus

# wget https://github.com/prometheus/pushgateway/releases/download/v1.7.0/pushgateway-1.7.0.linux-amd64.tar.gz

# ls -al
-rw-r--r--   1 root       root       10273763 Jan 19 22:30 pushgateway-1.7.0.linux-amd64.tar.gz

# tar -zxvf pushgateway-1.7.0.linux-amd64.tar.gz
pushgateway-1.7.0.linux-amd64/
pushgateway-1.7.0.linux-amd64/LICENSE
pushgateway-1.7.0.linux-amd64/pushgateway
pushgateway-1.7.0.linux-amd64/NOTICE

# ls
alertmanager  console_libraries  consoles  ep-examples-master  prometheus.yml  prometheus.yml.20240225  pushgateway-1.7.0.linux-amd64  pushgateway-1.7.0.linux-amd64.tar.gz

# mv pushgateway-1.7.0.linux-amd64 pushgateway

# rm pushgateway-1.7.0.linux-amd64.tar.gz

 

# Register as a Linux service
  • Copy the pushgateway binary to /usr/local/bin/
# cd /etc/prometheus/pushgateway

# ls -al
total 17736
drwxr-xr-x 2       1001       1002     4096 Jan 19 22:30 .
drwxr-xr-x 7 prometheus prometheus     4096 Mar  9 14:19 ..
-rw-r--r-- 1       1001       1002    11357 Jan 19 22:29 LICENSE
-rw-r--r-- 1       1001       1002      487 Jan 19 22:29 NOTICE
-rwxr-xr-x 1       1001       1002 18135918 Jan 19 22:29 pushgateway

# cp pushgateway /usr/local/bin/

# cd /usr/local/bin/

# chown prometheus:prometheus pushgateway

# ls -al
total 317708
drwxr-xr-x  2 root         root              4096 Mar  9 14:23 .
drwxr-xr-x 10 root         root              4096 Aug 10  2023 ..
-rwxr-xr-x  1 alertmanager alertmanager  37345962 Mar  3 09:49 alertmanager
-rwxr-xr-x  1 root         root          17031320 Feb 25 14:58 docker-compose
-rwxr-xr-x  1         1001         1002  19925095 Nov 13 08:54 node_exporter
-rwxr-xr-x  1 prometheus   prometheus   119902884 Sep 30 06:13 prometheus
-rwxr-xr-x  1 prometheus   prometheus   112964537 Sep 30 06:15 promtool
-rwxr-xr-x  1 prometheus   prometheus    18135918 Mar  9 14:23 pushgateway
  • Create the pushgateway.service file
# cd /etc/systemd/system

# vi pushgateway.service   # (add the content below)

[Unit]
Description=Push_Gateway
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/pushgateway

[Install]
WantedBy=multi-user.target
  • Check the service
# systemctl daemon-reload

# systemctl enable pushgateway
Created symlink /etc/systemd/system/multi-user.target.wants/pushgateway.service → /etc/systemd/system/pushgateway.service.

# systemctl start pushgateway

# systemctl status pushgateway
● pushgateway.service - Push_Gateway
     Loaded: loaded (/etc/systemd/system/pushgateway.service; enabled; vendor preset: enabled)
     Active: active (running) since Sat 2024-03-09 14:33:08 KST; 5s ago
   Main PID: 2134 (pushgateway)
      Tasks: 6 (limit: 2219)
     Memory: 4.5M
        CPU: 63ms
     CGroup: /system.slice/pushgateway.service
             └─2134 /usr/local/bin/pushgateway

Mar 09 14:33:08 servidor systemd[1]: Started Push_Gateway.
Mar 09 14:33:08 servidor pushgateway[2134]: ts=2024-03-09T05:33:08.625Z caller=main.go:86 level=info msg="starting pushgateway" version="(version=1.7.0, branch=HEAD, revision=109280c17d29059623c6f5dbf1d6babab34166cf)"
Mar 09 14:33:08 servidor pushgateway[2134]: ts=2024-03-09T05:33:08.626Z caller=main.go:87 level=info build_context="(go=go1.21.6, platform=linux/amd64, user=root@c05cb3457dcb, date=20240119-13:28:37, tags=unknown)"
Mar 09 14:33:08 servidor pushgateway[2134]: ts=2024-03-09T05:33:08.642Z caller=tls_config.go:313 level=info msg="Listening on" address=[::]:9091
Mar 09 14:33:08 servidor pushgateway[2134]: ts=2024-03-09T05:33:08.642Z caller=tls_config.go:316 level=info msg="TLS is disabled." http2=false address=[::]:9091

# netstat -ntpa |grep LISTEN
tcp6       0      0 :::9091                 :::*                    LISTEN      2134/pushgateway
  • Check web access at 192.168.56.128:9091

 

Register Pushgateway in the prometheus.yml file on the Prometheus server

 

# cd /etc/prometheus

# pwd
/etc/prometheus
  • The listing below is my own sample prometheus.yml; add only the following block under scrape_configs:
- job_name: pushgateway
  honor_labels: true
  static_configs:
    - targets: ['192.168.56.128:9091']
# vi prometheus.yml

# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - 192.168.56.128:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "/etc/prometheus/alertmanager/rules/test_rule.yml"
  - "/etc/prometheus/alertmanager/rules/alert_rules.yml"
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.


scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.

  - job_name: "node_exporter"
    static_configs:
      - targets: ["192.168.56.128:9100"]
      - targets: ["192.168.56.130:9100"]

  - job_name: 'PostgreSQL_exporter'
    static_configs:
      - targets: ['192.168.56.130:9187', '192.168.56.128:9187']
        #      - targets: ['192.168.56.128:9187']

  - job_name: 'jmx_exporter'
    scrape_interval: 5s
    static_configs:
      - targets: ['192.168.56.130:8081']

  - job_name: 'kubernetes_exporter'
    static_configs:
      - targets: ['192.168.56.10:9100']
      - targets: ['192.168.56.101:9100']
      - targets: ['192.168.56.102:9100']
      - targets: ['192.168.56.103:9100']

  - job_name: 'example'
    static_configs:
      - targets: ['192.168.56.128:8000']

  - job_name: pushgateway
    honor_labels: true
    static_configs:
      - targets: ['192.168.56.128:9091']
  • Check on the Prometheus server that the pushgateway target is up
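
Besides the Status -> Targets page, target health can be read from the Prometheus HTTP API endpoint /api/v1/targets. The sketch below only parses a response body, so it runs without a server; the function name summarize_targets and the sample payload are assumptions for illustration.

```python
import json


def summarize_targets(payload: str) -> dict:
    """Map (job, instance) to health ("up", "down" or "unknown") from /api/v1/targets JSON."""
    data = json.loads(payload)
    summary = {}
    for target in data["data"]["activeTargets"]:
        labels = target["labels"]
        summary[(labels["job"], labels["instance"])] = target["health"]
    return summary


if __name__ == "__main__":
    # Trimmed, hand-written sample of the response shape (not captured from a real server)
    sample = json.dumps({
        "status": "success",
        "data": {"activeTargets": [
            {"labels": {"job": "pushgateway", "instance": "192.168.56.128:9091"}, "health": "up"},
        ]},
    })
    for (job, instance), health in summarize_targets(sample).items():
        print(job, instance, health)
```

Against a live server, the payload could be fetched with urllib.request.urlopen("http://192.168.56.128:9090/api/v1/targets").read().decode().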

 

 

# Verify that metrics are stored in pushgateway
  • Write and run a Python sample to test pushgateway
# cat 4-12-pushgateway.py   # (Python script, created with vi)

from prometheus_client import CollectorRegistry, Gauge, pushadd_to_gateway

registry = CollectorRegistry()
duration = Gauge('my_job_duration_seconds',
        'Duration of my batch job in seconds', registry=registry)
try:
    with duration.time():
        # Your code here.
        pass

    # This only runs if there wasn't an exception.
    g = Gauge('my_job_last_success_seconds',
            'Last time my batch job successfully finished', registry=registry)
    g.set_to_current_time()
finally:
    pushadd_to_gateway('192.168.56.128:9091', job='batch', registry=registry)
    
    
# python3 4-12-pushgateway.py   # (run the script)
  • Check the result

 

Python file (3-1-example.py)
# cat 3-1-example.py

import http.server
from prometheus_client import start_http_server

class MyHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"Hello World")


# The HTTP server itself listens on port 8001
# Prometheus metrics are exposed on port 8000
if __name__ == "__main__":
    start_http_server(8000)
    server = http.server.HTTPServer(('192.168.56.128', 8001), MyHandler)
    server.serve_forever()

 

# Installing pip3 and running the script (error)
  • The first run fails because pip3 and prometheus_client are not installed
# python3 3-1-example.py
Traceback (most recent call last):
  File "/etc/prometheus/ep-examples-master/3/3-1-example.py", line 2, in <module>
    from prometheus_client import start_http_server
ModuleNotFoundError: No module named 'prometheus_client'

# pip3 --version
-bash: pip3: command not found
  • Install pip3 and prometheus_client
# apt update

# apt install python3-pip
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  build-essential cpp cpp-11 dpkg-dev fakeroot g++ g++-11 gcc gcc-11 gcc-11-base javascript-common libalgorithm-diff-perl libalgorithm-diff-xs-perl libalgorithm-merge-perl libasan6 libatomic1 libc-dev-bin libc-devtools libc6 libc6-dev libcc1-0 libcrypt-dev
  libdeflate0 libdpkg-perl libexpat1-dev libfakeroot libfile-fcntllock-perl libgcc-11-dev libgd3 libgomp1 libisl23 libitm1 libjbig0 libjpeg-turbo8 libjpeg8 libjs-jquery libjs-sphinxdoc libjs-underscore liblsan0 libmpc3 libnsl-dev libpython3-dev libpython3.10
  libpython3.10-dev libpython3.10-minimal libpython3.10-stdlib libquadmath0 libstdc++-11-dev libtiff5 libtirpc-dev libtsan0 libubsan1 libwebp7 libx11-6 libx11-data libxau6 libxcb1 libxdmcp6 libxpm4 linux-libc-dev lto-disabled-list make manpages-dev python3-dev
  python3-wheel python3.10 python3.10-dev python3.10-minimal rpcsvc-proto zlib1g-dev
..
..
..
..
  
  
# pip3 install prometheus_client
Collecting prometheus_client
  Downloading prometheus_client-0.20.0-py3-none-any.whl (54 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 54.5/54.5 KB 1.2 MB/s eta 0:00:00
Installing collected packages: prometheus_client
Successfully installed prometheus_client-0.20.0

# pip3 -V
pip 22.0.2 from /usr/lib/python3/dist-packages/pip (python 3.10)

 

# Run the Python file (3-1-example.py)
# python3 3-1-example.py
192.168.56.1 - - [09/Mar/2024 09:05:08] "GET / HTTP/1.1" 200 -
192.168.56.1 - - [09/Mar/2024 09:05:08] "GET /favicon.ico HTTP/1.1" 200 -

# python3 3-1-example.py &

# netstat -ntpa |grep LISTEN
tcp        0      0 192.168.56.128:8001     0.0.0.0:*               LISTEN      3295/python3
tcp        0      0 0.0.0.0:8000            0.0.0.0:*               LISTEN      3299/python3

# ps -ef |grep python3
root        3299    1387  0 09:21 pts/0    00:00:00 python3 3-1-example.py


# Stop (kill) 3-1-example.py

# kill -9 3299

 

# Verify operation

 

# python3 3-1-example.py
192.168.56.1 - - [09/Mar/2024 09:05:08] "GET / HTTP/1.1" 200 -
192.168.56.1 - - [09/Mar/2024 09:05:08] "GET /favicon.ico HTTP/1.1" 200 -

 

 

# Add the client to Prometheus
# pwd
/etc/prometheus

# ls
alertmanager  console_libraries  consoles  ep-examples-master  prometheus.yml  prometheus.yml.20240225

 

  • Register it in prometheus.yml and restart Prometheus (note: register port 8000, the metrics port)
# vi prometheus.yml

scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.

  - job_name: 'example'
    static_configs:
      - targets: ['192.168.56.128:8000']
      
# systemctl restart prometheus

# systemctl status prometheus
● prometheus.service - Prometheus
     Loaded: loaded (/etc/systemd/system/prometheus.service; enabled; vendor preset: enabled)
     Active: active (running) since Sat 2024-03-09 09:34:02 KST; 7s ago
   Main PID: 3320 (prometheus)
      Tasks: 8 (limit: 2219)
     Memory: 63.8M
        CPU: 1.698s
     CGroup: /system.slice/prometheus.service
             └─3320 /usr/local/bin/prometheus --config.file /etc/prometheus/prometheus.yml --storage.tsdb.path />

 

  • Check under Prometheus -> Status -> Targets

  • Check the values at 192.168.56.128:8000/metrics

  • Enter python_info and click Execute
python_info{implementation="CPython", instance="192.168.56.128:8000", job="example", major="3", minor="10", patchlevel="12", version="3.10.12"}
1
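
The /metrics page is plain text in the Prometheus exposition format, one sample per line. The following sketch parses that text with the standard library to show how a line such as python_info{...} 1.0 breaks down into a name, labels, and a value. It is a simplified illustration, not the official parser: lines carrying an optional timestamp are skipped and label escape sequences are left as-is; the name parse_metrics is hypothetical.

```python
import re

# name, optional {labels}, then a single value
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
# key="value" pairs inside the braces
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text: str) -> list:
    """Return a list of (name, labels, value) tuples from exposition-format text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        match = _SAMPLE.match(line)
        if match is None:
            continue  # unsupported line shape (for example, one with a timestamp)
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


if __name__ == "__main__":
    text = 'python_info{implementation="CPython",major="3",minor="10"} 1.0\n'
    print(parse_metrics(text))
```

The instance and job labels seen in the Prometheus query result are not part of this text; Prometheus attaches them at scrape time from the scrape configuration.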

 

# Edit the Alertmanager config

 

# pwd
/etc/prometheus/alertmanager


# cat alertmanager.yml
# global: slack_api_url below is the webhook URL generated on the Slack site (see the explanation below)
global:
  slack_api_url: https://hooks.slack.com/services/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

receivers:

- name: slack-notifier
  slack_configs:
  # channel: the Slack channel created on the Slack site (quoted, because an unquoted # starts a YAML comment)
  - channel: '#prometeus-slack'
    send_resolved: true
    title: '[{{.Status | toUpper}}] {{ .CommonLabels.alertname }}'
    text: >-
      *Description:* {{ .CommonAnnotations.description }}
      *summary* {{ .CommonAnnotations.instance  }}

route:
  group_wait: 10s
  group_interval: 1m
  repeat_interval: 1m
  receiver: slack-notifier

 

# Slack sign-up and configuration

 

https://slack.com/

 


 

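  • After signing up, select "Add channels" in the left menu

  • Add the channel (enter a name) ---> prometeus-slack already exists here, so the error shown on screen can be ignored

# This corresponds to the following part of the alertmanager.yml file above

  slack_configs:
  - channel: '#prometeus-slack'

  • Add an app (Apps -> Manage -> Browse apps)

  • Search for and add the app (Incoming WebHooks)

  • Add Incoming WebHooks to Slack

  • Edit the Incoming WebHooks configuration (use the prometeus-slack channel created above)

  • Post to Channel (select #prometeus-slack set above and copy the Webhook URL), then save the settings

  • Re-check the alertmanager.yml file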
# cd /etc/prometheus/alertmanager

# pwd
/etc/prometheus/alertmanager

# vi alertmanager.yml 
global:
# paste the webhook URL copied from the Slack site
  slack_api_url: https://hooks.slack.com/services/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   # <--
receivers:
- name: slack-notifier
  slack_configs:
# enter the channel name created on the Slack site
  - channel: '#prometeus-slack'   # <--
    send_resolved: true
    title: '[{{.Status | toUpper}}] {{ .CommonLabels.alertname }}'
    text: >-
      *Description:* {{ .CommonAnnotations.description }}
      *summary* {{ .CommonAnnotations.instance  }}
route:
  group_wait: 10s
  group_interval: 1m
  repeat_interval: 1m
  receiver: slack-notifier

 

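# Check that the Slack message is received (receivers:) in Slack
  • In alertmanager.yml, the {{ ... }} templates after Description and summary under text are not being rendered (under investigation)
    One likely cause: instance is a label, not an annotation, so .CommonAnnotations.instance is empty, and .CommonAnnotations only holds annotations whose values are identical across every alert in the group
    text: >-
      *Description:* {{ .CommonAnnotations.description }}
      *summary* {{ .CommonAnnotations.instance  }}
  • Error message as seen in Slack

  • The information below, as reported by the server, is what should actually appear

  • Message received in the Slack smartphone app

 

  • Message received in Slack Desktop for Windows

 

# Still investigating why the Description and Summary details are missing from the message received in Slack (2024. 3. 3)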
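
While the template issue is being investigated, it can help to confirm that the webhook itself delivers messages, independently of Alertmanager. Slack incoming webhooks accept a JSON body with a text field; the sketch below builds such a request with the standard library. The helper name build_slack_request and the placeholder URL are assumptions; substitute the webhook URL copied from the Slack site.

```python
import json
import urllib.request


def build_slack_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Build a POST request carrying a plain text message for a Slack incoming webhook."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder URL (hypothetical); only the request is built, nothing is sent
    req = build_slack_request(
        "https://hooks.slack.com/services/XXX/YYY/ZZZ",
        "Alertmanager webhook test",
    )
    print(req.get_method(), req.full_url)
    # To actually send: urllib.request.urlopen(req, timeout=5)
```

If this test message arrives in the channel but the alert text is still empty, the problem is in the notification template rather than in the webhook or channel settings.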

 

https://prometheus.io/download/

 


 

(Method 1) # Install Alertmanager (binary install recommended)
# wget https://github.com/prometheus/alertmanager/releases/download/v0.27.0/alertmanager-0.27.0.linux-amd64.tar.gz

# tar xvzf alertmanager-0.27.0.linux-amd64.tar.gz

# mv alertmanager-0.27.0.linux-amd64 /etc/prometheus/alertmanager

# /etc/prometheus/alertmanager/alertmanager &

  -- runs alertmanager in the background
  -- alternatively, create an alertmanager.service unit and manage it with systemctl start, restart, disable

# netstat -ntpa |grep LISTEN
tcp6       0      0 :::9093                 :::*                    LISTEN      6101/./alertmanager
tcp6       0      0 :::9094                 :::*                    LISTEN      6101/./alertmanager

 

# To manage alertmanager with systemctl {restart, start, disable, enable}, see https://hwpform.tistory.com/134

 

Or (Method 2)  # Install Alertmanager with Docker
  • Check Docker
# docker run --name alertmanager -d -p 9093:9093 quay.io/prometheus/alertmanager

# docker ps -a
CONTAINER ID   IMAGE                                           COMMAND                  CREATED         STATUS         PORTS                                       NAMES
77d5896ee529   quay.io/prometheus/alertmanager                 "/bin/alertmanager -…"   5 minutes ago   Up 5 minutes   0.0.0.0:9093->9093/tcp, :::9093->9093/tcp   alertmanager

# docker images
REPOSITORY                                      TAG       IMAGE ID       CREATED        SIZE
quay.io/prometheus/alertmanager                 latest    11f11916f8cd   2 days ago     70.3MB
  • Check the web page at http://192.168.56.128:9093

 

# Configure Alertmanager on the Prometheus server
# cat /etc/prometheus/prometheus.yml

# my global config
global:
  scrape_interval: 15s 
  evaluation_interval: 15s 

# Alertmanager configuration 
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - 192.168.56.128:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "/etc/prometheus/alertmanager/rules/test_rule.yml"
  - "/etc/prometheus/alertmanager/rules/alert_rules.yml"

 

 

Reference: Alerting rules | Prometheus (prometheus.io)

# cat /etc/prometheus/alertmanager/rules/test_rule.yml

groups:
- name: example
  rules:

  # Alert for any instance that is unreachable for >5 minutes.
  - alert: InstanceDown
    expr: up == 0
    for: 5m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} down"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."

  # Alert for any instance that has a median request latency >1s.
  - alert: APIHighRequestLatency
    expr: api_http_request_latencies_second{quantile="0.5"} > 1
    for: 10m
    annotations:
      summary: "High request latency on {{ $labels.instance }}"
      description: "{{ $labels.instance }} has a median request latency above 1s (current value: {{ $value }}s)"
# cat alert_rules.yml

groups:
- name: alert.rules
  rules:
  - alert: InstanceDown
    expr: up == 0
    for: 1m
    labels:
      severity: "critical"
    annotations:
      summary: "Endpoint {{ $labels.instance }} down"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."

  - alert: HostOutOfMemory
    expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 10
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "Host out of memory (instance {{ $labels.instance }})"
      description: "Node memory is filling up (< 10% left)\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

  - alert: HostMemoryUnderMemoryPressure
    expr: rate(node_vmstat_pgmajfault[1m]) > 1000
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "Host memory under memory pressure (instance {{ $labels.instance }})"
      description: "The node is under heavy memory pressure. High rate of major page faults\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
  # Please add ignored mountpoints in node_exporter parameters like
  # "--collector.filesystem.ignored-mount-points=^/(sys|proc|dev|run)($|/)".
  # Same rule using "node_filesystem_free_bytes" will fire when disk fills for non-root users.
  - alert: HostOutOfDiskSpace
    expr: (node_filesystem_avail_bytes * 100) / node_filesystem_size_bytes < 10 and ON (instance, device, mountpoint) node_filesystem_readonly == 0
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "Host out of disk space (instance {{ $labels.instance }})"
      description: "Disk is almost full (< 10% left)\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

  - alert: HostHighCpuLoad
    expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[2m])) * 100) > 80
    for: 0m
    labels:
      severity: warning
    annotations:
      summary: "Host high CPU load (instance {{ $labels.instance }})"
      description: "CPU load is > 80%\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
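
The expressions in the rules above are plain arithmetic over metric values. As a worked illustration, the sketch below reproduces the HostOutOfMemory and HostHighCpuLoad conditions in Python for hand-picked numbers; the function names and sample values are hypothetical, and the real evaluation is done by Prometheus per time series.

```python
def memory_available_percent(mem_available_bytes: float, mem_total_bytes: float) -> float:
    """node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100"""
    return mem_available_bytes / mem_total_bytes * 100


def host_out_of_memory(mem_available_bytes: float, mem_total_bytes: float) -> bool:
    """True when less than 10% of memory is available (HostOutOfMemory condition)."""
    return memory_available_percent(mem_available_bytes, mem_total_bytes) < 10


def cpu_busy_percent(idle_rates: list) -> float:
    """100 - avg(idle rate per core) * 100, mirroring the HostHighCpuLoad expression."""
    return 100 - (sum(idle_rates) / len(idle_rates)) * 100


def host_high_cpu_load(idle_rates: list) -> bool:
    """True when the averaged CPU usage is above 80%."""
    return cpu_busy_percent(idle_rates) > 80


if __name__ == "__main__":
    mib = 1024 * 1024
    # 512 MiB available out of 8192 MiB is 6.25%, below the 10% threshold
    print(host_out_of_memory(512 * mib, 8192 * mib))
    # Two cores idle 10% and 20% of the time: about 85% busy, above the 80% threshold
    print(host_high_cpu_load([0.1, 0.2]))
```

Note that the condition must also hold for the whole "for:" duration before an alert fires; HostHighCpuLoad uses for: 0m, so it fires on the first evaluation where the expression is true.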

 

  • Restart the Prometheus server
# systemctl restart prometheus

 

Verifying Prometheus and Alertmanager
  • Check the basic installation details
# netstat -ntpa |grep LISTEN
tcp6       0      0 :::9090                 :::*                    LISTEN      8085/prometheus
tcp6       0      0 :::9093                 :::*                    LISTEN      6101/./alertmanager
tcp6       0      0 :::9094                 :::*                    LISTEN      6101/./alertmanager
Port | Process | Install location and files | Web access
9090 | prometheus | /etc/prometheus/prometheus.yml | http://192.168.56.128:9090
9093, 9094 | alertmanager | /etc/prometheus/alertmanager/alertmanager, /etc/prometheus/alertmanager/rules | http://192.168.56.128:9093
  • Verify Prometheus at http://192.168.56.128:9090  

 

  • Verify Alertmanager

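Two virtual machines (one Prometheus/Grafana server and one client server to be monitored) are set up, the collected data is fed into Prometheus, and Grafana dashboards are built from that data.
- node_exporter: collects host data (servers, Kubernetes nodes, Docker hosts, etc.)
- postgresql_exporter: collects PostgreSQL data
- jmx_exporter: collects Tomcat data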

 

Basic terms
  • Exporter? A Prometheus exporter is software that collects metrics from the system to be monitored (a server, agent, daemon, etc.) and exposes them on an HTTP endpoint (default: /metrics)
  • Common exporters

      - node-exporter

      - mysql-exporter

      - wmi-exporter (Windows Server)

      - postgres-exporter

      - redis-exporter

      - kafka-exporter

      - jmx-exporter 

      ※ Download site for Prometheus and the various exporters: https://prometheus.io/download/

 


 

    Only node-exporter, postgres-exporter, and jmx-exporter are covered (integrated) here

 

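  • What is a metric? A metric is a measurement that describes the current state of a system. In an infrastructure environment, metrics fall into two broad groups: system metrics, such as CPU and memory usage, and service metrics, such as HTTP status codes, which indicate the state of a service.
  • Time series database? A time series database is optimized for storing data generated over time, using time as the axis (key). Examples of such data include network packets that reveal traffic flow, IoT sensor values received from devices, and event logs.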
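
As a small illustration of time series data, the sketch below models one series as a list of (timestamp, value) samples and computes a per-second rate of increase from the first and last sample. This is a deliberate simplification and not Prometheus's rate() algorithm, which also handles counter resets and extrapolation; the function name and the sample values are hypothetical.

```python
def per_second_rate(samples: list) -> float:
    """Average per-second increase between the first and last (timestamp, value) sample."""
    if len(samples) < 2:
        raise ValueError("at least two samples are required")
    (t_first, v_first), (t_last, v_last) = samples[0], samples[-1]
    if t_last <= t_first:
        raise ValueError("timestamps must be increasing")
    return (v_last - v_first) / (t_last - t_first)


if __name__ == "__main__":
    # A counter scraped every 15 seconds: (unix timestamp offset, counter value)
    http_requests_total = [(0.0, 100.0), (15.0, 130.0), (30.0, 160.0)]
    print(per_second_rate(http_requests_total))  # 60 requests over 30 seconds
```

Keying each value by its timestamp is what lets a time series database answer questions such as "how fast was this counter growing over the last minute".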

 

Target system architecture

 

(Attachment: 20240229_프로메테우스-그라파타목표시스템_ver0.1.pptx, 0.69MB)

 

Installing the Prometheus - Grafana servers (with Vagrant)
  • Oracle VM and Vagrant must be installed beforehand (the servers are provisioned with Vagrant)

      - For how to provision virtual servers with Vagrant, see https://hwpform.tistory.com/111

Purpose | IP | Installed packages | Ports
Prometheus, Grafana server | 192.168.56.128 (Ubuntu 22.04) | Prometheus, Grafana, Node_exporter, PostgreSQL(Docker), PostgreSQL_Exporter(Docker) | 9090, 3000, 9100, 5432, 9187
Prometheus, Grafana client test server | 192.168.56.130 (CentOS8) | Node_exporter, PostgreSQL(Docker), PostgreSQL_Exporter(Docker), Tomcat, jmx_exporter(Tomcat) | 9100, 5432, 9187, 8080, 8081

 

Vagrant boxes used for the servers: https://app.vagrantup.com/boxes/search
  • Prometheus, Grafana server

  • Prometheus, Grafana client test server

  • Vagrantfile and installation
# Ubuntu 22.04
Vagrant.configure("2") do |config|
  config.vm.box = "davidurbano/prometheus-grafana"
end

# CentOS 8
Vagrant.configure("2") do |config|
  config.vm.box = "centos/8"
end

 

  • Oracle VM servers provisioned with Vagrant

< 192.168.56.128 >
< 192.168.56.130 >

Installed Prometheus

< 192.168.56.128:9090 >

 

Installed Grafana example: Node_exporter (server monitoring)

< 192.168.56.128:3000 >

Installed Grafana example: Postgresql_exporter (DB monitoring)

< 192.168.56.128:3000 >

 

Installed Grafana example: jmx_exporter (Tomcat monitoring)

< 192.168.56.128:3000 >

Prometheus, Grafana server (192.168.56.128) 
# The server (192.168.56.128) provisioned with Vagrant from config.vm.box = "davidurbano/prometheus-grafana"
  comes with Prometheus (9090), Grafana (3000), and Node_exporter (9100) preinstalled
  
# netstat -ntpa |grep LISTEN
tcp        0      0 127.0.0.53:53           0.0.0.0:*               LISTEN      2632/systemd-resolv
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      735/sshd: /usr/sbin
tcp6       0      0 :::22                   :::*                    LISTEN      735/sshd: /usr/sbin
tcp6       0      0 :::9090                 :::*                    LISTEN      4369/prometheus
tcp6       0      0 :::9100                 :::*                    LISTEN      670/node_exporter
tcp6       0      0 :::3000                 :::*                    LISTEN      666/grafana
  • Prometheus server : http://192.168.56.128:9090
# ls -al
total 24
drwxr-xr-x   4 prometheus prometheus 4096 Feb 29 13:47 .
drwxr-xr-x 102 root       root       4096 Feb 29 11:29 ..
drwxr-xr-x   2 prometheus prometheus 4096 Sep 29 21:42 console_libraries
drwxr-xr-x   2 prometheus prometheus 4096 Sep 29 21:42 consoles
-rw-r--r--   1 vagrant    vagrant    1385 Feb 29 13:47 prometheus.yml
-rw-r--r--   1 root       root        934 Feb 25 00:00 prometheus.yml.20240225

# pwd
/etc/prometheus
  • Prometheus configuration file (just list the server:port targets you want to scrape)
# cat /etc/prometheus/prometheus.yml

scrape_configs:

# Node_exporter (server monitoring)

  - job_name: "node_exporter"
    static_configs:
      - targets: ["192.168.56.128:9100"]
      - targets: ["192.168.56.130:9100"]

# PostgreSQL_exporter (DB monitoring)

  - job_name: 'PostgreSQL_exporter'
    static_configs:
      - targets: ['192.168.56.130:9187']
      - targets: ['192.168.56.128:9187']

# jmx_exporter (Tomcat monitoring)

  - job_name: 'jmx_exporter'
    scrape_interval: 5s
    static_configs:
      - targets: ['192.168.56.130:8081']

# kubernetes_exporter (Kubernetes server monitoring)

  - job_name: 'kubernetes_exporter'
    static_configs:
      - targets: ['192.168.56.10:9100']
      - targets: ['192.168.56.101:9100']
      - targets: ['192.168.56.102:9100']
      - targets: ['192.168.56.103:9100']

 

  • Prometheus server : http://192.168.56.128:9090

 

  • Under Prometheus -> Status -> Targets, the listed targets match the values configured in /etc/prometheus/prometheus.yml above

 

  • Collected metrics (metric names that can be shown on dashboards)

(Attachments: node_exporter.txt 0.07MB, PostgreSQL_exporter.txt 0.08MB, jmx_exporter.txt 0.24MB)

 

 

  • Grafana server : http://192.168.56.128:3000

 

  • Node Exporter : http://192.168.56.128:9100

 

1. Part 1 ends here; continued in Part 2
