Cumulus VX is the virtual appliance edition of Cumulus Linux, a network operating system designed for networking hardware such as switches. It lets users simulate and test network configurations and features without physical hardware. Cumulus VX runs on hypervisors such as VMware, KVM, and VirtualBox, so users can experiment with Cumulus Linux features in a virtual environment before deploying to a production network. This virtualized approach makes network design, testing, and learning easier without the cost and complexity of a physical hardware setup.
To bring up the Cumulus VX VM, install the Oracle VM VirtualBox extension pack and the required Vagrant plugin.
Download the Oracle_VM_VirtualBox_Extension_Pack-7.0.14 extension pack.
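A minimal sketch of those two installation steps from the host's command line (the plugin name vagrant-vbguest is only an example; install whichever plugin your topology requires):
VBoxManage extpack install Oracle_VM_VirtualBox_Extension_Pack-7.0.14.vbox-extpack
vagrant plugin install vagrant-vbguest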
- The initial Cumulus login is cumulus / cumulus (or vagrant / vagrant)
login as: cumulus
Pre-authentication banner message from server:
| Debian GNU/Linux 10
End of banner message from server
cumulus@192.168.56.61's password:
Linux cumulus 5.10.0-cl-1-amd64 #1 SMP Debian 5.10.189-1+cl5.8.0u16 (2024-01-27) x86_64
Welcome to NVIDIA Cumulus VX (TM)
NVIDIA Cumulus VX (TM) is a community supported virtual appliance designed
for experiencing, testing and prototyping NVIDIA Cumulus' latest technology.
For any questions or technical support, visit our community site at:
https://www.nvidia.com/en-us/support
The registered trademark Linux (R) is used pursuant to a sublicense from LMI,
the exclusive licensee of Linus Torvalds, owner of the mark on a world-wide
basis.
Last login: Sat Apr 6 07:27:34 2024
*****************************************************************
Please send these support file(s) to Networking-support@nvidia.com:
/var/support/cl_support_ansible-cl01_20240331_075001.txz
/var/support/cl_support_cumulus_20240331_072156.txz
*****************************************************************
Change the root password
$ sudo passwd root
[sudo] password for cumulus: <---- enter cumulus
New password: <---- enter the new root password
Retype new password:
passwd: password updated successfully <---- root password updated
@ Switch to root and configure SSH
# su - root
# vi /etc/ssh/sshd_config
PermitRootLogin yes
PasswordAuthentication yes
@ After editing sshd_config, restart the ssh daemon
# systemctl restart ssh
Connect with a tool such as PuTTY to check the environment or set the IP (the Vagrantfile sets the IP to 192.168.56.61, but the setting did not take effect)
~$ net show configuration
dns
nameserver
10.0.61.3 # vrf mgmt
time
zone
Etc/UTC
snmp-server
listening-address localhost
ptp
frr defaults datacenter
log syslog informational
vrf default
vrf mgmt
interface lo
interface mgmt
address 127.0.0.1/8
address ::1/128
vrf-table auto
interface eth0
address dhcp
ip-forward off
ip6-forward off
vrf mgmt
hostname # Auto-generated by NVUE!
dot1x
mab-activation-delay 30
max-number-stations 6
default-dacl-preauth-filename default_preauth_dacl.rules
eap-reauth-period 0
radius
Set the IP manually and verify the configuration
$ nv set interface swp1 ip address 192.168.56.61/24
$ nv config apply
@ Verify the IP configuration
$ net show interface
State Name Spd MTU Mode LLDP Summary
----- ---- --- ----- ------------ ---- -----------------------
UP lo N/A 65536 Loopback IP: 127.0.0.1/8
lo IP: ::1/128
UP eth0 1G 1500 Mgmt Master: mgmt(UP)
eth0 IP: 10.0.61.15/24(DHCP)
UP swp1 1G 9216 Interface/L3 IP: 192.168.56.61/24
UP mgmt N/A 65575 VRF IP: 127.0.0.1/8
mgmt IP: ::1/128
# -*- mode: ruby -*-
# vi: set ft=ruby :
Vagrant.configure("2") do |config|
config.vm.define "vyos_current" do |cfg|
cfg.vm.box = "vyos/current"
cfg.vm.provider "virtualbox" do |vb|
vb.name = "vyos253"
vb.customize ["modifyvm", :id, "--groups", "/default_group"]
end
cfg.vm.host_name = "vyos253"
cfg.vm.network "public_network", ip: "192.168.56.253"
cfg.vm.network "forwarded_port", guest: 22, host: 60253, auto_correct: true, id: "ssh"
cfg.vm.network "private_network", virtualbox__intnet: "eth2", auto_config: false
cfg.vm.network "private_network", virtualbox__intnet: "eth3", auto_config: false
cfg.vm.synced_folder "../data", "/vagrant", disabled: true
end
end
Vagrant up
PS C:\Users\shim> vagrant up
Bringing machine 'vm_define_vyos_current' up with 'virtualbox' provider...
==> vm_define_vyos_current: Importing base box 'vyos/current'...
==> vm_define_vyos_current: Matching MAC address for NAT networking...
==> vm_define_vyos_current: Checking if box 'vyos/current' version '20240325.00.19' is up to date...
==> vm_define_vyos_current: Setting the name of the VM: vyos254
==> vm_define_vyos_current: Clearing any previously set network interfaces...
==> vm_define_vyos_current: Preparing network interfaces based on configuration...
vm_define_vyos_current: Adapter 1: nat
vm_define_vyos_current: Adapter 2: bridged
vm_define_vyos_current: Adapter 3: intnet
vm_define_vyos_current: Adapter 4: intnet
==> vm_define_vyos_current: Forwarding ports...
vm_define_vyos_current: 22 (guest) => 2222 (host) (adapter 1)
==> vm_define_vyos_current: Booting VM...
==> vm_define_vyos_current: Waiting for machine to boot. This may take a few minutes...
vm_define_vyos_current: SSH address: 127.0.0.1:2222
vm_define_vyos_current: SSH username: vyos
vm_define_vyos_current: SSH auth method: private key
vm_define_vyos_current: Warning: Connection aborted. Retrying...
vm_define_vyos_current: Warning: Connection reset. Retrying...
vm_define_vyos_current: Warning: Connection reset. Retrying...
vm_define_vyos_current: Warning: Authentication failure. Retrying...
vm_define_vyos_current: Warning: Authentication failure. Retrying...
vm_define_vyos_current: Warning: Authentication failure. Retrying...
vm_define_vyos_current: Warning: Authentication failure. Retrying...
vm_define_vyos_current: Warning: Authentication failure. Retrying...
vm_define_vyos_current: Warning: Authentication failure. Retrying...
vm_define_vyos_current: Warning: Authentication failure. Retrying...
vm_define_vyos_current: Warning: Authentication failure. Retrying...
vm_define_vyos_current: Warning: Authentication failure. Retrying...
Timed out while waiting for the machine to boot. This means that
Vagrant was unable to communicate with the guest machine within
the configured ("config.vm.boot_timeout" value) time period.
If you look above, you should be able to see the error(s) that
Vagrant had when attempting to connect to the machine. These errors
are usually good hints as to what may be wrong.
If you're using a custom box, make sure that networking is properly
working and you're able to connect to the machine. It is a common
problem that networking isn't setup properly in these boxes.
Verify that authentication configurations are also setup properly,
as well.
If the box appears to be booting properly, you may want to increase
the timeout ("config.vm.boot_timeout") value.
PS C:\Users\shim>
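Since the boot stalled on SSH authentication, a hedged Vagrantfile tweak is to raise the boot timeout and pass the VyOS credentials explicitly, since this box may not accept Vagrant's insecure key:
config.vm.boot_timeout = 600
config.ssh.username = "vyos"
config.ssh.password = "vyos"
In this case the VM was instead adjusted and accessed manually through the VirtualBox GUI, as described below.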
Create and verify the host VM vyos253 in Oracle VM
Change the vyos253 interface adapters
- Adapter 1: NAT
- Adapter 2: Host-only adapter
- Adapter 3: Internal network
- Adapter 4: Internal network
Configure vyos
Connect to the host VM vyos253
Log in with the id/pw vyos / vyos
login as: vyos
vyos@192.168.56.253's password:
Welcome to VyOS!
┌── ┐
. VyOS 1.5-rolling-202403250019
└ ──┘ current
* Documentation: https://docs.vyos.io/en/latest
* Project news: https://blog.vyos.io
* Bug reports: https://vyos.dev
You can change this banner using "set system login banner post-login" command.
VyOS is a free software distribution that includes multiple components,
you can check individual component licenses under /usr/share/doc/*/copyright
Last login: Sun Mar 31 04:06:08 2024 from 192.168.56.1
vyos@vyos:~$
The Vagrantfile above configured a public_network, but the IP was not applied, so it is set manually
Configure ssh
Configure snmp (set the community value to public)
Set the snmp port to 161
$ log in as vyos / vyos
$ sudo passwd root / change the root password (set one if needed)
$ configure
[edit]
# set service ssh port 22
# set service ssh disable-password-authentication
# set service ssh disable-host-validation
# set interfaces ethernet eth1 address 192.168.56.253/24
# set service snmp community public authorization ro
# set service snmp listen-address 192.168.56.254 port 161
# set service snmp v3
# commit
configuration changes to commit
[edit]
# save
[edit]
Check the overall configuration
$ show configuration, or enter configuration mode and use run show configuration
# run show configuration
interfaces {
ethernet eth0 {
address dhcp
hw-id 08:00:27:8d:c0:4d
speed auto
}
ethernet eth1 {
address 192.168.56.253/24
hw-id 08:00:27:e3:05:9b
}
ethernet eth2 {
hw-id 08:00:27:9b:97:43
}
ethernet eth3 {
hw-id 08:00:27:5a:40:a5
}
loopback lo {
}
}
service {
ntp {
allow-client {
address 0.0.0.0/0
address ::/0
}
server time1.vyos.net {
}
server time2.vyos.net {
}
server time3.vyos.net {
}
}
snmp {
community public {
authorization ro
}
listen-address 192.168.56.253 {
port 161
}
v3 {
}
}
ssh {
port 22
}
}
system {
config-management {
commit-revisions 100
}
conntrack {
modules {
ftp
h323
nfs
pptp
sip
sqlnet
tftp
}
}
console {
}
host-name vyos
login {
user vyos {
authentication {
encrypted-password ****************
plaintext-password ****************
}
}
}
name-server eth0
syslog {
global {
facility all {
level notice
}
facility local7 {
level debug
}
}
}
}
[edit]
Zabbix server integration (SNMP)
In Data collection, click Hosts
Click Create host in the upper right
Register the host - under Interfaces click Add -> SNMP and enter the details
For SNMPv2, enter the SNMP community configured above (public) or the {$SNMP_COMMUNITY} macro, then click Add or Update
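Before registering the host it can help to confirm from the Zabbix server that the VyOS SNMP agent responds; a quick check with snmpwalk (assuming net-snmp-utils is installed on the server) might look like this:
# snmpwalk -v2c -c public 192.168.56.253 system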
# vagrant installation log
C:\Users\shim>vagrant up
Bringing machine 'rock8Zabbix' up with 'virtualbox' provider...
==> rock8Zabbix: Importing base box 'generic/rocky8'...
==> rock8Zabbix: Matching MAC address for NAT networking...
==> rock8Zabbix: Checking if box 'generic/rocky8' version '4.3.12' is up to date...
==> rock8Zabbix: Setting the name of the VM: rocky8Zabbix
==> rock8Zabbix: Clearing any previously set network interfaces...
==> rock8Zabbix: Preparing network interfaces based on configuration...
rock8Zabbix: Adapter 1: nat
rock8Zabbix: Adapter 2: hostonly
==> rock8Zabbix: Forwarding ports...
rock8Zabbix: 22 (guest) => 60230 (host) (adapter 1)
==> rock8Zabbix: Running 'pre-boot' VM customizations...
==> rock8Zabbix: Booting VM...
==> rock8Zabbix: Waiting for machine to boot. This may take a few minutes...
rock8Zabbix: SSH address: 127.0.0.1:60230
rock8Zabbix: SSH username: vagrant
rock8Zabbix: SSH auth method: private key
rock8Zabbix:
rock8Zabbix: Vagrant insecure key detected. Vagrant will automatically replace
rock8Zabbix: this with a newly generated keypair for better security.
rock8Zabbix:
rock8Zabbix: Inserting generated public key within guest...
rock8Zabbix: Removing insecure key from the guest if it's present...
rock8Zabbix: Key inserted! Disconnecting and reconnecting using new SSH key...
==> rock8Zabbix: Machine booted and ready!
==> rock8Zabbix: Checking for guest additions in VM...
rock8Zabbix: The guest additions on this VM do not match the installed version of
rock8Zabbix: VirtualBox! In most cases this is fine, but in rare cases it can
rock8Zabbix: prevent things such as shared folders from working properly. If you see
rock8Zabbix: shared folder errors, please make sure the guest additions within the
rock8Zabbix: virtual machine match the version of VirtualBox you have installed on
rock8Zabbix: your host and reload your VM.
rock8Zabbix:
rock8Zabbix: Guest Additions Version: 6.1.30
rock8Zabbix: VirtualBox Version: 7.0
==> rock8Zabbix: Setting hostname...
==> rock8Zabbix: Configuring and enabling network interfaces...
C:\Users\shim>
Select the Zabbix environment to install (Zabbix 6.4, Rocky Linux 8, Server, Agent, PostgreSQL, Apache was chosen here)
Installation guide (follow the steps shown on the Zabbix website)
0. Server environment (installation log)
# First log in to the server as vagrant / vagrant and change the root password
$ sudo passwd root
Changing password for user root.
New password:
===========================================================================================
# Edit sshd_config to allow remote access
# pwd
/etc/ssh
# ls
moduli ssh_config ssh_config.d sshd_config ssh_host_ecdsa_key ssh_host_ecdsa_key.pub ssh_host_ed25519_key ssh_host_ed25519_key.pub ssh_host_rsa_key ssh_host_rsa_key.pub
# vi sshd_config
PermitRootLogin yes <-- change to yes
PasswordAuthentication yes <-- change to yes
============================================================================================
# Set the time zone
# sudo timedatectl set-timezone Asia/Seoul
# dnf module switch-to php:7.4
Rocky Linux 8 - AppStream 4.9 MB/s | 11 MB 00:02
Rocky Linux 8 - BaseOS 2.8 MB/s | 7.1 MB 00:02
Rocky Linux 8 - Extras 8.0 kB/s | 14 kB 00:01
Extra Packages for Enterprise Linux 8 - x86_64 5.3 MB/s | 16 MB 00:03
Zabbix Official Repository - x86_64 122 kB/s | 208 kB 00:01
Zabbix Official Repository non-supported - x86_64 1.1 kB/s | 1.4 kB 00:01
Dependencies resolved.
=======================================================================================================================================================================================================================================
Package Architecture Version Repository Size
=======================================================================================================================================================================================================================================
Enabling module streams:
httpd 2.4
nginx 1.14
php 7.4
Transaction Summary
=======================================================================================================================================================================================================================================
Is this ok [y/N]: y
Complete!
c. Install Zabbix server, frontend, agent
# dnf install zabbix-server-pgsql zabbix-web-pgsql zabbix-apache-conf zabbix-sql-scripts zabbix-selinux-policy zabbix-agent
=======================================================================================================================================================================================================================================
Package Architecture Version Repository Size
=======================================================================================================================================================================================================================================
Installing:
zabbix-agent x86_64 6.4.13-release1.el8 zabbix 592 k
zabbix-apache-conf noarch 6.4.13-release1.el8 zabbix 27 k
zabbix-selinux-policy x86_64 6.4.13-release1.el8 zabbix 320 k
zabbix-server-pgsql x86_64 6.4.13-release1.el8 zabbix 1.9 M
zabbix-sql-scripts noarch 6.4.13-release1.el8 zabbix 7.9 M
zabbix-web-pgsql noarch 6.4.13-release1.el8 zabbix 26 k
..
..
Transaction Summary
=======================================================================================================================================================================================================================================
Install 51 Packages
Complete!
d. Create initial database
PostgreSQL must be installed on the server first
- Check the repository module versions and disable the existing streams
# dnf module list postgresql
Rocky Linux 8 - AppStream 5.6 kB/s | 4.8 kB 00:00
Rocky Linux 8 - BaseOS 5.3 kB/s | 4.3 kB 00:00
Rocky Linux 8 - Extras 3.7 kB/s | 3.1 kB 00:00
Extra Packages for Enterprise Linux 8 - x86_64 4.8 kB/s | 8.2 kB 00:01
Zabbix Official Repository - x86_64 4.8 kB/s | 2.9 kB 00:00
Zabbix Official Repository non-supported - x86_64 4.9 kB/s | 2.9 kB 00:00
Rocky Linux 8 - AppStream
Name Stream Profiles Summary
postgresql 9.6 client, server [d] PostgreSQL server and client module
postgresql 10 [d] client, server [d] PostgreSQL server and client module
postgresql 12 client, server [d] PostgreSQL server and client module
postgresql 13 client, server [d] PostgreSQL server and client module
postgresql 15 client, server [d] PostgreSQL server and client module
Hint: [d]efault, [e]nabled, [x]disabled, [i]nstalled
# dnf -qy module disable postgresql
# dnf module list postgresql
Last metadata expiration check: 0:02:24 ago on Sun 31 Mar 2024 07:33:39 AM KST.
Rocky Linux 8 - AppStream
Name Stream Profiles Summary
postgresql 9.6 [x] client, server [d] PostgreSQL server and client module
postgresql 10 [d][x] client, server [d] PostgreSQL server and client module
postgresql 12 [x] client, server [d] PostgreSQL server and client module
postgresql 13 [x] client, server [d] PostgreSQL server and client module
postgresql 15 [x] client, server [d] PostgreSQL server and client module
Hint: [d]efault, [e]nabled, [x]disabled, [i]nstalled
# cat passwd
postgres:x:26:26:PostgreSQL Server:/var/lib/pgsql:/bin/bash
# su - postgres
$ pwd
/var/lib/pgsql
$ pwd
/usr/pgsql-13/bin
# postgresql-13-setup initdb
Initializing database ... OK
# cat /var/lib/pgsql/13/initdb.log
runuser: may not be used by non-root users
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
The database cluster will be initialized with locale "en_US.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".
Data page checksums are disabled.
fixing permissions on existing directory /var/lib/pgsql/13/data ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... Asia/Seoul
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok
Success. You can now start the database server using:
/usr/pgsql-13/bin/pg_ctl -D /var/lib/pgsql/13/data/ -l logfile start
- Check the PostgreSQL 13 service status (Active: failed)
# systemctl status postgresql-13.service
● postgresql-13.service - PostgreSQL 13 database server
Loaded: loaded (/usr/lib/systemd/system/postgresql-13.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Sun 2024-03-31 07:58:51 KST; 3min 59s ago
Docs: https://www.postgresql.org/docs/13/static/
Process: 27856 ExecStartPre=/usr/pgsql-13/bin/postgresql-13-check-db-dir ${PGDATA} (code=exited, status=1/FAILURE)
Mar 31 07:58:51 rocky8Zabbix systemd[1]: Starting PostgreSQL 13 database server...
Mar 31 07:58:51 rocky8Zabbix postgresql-13-check-db-dir[27856]: "/var/lib/pgsql/13/data/" is missing or empty.
Mar 31 07:58:51 rocky8Zabbix postgresql-13-check-db-dir[27856]: Use "/usr/pgsql-13/bin/postgresql-13-setup initdb" to initialize the database cluster.
Mar 31 07:58:51 rocky8Zabbix postgresql-13-check-db-dir[27856]: See /usr/share/doc/postgresql13/README.rpm-dist for more information.
Mar 31 07:58:51 rocky8Zabbix systemd[1]: postgresql-13.service: Control process exited, code=exited status=1
Mar 31 07:58:51 rocky8Zabbix systemd[1]: postgresql-13.service: Failed with result 'exit-code'.
Mar 31 07:58:51 rocky8Zabbix systemd[1]: Failed to start PostgreSQL 13 database server.
- Restart the PostgreSQL 13 service (Active: running)
# systemctl disable postgresql-13.service
Removed /etc/systemd/system/multi-user.target.wants/postgresql-13.service.
# systemctl stop postgresql-13.service
# systemctl enable postgresql-13.service
Created symlink /etc/systemd/system/multi-user.target.wants/postgresql-13.service → /usr/lib/systemd/system/postgresql-13.service.
# systemctl start postgresql-13.service
# systemctl status postgresql-13.service
● postgresql-13.service - PostgreSQL 13 database server
Loaded: loaded (/usr/lib/systemd/system/postgresql-13.service; enabled; vendor preset: disabled)
Active: active (running) since Sun 2024-03-31 08:03:17 KST; 4s ago
Docs: https://www.postgresql.org/docs/13/static/
Process: 28095 ExecStartPre=/usr/pgsql-13/bin/postgresql-13-check-db-dir ${PGDATA} (code=exited, status=0/SUCCESS)
Main PID: 28101 (postmaster)
Tasks: 8 (limit: 23144)
Memory: 16.9M
CGroup: /system.slice/postgresql-13.service
├─28101 /usr/pgsql-13/bin/postmaster -D /var/lib/pgsql/13/data/
├─28102 postgres: logger
├─28104 postgres: checkpointer
├─28105 postgres: background writer
├─28106 postgres: walwriter
├─28107 postgres: autovacuum launcher
├─28108 postgres: stats collector
└─28109 postgres: logical replication launcher
Mar 31 08:03:17 rocky8Zabbix systemd[1]: Starting PostgreSQL 13 database server...
Mar 31 08:03:17 rocky8Zabbix postmaster[28101]: 2024-03-31 08:03:17.692 KST [28101] LOG: redirecting log output to logging collector process
Mar 31 08:03:17 rocky8Zabbix postmaster[28101]: 2024-03-31 08:03:17.692 KST [28101] HINT: Future log output will appear in directory "log".
Mar 31 08:03:17 rocky8Zabbix systemd[1]: Started PostgreSQL 13 database server.
- Create the PostgreSQL 13 account and database ※ the zabbix account and database creation commands from the Zabbix website
# sudo -u postgres createuser --pwprompt zabbix
# sudo -u postgres createdb -O zabbix zabbix
Or log in as postgres and run:
$ cd /usr/pgsql-13/bin/
$ psql
postgres=# create user zabbix password 'zabbix' superuser;
postgres=# create database zabbix owner zabbix;
Just in case, also change the postgres password to zabbix
postgres=# alter user postgres with password 'zabbix';
- Verify the account creation (the zabbix account is made a superuser)
$ cd /usr/pgsql-13/bin/
$ psql
psql (13.14)
Type "help" for help.
postgres=# \du
List of roles
Role name | Attributes | Member of
-----------+------------------------------------------------------------+-----------
postgres | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
zabbix | | {}
postgres=# ALTER USER zabbix superuser;
ALTER ROLE
postgres=# \du
List of roles
Role name | Attributes | Member of
-----------+------------------------------------------------------------+-----------
postgres | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
zabbix | Superuser | {}
# /var/lib/pgsql/13/data/postgresql.conf
Change listen_addresses = 'localhost' to listen_addresses = '*'
# /var/lib/pgsql/13/data/pg_hba.conf
host all all 0.0.0.0/0 scram-sha-256 / add this line
host all all 0.0.0.0/0 trust / add this line to allow superuser login without a password
(if the superuser password is unknown, temporarily set trust mode,
connect without a password with a tool such as HeidiSQL, and reset the superuser password)
# systemctl restart postgresql-13 / a restart is required after changing these settings
- Check the database schemas (the \dn command shows the public schema)
# su - postgres
Last login: Sun Mar 31 09:05:46 KST 2024 on pts/0
$ psql
psql (13.14)
Type "help" for help.
postgres=# \dn
List of schemas
Name | Owner
--------+----------
public | postgres
(1 row)
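The remaining step from the Zabbix installation guide is importing the initial schema and pointing the server at the database. A hedged sketch of those steps (paths follow the zabbix-sql-scripts package; the password is the one chosen above):
# zcat /usr/share/zabbix-sql-scripts/postgresql/server.sql.gz | sudo -u zabbix psql zabbix
# vi /etc/zabbix/zabbix_server.conf     <-- set DBPassword=zabbix
# systemctl restart zabbix-server zabbix-agent httpd php-fpm
# systemctl enable zabbix-server zabbix-agent httpd php-fpm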
@ Extract the source code
# tar xvzpf nagios-4.4.6.tar.gz
@ Install to /opt/nagios
# cd nagios-4.4.6
# ./configure --prefix=/opt/nagios
@ Compile
# make all
@ Create the group and user
# groupadd nagios
# useradd -g nagios nagios
@ Install the compiled Nagios binaries
# make install
@ Adjust permissions
# make install-commandmode
/usr/bin/install -c -m 775 -o nagios -g nagios -d /opt/nagios/var/rw
chmod g+s /opt/nagios/var/rw
@ Install the sample config files
# make install-config
@ Install the Apache config file
# make install-webconf
/usr/bin/install -c -m 644 sample-config/httpd.conf /etc/httpd/conf.d/nagios.conf
if [ 0 -eq 1 ]; then \
ln -s /etc/httpd/conf.d/nagios.conf /etc/apache2/sites-enabled/nagios.conf; \
fi
*** Nagios/Apache conf file installed ***
@ Create the unit file so that systemd can control the nagios service, then verify it
# make install-daemoninit
@ Check the status
# systemctl status nagios
● nagios.service - Nagios Core 4.4.6
Loaded: loaded (/usr/lib/systemd/system/nagios.service; enabled; vendor preset: disabled)
Active: inactive (dead)
Docs: https://www.nagios.org/documentation
@ Create the web UI account
# htpasswd -c /opt/nagios/etc/htpasswd.users nagios
New password:
Re-type new password:
Adding password for user nagios
@ Restart the Apache web server and enable it at boot
# systemctl restart httpd
# systemctl enable httpd
@ Install the Nagios plugins
# wget https://nagios-plugins.org/download/nagios-plugins-2.3.3.tar.gz
@ Extract the archive and change into the extracted directory
# tar xvzpf nagios-plugins-2.3.3.tar.gz
# cd nagios-plugins-2.3.3
# ./configure --prefix=/opt/nagios
# make
# make install
@ Start the nagios service
# systemctl start nagios
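Before relying on the service it is worth validating the configuration with Nagios's built-in syntax check (paths assume the --prefix=/opt/nagios used above):
# /opt/nagios/bin/nagios -v /opt/nagios/etc/nagios.cfg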
# sudo apt-get update
# sudo apt-get install -y nvidia-docker2
# sudo systemctl restart docker
# nvidia-docker
Usage: docker [OPTIONS] COMMAND
A self-sufficient runtime for containers
Common Commands:
run Create and run a new container from an image
exec Execute a command in a running container
ps List containers
build Build an image from a Dockerfile
pull Download an image from a registry
push Upload an image to a registry
images List images
login Log in to a registry
logout Log out from a registry
search Search Docker Hub for images
version Show the Docker version information
info Display system-wide information
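A quick way to confirm that the NVIDIA runtime is usable from Docker is to run nvidia-smi inside a CUDA base image (the image tag here is only an example):
# docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi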
Batch jobs on a server run on a regular schedule, for example hourly or daily. A batch job starts, does some work, and exits. Because it does not run continuously, Prometheus cannot scrape its metrics reliably, and that is why the Pushgateway is needed.
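As a minimal illustration of the push model, a metric can also be pushed to the Pushgateway with curl (the metric name and job label below are made up for the example):
# echo "demo_batch_metric 3.14" | curl --data-binary @- http://192.168.56.128:9091/metrics/job/demo_batch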
# vi prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 192.168.56.128:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "/etc/prometheus/alertmanager/rules/test_rule.yml"
  - "/etc/prometheus/alertmanager/rules/alert_rules.yml"
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "node_exporter"
    static_configs:
      - targets: ["192.168.56.128:9100"]
      - targets: ["192.168.56.130:9100"]
  - job_name: 'PostgreSQL_exporter'
    static_configs:
      - targets: ['192.168.56.130:9187', '192.168.56.128:9187']
      # - targets: ['192.168.56.128:9187']
  - job_name: 'jmx_exporter'
    scrape_interval: 5s
    static_configs:
      - targets: ['192.168.56.130:8081']
  - job_name: 'kubernetes_exporter'
    static_configs:
      - targets: ['192.168.56.10:9100']
      - targets: ['192.168.56.101:9100']
      - targets: ['192.168.56.102:9100']
      - targets: ['192.168.56.103:9100']
  - job_name: 'example'
    static_configs:
      - targets: ['192.168.56.128:8000']
  - job_name: pushgateway
    honor_labels: true
    static_configs:
      - targets: ['192.168.56.128:9091']
Verify that the pushgateway works with the prometheus server
# Confirm that pushed data is stored in the pushgateway
Write and run a Python sample file to test the pushgateway
# cat 4-12-pushgateway.py (create the Python script with vi)
from prometheus_client import CollectorRegistry, Gauge, pushadd_to_gateway

registry = CollectorRegistry()
duration = Gauge('my_job_duration_seconds',
                 'Duration of my batch job in seconds', registry=registry)
try:
    with duration.time():
        # Your code here.
        pass
    # This only runs if there wasn't an exception.
    g = Gauge('my_job_last_success_seconds',
              'Last time my batch job successfully finished', registry=registry)
    g.set_to_current_time()
finally:
    pushadd_to_gateway('192.168.56.128:9091', job='batch', registry=registry)
# python3 4-12-pushgateway.py (run the script)
# cat 3-1-example.py
import http.server
from prometheus_client import start_http_server

class MyHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"Hello World")

# The HTTP server itself listens on port 8001
# Prometheus metrics are exposed on port 8000
if __name__ == "__main__":
    start_http_server(8000)
    server = http.server.HTTPServer(('192.168.56.128', 8001), MyHandler)
    server.serve_forever()
# Install pip3 and run the script (error)
pip3 and prometheus_client are not installed
# python3 3-1-example.py
Traceback (most recent call last):
File "/etc/prometheus/ep-examples-master/3/3-1-example.py", line 2, in <module>
from prometheus_client import start_http_server
ModuleNotFoundError: No module named 'prometheus_client'
# pip3 --version
-bash: pip3: command not found
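A hedged fix for the missing module (package names assume a Debian/Ubuntu host; use the distribution's own package manager otherwise):
# apt-get install -y python3-pip
# pip3 install prometheus_client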
Register it in prometheus.yml and restart prometheus (note: register port 8000, the metrics port)
# vi prometheus.yml
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'example'
    static_configs:
      - targets: ['192.168.56.128:8000']
# systemctl restart prometheus
# systemctl status prometheus
● prometheus.service - Prometheus
Loaded: loaded (/etc/systemd/system/prometheus.service; enabled; vendor preset: enabled)
Active: active (running) since Sat 2024-03-09 09:34:02 KST; 7s ago
Main PID: 3320 (prometheus)
Tasks: 8 (limit: 2219)
Memory: 63.8M
CPU: 1.698s
CGroup: /system.slice/prometheus.service
└─3320 /usr/local/bin/prometheus --config.file /etc/prometheus/prometheus.yml --storage.tsdb.path />
Running the alertmanager binary directly works (*:9093/*:9094 LISTEN), but it has to be restarted manually after a server reboot.
# How to register a process as a Linux service (systemctl)
Prepare the service account for the alertmanager binary and stage the file for copying
# cd /etc/prometheus/alertmanager/
# pwd
/etc/prometheus/alertmanager/
# ls -al
total 65932
drwxr-xr-x 4 prometheus prometheus 4096 Mar 3 10:05 .
drwxr-xr-x 5 prometheus prometheus 4096 Mar 3 07:31 ..
-rwxr-xr-x 1 prometheus prometheus 37345962 Feb 28 20:52 alertmanager <-- this executable will be registered as a Linux service
-rw-r--r-- 1 prometheus prometheus 356 Feb 28 20:55 alertmanager.yml
-rwxr-xr-x 1 prometheus prometheus 30130103 Feb 28 20:52 amtool
drwxr-xr-x 2 root root 4096 Mar 3 09:41 data
-rw-r--r-- 1 prometheus prometheus 11357 Feb 28 20:55 LICENSE
-rw-r--r-- 1 prometheus prometheus 457 Feb 28 20:55 NOTICE
drwxr-xr-x 2 prometheus prometheus 4096 Mar 3 09:41 rules
# Add a user
# useradd -M -r -s /bin/false alertmanager
# Confirm the user was added
# cat /etc/passwd
alertmanager:x:995:994::/home/alertmanager:/bin/false
# Copy the executable to /usr/local/bin
# cp alertmanager /usr/local/bin/
# Set user and group ownership
# cd /usr/local/bin
# chown alertmanager:alertmanager /usr/local/bin/alertmanager
Register the Linux service
# cd /etc/systemd/system
# vi alertmanager.service
# Add the following content
[Unit]
Description=alertmanager
Wants=network-online.target
After=network-online.target
[Service]
User=alertmanager
Group=alertmanager
Type=simple
ExecStart=/usr/local/bin/alertmanager
[Install]
WantedBy=multi-user.target
# Change the file permissions
# chmod 744 alertmanager.service
Check that the service runs
# systemctl daemon-reload
# systemctl stop alertmanager.service
# systemctl enable alertmanager.service
# systemctl start alertmanager.service
# systemctl status alertmanager.service
× alertmanager.service - alertmanager
Loaded: loaded (/etc/systemd/system/alertmanager.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Sun 2024-03-03 10:09:13 KST; 3min 29s ago
Main PID: 12922 (code=exited, status=1/FAILURE)
CPU: 66ms
Mar 03 10:09:13 servidor alertmanager[12922]: ts=2024-03-03T01:09:13.098Z caller=main.go:182 level=info build_context="(go=go1.21.7, platform=linux/amd64, user=root@22cd11f671e9, date=20240228-11:51:20, tags=netgo)"
Mar 03 10:09:13 servidor alertmanager[12922]: ts=2024-03-03T01:09:13.104Z caller=cluster.go:186 level=info component=cluster msg="setting advertise address explicitly" addr=10.0.2.15 port=9094
Mar 03 10:09:13 servidor alertmanager[12922]: ts=2024-03-03T01:09:13.110Z caller=cluster.go:683 level=info component=cluster msg="Waiting for gossip to settle..." interval=2s
Mar 03 10:09:13 servidor alertmanager[12922]: ts=2024-03-03T01:09:13.130Z caller=coordinator.go:113 level=info component=configuration msg="Loading configuration file" file=alertmanager.yml
Mar 03 10:09:13 servidor alertmanager[12922]: ts=2024-03-03T01:09:13.130Z caller=coordinator.go:118 level=error component=configuration msg="Loading configuration file failed" file=alertmanager.yml err="open alertmanager.yml: no such file or directory"
Mar 03 10:09:13 servidor alertmanager[12922]: ts=2024-03-03T01:09:13.130Z caller=cluster.go:692 level=info component=cluster msg="gossip not settled but continuing anyway" polls=0 elapsed=20.021084ms
Mar 03 10:09:13 servidor alertmanager[12922]: ts=2024-03-03T01:09:13.130Z caller=silence.go:442 level=info component=silences msg="Creating shutdown snapshot failed" err="open data/silences.51ab1e5945c48bff: permission denied"
Mar 03 10:09:13 servidor alertmanager[12922]: ts=2024-03-03T01:09:13.131Z caller=nflog.go:362 level=error component=nflog msg="Creating shutdown snapshot failed" err="open data/nflog.5acef5dc6432c333: permission denied"
Mar 03 10:09:13 servidor systemd[1]: alertmanager.service: Main process exited, code=exited, status=1/FAILURE
Mar 03 10:09:13 servidor systemd[1]: alertmanager.service: Failed with result 'exit-code'.
The service fails (the configuration file cannot be found, so the unit file is fixed)
# The failure log shows that alertmanager.yml cannot be found
Mar 03 10:09:13 servidor alertmanager[12922]: ts=2024-03-03T01:09:13.130Z caller=coordinator.go:113 level=info component=configuration msg="Loading configuration file" file=alertmanager.yml
# Edit the service file again
# vi alertmanager.service
[Unit]
Description=alertmanager
Wants=network-online.target
After=network-online.target
[Service]
User=alertmanager
Group=alertmanager
Type=simple
ExecStart=/usr/local/bin/alertmanager \
--config.file /etc/prometheus/alertmanager/alertmanager.yml <--- add the yml file path
[Install]
WantedBy=multi-user.target
Re-check the service
# systemctl stop alertmanager.service
# systemctl start alertmanager.service
# systemctl enable alertmanager.service
# systemctl status alertmanager.service
● alertmanager.service - alertmanager
Loaded: loaded (/etc/systemd/system/alertmanager.service; enabled; vendor preset: enabled)
Active: active (running) since Sun 2024-03-03 11:19:33 KST; 16s ago
Main PID: 14007 (alertmanager)
Tasks: 7 (limit: 2219)
Memory: 13.1M
CPU: 146ms
CGroup: /system.slice/alertmanager.service
└─14007 /usr/local/bin/alertmanager --config.file /etc/prometheus/alertmanager/alertmanager.yml
Mar 03 11:19:33 servidor alertmanager[14007]: ts=2024-03-03T02:19:33.083Z caller=main.go:181 level=info msg="Starting Alertmanager" version="(version=0.27.0, branch=HEAD, revision=0aa3c2aad14cff039931923ab16b26b7481783b5)"
Mar 03 11:19:33 servidor alertmanager[14007]: ts=2024-03-03T02:19:33.083Z caller=main.go:182 level=info build_context="(go=go1.21.7, platform=linux/amd64, user=root@22cd11f671e9, date=20240228-11:51:20, tags=netgo)"
Mar 03 11:19:33 servidor alertmanager[14007]: ts=2024-03-03T02:19:33.091Z caller=cluster.go:186 level=info component=cluster msg="setting advertise address explicitly" addr=10.0.2.15 port=9094
Mar 03 11:19:33 servidor alertmanager[14007]: ts=2024-03-03T02:19:33.094Z caller=cluster.go:683 level=info component=cluster msg="Waiting for gossip to settle..." interval=2s
Mar 03 11:19:33 servidor alertmanager[14007]: ts=2024-03-03T02:19:33.120Z caller=coordinator.go:113 level=info component=configuration msg="Loading configuration file" file=/etc/prometheus/alertmanager/alertmanager.yml
Mar 03 11:19:33 servidor alertmanager[14007]: ts=2024-03-03T02:19:33.121Z caller=coordinator.go:126 level=info component=configuration msg="Completed loading of configuration file" file=/etc/prometheus/alertmanager/alertmanager.yml
Mar 03 11:19:33 servidor alertmanager[14007]: ts=2024-03-03T02:19:33.123Z caller=tls_config.go:313 level=info msg="Listening on" address=[::]:9093
Mar 03 11:19:33 servidor alertmanager[14007]: ts=2024-03-03T02:19:33.123Z caller=tls_config.go:316 level=info msg="TLS is disabled." http2=false address=[::]:9093
Mar 03 11:19:35 servidor alertmanager[14007]: ts=2024-03-03T02:19:35.096Z caller=cluster.go:708 level=info component=cluster msg="gossip not settled" polls=0 before=0 now=1 elapsed=2.002291107s
Mar 03 11:19:43 servidor alertmanager[14007]: ts=2024-03-03T02:19:43.124Z caller=cluster.go:700 level=info component=cluster msg="gossip settled; proceeding" elapsed=10.029680225s
Running the node_exporter binary directly works (*:9100 LISTEN), but it has to be restarted manually after a server reboot.
# How to register a process as a Linux service (systemctl)
Prepare the service account for the node_exporter binary and stage the file for copying
# cd /root/node_exporter-1.5.0
# pwd
/root/node_exporter-1.5.0
# ls -al
total 19340
-rwxr-xr-x. 1 3434 3434 19779640 Nov 30 2022 node_exporter
# Add a user
# useradd -M -r -s /bin/false node_exporter
# Confirm the user was added
# cat /etc/passwd
node_exporter:x:993:987::/home/node_exporter:/bin/false
# Copy the executable to /usr/local/bin
# cp node_exporter /usr/local/bin/
# Set user and group ownership
# cd /usr/local/bin
# chown node_exporter:node_exporter /usr/local/bin/node_exporter
Register the Linux service
# cd /etc/systemd/system
# vi node_exporter.service
# Add the following content
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target
[Service]
User=node_exporter
Group=node_exporter
Type=simple
#ExecStart=/root/prometheus/node_exporter/node_exporter
#ExecStart=/root/node_exporter-1.5.0/node_exporter
ExecStart=/usr/local/bin/node_exporter
[Install]
WantedBy=multi-user.target
# Change the file permissions
# chmod 744 node_exporter.service
Check that the service runs
# systemctl daemon-reload
# systemctl stop node_exporter.service
# systemctl enable node_exporter.service
# systemctl start node_exporter.service
# systemctl status node_exporter.service
● node_exporter.service - Node Exporter
Loaded: loaded (/etc/systemd/system/node_exporter.service; enabled; vendor preset: disabled)
Active: active (running) since Sun 2024-03-03 09:15:16 KST; 22min ago
Main PID: 5085 (node_exporter)
Tasks: 6 (limit: 24909)
Memory: 11.9M
CGroup: /system.slice/node_exporter.service
└─5085 /usr/local/bin/node_exporter
Mar 03 09:15:16 centos8 node_exporter[5085]: ts=2024-03-03T00:15:16.538Z caller=node_exporter.go:117 level=info collector=thermal_zone
Mar 03 09:15:16 centos8 node_exporter[5085]: ts=2024-03-03T00:15:16.538Z caller=node_exporter.go:117 level=info collector=time
Mar 03 09:15:16 centos8 node_exporter[5085]: ts=2024-03-03T00:15:16.538Z caller=node_exporter.go:117 level=info collector=timex
Mar 03 09:15:16 centos8 node_exporter[5085]: ts=2024-03-03T00:15:16.538Z caller=node_exporter.go:117 level=info collector=udp_queues
Mar 03 09:15:16 centos8 node_exporter[5085]: ts=2024-03-03T00:15:16.538Z caller=node_exporter.go:117 level=info collector=uname
Mar 03 09:15:16 centos8 node_exporter[5085]: ts=2024-03-03T00:15:16.538Z caller=node_exporter.go:117 level=info collector=vmstat
Mar 03 09:15:16 centos8 node_exporter[5085]: ts=2024-03-03T00:15:16.538Z caller=node_exporter.go:117 level=info collector=xfs
Mar 03 09:15:16 centos8 node_exporter[5085]: ts=2024-03-03T00:15:16.538Z caller=node_exporter.go:117 level=info collector=zfs
Mar 03 09:15:16 centos8 node_exporter[5085]: ts=2024-03-03T00:15:16.539Z caller=tls_config.go:232 level=info msg="Listening on" address=[::]:9100
Mar 03 09:15:16 centos8 node_exporter[5085]: ts=2024-03-03T00:15:16.539Z caller=tls_config.go:235 level=info msg="TLS is disabled." http2=false address=[::]:9100
# netstat -ntpa |grep LISTEN
tcp6 0 0 :::9100 :::* LISTEN 5275/node_exporter
# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
77d5896ee529 quay.io/prometheus/alertmanager "/bin/alertmanager -…" 42 minutes ago Up 7 minutes 0.0.0.0:9093->9093/tcp, :::9093->9093/tcp alertmanager
d5e072461359 quay.io/prometheuscommunity/postgres-exporter "/bin/postgres_expor…" 43 hours ago Up 2 hours 0.0.0.0:9187->9187/tcp, :::9187->9187/tcp postgres-exporter
e841da551b71 postgres "docker-entrypoint.s…" 5 days ago Up 2 hours 0.0.0.0:5432->5432/tcp, :::5432->5432/tcp postgres
Check Docker container details
# Check the account information inside the container (container id 77d5896ee529 ---> the alertmanager container)
# docker exec 77d5896ee529 cat /etc/passwd
root:x:0:0:root:/root:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/bin/false
bin:x:2:2:bin:/bin:/bin/false
sys:x:3:3:sys:/dev:/bin/false
sync:x:4:100:sync:/bin:/bin/sync
mail:x:8:8:mail:/var/spool/mail:/bin/false
www-data:x:33:33:www-data:/var/www:/bin/false
operator:x:37:37:Operator:/var:/bin/false
nobody:x:65534:65534:nobody:/home:/bin/false
# Check the container's working directory
/# docker exec 77d5896ee529 pwd
/alertmanager
# Check the container's environment variables
# docker exec 77d5896ee529 env
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
HOSTNAME=77d5896ee529
HOME=/home
# docker exec 77d5896ee529 uname -a
Linux 77d5896ee529 5.15.0-83-generic #92-Ubuntu SMP Mon Aug 14 09:30:42 UTC 2023 x86_64 GNU/Linux
# # docker exec 77d5896ee529 uname -s
Linux
# docker exec 77d5896ee529 uname -r
5.15.0-83-generic
# docker exec 77d5896ee529 uname -v
92-Ubuntu SMP Mon Aug 14 09:30:42 UTC 2023
# docker exec 77d5896ee529 cat /proc/version
Linux version 5.15.0-83-generic (buildd@lcy02-amd64-027) (gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #92-Ubuntu SMP Mon Aug 14 09:30:42 UTC 2023
Error when attaching to the Docker container
If /bin/bash is not available, connect with /bin/sh
# docker exec -it 77d5896ee529 /bin/bash
OCI runtime exec failed: exec failed: unable to start container process: exec: "/bin/bash": stat /bin/bash: no such file or directory: unknown
# docker exec 77d5896ee529 env
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
HOSTNAME=77d5896ee529
HOME=/home
# docker exec 77d5896ee529 ls /bin/bash # confirm that /bin/bash is missing, as the env output suggested
ls: /bin/bash: No such file or directory
# docker exec 77d5896ee529 ls /bin/sh # check that sh exists
/bin//sh
# docker exec -it 77d5896ee529 /bin/sh # connect with sh
/alertmanager $
# docker run --name alertmanager -d -p 9093:9093 quay.io/prometheus/alertmanager
# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
77d5896ee529 quay.io/prometheus/alertmanager "/bin/alertmanager -…" 5 minutes ago Up 5 minutes 0.0.0.0:9093->9093/tcp, :::9093->9093/tcp alertmanager
# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
quay.io/prometheus/alertmanager latest 11f11916f8cd 2 days ago 70.3MB
Check the web page at http://192.168.56.128:9093
# Configure alertmanager on the prometheus server
# cat /etc/prometheus/prometheus.yml
# my global config
global:
  scrape_interval: 15s
  evaluation_interval: 15s

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 192.168.56.128:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "/etc/prometheus/alertmanager/rules/test_rule.yml"
  - "/etc/prometheus/alertmanager/rules/alert_rules.yml"
# cat /etc/prometheus/alertmanager/rules/test_rule.yml
groups:
  - name: example
    rules:
      # Alert for any instance that is unreachable for >5 minutes.
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: page
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."
      # Alert for any instance that has a median request latency >1s.
      - alert: APIHighRequestLatency
        expr: api_http_request_latencies_second{quantile="0.5"} > 1
        for: 10m
        annotations:
          summary: "High request latency on {{ $labels.instance }}"
          description: "{{ $labels.instance }} has a median request latency above 1s (current value: {{ $value }}s)"
# cat alert_rules.yml
groups:
  - name: alert.rules
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          severity: "critical"
        annotations:
          summary: "Endpoint {{ $labels.instance }} down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."
      - alert: HostOutOfMemory
        expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 10
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Host out of memory (instance {{ $labels.instance }})"
          description: "Node memory is filling up (< 10% left)\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
      - alert: HostMemoryUnderMemoryPressure
        expr: rate(node_vmstat_pgmajfault[1m]) > 1000
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Host memory under memory pressure (instance {{ $labels.instance }})"
          description: "The node is under heavy memory pressure. High rate of major page faults\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
      # Please add ignored mountpoints in node_exporter parameters like
      # "--collector.filesystem.ignored-mount-points=^/(sys|proc|dev|run)($|/)".
      # Same rule using "node_filesystem_free_bytes" will fire when disk fills for non-root users.
      - alert: HostOutOfDiskSpace
        expr: (node_filesystem_avail_bytes * 100) / node_filesystem_size_bytes < 10 and ON (instance, device, mountpoint) node_filesystem_readonly == 0
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Host out of disk space (instance {{ $labels.instance }})"
          description: "Disk is almost full (< 10% left)\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
      - alert: HostHighCpuLoad
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[2m])) * 100) > 80
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: "Host high CPU load (instance {{ $labels.instance }})"
          description: "CPU load is > 80%\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
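After editing rule files it is worth checking their syntax before restarting Prometheus; promtool ships with Prometheus and provides this (paths are the ones used above):
# promtool check rules /etc/prometheus/alertmanager/rules/test_rule.yml /etc/prometheus/alertmanager/rules/alert_rules.yml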
Install two virtual servers (VMs): one Prometheus + Grafana server and one client server to be monitored. Feed the collected data into Prometheus and build a Grafana dashboard from it.
- node_exporter: collects server metrics (servers, Kubernetes, Docker, etc.)
- postgresql_exporter: collects PostgreSQL metrics
- jmx_exporter: collects Tomcat metrics
Basic terms
What is an exporter? A Prometheus exporter is software that collects metrics from the target system to be monitored (a server, agent, daemon, and so on) and exposes them on an HTTP endpoint (default: /metrics).
What is a metric? A metric is a measurement that shows the current state of a system. In an infrastructure environment metrics fall broadly into two kinds: system metrics such as CPU and memory usage, and service metrics such as HTTP status codes that indicate the state of a service.
What is a time series database? A time series database is optimized for storing data that occurs over time, keyed by time. Examples include network packets, sensor readings from IoT devices, and event logs.
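For orientation, this is roughly what an exporter's /metrics endpoint returns when scraped; the # HELP / # TYPE lines are part of the exposition format, and the sample lines below are illustrative only:
# curl -s http://192.168.56.130:9100/metrics | head
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.67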
Target system diagram
Install the Prometheus - Grafana server (installed with Vagrant)
OracleVM and Vagrant must already be installed (the servers themselves are installed with Vagrant)
# Ubuntu 22.04
Vagrant.configure("2") do |config|
config.vm.box = "davidurbano/prometheus-grafana"
end
# CentOS 8
Vagrant.configure("2") do |config|
config.vm.box = "centos/8"
end
1. Check the currently installed Java version
# java -version
openjdk version "11.0.13" 2021-10-19 LTS
OpenJDK Runtime Environment 18.9 (build 11.0.13+8-LTS)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.13+8-LTS, mixed mode, sharing)
2. Check the installable Java versions
# yum list java*jdk-devel
Last metadata expiration check: 1:45:54 ago on Thu 29 Feb 2024 05:22:45 PM KST.
Installed Packages
java-11-openjdk-devel.x86_64 1:11.0.13.0.8-4.el8_5 @appstream
Available Packages
java-1.8.0-openjdk-devel.x86_64 1:1.8.0.312.b07-2.el8_5 appstream
java-17-openjdk-devel.x86_64 1:17.0.1.0.12-2.el8_5 appstream
3. Install the desired version
# yum install -y java-17-openjdk-devel.x86_64
Last metadata expiration check: 1:47:33 ago on Thu 29 Feb 2024 05:22:45 PM KST.
Dependencies resolved.
==================================================================================================================================================================================
Package Architecture Version Repository Size
==================================================================================================================================================================================
Installing:
java-17-openjdk-devel x86_64 1:17.0.1.0.12-2.el8_5 appstream 5.1 M
Installing dependencies:
java-17-openjdk x86_64 1:17.0.1.0.12-2.el8_5 appstream 244 k
java-17-openjdk-headless x86_64 1:17.0.1.0.12-2.el8_5 appstream 41 M
Transaction Summary
==================================================================================================================================================================================
Install 3 Packages
..
..
..
Installed:
java-17-openjdk-1:17.0.1.0.12-2.el8_5.x86_64 java-17-openjdk-devel-1:17.0.1.0.12-2.el8_5.x86_64 java-17-openjdk-headless-1:17.0.1.0.12-2.el8_5.x86_64
Complete!
4. Change the default Java
# /usr/sbin/alternatives --config java
There are 2 programs which provide 'java'.
Selection Command
-----------------------------------------------
*+ 1 java-11-openjdk.x86_64 (/usr/lib/jvm/java-11-openjdk-11.0.13.0.8-4.el8_5.x86_64/bin/java)
2 java-17-openjdk.x86_64 (/usr/lib/jvm/java-17-openjdk-17.0.1.0.12-2.el8_5.x86_64/bin/java)
Enter to keep the current selection[+], or type selection number: 2
5. Reset the environment variables
# java -version
openjdk version "11.0.13" 2021-10-19 LTS
OpenJDK Runtime Environment 18.9 (build 11.0.13+8-LTS)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.13+8-LTS, mixed mode, sharing)
# echo $JAVA_HOME
/usr/bin/javac
# vi /etc/profile
--> delete JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.275.b01-0.el7_9.i386
JAVA_HOME=/usr/lib/jvm/java-17-openjdk-17.0.1.0.12-2.el8_5.x86_64
export JAVA_HOME
PATH=$PATH:$JAVA_HOME/bin
export PATH
6. Verify the Java version
# su - root
Last login: Thu Feb 29 19:44:10 KST 2024 on pts/0
# echo $JAVA_HOME
/usr/lib/jvm/java-17-openjdk-17.0.1.0.12-2.el8_5.x86_64
# echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/lib/jvm/java-17-openjdk-17.0.1.0.12-2.el8_5.x86_64/bin:/root/bin
# java -version
openjdk version "17.0.1" 2021-10-19 LTS
OpenJDK Runtime Environment 21.9 (build 17.0.1+12-LTS)
OpenJDK 64-Bit Server VM 21.9 (build 17.0.1+12-LTS, mixed mode, sharing)
# Master node and worker node information
# kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
m-k8s Ready master 3d5h v1.18.4 192.168.56.10 <none> CentOS Linux 7 (Core) 3.10.0-1160.90.1.el7.x86_64 docker://18.9.9
w1-k8s Ready <none> 3d4h v1.18.4 192.168.56.101 <none> CentOS Linux 7 (Core) 3.10.0-1160.90.1.el7.x86_64 docker://18.9.9
w2-k8s Ready <none> 3d4h v1.18.4 192.168.56.102 <none> CentOS Linux 7 (Core) 3.10.0-1160.90.1.el7.x86_64 docker://18.9.9
w3-k8s Ready <none> 3d4h v1.18.4 192.168.56.103 <none> CentOS Linux 7 (Core) 3.10.0-1160.90.1.el7.x86_64 docker://18.9.9
# Pod object spec: NodePort service
# cat nodeport.yaml
apiVersion: v1
kind: Service            # kind: Service
metadata:                # metadata: the name of the service
  name: np-svc
spec:                    # spec: selector labels
  selector:
    app: np-pods
  ports:                 # protocol and ports to use
    - name: http
      protocol: TCP
      port: 80
      targetPort: 80
      nodePort: 30000
  type: NodePort         # service type
# Create the deployment (pods)
# kubectl create deployment np-pods --image=sysnet4admin/echo-hname
deployment.apps/np-pods created
# Create the NodePort service for the pods
# kubectl create -f nodeport.yaml
# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
np-pods-5767d54d4b-txwm4 1/1 Running 0 10m 172.16.103.169 w2-k8s <none> <none>
# kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
np-svc NodePort 10.104.110.226 <none> 80:30000/TCP 7m32s
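A quick check that the NodePort service answers is to hit any node's IP on port 30000 (the echo-hname image returns the pod's hostname):
# curl 192.168.56.101:30000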
# kubectl create deployment in-hname-pod --image=sysnet4admin/echo-hname
deployment.apps/in-hname-pod created
# kubectl create deployment in-ip-pod --image=sysnet4admin/echo-ip
deployment.apps/in-ip-pod created
# kubectl apply -f ingress-nginx.yaml
namespace/ingress-nginx created
configmap/nginx-configuration created
configmap/tcp-services created
configmap/udp-services created
serviceaccount/nginx-ingress-serviceaccount created
clusterrole.rbac.authorization.k8s.io/nginx-ingress-clusterrole created
role.rbac.authorization.k8s.io/nginx-ingress-role created
rolebinding.rbac.authorization.k8s.io/nginx-ingress-role-nisa-binding created
clusterrolebinding.rbac.authorization.k8s.io/nginx-ingress-clusterrole-nisa-binding created
deployment.apps/nginx-ingress-controller created
limitrange/ingress-nginx created
# kubectl apply -f ingress-config.yaml
ingress.networking.k8s.io/ingress-nginx configured
# kubectl apply -f ingress.yaml
service/nginx-ingress-controller created
# kubectl expose deployment in-hname-pod --name=hname-svc-default --port=80,443
service/hname-svc-default exposed
# kubectl expose deployment in-ip-pod --name=ip-svc --port=80,443
service/ip-svc exposed
# kubectl get pods -n ingress-nginx
NAME READY STATUS RESTARTS AGE
nginx-ingress-controller-5bb8fb4bb6-mnx88 1/1 Running 0 23s
# kubectl get ingress
NAME CLASS HOSTS ADDRESS PORTS AGE
ingress-nginx <none> * 80 40m
# kubectl get services -n ingress-nginx
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
nginx-ingress-controller NodePort 10.101.20.235 <none> 80:30100/TCP,443:30101/TCP 20s
# kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
hname-svc-default ClusterIP 10.97.228.75 <none> 80/TCP,443/TCP 18s
ip-svc ClusterIP 10.108.49.235 <none> 80/TCP,443/TCP 9s
# kubectl get pods
NAME READY STATUS RESTARTS AGE
in-hname-pod-8565c86448-d8q9h 1/1 Running 0 2m40s
in-ip-pod-76bf6989d-j7pdk 1/1 Running 0 2m30s
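With the ingress controller exposed on NodePorts 30100/30101 (see the nginx-ingress-controller service above), the routes can be checked with curl against any node. The paths below assume the ingress-config.yaml used here maps / to in-hname-pod and /ip to in-ip-pod, which is not shown in this log:
# curl 192.168.56.101:30100
# curl 192.168.56.101:30100/ip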