MONITORING. AGAIN.
Vsevolod Polyakov
What are metrics?
Success
Count
Time
Interaction
Internal processes
System metrics
Why do we need metrics?
Alerts
Analytics
Graphite
Default graphite architecture
what?
• RRD-like (gram.ly/gfsx)
• so.it.is.my.metric → /so/it/is/my/metric.wsp
• Fixed retention (by name/pattern)
• Fixed size (actually no)
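The name-to-file mapping above can be sketched in a few lines of Go. This is only an illustration of the rule shown on the slide; the `metricPath` helper and the `root` parameter are hypothetical names, not part of Graphite.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// metricPath maps a dotted metric name to the Whisper file that stores it.
// root is a hypothetical storage directory; the slide uses "/".
func metricPath(root, name string) string {
	return path.Join(root, strings.ReplaceAll(name, ".", "/")+".wsp")
}

func main() {
	fmt.Println(metricPath("/", "so.it.is.my.metric")) // prints /so/it/is/my/metric.wsp
}
```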
Retention and size
• 1s:1d → 1 036 828 bytes
• 10s:10d → 1 036 828 bytes
• 1s:365d → 378 432 028 bytes (1 TB ≈ 3 000 metrics)
• 10s:365d → 37 843 228 bytes (1 TB ≈ 30 000 metrics)
whisper calc
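The single-archive sizes above follow from the Whisper file layout: a 16-byte header, 12 bytes of archive info per archive, and 12 bytes per stored point. A minimal sketch of that arithmetic, assuming retentions written as `step:duration` with the unit suffixes s, m, h, d, w, y; the function names are made up for this example.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

const (
	metadataSize    = 16 // aggregation type, max retention, xFilesFactor, archive count
	archiveInfoSize = 12 // offset, seconds per point, number of points
	pointSize       = 12 // 4-byte timestamp + 8-byte value
)

var unitSeconds = map[byte]int64{'s': 1, 'm': 60, 'h': 3600, 'd': 86400, 'w': 604800, 'y': 31536000}

// parseDuration converts strings such as "10s" or "365d" to seconds.
func parseDuration(s string) (int64, error) {
	if len(s) < 2 {
		return 0, fmt.Errorf("bad duration %q", s)
	}
	mult, ok := unitSeconds[s[len(s)-1]]
	if !ok {
		return 0, fmt.Errorf("bad unit in %q", s)
	}
	n, err := strconv.ParseInt(s[:len(s)-1], 10, 64)
	if err != nil {
		return 0, err
	}
	return n * mult, nil
}

// whisperSize returns the file size in bytes for a retention definition
// such as "1s:1d" (one archive) or "10s:1d,1m:7d" (two archives).
func whisperSize(retentions string) (int64, error) {
	size := int64(metadataSize)
	for _, archive := range strings.Split(retentions, ",") {
		parts := strings.Split(archive, ":")
		if len(parts) != 2 {
			return 0, fmt.Errorf("bad archive %q", archive)
		}
		step, err := parseDuration(parts[0])
		if err != nil {
			return 0, err
		}
		keep, err := parseDuration(parts[1])
		if err != nil {
			return 0, err
		}
		if step <= 0 {
			return 0, fmt.Errorf("bad step in %q", archive)
		}
		size += archiveInfoSize + (keep/step)*pointSize
	}
	return size, nil
}

func main() {
	for _, r := range []string{"1s:1d", "10s:10d", "1s:365d", "10s:365d"} {
		size, err := whisperSize(r)
		if err != nil {
			panic(err)
		}
		fmt.Println(r, size) // 1036828, 1036828, 378432028, 37843228
	}
}
```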
Retention and size
• 10s:30d,1m:120d,10m:365d → 4 564 864 bytes
• 240 864 metrics in 1 TB
• aggregation: average, sum, min, max, or last
• can be assigned per metric
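When points roll from a finer archive into a coarser one, several values collapse into one according to the chosen aggregation method. The sketch below shows the five methods from the list on a plain slice. It is an illustration only: `aggregate` is a hypothetical helper, and it ignores details such as missing points.

```go
package main

import (
	"errors"
	"fmt"
)

// aggregate collapses the values of a finer archive into one coarser point.
func aggregate(method string, values []float64) (float64, error) {
	if len(values) == 0 {
		return 0, errors.New("no values to aggregate")
	}
	result := values[0]
	switch method {
	case "average", "sum":
		for _, v := range values[1:] {
			result += v
		}
		if method == "average" {
			result /= float64(len(values))
		}
	case "min":
		for _, v := range values[1:] {
			if v < result {
				result = v
			}
		}
	case "max":
		for _, v := range values[1:] {
			if v > result {
				result = v
			}
		}
	case "last":
		result = values[len(values)-1]
	default:
		return 0, fmt.Errorf("unknown aggregation method %q", method)
	}
	return result, nil
}

func main() {
	values := []float64{1, 2, 3, 6}
	for _, m := range []string{"average", "sum", "min", "max", "last"} {
		v, err := aggregate(m, values)
		if err != nil {
			panic(err)
		}
		fmt.Println(m, v) // average 3, sum 12, min 1, max 6, last 6
	}
}
```

Which method fits depends on the metric: counters usually want sum, gauges average or last, and latency peaks max.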
How
• terraform (https://www.terraform.io/)
• docker (https://www.docker.com/)
• ansible (https://www.ansible.com/)
• rocker (https://github.com/grammarly/rocker)
• rocker-compose (https://github.com/grammarly/rocker-compose)
Default graphite architecture
carbon-cache.py
• single-core
• many options in config file
• default
carbon-cache.py architecture
Start load testing
• m4.xlarge instance (4 CPU, 16 GB RAM, 256 GB EBS gp2 disk)
• retentions = 1s:1d
• MAX_CACHE_SIZE, MAX_UPDATES_PER_SECOND, MAX_CREATES_PER_MINUTE = inf
• defaults
• almost 1.5 h to reach the limit :(
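The slides do not show the load generator itself. As a rough idea of what such a tool sends, here is a minimal sketch of Graphite's plaintext protocol, one `<path> <value> <timestamp>` line per datapoint. The `loadtest.metricN` naming and both function names are assumptions made for the example.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"strconv"
)

// formatLine renders one datapoint in Graphite's plaintext format:
// "<metric path> <value> <unix timestamp>\n".
func formatLine(name string, value float64, ts int64) string {
	return name + " " + strconv.FormatFloat(value, 'f', -1, 64) + " " +
		strconv.FormatInt(ts, 10) + "\n"
}

// writeMetrics writes n distinct metrics for one timestamp to w.
// The "loadtest.metricN" naming scheme is hypothetical.
func writeMetrics(w io.Writer, n int, ts int64) error {
	for i := 0; i < n; i++ {
		line := formatLine(fmt.Sprintf("loadtest.metric%d", i), float64(i), ts)
		if _, err := io.WriteString(w, line); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	// Write to stdout here; a real run would use a TCP connection instead.
	if err := writeMetrics(os.Stdout, 2, 1500000000); err != nil {
		panic(err)
	}
}
```

In a real test, `w` would be a connection from `net.Dial("tcp", host+":2003")` (2003 is carbon's default plaintext port), and the loop would repeat every second to match the 1s retention.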
carbon-cache.py cache size → 75k metrics/s
results
• 75 000 metrics/s max
• 60 000 metrics/s flagman speed
• I/O :(
Try to tune!
• WHISPER_SPARSE_CREATE = true (don’t allocate space on creation): non-linear I/O load.
• CACHE_WRITE_STRATEGY = sorted (default)
cache size 1k → 195k metrics/s
results
• 120 000 metrics/s flagman speed
• cache flush problem :(
Try to tune!
• CACHE_WRITE_STRATEGY = max: gives a strong flush preference to frequently updated metrics and also reduces random file I/O.
from 1k to 150k
results
• 90 000 metrics/s flagman speed
• cache flush problem :(
Try to tune!
• CACHE_WRITE_STRATEGY = naive: just flush. Better with random I/O.
from 45k to 135k
results
• 120 000 metrics/s flagman speed
• still CPU
sorted
max
naive
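The three strategies differ only in how the writer picks the next metric to flush. The sketch below models the cache as a map from metric name to the number of queued datapoints. It is a simplified illustration based on the descriptions in carbon.conf, not carbon's actual code; all names here are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// cache maps a metric name to the number of datapoints waiting to be written.
type cache map[string]int

// sortedOrder snapshots the cache and returns all metrics ordered by queue
// length, longest first ("sorted"). Ties are broken by name for determinism.
func sortedOrder(c cache) []string {
	names := make([]string, 0, len(c))
	for name := range c {
		names = append(names, name)
	}
	sort.Slice(names, func(i, j int) bool {
		if c[names[i]] != c[names[j]] {
			return c[names[i]] > c[names[j]]
		}
		return names[i] < names[j]
	})
	return names
}

// maxNext returns the single metric with the most queued datapoints ("max").
// The writer would call it again before every write.
func maxNext(c cache) string {
	best, found := "", false
	for name, n := range c {
		if !found || n > c[best] || (n == c[best] && name < best) {
			best, found = name, true
		}
	}
	return best
}

// naiveNext returns whichever metric map iteration yields first ("naive").
func naiveNext(c cache) string {
	for name := range c {
		return name
	}
	return ""
}

func main() {
	c := cache{"a": 1, "b": 3, "c": 2}
	fmt.Println(sortedOrder(c)) // [b c a]
	fmt.Println(maxNext(c))     // b
	fmt.Println(naiveNext(c))   // any of a, b, c
}
```

The sorted strategy pays for one sort per pass over the cache, max pays for a scan before every write, and naive pays nothing for ordering but writes in an arbitrary order.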
• Maybe it’s an EBS I/O limitation? → 512 GB disk.
• No.
go-carbon
• multi-core single daemon
• written in golang
• not many options to tune :(
Start load testing
• m4.xlarge instance (4 CPU, 16 GB RAM, 256 GB EBS gp2 disk)
• retentions = 1s:1d
• max-size = 0
• max-updates-per-second = 0
• almost 1 h to reach the limit :(
1k → 130k metrics/s ~3k/min
results
• 120 000 metrics/s flagman speed
• but it’s without sparse.
• try to implement
try to tune!
    remaining := whisper.Size() - whisper.MetadataSize()
    whisper.file.Seek(int64(remaining-1), 0)
    whisper.file.Write([]byte{0})
    chunkSize := 16384
    zeros := make([]byte, chunkSize)
    for remaining > chunkSize {
        // if _, err = whisper.file.Write(zeros); err != nil {
        //     return nil, err
        // }
        remaining -= chunkSize
    }
    if _, err = whisper.file.Write(zeros[:remaining]); err != nil {
        return nil, err
    }
Already available in go-carbon
180 000 metrics/s!
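The idea behind sparse creation is to set the file length without writing the zeros in between, so the filesystem can leave a hole. A self-contained sketch of that idea follows. It is not the go-carbon or go-whisper implementation; `createSparse` and `demo` are hypothetical helpers, and whether a hole is actually left depends on the filesystem.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// createSparse creates a file of the given size by writing only its last byte.
func createSparse(path string, size int64) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	if size <= 0 {
		return nil
	}
	if _, err := f.Seek(size-1, io.SeekStart); err != nil {
		return err
	}
	_, err = f.Write([]byte{0})
	return err
}

// demo creates a sparse file in the temp directory, reports its apparent
// size, and removes it again.
func demo(size int64) (int64, error) {
	tmp, err := os.CreateTemp("", "sparse-*.wsp")
	if err != nil {
		return 0, err
	}
	path := tmp.Name()
	tmp.Close()
	defer os.Remove(path)

	if err := createSparse(path, size); err != nil {
		return 0, err
	}
	info, err := os.Stat(path)
	if err != nil {
		return 0, err
	}
	return info.Size(), nil
}

func main() {
	size, err := demo(1036828) // size of a 1s:1d Whisper file
	if err != nil {
		panic(err)
	}
	fmt.Println(size) // apparent size: 1036828
}
```

Creation becomes one small write instead of roughly 1 MB per new metric, and the disk blocks are allocated later, as datapoints arrive.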
try to tune!
• max update operation = 1500
results
• TLDR: 210 000 - 240 000 metrics/s flagman speed
• 31 000 000 cache size!
try to tune!
• max update operation = 0
• input-buffer = 400 000
results
• 270 000 metrics/s flagman speed
• 10-20 million cache size!
try to tune!
• vm.dirty_background_ratio=40
• vm.dirty_ratio=60
300 000 req/s
results
• 300 000 metrics/s flagman speed
• 180k+ metrics/s more or less without cache
Re:Lays
Default graphite architecture
arch forward
arch named/regexp
arch hash
arch hash, replication factor: 2
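Hash-based routing with a replication factor can be pictured as a ring: every backend owns many positions, a metric hashes to one position, and the relay walks clockwise until it has found enough distinct backends. The sketch below is a simplified consistent-hash ring written for illustration. It is not byte-compatible with the hashing used by carbon-relay.py or carbon-c-relay, and the type and function names are hypothetical.

```go
package main

import (
	"crypto/md5"
	"encoding/binary"
	"fmt"
	"sort"
)

// ring is a simplified consistent-hash ring with virtual nodes.
type ring struct {
	points []uint32          // sorted positions on the ring
	owner  map[uint32]string // position -> backend name
}

// hashKey maps a string to a ring position using the first 4 bytes of its MD5.
func hashKey(s string) uint32 {
	sum := md5.Sum([]byte(s))
	return binary.BigEndian.Uint32(sum[:4])
}

// newRing places vnodes positions on the ring for every backend.
func newRing(nodes []string, vnodes int) *ring {
	r := &ring{owner: make(map[uint32]string)}
	for _, n := range nodes {
		for i := 0; i < vnodes; i++ {
			p := hashKey(fmt.Sprintf("%s#%d", n, i))
			if _, taken := r.owner[p]; taken {
				continue // skip the rare position collision
			}
			r.owner[p] = n
			r.points = append(r.points, p)
		}
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// pick returns up to replicas distinct backends for a metric, walking the
// ring clockwise from the metric's position.
func (r *ring) pick(metric string, replicas int) []string {
	if len(r.points) == 0 {
		return nil
	}
	h := hashKey(metric)
	start := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	seen := make(map[string]bool)
	var out []string
	for i := 0; i < len(r.points) && len(out) < replicas; i++ {
		node := r.owner[r.points[(start+i)%len(r.points)]]
		if !seen[node] {
			seen[node] = true
			out = append(out, node)
		}
	}
	return out
}

func main() {
	r := newRing([]string{"cache-a", "cache-b", "cache-c"}, 100)
	// With replication factor 2, each metric goes to two distinct backends.
	fmt.Println(r.pick("so.it.is.my.metric", 2))
}
```

The same metric always lands on the same backends, and adding or removing a backend moves only the metrics whose ring positions it owned.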
carbon-relay.py
• twisted based
• native
Start load testing
• c4.xlarge instance (4 CPU, 7.5 GB RAM)
• ~1 Gbit LAN
• default parameters
• hashing
• 10 connections
WTF!
carbon-relay-ng
• golang-based
• web-panel
• live-updates
• aggregators
• spooling
<150 000 req/s
carbon-c-relay
• written in C
• advanced cluster management
from 100 000 to 1 600 000 req/s
1 400 000 flagman speed. Or not?
So… go-carbon + carbon-c-relay = ♡
Containers
Everything is mixed up
Differences
• Environment
• Role
• Track (Modifier)
• IP
• Datacenter
• Anything else
Tags
TSDBs with tags
• influxDB
• openTSDB (hbase)
• cyanite (cassandra)
• newTS (cassandra)
• Prometheus
(cluster) influx, 130k metrics/s
openTSDB single instance + hbase cluster = up to 150k metrics/s
Compaction
Graphite
Find the unique
Works with Grafana
Zipper
• https://github.com/grobian/carbonserver
• https://github.com/dgryski/carbonzipper
• https://github.com/dgryski/carbonapi
ALSO
• https://github.com/jssjr/carbonate
• https://github.com/jjneely/buckytools
• https://github.com/dgryski/carbonmem
• https://github.com/grobian/carbonwriter
Plans
• Patch statsd → ES
• Patch carbonserver → carbonlink
feel free to ask
• Vsevolod Polyakov
• skype: ctrlok1987
• github.com/ctrlok
• twitter.com/ctrlok
• slack: HangOps
• Gitter: dev_ua/devops
• skype: DevOps from Ukraine
• slack.ukrops.club
We're hiring!