101
Troubleshooting Redis @charsyam KAKAO

Troubleshooting Redis- DaeMyung Kang, Kakao

Embed Size (px)

Citation preview

Page 1: Troubleshooting Redis- DaeMyung Kang, Kakao

Troubleshooting Redis

@charsyam

KAKAO

Page 2: Troubleshooting Redis- DaeMyung Kang, Kakao

About me•Senior Software Engineer in KAKAO

•Redis/Twemproxy Contributor

•Redis-doc project merger.

•Apache Tajo Commiter

Page 3: Troubleshooting Redis- DaeMyung Kang, Kakao

Kakaostory

Page 4: Troubleshooting Redis- DaeMyung Kang, Kakao

Kakaostory

DAU: 8MMAU: 15M

Page 5: Troubleshooting Redis- DaeMyung Kang, Kakao

Kakaostory

420M API CALL COUNT

Page 6: Troubleshooting Redis- DaeMyung Kang, Kakao

Kakaostory Service Stack• For Storage

•MariaDB(Master/Slave for HA)• Hbase• Cassandra

• For Cache• Redis•Arcus

• (Memcached variant, opensource, supporting collections)

Page 7: Troubleshooting Redis- DaeMyung Kang, Kakao

Redis5.2TB, 274 Servers

(Arcus: 3.3TB, 137 Servers)

Page 8: Troubleshooting Redis- DaeMyung Kang, Kakao

Why Redis?•As lookaside Cache for service data•Example)•User Profile Information•Feeds•Activities•Friends•Notifications

Page 9: Troubleshooting Redis- DaeMyung Kang, Kakao

Agenda•Single Threaded

•Memory Fragmentation

•Redis Troubleshooting cases

•Redis Monitoring

•Redis HA

Page 10: Troubleshooting Redis- DaeMyung Kang, Kakao

Single Threaded

Page 11: Troubleshooting Redis- DaeMyung Kang, Kakao

Redis Event Loop

Client #1

Client #2

……

Client #N

Redis Event Loop

I/O Multiplexing

ProcessCommand

command #1

command #2

Page 12: Troubleshooting Redis- DaeMyung Kang, Kakao

Only One Commandat Once

Page 13: Troubleshooting Redis- DaeMyung Kang, Kakao

Long-time Spendingoperations

Page 14: Troubleshooting Redis- DaeMyung Kang, Kakao

KEYSFlushAll/FlushDB

LUA ScriptMULTI/EXEC

Delete Collections

Page 15: Troubleshooting Redis- DaeMyung Kang, Kakao

Why slow?

Page 16: Troubleshooting Redis- DaeMyung Kang, Kakao

O(n)

Page 17: Troubleshooting Redis- DaeMyung Kang, Kakao

KEYS – Iterating all Keys

di = dictGetSafeIterator(c->db->dict);allkeys = (pattern[0] == '*' && pattern[1] == '\0');while((de = dictNext(di)) != NULL) {

……stringmatchlen(pattern,plen,key,sdslen(key),0)

}

Page 18: Troubleshooting Redis- DaeMyung Kang, Kakao

FlushAll – Deleting all itemsfor (i = 0; i < ht->size && ht->used > 0; i++) {

dictEntry *he, *nextHe;if ((he = ht->table[i]) == NULL) continue;while(he) {

nextHe = he->next;dictFreeKey(d, he);dictFreeVal(d, he);zfree(he);ht->used--;he = nextHe;

}}

Page 19: Troubleshooting Redis- DaeMyung Kang, Kakao

How slow?

Page 20: Troubleshooting Redis- DaeMyung Kang, Kakao

Command Item Count Time

flushall 1,000,000 1000ms(1 second)

FlushAll

Page 21: Troubleshooting Redis- DaeMyung Kang, Kakao

Delete collections

Item Count Time

list 1,000,000 1000ms(1 second)

set 1,000,000 1000ms(1 second)

Sorted set 1,000,000 1000ms(1 second)

hash 1,000,000 1000ms(1 second)

You can use Xscan commands from 2.8.x

Page 22: Troubleshooting Redis- DaeMyung Kang, Kakao

Using Multiple Instancesin a Physical Server(can use more cpus)

Page 23: Troubleshooting Redis- DaeMyung Kang, Kakao

Fork forCreating RDB,AOF Rewrite

Page 24: Troubleshooting Redis- DaeMyung Kang, Kakao

Maximum 2x MemoryDisk IO

CPU Load/Usage

Page 25: Troubleshooting Redis- DaeMyung Kang, Kakao

CPU 4 core, 32G Memory

Mem: 24G

Mem: 8G

Mem: 8G

Mem: 8G

more Reliable

Page 26: Troubleshooting Redis- DaeMyung Kang, Kakao

Set CPU Affinityusing taskset

Page 27: Troubleshooting Redis- DaeMyung Kang, Kakao

Divide NIC Interrupt CPUand Redis Process CPU

Page 28: Troubleshooting Redis- DaeMyung Kang, Kakao

Memory Fragmentation

Page 29: Troubleshooting Redis- DaeMyung Kang, Kakao

Memory Fragmentation #1Used_memory RSS

Page 30: Troubleshooting Redis- DaeMyung Kang, Kakao

Memory Fragmentation #2Used_memory RSS

Starting to use Arcus at this case

Page 31: Troubleshooting Redis- DaeMyung Kang, Kakao

Redis Troubleshooting Cases

Page 32: Troubleshooting Redis- DaeMyung Kang, Kakao

Problem #1KEYS

Page 33: Troubleshooting Redis- DaeMyung Kang, Kakao

Performance Spike

Page 34: Troubleshooting Redis- DaeMyung Kang, Kakao

INFO all# Commandstatscmdstat_psetex:calls=2326667,usec=9322929,usec_per_call=4.01……cmdstat_pexpire:calls=3695333,usec=10068580,usec_per_call=2.72cmdstat_keys:calls=249,usec=1000314022,usec_per_call=4017325.50cmdstat_ping:calls=27005,usec=30027,usec_per_call=1.11……

Page 35: Troubleshooting Redis- DaeMyung Kang, Kakao

Slowlog get 10

Page 36: Troubleshooting Redis- DaeMyung Kang, Kakao

rename KEYS Command

Page 37: Troubleshooting Redis- DaeMyung Kang, Kakao

Using Scan

Page 38: Troubleshooting Redis- DaeMyung Kang, Kakao

Redis Dict Structure

Page 39: Troubleshooting Redis- DaeMyung Kang, Kakao

Scan #1

Page 40: Troubleshooting Redis- DaeMyung Kang, Kakao

Scan #2

Page 41: Troubleshooting Redis- DaeMyung Kang, Kakao

Scan #3

Page 42: Troubleshooting Redis- DaeMyung Kang, Kakao

Problem #2All Write Commands Fail

Page 43: Troubleshooting Redis- DaeMyung Kang, Kakao

“MISCONF Redis is configured to save RDB

snapshots, but is currently not able to persist on

disk. Commands that may modify the data set are

disabled. Please check Redis logs for details about

the error.”

Page 44: Troubleshooting Redis- DaeMyung Kang, Kakao

Reasonif (((server.stop_writes_on_bgsave_err &&

server.saveparamslen > 0 &&server.lastbgsave_status == C_ERR) ||server.aof_last_write_status == C_ERR) &&

server.masterhost == NULL &&(c->cmd->flags & CMD_WRITE ||c->cmd->proc == pingCommand))

{…

}

Page 45: Troubleshooting Redis- DaeMyung Kang, Kakao

config set stop-writes-on-bgsave-error no

Page 46: Troubleshooting Redis- DaeMyung Kang, Kakao

Problem #3Using Default Option

Page 47: Troubleshooting Redis- DaeMyung Kang, Kakao

Redis as Cache

Page 48: Troubleshooting Redis- DaeMyung Kang, Kakao

SAVE 900 1SAVE 300 10SAVE 60 10000

Page 49: Troubleshooting Redis- DaeMyung Kang, Kakao

Heavy Disk IOHigh Cpu Load

with creating RDB

Page 50: Troubleshooting Redis- DaeMyung Kang, Kakao

Config set SAVE “”

Page 51: Troubleshooting Redis- DaeMyung Kang, Kakao

Problem #4Using Swap Memory

Page 52: Troubleshooting Redis- DaeMyung Kang, Kakao

Redis using 28Gon single 32G machine

Page 53: Troubleshooting Redis- DaeMyung Kang, Kakao

Migrate or Restart

Page 54: Troubleshooting Redis- DaeMyung Kang, Kakao

Monitor Redis Serverand keep within bounds

Page 55: Troubleshooting Redis- DaeMyung Kang, Kakao

Problem #5Simultaneous AOF Rewrite

Page 56: Troubleshooting Redis- DaeMyung Kang, Kakao

A 256GB Single MachineRedis26GB

Redis26GB

Redis26GB

Redis26GB

Redis26GB

Redis26GB

Redis26GB

Redis26GB

Page 57: Troubleshooting Redis- DaeMyung Kang, Kakao

Simultaneous AOF RewriteRedis26GB

Redis26GB

Redis26GB

Redis26GB

Redis26GB

Redis26GB

Redis26GB

Redis26GB

AOF Rewrite AOF Rewrite AOF Rewrite AOF Rewrite

AOF Rewrite AOF Rewrite AOF Rewrite AOF Rewrite

Page 58: Troubleshooting Redis- DaeMyung Kang, Kakao

Stop all AOF Rewrites

Page 59: Troubleshooting Redis- DaeMyung Kang, Kakao

Turn off Automatic AOF Rewrite

Page 60: Troubleshooting Redis- DaeMyung Kang, Kakao

Config set auto-aof-rewrite-percentage 0

Page 61: Troubleshooting Redis- DaeMyung Kang, Kakao

Manually Run AOF Rewrite

Page 62: Troubleshooting Redis- DaeMyung Kang, Kakao

Problem #6Replication is Broken with

Network Line Failure

Page 63: Troubleshooting Redis- DaeMyung Kang, Kakao

All redis replication are broken

by Network line failure

Page 64: Troubleshooting Redis- DaeMyung Kang, Kakao

What Happensif network is recovered

Page 65: Troubleshooting Redis- DaeMyung Kang, Kakao

Replication

Master SlavereplicationCron

Health check Periodically

Page 66: Troubleshooting Redis- DaeMyung Kang, Kakao

All slaves automatically try to reconnect to

master.

Page 67: Troubleshooting Redis- DaeMyung Kang, Kakao

Slave of no one

Page 68: Troubleshooting Redis- DaeMyung Kang, Kakao

Problem #7Replication Failure

Page 69: Troubleshooting Redis- DaeMyung Kang, Kakao

Permission

Page 70: Troubleshooting Redis- DaeMyung Kang, Kakao

Memory Allocation Failsysctl vm.overcommit_memory=1

Page 71: Troubleshooting Redis- DaeMyung Kang, Kakao

Replication Failurewith OutputBufferSize

Page 72: Troubleshooting Redis- DaeMyung Kang, Kakao

Hard LimitSoft Limit

Page 73: Troubleshooting Redis- DaeMyung Kang, Kakao

config set client-output-buffer-limit "slave 1024mb 1024mb 60"

Page 74: Troubleshooting Redis- DaeMyung Kang, Kakao

Problem #8Hash Table Expansion

Page 75: Troubleshooting Redis- DaeMyung Kang, Kakao

Redis Dict – Hash Table Expansion #1

Page 76: Troubleshooting Redis- DaeMyung Kang, Kakao

Redis Dict – Hash Table Expansion #2

Page 77: Troubleshooting Redis- DaeMyung Kang, Kakao

Redis Dict – Hash Table Expansion #3

Page 78: Troubleshooting Redis- DaeMyung Kang, Kakao

Grows by twice

Page 79: Troubleshooting Redis- DaeMyung Kang, Kakao

Maxmemoryand

freeMemoryIfNeeded

Page 80: Troubleshooting Redis- DaeMyung Kang, Kakao

1 Billion items

Page 81: Troubleshooting Redis- DaeMyung Kang, Kakao

1,000,000,000 * 4 = 4G

Page 82: Troubleshooting Redis- DaeMyung Kang, Kakao

Maxmemory = 16GUsed_memory = 12G

Page 83: Troubleshooting Redis- DaeMyung Kang, Kakao

Hash Table Expansionis needed.

Page 84: Troubleshooting Redis- DaeMyung Kang, Kakao

4G * 2 = 8G.You need 20G(12G + 8G)

Page 85: Troubleshooting Redis- DaeMyung Kang, Kakao

20G > 16G(maxmemory)

Page 86: Troubleshooting Redis- DaeMyung Kang, Kakao

Need a feature that can Set Initial size of Hash

Table (Not Supported)

https://github.com/antirez/redis/pull/2812

Page 87: Troubleshooting Redis- DaeMyung Kang, Kakao

Redis Monitoring

Page 88: Troubleshooting Redis- DaeMyung Kang, Kakao

Monitoring is important as much as

Management

Page 89: Troubleshooting Redis- DaeMyung Kang, Kakao

Redis Monitoring MetricsFactor System or Redis Info

CPU Usage, Load System

Network Inbound/outbound System

Client connectionsMaxclient setting

Info

Key sizeProcessed commands

Redis

Memory Usage, RSS(very Important)

Redis

Disk Usage, IO System

Expired Keys, Evicted Keys Redis

Page 90: Troubleshooting Redis- DaeMyung Kang, Kakao

Redis HA

Page 91: Troubleshooting Redis- DaeMyung Kang, Kakao

Using DNS for Failover

Page 92: Troubleshooting Redis- DaeMyung Kang, Kakao

Private Internal DNS Serverwith TTL 0

Page 93: Troubleshooting Redis- DaeMyung Kang, Kakao

DNS HA FlowDetect A

RedisFailure

ChangeB can write

Change DNS A with B

Send AClient Kill

New clientsWill connect to B

B Configrewrite

Page 94: Troubleshooting Redis- DaeMyung Kang, Kakao

JVMadd –Dsun.net.inetaddr.ttl=0

Page 95: Troubleshooting Redis- DaeMyung Kang, Kakao

twemproxyusing 0.4.1

Page 96: Troubleshooting Redis- DaeMyung Kang, Kakao

UsingCoordinator

Page 97: Troubleshooting Redis- DaeMyung Kang, Kakao

Zookeeper

Page 98: Troubleshooting Redis- DaeMyung Kang, Kakao

Zookeeper with Redis Information

Page 99: Troubleshooting Redis- DaeMyung Kang, Kakao

Zookeeper with RedisApplication Servers

ZooKeeper

RedisShard-1

RedisShard-2

RedisShard-3

Redis Cluster Monitor

Get Redis Shard Information

Health Check

Update ShardInfo

Event: Node Add or Remove, Master change

Page 100: Troubleshooting Redis- DaeMyung Kang, Kakao

Summary•Redis is Single Threaded

•Creating RDB or AOF Rewrite is expensive

•Don’t use KEYS command.

•Don’t use default redis configuration.

•Monitoring is very importatnt.

Page 101: Troubleshooting Redis- DaeMyung Kang, Kakao

Thanks