Cassandra - lesson learned

  • Cassandra - lesson learned

    Andrzej Ludwikowski

  • About me?

    - http://aludwikowski.blogspot.com/
    - https://github.com/aludwiko
    - @aludwikowski
    - SoftwareMill

  • Why Cassandra?

    - BigData!!!
      - Volume (petabytes of data, trillions of entities)
      - Velocity (real-time, streams, millions of transactions per second)
      - Variety (un-, semi-, structured)
    - Near-linear horizontal scaling (in proper use cases)
    - Fully distributed, with no single point of failure
    - Data replication by default

  • What is Cassandra vs CAP?

    - CAP Theorem - pick two

  • Origins?

    2010

  • Name?

  • Write path

    - Any node can coordinate any request (NSPOF)
    - Replication Factor
    - Consistency Level

    [Diagram: Client (driver) sends a write to a cluster of Node 1 to Node 4; RF=3, CL=2]
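The relation between replication factor and consistency level can be sketched as a toy model in Python. This is not driver code: the `write` function, the node names and the `alive` set are hypothetical and exist only for illustration.

```python
def write(replicas, alive, consistency_level):
    """Return True if enough replicas acknowledge the write.

    replicas: nodes that should store the row (len(replicas) == RF)
    alive: set of nodes that are currently reachable
    consistency_level: number of acknowledgements the coordinator waits for
    """
    acks = sum(1 for node in replicas if node in alive)
    return acks >= consistency_level


replicas = ["node1", "node2", "node3"]  # RF=3 (hypothetical node names)

# One replica is down: two acknowledgements still satisfy CL=2.
print(write(replicas, {"node1", "node2", "node4"}, 2))

# Two replicas are down: only one acknowledgement, CL=2 is not met.
print(write(replicas, {"node1", "node4"}, 2))
```

With RF=3 and CL=2 the write tolerates one unavailable replica, which is the trade-off the two settings express.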

  • Write path - consistent hashing

    - Token ring from -2^63 to 2^63-1
    - Partitioner: partition key -> token

    [Diagram: Client sends the key through the Partitioner, which yields token 77; Node 1 to Node 4 own the simplified token ranges 0-25, 25-50, 51-75, 76-100; token 77 falls into the 76-100 range and copies of it are placed on further nodes]
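A minimal sketch of the lookup shown in the diagram, using the simplified 0-100 ring instead of the real token range. Which node owns which range, the `RING` table, `toy_token` and `replicas_for` are assumptions made for illustration. The hash is MD5 only to stay within the standard library; Cassandra's default partitioner is Murmur3.

```python
import bisect
import hashlib

# Upper bound of each token range and its owner (assumed assignment).
RING = [(25, "node1"), (50, "node2"), (75, "node3"), (100, "node4")]


def toy_token(partition_key):
    """Map a partition key to a token in 0..100 (toy partitioner)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 101


def replicas_for(token, rf):
    """Find the owner of the token, then walk the ring clockwise for rf nodes."""
    bounds = [upper for upper, _ in RING]
    start = bisect.bisect_left(bounds, token) % len(RING)
    return [RING[(start + i) % len(RING)][1] for i in range(rf)]


# Token 77 belongs to the 76-100 range; with RF=3 the next two nodes on the
# ring also receive a copy.
print(replicas_for(77, 3))

# Any key can be placed the same way.
print(replicas_for(toy_token("godzilla"), 3))
```

Walking the ring clockwise corresponds to the simplest replica placement; topology-aware strategies pick replicas differently.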

  • DEMO

  • Write path - problems?

    - Hinted handoff
    - Retry idempotent inserts
      - built-in policies
    - Lightweight transactions (Paxos)
    - Batches

    [Diagram: Client and Node 1 to Node 4 with token ranges 0-25, 25-50, 51-75, 76-100; copies of token 77 on several nodes]
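Why retries are only safe for idempotent writes can be shown with a toy model. The `upsert` and `increment` helpers below are hypothetical; the first mimics a regular write resolved by timestamp, the second a counter-style update.

```python
def upsert(table, key, value, timestamp):
    """Apply a write; the cell with the highest timestamp wins."""
    current = table.get(key)
    if current is None or timestamp >= current[1]:
        table[key] = (value, timestamp)


def increment(counters, key, delta):
    """Apply a counter-style update; every call changes the state."""
    counters[key] = counters.get(key, 0) + delta


# The same insert sent twice (for example after a timeout) leaves one row.
table = {}
upsert(table, "video:1", "psy", 100)
upsert(table, "video:1", "psy", 100)  # retry
print(table)

# The same increment sent twice is counted twice.
counters = {}
increment(counters, "views", 1)
increment(counters, "views", 1)  # retry
print(counters)
```

A retry policy therefore needs to know whether a statement is idempotent before resending it.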

  • Write path - node level

  • Write path - why so fast?

    - Commit log - append only
    - Periodic (10s) or batch sync to disk
    - Network topology aware

    50,000 t/s = 50 t/ms = 5 t/100us = 1 t/20us

    [Diagram: Client writes to Node 1 to Node 4 with RF=2, CL=2; nodes grouped into Rack 1 and Rack 2; data centers Asia DC and Europe DC]
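The throughput figures on the slide are the same rate expressed at different time scales. A small sketch of the arithmetic, with hypothetical helper names:

```python
def per_interval(tps, interval_us):
    """Transactions that fit into an interval of interval_us microseconds."""
    return tps * interval_us / 1_000_000


def budget_us(tps):
    """Average time budget per transaction, in microseconds."""
    return 1_000_000 / tps


tps = 50_000
print(per_interval(tps, 1_000))  # transactions per millisecond
print(per_interval(tps, 100))    # transactions per 100 microseconds
print(budget_us(tps))            # microseconds available per transaction
```

At 50,000 transactions per second each write has about 20 microseconds on average, which leaves no room for random disk access per write and fits an append-only commit log with deferred sync.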

  • Read path

    - Most recent wins
    - Eager retries
    - In-memory
      - MemTable
      - Row Cache
      - Bloom Filters
      - Key Caches
      - Partition Summaries
    - On disk
      - Partition Indexes
      - SSTables

    [Diagram: Client reads from Node 1 to Node 4 with RF=3, CL=3; the replicas answer with timestamp 67, timestamp 99 and timestamp 88]
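The "most recent wins" rule can be sketched as picking the response with the highest write timestamp. The timestamps come from the diagram; the node names, values and the `resolve` helper are made up for illustration.

```python
def resolve(responses):
    """Return the replica response with the highest write timestamp."""
    return max(responses, key=lambda r: r["timestamp"])


responses = [
    {"node": "node1", "value": "a", "timestamp": 67},
    {"node": "node2", "value": "c", "timestamp": 99},
    {"node": "node3", "value": "b", "timestamp": 88},
]

winner = resolve(responses)
print(winner["node"], winner["value"], winner["timestamp"])
```

The coordinator returns the newest value to the client, so replicas holding older timestamps do not affect the result of a CL=3 read.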

  • Immediate vs. Eventual Consistency

    - if (writeCL + readCL) > replication_factor then immediate consistency
      - writeCL=ALL, readCL=1
      - writeCL=1, readCL=ALL
      - writeCL,readCL=QUORUM
    - If "stale" is measured in milliseconds, how much are those milliseconds worth?

    [Diagram: Client and Node 1 to Node 4, RF=3]
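The rule above can be checked mechanically. A minimal sketch, where consistency levels are given as replica counts and the function names are hypothetical:

```python
def quorum(rf):
    """Majority of replicas for a given replication factor."""
    return rf // 2 + 1


def is_immediately_consistent(write_cl, read_cl, rf):
    """True when every read set overlaps every write set in at least one replica."""
    return write_cl + read_cl > rf


rf = 3
print(is_immediately_consistent(rf, 1, rf))                   # writeCL=ALL, readCL=1
print(is_immediately_consistent(1, rf, rf))                   # writeCL=1, readCL=ALL
print(is_immediately_consistent(quorum(rf), quorum(rf), rf))  # both QUORUM
print(is_immediately_consistent(1, 1, rf))                    # both 1: eventual only
```

With RF=3 a quorum is 2 replicas, so QUORUM writes plus QUORUM reads touch 4 replica slots out of 3 and must share at least one node.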

  • Modeling - new mindset

    - QDD, Query Driven Development
    - Nesting is ok
    - Duplication is ok
    - Writes are cheap

  • QDD - Conceptual model

    - Technology independent
    - Chen notation

  • QDD - Application workflow

  • QDD - Logical model

    - Chebotko diagram

  • QDD - Physical model

    - Technology dependent
    - Analysis and validation (finding problems)
    - Physical optimization (fixing problems)
    - Data types

  • Physical storage

    - Primary key
    - Partition key

    CREATE TABLE videos (
      id int,
      title text,
      runtime int,
      year int,
      PRIMARY KEY (id)
    );

     id | title               | runtime | year
    ----+---------------------+---------+------
      1 | dzien swira         |      93 | 2002
      2 | chlopaki nie placza |      96 | 2000
      3 | psy                 |     104 | 1992
      4 | psy 2               |      96 | 1994

    Storage layout (one partition per id):

    1 -> title: dzien swira, runtime: 93, year: 2002
    2 -> title: chlopaki..., runtime: 96, year: 2000
    3 -> title: psy, runtime: 104, year: 1992
    4 -> title: psy 2, runtime: 96, year: 1994

    SELECT * FROM videos WHERE title = 'dzien swira';

    (title is not part of the primary key, so Cassandra rejects this query unless ALLOW FILTERING or a secondary index is used)

  • Physical storage

    - Primary key (could be compound)
    - Partition key
    - Clustering column (order, uniqueness)

    CREATE TABLE videos_with_clustering (
      title text,
      runtime int,
      year int,
      PRIMARY KEY ((title), year)
    );

     title    | year | runtime
    ----------+------+---------
     godzilla | 1954 |      98
     godzilla | 1998 |     140
     godzilla | 2014 |     123
     psy      | 1992 |     104

    Storage layout (one partition per title, rows ordered by year):

    godzilla -> 1954: runtime 98 | 1998: runtime 140 | 2014: runtime 123
    psy      -> 1992: runtime 104

    SELECT * FROM videos_with_clustering WHERE title = 'godzilla';
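The layout of videos_with_clustering can be imitated with a dictionary of sorted lists: the partition key selects the list, the clustering column keeps it ordered and unique. This is only a mental model, not how SSTables are implemented, and `partitions`, `insert` and `select` are hypothetical names.

```python
import bisect
from collections import defaultdict

# title (partition key) -> list of (year, runtime), sorted by year
partitions = defaultdict(list)


def insert(title, year, runtime):
    """Insert a row; an existing (title, year) is overwritten (upsert)."""
    rows = partitions[title]
    years = [y for y, _ in rows]
    i = bisect.bisect_left(years, year)
    if i < len(rows) and rows[i][0] == year:
        rows[i] = (year, runtime)
    else:
        rows.insert(i, (year, runtime))


def select(title):
    """Return all rows of one partition, already in clustering order."""
    return list(partitions[title])


insert("godzilla", 2014, 123)
insert("godzilla", 1954, 98)
insert("psy", 1992, 104)
insert("godzilla", 1998, 140)

print(select("godzilla"))
print(select("psy"))
```

A query by title reads a single partition and gets the rows back sorted by year regardless of insertion order.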

  • Physical storage

    - Primary key (could be compound)
    - Partition key (could be composite)
    - Clustering column (order, uniqueness)

    CREATE TABLE videos_with_composite_pk (
      title text,
      runtime int,
      year int,
      PRIMARY KEY ((title, year))
    );

     title    | year | runtime
    ----------+------+---------
     godzilla | 1954 |      98
     godzilla | 1998 |     140
     godzilla | 2014 |     123
     psy      | 1992 |     104

    Storage layout (one partition per title:year pair):

    godzilla:1954 -> runtime: 98
    godzilla:1998 -> runtime: 140
    godzilla:2014 -> runtime: 123
    psy:1992 -> runtime: 104

    SELECT * FROM videos_with_composite_pk WHERE title = 'godzilla' AND year = 1954;

  • Modeling - clustering column(s)

    Q: Retrieve videos an actor has appeared in (newest first).

    CREATE TABLE videos_by_actor (
      actor text,
      added_date timestamp,
      video_id timeuuid,
      character_name text,
      description text,
      encoding frozen<...>,
      tags set<...>,
      title text,
      user_id uuid,
      PRIMARY KEY ((actor), added_date, video_id, character_name)
    ) WITH CLUSTERING ORDER BY (added_date DESC);

    Primary key, step by step:

    - PRIMARY KEY ((actor), added_date): partition by actor, order by date
    - PRIMARY KEY ((actor), added_date, video_id): video_id keeps videos added at the same time apart
    - PRIMARY KEY ((actor), added_date, video_id, character_name): character_name keeps several roles in one video apart

  • Modeling - compound partition key

    Q: Retrieve the last 1000 measurements from a given day.

    CREATE TABLE temperature_by_day (
      weather_station_id text,
      date text,
      event_time timestamp,
      temperature text,
      PRIMARY KEY ((weather_station_id), date, event_time)
    ) WITH CLUSTERING
