Cassandra Architecture and Data Model

2013 © Trivadis

BASEL BERN BRUGES LAUSANNE ZÜRICH DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MUNICH STUTTGART VIENNA

Cassandra Architecture and Data Model

Geneva, 26.01.2015

Ulises Fasoli

Senior Consultant

Trivadis AG

January 2016


Agenda

1. Introduction to NoSQL datastores and Polyglot Persistence

2. What is Apache Cassandra?

3. Why Cassandra, What is DataStax?

4. Cassandra Architecture

5. Cassandra Data Model

6. Cassandra Query Language (CQL)

7. Cassandra/DataStax @ Trivadis


History of Databases

1960s: File-based, network (CODASYL) and hierarchical databases
1970s: Relational databases
1980s: SQL becomes the standard query language
Early 1990s: Object databases
Late 1990s: XML databases
2004: NoSQL databases


What's wrong with Relational Databases?

• SQL provides a rich, declarative query language
• The database enforces referential integrity
• ACID semantics
• Well understood by developers and database administrators
• Well supported by different languages, frameworks and tools
  • Hibernate, JPA, JDBC, iBATIS, Entity Framework
• Well understood and accepted by operations people (DBAs)
  • Configuration
  • Monitoring
  • Backup and Recovery
  • Tuning
  • Design


They are great ….


Relational Databases are great ... But!

New trends

Big Data

Concurrency

Connectivity

Diversity

P2P Knowledge

Cloud/Grid


Relational Databases are great ... But!

Problem: Complex Object Graphs

Object/Relational impedance mismatch

Complicated to map a rich domain model to a relational schema

Performance issues
• Many rows in many tables
• Many joins
• Eager vs. lazy loading

Example: one Order object graph spread over the tables ORDER, ADDRESS, CUSTOMER and ORDER_LINES

Order
  ID: 1001
  Order Date: 15.9.2012

Customer
  First Name: Peter
  Last Name: Sample

Billing Address
  Street: Somestreet 10
  City: Somewhere
  Postal Code: 55901

Line Items

 Name         | Quantity | Price
--------------+----------+--------
 Ipod Touch   |        1 | 220.95
 Monster Beat |        2 | 190.00
 Apple Mouse  |        1 |  69.90


Relational Databases are great ... But!

Problem: Schema evolution

Adding attributes to an object => columns have to be added to the table

Expensive if there is a lot of data in that table

Locks are held on the table for a long time

If the new values should be mandatory, a NOT NULL constraint cannot be enforced for the rows that already exist

Application downtime ...


Relational Databases are great ... But!

Problem: Semi-structured data

Relational schema doesn't easily handle semi-structured data

Common solutions
• Name/Value table
  - Poor performance
  - Lack of constraints
• Serialize as Blob
  - Fewer joins, but no query capabilities


Relational Databases are great ... But!

Problem: Scaling

Scaling writes is difficult, expensive or even impossible => Big Data

Scaling a relational database:
• Vertical scaling is limited and expensive
• Horizontal scaling is limited and expensive

Figure: Single DB => Partitioned Table (P1, P2, P3) => Database Sharding => Database Cluster (Node 1, Node 2)


So, what’s Wrong With RDBMS?

• Many programmers are already familiar with it.
• Transactions and ACID make development easy.
• Lots of tools to use.
• Rigid schema design.
• Harder to scale.
• Replication.

Answer: nothing is wrong, but no one size fits all.


Solution: NoSQL ?

No standard definition of what NoSQL means
• "Not Only SQL", not "No SQL"
• "Not only relational" would have been a better name

The term dates from a workshop organized in 2009

Use the right tools (DBs) for the job

It is more like a feature set, or even the absence of a feature set


Use Cases for NoSQL

• Massive write performance.

• Fast key value look ups.

• Flexible schema and data types.

• No single point of failure.

• Fast prototyping and development.

• Out of the box scalability.

• Easy maintenance.


Brewer's CAP Theorem

Any networked shared-data system can have at most two of the three desirable properties:

Consistency
All of the nodes see the same data at the same time, regardless of where the data is stored

Availability
Node failures do not prevent survivors from continuing to operate

Network Partition tolerance
The system continues to operate despite arbitrary message loss

Figure: the three properties overlap pairwise as CA (Consistency + Availability), CP (Consistency + Partition tolerance) and AP (Availability + Partition tolerance); the combination of all three is marked n/a


Data Store Positioning

Figure: data store families positioned by scalability (vertical axis) against standardized model, tooling and complexity (horizontal axis): Key-value, Wide Column (Column Families / Extensible Records), Document, Graph, Relational (the "SQL Comfort Zone"), Multi Dimensional


Polyglot Persistence

In 2006, Neal Ford coined the term Polyglot Programming
• Applications should be written in a mix of languages to take advantage of the fact that different languages are suitable for tackling different problems

Polyglot Persistence defines a hybrid approach to persistence
• Using multiple data storage technologies
• Selected based on the way data is being used by individual applications
• Why store binary images in an RDBMS when there are better storage systems?


Polyglot Persistence

Today we use the same database for all kinds of data
• Business transactions, session management data, reporting, logging information, content information, ...

Not all of this data needs the same availability, consistency or backup guarantees

Polyglot data storage allows relational and NoSQL data stores to be mixed and matched

"Traditional" persistence model: one e-commerce application keeps shopping cart data, user sessions, completed orders, the product catalog and recommendations in a single RDBMS

Polyglot persistence model: the same e-commerce application spreads shopping cart data, user sessions, completed orders, the product catalog and recommendations across key-value, RDBMS, document and graph stores

2. What is Apache Cassandra?

Definition of Cassandra

Apache Cassandra™ is a free
• Distributed...
• High performance...
• Extremely scalable...
• Fault tolerant (i.e. no single point of failure)...
post-relational database solution.

Cassandra can serve both as a real-time datastore (the "system of record") for online/transactional applications and as a read-intensive database for business intelligence systems.


History of Cassandra

Figure: Cassandra's lineage, drawing on Bigtable and Dynamo


Architecture Overview

Cassandra was designed with the understanding that system/hardware failures can and do occur:
• Peer-to-peer, distributed system
• All nodes are the same
• Data is partitioned among all nodes in the cluster
• Custom data replication ensures fault tolerance
• Read/write-anywhere design


Big Data Scalability

• Capable of comfortably scaling to petabytes

• New nodes = Linear performance increases

• Add new nodes online


Who is using Cassandra?

The largest publicly known cluster has over 300 TB of data spanning 400 machines

3. Why Cassandra, What is DataStax?

Why Cassandra?

Tunable data consistency
Flexible schema design
Data compression
CQL language (like SQL)
Support for key languages and platforms
No need for special hardware or software
Gigabyte to petabyte scalability
Linear performance gains through adding nodes
No single point of failure
Easy replication / data distribution
Multi-data center and Cloud capable
No need for a separate caching layer


Cassandra Use Cases

Product Catalog / Playlists

Personalization

• Ads

• Recommendations

• Ratings

Fraud Detection

Time Series

• Finance

• Smart Meter

IoT / Sensor Data

Graph / Network data


DataStax Enterprise Edition (DSE)


DataStax OpsCenter

4. Cassandra Architecture

Architecture Overview

Nodes communicate with one another through the Gossip protocol, which exchanges information across the cluster every second

A commit log on each node captures write activity, so data durability is assured

Data is also written to an in-memory structure (the memtable) and then flushed to disk (as an SSTable) once the memory structure is full


No Single Point of Failure

All nodes are the same
Customized replication affords tunable data redundancy
Read/write from any node
Can replicate data among different physical data center racks


Easy Replication / Data Distribution

Transparently handled by Cassandra
Multi-data center capable
Exploits all the benefits of Cloud computing
Able to do a hybrid Cloud/On-premise setup


Partitioning

• Nodes are logically structured in a ring topology.
• The hashed value of the key associated with a data partition is used to assign it to a node in the ring.
• Lightly loaded nodes move position on the ring to relieve heavily loaded nodes.
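The key-to-node assignment described above can be sketched in a few lines of Python. This is a minimal illustration, not Cassandra's partitioner: the MD5 hash, the six evenly spaced node tokens and the names `token`, `Ring` and `owner_of_token` are all assumptions made for the example.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 128  # MD5 digests are 128 bits wide (illustrative hash choice)


def token(key: str) -> int:
    """Hash a partition key to a position on the ring."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class Ring:
    """Hypothetical ring: each node owns the range ending at its token."""

    def __init__(self, node_tokens):
        self._entries = sorted((tok, node) for node, tok in node_tokens.items())
        self._tokens = [tok for tok, _ in self._entries]

    def owner_of_token(self, tok: int) -> str:
        # First node whose token is >= tok; wrap to the first node past the end.
        index = bisect.bisect_left(self._tokens, tok)
        return self._entries[index % len(self._entries)][1]

    def owner(self, key: str) -> str:
        return self.owner_of_token(token(key))


# Six nodes A..F spaced evenly around the ring.
ring = Ring({name: i * RING_SIZE // 6 for i, name in enumerate("ABCDEF")})
print(ring.owner("key1"), ring.owner("key2"))
```

Because the lookup depends only on the hash and the sorted tokens, any node can compute the owner of a key without asking a central directory.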


Data Replication

Replication for high availability and data durability
• Replication factor N: each row is replicated at N nodes
• Each row key k is assigned to a coordinator node
• The coordinator node is responsible for replicating the rows within its key range


Partitioning and Replication

Figure: ring covering the hash range 0 to 1 with nodes A to F; h(key1) and h(key2) mark the positions of two keys, each replicated with N=3


Data Replication

Each data item is replicated at N (replication factor) nodes.

Different Replication Policies
• Rack Unaware: replicate data at the N-1 successive nodes after its coordinator
• Rack Aware: uses ZooKeeper to choose a leader, which tells nodes the ranges they are replicas for
• Datacenter Aware: similar to Rack Aware, but the leader is chosen at datacenter level instead of rack level
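The Rack Unaware policy is simple enough to sketch directly. The snippet below is a hedged illustration in Python; the function name `replicas` and the six-node ring are assumptions for the example, and the rack-aware and datacenter-aware policies are not modelled.

```python
def replicas(ring_nodes, coordinator, n):
    """Coordinator plus the n-1 nodes that follow it clockwise on the ring."""
    if not 1 <= n <= len(ring_nodes):
        raise ValueError("replication factor must be between 1 and the node count")
    start = ring_nodes.index(coordinator)
    return [ring_nodes[(start + i) % len(ring_nodes)] for i in range(n)]


nodes = ["A", "B", "C", "D", "E", "F"]  # ring order, hypothetical node names
print(replicas(nodes, "B", 3))  # ['B', 'C', 'D']
print(replicas(nodes, "F", 3))  # ['F', 'A', 'B'] (wraps around the ring)
```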


Write Path

When a write occurs, Cassandra stores the data in a structure in memory, the memtable, and also appends the write to the commit log on disk, providing configurable durability.
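A toy model of this write path, in Python, may help fix the order of operations. It is a sketch under stated assumptions: the class `Node`, the `memtable_limit` threshold and the in-memory list standing in for the on-disk commit log are all hypothetical, and real flush triggers and log recycling are more involved.

```python
class Node:
    """Toy write path: commit log append, memtable update, flush when full."""

    def __init__(self, memtable_limit=3):
        self.commit_log = []   # append-only; stands in for the on-disk log
        self.memtable = {}     # (row_key, column) -> (value, timestamp)
        self.sstables = []     # flushed tables, never modified afterwards
        self.memtable_limit = memtable_limit

    def write(self, row_key, column, value, timestamp):
        self.commit_log.append((row_key, column, value, timestamp))  # durability
        self.memtable[(row_key, column)] = (value, timestamp)        # fast in-memory write
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Write the memtable out sorted by key, then start a fresh one.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}


node = Node(memtable_limit=2)
node.write("john", "role", "dev", timestamp=1)
node.write("john", "age", 37, timestamp=1)
print(len(node.commit_log), len(node.sstables), len(node.memtable))  # 2 1 0
```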


Write Requests

The coordinator sends a write request to all replicas that own the row being written


Write Consistency

The consistency level for writing to Cassandra specifies on how many replicas the write must succeed before an ACK is returned to the client

• Quorum: (replication_factor / 2) + 1, rounded down to a whole number
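The quorum formula uses integer division, which is easy to get wrong by hand. A small Python sketch (the function names `quorum` and `write_acknowledged` are made up for this example):

```python
def quorum(replication_factor: int) -> int:
    """(replication_factor / 2) + 1, with the division rounded down."""
    return replication_factor // 2 + 1


def write_acknowledged(successful_replicas: int, replication_factor: int) -> bool:
    """True once enough replicas have confirmed a QUORUM write."""
    return successful_replicas >= quorum(replication_factor)


for rf in (1, 2, 3, 5, 6):
    print(rf, quorum(rf))  # 1 1, 2 2, 3 2, 5 3, 6 4
```

With a replication factor of 3, a quorum write therefore tolerates one unavailable replica.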


Read Path

When a read request for a row comes in to a node, the row must be combined from all SSTables on that node that contain columns from the row in question, as well as from any unflushed memtables, to produce the requested data
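The merge step can be sketched as follows, assuming (as the data model slides state) that every column carries a timestamp and that the newest timestamp wins. The function `read_row` and the dictionary layout are illustrative assumptions, not Cassandra's on-disk format.

```python
def read_row(row_key, memtable, sstables):
    """Combine a row from all SSTables and the memtable; newest timestamp wins."""
    merged = {}
    for source in [*sstables, memtable]:
        for column, (value, timestamp) in source.get(row_key, {}).items():
            if column not in merged or timestamp > merged[column][1]:
                merged[column] = (value, timestamp)
    return {column: value for column, (value, _) in merged.items()}


# Each source maps row key -> {column name: (value, timestamp)}.
sstables = [
    {"john": {"age": (37, 1), "role": ("dev", 1)}},
    {"john": {"age": (38, 5)}},
]
memtable = {"john": {"role": ("lead", 9)}}
print(read_row("john", memtable, sstables))  # {'age': 38, 'role': 'lead'}
```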


Read Requests

There are two types of read requests that a coordinator can send to a replica:
• A direct read request
• A background read repair request

The number of replicas contacted by a direct read request is determined by the consistency level specified by the client.


Read Consistency

The consistency level for reading from Cassandra specifies how many replicas must respond before a result is returned to the client

• Quorum: (replication_factor / 2) + 1, rounded down to a whole number

5. Cassandra Data Model

Cassandra Data Model

• A table is a multi-dimensional map indexed by a key (the row key)
• Columns are grouped into column families
• Dynamic schema design allows for much more flexible data storage than a rigid RDBMS schema
• Each column has
  - Name
  - Value
  - Timestamp
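Read as a data structure, this is a map of maps. The Python sketch below models it that way; the names `Column`, `column_family` and `put` are assumptions for the illustration, and the sample values are taken from the employee example used later in the deck.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    value: object
    timestamp: int


# row key -> {column name -> Column}
column_family = {}


def put(row_key, name, value, timestamp):
    column_family.setdefault(row_key, {})[name] = Column(name, value, timestamp)


put("john", "age", 37, timestamp=1)
put("john", "role", "dev", timestamp=1)
put("eric", "age", 38, timestamp=2)  # rows need not carry the same columns

print(sorted(column_family["john"]))  # ['age', 'role']
print(sorted(column_family["eric"]))  # ['age']
```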


How Cassandra stores data

• Model brought from Google Bigtable

• Row Key and a lot of columns

• Column names sorted (UTF8, Int, Timestamp, etc.)

Figure: each row key maps to between 1 and 2 billion columns; every column holds a column name, a column value, a timestamp and a TTL; a table can hold billions of rows


Cassandra Data Model

Figure: a keyspace containing several column families


Row, row key, column key, and column value

Figure: a row identified by its row key; the column keys (or column names) cola, colb, colc, cold hold the column values (or cells) va, vb, vc, vd

• Rows: individual rows constitute a column family
• Row key: uniquely identifies a row in a column family
• Row: stores pairs of column keys and column values
• Column key: uniquely identifies a column value in a row
• Column value: stores one value or a collection of values


Static vs. Dynamic Column Family

Static column family (skinny rows)
• Contains a predefined set of columns with metadata
• Number of columns can vary across multiple rows within the column family
• Similar to RDBMS, except no NULL values

 row key     | columns (name: value)
-------------+---------------------------------------------------------------------
 John Lennon | born: 1940, country: England, died: 1980, style: Rock, type: artist
 The Beatles | country: England, founded: 1957, style: Rock, type: band


What is a wide row?

Rows may be described as "skinny" or "wide"

Wide row: has a relatively large number of column keys (hundreds or thousands); this number may increase as new data values are inserted
- For example, a row that stores all bands of the same style
- The number of such bands will increase as new bands are formed

Note that column values do not exist in this example
- The column key, in this case a band name, stores all the data desired
- Could have stored the number of albums, or year founded, etc., as column values

Figure: row key Rock with column keys The Animals, The Beatles, ... and no column values

©2014 DataStax Training. Use only with permission.


What are composite row key and composite column key?

Composite row key: multiple components separated by a colon
• 'Revolver' and 1966 are the album title and year
• The 'tracks' value is a collection (map)

 row key       | genre | performer   | tracks
---------------+-------+-------------+------------------------------------------------
 Revolver:1966 | Rock  | The Beatles | {1: 'Taxman', ..., 14: 'Tomorrow Never Knows'}

Composite column key: multiple components separated by a colon
• Composite column keys are sorted by each component

 row key       | 1:title | 2:title       | ... | 14:title
---------------+---------+---------------+-----+----------------------
 Revolver:1966 | Taxman  | Eleanor Rigby | ... | Tomorrow Never Knows


Data Modelling with Cassandra

From Query to Model

• De-normalize, de-normalize, de-normalize
  • Forget about old-school 3NF
  • De-normalize wherever you can for quicker retrieval and let application logic handle the responsibility of reliably updating redundancies
• Rows are gigantic and sorted
  • Giga-sized rows (2 billion columns max) can be used to store sortable and sliceable columns
  • Comments by timestamp, ordered bids by quoted price, ratings by product, ...
• One row, one machine
  • Each row stays on one machine
  • Rows are not shared across nodes
  • Beware of this: don't create hotspots with a high-demand row!
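Why sorted columns make a wide row "sliceable" can be shown with a short Python sketch. It assumes column keys are kept in sorted order, as the slides describe; the class `WideRow` and its methods are hypothetical names, and the comments-by-timestamp data is invented for the example.

```python
import bisect


class WideRow:
    """Toy wide row: column keys stay sorted, so range slices are cheap."""

    def __init__(self):
        self._keys = []     # sorted column keys
        self._values = {}   # column key -> value

    def insert(self, column_key, value):
        if column_key not in self._values:
            bisect.insort(self._keys, column_key)
        self._values[column_key] = value

    def slice(self, start, end):
        """Columns with start <= key < end, in sorted order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(key, self._values[key]) for key in self._keys[lo:hi]]


# Comments keyed by timestamp, inserted out of order.
comments = WideRow()
for timestamp, text in [(30, "third"), (10, "first"), (20, "second")]:
    comments.insert(timestamp, text)

print(comments.slice(10, 30))  # [(10, 'first'), (20, 'second')]
```

The slice is located with two binary searches, which is the reason a query should be shaped so that it reads a contiguous range of columns within one row.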


Remember this

• Cassandra finds rows fast

• Cassandra scans columns fast

• Cassandra does not scan rows

6. Cassandra Query Language (CQL)

Cassandra API – Thrift vs. CQL

Thrift
• Exposes the internal storage structure of Cassandra pretty much directly
• Complicated, low-level, full control
• Legacy

CQL
• The new way to go
• Provides a thin abstraction layer over Cassandra's internal structure
• Hides some distracting and useless implementation details
• Provides native syntax for common encodings/idioms (like collections) instead of letting each client library re-implement them in its own, different and thus incompatible way


CQL Language

Very similar to RDBMS SQL syntax

Create objects via DDL (e.g. CREATE…)

Core DML commands supported: INSERT, UPDATE, DELETE

Query data with SELECT

Current version is CQL3


CQL Shell for Apache Cassandra

cqlsh is the command-line utility for executing CQL commands (think of SQL*Plus for Cassandra)

CQL3 is the default since Cassandra 1.2

$ cqlsh
Connected to DataStaxCluster at localhost:9160.
[cqlsh 4.1.0 | Cassandra 2.0.5.24 | CQL spec 3.1.1 | Thrift protocol 19.39.0]
Use HELP for help.
cqlsh>


The CQL/Cassandra Mapping – Static Table

CREATE TABLE employee (
  name text PRIMARY KEY,
  age int,
  role text);

CQL view:

 name | age | role
------+-----+------
 john |  37 | dev
 eric |  38 | ceo

Internal storage (one row per row key, one cell per column):

 row key | age | role
---------+-----+------
 john    |  37 | dev
 eric    |  38 | ceo


Create a Dynamic table (wide-row) Employee

A dynamic table is also created with the CREATE TABLE statement, but using a composite primary key

cqlsh:training> CREATE TABLE employees (
                  company text,
                  name text,
                  age int,
                  role text,
                  PRIMARY KEY (company, name)
                );


The CQL/Cassandra Mapping – Dynamic Table

CREATE TABLE employees (
  company text,
  name text,
  age int,
  role text,
  PRIMARY KEY (company, name)
);

CQL view:

 company | name | age | role
---------+------+-----+------
 OSC     | eric |  38 | ceo
 OSC     | john |  37 | dev
 RKG     | anya |  29 | lead
 RKG     | ben  |  27 | dev
 RKG     | chad |  35 | ops

Internal storage (one wide row per company; column keys are prefixed with the name):

 row key | eric:age | eric:role | john:age | john:role
---------+----------+-----------+----------+-----------
 OSC     |       38 | ceo       |       37 | dev

 row key | anya:age | anya:role | ben:age | ben:role | chad:age | chad:role
---------+----------+-----------+---------+----------+----------+-----------
 RKG     |       29 | lead      |      27 | dev      |       35 | ops
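The mapping from the CQL view to the wide-row storage layout can be reproduced with a few lines of Python. This is only a sketch of the idea on this slide: the function `to_wide_rows` and its parameter names are invented here, and the real storage engine encodes composite column keys in binary rather than as colon-joined strings.

```python
def to_wide_rows(cql_rows, partition_key, clustering_key):
    """Group CQL rows by partition key; prefix each cell with the clustering value."""
    storage = {}
    for row in cql_rows:
        cells = storage.setdefault(row[partition_key], {})
        prefix = row[clustering_key]
        for column, value in row.items():
            if column not in (partition_key, clustering_key):
                cells[f"{prefix}:{column}"] = value
    return storage


rows = [
    {"company": "OSC", "name": "eric", "age": 38, "role": "ceo"},
    {"company": "OSC", "name": "john", "age": 37, "role": "dev"},
    {"company": "RKG", "name": "anya", "age": 29, "role": "lead"},
]
storage = to_wide_rows(rows, partition_key="company", clustering_key="name")
print(storage["OSC"])
# {'eric:age': 38, 'eric:role': 'ceo', 'john:age': 37, 'john:role': 'dev'}
```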


Insert data into Employee

The INSERT command is similar to its SQL counterpart
• The major difference is that the PRIMARY KEY is always required
• If the same statement is executed twice, there is no error
• If the same PRIMARY KEY value is reused with different values for the other columns, the last write wins!

cqlsh:training> INSERT INTO employee (name, age, role)
                VALUES ('john', 37, 'dev');
cqlsh:training> INSERT INTO employee (name, age, role)
                VALUES ('eric', 38, 'ceo');
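The "no error, last one wins" behaviour of INSERT can be modelled in Python as an upsert keyed by primary key. This is a simplified sketch: the function `insert` and the explicit integer timestamps are assumptions, and ties between equal timestamps are resolved here by simply letting the later call win.

```python
table = {}  # primary key -> {column: (value, timestamp)}


def insert(name, timestamp, **columns):
    """Upsert: create the row if missing, otherwise overwrite older cells."""
    row = table.setdefault(name, {})
    for column, value in columns.items():
        current = row.get(column)
        if current is None or timestamp >= current[1]:
            row[column] = (value, timestamp)


insert("john", 1, age=37, role="dev")
insert("john", 1, age=37, role="dev")  # same statement again: no error
insert("john", 2, role="lead")         # same key, new value: last write wins

print({column: value for column, (value, _) in table["john"].items()})
# {'age': 37, 'role': 'lead'}
```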


Retrieving data from Employee table (II)

A restriction on a column other than the PRIMARY KEY won't work

This can be solved with an index (but be careful, de-normalization is usually the better choice)

cqlsh:training> SELECT * FROM employee
                WHERE age = 37;
Bad Request: No indexed columns present in by-columns clause with Equal operator

cqlsh:training> CREATE INDEX employee_age_idx
                ON employee (age);

cqlsh:training> SELECT * FROM employee
                WHERE age = 37;

 name | age | role
------+-----+------
 john |  37 | dev

(1 rows)


Update data in Employee

The UPDATE statement is similar to the SQL UPDATE command

Just as with INSERT, the PRIMARY KEY column must be specified as part of the UPDATE

In CQL, UPDATE does not check whether the row exists; if it does not exist, CQL will just create it

cqlsh:training> UPDATE employee SET age = 38
                WHERE name = 'john';


Cassandra Data Types

 Category | CQL Data Type | Description
----------+---------------+----------------------------------------------------------------------
 String   | ascii         | US-ASCII character string
          | text          | UTF-8 encoded string, used most of the time for storing String data
          | varchar       | UTF-8 strings
          | inet          | Used for storing IP addresses
 Numeric  | int           | 32-bit signed integer
          | float         | 32-bit IEEE-754 floating point
          | double        | 64-bit IEEE-754 floating point
          | varint        | Arbitrary-precision integer
          | bigint        | 64-bit number, equivalent to long
          | decimal       | Variable-precision decimal
          | counter       | Distributed counter value (64-bit long)


Cassandra Data Types (II)

 Category      | CQL Data Type | Description
---------------+---------------+----------------------------------------------------------
 UUIDs         | uuid          | A UUID in standard UUID format
               | timeuuid      | Type 1 UUID only, for storing unique time-based IDs
 Collections   | list          | Ordered collection of one or more elements
               | map           | Collection of arbitrary key-value pairs
               | set           | Unordered collection of one or more unique elements
 Miscellaneous | boolean       | Boolean (true/false)
               | blob          | Used for storing binary data written in hexadecimal
               | timestamp     | Date/Time


Cassandra Data Types (III)

TimeUUID
• Has a few extra functions that allow extracting the time information
• now() returns a new TimeUUID with the time of the current timestamp and ensures globally unique values
• minTimeuuid() and maxTimeuuid() are used when querying ranges of TimeUUIDs

Counter
• A table cannot mix counter columns with columns of other types
• The value cannot be set, only incremented/decremented by a specified amount
• Counters may not be part of the PRIMARY KEY of the table

WHERE event_time > maxTimeuuid('2013-01-01 00:05+0000')
  AND event_time < minTimeuuid('2013-02-02 10:00+0000')
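What "extracting the time information" from a TimeUUID means can be shown with Python's standard `uuid` module, which also generates version 1 (time-based) UUIDs. The sketch below is client-side Python, not CQL; the helper name `timeuuid_to_datetime` is an assumption, and the constant is the number of 100-nanosecond intervals between the UUID epoch (15 October 1582) and the Unix epoch.

```python
import uuid
from datetime import datetime, timezone

# 100 ns intervals between 1582-10-15 (UUID epoch) and 1970-01-01 (Unix epoch).
GREGORIAN_TO_UNIX_100NS = 0x01B21DD213814000


def timeuuid_to_datetime(value: uuid.UUID) -> datetime:
    """Recover the embedded timestamp of a version 1 (time-based) UUID."""
    if value.version != 1:
        raise ValueError("not a version 1 (time-based) UUID")
    unix_seconds = (value.time - GREGORIAN_TO_UNIX_100NS) / 1e7
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc)


event_id = uuid.uuid1()  # comparable in spirit to now() in CQL
print(event_id, timeuuid_to_datetime(event_id).isoformat())
```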


Collections

CQL3 also supports collections for storing complex data structures
• Set {value,...}, List [value,...], Map {key:value,...}

cqlsh:training> CREATE TABLE collection_sample (
                  id int PRIMARY KEY,
                  string_set set<text>,
                  string_list list<text>,
                  string_map map<text, text>);

cqlsh:training> INSERT INTO collection_sample
                  (id, string_set, string_list, string_map)
                VALUES (1,
                  {'text1','text2','text1'},
                  ['text1','text2','text1'],
                  {'key1':'value1'});


Collections (II)

cqlsh:training> SELECT * FROM collection_sample;

 id | string_list                 | string_map         | string_set
----+-----------------------------+--------------------+--------------------
  1 | ['text1', 'text2', 'text1'] | {'key1': 'value1'} | {'text1', 'text2'}

(1 rows)
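The result shows that the duplicate 'text1' survives in the list but not in the set. Python's built-in collections behave the same way, which makes for a quick sanity check (an analogy only; the variable names mirror the columns of the sample table):

```python
inserted = ["text1", "text2", "text1"]

string_list = list(inserted)        # a list keeps order and duplicates
string_set = sorted(set(inserted))  # a set keeps unique elements only
string_map = {"key1": "value1"}     # a map holds key-value pairs

print(string_list)  # ['text1', 'text2', 'text1']
print(string_set)   # ['text1', 'text2']
```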


Counter Columns

Create a counter column table that counts "favorite" events

cqlsh:training> CREATE TABLE favorites (
                  product_id int,
                  month int,
                  number COUNTER,
                  PRIMARY KEY (product_id, month));

cqlsh:training> UPDATE favorites SET number = number + 1
                WHERE product_id = 4910 AND month = 06;

cqlsh:training> SELECT * FROM favorites;

 product_id | month | number
------------+-------+--------
       4910 |     6 |      1


Time-to-Live (TTL) on Insert

Insert a row with a TTL in seconds (30 s); after that the row is deleted

cqlsh:training> INSERT INTO employee (name, age, role)
                VALUES ('bob', 29, 'dev')
                USING TTL 30;

cqlsh:training> SELECT TTL(role)
                FROM employee WHERE name='bob';

 ttl(role)
-----------
        22

cqlsh:training> SELECT TTL(role) FROM employee WHERE name='bob';

(0 rows)
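The two query results (22 seconds remaining, then no rows) follow from storing an expiry time with the data. A minimal Python sketch with an injectable clock reproduces the session; the class `TtlStore` and its methods are hypothetical, and the real server removes expired data through tombstones and compaction rather than at read time.

```python
class TtlStore:
    """Toy key-value store where each row expires `ttl` seconds after insert."""

    def __init__(self, clock):
        self._clock = clock  # callable returning the current time in seconds
        self._rows = {}      # key -> (value, expiry time)

    def insert(self, key, value, ttl):
        self._rows[key] = (value, self._clock() + ttl)

    def ttl(self, key):
        """Remaining seconds, or None if the row is missing or expired."""
        entry = self._rows.get(key)
        if entry is None:
            return None
        remaining = entry[1] - self._clock()
        return remaining if remaining > 0 else None

    def select(self, key):
        return self._rows[key][0] if self.ttl(key) is not None else None


now = [0]  # fake clock so the example is deterministic
store = TtlStore(clock=lambda: now[0])
store.insert("bob", {"age": 29, "role": "dev"}, ttl=30)

now[0] = 8
print(store.ttl("bob"))     # 22
now[0] = 31
print(store.select("bob"))  # None
```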

7. Cassandra/DataStax @ Trivadis

Trivadis / DataStax Partnership

• We have been a DataStax silver partner since December 2014
• DataStax Partner Network (DSPN)
• Available certifications
  • Admin
  • Developer
  • Architect
• Currently only one other partner in Switzerland: Intersys
• http://www.datastax.com/partners


Questions and answers ...


Ulises Fasoli

Senior consultant

+41 21 321 47 00

[email protected]
