coursera cassandra driver

Post on 06-Jan-2017

Coursera, Cassandra, Java Drivers

Biography

Daniel Chia @DanielJHChia

Software Engineer, Infrastructure Team

1 Introduction

2 Why We Chose Cassandra

3 Example Use Cases

4 Pain Points

5 Java Drivers

Coursera

Web iOS Android

Why Cassandra

Coursera Tech Stack

• 100% AWS
• MySQL + Cassandra
• Service-oriented

Consistently Fast Latencies

Availability

Scalability

Use Case #1

• Resume video where you left off
• High write volume
• TTL data

CREATE TABLE video_progress_kvs_basic (
    user_id int,
    course_id varchar,
    video_id varchar,
    viewed_up_to bigint,
    updated_at bigint,
    PRIMARY KEY ((user_id, course_id, video_id))
);
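The slide pairs this table with "high write volume" and "TTL data". A minimal sketch of how such a write could be phrased as a CQL upsert with `USING TTL` follows. The helper name `upsertProgress` and the 90-day retention are assumptions made for illustration; the talk does not state the actual TTL.

```java
import java.util.concurrent.TimeUnit;

public class VideoProgressCql {
    // Hypothetical retention period; the talk does not give the real value.
    public static final long TTL_SECONDS = TimeUnit.DAYS.toSeconds(90);

    // Builds a parameterised upsert. A Cassandra INSERT overwrites any existing
    // row with the same primary key, so no read-before-write is needed.
    public static String upsertProgress(long ttlSeconds) {
        return "INSERT INTO video_progress_kvs_basic "
             + "(user_id, course_id, video_id, viewed_up_to, updated_at) "
             + "VALUES (?, ?, ?, ?, ?) USING TTL " + ttlSeconds;
    }

    public static void main(String[] args) {
        System.out.println(upsertProgress(TTL_SECONDS));
    }
}
```

Because the whole primary key is the partition key, each progress update touches exactly one single-row partition, which suits a write-heavy key-value workload.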

Use Case #2: Media Asset Service

Use Case #3: Video Workflows

Input.mp4

Step 1: Audio

Step 2: Low Res Video

Step 3: High Res Video

Assembly 1: Crash

Assembly 2: Ok

Assembly 3: Crash

Assembly 4: Ok

Assembly 5: Ok

CREATE TABLE transloadit_workflow (
    workflow_id text,
    step_id text,
    assembly_id text,
    step_details text,
    step_payload map<text, text>,
    step_status text,
    PRIMARY KEY (workflow_id, step_id, assembly_id)
);

Looking Back

Cassandra - Initial Pain Points

• Can’t execute arbitrary queries
    • Filtering, sorting, etc.

• Can’t be abused as an OLAP database

• Worries about ‘eventual’ consistency

Gotchas

• Running lots of truly ad-hoc queries is hard
    • Don’t use C* directly to explore your data (Spark?)

• Sorting and filtering can be hard
    • Consider Solr / Elasticsearch
    • Or even MySQL, depending on load / importance

Helpful Things

• Data modeling consulting

• Monitoring

• Data access layer for common use cases

Java Drivers

Best Practices

• Driver choice
• Cluster / connection setup
• Executing queries

DataStax Java Drivers

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;

public class Scratch {
    static Cluster cluster;

    public static void main(String[] args) {
        cluster = Cluster.builder()
            .addContactPoint("cassandra")
            .build();

        readRow("asset:QoMqLLyCEeSOi3paAormVw");

        cluster.close();
    }

    static void readRow(String id) {
        Session session = cluster.connect("asset");

        ResultSet result = session.execute(
            "SELECT * from asset_kvs_timestamp where part_key = ?", id);

        System.out.println(result.one());
        session.close();
    }
}

cluster = Cluster.builder()
    .addContactPoint("cassandra")
    .build();

LoadBalancingPolicy policy = new TokenAwarePolicy(
    new DCAwareRoundRobinPolicy());

cluster = Cluster.builder()
    .addContactPoint("cassandra")
    .withLoadBalancingPolicy(policy)
    .build();
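To illustrate what wrapping a round-robin policy in a token-aware one buys you, here is a much-simplified sketch of query-plan ordering in plain Java. It is not the driver's implementation: `QueryPlanSketch` and its methods are hypothetical names, hosts are plain strings, and remote data centers, host health and replica shuffling are all ignored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

public class QueryPlanSketch {
    private final List<String> localHosts;
    private final AtomicInteger index = new AtomicInteger();

    public QueryPlanSketch(List<String> localHosts) {
        if (localHosts.isEmpty()) {
            throw new IllegalArgumentException("need at least one local host");
        }
        this.localHosts = new ArrayList<>(localHosts);
    }

    // Child policy: rotate through the local-DC hosts, starting one host
    // further along on every call.
    public List<String> roundRobinPlan() {
        int start = Math.floorMod(index.getAndIncrement(), localHosts.size());
        List<String> plan = new ArrayList<>(localHosts.size());
        for (int i = 0; i < localHosts.size(); i++) {
            plan.add(localHosts.get((start + i) % localHosts.size()));
        }
        return plan;
    }

    // Token-aware wrapper: hosts that own a replica of the partition key are
    // tried first, then the rest of the child plan in its original order.
    public List<String> tokenAwarePlan(Set<String> replicas) {
        List<String> child = roundRobinPlan();
        List<String> plan = new ArrayList<>(child.size());
        for (String host : child) {
            if (replicas.contains(host)) plan.add(host);
        }
        for (String host : child) {
            if (!replicas.contains(host)) plan.add(host);
        }
        return plan;
    }

    public static void main(String[] args) {
        QueryPlanSketch sketch =
            new QueryPlanSketch(List.of("node-a", "node-b", "node-c", "node-d"));
        System.out.println(sketch.tokenAwarePlan(Set.of("node-c")));
    }
}
```

Sending the request to a replica first lets that node act as coordinator and serve the data itself, which saves a network hop on most queries.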

cluster = Cluster.builder()
    .addContactPoint("cassandra")
    .withLoadBalancingPolicy(policy)
    .withRetryPolicy(retryPolicy)
    .build();

Default Retry Policy

• Retries a read if enough replicas are alive but the data fetch failed.
• Retries a write only for batched writes.
• Retries on the next host on Unavailable. 2.0.11+ or 2.1.7 (JAVA-709)
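The three rules above can be written down as plain decision functions. This is a sketch in standalone Java, not the driver's own classes: the names `DefaultRetrySketch`, `Decision` and `WriteType` are illustrative, the single-retry limit is an assumption, and "batched writes" is read here as a timeout while writing the batch log.

```java
public class DefaultRetrySketch {
    public enum Decision { RETRY_SAME_HOST, TRY_NEXT_HOST, RETHROW }
    public enum WriteType { SIMPLE, BATCH, UNLOGGED_BATCH, COUNTER, BATCH_LOG }

    // Read timeout: retry once if enough replicas answered but the actual
    // data was not among the responses.
    public static Decision onReadTimeout(int requiredResponses, int receivedResponses,
                                         boolean dataRetrieved, int retriesSoFar) {
        if (retriesSoFar != 0) return Decision.RETHROW;
        return (receivedResponses >= requiredResponses && !dataRetrieved)
            ? Decision.RETRY_SAME_HOST
            : Decision.RETHROW;
    }

    // Write timeout: retry once, and only when the batch log write timed out
    // (assumed meaning of "batched writes"). Other writes may not be
    // idempotent, so they are not retried.
    public static Decision onWriteTimeout(WriteType writeType, int retriesSoFar) {
        if (retriesSoFar != 0) return Decision.RETHROW;
        return writeType == WriteType.BATCH_LOG
            ? Decision.RETRY_SAME_HOST
            : Decision.RETHROW;
    }

    // Unavailable: the coordinator could not see enough replicas, so give a
    // different coordinator one chance.
    public static Decision onUnavailable(int retriesSoFar) {
        return retriesSoFar == 0 ? Decision.TRY_NEXT_HOST : Decision.RETHROW;
    }

    public static void main(String[] args) {
        System.out.println(onReadTimeout(2, 2, false, 0));
        System.out.println(onWriteTimeout(WriteType.SIMPLE, 0));
        System.out.println(onUnavailable(0));
    }
}
```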

Share Session!

public static void main(String[] args) {
    cluster = Cluster.builder()
        .addContactPoint("cassandra")
        .build();

    readRow("asset:QoMqLLyCEeSOi3paAormVw");
    readRow("asset:7i2ClbKnEeSk_npaAormVw");
    readRow("asset:KS1vywpGEeWKtzoMw4q1xg");

    cluster.close();
}

static void readRow(String id) {
    // Anti-pattern: a new Session is opened and closed for every query.
    Session session = cluster.connect("asset");

    ResultSet result = session.execute(
        "SELECT * from asset_kvs_timestamp where part_key = ?", id);

    System.out.println(result.one());
    session.close();
}

public static void main(String[] args) {
    cluster = Cluster.builder()
        .addContactPoint("cassandra")
        .build();

    // One Session (a static field), created once and shared by every query.
    session = cluster.connect();

    readRow("asset:QoMqLLyCEeSOi3paAormVw");
    readRow("asset:7i2ClbKnEeSk_npaAormVw");
    readRow("asset:KS1vywpGEeWKtzoMw4q1xg");

    session.close();
    cluster.close();
}

static void readRow(String id) {
    ResultSet result = session.execute(
        "SELECT * from asset.asset_kvs_timestamp where part_key = ?", id);

    System.out.println(result.one());
}

Use prepared statements

• If doing a query more than once
• Better performance
• Token-aware routing

static PreparedStatement statement;

public static void main(String[] args) {
    …

    session = cluster.connect();
    // Prepare once, then reuse the statement for every execution.
    statement = session.prepare(
        "SELECT * from asset.asset_kvs_timestamp where part_key = ?");

    readRow("asset:QoMqLLyCEeSOi3paAormVw");

    …
}

static void readRow(String id) {
    BoundStatement bound = statement.bind().setString("part_key", id);
    ResultSet result = session.execute(bound);

    System.out.println(result.one());
}
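When an application has more than a handful of queries, a static field per statement gets unwieldy. One possible approach, sketched here with only the JDK, is a small cache that prepares each query string at most once. `PreparedCache` is a hypothetical helper, not part of the driver; the type parameter `P` stands in for the driver's `PreparedStatement`, and the `preparer` function stands in for a call such as `session::prepare`.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

public class PreparedCache<P> {
    private final ConcurrentMap<String, P> cache = new ConcurrentHashMap<>();
    private final Function<String, P> preparer;

    public PreparedCache(Function<String, P> preparer) {
        this.preparer = preparer;
    }

    // Returns the cached statement for this CQL string, preparing it on first
    // use. computeIfAbsent runs the preparer at most once per key.
    public P get(String cql) {
        return cache.computeIfAbsent(cql, preparer);
    }

    public static void main(String[] args) {
        // Stand-in preparer that just tags the string; real code would pass
        // something like session::prepare (an assumption about the caller).
        PreparedCache<String> demo = new PreparedCache<>(cql -> "prepared:" + cql);
        System.out.println(demo.get("SELECT part_key FROM t WHERE part_key = ?"));
    }
}
```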

There Be Dragons.. JAVA-420

statement = session.prepare(
    "SELECT part_key, time_key, content from asset.asset_kvs_timestamp where part_key = ?");

Always specify columns explicitly for prepared statements!

Consider Async

static List<String> readRows(List<String> ids) {
    return ids.stream().map(id -> {
        BoundStatement bound = statement.bind().setString("part_key", id);
        ResultSet result = session.execute(bound);
        return result.one().getString("c_enc");
    }).collect(Collectors.toList());
}

Async..

static ListenableFuture<List<String>> readRowsAsync(List<String> ids) {
    List<ListenableFuture<String>> futures = ids.stream().map(id -> {
        BoundStatement bound = statement.bind().setString("part_key", id);
        ResultSetFuture future = session.executeAsync(bound);

        return Futures.transform(future,
            (ResultSet result) -> result.one().getString("c_enc"));
    }).collect(Collectors.toList());

    return Futures.allAsList(futures);
}
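The same fan-out shape can be tried without a cluster or Guava. The sketch below swaps in the JDK's `CompletableFuture` for `ListenableFuture`; `FanOut`, `fetchAll` and `fakeQuery` are hypothetical names, and `fakeQuery` merely simulates an asynchronous lookup.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;
import java.util.stream.Collectors;

public class FanOut {
    // Issues every request up front, then combines the results in input order.
    public static <K, V> CompletableFuture<List<V>> fetchAll(
            List<K> keys, Function<K, CompletableFuture<V>> fetchOne) {
        // collect() is eager, so all requests are in flight before any is awaited.
        List<CompletableFuture<V>> futures =
            keys.stream().map(fetchOne).collect(Collectors.toList());

        // allOf completes when every future has completed; join() inside
        // thenApply therefore never blocks.
        return CompletableFuture
            .allOf(futures.toArray(new CompletableFuture<?>[0]))
            .thenApply(ignored -> futures.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList()));
    }

    // Stand-in for an asynchronous query; a real version would wrap executeAsync.
    public static CompletableFuture<String> fakeQuery(String id) {
        return CompletableFuture.supplyAsync(() -> "row-for-" + id);
    }

    public static void main(String[] args) {
        List<String> rows = fetchAll(Arrays.asList("a", "b", "c"), FanOut::fakeQuery).join();
        System.out.println(rows);
    }
}
```

With the synchronous version, total latency is roughly the sum of the individual query latencies; with the fan-out it is closer to the slowest single query.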

http://www.datastax.com/dev/blog/java-driver-async-queries

Thank you

Cassandra Summit 2016 September 7-9 San Jose, CA

Get 15% Off with Code: MeetupPromo Cassandrasummit.org