how we made greenplum open source - andreas scherbaum · why go open source? customers more and...

91
1 © 2016 Pivotal Software, Inc. All rights reserved. 1 How we made Greenplum Open Source Andreas Scherbaum Pivotal

Upload: others

Post on 14-Jul-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

1 © 2016 Pivotal Software, Inc. All rights reserved. 1

How we made Greenplum Open Source Andreas Scherbaum

Pivotal

Page 2: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

2 © 2016 Pivotal Software, Inc. All rights reserved.

Andreas Scherbaum �  Works with databases since ~1997, with PostgreSQL since ~1998

�  Founding member of PGEU

�  Board of Directors: PGEU, Orga team for pgconf.[eu|de], FOSDEM

�  PostgreSQL Regional Contact for Germany

�  Ran my own company around PostgreSQL for 7+ years

�  Joined EMC in 2011, then Pivotal, then EMC, then Pivotal, working on PostgreSQL and Greenplum projects

�  Run PostgreSQL Meetups in Palo Alto, CA

Page 3: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

3 © 2016 Pivotal Software, Inc. All rights reserved.

Social

� Twitter: @ascherbaum

� Google+: https://plus.google.com/106626124305916746070

� Blog: http://andreas.scherbaum.la/

Page 4: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

4 © 2016 Pivotal Software, Inc. All rights reserved.

2015-02-17: Announcement

� February 17: Pivotal announces that it will Open Source most of it’s products

Page 5: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

5 © 2016 Pivotal Software, Inc. All rights reserved.

2015-10-29: Greenplum Open Source

Page 6: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

6 © 2016 Pivotal Software, Inc. All rights reserved.

What happened between February 2015 and October 2015?

Page 7: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

7 © 2016 Pivotal Software, Inc. All rights reserved. 7

Agenda

Page 8: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

8 © 2016 Pivotal Software, Inc. All rights reserved.

Agenda

� What is Greenplum?

� History of Greenplum (aligned to PostgreSQL)

� Why Open Source?

� Challenges

� Lessons Learned

� Q & A

Page 9: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

9 © 2016 Pivotal Software, Inc. All rights reserved. 9

What is Greenplum? Time for Marketing …

Page 10: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

10 © 2016 Pivotal Software, Inc. All rights reserved.

What is Greenplum Database?

� One of these PostgreSQL forks

� Diverged away a long time ago (2007) –  And now wants to merge again (2015 - ???)

Page 11: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

11 © 2016 Pivotal Software, Inc. All rights reserved.

Why another database?

� PostgreSQL is good for OLTP –  It does not scale well, it does not scale across dozens of machines

� OLAP systems scale to 100s of TBs, or even PBs

� Huge amounts of data must be processed –  Contrary to “hot spots” in OLTP –  Full table/partition scans are the norm

� Trend from “Store-everything-Data-Warehouse” to “Analytics-Database-on-everything”

Page 12: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

12 © 2016 Pivotal Software, Inc. All rights reserved.

What is Greenplum Database?

� Massive-parallel, shared-nothing database

� Optimized for OLAP and Analytics workloads

Page 13: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

13 © 2016 Pivotal Software, Inc. All rights reserved.

Google Trends – Data Warehouse

Page 14: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

14 © 2016 Pivotal Software, Inc. All rights reserved.

Google Trends – Analytics Database

Page 15: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

15 © 2016 Pivotal Software, Inc. All rights reserved.

Why another database?

� Many vendors in the market –  Teradata (founded back in 1979) –  Greenplum (2003) –  Exadata (2008) –  Netezza (1999) –  …

Page 16: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

16 © 2016 Pivotal Software, Inc. All rights reserved.

What is Greenplum Database?

� Massive-parallel, shared-nothing database

� What?

Page 17: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

17 © 2016 Pivotal Software, Inc. All rights reserved.

Massive parallel

... ...

1 (2) Segment Server

n Segment Server

Every segment server holds n

segment databases

Master server controls

everything

Interconnect: Communication

between everyone

Page 18: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

18 © 2016 Pivotal Software, Inc. All rights reserved.

Shared nothing

... ...

Separate servers: Share no hardware or storage

Page 19: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

19 © 2016 Pivotal Software, Inc. All rights reserved.

Where is my data?

... ...

Data is spread across the segment databases (hopefully evenly distributed)

Segment database with biggest result set will need most time for

processing query

Page 20: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

20 © 2016 Pivotal Software, Inc. All rights reserved.

Where is my query executed?

... ...

All segment databases, in parallel, on their own potion of the data

100% CPU usage guaranteed1

(1: conditions apply)

Page 21: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

21 © 2016 Pivotal Software, Inc. All rights reserved.

But what about joins?

... ...

GPDB handles the joins: •  Co-located joins •  Broadcast joins •  Redistribute joins

Page 22: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

22 © 2016 Pivotal Software, Inc. All rights reserved.

But what about joins?

... ...

Optimal case: Data is co-located on

the same segment databases

No network traffic

Page 23: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

23 © 2016 Pivotal Software, Inc. All rights reserved.

And without co-location?

... ...

Data is broadcasted or redistributed over the

Interconnect

Redistribution: Small tables

Broadcast: Big tables Greenplum is handling the Join

Page 24: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

24 © 2016 Pivotal Software, Inc. All rights reserved.

Expansion adds new resources

... ...

Expansion = new hardware

More hardware: = more CPU cores = more RAM = more disk space = more I/O

Page 25: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

25 © 2016 Pivotal Software, Inc. All rights reserved.

What else?

� Number of Big Data features –  Polymorphic storage (row or column based tables) –  Partitions and dynamic partition elimination –  Many Windowing Functions –  Parallel data loading –  Resource queues (manage number users, CPU and RAM

resources)

Page 26: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

26 © 2016 Pivotal Software, Inc. All rights reserved.

What else?

� Number of Big Data features

Page 27: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

27 © 2016 Pivotal Software, Inc. All rights reserved.

Polymorphic storage

TABLE ‘CUSTOMER’ Mar ‘11 Apr ‘11 May

‘11 Jun ‘11 Jul ‘11 Aug ‘11 Sept

‘11 Oct ‘11 Nov ‘11

Row-oriented for HOT DATA Column-oriented for COLD DATA

Plus compression

Page 28: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

28 © 2016 Pivotal Software, Inc. All rights reserved.

Partitioning (including proper SQL Syntax)

DATA SET Segment 1A

Segment 1C

Segment 1D

Segment 2A

Segment 2B

Segment 2C

Segment 2D

Segment 3A

Segment 3B

Segment 3C

Segment 3D

Jan 2007 Feb 2007 Mar 2007 Apr 2007 May 2007 Jun 2007 Jul 2007 Aug 2007 Sep 2007 Oct 2007 Nov 2007 Dec 2007

Segment 1B Node 1

Node 2

Node 3

Page 29: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

29 © 2016 Pivotal Software, Inc. All rights reserved.

Parallel Data Loading

External Sources

Loading, streaming, etc.

gNet Network Interconnect

... ...

... ... Master Servers

Query planning & dispatch

Segment Servers

Query processing & data storage

SQL

ETL File Systems

Page 30: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

30 © 2016 Pivotal Software, Inc. All rights reserved.

Resource Queues

Queries are evaluated by First In, First Out

Limit resources: •  Number queries •  RAM •  CPU •  Bypass short

queries

Page 31: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

31 © 2016 Pivotal Software, Inc. All rights reserved. 31

History of Greenplum Aligned to PostgreSQL timeline

Page 32: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

32 © 2016 Pivotal Software, Inc. All rights reserved.

History of Greenplum

2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 Future

Page 33: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

33 © 2016 Pivotal Software, Inc. All rights reserved.

History of Greenplum - 2003

2003

September 2003: Greenplum founded

November 2003: PostgreSQL 7.4

released

Page 34: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

34 © 2016 Pivotal Software, Inc. All rights reserved.

Greenplum founded

� Merger of two companies: –  Didera (DC) + Metapa (NY) –  Luke Lonergar + Scott Yara –  Company moved to San Mateo

� Chief Architect: Chuck McDevitt –  Employee #20 at Teradata

� Products like “BizGres” –  Discussion: Single node OS, multi node commercial

Page 35: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

35 © 2016 Pivotal Software, Inc. All rights reserved.

History of Greenplum – 2004, 2005

2003 2004

January 2005: PostgreSQL 8.0

released

2005

November 2005: PostgreSQL 8.1

released

Page 36: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

36 © 2016 Pivotal Software, Inc. All rights reserved.

History of Greenplum - 2006

2003 2004

December 2006: PostgreSQL 8.2

released

2005 2006

Page 37: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

37 © 2016 Pivotal Software, Inc. All rights reserved.

History of Greenplum - 2007

2003

2007: Greenplum 3.0

released

2004 2005 2006 2007

Stopped merging with PostgreSQL

Page 38: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

38 © 2016 Pivotal Software, Inc. All rights reserved.

History of Greenplum - 2008

2003

2008: Greenplum 3.1

released

2004 2005 2006 2007 2008

February 2008: PostgreSQL 8.3

released

Page 39: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

39 © 2016 Pivotal Software, Inc. All rights reserved.

Greenplum 3.x

� Logical replication –  PostgreSQL got replication 3 years later, with v9.0

� Cooperation with Sun –  GPDB on Sun hardware, running on Solaris –  Some customers use this up to today

� Stopped merging with PostgreSQL –  GPDB up to today shows “version 8.2.14”

Page 40: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

40 © 2016 Pivotal Software, Inc. All rights reserved.

History of Greenplum - 2009

2003 2004 2005 2006 2007 2008

July 2009: PostgreSQL 8.4

released

2009

Page 41: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

41 © 2016 Pivotal Software, Inc. All rights reserved.

History of Greenplum - 2010

2003

May 2010: Greenplum 4.0

released

2004 2005 2006 2007 2008

September 2010: PostgreSQL 9.0

released

2009 2010

July 2010: Greenplum

acquired by EMC

Page 42: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

42 © 2016 Pivotal Software, Inc. All rights reserved.

Acquisition by EMC

� Already teams in: New Zealand, Australia, China, Silicon Valley, Virginia, Israel, India

� Data Computing Appliance (DCA) v1 released

Page 43: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

43 © 2016 Pivotal Software, Inc. All rights reserved.

Greenplum 4.0

� Filesystem based replication –  Every writing file I/O is replicated from primary to mirror –  Before that: single node failure renders the cluster read only

� Data Science Team

Page 44: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

44 © 2016 Pivotal Software, Inc. All rights reserved.

History of Greenplum - 2011

2003

Mar 2011: Greenplum 4.1

released

2004 2005 2006 2007 2008

September 2011: PostgreSQL 9.1

released

2009 2010 2011

November 2011: Greenplum 4.2

released

Page 45: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

45 © 2016 Pivotal Software, Inc. All rights reserved.

2011

� MADlib –  Mad Skills paper “MAD Skills: New Analysis Practices for Big Data”

Google Trends: Data Science

Page 46: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

46 © 2016 Pivotal Software, Inc. All rights reserved.

2011

� Greenplum Hadoop distribution

�  Interconnect switches to UDP –  TCP does not scale –  Packet verification on top of UDP

Page 47: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

47 © 2016 Pivotal Software, Inc. All rights reserved.

History of Greenplum - 2012

2003

November 2012: Greenplum 4.2.3

released

2004 2005 2006 2007 2008

September 2012: PostgreSQL 9.2

released

2009 2010 2011 2012

Page 48: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

48 © 2016 Pivotal Software, Inc. All rights reserved.

History of Greenplum - 2013

2003

February 2013: Greenplum 4.2.4

released

2004 2005 2006 2007 2008

September 2013: PostgreSQL 9.3

released

2009 2010 2011 2012 2013

April 2013: Greenplum 4.2.5

released

Page 49: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

49 © 2016 Pivotal Software, Inc. All rights reserved.

2013

� April 2013: EMC & VMware spin off Pivotal

Page 50: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

50 © 2016 Pivotal Software, Inc. All rights reserved.

History of Greenplum - 2014

2003

January 2014: Greenplum 4.3

released

2004 2005 2006 2007 2008

December 2014: PostgreSQL 9.4

released

2009 2010 2011 2012 2013 2014

Page 51: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

51 © 2016 Pivotal Software, Inc. All rights reserved.

2014

� Append-only tables -> Append-optimized tables –  Allows update in append-only tables

� WAL Replication backported from PostgreSQL –  For Master -> Slave replication –  Primary -> Mirror still uses file replication

Page 52: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

52 © 2016 Pivotal Software, Inc. All rights reserved.

History of Greenplum - 2015

2003

Open Source

2004 2005 2006 2007 2008

2009 2010 2011 2012 2013 2014 2015

Page 53: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

53 © 2016 Pivotal Software, Inc. All rights reserved.

2015

� February 17: Pivotal announces that it will Open Source most of it’s products

Page 54: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

54 © 2016 Pivotal Software, Inc. All rights reserved.

Page 55: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

55 © 2016 Pivotal Software, Inc. All rights reserved.

And after the announcements?

� Silence for several months …

Page 56: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

56 © 2016 Pivotal Software, Inc. All rights reserved.

And after the announcements?

� Silence for several months …

� But in the background, a lot happened –  Number of PostgreSQL contributors join Pivotal: –  Heikki Linnakangas, Daniel Gustafsson, Dave Cramer, Atri Sharma

Page 57: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

57 © 2016 Pivotal Software, Inc. All rights reserved.

And after the announcements?

� Silence for several months …

� But in the background, a lot happened –  Number of PostgreSQL contributors join Pivotal: –  Heikki Linnakangas, Daniel Gustafsson, Dave Cramer, Atri Sharma

�  June 2015: Apache Geode (Gemfire)

Page 58: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

58 © 2016 Pivotal Software, Inc. All rights reserved.

And after the announcements?

� Silence for several months …

� But in the background, a lot happened –  Number of PostgreSQL contributors join Pivotal: –  Heikki Linnakangas, Daniel Gustafsson, Dave Cramer, Atri Sharma

� Pivotal is busy releasing all products into Open Source –  June 2015: Apache Geode (Gemfire) –  October 2015 (pgconf.eu): Greenplum Database –  January 2016: Apache HAWQ (SQL on Hadoop)

Page 59: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

59 © 2016 Pivotal Software, Inc. All rights reserved.

Greenplum Open Source

Page 60: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

60 © 2016 Pivotal Software, Inc. All rights reserved.

Greenplum Open Source

� Lives on GitHub

� Based on PostgreSQL 8.2 commits –  Preserves the history

Page 61: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

61 © 2016 Pivotal Software, Inc. All rights reserved.

Page 62: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

62 © 2016 Pivotal Software, Inc. All rights reserved.

Future of Greenplum - 2016

2003

Merge with PostgreSQL 8.3

going on

2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016

January 2016: PostgreSQL 9.5

released

Page 63: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

63 © 2016 Pivotal Software, Inc. All rights reserved.

Future of Greenplum

2003

Merge with recent PostgreSQL

versions

2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 Future

Page 64: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

64 © 2016 Pivotal Software, Inc. All rights reserved. 64

Why Open Source?

Page 65: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

65 © 2016 Pivotal Software, Inc. All rights reserved.

Why go Open Source?

�  “Opening things up is incredibly healthy for the company” –  For all products –  Forced us to clean things up –  Having to justify the decisions help in better communicating

internally and externally

Page 66: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

66 © 2016 Pivotal Software, Inc. All rights reserved.

Why go Open Source?

� Customers more and more request Open Source products in RFPs

� Customers do not want to be locked in into one vendor or product (anymore)

� A healthy community improves the development process

� Restructure of the code to provide better development workflow

Page 67: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

67 © 2016 Pivotal Software, Inc. All rights reserved.

Why go Open Source?

� And of course: People love and live Open Source –  Pivotal Labs was a company living Open Source

Page 68: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

68 © 2016 Pivotal Software, Inc. All rights reserved. 68

Challenges Roadblocks along the way …

Page 69: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

69 © 2016 Pivotal Software, Inc. All rights reserved.

Challenges

� How much time do you have?

Page 70: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

70 © 2016 Pivotal Software, Inc. All rights reserved.

Challenges

� License –  Which license is compatible and sufficient –  Settled with Apache License: PostgreSQL License has no patent

protection

� Patents –  Existing patents can’t be ignored

� Project owner / Governance model –  Make it an ASF project? –  Give it to the PostgreSQL Project? (they refused)

Page 71: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

71 © 2016 Pivotal Software, Inc. All rights reserved.

Challenges

� Customer names in code –  Quite a challenge with ~1000 current customers

� Dependencies on internal build system –  Only available in VPN and office network

� One single repository –  Perforce, Copies in Atlassian Stash

Page 72: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

72 © 2016 Pivotal Software, Inc. All rights reserved.

Challenges

� Test systems –  Test systems incomplete –  Too many test systems –  Test systems to complex –  Test systems too big

▪  tests run for several hours ▪  GBs of test data

–  Most test systems only accessible for Pivotal

Page 73: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

73 © 2016 Pivotal Software, Inc. All rights reserved.

Challenges

� CVEs –  Followed up with each and every PostgreSQL CVEs

� Static code scans –  Two different products –  Plenty of false positives

Page 74: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

74 © 2016 Pivotal Software, Inc. All rights reserved.

Challenges

� Black Duck scans –  Scans for license violations

� Software with incompatible license –  One of our compression algorithm allows GPL (invasive), or

commercial license

Page 75: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

75 © 2016 Pivotal Software, Inc. All rights reserved.

Challenges

� Caring parents … –  EMC is not known for being an Open Source company

� Split time between going open source and handling customer escalations

� Open Source as a strategy, business model

� What is the IP portfolio?

Page 76: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

76 © 2016 Pivotal Software, Inc. All rights reserved.

Challenges

� Tension between commercial interests and open source interests

� PostgreSQL

Page 77: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

77 © 2016 Pivotal Software, Inc. All rights reserved.

Page 78: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

78 © 2016 Pivotal Software, Inc. All rights reserved.

Ongoing challenges

� Documentation is partly open, partly closed –  Updating documentation is a challenge

� Culture change: employee -> community –  Have every product related discussion in the public

� Have code open for inspection by customers, competitors –  They can find problems, bugs, or just copy ideas –  Discussion about a bug bounty program

Page 79: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

79 © 2016 Pivotal Software, Inc. All rights reserved.

Lawyers

�  “Programmers all think they are lawyers”

� You need them in the process. Yes, really. –  Talk with them, early, and often

Page 80: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

80 © 2016 Pivotal Software, Inc. All rights reserved. 80

Lessons learned History repeats itself ...

Page 81: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

81 © 2016 Pivotal Software, Inc. All rights reserved.

Think twice before you fork

� Overwhelming opinion today is that forking PostgreSQL (and not merging new versions) was a mistake –  Over time, more and more desired features showed up in

PostgreSQL

Page 82: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

82 © 2016 Pivotal Software, Inc. All rights reserved.

Think early about where you go

� Choice between Apache Software Foundation and PostgreSQL Project –  ASF is not a good fit –  PG did not want to “adopt” GP –  Quote: in PG everything is just opinion (“503 in Canada”)

Page 83: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

83 © 2016 Pivotal Software, Inc. All rights reserved.

Development workflow

� GitHub is cool, but workflow is a bit different –  PostgreSQL (and Greenplum) do not merge –  The “Merge” button is in the way, too many people just hit it

� We missed setting up the Contributor License Agreement (CLA) process –  Explain the CLA in detail! –  This is not about signing over code to Pivotal exclusively.

Page 84: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

84 © 2016 Pivotal Software, Inc. All rights reserved.

CLA not provided

Page 85: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

85 © 2016 Pivotal Software, Inc. All rights reserved.

CLA provided

Page 86: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

86 © 2016 Pivotal Software, Inc. All rights reserved.

The small things

� Domain(s) –  We literally got greenplum.org back a few days before the release

� Website –  You think it’s still time – until it’s too late

� Mailinglists –  Which ones, and how many –  Have everyone in your company subscribed to the lists

� Talk with your lawyers, frequently

Page 87: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

87 © 2016 Pivotal Software, Inc. All rights reserved.

Go public

� We decided early to announce at pgconf.eu 2015 –  Choice between pgconfSV and pgconf.eu –  Gives you a fixed timeline – which is good and bad

� Have swag ;-) –  Marketing supported the Open Source announcement with nice

merchandising articles

Page 88: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

88 © 2016 Pivotal Software, Inc. All rights reserved. 88

Q & A Time for your questions ...

Page 89: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

89 © 2016 Pivotal Software, Inc. All rights reserved.

Thanks to (in no particular order)

� Mike Waas

� Lyublena Antova

�  John Eshleman

�  Ivan Novick

� Roman Shaposhnik

� Elisabeth Hendrickson

� Caleb Welton

� Ed Espino

� George Tuma

Page 90: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into

90 © 2016 Pivotal Software, Inc. All rights reserved.

Upcoming conferences

� PostgreSQL Conference Europe –  Tallinn, Estonia –  November 1 – 4, 2016 –  Radisson Blu Hotel Olümpia –  http://2016.pgconf.eu/

Page 91: How we made Greenplum Open Source - Andreas Scherbaum · Why go Open Source? Customers more and more request Open Source products in RFPs Customers do not want to be locked in into