I attended the Cassandra Summit 2015 in Silicon Valley a few weeks ago and wanted to highlight some takeaways before I forgot about everything other than the crazy ax routine at the keynote presentation. The keynote was a wild opening for the summit in true Silicon Valley style. There were two plywood panels with several Raspberry Pis attached to each. Each wood panel represented a physical data center, and each Raspberry Pi was a node in the Cassandra cluster. A giant dashboard showed the status of each node along with the health of the cluster. Nodes in each data center were "disabled" by various humorous events, including ax swings, coffee spills and power outages. The point of the demonstration was to illustrate the resiliency of the cluster. Unfortunately, the final ax swing took out the last two nodes instead of one, leaving the entire cluster nonfunctional. Apparently the next release of Cassandra will be able to run on 0 nodes!

The conference had over 6,000 attendees and over 150 sessions. There was no way I could attend every session I was interested in, so I tried to attend a sampling of sessions across the various tracks:
  • Use Cases
  • Architecture
  • Analytics
  • Development
  • Advanced Operations
  • Cassandra for Relational Developer
  • Beginner Operations
  • Tools, Internals & Theory
Cassandra and NoSQL tools have come a long way since their inception and are quickly evolving from bleeding-edge technologies to enterprise-ready platforms. This was evident from the number of people at the summit and the breadth of experience they brought.

I can in no way summarize the entire summit; however, I want to highlight some of my key takeaways based on the sessions I attended.

Improving analytics capabilities

Relational data models support ad-hoc queries (i.e. unanticipated data access paths) and aggregate data analysis out of the box, typically at the expense of performance. NoSQL platforms gain performance benefits from a distributed architecture and from data structures that are designed for specific data access paths. The inability to easily query Cassandra limited its adoption to niche implementations. DataStax is mitigating this limitation by integrating Cassandra with other components that provide these capabilities. Specifically, Solr is used for ad-hoc search, and Spark and Hadoop are used for aggregate analytics. This will pave the way for more general adoption of Cassandra.
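
To make the trade-off concrete, here is a minimal Python sketch of a store laid out for a single access path. It is a toy illustration, not Cassandra's API: the PartitionedTable class, its methods and the sensor data are all hypothetical names I made up for the example. Reading by the partition key touches one partition, while a query on any other column has to examine every row.

```python
from collections import defaultdict


class PartitionedTable:
    """Toy stand-in for a table organized around one access path.

    Hypothetical illustration only; this is not how Cassandra is implemented.
    """

    def __init__(self, partition_key):
        self.partition_key = partition_key
        self.partitions = defaultdict(list)  # partition key value -> rows
        self.rows_examined = 0  # rough measure of work done by reads

    def insert(self, row):
        self.partitions[row[self.partition_key]].append(row)

    def read_partition(self, key_value):
        """Anticipated access path: only one partition is touched."""
        rows = self.partitions.get(key_value, [])
        self.rows_examined += len(rows)
        return list(rows)

    def scan_where(self, column, value):
        """Unanticipated (ad-hoc) access path: every row is examined."""
        matches = []
        for rows in self.partitions.values():
            for row in rows:
                self.rows_examined += 1
                if row[column] == value:
                    matches.append(row)
        return matches


# Made-up data: 1,000 readings spread over 100 sensors.
table = PartitionedTable(partition_key="sensor_id")
for i in range(1000):
    table.insert({"sensor_id": i % 100, "day": i % 7, "reading": i})

table.rows_examined = 0
by_sensor = table.read_partition(42)
cost_keyed = table.rows_examined  # rows touched for the keyed read

table.rows_examined = 0
by_day = table.scan_where("day", 3)
cost_adhoc = table.rows_examined  # rows touched for the ad-hoc query

print(len(by_sensor), cost_keyed, len(by_day), cost_adhoc)
```

In this toy setup the keyed read examines only the rows it returns, while the ad-hoc query examines the whole table. That gap is roughly what a search index or an analytics engine layered on top is meant to close.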

Need to focus more on ROI

Most of the content was technically focused rather than business focused. Broad adoption of Cassandra and other NoSQL tools will only happen when they add clear business value. Next year I hope to see sessions about how to make a business case for adopting Cassandra, along with some use cases that show clear ROI. There was a use case track; however, most of its sessions focused on lessons learned.

Enterprise hardening

Relational technology has been evolving since the 1970s. That is over 40 years of enterprise hardening (e.g. data management functions), or baggage, depending on whether you are a developer or a data architect, that comes along with most RDBMSs like Oracle, DB2 and Teradata. Most NoSQL platforms began life as developer-centric persistence platforms with minimal built-in data management functions. This is changing as NoSQL platforms gain more general adoption. Functions like security, materialized views, transactions, consistency, management consoles, metadata management and standard APIs are being implemented in NoSQL tools. I wonder how long it will be before they become bloated like the current RDBMS platforms!

Common Use Case Themes

The two most common use cases for Big Data platforms are (1) reducing operational licensing costs and (2) minimizing the time from data ingestion to analytics.

SQL Interfaces

There is a huge push for SQL-like interfaces to Big Data platforms because they make these platforms available to a much larger audience. CQL 3.0 provides a very familiar SQL-like interface to Cassandra, and Spark has Spark SQL, which minimizes the need to know Scala or Java.
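
As a rough illustration of why a declarative interface widens the audience, the Python sketch below computes the same per-sensor average twice: once as a single SQL statement and once with hand-written code. It uses SQLite from the Python standard library purely as a stand-in; it is not CQL or Spark SQL, and the table and column names are made up for the example.

```python
import sqlite3

# Made-up sample data: (sensor_id, reading) pairs.
rows = [(1, 10.0), (1, 20.0), (2, 5.0)]

# Declarative version: say what you want and let the engine work out how.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor_id INTEGER, reading REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
sql_result = conn.execute(
    "SELECT sensor_id, AVG(reading) FROM readings "
    "GROUP BY sensor_id ORDER BY sensor_id"
).fetchall()
conn.close()

# Imperative version: the same aggregate written out by hand.
totals = {}
for sensor_id, reading in rows:
    total, count = totals.get(sensor_id, (0.0, 0))
    totals[sensor_id] = (total + reading, count + 1)
manual_result = sorted(
    (sensor_id, total / count) for sensor_id, (total, count) in totals.items()
)

print(sql_result)
print(manual_result)
```

Both versions produce the same answer, but the first one is readable by anyone who already knows SQL, which is the appeal of interfaces like CQL and Spark SQL.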

Data Modeling is not dead!

There is already an official methodology and data modeling tool. Check it out here.

In summary, NoSQL and Cassandra are here to stay. Companies like DataStax are providing enterprise-ready distributions with support, which is fueling early adoption by nimble enterprises. These organizations demand additional functionality like security, data management and automated operational functions, which makes the platform more robust. This in turn makes it more likely that less nimble enterprises will embrace NoSQL technologies. We started with mainframe shared tenancy and flat file storage systems with batch processing. These evolved into relational database management systems, which became bloated and did not scale. That led to the evolution of NoSQL platforms, which focus on scalability and fault tolerance. Will history repeat itself?