NoSQL, Big Data, Polyglot Persistence… A New Era

I recently stumbled on a couple of posts from Martin Fowler related to the joint authoring effort behind the book NoSQL Distilled. The announcement excites me, and I am looking forward to reading the book for several reasons, primarily because I have overindulged my inner geek this past year, playing with and getting a taste of what I call modern persistence systems. Learning and thinking about Brewer’s CAP theorem and its implications while setting up and trying a few things out with Cassandra and HBase has been very fun, to say the least. However, one thing has bothered me… what do we call this new era of persistence methods?
Today, I came across another post from Fowler where he discussed the term “NoSQL”, what it means in today’s persistence movements, and how he will be using it in his new book. This made me think about all the terms I have heard recently other than NoSQL. Here are a few others:
  • Polyglot Persistence – Another term I learned from reading Fowler’s Bliki at least six months ago.
  • Big Data – Typically used to describe the problem of data sets too large to be accommodated by today’s traditional relational database systems due to cost, performance, and/or other operational concerns.

Another interesting discussion point is the interest in other data store models, some of which academic circles have previously explored. The recent interest, in my opinion, is probably due to the changes in computers and networks since the RDBMS became the preferred standard for data persistence many years ago. Ayende Rahien talks about this in a blog post. Some of the different store models available (or in development) in current NoSQL systems include:

There are many more discussions taking place today that challenge how we think about persistence. Even the traditional physical persistence mechanism is in question. A few years ago, the idea of using anything other than a hard disk or a SAN for persistent storage would have been absurd. Today, however, a physically distributed in-memory non-relational database is a viable option (most likely a geographically distributed one). Other topics of discussion include:

  • Clustering techniques
  • Distributed systems

Truth is, I don’t know what to call this current persistence movement. To me it seems obvious that terms like “NoSQL”, “Big Data”, and even “Polyglot Persistence” are insufficient. We are definitely in a different era now, one where a new persistence paradigm has begun. Relational database management systems are no longer the only option. While those systems continue to have their place in today’s computing era, their market share is starting to shrink. Relational databases may even evolve to take advantage of the advancements made in this area.

The development, discussion, and research of new persistence solutions are happening now! Challenges issued to older, traditional persistence systems are inspiring the revitalization, evolution, and enhancement of the existing software. Some of the old and new will survive, a few will thrive, and many will die. These are exciting times!


Running Apache Cassandra 0.8.2 on Windows 7

I needed to get Apache Cassandra running locally on my Windows 7 box for development purposes. This post will cover what I did to get it running.

Prepare the Run Time Environment

Cassandra was developed in Java, which means my environment needs a functional Java run time environment. According to the Cassandra documentation at the time of installation, it requires the most stable version of Java 1.6. I figured that since I am a developer and our current project may need some Java development in the near future, I may as well have an up-to-date Java development environment. So I surfed my way to Oracle’s web site and got the latest and greatest Java Development Kit (Java Platform (JDK) 7). After downloading and running the installer, I was ready to get Cassandra installed.

Running Cassandra Locally

After getting my Java environment in order it was time to get Cassandra running. The first thing I did was to download the binary package for the latest stable Cassandra release (version 0.8.2).

I then realized that I didn’t have a way to extract the contents on this box. So the next step was to get a utility capable of extracting a gzipped tarball. There are plenty of utilities that do this, and I just grabbed 7-Zip since it’s free and it works.

Now back to business. The next step was to extract the Cassandra binary files. I extracted them to c:\dev\cassandra\apache-cassandra-0.8.2 using the 7-Zip utility. NOTE: Make sure there are no spaces in the path, as this may cause problems later.

Now, for everything to work correctly, you need to update the JAVA_HOME and CASSANDRA_HOME system variables. In my case, I needed to create them. To do this, I did the following:

  1. Click the Start Menu
  2. Right click Computer
    1. Click Properties (this opens the Control Panel in the System and Security > System view)
  3. Click Advanced system settings on the left side (this opens the System Properties dialog in the Advanced tab)
  4. Click the Environment Variables button (this opens the Environment Variables dialog)
  5. Under the System variables group:
    1. Click New… (This opens the New System Variable dialog)
    2. Input JAVA_HOME for the Variable name text box
    3. Input the path to your java installation for the Variable value text box (I entered C:\Program Files\Java\jdk1.7.0)
    4. Click OK
  6. Under the System variables group:
    1. Click New… (This opens the New System Variable dialog)
    2. Input CASSANDRA_HOME for the Variable name text box
    3. Input the path to your Cassandra extraction for the Variable value text box (I entered C:\dev\cassandra\apache-cassandra-0.8.2)
    4. Click OK
  7. Click OK
  8. Click OK
  9. Close the Control Panel
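
Since a missing variable or a path with spaces is easy to overlook, here is a minimal Python sketch for sanity-checking the result of the steps above. It is not part of Cassandra; the function name check_cassandra_env is my own invention, and it only inspects the values it is given. It assumes, per the note earlier, that spaces matter only for the Cassandra path.

```python
def check_cassandra_env(env, required=("JAVA_HOME", "CASSANDRA_HOME")):
    """Return a list of problems found in an environment mapping.

    `env` is any dict-like object, for example os.environ.
    Hypothetical helper for illustration; not part of Cassandra.
    """
    problems = []
    for name in required:
        value = env.get(name, "").strip()
        if not value:
            problems.append(name + " is not set")
            continue
        # Only the Cassandra path is checked for spaces, following the
        # note about the extraction directory.
        if name == "CASSANDRA_HOME" and " " in value:
            problems.append(name + " contains spaces: " + value)
    return problems
```

Calling check_cassandra_env(os.environ) from a freshly opened command prompt (after import os) should return an empty list if both variables were created as described.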

We are almost ready. Before running Cassandra, the storage configuration must be modified to make sure that any UNIX style paths are replaced with their corresponding Windows style paths. This is done by opening up the cassandra.yaml file in your favorite text editor and looking for those paths. My file was located in C:\dev\cassandra\apache-cassandra-0.8.2\conf\cassandra.yaml. I had to make the following changes:

  1. On line 72, I changed /var/lib/cassandra/data to C:\dev\cassandra\apache-cassandra-0.8.2\data
  2. On line 75, I changed /var/lib/cassandra/commitlog to C:\dev\cassandra\apache-cassandra-0.8.2\commitlog
  3. On line 78, I changed /var/lib/cassandra/saved_caches to C:\dev\cassandra\apache-cassandra-0.8.2\saved_caches
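
If you would rather not hunt for line numbers (they may differ in other releases), the three edits above can be sketched as a plain text replacement in Python. This is only an illustration under my own assumptions: the function name windowsify_cassandra_paths is made up, it does not parse YAML, and it only knows about the three default paths listed above.

```python
def windowsify_cassandra_paths(yaml_text, cassandra_home):
    """Replace the default UNIX storage paths with Windows paths.

    Hypothetical helper: does a literal search-and-replace on the text
    of cassandra.yaml and leaves everything else untouched.
    """
    unix_root = "/var/lib/cassandra/"
    home = cassandra_home.rstrip("\\")
    for name in ("data", "commitlog", "saved_caches"):
        # e.g. /var/lib/cassandra/data -> <cassandra_home>\data
        yaml_text = yaml_text.replace(unix_root + name, home + "\\" + name)
    return yaml_text
```

To use it, read conf\cassandra.yaml into a string, pass it in along with your extraction path (C:\dev\cassandra\apache-cassandra-0.8.2 in my case), and write the result back. Keep a backup of the original file first.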

At this point you should be able to start up and run Cassandra. This can be done via the command prompt like this:

cd \dev\cassandra\apache-cassandra-0.8.2\bin
cassandra.bat

You can also start the client from the command prompt like this:

cd \dev\cassandra\apache-cassandra-0.8.2\bin
cassandra-cli.bat
connect localhost/9160;
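
If the connect command fails, it can help to confirm that anything is listening on port 9160 at all. The following Python sketch does only that: it opens and closes a TCP connection using the standard socket module. The function name is_port_open is my own, and a successful connection does not prove the listener is actually Cassandra.

```python
import socket


def is_port_open(host="localhost", port=9160, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds.

    Hypothetical helper for illustration; it checks only that something
    accepts connections, not what that something is.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False
```

Running is_port_open() with the defaults shortly after starting cassandra.bat gives a quick yes or no before you open the client.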

What’s next? Now you can do your development tasks and use your local Cassandra instance as needed. I will probably use Topshelf or something similar to get Cassandra running as a service so I don’t have to start it up manually every time I need it. Another option for this is to use RunAsAService.

References

Here are some links that I used to learn how to do this: