Yet Another Guide to NoSQL Databases

Key Characteristics of NoSQL Databases:

  • Non-relational;
  • Distributed and highly scalable to very large data sets;
  • Able to handle high traffic and streaming data;
  • Usually no full ACID guarantees.

The Wikipedia page on NoSQL (http://en.wikipedia.org/wiki/NoSQL) covers the history and basic concepts of NoSQL databases. Please skim it first; I will not repeat the basics in this article. Other than Wikipedia, the most valuable pages I have found on the web so far are:

According to [1], there are currently 122+ known NoSQL databases. The development of NoSQL databases is usually motivated by practical needs in industry, so most NoSQL products start as internal company projects and are open-sourced later. As a result, there is no common standard among these products, and nearly all NoSQL databases are task-specific: each is built to serve its own purpose and is not designed to meet every need in the non-relational world. A careful classification of current NoSQL databases is therefore valuable as an overview, which is the aim of this article.

All the links given above propose categories for NoSQL databases. While they have some categories in common, they do not agree with each other. To obtain a better classification based on them, I follow several rules:

  1. Categories should be as disjoint from one another as possible;
  2. The number of products in each category should be roughly balanced;
  3. Each category should be significant within the NoSQL world as a whole.

The following table shows the result of my classification. The introductions are adapted from [1]. In the broad sense, NoSQL has seven categories. In the narrow sense, however, NoSQL generally means only the first four:

  1. Tabular Store;
  2. Document Store;
  3. Key-value Store;
  4. Graph Store.

That’s because object databases, XML databases and multi-value databases belong to the old world, and the key characteristics listed at the beginning are not clearly reflected in them.
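To make the four narrow-sense categories more concrete, here is a minimal sketch of how the same small fact ("user 42 is Alice, who follows user 7") might be laid out in each data model. It uses only plain Python objects; the key layout and the names (`tabular`, `document`, `key_value`, `follows`) are my own illustrative assumptions, not the API of any product listed in this article.

```python
import json

# Tabular store: row key -> column family -> column -> value.
tabular = {
    "user:42": {"profile": {"name": "Alice"}, "follows": {"user:7": "1"}},
}

# Document store: one self-contained, nested document per entity.
document = {"_id": "42", "name": "Alice", "follows": ["7"]}

# Key-value store: the value is an opaque blob; only the application decodes it.
key_value = {"user:42": json.dumps(document).encode("utf-8")}

# Graph store: nodes plus labeled, directed edges between them.
nodes = {"42": {"name": "Alice"}, "7": {}}
edges = [("42", "FOLLOWS", "7")]


def follows(user_id):
    """Return the ids that user_id follows, read from the graph model."""
    return [dst for src, label, dst in edges
            if src == user_id and label == "FOLLOWS"]
```

The point of the sketch is only the shape of the data: the tabular and document models expose structure to the store, the key-value model hides it inside the value, and the graph model makes the relationship itself a first-class record.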

Tabular Store

Hadoop/HBase

API: Java; Query Method: MapReduce Java/any exec; Replication: HDFS Replication; Written in: Java

Cassandra

API: many; Query Method: MapReduce; Written in: Java; Consistency: eventually consistent; Misc: initiated by Facebook

Hypertable

API: Apache Thrift (Java, PHP, Perl, Python, Ruby, etc.); Query Method: HQL, native Thrift API; Replication: HDFS Replication; Concurrency: MVCC; Consistency Model: Fully consistent

Cloudata

A distributed, large-scale structured data store; an open source project implementing Google’s Bigtable.

Cloudera

Professional Software & Services for solving business problems based on Hadoop

SciDB

A Data Management and Analytics Software System, optimized for data management of big data and for big analytics

HPCC

HPCC (High Performance Computing Cluster) is a massively parallel processing computing platform that solves Big Data problems

Stratosphere

Massively parallel and flexible execution; a generalization and extension of Map/Reduce; consists of the PACT Programming Model and the Nephele Execution Engine.
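Several entries in this table list MapReduce as their query method. As a rough illustration of the idea only, the following single-process Python sketch runs the map, shuffle and reduce phases over a few made-up rows. The function names and the sample rows are hypothetical; none of this is the API of Hadoop, Cassandra or any other product named here, and a real system would distribute each phase across many machines.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run a single-process map -> shuffle -> reduce pass over records."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):  # map: emit (key, value) pairs
            groups[key].append(value)      # shuffle: group values by key
    # reduce: collapse each group of values into one result per key
    return {key: reducer(key, values) for key, values in groups.items()}


# Hypothetical input rows: (product, implementation language).
rows = [("hbase", "Java"), ("cassandra", "Java"), ("couchdb", "Erlang")]


def count_language(row):
    """Mapper: emit one count per row, keyed by implementation language."""
    _product, language = row
    yield language, 1


def total(_key, values):
    """Reducer: sum the counts collected for one key."""
    return sum(values)


counts = map_reduce(rows, count_language, total)
```

Here `counts` maps each language to the number of rows written in it. The mapper and reducer never see the whole data set, which is what lets real implementations run them in parallel.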

Document Store

MongoDB

API: BSON; Protocol: lots of langs; Query Method: dynamic object-based language & MapReduce; Replication: Master Slave & Auto-Sharding; Written in: C++; Concurrency: Update in Place.

CouchDB

API: JSON; Protocol: REST; Query Method: MapReduce of JavaScript functions; Replication: Master Master; Written in: Erlang; Concurrency: MVCC

Terrastore

API: Java & http; Protocol: http; Language: Java; Querying: Range queries; Predicates; Replication: Partitioned with consistent hashing; Consistency: Per-record strict consistency; Misc: Based on Terracotta

ThruDB

Uses Apache Thrift to integrate multiple backend databases such as BerkeleyDB and MySQL

OrientDB

Languages: Java; Schema: Has features of an Object-Database, DocumentDB, GraphDB or Key-Value DB; Written in: Java; Query Method: Native and SQL; Misc: really fast, lightweight, ACID with recovery
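MVCC (multi-version concurrency control) appears as the concurrency model for several entries in this table. As a heavily simplified sketch of the idea, the Python class below never overwrites a value: each write appends a new revision, readers may still read an older revision, and a writer must name the revision it last saw or the write is rejected. The class and method names are hypothetical and this is not how any listed product exposes MVCC.

```python
class RevisionConflict(Exception):
    """Raised when a write is based on a stale revision."""


class VersionedStore:
    def __init__(self):
        # key -> list of values; the value at index i has revision i + 1
        self._versions = {}

    def get(self, key, rev=None):
        """Return (value, rev); rev=None reads the latest revision."""
        history = self._versions[key]
        rev = len(history) if rev is None else rev
        return history[rev - 1], rev

    def put(self, key, value, expected_rev=0):
        """Append a new revision if expected_rev is still the latest one."""
        history = self._versions.get(key, [])
        if expected_rev != len(history):
            raise RevisionConflict(
                f"{key}: expected rev {expected_rev}, latest is {len(history)}"
            )
        # Old revisions are kept, so earlier readers are not disturbed.
        self._versions[key] = history + [value]
        return len(history) + 1


store = VersionedStore()
rev1 = store.put("doc", {"title": "draft"})
rev2 = store.put("doc", {"title": "final"}, expected_rev=rev1)
```

A second writer that still holds `rev1` and calls `store.put("doc", ..., expected_rev=rev1)` would now get a `RevisionConflict` instead of silently overwriting the newer revision, while `store.get("doc", rev=1)` still returns the draft.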

Key-value Store

Amazon Dynamo

Misc: not open source, eventually consistent

Voldemort

Open-source implementation of Amazon’s Dynamo key-value store.

Dynomite

Open-source implementation of Amazon’s Dynamo key-value store, written in Erlang.

KAI

Open-source Amazon Dynamo implementation

Azure Table Storage

Collections of free form entities (row key, partition key, timestamp). Blob and Queue Storage available. Accessible via REST or ATOM.

MEMBASE

API: Memcached API, most languages; Protocol: Memcached REST interface for cluster; Written in: C/C++, Erlang (clustering); Replication: Peer to Peer; fully consistent

Riak

API: JSON; Protocol: REST; Query Method: MapReduce term matching; Scaling: Multiple Masters; Written in: Erlang; Concurrency: eventually consistent

Redis

API: many languages; Written in: C; Concurrency: in memory, saving asynchronously to disk after a defined time; Replication: Master / Slave

LevelDB

Fast & Batch updates, from Google

Chordless

API: Java; Query Method: Map/Reduce inside value objects; Scaling: every node is master for its slice of namespace; Written in: Java; Concurrency: serializable transaction isolation
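Partitioning with consistent hashing is mentioned for Terrastore above, and Dynamo-style key-value stores are commonly described as spreading keys over nodes in the same way. The following Python sketch shows the basic technique under my own assumptions: the `HashRing` class, the node names and the number of virtual nodes are all hypothetical, and SHA-256 is used simply because it is available in the standard library.

```python
import bisect
import hashlib


class HashRing:
    def __init__(self, nodes, vnodes=64):
        # Place each node on the ring at `vnodes` pseudo-random points.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._points = [point for point, _node in self._ring]

    @staticmethod
    def _hash(text):
        """Map a string to a 64-bit position on the ring."""
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def node_for(self, key):
        """Return the owner of key: the first node point clockwise from it."""
        index = bisect.bisect_right(self._points, self._hash(key))
        return self._ring[index % len(self._ring)][1]


before = HashRing(["node-a", "node-b", "node-c"])
after = HashRing(["node-a", "node-b", "node-c", "node-d"])
keys = [f"key{i}" for i in range(1000)]
moved = [k for k in keys if before.node_for(k) != after.node_for(k)]
```

The property worth noticing is that adding `node-d` only moves keys onto `node-d`: every key in `moved` is now owned by the new node, and all other keys stay where they were. With plain `hash(key) % number_of_nodes` partitioning, most keys would change owner instead.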

Graph Store

Neo4J

API: lots of langs; Protocol: Java embedded / REST; Query Method: SPARQL, native Java API, JRuby; Replication: typical MySQL style master/slave; Written in: Java; Concurrency: non-blocking reads, writes lock the involved nodes/relationships until commit; Misc: ACID possible

Infinite Graph

API: Java; Protocol: Direct Language Binding; Query Method: Graph Navigation API; Written in: Java (Core C++); Data Model: Labeled Directed Multi Graph; Concurrency: Update locking on subgraphs

DEX

API: Java;  Protocol: Java Embedded; Query Method: Java API; Written in: Java/C++; Data Model: Labeled Directed Attributed Multigraph; Concurrency: yes

InfoGrid

API: Java; http/REST; Protocol: API + XPRISO, OpenID, RSS, Atom, JSON, Java embedded; Query Method: Web user interface with html, RSS, Atom, JSON output, Java native; Replication: peer-to-peer; Written in: Java; Concurrency: concurrent reads; write lock within one MeshBase

HyperGraphDB

API: Java; Written in: Java;  Query Method: Java or P2P; Replication: P2P; Concurrency: STM; Misc: especially for AI and Semantic Web.

Trinity

API: C#; Protocol: C# Language Binding; Query Method: Graph Navigation API; Replication: P2P with Master Node; Written in: C#; Concurrency: Yes; Misc: distributed in-memory storage; parallel graph computation

AllegroGraph

API: Java, Python, Ruby, C#, Perl, Lisp; Protocol: REST; Query Method: SPARQL and Prolog; Libraries: Social Networking Analytics & GeoSpatial; Written in: Common Lisp
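Several graph stores above describe their data model as a labeled directed multigraph and their query method as a graph navigation API. As a rough picture of what that means, here is a small in-memory sketch in Python. The `Graph` class and its methods are hypothetical names of my own and do not correspond to the API of Neo4J, Infinite Graph or any other product listed here.

```python
from collections import defaultdict, deque


class Graph:
    """A labeled, directed multigraph: parallel edges between the same
    pair of nodes are allowed as long as they are stored separately."""

    def __init__(self):
        self._out = defaultdict(list)  # node -> list of (label, target)

    def add_edge(self, source, label, target):
        self._out[source].append((label, target))

    def neighbors(self, node, label=None):
        """Targets of outgoing edges, optionally restricted to one label."""
        return [target for edge_label, target in self._out.get(node, [])
                if label is None or edge_label == label]

    def reachable(self, start, label=None):
        """Breadth-first navigation; nodes in visit order, excluding start."""
        seen, order, queue = {start}, [], deque([start])
        while queue:
            for target in self.neighbors(queue.popleft(), label):
                if target not in seen:
                    seen.add(target)
                    order.append(target)
                    queue.append(target)
        return order


g = Graph()
g.add_edge("alice", "KNOWS", "bob")
g.add_edge("alice", "WORKS_WITH", "bob")  # second edge between the same pair
g.add_edge("bob", "KNOWS", "carol")
friends = g.reachable("alice", label="KNOWS")
```

Following only `KNOWS` edges from `alice` visits `bob` and then `carol`, while following only `WORKS_WITH` edges stops at `bob`. Queries of this kind follow edges from node to node instead of joining tables.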

Object Database

db4o

API: Java, C#; Query Method: QBE (by Example), Native Queries, LINQ (.NET); Replication: db4o2db4o; Written in: Java; Concurrency: ACID serialized; Misc: embedded lib

Versant

Languages/Protocol: Java, C#, C++, Python; Schema: language class model; Replication: synchronous fault tolerant and peer to peer asynchronous. Concurrency:  optimistic and object based locks. Scaling: can add physical nodes on fly for scale out/in and migrate objects between nodes without impact to application code. Misc: MapReduce via parallel SQL like query across logical database groupings

Objectivity

Languages: Java, C#, C++, Python, Smalltalk, SQL access through ODBC; Schema: native language class model; direct support for references; interoperable across all language bindings. Modes: always consistent (ACID);  Concurrency: locks at cluster of objects (container) level. Scaling: unique distributed architecture, dynamic addition/removal of clients & servers, cloud environment ready. Replication: synchronous with quorum fault tolerant across peer to peer partitions

Starcounter

API: C# (.NET languages); Schema: Native language class model; Query method: SQL; Concurrency: Fully ACID compliant; Storage: In-memory with transactions secured on disk; Reliability: Full checkpoint recovery

Perst

API: Java, Java ME, C#, Mono. Query method: OO via Perst collections, QBE, Native Queries, LINQ, native full-text search, JSQL. Replication: Async+sync (master-slave). Written in: Java, C#. Caching: Object cache (LRU; weak; strong), page pool, in-memory database. Index types: Many tree models & Time Series. Misc.: Embedded lib., encryption, automatic recovery, native full text search, on-line or off-line backup

ZODB

API: Python; Protocol: Internal ZEO; Query Method: Direct object access; Written in: Python, C; Concurrency: MVCC; License: Zope Public License; Misc: Used in production since 1998

XML Database

EMC xDB

API: Java, XQuery; Protocol: WebDAV, web services; Query method: XQuery, XPath, XPointer; Replication: lazy primary copy replication (master/replicas); Written in: Java; Concurrency: concurrent reads, writes with lock; Misc: Fully transactional persistent DOM, versioning. multiple index types, metadata and non-XML data support, unlimited horizontal scaling

eXist

API: XQuery, XML:DB API, DOM, SAX; Protocols: HTTP/REST, WebDAV, SOAP, XML-RPC, Atom; Query Method: XQuery; Written in: Java (open source); Concurrency: Concurrent reads, lock on write; Misc: Entire web applications can be written in XQuery, using XSLT, XHTML, CSS, and JavaScript (for AJAX)

Sedna

ACID transactions, security, indices, hot backup. Flexible XML processing facilities include W3C XQuery implementation, tight integration of XQuery with full-text search facilities and a node-level update language.

BaseX

A fast, powerful, lightweight XML database system and XPath/XQuery processor with highly conformant support for the latest W3C Update and Full Text Recommendations. Client/Server architecture, ACID transaction support, user management, logging, Open Source, BSD-license, written in Java.

Berkeley DB XML

API: Many languages; Written in: C++; Query Method: XQuery; Replication: Master / Slave; Concurrency: MVCC

Multivalue Database

U2

Data Structure: MultiValued, Supports nested entities, Virtual Metadata; API: BASIC, InterCall, Socket, .NET and Java APIs; Scalability: automatic table space allocation; Protocol: Client Server, SOA, Terminal Line, X-OFF/X-ON; Written in: C; Query Method: Native mvQuery and SQL; Replication: yes, Hot standby; Concurrency: Record and File Locking

OpenInsight

API:  Basic, .Net, COM, Socket, ODBC; Protocol: TCP/IP, Named Pipes, Telnet; Query Method: RList, SQL & XPath; Written in: Native 4GL, C, C++, Basic+, .Net, Java;  Replication: Hot Standby; Concurrency: table &/or row locking, optionally transaction based commit & rollback; Data structure: Relational &/or MultiValue, supports nested entities; Scalability: rows and tables size dynamically

OpenQM

Supports nested data. Fully automated table space allocation. Concurrency control via task locks, file locks & shareable/exclusive record locks. OO programming integrated into QMBasic. QMClient connectivity from Visual Basic, PowerBasic, Delphi, PureBasic, ASP, PHP, C and more. Extended multivalue query language.

References

[1] List of NoSQL Databases. http://nosql-database.org/

  1. November 28, 2011 at 4:19 PM

    The way I see it, NoSQL is any type of database that is not relational. It doesn’t necessarily have to be ObjectOriented. As long as it is not relational, it qualifies for NoSQL stamp. However, Key-Value Stores (another name for NoSQL) essentially store Objects as blobs :)

    They could have called it Distributed, Highly-Scalable Object Stores. But that doesn’t sound as cool as NoSQL :)

    NoSQL is not a new concept. Just a new name. In the 90s I used ObjectivityDB, a true OODB, for my research project. It was capable of handling huge amount of data in a distributed fashion. ObjectivityDB, by today’s definition, would be NoSQL :)

