
Yet Another Guide to NoSQL Databases

Key Characteristics of NoSQL Databases:

  • Non-relational;
  • Distributed and highly scalable to huge volumes of data;
  • Able to handle high traffic and streaming data;
  • No ACID guarantees.

The Wikipedia page on NoSQL (http://en.wikipedia.org/wiki/NoSQL) covers the history and basic concepts of NoSQL databases. Please skim it first, because I will not repeat those basics in this article. Other than Wikipedia, the most valuable pages I have found on the web so far are:

According to [1], there are currently 122+ known NoSQL databases. Their development is usually driven by practical needs in industry, so most NoSQL products begin as internal projects within companies and are later open-sourced to the rest of the world. As a result, there is no common standard among these products, and nearly all NoSQL databases are task-specific: each is built to fulfill a specific aim rather than to cover every need of the non-relational world. A careful classification of current NoSQL databases is therefore valuable for getting an overview, and providing one is the aim of this article.

All the links above propose categories for NoSQL databases. Although they have some categories in common, they do not agree with each other. To obtain a better classification based on them, I follow several rules:

  1. Each category should be disjoint from the others as far as possible;
  2. The number of products in each category should be roughly balanced;
  3. Each category should be significant in the NoSQL world as a whole.

The result of my classification is given below, and the descriptions are adapted from [1]. In the broad sense, NoSQL has seven categories. In the narrow sense, however, NoSQL generally means only the first four:

  1. Tabular Store;
  2. Document Store;
  3. Key-value Store;
  4. Graph Store.

That is because object databases, XML databases and multivalue databases belong to the old world, and they do not clearly show the key characteristics listed at the beginning.

Tabular Store


API: Java; Query Method: MapReduce (Java or any executable); Replication: HDFS Replication; Written in: Java
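Several entries in this list name MapReduce as their query method. The sketch below is a minimal, single-process Python version of that model. It shows only the three phases: map, shuffle (grouping by key) and reduce. It is not Hadoop's Java API. The names map_reduce, word_mapper and sum_reducer are made up for this illustration.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run mapper over every record, group the emitted values by key
    (the "shuffle"), then fold each group with reducer."""
    groups = defaultdict(list)
    for record in records:
        # Map phase: a record may emit any number of (key, value) pairs.
        for key, value in mapper(record):
            groups[key].append(value)
    # Reduce phase: one call per distinct key.
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line):
    # Hypothetical mapper: emit (word, 1) for every word in a line of text.
    for word in line.lower().split():
        yield word, 1


def sum_reducer(word, counts):
    # Hypothetical reducer: total the 1s emitted for one word.
    return sum(counts)


counts = map_reduce(["big data big tables", "data stores"], word_mapper, sum_reducer)
print(sorted(counts.items()))
# [('big', 2), ('data', 2), ('stores', 1), ('tables', 1)]
```

A real cluster runs the map and reduce calls on many machines and moves the grouped data over the network. The contract between mapper and reducer stays the same.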


API: many; Query Method: MapReduce; Written in: Java; Consistency: eventually consistent; initiated by Facebook


API: Apache Thrift (Java, PHP, Perl, Python, Ruby, etc.); Query Method: HQL, native Thrift API; Replication: HDFS Replication; Concurrency: MVCC; Consistency Model: Fully consistent


A distributed, large-scale structured data store; an open-source project implementing Google’s Bigtable.
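Bigtable's data model is usually described as a sparse, sorted map indexed by row key, column key (family:qualifier) and timestamp. The toy class below sketches that model in memory to make the description concrete. ToyBigtable and its methods are hypothetical, and they are not the API of any Bigtable implementation. Distribution, persistence and tablet splitting are all left out.

```python
from collections import defaultdict


class ToyBigtable:
    """In-memory toy of the Bigtable data model: a sparse map from
    (row key, "family:qualifier" column, timestamp) to a value."""

    def __init__(self):
        # row key -> column -> {timestamp: value}
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, timestamp, value):
        # Each timestamp is stored as a separate version of the cell.
        self._rows[row][column][timestamp] = value

    def get(self, row, column):
        """Return the newest version of one cell, or None if it is absent."""
        versions = self._rows.get(row, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self, start_row, end_row):
        """Yield (row, {column: newest value}) for rows in [start_row, end_row),
        in sorted row-key order."""
        for row in sorted(self._rows):
            if start_row <= row < end_row:
                cells = self._rows[row]
                yield row, {col: vers[max(vers)] for col, vers in cells.items()}


t = ToyBigtable()
t.put("com.example/index", "contents:html", 1, "<html>v1")
t.put("com.example/index", "contents:html", 2, "<html>v2")
t.put("com.example/about", "anchor:home", 1, "Home")
print(t.get("com.example/index", "contents:html"))  # <html>v2
```

Rows are kept sorted by key, so a range scan over related row keys is cheap. That is why Bigtable-style schemas put a lot of thought into row-key design.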


Professional software and services, based on Hadoop, for solving business problems


A data management and analytics software system, optimized for managing and analyzing big data


HPCC (High Performance Computing Cluster) is a massively parallel processing computing platform for solving Big Data problems


Massively parallel and flexible execution; a generalization and extension of MapReduce; consists of the PACT programming model and the Nephele execution engine.

Document Store


API: BSON; Protocol: lots of langs; Query Method: dynamic object-based language & MapReduce; Replication: Master Slave & Auto-Sharding; Written in: C++; Concurrency: Update in Place.
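A "dynamic object-based" query language means that queries are themselves documents. The sketch below gives a rough illustration in the style of MongoDB query documents. It evaluates plain field equality plus a few comparison operators such as $gt against in-memory Python dicts. It is not a database driver. The names find and matches, and the small operator set, are assumptions made for this example.

```python
import operator

# Comparison operators supported by this toy; real document stores offer many more.
OPS = {
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$ne": operator.ne,
}


def matches(doc, query):
    """True if doc satisfies every field condition in query."""
    for field, cond in query.items():
        value = doc.get(field)
        if isinstance(cond, dict):
            for op, operand in cond.items():
                if op not in OPS:
                    raise ValueError(f"unsupported operator: {op}")
                # A missing field never satisfies an ordering comparison.
                if value is None and op != "$ne":
                    return False
                if not OPS[op](value, operand):
                    return False
        elif value != cond:
            # A plain value means equality.
            return False
    return True


def find(collection, query):
    return [doc for doc in collection if matches(doc, query)]


people = [
    {"name": "Ann", "age": 34, "city": "Oslo"},
    {"name": "Bob", "age": 27, "city": "Oslo"},
    {"name": "Cy", "city": "Rome"},
]
print(find(people, {"city": "Oslo", "age": {"$gt": 30}}))
```

Because documents need not share a schema, the matcher has to tolerate missing fields. Here Cy has no age at all.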


API: JSON; Protocol: REST; Query Method: MapReduce of JavaScript functions; Replication: Master-Master; Written in: Erlang; Concurrency: MVCC
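MVCC (multi-version concurrency control) in a document store usually means that an update does not overwrite a document in place. Instead, each update creates a new revision, and an update based on a revision that is no longer current is rejected. The sketch below shows that behaviour under a few assumptions. ToyDocStore and Conflict are hypothetical. The "generation-random" revision format is also invented here, since real systems derive revision ids differently.

```python
import uuid


class Conflict(Exception):
    """Raised when an update is based on a revision that is no longer current."""


class ToyDocStore:
    def __init__(self):
        # doc id -> list of (revision, body), oldest first; old revisions are kept.
        self._docs = {}

    def put(self, doc_id, body, rev=None):
        history = self._docs.setdefault(doc_id, [])
        current = history[-1][0] if history else None
        if rev != current:
            raise Conflict(f"{doc_id}: current rev is {current}, update was based on {rev}")
        # Hypothetical revision format: "<generation>-<random suffix>".
        new_rev = f"{len(history) + 1}-{uuid.uuid4().hex[:8]}"
        history.append((new_rev, dict(body)))
        return new_rev

    def get(self, doc_id):
        # Readers take the newest revision; existing revisions are never modified.
        rev, body = self._docs[doc_id][-1]
        return rev, dict(body)


store = ToyDocStore()
r1 = store.put("profile", {"name": "Ann"})
r2 = store.put("profile", {"name": "Ann", "city": "Oslo"}, rev=r1)
```

A second writer that still holds r1 would get a Conflict. It would then have to re-read the document and retry, which is optimistic concurrency rather than locking.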


API: Java & http; Protocol: http; Language: Java; Querying: Range queries; Predicates; Replication: Partitioned with consistent hashing; Consistency: Per-record strict consistency; Misc: Based on Terracotta
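"Partitioned with consistent hashing" means that nodes and keys are hashed onto the same ring, and each key belongs to the first node found clockwise from the key's position. The benefit is that removing a node only remaps the keys that node owned. The sketch below is a minimal version of that idea. Using MD5 and 100 virtual nodes per physical node are arbitrary choices, and HashRing is a hypothetical class.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash, node)
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key):
        # First 8 bytes of MD5 as an integer position on the ring.
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def add(self, node):
        # Each physical node appears at many ring positions to even out load.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def node_for(self, key):
        """Walk clockwise from the key's hash to the first virtual node."""
        if not self._ring:
            raise LookupError("ring is empty")
        i = bisect.bisect_left(self._ring, (self._hash(key),))
        return self._ring[i % len(self._ring)][1]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}
ring.remove("node-c")
moved = [k for k in keys if ring.node_for(k) != before[k]]
print(all(before[k] == "node-c" for k in moved))  # True: only node-c's keys moved
```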


Uses Apache Thrift to integrate multiple backend databases, such as BerkeleyDB and MySQL


Languages: Java; Schema: Has features of an Object-Database, DocumentDB, GraphDB or Key-Value DB; Written in: Java; Query Method: Native and SQL; Misc: really fast, lightweight, ACID with recovery

Key-value Store

Amazon Dynamo

Misc: not open source, eventually consistent


Open-source implementation of Amazon’s Dynamo key-value store.


Open-source implementation of Amazon’s Dynamo key-value store. Written in Erlang.


Open-source Amazon Dynamo implementation
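Dynamo and the open-source implementations above are eventually consistent. Each key is replicated to N nodes, and a write needs only W acknowledgements while a read needs only R replies. Choosing R + W > N makes every read quorum overlap the most recent write quorum. The sketch below simulates that rule in one process, with the responding replicas passed in explicitly. QuorumStore is a hypothetical name. The plain integer version is a simplification, since Dynamo itself tracks versions with vector clocks.

```python
from dataclasses import dataclass


@dataclass
class Versioned:
    version: int  # simplification: Dynamo uses vector clocks instead
    value: object


class QuorumStore:
    def __init__(self, n=3, r=2, w=2):
        if r + w <= n:
            raise ValueError("R + W must exceed N so read and write quorums overlap")
        self.n, self.r, self.w = n, r, w
        self.replicas = [{} for _ in range(n)]

    def write(self, key, value, version, replicas):
        # `replicas` simulates which of the N nodes acknowledged the write.
        if len(replicas) < self.w:
            raise RuntimeError("write quorum not reached")
        for i in replicas:
            self.replicas[i][key] = Versioned(version, value)

    def read(self, key, replicas):
        # `replicas` simulates which nodes answered the read.
        if len(replicas) < self.r:
            raise RuntimeError("read quorum not reached")
        found = [self.replicas[i][key] for i in replicas if key in self.replicas[i]]
        if not found:
            return None
        return max(found, key=lambda v: v.version).value


store = QuorumStore(n=3, r=2, w=2)
store.write("cart", ["book"], version=1, replicas=[0, 1, 2])
store.write("cart", ["book", "pen"], version=2, replicas=[0, 1])  # replica 2 missed it
print(store.read("cart", replicas=[1, 2]))  # ['book', 'pen']
```

Replica 2 stays stale until some repair mechanism catches it up. In the meantime, any two-node read still sees the newest value.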

Azure Table Storage

Collections of free-form entities (row key, partition key, timestamp). Blob and Queue Storage are also available. Accessible via REST or Atom.
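Azure Table Storage addresses each entity by the pair (PartitionKey, RowKey) and stamps it with a Timestamp, while all other properties are free-form. The toy class below mirrors that addressing scheme in memory. It shows why point lookups on the full key and scans of a single partition are the natural queries. ToyTable is hypothetical and unrelated to the Azure SDK.

```python
from datetime import datetime, timezone


class ToyTable:
    def __init__(self):
        # (PartitionKey, RowKey) -> entity; entities need not share a schema.
        self._entities = {}

    def upsert(self, partition_key, row_key, **properties):
        entity = dict(properties, PartitionKey=partition_key, RowKey=row_key,
                      Timestamp=datetime.now(timezone.utc))
        self._entities[(partition_key, row_key)] = entity
        return entity

    def get(self, partition_key, row_key):
        # Point lookup on the full key.
        return self._entities.get((partition_key, row_key))

    def query_partition(self, partition_key):
        # All entities in one partition, ordered by RowKey.
        items = sorted(self._entities.items(), key=lambda kv: kv[0])
        return [entity for (pk, _), entity in items if pk == partition_key]


table = ToyTable()
table.upsert("customers-eu", "0002", name="Bo")
table.upsert("customers-eu", "0001", name="Al", vip=True)
table.upsert("customers-us", "0001", name="Cy")
print([e["name"] for e in table.query_partition("customers-eu")])  # ['Al', 'Bo']
```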


API: Memcached API, most languages; Protocol: Memcached REST interface for cluster; Written in: C/C++, Erlang (clustering); Replication: Peer to Peer; Consistency: fully consistent


API: JSON; Protocol: REST; Query Method: MapReduce term matching; Scaling: Multiple Masters; Written in: Erlang; Concurrency: eventually consistent


API: many languages; Written in: C; Concurrency: in memory, saved asynchronously to disk after a defined time; Replication: Master / Slave
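The entry above describes a store that keeps its data in memory and writes it to disk asynchronously once a defined time has passed. The sketch below shows that snapshotting idea under two assumptions: JSON is the dump format, and a background thread acts as the asynchronous writer. SnapshottingStore and its parameters are hypothetical. Redis, for example, forks a child process to get a copy-on-write snapshot instead of copying a dictionary as done here.

```python
import json
import os
import tempfile
import threading
import time


class SnapshottingStore:
    def __init__(self, path, interval_s=60.0):
        self.path = path
        self.interval_s = interval_s
        self._data = {}
        self._lock = threading.Lock()
        self._last_save = time.monotonic()
        self._saver = None

    def set(self, key, value):
        with self._lock:
            self._data[key] = value
            due = time.monotonic() - self._last_save >= self.interval_s
            if due and (self._saver is None or not self._saver.is_alive()):
                # Copy the data so the disk write does not block later writers.
                snapshot = dict(self._data)
                self._last_save = time.monotonic()
                self._saver = threading.Thread(target=self._write, args=(snapshot,))
                self._saver.start()

    def get(self, key):
        with self._lock:
            return self._data.get(key)

    def _write(self, snapshot):
        # Write to a temp file, then rename, so a crash never leaves a half-written dump.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(snapshot, f)
        os.replace(tmp, self.path)

    def wait_for_snapshot(self):
        if self._saver is not None:
            self._saver.join()


dump_path = os.path.join(tempfile.mkdtemp(), "dump.json")
store = SnapshottingStore(dump_path, interval_s=0.0)
store.set("greeting", "hello")
store.wait_for_snapshot()
```

The trade-off is the one the entry implies: writes since the last snapshot are lost if the process crashes.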


Fast and batch updates; from Google


API: Java; Query Method: Map/Reduce inside value objects; Scaling: every node is master for its slice of namespace; Written in: Java; Concurrency: serializable transaction isolation

Graph Store


API: lots of langs; Protocol: Java embedded / REST; Query Method: SPARQL, native Java API, JRuby; Replication: typical MySQL-style master/slave; Written in: Java; Concurrency: non-blocking reads, writes lock the involved nodes/relationships until commit; Misc: ACID possible

Infinite Graph

API: Java; Protocol: Direct Language Binding; Query Method: Graph Navigation API; Written in: Java (Core C++); Data Model: Labeled Directed Multi Graph; Concurrency: Update locking on subgraphs
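A graph navigation API over a labeled directed multigraph comes down to following outgoing edges, optionally filtered by label, with parallel edges allowed. The sketch below shows that with a breadth-first search for the shortest path. LabeledMultigraph is a hypothetical class, not the API of any product listed here.

```python
from collections import defaultdict, deque


class LabeledMultigraph:
    def __init__(self):
        # node -> list of (edge label, target); parallel edges are allowed.
        self._out = defaultdict(list)

    def add_edge(self, source, label, target):
        self._out[source].append((label, target))

    def neighbors(self, node, label=None):
        return [t for edge_label, t in self._out.get(node, [])
                if label is None or edge_label == label]

    def shortest_path(self, start, goal, label=None):
        """Breadth-first search; returns the list of nodes, or None if unreachable."""
        parent = {start: None}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if node == goal:
                path = []
                while node is not None:
                    path.append(node)
                    node = parent[node]
                return path[::-1]
            for nxt in self.neighbors(node, label):
                if nxt not in parent:
                    parent[nxt] = node
                    queue.append(nxt)
        return None


g = LabeledMultigraph()
g.add_edge("alice", "knows", "bob")
g.add_edge("alice", "knows", "bob")  # parallel edge
g.add_edge("bob", "knows", "carol")
g.add_edge("alice", "works_with", "carol")
print(g.shortest_path("alice", "carol", label="knows"))  # ['alice', 'bob', 'carol']
```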


API: Java;  Protocol: Java Embedded; Query Method: Java API; Written in: Java/C++; Data Model: Labeled Directed Attributed Multigraph; Concurrency: yes


API: Java; http/REST; Protocol: API + XPRISO, OpenID, RSS, Atom, JSON, Java embedded; Query Method: Web user interface with html, RSS, Atom, JSON output, Java native; Replication: peer-to-peer; Written in: Java; Concurrency: concurrent reads; write lock within one MeshBase


API: Java; Written in: Java;  Query Method: Java or P2P; Replication: P2P; Concurrency: STM; Misc: especially for AI and Semantic Web.


API: C#; Protocol: C# Language Binding; Query Method: Graph Navigation API; Replication: P2P with Master Node; Written in: C#; Concurrency: Yes; Misc: distributed in-memory storage; parallel graph computation


API: Java, Python, Ruby, C#, Perl, Lisp; Protocol: REST; Query Method: SPARQL and Prolog; Libraries: Social Networking Analytics & GeoSpatial; Written in: Common Lisp

Object Database


API: Java, C#; Query Method: QBE (Query by Example), Native Queries, LINQ (.NET); Replication: db4o2db4o; Written in: Java; Concurrency: ACID serialized; Misc: embedded lib
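Query by example (QBE) means that you pass a template object and get back every stored object whose fields match the fields set on the template. The sketch below shows the idea with a dataclass and treats None as "unset". Pilot and query_by_example are hypothetical names. db4o's own QBE, as commonly documented, also ignores fields that hold default values such as null or 0, and this sketch does not replicate that.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Pilot:
    name: Optional[str] = None
    points: Optional[int] = None


def query_by_example(objects, template):
    """Return objects of the template's type whose fields equal every
    field that is set (not None) on the template."""
    constraints = {f.name: getattr(template, f.name) for f in fields(template)
                   if getattr(template, f.name) is not None}
    return [o for o in objects
            if type(o) is type(template)
            and all(getattr(o, k) == v for k, v in constraints.items())]


pilots = [Pilot("Michael", 100), Pilot("Rubens", 99), Pilot("Michael", 42)]
print(query_by_example(pilots, Pilot(name="Michael")))
```

An empty template therefore matches every stored object of that class.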


Languages/Protocol: Java, C#, C++, Python; Schema: language class model; Replication: synchronous fault-tolerant and peer-to-peer asynchronous; Concurrency: optimistic and object-based locks; Scaling: can add physical nodes on the fly to scale out/in and migrate objects between nodes without impact on application code; Misc: MapReduce via parallel SQL-like queries across logical database groupings


Languages: Java, C#, C++, Python, Smalltalk, SQL access through ODBC; Schema: native language class model; direct support for references; interoperable across all language bindings. Modes: always consistent (ACID);  Concurrency: locks at cluster of objects (container) level. Scaling: unique distributed architecture, dynamic addition/removal of clients & servers, cloud environment ready. Replication: synchronous with quorum fault tolerant across peer to peer partitions


API: C# (.NET languages); Schema: Native language class model; Query method: SQL; Concurrency: Fully ACID compliant; Storage: In-memory with transactions secured on disk; Reliability: Full checkpoint recovery


API: Java, Java ME, C#, Mono; Query method: OO via Perst collections, QBE, Native Queries, LINQ, native full-text search, JSQL; Replication: Async+sync (master-slave); Written in: Java, C#; Caching: Object cache (LRU; weak; strong), page pool, in-memory database; Index types: Many tree models & Time Series; Misc.: Embedded lib., encryption, automatic recovery, native full-text search, on-line or off-line backup


API: Python; Protocol: Internal ZEO; Query Method: Direct object access; Written in: Python, C; Concurrency: MVCC; License: Zope Public License; Misc: Used in production since 1998

XML Database


API: Java, XQuery; Protocol: WebDAV, web services; Query method: XQuery, XPath, XPointer; Replication: lazy primary copy replication (master/replicas); Written in: Java; Concurrency: concurrent reads, writes with lock; Misc: Fully transactional persistent DOM, versioning, multiple index types, metadata and non-XML data support, unlimited horizontal scaling
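XML databases answer path queries (XPath, XQuery) over document trees. Python's standard xml.etree.ElementTree module implements only a small subset of XPath, but that subset is enough to show the style of path query these products build on. The sketch below is not an XML database. The catalog document is invented for the example.

```python
import xml.etree.ElementTree as ET

CATALOG = """
<catalog>
  <book lang="en"><title>Graph Basics</title><year>2010</year></book>
  <book lang="de"><title>Tabellen und Spalten</title><year>2011</year></book>
  <book lang="en"><title>Document Stores</title><year>2011</year></book>
</catalog>
"""

root = ET.fromstring(CATALOG.strip())

# Path with an attribute predicate: titles of English books.
english_titles = [t.text for t in root.findall("./book[@lang='en']/title")]

# ElementTree's XPath subset has no numeric comparisons,
# so the year filter is done in Python instead.
recent_titles = [b.findtext("title") for b in root.iter("book")
                 if int(b.findtext("year")) > 2010]

print(english_titles)  # ['Graph Basics', 'Document Stores']
print(recent_titles)   # ['Tabellen und Spalten', 'Document Stores']
```

A full XQuery engine would express both filters declaratively and use indexes rather than walking the tree.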


API: XQuery, XML:DB API, DOM, SAX; Protocols: HTTP/REST, WebDAV, SOAP, XML-RPC, Atom; Query Method: XQuery; Written in: Java (open source); Concurrency: Concurrent reads, lock on write; Misc: Entire web applications can be written in XQuery, using XSLT, XHTML, CSS, and JavaScript (for AJAX)


ACID transactions, security, indices, hot backup. Flexible XML processing facilities include W3C XQuery implementation, tight integration of XQuery with full-text search facilities and a node-level update language.


A fast, powerful, lightweight XML database system and XPath/XQuery processor with highly conformant support for the latest W3C Update and Full Text Recommendations. Client/server architecture, ACID transaction support, user management, logging, open source (BSD license), written in Java.

Berkeley DB XML

API: Many languages; Written in: C++; Query Method: XQuery; Replication: Master / Slave; Concurrency: MVCC

Multivalue Databases


Data Structure: MultiValued, Supports nested entities, Virtual Metadata; API: BASIC, InterCall, Socket, .NET and Java APIs; Scalability: automatic table space allocation; Protocol: Client Server, SOA, Terminal Line, X-OFF/X-ON; Written in: C; Query Method: Native mvQuery and SQL; Replication: yes, Hot standby; Concurrency: Record and File Locking


API:  Basic, .Net, COM, Socket, ODBC; Protocol: TCP/IP, Named Pipes, Telnet; Query Method: RList, SQL & XPath; Written in: Native 4GL, C, C++, Basic+, .Net, Java;  Replication: Hot Standby; Concurrency: table &/or row locking, optionally transaction based commit & rollback; Data structure: Relational &/or MultiValue, supports nested entities; Scalability: rows and tables size dynamically


Supports nested data. Fully automated table space allocation. Concurrency control via task locks, file locks & shareable/exclusive record locks. OO programming integrated into QMBasic. QMClient connectivity from Visual Basic, PowerBasic, Delphi, PureBasic, ASP, PHP, C and more. Extended multivalue query language.


[1] List of NoSQL Databases. http://nosql-database.org/

Categories: Structured Storage
  1. 2011-11-28 at 4:19 PM

    The way I see it, NoSQL is any type of database that is not relational. It doesn’t necessarily have to be object-oriented. As long as it is not relational, it qualifies for the NoSQL stamp. However, key-value stores (another name for NoSQL) essentially store objects as blobs 🙂

    They could have called it Distributed, Highly-Scalable Object Stores. But that doesn’t sound as cool as NoSQL 🙂

    NoSQL is not a new concept, just a new name. In the 90s I used ObjectivityDB, a true OODB, for my research project. It was capable of handling huge amounts of data in a distributed fashion. By today’s definition, ObjectivityDB would be NoSQL 🙂

