In the previous post, we discussed what a graph database is and what benefits it can provide. In this post, we’ll present a brief history of databases and a comparison of the graph database with other NoSQL databases.
The seminal paper of Dr. E.F. Codd, “A Relational Model of Data for Large Shared Data Banks”
(image from http://se-thoughtograph.blogspot.com/2011/07/sql-past-present-and-future.html)
Before the relational model, IBM’s mainframes used the hierarchical or network data model. Some people say that the network data model of that era is similar to the current graph model. But the two are not the same, because back then there was no declarative query language for the network model, so developers had to know how the data was physically stored and write programs to navigate it. In 1970, Codd’s paper introduced the relational model, which gave a level of data independence that allowed users to access information without knowing the physical structure of a database. A few years later, SQL was invented at IBM, and it has been the standard query language of relational databases for decades. In the 1990s, the W3C proposed several web data standards, such as XML and RDF, and these data formats are graph data models at their core. The term “NoSQL” first appeared in 1998, the NoSQL boom followed about a decade later, and there are hundreds of NoSQL databases now. Around 2000, the founders of Neo Technology started developing what became their graph database, Neo4j. They also created Cypher, a declarative query language for the graph model. Cypher borrows some concepts, such as graph pattern matching, from SPARQL.
There are currently several graph database vendors, including Neo4j, DSE Graph (DataStax Enterprise Graph), ArangoDB and OrientDB. ArangoDB and OrientDB are actually document stores that added graph database functionality. DSE Graph from DataStax is based on Titan and uses Cassandra as its backend storage. Neo4j’s storage engine keeps graph data in arrays of fixed-size records, so it can locate a node or relationship by its ID in O(1) time. This is achieved by computing a position in the array structure, not by using indexes. Neo4j insists that it is a native graph database and the others are not, because the other systems are built on other kinds of storage engines. There are also graph databases that support RDF graph data, but they are not as popular as property graph databases. There are also many graph analytics systems.
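The O(1) lookup over fixed-size records can be sketched in a few lines of Python. This is only an illustration of the idea, not Neo4j’s actual storage format: the record layout, the `NodeStore` class and its method names are all assumptions made up for this example.

```python
import struct

# Hypothetical record layout (NOT Neo4j's real on-disk format):
# in-use flag (1 byte), first relationship ID (4 bytes), first property ID (4 bytes).
NODE_RECORD = struct.Struct("<?ii")


class NodeStore:
    """Toy store that keeps fixed-size node records in one contiguous byte array."""

    def __init__(self):
        self._data = bytearray()

    def add(self, first_rel_id, first_prop_id):
        """Append a record and return its ID, which is simply its position."""
        node_id = len(self._data) // NODE_RECORD.size
        self._data += NODE_RECORD.pack(True, first_rel_id, first_prop_id)
        return node_id

    def get(self, node_id):
        """Find a record by arithmetic on the ID; no index is consulted."""
        offset = node_id * NODE_RECORD.size
        if node_id < 0 or offset >= len(self._data):
            raise KeyError(node_id)
        return NODE_RECORD.unpack_from(self._data, offset)


store = NodeStore()
alice = store.add(first_rel_id=0, first_prop_id=10)
bob = store.add(first_rel_id=1, first_prop_id=11)
print(store.get(bob))
```

Because every record has the same size, the byte offset is just the ID multiplied by the record size, so the cost of a lookup does not grow with the number of nodes. An index-based engine would instead have to search a structure such as a B-tree to find the same record.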
Now let’s compare the graph database with other NoSQL databases. NoSQL databases are usually grouped into four categories: document stores, key/value stores, column-family stores and graph databases. A document store has a very simple data model: it treats data as documents. A key/value store stores data as key/value pairs. Column-family stores resemble relational tables, but they group columns into column families and support a very large number of columns, which relational databases do not. The first three kinds are designed for maximum scalability and availability. To get there, they sacrifice many database features, such as transactions, declarative query languages and join operators, and they simplify the data model to documents or key/value pairs. All of this is for the sake of scalability.
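To make the three non-graph data models a little more concrete, here is a rough sketch that stores similar user data in each shape using plain Python dictionaries. It is only a mental model, not how any particular product works; the keys, field names and helper functions are invented for the example.

```python
import json

# Key/value store: the value is an opaque blob that the application must decode.
kv_store = {"user:1": json.dumps({"name": "Alice", "city": "Seoul"})}

# Document store: each value is a structured, possibly nested, document.
doc_store = {"user:1": {"name": "Alice", "address": {"city": "Seoul"}}}

# Column-family store: row key -> column family -> column -> value.
# Rows may carry different columns within the same family.
cf_store = {
    "user:1": {"profile": {"name": "Alice"}, "location": {"city": "Seoul"}},
    "user:2": {"profile": {"name": "Bob", "nickname": "bobby"}},
}


def city_from_kv(key):
    """Fetch the blob by key, then parse it on the application side."""
    return json.loads(kv_store[key])["city"]


def city_from_doc(key):
    """Fetch the document by key and walk into its nested fields."""
    return doc_store[key]["address"]["city"]


def city_from_cf(key):
    """Fetch the row by key, then pick a column family and a column."""
    return cf_store[key]["location"]["city"]


print(city_from_kv("user:1"), city_from_doc("user:1"), city_from_cf("user:1"))
```

In all three shapes the access path starts from a single key, which is what makes the data easy to partition across machines. None of them has a built-in way to express a relationship between `user:1` and `user:2`; that would have to be handled in application code.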
The graph database, on the other hand, has a somewhat different motivation. Its goal is to handle relationships as first-class citizens of the database and to offer a more intuitive and expressive data model than the relational model. The primary goal of other NoSQL databases is not the data model but scalability, and their data models are forced to be simple in order to gain that scalability. Of course, things have been changing and this categorization is becoming somewhat obsolete. Some NoSQL databases have introduced their own declarative query languages, like Couchbase’s N1QL and Cassandra’s CQL. So NoSQL databases keep evolving, and maybe more of them will add the graph model in the future, as ArangoDB and OrientDB did. But I think that the graph database has a unique motivation and architecture, different from those of other NoSQL databases.
I like this diagram because it shows the different motivations very well. It shows that the graph database is for data complexity, not for data size: it is for handling complex relationships in data whose size is relatively small. But I think the graph database could expand into the relational database’s area in the future because of its simple data model.
BITNINE GLOBAL INC., THE COMPANY SPECIALIZING IN GRAPH DATABASE