Value of the Graph Database #3. “Query Processing Performance”
In an organization's data operations, the volume of information keeps surging and new jobs bring ever more complicated queries, which tends to degrade performance.
What if you need an advanced analysis beyond simple information queries on a large amount of data?
What if you are searching for insight through the relationships between data?
A graph database achieves fast query processing through its graph data structure.
In this third post on the value of the graph database, we would like to introduce its query processing performance. Because data is stored as a network centered on relationship information, no separate join operation is required.
In this era of big data and AI, finding the “why” (causality) rather than the “what” (simple fact) in large volumes of unstructured data is increasingly important. Even if the result you want is simple, a myriad of relationship analyses has to be conducted internally.
Then, how is a graph database (GDB) different from a conventional relational database (RDB), which has to scan its tables to analyze relationship information?
Why does a graph database deliver high query processing performance?
To find relationship information in an existing RDB, a “join” operation has to be performed. A join uses a specific value in one table to find matching values in another table.
For instance, in Fig. 1 below, the “Customer Information” table contains customer names and IDs. The customer IDs are also used in the “Purchase Info” table to indicate which customer purchased which product. If you want to see all the products a customer has purchased so far, you need to join the two tables. During this join, the RDB searches each of the tables involved, either with a full scan or through an index. The example below may look simple for now. However, as the data piles up and the number of tables to be checked multiplies, overall performance is dragged down.
[Fig. 1_ Tables on relational database]
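To make the cost of a join concrete, here is a minimal Python sketch of a naive nested-loop join over two small in-memory tables. The rows, column names, and the `products_by_join` function are hypothetical and chosen only for illustration. A real RDB would normally use indexes and a query planner rather than plain loops.

```python
# Hypothetical sample data; table contents and column names are illustrative only.
customers = [
    {"customer_id": 1, "name": "Alice"},
    {"customer_id": 2, "name": "Bob"},
]
purchases = [
    {"customer_id": 1, "product": "Laptop"},
    {"customer_id": 2, "product": "Phone"},
    {"customer_id": 1, "product": "Mouse"},
]


def products_by_join(customer_name):
    """Naive nested-loop join: match rows of both tables on customer_id."""
    result = []
    for customer in customers:          # scan the customer table
        if customer["name"] != customer_name:
            continue
        for purchase in purchases:      # scan the whole purchase table
            if purchase["customer_id"] == customer["customer_id"]:
                result.append(purchase["product"])
    return result


print(products_by_join("Alice"))  # ['Laptop', 'Mouse']
```

In this sketch the work grows with the size of both tables, and every additional table in the query adds another loop of the same kind.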
In GDB, however, data is represented by vertices and edges, which make the relationships between objects explicit. To relate stored data, GDB simply links the items with lines (edges) instead of performing a table join as in RDB.
For instance, in Fig. 2 below, based on the lines (edges) linking customers with products, you can quickly query a full list of the products that have been purchased by a specific customer.
[Fig. 2_ Data structure on graph database]
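The same lookup can be sketched on a graph structure. In the minimal Python example below, each vertex keeps its own list of outgoing edges, so finding a customer's purchases only touches that customer's edges. The vertex names, the `PURCHASED` edge type, and the `purchased_products` function are assumptions made for illustration, not the storage format of any particular GDB product.

```python
# Hypothetical graph: each vertex stores its outgoing edges directly.
# Vertex names and the "PURCHASED" edge type are illustrative assumptions.
graph = {
    "Alice": [("PURCHASED", "Laptop"), ("PURCHASED", "Mouse")],
    "Bob": [("PURCHASED", "Phone")],
    "Laptop": [],
    "Mouse": [],
    "Phone": [],
}


def purchased_products(customer):
    """Follow only the edges attached to one vertex; other data is not scanned."""
    return [target
            for edge_type, target in graph.get(customer, [])
            if edge_type == "PURCHASED"]


print(purchased_products("Alice"))  # ['Laptop', 'Mouse']
```

Here the work depends on how many edges the starting vertex has, not on the total number of purchases stored.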
Unlike RDB, which has to search every table involved in a relationship and slows down as the number of joins grows, GDB processes users’ queries by traversal: it follows vertices and the relationship lines (edges) that connect them. As a result, GDB can return the desired result faster and more efficiently than RDB for this kind of relationship query.
GDB first performs a label search to find the starting point, then queries the data by following the relationships among the individual data items. This lets you quickly deliver what users want, even as the amount of data to query grows and the queries become more complicated.
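The two steps described above, a label search followed by traversal, can be sketched in Python as follows. This is only an illustrative model under stated assumptions: the label index, the vertex IDs, the `PURCHASED` edge type, and the helper functions are all hypothetical and do not represent the internals of any specific graph database.

```python
from collections import defaultdict

# All names below (labels, vertex ids, edge types) are hypothetical.
label_index = defaultdict(set)   # label -> vertex ids
properties = {}                  # vertex id -> property dict
out_edges = defaultdict(list)    # vertex id -> [(edge type, target id)]
in_edges = defaultdict(list)     # vertex id -> [(edge type, source id)]


def add_vertex(vid, label, **props):
    label_index[label].add(vid)
    properties[vid] = props


def add_edge(src, edge_type, dst):
    out_edges[src].append((edge_type, dst))
    in_edges[dst].append((edge_type, src))


add_vertex("c1", "Customer", name="Alice")
add_vertex("c2", "Customer", name="Bob")
add_vertex("c3", "Customer", name="Carol")
add_vertex("p1", "Product", name="Laptop")
add_vertex("p2", "Product", name="Phone")
add_edge("c1", "PURCHASED", "p1")
add_edge("c2", "PURCHASED", "p1")
add_edge("c3", "PURCHASED", "p2")


def also_bought_same_products(customer_name):
    # Step 1: the label search narrows the candidates to Customer vertices.
    starts = [v for v in label_index["Customer"]
              if properties[v]["name"] == customer_name]
    # Step 2: traverse customer -> product -> other customers along edges.
    found = set()
    for start in starts:
        for edge_type, product in out_edges[start]:
            if edge_type != "PURCHASED":
                continue
            for back_type, other in in_edges[product]:
                if back_type == "PURCHASED" and other != start:
                    found.add(properties[other]["name"])
    return sorted(found)


print(also_bought_same_products("Alice"))  # ['Bob']
```

The query "who else bought what this customer bought" would take two joins in a relational layout. In this sketch it is two hops along edges that start from a single labeled vertex.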
In addition to this query method, GDB features a data traversal algorithm and a storage layout that are both optimized for graph data. As a result, GDB users can enjoy faster query processing than RDB users, especially when the data contains complicated relationships.
If you have any further questions, please feel free to contact us at firstname.lastname@example.org. 🙂