What is a Knowledge Graph?
Knowledge once depended largely on people's memories. As people began to record what they remembered, those records were eventually stored on computers, and documents could be created and shared without limit. Today, many IoT devices have access to the large pools of knowledge available on the internet.
The knowledge graph is a hot topic in the current market. Large technology companies such as Google and Microsoft use knowledge graphs to manage information derived from human knowledge accurately.
Before looking at the knowledge graph, we must start with the knowledge base. A knowledge base is a database that stores domain information known by individuals or used by companies, expertise accumulated by specialists in a specific field, and data for problem-solving. It expresses knowledge in data structures that can be extended whenever new knowledge is added. A knowledge base built as a graph is known as a knowledge graph.
A knowledge graph gathers fragmented data into a graph model and connects and stores the knowledge base in a network-type structure. It integrates data using a graph data model, or topology: various types of data are converted, accumulated, and extracted by connecting nodes and edges, which allows knowledge to be transferred quickly. In addition, the knowledge graph is a representative method for building artificial intelligence, and it is widely used in intelligent services that combine technologies such as ML/DL and cloud computing.
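As a minimal sketch of the node-and-edge idea, the Python snippet below stores facts as labeled edges between nodes and looks up what a node is connected to. The class name `KnowledgeGraph`, its methods, the edge labels, and the sample facts are all assumptions made for illustration; they do not come from any particular product.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Toy in-memory graph: nodes are strings, edges are labeled and directed."""

    def __init__(self):
        # node -> list of (edge_label, target_node)
        self._edges = defaultdict(list)

    def add_fact(self, source, label, target):
        """Connect two nodes with a labeled edge, creating the nodes if needed."""
        self._edges[source].append((label, target))
        self._edges[target]  # touching the key registers the target as a node

    def neighbors(self, node):
        """Return the (label, target) pairs that leave the given node."""
        return list(self._edges.get(node, []))

    def nodes(self):
        """Return every known node in alphabetical order."""
        return sorted(self._edges)


kg = KnowledgeGraph()
# Hypothetical facts, used only for illustration.
kg.add_fact("Wine 1", "PAIRS_WITH", "Steak")
kg.add_fact("Wine 1", "HAS_TYPE", "Red")
print(kg.neighbors("Wine 1"))
```

Adding new knowledge here is just another `add_fact` call, which is the sense in which a graph-shaped knowledge base stays expandable.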
4 Steps to Designing a Knowledge Graph
There are four stages in constructing a knowledge graph. First, set the purpose of the knowledge graph and design the knowledge structure.
Second, process the data. Since most knowledge is in text form, this means applying natural language processing and tagging. Other data, such as transactional details, must also be pre-processed so that it can be included in the knowledge. Once all processing is done, reconstruct the data structure.
Third, once the data structure is in place, model the data. When the relationships between the data are loaded through modeling, applying a prediction algorithm may lead to the discovery of new relationships.
Finally, test the knowledge graph by searching for the information you need. Consider visualizing the knowledge graph before launching it as a service.
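For the non-text part of this step, one possible way to pre-process a transactional record is to turn each row into subject-predicate-object facts. The sketch below assumes rows arrive as Python dictionaries; the function name `row_to_facts`, the column names, and the values are hypothetical placeholders.

```python
def row_to_facts(row, key_column):
    """Turn one tabular record into (subject, predicate, object) facts.

    The value in key_column becomes the subject; every other non-empty
    column becomes one fact about that subject.
    """
    subject = row[key_column]
    facts = []
    for column, value in row.items():
        if column == key_column or value in (None, ""):
            continue  # skip the identifier itself and missing values
        facts.append((subject, column, value))
    return facts


# Hypothetical transaction record; column names and values are placeholders.
payment = {"payment_id": "P-001", "wine": "Wine 1", "food": "Steak", "note": ""}
facts = row_to_facts(payment, key_column="payment_id")
for fact in facts:
    print(fact)
```

Text data would need a real natural language processing pipeline before it reaches this shape; that part is not covered by this sketch.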
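The article does not say which prediction algorithm is meant. As one simple possibility, the sketch below uses common-neighbors link prediction: two nodes that are not yet connected but share many neighbors are candidates for a new relationship. The function name and the small graph are assumptions made for illustration.

```python
from itertools import combinations


def common_neighbor_scores(adjacency):
    """Score every unconnected pair of nodes by how many neighbors they share."""
    scores = {}
    for a, b in combinations(sorted(adjacency), 2):
        if b in adjacency[a]:
            continue  # already connected, nothing to predict
        shared = adjacency[a] & adjacency[b]
        if shared:
            scores[(a, b)] = len(shared)
    return scores


# Hypothetical undirected graph, stored as node -> set of neighbors.
adjacency = {
    "Wine 1": {"Steak", "Red"},
    "Wine 2": {"Steak", "Red"},
    "Wine 3": {"Red"},
    "Steak": {"Wine 1", "Wine 2"},
    "Red": {"Wine 1", "Wine 2", "Wine 3"},
}
scores = common_neighbor_scores(adjacency)
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, score)
```

In this toy graph the pairs ("Red", "Steak") and ("Wine 1", "Wine 2") score highest, which hints at relationships that were never stored explicitly.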
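A search test of this kind can be sketched as a pattern match over stored facts, where any part of the pattern may be left open. The helper `match` and the sample facts below are hypothetical; a real graph store would expose its own query language instead.

```python
def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for s, p, o in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


# Hypothetical facts used as a small test fixture.
triples = [
    ("Wine 1", "PAIRS_WITH", "Steak"),
    ("Wine 2", "PAIRS_WITH", "Steak"),
    ("Wine 3", "PAIRS_WITH", "Salad"),
]

# "Which wines pair with steak?"
steak_wines = [s for s, _, _ in match(triples, predicate="PAIRS_WITH", obj="Steak")]
assert steak_wines == ["Wine 1", "Wine 2"]
print(steak_wines)
```

Writing the expected answer as an assertion turns the search into a repeatable check that can be rerun whenever the graph changes.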
Representing a Knowledge Graph in RDF & Graph DB
The two most representative ways to build a knowledge graph are RDF and graph DB (GDB). However, the two serve different purposes, as the comparison table and graph modeling below show.
The difference between RDF and graph DB is summarized in the following table.
| Category | RDF | GDB |
|---|---|---|
| Purpose | Easily storable in RDB when expressing connected data | Handles semantic web efficiently |
| Data Model | Triplet schema | Property graph model |
| Performance | Gets slower as depth increases due to recursive table search | Graph path algorithm optimizes graph structure |
| Example of Data Model | More attributes mean a more complex set of data; all data must be stored as nodes or edges | Using properties makes it systematically light; able to express logically and intuitively |
| Pros & Cons | Stores natural language efficiently; limited to expressing graph structure; since it wasn't developed in the form of a database, it is lacking as a management system | Provides service based on pattern search; schemaless storing method allows flexible handling even when new data are added; relational modeling is complex and difficult to configure |
RDF is generally suitable when the amount of data is fixed (for example, for academic purposes). A graph DB is efficient in environments where new data is constantly generated (for example, in business).
Additionally, RDF has to store all data as separate nodes and edges because it lacks the property and label features of a graph DB. Without properties, RDF is at a disadvantage when data grows rapidly: as the number of nodes and edges increases, the model becomes more complex, and as depth increases, performance may degrade.
The difference between RDF and graph DB becomes clear when both are expressed as graph structures. Let's take a look at the knowledge graph modeling below.
In the image above, an RDB table of sample wine data is shown. From left to right, the columns show the wine number, type, importer, vintage, country of origin, and price of each wine.
Below the table and to the left, a knowledge graph built with RDF is shown. Wine 1 has five columns of data, which are expressed as five separate edges. An RDF graph is easy to read when data is limited, but imagine if the RDB table held 100 or more wines instead of four. The image above already looks full, and if more attributes are added, the modeling gets even more complicated.
In the knowledge graph built with graph DB, two attributes (vintage and price) are included in the wine 1 node as properties. The graph DB excels at reducing edges, which makes the overall model cleaner and more concise than the RDF one. Also, if there is a need to see how wines are connected through a common importer or country of origin, these attributes can be separated into their own nodes. The beauty of a knowledge graph with graph DB is that it can be expressed in various layouts.
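To make the growth concrete, the sketch below expresses wine 1 as one triple per column and then counts how many edges the same layout needs for 4 and for 100 wines. The actual values live in the table image, so placeholders are used here, and the sketch assumes every wine has a value in all five columns.

```python
ATTRIBUTES = ["type", "importer", "vintage", "country_of_origin", "price"]


def wine_triples(wine_id, values):
    """One (subject, predicate, object) triple, i.e. one edge, per attribute."""
    return [(wine_id, attribute, values[attribute]) for attribute in ATTRIBUTES]


# Placeholder values: the real ones are in the table image, not in the text.
wine_1 = {attribute: f"<{attribute} of wine 1>" for attribute in ATTRIBUTES}
edges = wine_triples("wine_1", wine_1)
print(len(edges), "edges for wine 1")

for num_wines in (4, 100):
    print(num_wines, "wines ->", num_wines * len(ATTRIBUTES), "edges")
```

Under these assumptions the edge count is simply wines times attributes: 20 edges for the four sample wines and 500 for a hundred.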
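The property graph layout for wine 1 can be sketched with two small data classes. The class names, relation names, and placeholder values are assumptions made for illustration; only the split into two properties (vintage and price) and three connected nodes follows the description above.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    name: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str
    relation: str
    target: str


# Placeholder values: the real ones are in the table image, not in the text.
wine_1 = Node("Wine", "wine_1", {"vintage": "<vintage>", "price": "<price>"})
related = [
    Node("Type", "<type>"),
    Node("Importer", "<importer>"),
    Node("Country", "<country of origin>"),
]
edges = [
    Edge(wine_1.name, "HAS_TYPE", related[0].name),
    Edge(wine_1.name, "IMPORTED_BY", related[1].name),
    Edge(wine_1.name, "ORIGINATES_FROM", related[2].name),
]
print(len(edges), "edges,", len(wine_1.properties), "properties")
```

Keeping importer and country of origin as separate nodes means several wines can point at the same node, which is what makes shared importers or origins visible in the graph.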
Further Application of the Wine Dataset
Payment data is added to the knowledge graph above. In this scenario, the payment data records each purchased wine and the food paired with it. As you can see, wine 1 and wine 2 are served with steaks. A few insights can be drawn from such results: recommending a red wine to a customer who orders steak, or suggesting other categories of red wine to steak eaters, can prove to be an effective service.
With a knowledge graph that supports reasoning and much more detailed datasets, restaurants, businesses, and other institutions will be able to serve their customers or buyers more helpfully and efficiently.
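The recommendation idea can be sketched as a simple count of which wine type is most often paid for together with a given food. Everything in the snippet is a hypothetical stand-in: the function names, the wine types, and the third payment are assumptions, and only the steak pairings of wine 1 and wine 2 come from the scenario above.

```python
from collections import Counter, defaultdict


def pairing_counts(payments, wine_types):
    """Count how often each food was ordered with each type of wine."""
    counts = defaultdict(Counter)
    for payment in payments:
        counts[payment["food"]][wine_types[payment["wine"]]] += 1
    return counts


def recommend_type(counts, food):
    """Return the wine type most often paired with the food, or None."""
    if food not in counts:
        return None
    return counts[food].most_common(1)[0][0]


# Assumed wine types and payments, for illustration only.
wine_types = {"wine_1": "red", "wine_2": "red", "wine_3": "white"}
payments = [
    {"wine": "wine_1", "food": "steak"},
    {"wine": "wine_2", "food": "steak"},
    {"wine": "wine_3", "food": "salad"},
]
counts = pairing_counts(payments, wine_types)
print(recommend_type(counts, "steak"))  # red
```

A production service would likely weigh far more signals, but even this count shows how payment edges in the graph translate into a recommendation.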