In the previous post we introduced graph analysis algorithms; in this post we move on to graph analysis platforms.
See the earlier post, ‘Graph Analysis Algorithm’.
What is a graph analysis platform?
This section introduces tools that apply the graph and network analysis techniques described in the previous post. Big Data has become very popular, and because of its size, most platforms need to be scalable. The Hadoop framework is the prevalent one. But what about big graph data? Let us briefly look at some graph analysis platforms.
▲ Figure 1. TinkerPop characters (image from tinkerpop.apache.org)
Apache TinkerPop is an open source graph computing framework. TinkerPop was originally organized into Blueprints, Pipes, Gremlin, Frames, Furnace and Rexster. Blueprints, since renamed the Gremlin Structure API, is a Java API that binds to graph backends, and many popular graph databases, such as Neo4j, Titan and OrientDB, support it.
The other modules provide graph dataflow, graph traversal, and graph computing, all driven by the Gremlin graph traversal language. TinkerPop also offers a Hadoop plugin, Hadoop-Gremlin: when the data in a Hadoop cluster represents a TinkerPop graph, the plugin can be used to process that graph.
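To make the idea of a Gremlin traversal more concrete, here is a minimal Python sketch of how a chain of traversal steps can filter vertices, move along labeled edges, and project properties. It is a hypothetical toy, not the TinkerPop API: the `Graph` and `Traversal` classes and the sample vertices are invented for illustration. Only the step names mirror real Gremlin steps, as in `g.V().has('name','marko').out('knows').values('name')`.

```python
# A toy property graph and a chainable, Gremlin-inspired traversal.
# Hypothetical illustration only: this is NOT the TinkerPop/Gremlin API.
from dataclasses import dataclass, field


@dataclass
class Graph:
    vertices: dict = field(default_factory=dict)  # vertex id -> property dict
    edges: list = field(default_factory=list)     # (out_id, label, in_id)

    def add_vertex(self, vid, **props):
        self.vertices[vid] = props

    def add_edge(self, out_id, label, in_id):
        self.edges.append((out_id, label, in_id))

    def V(self):
        # Start a traversal from every vertex, like Gremlin's g.V()
        return Traversal(self, list(self.vertices))


class Traversal:
    """Each step consumes the current stream of vertex ids and returns a new one."""

    def __init__(self, graph, ids):
        self.graph, self.ids = graph, ids

    def has(self, key, value):
        # Filter step: keep vertices whose property `key` equals `value`
        return Traversal(self.graph, [v for v in self.ids
                                      if self.graph.vertices[v].get(key) == value])

    def out(self, label):
        # Move step: follow outgoing edges that carry the given label
        return Traversal(self.graph, [i for v in self.ids
                                      for (o, l, i) in self.graph.edges
                                      if o == v and l == label])

    def values(self, key):
        # Map step: project one property of each vertex in the stream
        return [self.graph.vertices[v][key] for v in self.ids]


g = Graph()
for vid, name in [(1, "marko"), (2, "vadas"), (3, "lop"), (4, "josh")]:
    g.add_vertex(vid, name=name)
g.add_edge(1, "knows", 2)
g.add_edge(1, "knows", 4)
g.add_edge(4, "created", 3)

print(g.V().has("name", "marko").out("knows").values("name"))                   # ['vadas', 'josh']
print(g.V().has("name", "marko").out("knows").out("created").values("name"))  # ['lop']
```

Each step returns a new traversal over a new stream of vertices, which is the pipeline (dataflow) style that Gremlin traversals follow. A real Gremlin engine evaluates such pipelines lazily and optimizes them, which this toy does not attempt.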
▲ Figure 2. GraphX logo (image from http://spark.apache.org)
Apache Spark is a general engine for large-scale data processing. Spark works with the Hadoop ecosystem (it can run on Hadoop clusters and read data from HDFS) and can be much faster than Hadoop MapReduce, largely because it keeps intermediate results in memory instead of writing them to disk between steps. Even with this minimal disk I/O, Spark still provides fault tolerance through RDDs.
The main abstraction Spark provides is a resilient distributed dataset (RDD), which is a collection of elements partitioned across the nodes of the cluster that can be operated on in parallel. RDDs are created by starting with a file in the Hadoop file system (or any other Hadoop-supported file system), or an existing Scala collection in the driver program, and transforming it. Users may also ask Spark to persist an RDD in memory, allowing it to be reused efficiently across parallel operations. Finally, RDDs automatically recover from node failures.
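The partitioning and recovery behavior described above can be illustrated with a toy model. The sketch below is a hypothetical Python class, `ToyRDD`, not the Spark API. It keeps a collection split into partitions, records transformations lazily as a lineage, caches partitions in memory when persisted, and rebuilds a lost partition by replaying the lineage over that partition's source data. It runs sequentially on one machine, whereas Spark distributes partitions across cluster nodes and processes them in parallel.

```python
# Hypothetical toy model of an RDD: partitioned data plus a recorded lineage.
# Not the Spark API; it only illustrates partitioning and lineage-based recovery.

class ToyRDD:
    def __init__(self, source_partitions, lineage=()):
        self.source = source_partitions  # immutable input, split into partitions
        self.lineage = tuple(lineage)    # transformations to apply, in order
        self.cache = {}                  # partition index -> materialized data

    def map(self, fn):
        # Transformations are lazy: they only extend the lineage
        return ToyRDD(self.source, self.lineage + (("map", fn),))

    def filter(self, pred):
        return ToyRDD(self.source, self.lineage + (("filter", pred),))

    def compute_partition(self, i):
        # Rebuild one partition by replaying the lineage over its source data
        data = list(self.source[i])
        for kind, fn in self.lineage:
            if kind == "map":
                data = [fn(x) for x in data]
            else:
                data = [x for x in data if fn(x)]
        return data

    def persist(self):
        # Keep every partition in memory for reuse across operations
        for i in range(len(self.source)):
            self.cache[i] = self.compute_partition(i)
        return self

    def lose_partition(self, i):
        # Simulate a node failure that drops one cached partition
        self.cache.pop(i, None)

    def collect(self):
        # Use cached partitions where present; recompute missing ones from lineage
        out = []
        for i in range(len(self.source)):
            if i not in self.cache:
                self.cache[i] = self.compute_partition(i)
            out.extend(self.cache[i])
        return out


# Three partitions, as if spread across three nodes
rdd = ToyRDD([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
squares_of_evens = rdd.filter(lambda x: x % 2 == 0).map(lambda x: x * x).persist()
squares_of_evens.lose_partition(1)   # the node holding partition 1 "fails"
print(squares_of_evens.collect())    # [4, 16, 36, 64]
```

Because the source data is immutable and every transformation is deterministic, replaying the lineage for partition 1 reproduces exactly the data that was lost, so no replica of the cached result is needed.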
GraphX is the Spark component for graphs and graph-parallel computation. GraphX extends the Spark RDD by introducing a new Graph abstraction: a directed multigraph with properties attached to each vertex and edge. To support graph computation, GraphX exposes a set of fundamental operators as well as an optimized variant of the Pregel API. In addition, GraphX includes a growing collection of graph algorithms and builders to simplify graph analytics tasks.
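As a rough illustration of the property-graph abstraction, the sketch below stores a directed multigraph as a vertex collection and an edge collection, each carrying attributes. It then computes per-vertex aggregates by sending messages along edges and merging them at each destination. The helper `aggregate_messages` is hypothetical plain Python, loosely modeled on the idea behind GraphX's `aggregateMessages` operator. It is not the GraphX API, and the sample data is invented.

```python
# Hypothetical sketch of a property multigraph stored as two collections,
# in the spirit of GraphX's vertex and edge RDDs (plain Python, not Spark).
from collections import namedtuple

Edge = namedtuple("Edge", ["src", "dst", "attr"])

# Vertex properties: id -> attribute
vertices = {1: "alice", 2: "bob", 3: "carol"}

# Directed multigraph: two distinct edges from 1 to 2 are allowed
edges = [
    Edge(1, 2, "follows"),
    Edge(1, 2, "messages"),
    Edge(2, 3, "follows"),
    Edge(3, 1, "follows"),
]


def aggregate_messages(edges, send, merge):
    """For every edge, `send` returns a list of (target_vertex, message) pairs;
    messages addressed to the same vertex are combined with `merge`."""
    result = {}
    for e in edges:
        for vid, msg in send(e):
            result[vid] = merge(result[vid], msg) if vid in result else msg
    return result


# In-degree: each edge sends 1 to its destination, and the counts are summed
in_degree = aggregate_messages(edges, lambda e: [(e.dst, 1)], lambda a, b: a + b)
print(in_degree)  # {2: 2, 3: 1, 1: 1}

# Followers per vertex, using the edge attribute to select "follows" edges
followers = aggregate_messages(
    edges,
    lambda e: [(e.dst, [vertices[e.src]])] if e.attr == "follows" else [],
    lambda a, b: a + b,
)
print(followers)  # {2: ['alice'], 3: ['bob'], 1: ['carol']}
```

The two parallel edges from vertex 1 to vertex 2 both count toward its in-degree, which is what distinguishes a multigraph from a simple graph.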
Mazerunner is a Neo4j unmanaged extension and distributed graph processing platform that extends Neo4j to do big data graph processing jobs while persisting the results back to Neo4j. Mazerunner uses a message broker to distribute graph processing jobs to Apache Spark’s GraphX module.
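Pregel, introduced by Google in the paper "Pregel: A System for Large-Scale Graph Processing", is a computational model suited to large-scale graph computing, and the paper also describes a production-quality, scalable, fault-tolerant implementation of it. Its central idea is the superstep. Computation proceeds as a sequence of supersteps, and in each one every vertex receives the messages sent to it in the previous superstep, updates its own state, and sends messages to other vertices. With this model, many graph analysis algorithms can be adapted to distributed systems.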
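The superstep idea can be illustrated with a small, single-machine sketch. In each superstep every active vertex reads the messages sent to it in the previous superstep, updates its value, and sends messages along its out-edges. The run ends when no messages are in flight. The `pregel` and `max_value` functions, the simplified halting rule (each vertex halts after every superstep and is woken only by an incoming message), and the four-vertex example graph are assumptions for illustration, not Pregel's actual interface. The vertex program propagates the maximum value through the graph, a common introductory example of the model.

```python
# Hypothetical single-machine sketch of Pregel's superstep model.
# Not Google's Pregel or any library API; all names are illustrative.

def pregel(values, out_edges, compute, max_supersteps=100):
    """Run vertex programs in supersteps until no messages remain.

    values:    vertex -> initial value
    out_edges: vertex -> list of neighbor vertices
    compute(superstep, value, messages) -> (new_value, message or None)
    Simplification: every vertex votes to halt after each superstep and is
    reactivated only when a message arrives for it.
    """
    values = dict(values)              # do not mutate the caller's dict
    inbox = {v: [] for v in values}
    active = set(values)               # all vertices start active
    superstep = 0
    while active and superstep < max_supersteps:
        outbox = {v: [] for v in values}
        for v in active:               # in real Pregel this loop runs in parallel
            values[v], msg = compute(superstep, values[v], inbox[v])
            if msg is not None:
                for n in out_edges[v]:
                    outbox[n].append(msg)
        # Barrier: messages sent in superstep S are delivered in superstep S + 1
        inbox = outbox
        active = {v for v in values if inbox[v]}
        superstep += 1
    return values, superstep


def max_value(superstep, value, messages):
    # Superstep 0: every vertex announces its value to its neighbors.
    # Afterwards: adopt the largest value seen and re-send only if it grew.
    new_value = max([value] + messages)
    if superstep == 0 or new_value > value:
        return new_value, new_value
    return value, None


# Undirected path a - b - c - d, modeled as edges in both directions
out_edges = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
result = pregel({"a": 3, "b": 6, "c": 2, "d": 1}, out_edges, max_value)
print(result)  # ({'a': 6, 'b': 6, 'c': 6, 'd': 6}, 4)
```

Four supersteps are needed here: the largest value starts at b and takes two hops to reach d, and a final superstep delivers d's last message and finds nothing left to change. Because each vertex reads only its own value and inbox, the loop over active vertices is order-independent, which is what lets a real implementation run it in parallel across workers.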
BITNINE GLOBAL INC., THE COMPANY SPECIALIZING IN GRAPH DATABASE