In this post, we briefly introduce IBM Watson Explorer Content Analytics and discuss how data flows through the Watson Explorer (WEX) architecture.
For an introduction to IBM Watson Explorer and its use cases, please refer to our earlier posts at the links below.
IBM Watson Explorer Introduction: http://bitnine.net/blog-useful-information/introduction-about-ibm-watson-explorer/
IBM Watson Explorer Use Case: http://bitnine.net/blog-computing/real-world-use-cases-of-ibm-watson-explorer/
IBM Watson Explorer High Quality Services: http://bitnine.net/blog-computing/high-quality-services-of-ibm-watson-explorer/
[About IBM Watson Explorer (WEX) Content Analytics]
- Collects and analyzes structured and unstructured data from documents, email, databases, websites and other repositories.
- Creates collections of content that can be crawled, imported, parsed and searched.
- Provides a platform for using data in data/text analysis and search.
- Combines Content Analytics and Enterprise Search:
  * Enterprise Search: Enables enterprise users to search content across a variety of structured and unstructured data.
  * Content Analytics: Derives new business insight from unstructured data by analyzing content.
- Summary: The product combines analysis and search to give users both the information they set out to find and information they discover unexpectedly, from structured and unstructured data.
Collection: A set of data sources together with the options for crawling, parsing, indexing and searching them.
Crawler: A software program that collects information from multiple data sources (websites, documents, RDBMS, social media data, CSV files).
Annotator: A software component that performs specific linguistic analysis tasks and creates and records annotations.
Annotation: Information about a span of text used in the analysis.
Unstructured Information Management Architecture (UIMA): An open platform for building systems that analyze unstructured data.
Common Analysis Structure (CAS): A structure that stores all the analysis results generated by the text analysis engine.
(All data exchanged during document analysis is handled in the form of a CAS.)
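To make the relationship between an annotator, an annotation and the CAS more concrete, here is a minimal sketch in Python. It is not the UIMA API or anything shipped with Watson Explorer; the names `Annotation`, `SimpleCas` and `year_annotator` are hypothetical and chosen only for illustration. The idea it shows is that an annotation is a typed span of text, and that the CAS carries the document text together with every annotation produced so far.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """Information about a span of text: a type plus begin/end offsets."""
    type: str
    begin: int
    end: int


@dataclass
class SimpleCas:
    """Toy stand-in for a CAS (hypothetical, not the UIMA class):
    holds the document text and all analysis results."""
    text: str
    annotations: list = field(default_factory=list)

    def covered_text(self, ann):
        # The text an annotation refers to is recovered from its offsets.
        return self.text[ann.begin:ann.end]


def year_annotator(cas):
    """Hypothetical annotator: records one annotation per four-digit year."""
    for match in re.finditer(r"\b(?:19|20)\d{2}\b", cas.text):
        cas.annotations.append(Annotation("Year", match.start(), match.end()))


cas = SimpleCas("The report was filed in 2015 and revised in 2017.")
year_annotator(cas)
years = [cas.covered_text(a) for a in cas.annotations]
print(years)
```

Because every annotator reads from and writes to the same structure, later annotators can build on the spans recorded by earlier ones.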
For other terms, please refer to the URL link below.
Although the diagram may look complicated, it simply shows that IBM Watson Explorer Content Analytics can collect data from multiple sources using crawlers.
The crawlers in IBM Watson Explorer can collect data from web sites, social media, RDBMS, HDFS, documents and so on. However, not every crawler is available in every version of Watson Explorer, so it is necessary to check the documentation for the version in use. The collected data is passed to the Document Processor, where it is processed and analyzed in a UIMA pipeline; at this stage, each analysis step takes data in CAS form as input and produces it as output. The analyzed data is then indexed, and users derive insight from it through the search function or by exporting the analyzed content to a file. The figure below briefly shows this data flow.
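The flow described above (crawl, process in a pipeline of annotators, index, search) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function names, the dictionary used as a CAS-like container and the sample documents are all hypothetical, and none of this reflects the actual Watson Explorer implementation.

```python
from collections import defaultdict


def crawl(sources):
    """Stand-in for crawlers: yield (doc_id, text) from each data source."""
    for source_name, docs in sources.items():
        for i, text in enumerate(docs):
            yield f"{source_name}-{i}", text


def keyword_annotator(cas):
    """Hypothetical annotator: adds lowercase tokens to the CAS-like dict."""
    cas["tokens"] = [w.strip(".,").lower() for w in cas["text"].split()]
    return cas


def process(doc_id, text, annotators):
    """Document processor: run each annotator in order over one CAS-like dict."""
    cas = {"id": doc_id, "text": text}
    for annotate in annotators:
        cas = annotate(cas)  # each step takes a CAS in and passes a CAS on
    return cas


def build_index(analyzed_docs):
    """Inverted index: token -> set of document ids."""
    index = defaultdict(set)
    for cas in analyzed_docs:
        for token in cas["tokens"]:
            index[token].add(cas["id"])
    return index


def search(index, term):
    """Return the sorted ids of documents containing the term."""
    return sorted(index.get(term.lower(), set()))


# Made-up sample data standing in for a web source and a database source.
sources = {
    "web": ["Customers praise the new search feature."],
    "rdbms": ["Search latency complaints rose.", "Billing issue resolved."],
}
analyzed = [process(doc_id, text, [keyword_annotator])
            for doc_id, text in crawl(sources)]
index = build_index(analyzed)
hits = search(index, "search")
print(hits)
```

Each stage only depends on the output of the previous one, which is why new data sources or annotators can be added without changing the indexing and search steps.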
For more detailed technical information about IBM Watson Explorer, please refer to the URLs below.
IBM Content Analytics Red Book: http://www.redbooks.ibm.com/abstracts/sg247877.html?Open
IBM Knowledge Center: https://www.ibm.com/support/knowledgecenter/