image description

Knowledge Graph Summarization

Querying heterogeneous and large-scale knowledge graphs is expensive. This project studies a graph summarization framework to facilitate knowledge graph search. (1) We introduce a class of reduced summaries. Characterized by approximate graph pattern matching, these summaries are capable of summarizing entities in terms of their neighborhood similarity up to a certain hop, using small and informative graph patterns. (2) We study a diversified graph summarization problem. (a) We show that there exists a 2-approximation algorithm to discover diversified summaries. We further develop an anytime sequential algorithm which discovers summaries under resource constraints. (b) We present a new parallel algorithm with quality guarantees. (3) We also develop a summary-based query evaluation scheme, which only refers to a small number of summaries. Using real-world knowledge graphs, we experimentally verify the effectiveness and efficiency of our summarization algorithms, and query processing using summaries.

What created the need for knowledge graph summarization?

Knowledge graphs are routinely used to represent entities and their relationships in knowledge bases. Unlike relational data, real-world knowledge graphs lack the support of well-defined schema and typing system. It is hard for end-users to precise queries that will lead to meaningful answers without any prior knowledge of the underlying data graph. Querying such knowledge graphs is challenging due to the ambiguity in queries, the inherent computational complexity (e.g., subgraph isomorphism) and resource constraints (e.g., data allowed to be accessed, response time) large knowledge graphs.

KGSum Example

Graph patterns can benefit knowledge search by suggesting (and can be directly queried as) highly interpretable “views”. In addition, such summaries can help users in understanding complex knowledge graphs without inspecting a large amount of data, explaining facts with interpretable evidences, and suggesting meaningful queries in mining tasks.
We introduce a new class of graph patterns, namely, reduced d-summaries, to summarize entities in terms of their neighborhood simi- larity up to a bounded hop d with minimized summary size.

How to get these summaries?

We study a diversified graph summarization problem. Given a knowledge graph, it is to discover top-k summaries that maximize a bi-criteria function, characterized by both informativeness and diversity. The diversified summarization problem remains to be NP-hard for reduced summaries. We show that diversified summarization is feasible with a single processor and with multiple processors.

KGSum algorithm

How to use these summaries?

Reduced summaries can be used in knowledge search, query suggestion and fact checking in knowledge graphs.

KGSum cross domain

Datasets

We conducted experiments using real-world graphs, including DBPedia, Yago 2.0, and Freebase (version 14-04-14). We also use BSBM e-commerce benchmark to generate synthetic knowledge graphs over products with different types, related vendors, consumers, and views. The generator is controlled by the number of nodes (up to 60 M), edges (up to 152 M), and labels drawn from an alphabet of 3080 labels. All the datasets are stored in RDF in the N-triples format, a RDF importer is used to import the data into Neo4j. The details of these datasets are shown below.

Datasets DBPedia Yago Freebase BSBM_10M BSBM_20M BSBM_30M BSBM_40M BSBM_50M BSBM_60M
# of nodes 4.86M 1.54M 40.32M 10M 20M 30M 40M 50M 60M
# of edges 15M 2.37M 63.2M 27M 50M 77M 102M 126M 152M
# of labels 676 324K 9630 520 1100 1640 2260 2860 3080

Publications

  1. Mining Summaries for Knowledge Graph Search. [Paper]
    IEEE Transactions on Knowledge and Data Engineering(TKDE), Vol.30 (Issue No. 10), 1887-1900, 2018.
    Qi Song, Yinghui Wu, Peng Lin, Luna Xin Dong, Hui Sun

  2. Parallel Graph Summarization for Knowledge Search. [Paper][Poster]
    International Workshop on Mining and Learning with Graphs (MLG), 2017.
    Qi Song, Mohammad Hossein Namaki, Peng Lin, Yinghui Wu

  3. Mining Summaries for Knowledge Graph Search. [Paper][Full Version][Slide]
    IEEE International Conference on Data Mining (ICDM), 2016.
    Qi Song, Yinghui Wu, Luna Xin Dong

Source Code

The source code of KGSum framework in Neo4j is available at: https://github.com/songqi1990/KnowGraphSum.

Acknowledgements

This project is supported in part by NSF IIS-1633629 and Google Faculty Research Award.