How to Identify and Visualize Clusters in Knowledge Graphs
In this blog post, we identify and visualize clusters of cancer types by analyzing the disease ontology as a knowledge graph. We set up Neo4j in a Docker container, import the ontology, generate graph clusters and embeddings, and apply dimension reduction so the clusters can be plotted and inspected. The disease ontology is our example, but the same steps can be applied to any ontology or graph database.
In a graph database, data is stored as nodes and relationships between nodes, which lets us see connections that are not stated explicitly in the data. For instance, melanoma and carcinoma are both subcategories of cell type cancer tumor, so the two cancer types are related through their shared parent even though no direct link between them is stored.
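To make the idea of an implicit connection concrete, here is a minimal sketch in plain Python, with no database involved. The edge list and the SUBCLASS_OF relationship name are illustrative assumptions, not the schema Neo4j will produce; only the three disease names come from the example above.

```python
from collections import defaultdict

# Toy edge list: (child, relationship, parent). The relationship name is a
# hypothetical label chosen for this sketch.
EDGES = [
    ("melanoma", "SUBCLASS_OF", "cell type cancer tumor"),
    ("carcinoma", "SUBCLASS_OF", "cell type cancer tumor"),
]

def siblings(edges, node):
    """Return the nodes that share at least one parent with `node`."""
    parents = defaultdict(set)   # child -> set of parents
    children = defaultdict(set)  # parent -> set of children
    for child, _rel, parent in edges:
        parents[child].add(parent)
        children[parent].add(child)
    related = set()
    for parent in parents[node]:
        related |= children[parent]
    related.discard(node)  # a node is not its own sibling
    return sorted(related)

print(siblings(EDGES, "melanoma"))  # ['carcinoma']
```

No edge connects melanoma to carcinoma directly; the link appears only when we walk up to the shared parent and back down, which is the kind of traversal a graph database is built for.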
Ontologies, formalized sets of concepts and relationships, play a crucial role in biological sciences. The disease ontology showcases the interrelations between different disease types, aiding in data extraction and interpretation.
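Neo4j, a graph database management system, can be run in a Docker container, which avoids a local installation. The following command starts Neo4j 5.17.0 with the Graph Data Science, APOC, and neosemantics (n10s) plugins, and exposes the browser interface on port 7474 and the Bolt protocol on port 7687.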
docker run \
    -it --rm \
    --publish=7474:7474 --publish=7687:7687 \
    --env NEO4J_AUTH=neo4j/123456789 \
    --env NEO4J_PLUGINS='["graph-data-science","apoc","n10s"]' \
    neo4j:5.17.0
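The container needs some time to download the plugins and start, so a script that talks to Neo4j may want to wait first. The helper below is a rough sketch using only the Python standard library; the function name wait_for_port and its defaults are assumptions made for this example. Note that an open port only shows that something is listening, not that the database has finished starting.

```python
import socket
import time

def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError if nothing accepts the connection.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

With the port mapping from the command above, calling wait_for_port("localhost", 7687) would block until the Bolt port accepts connections or a minute has passed.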
Once Neo4j is up and running, you can import the disease ontology using the n10s plugin, enabling you to explore the ontology or embed your data within it.
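As a rough sketch of what the import could look like, the snippet below prepares the Cypher statements that n10s typically needs: a uniqueness constraint on resource URIs, a graph configuration, and the ontology fetch itself. The ontology URL, the RDF/XML format, and the helper names are assumptions made for this example and are not taken from the original code; check them against the version of the ontology you want to load.

```python
# Assumed location of the disease ontology in OWL form; replace as needed.
DOID_OWL_URL = "http://purl.obolibrary.org/obo/doid.owl"

def n10s_import_statements(url=DOID_OWL_URL, rdf_format="RDF/XML"):
    """Return (cypher, parameters) pairs for an n10s ontology import."""
    return [
        # n10s requires a uniqueness constraint on Resource.uri before importing.
        ("CREATE CONSTRAINT n10s_unique_uri IF NOT EXISTS "
         "FOR (r:Resource) REQUIRE r.uri IS UNIQUE", {}),
        # Create the default graph configuration (expects an empty database).
        ("CALL n10s.graphconfig.init()", {}),
        # Fetch the ontology and load its classes and relationships.
        ("CALL n10s.onto.import.fetch($url, $format)",
         {"url": url, "format": rdf_format}),
    ]

def run_statements(session, statements):
    """Run each statement on an open session and wait for it to finish."""
    for cypher, params in statements:
        session.run(cypher, params).consume()

# Usage with the official `neo4j` Python driver (pip install neo4j),
# using the credentials from the docker command:
#
# from neo4j import GraphDatabase
# with GraphDatabase.driver("bolt://localhost:7687",
#                           auth=("neo4j", "123456789")) as driver:
#     with driver.session() as session:
#         run_statements(session, n10s_import_statements())
```

Passing the URL as a query parameter rather than pasting it into the Cypher string keeps the statements reusable for other ontologies.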
To follow the remaining steps and see the insights from the generated clusters and embeddings, visit the full code at GitHub.