- How to load not only categories but also pages (as in Wikipedia articles) and enrich the graph by querying DBpedia. In doing this I’ll describe some advanced usage of APOC procedures.
- How to batch load the whole Wikipedia hierarchy of categories into Neo4j
Say we have a dataset of multi-tagged items: books with multiple genres, articles with multiple topics, products with multiple categories… We want to organise these tags (the genres, the topics, the categories…) logically, in a way that is descriptive but also actionable. A typical organisation will be hierarchical, like a taxonomy.
But rather than building it manually, we are going to learn it from the data in an automated way. This means that the quality of the results will totally depend on the quality and distribution of the tagging in your data, so sometimes we’ll produce a rich taxonomy but sometimes the data will only yield a set of rules describing how tags relate to each other.
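To make the idea concrete, here is a minimal sketch of one way such rules could be derived from tag co-occurrence. It assumes a simple subset rule (if every item tagged A is also tagged B, and B appears on more items, then A is treated as narrower than B). The sample items and the function name `learn_narrower_than` are made up for illustration and are not necessarily what the post uses.

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical sample data: item id -> set of tags.
items = {
    "book1": {"fiction", "crime"},
    "book2": {"fiction", "crime", "noir"},
    "book3": {"fiction", "romance"},
    "book4": {"fiction"},
}

def learn_narrower_than(items):
    """Return sorted (narrower, broader) tag pairs.

    Assumed rule: tag a is narrower than tag b when the items tagged a
    form a proper subset of the items tagged b.
    """
    tagged = defaultdict(set)  # tag -> set of item ids carrying it
    for item, tags in items.items():
        for tag in tags:
            tagged[tag].add(item)
    return sorted(
        (a, b)
        for a, b in permutations(tagged, 2)
        if tagged[a] < tagged[b]  # "<" on sets means proper subset
    )

rules = learn_narrower_than(items)
for narrower, broader in rules:
    print(f"{narrower} -> {broader}")
```

With evenly distributed tagging the pairs chain into a hierarchy; with sparse or noisy tagging you may only get a handful of isolated rules, which is the dependency on data quality described above.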
Finally, I’ll show how this taxonomy can be used, with an example on content recommendation / enhanced search. Continue reading “QuickGraph#5 Learning a taxonomy from your tagged data”
As in previous posts, for those of you less familiar with the differences and similarities between RDF and the Property Graph, I recommend you watch this talk I gave at Graph Connect San Francisco in October 2016.
In the previous post in this series, I showed the most basic way in which a portion of your graph can be exposed as RDF: identifying a node by ID, or by URI if your data was imported from an RDF dataset. In this one, I’ll explore a more interesting way: running Cypher queries and serialising the resulting subgraph as RDF. Continue reading “Neo4j is your RDF store (part 2)”
If you want to understand the differences and similarities between RDF and the Labeled Property Graph implemented by Neo4j, I’d recommend you watch this talk I gave at Graph Connect San Francisco in October 2016.
Let me start with some basics: RDF is a standard for data exchange, but it does not impose any particular way of storing data.
What do I mean by that? I mean that data can be persisted in many ways: tables, documents, key-value pairs, property graphs, triple graphs… and still be published/exchanged as RDF. Continue reading “Neo4j is your RDF store (part 1)”
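As a rough illustration of that point, the sketch below takes data held in a property-graph-like shape (plain Python dicts and tuples) and publishes it as N-Triples lines. The `http://example.org/` namespace, the sample nodes and the function name `to_ntriples` are assumptions made for the example; this is not how Neo4j or any particular library does it.

```python
# Hypothetical base namespace used to mint URIs for nodes and terms.
BASE = "http://example.org/"

def to_ntriples(nodes, relationships):
    """Serialise nodes (id -> properties) and relationships
    (start id, type, end id) as a list of N-Triples lines."""
    lines = []
    for node_id, props in nodes.items():
        subject = f"<{BASE}resource/{node_id}>"
        for key, value in props.items():
            # Escape backslashes and double quotes inside the literal.
            literal = str(value).replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'{subject} <{BASE}vocab/{key}> "{literal}" .')
    for start, rel_type, end in relationships:
        lines.append(
            f"<{BASE}resource/{start}> <{BASE}vocab/{rel_type}> "
            f"<{BASE}resource/{end}> ."
        )
    return lines

# Made-up sample data stored as a tiny property graph.
nodes = {1: {"name": "Alice"}, 2: {"name": "Bob"}}
relationships = [(1, "KNOWS", 2)]

triples = to_ntriples(nodes, relationships)
print("\n".join(triples))
```

The storage shape (dicts here, tables or documents elsewhere) is independent of the exchange format: only the serialisation step knows about RDF.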
For this example I am going to use my browser history data. Most browsers store this data in SQLite. This means relational data, easy to access from Neo4j using the apoc.load.jdbc stored procedure. Continue reading “QuickGraph#4 Explore your browser history in Neo4j”
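The post itself reads the data from Neo4j with apoc.load.jdbc; the Python sketch below only illustrates the relational shape of such data and how rows map naturally to nodes and relationships. The `urls` table, its columns and the sample rows are stand-ins invented for the example (real browser schemas differ per browser), and `IN_DOMAIN` is a hypothetical relationship type.

```python
import sqlite3
from urllib.parse import urlparse

# In-memory stand-in for a browser history file (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE urls ("
    "id INTEGER PRIMARY KEY, url TEXT, title TEXT, visit_count INTEGER)"
)
conn.executemany(
    "INSERT INTO urls (url, title, visit_count) VALUES (?, ?, ?)",
    [
        ("https://neo4j.com/docs", "Docs", 12),
        ("https://example.org/a", "A", 3),
        ("https://neo4j.com/blog", "Blog", 7),
    ],
)

# Plain SQL over the relational data, most visited pages first.
rows = conn.execute(
    "SELECT url, title, visit_count FROM urls ORDER BY visit_count DESC"
).fetchall()

# Each row becomes a page node linked to the host it belongs to.
edges = [(url, "IN_DOMAIN", urlparse(url).netloc) for url, _, _ in rows]
hosts = {host for _, _, host in edges}

for edge in edges:
    print(edge)
conn.close()
```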
For this example, I am going to use a sample movie dataset from the Cayley project. It’s a set of half a million triples about actors, directors and movies that can be downloaded here. Continue reading “QuickGraph#3 A step-by-step example of RDF to Property Graph transformation”
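For readers who want a feel for the transformation before opening the post, here is a minimal sketch of one common mapping: triples whose object is a literal become node properties, and triples whose object is a resource become relationships. This is an assumption about the general approach, not a reproduction of the post’s steps, and the `ex:` identifiers and sample triples are made up.

```python
def triples_to_property_graph(triples):
    """Convert (subject, predicate, object, is_literal) tuples into
    a dict of nodes (uri -> properties) and a list of relationships."""
    nodes = {}          # uri -> properties dict
    relationships = []  # (subject uri, predicate, object uri)
    for subject, predicate, obj, is_literal in triples:
        nodes.setdefault(subject, {})
        if is_literal:
            # Literal object: store it as a property on the subject node.
            nodes[subject][predicate] = obj
        else:
            # Resource object: make sure the node exists, then link to it.
            nodes.setdefault(obj, {})
            relationships.append((subject, predicate, obj))
    return nodes, relationships

# Hypothetical triples; the last field flags literal objects.
sample = [
    ("ex:movie1", "ex:name", "Some Movie", True),
    ("ex:person1", "ex:name", "Some Actor", True),
    ("ex:movie1", "ex:starring", "ex:person1", False),
]

nodes, relationships = triples_to_property_graph(sample)
print(nodes)
print(relationships)
```

Three triples collapse into two nodes with one property each plus one relationship, which is why the property graph version of a dataset typically has far fewer elements than the triple count suggests.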
For this QuickGraph I’ll use data about Wikipedia Categories. You may have noticed at the bottom of every Wikipedia article a section listing the categories it’s classified under. Every Wikipedia article will have at least one category, and categories branch into subcategories forming overlapping trees. A category can be a subcategory of more than one parent category (the Wikipedia hierarchy is an example of this), so the hierarchy is effectively a graph rather than a tree. Continue reading “QuickGraph#2 How is Wikipedia’s knowledge organised”
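A small sketch shows why multiple parents turn the hierarchy into a graph: walking upwards from one category fans out along several branches instead of following a single path to the root. The category names below are invented for illustration and are not taken from the actual Wikipedia data.

```python
# Hypothetical subcategory -> parent categories mapping.
parents = {
    "Graph databases": ["Databases", "Graph theory"],
    "Databases": ["Computing"],
    "Graph theory": ["Mathematics"],
}

def ancestors(category, parents):
    """Return every category reachable by following parent links upwards."""
    seen = set()
    stack = [category]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, []):
            if parent not in seen:  # also guards against cycles
                seen.add(parent)
                stack.append(parent)
    return seen

result = ancestors("Graph databases", parents)
print(sorted(result))
```

In a strict tree the result would be a single chain; here the two parents lead to two separate top-level categories.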
The first of a series of quick graphs in Neo4j built from public data. Watch this space! I’ll analyse a dataset on European politics by building a graph and querying across a number of dimensions. Continue reading “QuickGraph #1 European Politics from DBpedia. Loading data from an RDF triple store into Neo4j via SPARQL”