QuickGraph#10 Enrich your Neo4j Knowledge Graph by querying Wikidata

Wikidata is a collaboratively edited knowledge base. It is a source of open data that you may want to use in your projects. Wikidata offers a query service for integrations. In this QuickGraph, I will show how to use the Wikidata Query Service to get data into Neo4j. Continue reading “QuickGraph#10 Enrich your Neo4j Knowledge Graph by querying Wikidata” →

QuickGraph#9 The fashion Knowledge Graph. Inferencing with Ontologies in Neo4j

Last winter I had the opportunity to meet Katariina Kari at a Neo4j event in Helsinki. We had a conversation about graphs, RDF, LPG… we agreed on some things… and disagreed on others 🙂 but I remember telling her that I had found very interesting a post she had published on how they were using Ontologies to drive semantic searches on the Zalando web site.

I’ll use her example from that post and show how you can implement semantic searches/recommendations in Neo4j and leverage existing Ontologies (public standards or your own). That’s what this QuickGraph is about.

I assume you have some level of familiarity with RDF and semantic technologies. Continue reading “QuickGraph#9 The fashion Knowledge Graph. Inferencing with Ontologies in Neo4j” →

QuickGraph#8 Cloning subgraphs between Neo4j instances with Cypher+RDF

I have two Neo4j instances: let’s call them instance-one and instance-two. My problem is simple, I want an easy way to copy fragments of the graph stored in instance-one to instance-two. In this post, I’ll explain here how to use:

Cypher to define the subgraph to be cloned and
RDF as the model for data exchange (serialisation format)

All with the help of the neosemantics plugin. Continue reading “QuickGraph#8 Cloning subgraphs between Neo4j instances with Cypher+RDF” →

QuickGraph#7 Creating a schema.org linked data endpoint on Neo4j

In this instalment of the QuickGraph series, I’ll show how to map a graph stored in Neo4j to an ontology (or schema, or vocabulary…) using the neosemantics extension. Continue reading “QuickGraph#7 Creating a schema.org linked data endpoint on Neo4j” →

Neo4j is your RDF store (part 3) : Thomson Reuters’ OpenPermID

If you’re new to RDF/LPG, here is a good introduction to the differences between both types of graphs.

For the last post in this series, I will work with a larger public RDF dataset in Neo4j. We’ve already seen a few times that importing an RDF dataset into Neo4j is easy, so what I will focus on in this post is what I think is the more interesting part, which is what comes after the data import, here are some highlights:

Applying transformations to the imported RDF graph to make it benefit from the LPG modelling capabilities and enriching the graph with additional complementary data sources.
Querying the graph to do complex path analysis and use graph patterns to detect data quality issues like data duplication and also to profile your dataset
Integrate Neo4j with standard BI tools to build nice charts on the output of Cypher queries on your graph.
Building an RDF API on top of your Neo4j graph.

All the code I’ll use is available on GitHub. Enjoy!

Continue reading “Neo4j is your RDF store (part 3) : Thomson Reuters’ OpenPermID” →

QuickGraph#6 Building the Wikipedia Knowledge Graph in Neo4j (QG#2 revisited)

After last week’s Neo4j online meetup, I thought I’d revisit QuickGraph#2 and update it a bit to include a couple new things:

How to load not only categories but also pages (as in Wikipedia articles) and enrich the graph by querying DBpedia. In doing this I’ll describe some advanced usage of APOC procedures.
How to batch load the whole Wikipedia hierarchy of categories into Neo4j

Continue reading “QuickGraph#6 Building the Wikipedia Knowledge Graph in Neo4j (QG#2 revisited)” →

QuickGraph#5 Learning a taxonomy from your tagged data

The Objective

Say we have a dataset of multi-tagged items: books with multiple genres, articles with multiple topics, products with multiple categories… We want to organise logically these tags -the genres, the topics, the categories…- in a descriptive but also actionable way. A typical organisation will be hierarchical, like a taxonomy.

But rather than building it manually, we are going to learn it from the data in an automated way. This means that the quality of the results will totally depend on the quality and distribution of the tagging in your data, so sometimes we’ll produce a rich taxonomy but sometimes the data will only yield a set of rules describing how tags relate to each other.

Finally, we’ll want to show how this taxonomy can be used and I’ll do it with an example on content recommendation / enhanced search. Continue reading “QuickGraph#5 Learning a taxonomy from your tagged data” →

Neo4j is your RDF store (part 2)

As in previous posts, for those of you less familiar with the differences and similarities between RDF and the Property Graph, I recommend you watch this talk I gave at Graph Connect San Francisco in October 2016.

In the previous post on this series, I showed the most basic way in which a portion of your graph can be exposed as RDF. That was identifying a node by ID or URI if your data was imported from an RDF dataset. In this one, I’ll explore a more interesting way by running Cypher queries and serialising the resulting subgraph as RDF. Continue reading “Neo4j is your RDF store (part 2)” →

Graph DB + Data Virtualization = Live dashboard for fraud analysis

The scenario

Retail banking: Your graph-based fraud detection system powered by Neo4j is being used as part of the controls run when processing line of credit applications or when accounts are provisioned. It’s job is to block -or at least to flag- potentially fraudulent submissions as they come into your systems. It’s also sending alarms to fraud operations analysts whenever unusual patterns are detected in the graph so they can be individually investigated ASAP.

This is all working great but you want other analysts in your organisation to benefit from the super rich insights that your graph database can deliver, people whose job is not to react on the spot to individual fraud threats but rather understand the bigger picture. They are probably more strategic business analysts, maybe some data scientists doing predictive analysis too and they will typically want to look at fraud patterns globally rather than individually, combine the information in your fraud detection graph with other datasources (external to the graph) for reporting purposes, to get new insights, or even to ‘learn’ new patterns by running algorithms or applying ML techniques.

In this post I’ll describe through an example how Data Virtualization can be used to integrate your Neo4j graph with other data sources providing a single unified view easy to consume by standard analytical/BI tools. Continue reading “Graph DB + Data Virtualization = Live dashboard for fraud analysis” →

Neo4j is your RDF store (part 1)

If you want to understand the differences and similarities between RDF and the Labeled Property Graph implemented by Neo4j, I’d recommend you watch this talk I gave at Graph Connect San Francisco in October 2016.

Intro

Let me start with some basics: RDF is a standard for data exchange, but it does not impose any particular way of storing data.

What do I mean by that? I mean that data can be persisted in many ways: tables, documents, key-value pairs, property graphs, triple graphs… and still be published/exchanged as RDF. Continue reading “Neo4j is your RDF store (part 1)” →