If you’re new to RDF/LPG, here is a good introduction to the differences between the two types of graphs.

For the last post in this series, I will work with a larger public RDF dataset in Neo4j. We’ve already seen a few times that importing an RDF dataset into Neo4j is easy, so in this post I will focus on what I think is the more interesting part: what comes after the data import. Here are some highlights:

Applying transformations to the imported RDF graph so that it benefits from the LPG modelling capabilities, and enriching the graph with complementary data sources.
Querying the graph to do complex path analysis, and using graph patterns to profile your dataset and detect data quality issues such as data duplication.
Integrating Neo4j with standard BI tools to build nice charts on the output of Cypher queries on your graph.
Building an RDF API on top of your Neo4j graph.

All the code I’ll use is available on GitHub. Enjoy!

Continue reading “Neo4j is your RDF store (part 3) : Thomson Reuters’ OpenPermID” →

The scenario

Retail banking: your graph-based fraud detection system, powered by Neo4j, is used as part of the controls that run when line-of-credit applications are processed or when accounts are provisioned. Its job is to block, or at least to flag, potentially fraudulent submissions as they come into your systems. It also sends alarms to fraud operations analysts whenever unusual patterns are detected in the graph, so they can be individually investigated ASAP.

This is all working great, but you want other analysts in your organisation to benefit from the super rich insights that your graph database can deliver: people whose job is not to react on the spot to individual fraud threats but rather to understand the bigger picture. They are probably more strategic business analysts, and maybe some data scientists doing predictive analysis too. They will typically want to look at fraud patterns globally rather than individually, and to combine the information in your fraud detection graph with other data sources (external to the graph) for reporting purposes, to get new insights, or even to ‘learn’ new patterns by running algorithms or applying ML techniques.

In this post I’ll describe, through an example, how Data Virtualization can be used to integrate your Neo4j graph with other data sources, providing a single unified view that is easy to consume by standard analytical/BI tools.

Continue reading “Graph DB + Data Virtualization = Live dashboard for fraud analysis” →