QuickGraph#9 The fashion Knowledge Graph. Inferencing with Ontologies in Neo4j

Last winter I had the opportunity to meet Katariina Kari at a Neo4j event in Helsinki. We had a conversation about graphs, RDF, LPG… we agreed on some things… and disagreed on others 🙂 but I remember telling her how interesting I had found a post she had published on how they were using Ontologies to drive semantic searches on the Zalando web site.

I’ll use her example from that post and show how you can implement semantic searches/recommendations in Neo4j and leverage existing Ontologies (public standards or your own). That’s what this QuickGraph is about.

I assume you have some level of familiarity with RDF and semantic technologies.

The dataset

Let’s start with the data we’ll use. We need a set of clothes and accessories from any online retailer. For that, I wrote a quick python script to ‘harvest’ some data from the NEXT UK web site. All web sites of fashion retailers are pretty similar in structure, so it should be relatively straightforward to adapt my script to your favourite online store.

The data extracted is a listing of products from the retailer’s catalogue in the form of a simple table (CSV file). For each item, we have information like the name, the URL, the brand, the department, the category and the materials it’s made of. You can have a look at a snippet of the file here. We’ll look at how to import it into Neo4j further down.

The Clothing Materials Ontology

The Clothing Materials Ontology describes a hierarchy of materials commonly used in the clothing industry. It’s described using W3C’s OWL and it defines a multilingual taxonomy of materials. I’ve developed it using WebProtege (see screen capture below).

Screenshot 2019-11-13 at 10.25.40.png

We’re going to use this Ontology to enable multilingual semantic searches on the product catalogue. And by ‘semantic’ we mean using the explicit knowledge it contains. In this case, this knowledge is the hierarchical classification of categories.

Loading the Ontology into Neo4j

The first thing we’ll do is load the ontology from its public URL. We’ll use the neosemantics plugin to do this.

Before we start using the plugin we need to create a constraint and initialise the Graph. Nothing to worry about, just two simple instructions. Check the n10s manual for more details on this.

CREATE CONSTRAINT n10s_unique_uri ON (r:Resource) ASSERT r.uri IS UNIQUE;

CALL n10s.graphconfig.init({ handleVocabUris: "IGNORE", keepLangTag: true, handleMultival: "ARRAY"});

Once this is done, we can import the ontology with a simple procedure call (n10s.onto.import.fetch) taking the URL of the ontology and the serialisation format as parameters:

CALL n10s.onto.import.fetch(
    "http://www.nsmntx.org/2019/10/clothingMaterials",
    "Turtle");

This procedure loads the Ontology into Neo4j, creating a Class node for each owl:Class definition and connecting nodes related through rdfs:subClassOf with :SCO relationships.  Here’s a capture of the resulting taxonomy:

Screenshot 2019-11-13 at 11.47.12.png

The categories (classes) defined in the Ontology are described in multiple languages and we want to keep all these details in Neo4j when we run the import.

The config parameters that we need to set in the graph config procedure are { keepLangTag: true, handleMultival: 'ARRAY'}, and they specify that we want to keep all values of literal properties along with their language tags. More details on importing Ontologies and RDF are available in the Neosemantics documentation.

Let’s look at an example of the result. Below is a fragment of the Ontology containing the definition of the Leather category and next to it, how it’s loaded into Neo4j. We can see that the multiple values for the rdfs:label property are stored in an array in Neo4j keeping the language tag annotation.

Screenshot 2019-11-13 at 11.40.12

All the details on how to use the n10s.onto.import.fetch procedure (and many others) can be found in the neosemantics documentation.

Using the utility functions in Neosemantics for handling language tags, we can now get all synthetic materials in our ontology in any of the available languages:

UNWIND ['es','en','fr'] AS lang
MATCH (:Class { name: 'SyntheticFibre'})<-[:SCO*]-(material)
RETURN lang, COLLECT(n10s.rdf.getLangValue(lang,material.label)) as syntheticMaterials

Screenshot 2019-11-25 at 12.32.03
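To make the language-tag handling a bit more tangible, here is a rough Python analogue of what a lookup like getLangValue does over an array of tagged labels. This is only a sketch and not the plugin's implementation: the helper names (split_lang_tag, get_lang_value) are hypothetical, and it assumes labels are stored as strings of the form value@lang, as in the Leather example.

```python
from typing import List, Optional, Tuple


def split_lang_tag(literal: str) -> Tuple[str, Optional[str]]:
    """Split 'Cuir@fr' into ('Cuir', 'fr'); untagged values get tag None."""
    value, sep, tag = literal.rpartition("@")
    # Only treat the suffix as a tag if it looks like one (e.g. 'en', 'en-GB').
    if sep and tag.replace("-", "").isalnum():
        return value, tag
    return literal, None


def get_lang_value(lang: str, values: List[str]) -> Optional[str]:
    """Return the first value tagged with `lang`, or None if there is none."""
    for literal in values:
        value, tag = split_lang_tag(literal)
        if tag == lang:
            return value
    return None


# Illustrative label array for the Leather class (sample values, assumed format).
labels = ["Leather@en", "Cuero@es", "Cuir@fr"]
french = get_lang_value("fr", labels)
german = get_lang_value("de", labels)
```

With the sample array above, the French lookup picks the value tagged fr, while a language that is not present yields nothing.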

Loading the data into Neo4j

Once we are done importing the Ontology, we’ll go ahead and load the product catalogue we described earlier. This is trivial using LOAD CSV:

LOAD CSV WITH HEADERS FROM "file:///next_products.csv"  AS row
MERGE (b:Brand { brandName : row.brandName })
MERGE (dep:Department { deptName: row.itemDepartmemnt })
MERGE (cat:Category { catName: row.itemCategory })
MERGE (i:Item { itemId: row.itemId }) 
      ON CREATE SET i.itemName = row.itemName, 
                    i.composition = row.itemComposition, 
                    i.url = row.url
MERGE (i)-[:IN_CAT]->(cat)
MERGE (i)-[:IN_DEPT]->(dep)
MERGE (i)-[:BY]->(b)

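Once loaded, we need to link the catalogue items to the Ontology describing the materials. This Cypher fragment does the job by matching every class label (in any language, lower-cased and without its language tag) against each item’s composition text: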

MATCH (c:Class) UNWIND c.label as langLabel
WITH collect( {key: toLower(n10s.rdf.getValue(langLabel)), classNode: c }) as termToClassMap
MATCH (i:Item)
FOREACH (material IN [x in termToClassMap where toLower(i.composition) contains x.key | x.classNode ] | MERGE (i)-[:CONTAINS]->(material))

For this post, I’m working with a small catalogue of 12 thousand products, but if yours is in the millions you may want to batch this step using, for example, APOC’s periodic.commit procedure.
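If it helps to see the linking step outside of Cypher, the same idea can be sketched in a few lines of Python. This is an illustration under assumptions, not the code used for the post: the function names are hypothetical, the labels and the composition string are made-up samples, and labels are assumed to carry their language tag as a value@lang suffix.

```python
from typing import Dict, List, Set, Tuple


def build_terms(class_labels: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Flatten {class: [tagged labels]} into (lower-cased term, class) pairs."""
    terms = []
    for class_name, labels in class_labels.items():
        for label in labels:
            # Drop the '@lang' suffix if present, keep the bare value otherwise.
            value = label.rpartition("@")[0] or label
            terms.append((value.lower(), class_name))
    return terms


def materials_for(composition: str, terms: List[Tuple[str, str]]) -> Set[str]:
    """Return every class whose label occurs in the composition text."""
    text = composition.lower()
    return {class_name for term, class_name in terms if term in text}


# Made-up sample of two ontology classes and their multilingual labels.
class_labels = {
    "Leather": ["Leather@en", "Cuero@es", "Cuir@fr"],
    "Polyester": ["Polyester@en", "Poliéster@es"],
}
terms = build_terms(class_labels)
linked = materials_for("Upper: 80% Leather, 20% Polyester", terms)
```

Like the Cypher version, this is plain substring matching, so a short label can also match inside an unrelated word; that is worth keeping in mind when checking the resulting CONTAINS links.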

The model combining the product catalogue linked to the ontology looks something like this:

Screenshot 2019-11-13 at 10.18.10

Querying the graph

A simple Cypher query on the graph could get us the products of a given category for a given brand… for instance, fleeces by Columbia.

MATCH (:Category { catName: "Fleeces"})<-[:IN_CAT]-(i:Item)-[:BY]->(:Brand { brandName: "Columbia"}) 
RETURN i.itemId as id, i.itemName as name, i.url as url, i.composition as composition

Screenshot 2019-11-13 at 23.17.59

Or we can similarly get the brands producing a certain type of product. The following query shows the top five vendors with the largest range of hoodies.

MATCH (:Category { catName: "Hoodies"})<-[:IN_CAT]-(i:Item)-[:BY]->(b:Brand) 
RETURN b.brandName as brand, count(i) as productCount ORDER BY productCount DESC LIMIT 5

Screenshot 2019-11-13 at 23.18.14

That was easy. Now let’s look at some more interesting queries involving inferencing using the Ontology.

Ontology-driven inferences

Let’s say we want to write a query that lists all leather products. But we want our query to be intelligent, “semantic”. We don’t want it to just return those products that contain the term “leather” in their name; that would be too easy: ...(i:Item) WHERE toLower(i.itemName) contains "leather" ....

Returning only those that explicitly declare themselves as being composed (totally or partially) of leather would be too easy as well; we would just need to use the pattern ...(x)-[:CONTAINS]->(:Class { name: "Leather"})....

What we want is for our query to also return products composed of materials that are variants of leather, like nubuck, suede… or any other. We don’t want to have to enumerate them, because they are dynamic and will change over time, and because that’s exactly the knowledge that the Ontology contains.

In an ontology, we express this notion using the rdfs:subClassOf statement, so when we state that A rdfs:subClassOf B we mean that any ‘thing’ of type A is also of type B. The ClothingMaterials Ontology states that :Suede rdfs:subClassOf :Leather, which can be read as: things made up of suede are also made up of leather because suede is a specific type of leather. This also means that suede products or nubuck products are semantically valid search results when we query for leather products. And let’s remember that semantically here means whatever is explicitly stated in the Ontology. There’s no black magic.
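The reasoning in the previous paragraph is small enough to sketch in plain Python. The snippet below is only an illustration of the subClassOf logic, not how Neosemantics implements it: the function names and the three sample products are hypothetical, and the tiny hierarchy just encodes the suede and nubuck examples mentioned above.

```python
from typing import Dict, List, Set

# child class -> direct parent classes (the rdfs:subClassOf statements)
SUBCLASS_OF: Dict[str, Set[str]] = {
    "Suede": {"Leather"},
    "Nubuck": {"Leather"},
}

# Hypothetical products and the material classes they are linked to.
CONTAINS: Dict[str, Set[str]] = {
    "leather belt": {"Leather"},
    "suede boots": {"Suede"},
    "cotton shirt": {"Cotton"},
}


def subclasses_of(category: str, subclass_of: Dict[str, Set[str]]) -> Set[str]:
    """Return the category plus every class that reaches it via subClassOf."""
    found = {category}
    frontier = [category]
    while frontier:
        current = frontier.pop()
        for child, parents in subclass_of.items():
            if current in parents and child not in found:
                found.add(child)
                frontier.append(child)
    return found


def items_in_category(category: str,
                      subclass_of: Dict[str, Set[str]],
                      contains: Dict[str, Set[str]]) -> List[str]:
    """Items linked to the category itself or to any of its subclasses."""
    accepted = subclasses_of(category, subclass_of)
    return sorted(item for item, materials in contains.items()
                  if materials & accepted)


leather_items = items_in_category("Leather", SUBCLASS_OF, CONTAINS)
```

The suede boots end up in the result even though they are never linked to Leather directly, which is exactly the behaviour we want from the semantic search.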

Here is what a semantic search on leather items looks like:

MATCH (leather:Class { name: "Leather"})
CALL n10s.inference.nodesInCategory(leather, { inCatRel: "CONTAINS" }) yield node AS product
WITH product MATCH (product)-[:BY]->(b:Brand)
return product.itemName AS product, b.brandName AS brand, product.composition AS composition

We first get all leather products and then we get their brand to produce the output. We are using the n10s.inference.nodesInCategory procedure, which operates exactly as described above: it returns all nodes explicitly or implicitly in a given category by leveraging the explicit semantics in an Ontology. The procedure needs to be configured to work with our model, hence the additional inCatRel parameter, which names the relationship (CONTAINS) that links items to categories. A detailed description of the procedure, how to configure it and additional examples are available in the neosemantics manual.

The query returns a few hundred products: some explicitly described as composed of leather, others where we’ve derived the fact that they’re made of leather because their components are subcategories of leather. The following image shows some examples, along with the fragment of the Ontology used to derive the fact that they contain leather.

Screenshot 2019-11-14 at 00.04.59

The interesting thing is that this query will keep working as we add products to our catalogue that contain, for instance, nappa or kidskin, and will return them as leather products too. All we’ll need to do is extend the definition of the concept leather in the Ontology by adding the new subcategories as they appear.

What other things can we do?

Intelligent recommendations

Let’s say we want to make our semantic search system aware of specific categories that are relevant to our customers. I heard about vegan clothes for the first time when reading Katariina Kari’s blog post: vegan clothes are those that contain no animal-based materials. Let’s use this example and create our own Ontology with a custom set of categories, for instance, “animal-based material”. Then let’s see how we can use it to enhance the search results.

In the ClothingMaterials Ontology we find a number of materials that are animal-based, namely silk, wool and leather. We can define a new custom category called AnimalBasedMaterial that groups them together. Here’s how:

Screenshot 2019-11-22 at 15.54.33

The complete extended ontology is available on GitHub with the rest of the elements used in this post.

We can import the fragment above with the following cypher fragment:

WITH '@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix clmat: <http://www.nsmntx.org/2019/10/clothingMaterials#> .
@prefix ccat: <http://www.nsmntx.org/customCats#> .

ccat:AnimalBasedMaterial
a owl:Class ;
rdfs:label "Animal-based material", "Materiales de origen animal"@es, "matière d\'origine animale"@fr .

clmat:Leather rdfs:subClassOf ccat:AnimalBasedMaterial .

clmat:Silk rdfs:subClassOf ccat:AnimalBasedMaterial .

clmat:Wool rdfs:subClassOf ccat:AnimalBasedMaterial .
' 
AS onto
CALL n10s.onto.import.inline(onto,"Turtle") YIELD terminationStatus, triplesLoaded 
RETURN terminationStatus, triplesLoaded

Note that in this case, we are passing the Ontology fragment as payload instead of giving a URL like we did before with the ClothingMaterials Ontology. It is also worth mentioning that we could have extended the ontology directly with Cypher, without the need to formalise it in OWL.

We can visualise what we are doing as adding a custom layer to our knowledge graph on top of the ClothingMaterials ontology.

Screenshot 2019-11-22 at 22.59.46.png

We can now return results of a search on trainers by Converse, Nike or New Balance that are vegan:

MATCH (:Category {catName:"Trainers"})<-[:IN_CAT]-(item:Item)-[:BY]->(b:Brand), (ab:Class { name: "AnimalBasedMaterial"})
WHERE b.brandName IN ["Converse","New Balance","Nike"]
AND NOT n10s.inference.inCategory(item,ab,{ inCatRel: "CONTAINS" })
RETURN item.url, item.itemName, item.composition

No trace of products with animal-based materials in their composition. Vegan-friendly result list thanks to the inferencing run on the fly based on our Ontology. Neat!

Screenshot 2019-11-25 at 09.41.29.png
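Here is the same vegan filter sketched in plain Python, this time walking up the hierarchy from each material. Again this is just an illustration under assumptions and not the plugin's code: the helper names and the three sample trainers are hypothetical, and the hierarchy only contains the statements from the custom fragment above plus the suede example.

```python
from typing import Dict, Iterable, Set

# child class -> direct parent classes, including the custom layer
SUBCLASS_OF: Dict[str, Set[str]] = {
    "Suede": {"Leather"},
    "Leather": {"AnimalBasedMaterial"},
    "Silk": {"AnimalBasedMaterial"},
    "Wool": {"AnimalBasedMaterial"},
}


def ancestors(cls: str, subclass_of: Dict[str, Set[str]]) -> Set[str]:
    """All classes reachable from `cls` by following subClassOf upwards."""
    seen: Set[str] = set()
    stack = [cls]
    while stack:
        for parent in subclass_of.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def in_category(materials: Iterable[str], category: str,
                subclass_of: Dict[str, Set[str]]) -> bool:
    """True if any material is the category or one of its subclasses."""
    return any(m == category or category in ancestors(m, subclass_of)
               for m in materials)


# Hypothetical products and the material classes they are linked to.
catalogue = {
    "canvas trainer": {"Cotton", "Rubber"},
    "suede trainer": {"Suede", "Rubber"},
    "wool-lined trainer": {"Wool", "Polyester"},
}
vegan = sorted(name for name, materials in catalogue.items()
               if not in_category(materials, "AnimalBasedMaterial", SUBCLASS_OF))
```

Note that the suede trainer is excluded through two hops (Suede, then Leather, then AnimalBasedMaterial), without the product data ever mentioning the custom category.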

Again, no change to the dataset, no reprocessing of your data to annotate it according to the new concepts, all we need to do is make our semantics explicit in the ontology and have our generic reasoners derive new facts based on these explicit semantics.

Extending the ontology with custom categories

You’ll find other definitions in the custom Ontology, such as seasonal categories like ‘WinterMaterials’ or ‘Summer materials’. You can try to write queries that use them or, even better, create your own custom categories.

What’s interesting about this QuickGraph?

In this QuickGraph, we have shown how to leverage existing Ontologies in Neo4j, using them to run powerful inferences that can enhance semantic search engines and smart recommendations.

We have shown how to encode knowledge in an Ontology and combine it with other knowledge fragments, and how the inferencing procedures in Neosemantics use this explicit knowledge in queries to easily add powerful semantic capabilities to search and recommendation tasks.

When your data is connected to, or described in terms of an Ontology, you can modify the behaviour of your queries without having to annotate/update every data point in your graph. All you need to do is add explicit semantic definitions of your data.

Give it a try (all the code is on GitHub), leave your feedback, and see you in the next QuickGraph!

4 thoughts on “QuickGraph#9 The fashion Knowledge Graph. Inferencing with Ontologies in Neo4j

  1. Hi, Jesús
    I ran into a problem when loading the ontology into Neo4j.
    I’m not sure why I only got the object below, and not the multiple languages like yours.
    {"name":"Leather","uri":"http://www.nsmntx.org/2019/10/clothingMaterials#Leather","label":"Cuir"}

    Thank you for your article. 🙂


    1. Hi William, looks like you have not set the parameters to keep all values and their language tags
      { keepLangTag: true, handleMultival: 'ARRAY'}

      You can find the code here:
      https://github.com/jbarrasa/neosemantics-python-examples/blob/master/inferencing/cypher/data_load.cypher#L8

      Also, this kind of comment is probably better raised as an issue in the GitHub repo: https://github.com/jbarrasa/neosemantics-python-examples/issues or in the community portal: https://community.neo4j.com/c/integrations/linked-data-rdf-ontology

      Thanks,

      JB.
