A few months ago I gave a presentation at the Connections: Life Sciences & Healthcare virtual event. It was about building a Knowledge Graph using public RDF resources. You can watch the recording here or even reproduce the whole session by following the instructions in this repository.
I went through the content again recently and found one particular bit of that session that was especially interesting and worth spending a QuickGraph on. I’m talking, of course, about the reconciliation of taxonomies. Let’s dive in.
Continue reading “QuickGraph#19 Taxonomy reconciliation”
In this post I’ll give you an overview of some similarity metrics I discovered when working with WordNet. Even though they were originally proposed as linguistic similarity metrics, I thought it would make sense to explore their behaviour if we generalise their use to a taxonomy-annotated dataset.
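To give a flavour of what such a metric looks like, here is a minimal sketch of path similarity, a measure commonly associated with WordNet, applied to a tiny taxonomy. Whether the full post uses this exact metric is an assumption; the `PARENT` mapping, its category names and the function names are hypothetical and only there for illustration.

```python
# Hypothetical toy taxonomy, child -> parent. Illustrative only,
# not the Wikipedia data used in the post.
PARENT = {
    "rover": "lander",
    "lander": "uncrewed spacecraft",
    "orbiter": "uncrewed spacecraft",
    "space probe": "uncrewed spacecraft",
    "uncrewed spacecraft": "spacecraft",
}


def ancestors(node):
    """Return [node, parent, grandparent, ...] up to the root."""
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def path_similarity(a, b):
    """1 / (1 + edges on the shortest path through a common ancestor)."""
    chain_a = ancestors(a)
    # Distance from b to each of its ancestors (b itself is at distance 0).
    dist_from_b = {n: i for i, n in enumerate(ancestors(b))}
    for dist_from_a, n in enumerate(chain_a):
        if n in dist_from_b:
            # First hit walking up from a is the lowest common ancestor.
            return 1 / (1 + dist_from_a + dist_from_b[n])
    return 0.0  # no common ancestor: the nodes sit in different trees


print(path_similarity("rover", "lander"))
print(path_similarity("rover", "orbiter"))
```

The sketch assumes every category has a single parent. In a taxonomy where a node can have several parents, the shortest path would need a proper graph search instead of the simple walk up the chain.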
I will use public data from Wikipedia, and what topic to choose in the week that Percy landed on Mars? None other than the rich domain of uncrewed spacecraft. Follow me!
Continue reading “QuickGraph#18 Semantic similarity metrics in taxonomies: A wikipedia example on uncrewed spacecraft”
Roughly a year and a half ago I posted QuickGraph#8 on how to copy all or part of your graph between Neo4j DBs by serialising it as RDF with Neosemantics. It concluded on a sad note though, something along these lines: “Relationship properties will be lost in this process because RDF does not allow the representation of properties in edges”. Well, now we have RDF-star and the problem is solved. This is a brief update to that post where I explain how to overcome that hurdle.
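As a rough illustration of the idea, the sketch below writes a relationship and its properties using the RDF-star quoted-triple syntax (`<< s p o >> p2 o2 .`). It is not the output of Neosemantics: the `edge_to_turtle_star` helper and the `ex:` names are hypothetical, and prefix declarations are left out, so the result is a fragment rather than a complete document.

```python
def edge_to_turtle_star(subj, pred, obj, props):
    """Serialise one edge plus its properties as Turtle-star lines.

    Hypothetical helper: handles only string and integer property values.
    """
    # The plain triple asserts the relationship itself.
    lines = [f"{subj} {pred} {obj} ."]
    for key, value in sorted(props.items()):
        literal = f'"{value}"' if isinstance(value, str) else str(value)
        # The quoted triple is the subject of the property statement.
        lines.append(f"<< {subj} {pred} {obj} >> {key} {literal} .")
    return "\n".join(lines)


print(edge_to_turtle_star("ex:alice", "ex:KNOWS", "ex:bob", {"ex:since": 2019}))
```

In plain RDF the second statement has nowhere to attach, which is why relationship properties were lost in the original approach.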
Continue reading “QuickGraph#8 revisited: LossLess graph copy between Neo4j DBs with RDF-star”
In this second post on WordNet on Neo4j I will be focusing on querying and analysing the graph that we created in the previous post. I’ll leave for a third instalment some more advanced analysis and maybe integrations with NLTK or RDF.
Remember that you can test all the examples in this post directly on the demo server. The access credentials are wordnet/wordnet (you’ll also need to select the database of the same name). I’ve also put the queries in a Colab Python notebook if you prefer to run them from there.
Let’s crack on.
Continue reading “QuickGraph#17 The English WordNet in Neo4j (part 2)”
English WordNet is a representation of the English language as a lexical network. It groups words into synsets and links them according to semantic relationships such as hypernymy, antonymy and meronymy. You can actually browse through its content on the English WordNet website. WordNet is often used in natural language processing (NLP) applications (but also many others) and provides deep lexical information about the English language as a graph. As a graph… that sounds interesting, definitely worth a QuickGraph.
Because this is a particularly rich case I’ll break it down into at least two instalments. In the first one I’ll explain the construction of the graph in Neo4j and in the second one I’ll show some interesting ways of using it. I hope you’ll enjoy it.
Continue reading “QuickGraph#16 The English WordNet in Neo4j (part 1)”
You’ve probably heard that there are billions of pages on the web that embed structured data describing products, events, people, organisations… One of the most popular mechanisms for doing this is JSON-LD which is one of the many ways of serialising triples. Since you’re here, I’m sure you know that triples form graphs and that I like exploring graphy things…
In this QuickGraph I’ll have a look at the brand new White House pages and use Neo4j and neosemantics to analyse the structured data they embed.
Continue reading “QuickGraph#15 Analysing the structured data embedded in web pages”
Neosemantics (n10s) has been supporting RDF* for a few months now (from release 4.1.0, Sep 2020). Around the time of the release we did a live coding session going over some of the new features, one of which was RDF*. I thought I’d put a couple of examples in a quick graph similar to the ones in the video session to make it easier for people to find and give it a try. This is what you’re reading right now.
Continue reading “QuickGraph#14 Using RDF* with Neo4j”
The TESEO database is an online repository containing the details of all PhD theses from Spanish universities. It offers an HTML form-based search interface where you can look up theses by author, topic, university, etc. As a UI it is rather painful to use and quite limited, I must say, but that’s another story. While we wait for an open data version of this public content we have to find workarounds to query and analyse it. This is what this QuickGraph is about.
Continue reading “QuickGraph#13 Using a SKOS taxonomy for semantic search on a document repository”
The UNESCO Thesaurus is a controlled and structured list of terms in the areas of education, culture, natural sciences, social and human sciences, communication and information. It’s used to annotate documents and publications like the ones in the UNESDOC digital library.
The Thesaurus is available as a multilingual SKOS concept scheme. At the time of writing, the available languages were English, Spanish, French, Russian and Arabic (download link).
Continue reading “QuickGraph#12 Working with a Multilingual Thesaurus”
It’s this time of the year… when heads of state address their nations with messages of hope and reflect on the past year and the challenges ahead. I was looking for a data set to do some text analysis and I thought this could be an interesting one. I collected a few Christmas messages from some of Europe’s heads of state (to be more precise, the English translations available).
Continue reading “QuickGraph#11 The Christmas messages graph”