The Europeana SPARQL endpoint
The Europeana SPARQL endpoint#
Europeana is an initiative of the European Union aimed at enhancing the digital access to the collections of more than 3000 cultural heritage institutions acress Europe. Over the past few years, more than 50 million objects have been accumulated from these institutions. On the platform, users can find detailed information about a wide variety of cultural heritage artefacts, including paintings, books, music, and videos and audio files. The metadata that have been collected can also be searched using SPARQL on Europeana’s SPARQL endpoint, available at http://sparql.europeana.eu/.
The cell below defines the SPARQL endpoint and creates a function to run queries against this endpoint.
from SPARQLWrapper import SPARQLWrapper, JSON
from IPython.core.display import display, HTML
import pandas as pd
import re
import ssl
try:
_create_unverified_https_context = ssl._create_unverified_context
except AttributeError:
pass
else:
ssl._create_default_https_context = _create_unverified_https_context
endpoint = 'http://sparql.europeana.eu/'
sparql = SPARQLWrapper(endpoint)
def run_query(sparql_query):
sparql.setQuery(sparql_query)
sparql.setReturnFormat(JSON)
data = sparql.query().convert()
df = pd.json_normalize(data['results']['bindings'])
return df
def print_results(df):
for i,row in df.iterrows():
for column in df.columns:
if re.search('value$' , column):
print( f'{column} => {row[column].strip()}' )
print('-------')
Before you start to work with the Europeana data, it is useful to familiarise yourself with the Europeana data model.
The model can be visualised as follows:
A central resource in the datamodel is edm:providedCHO
.
This edm:providedCHO
is part of two ore:Aggregations
. There is a local ore:Aggregation
, next to an edm:EuropeanaAggregation
. Europeana makes a distinction between the metadata that are supplied by the local data provider on the one hand, and the metadata that has been added by Europeana on the other.
The term ‘Aggregation’ is taken from the Object Reuse and Exchange vocabulary, which was developed to describe compound digital objects.
Both aggregations contain ore:Proxy
resources. These ore:Proxy
resources contain most of the descriptive metadata about the heritage objects. The ‘local’ Proxy (i.e. the ore:Proxy
resource connected to ore:Aggregation
) contains most of the metadata that have been supplied by the contributing instution. The ore:Proxy
resource connected to the edm:EuropeanaAggregation
contains some additional metadata.
More detailed information about the Europeana Data Model can be found in the Primer.
The explanation of this model may have seemed slightly complicated, but, to work with the Europeana data, you mainly need to know the predicates that are available for these various resources.
The ore:Proxy
in the ore:Aggregation
is described using the following properties:
dc:title
dc:creator
dc:relation
dc:type
dcterms:created
dcterms:extent
dc:publisher
dc:identifier
dc:created
dc:source
edm:type
dc:coverage
dc:description
dc:format
dc:language
dc:subject
dcterms:isPartOf
dcterms:spatial
You can use these predicates to search metadata harvested from the local data provider.
The query below selects the title, the type and the date of creation of a small list of ten objects.
query = '''
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX edm: <http://www.europeana.eu/schemas/edm/>
PREFIX ore: <http://www.openarchives.org/ore/terms/>
SELECT ?title ?type ?date
WHERE {
?object dc:title ?title .
?object dc:type ?type .
?object dcterms:created ?date .
}
ORDER BY ?title
LIMIT 10
'''
df = run_query(query)
The query shownabove searches in all the 50 million objects available in Europeana.
You can also search more specifically for certain types of objects. You can state that the objects should be of type ‘IMAGE’, for example. You can do this by making use of the dc:type property.
query = '''
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX edm: <http://www.europeana.eu/schemas/edm/>
PREFIX ore: <http://www.openarchives.org/ore/terms/>
SELECT ?title ?type ?date
WHERE {
?object dc:title ?title .
?object dc:date ?date .
?object dc:type ?type .
?object dc:identifier ?id
OPTIONAL{
?object dc:publisher ?pub .
?object dc:created ?cr . } .
?object dc:type 'IMAGE' .
}
'''
df = run_query(query)
It must be noted, however, that the values in the dc:type predicate have not been entered consistently by all institutions. As a results of this, you will not necessarily receive all images.
The ore:Proxy
object in the local ore:Aggregation has descriptive metadata supplied by the data provided, as was mentioned. Data about the data provider itself and about the rights associated with the object can be found in the ore:Aggregation
Resource this local Proxy belongs to. The ore:Proxy
is connected to the ore:Aggregation
via the ore:proxyIn
predicate.
The local ore:Aggregation
is described using the following properties:
edm:dataProvider
edm:rights
edm:isShownAt
edm:isShownBy
edm:object
edm:provider
query = '''
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX edm: <http://www.europeana.eu/schemas/edm/>
PREFIX ore: <http://www.openarchives.org/ore/terms/>
SELECT *
WHERE {
?object dc:title ?title .
?object dc:date ?date .
?object dc:type ?type .
?object dc:identifier ?id .
?object ore:proxyIn ?aggr .
?aggr edm:dataProvider ?institution .
?aggr edm:rights ?rights .
}
LIMIT 10
'''
df = run_query(query)
Once you have set up a connection between a local ore:Proxy
and a local ore:Aggregation
, you can search for objects contributed by a specific data provider, such as the Rijksmuseum.
query = """
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX edm: <http://www.europeana.eu/schemas/edm/>
PREFIX ore: <http://www.openarchives.org/ore/terms/>
PREFIX html: <http://www.w3.org/1999/xhtml/vocab#>
SELECT ?title ?object ?local_aggr
WHERE {
?object dc:title ?title .
?object ore:proxyIn ?local_aggr .
?local_aggr edm:dataProvider "Rijksmuseum" .
}
LIMIT 40
"""
df = run_query(query)
You can also request the links of the digital objects. This link is usually provided as an object followung the edm:isShownAt
predicate.
edm:isShownAt
mostly contains a link to the landing page describing the object.
query = """
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX edm: <http://www.europeana.eu/schemas/edm/>
PREFIX ore: <http://www.openarchives.org/ore/terms/>
PREFIX html: <http://www.w3.org/1999/xhtml/vocab#>
SELECT ?title ?url
WHERE {
?object dc:title ?title .
?object ore:proxyIn ?local_aggr .
?local_aggr edm:isShownAt ?url .
?local_aggr edm:dataProvider "Rijksmuseum" .
}
LIMIT 40
"""
df = run_query(query)
The ore:Proxy
resource that is connected to a ore:Aggregation
is also a proxy for an edm:aggregatedCHO
resource. This latter resource is also part of an edm:EuropeanaAggregation
. This resource can be described using the following predicates:
edm:collectionName
edm:country
edm:landingPage
edm:language
Once you have found the edm:EuropeanaAggregation
associated with an edm:aggregatedCHO
, you search for objects managed by a cultural heritage institution in a specific country, or for objects in a specific langauge.
The SPARQL query below searches for images supplied by an institution located in Belgium in the Dutch language.
query = """
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX edm: <http://www.europeana.eu/schemas/edm/>
PREFIX ore: <http://www.openarchives.org/ore/terms/>
PREFIX html: <http://www.w3.org/1999/xhtml/vocab#>
SELECT ?title ?creator ?img_url ?eur_aggr ?local_aggr
WHERE {
?object edm:type "IMAGE" .
?object dc:title ?title .
?object dc:creator ?creator .
?object ore:proxyIn ?local_aggr .
?object ore:proxyFor ?cho .
?eur_aggr edm:aggregatedCHO ?cho .
?eur_aggr a edm:EuropeanaAggregation .
?local_aggr edm:provider ?glam_inst .
?eur_aggr edm:country 'belgium' .
?eur_aggr edm:language ?lang .
?eur_aggr edm:language 'nl' .
OPTIONAL{ ?local_aggr edm:object ?img_url . } .
}
"""
df = run_query(query)