1 Introduction
The CIDOC Conceptual Reference Model (CRM) is a conceptual data model used in the cultural heritage domain. The Resource Description Framework (RDF) is a graph-based data format. Both CRM and RDF have been created independently for integration of information. RDF is a good fit to express CRM in data and it has been used to do so. The expression of CRM in RDF is not trivial though, so some guidelines are needed. This document provides an application profile to best use CRM in RDF for integration with other RDF data. The primary use case is data integration into the Knowledge Graph of NFDI4Objects.
This document is still in an early draft. Contributions and feedback are very welcome! The document sources are managed in a git repository at https://github.com/nfdi4objects/crm-rdf-ap.
Expressing CRM and RDF
CRM defines abstract types of entities (CRM classes) such as events, measurements, places, and actors with relationship types (CRM properties) to connect instances of these entity types. RDF and its most common extensions define how to identify entities (resources), entity types (RDF classes) and relationship types (RDF properties) with URIs and values with Unicode strings (RDF literals) optionally having a language or a data type to encode values such as numbers and dates. RDF is used with ontologies that define RDF classes, properties, and constraints. CRM looks like an ontology or like it could directly be mapped to an RDF ontology, but this is not the case. CRM is agnostic to data formats: CRM classes are not RDF classes and CRM has no concept of data types and values, so any expression of CRM in RDF comes with design choices. The same information modeled with CRM can be expressed in different forms of RDF, so data from different sources cannot be integrated without further agreement.
Some recommendations exist to express CRM in RDF (Doerr, Light, and Hiebel (2020)) and to combine it with other ontologies (e.g. Nys, Ruymbeke, and Billen (2018)).
2 Primitive values
E59 Primitive Value and its subclasses are not expressed as RDF classes. Instead
instances of E62 String are expressed as RDF literals with an optional language tag, and
instances of E60 Number are expressed as RDF literals with a numeric data type such as xsd:integer.
The CRM classes E61 Time Primitive, E94 Space Primitive, and E95 Spacetime Primitive are subclasses of both E59 Primitive Value and E41 Appellation, so the latter can be used when a mapping to established RDF data types is not applicable.
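The mapping of strings and numbers to RDF literals can be sketched in a few lines of Python. This is a minimal illustration, not part of the profile: the helper name `to_literal` is hypothetical, and the choice of xsd:decimal for non-integer numbers is an assumption made for this example.

```python
from decimal import Decimal


def escape(text):
    """Escape backslashes, double quotes and newlines for a Turtle string."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_literal(value, lang=None):
    """Serialize a Python value as a Turtle literal (hypothetical helper).

    E62 String -> plain literal with optional language tag
    E60 Number -> literal with a numeric XSD data type
    """
    if isinstance(value, bool):
        # bool is a subclass of int in Python, so reject it explicitly
        raise TypeError("booleans are not covered by this sketch")
    if isinstance(value, int):
        return f'"{value}"^^xsd:integer'
    if isinstance(value, Decimal):
        # format with "f" to avoid exponent notation, which xsd:decimal forbids
        return f'"{format(value, "f")}"^^xsd:decimal'
    if isinstance(value, str):
        literal = f'"{escape(value)}"'
        return f"{literal}@{lang}" if lang else literal
    raise TypeError(f"unsupported value type: {type(value).__name__}")
```

For instance, `to_literal("RMS Titanic", "en")` yields a language-tagged literal, while `to_literal(160)` yields an xsd:integer literal.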
Temporal data
Temporal values (instances of E61 Time Primitive and E52 Time-Span in CRM) SHOULD be expressed by RDF literals with one of the data types from XML Schema (XSD) Data Types or from Extended Date/Time Format (EDTF) listed in Table 1.
Datatype | Description | Example |
---|---|---|
xsd:date | Year, month, day, and optional time zone | 2010-12-17 |
xsd:time | Time with optional time zone | 13:20:00-05:00 |
xsd:dateTime | Full date, time, and optional time zone | 1912-04-15T02:38:00-05:18 |
xsd:dateTimeStamp | Full date, time, and mandatory time zone | 1912-04-15T02:38:00-05:18 |
xsd:gYear | Year and optional time zone | 2010 |
xsd:gYearMonth | Year, month, and optional time zone | 2010-12 |
xsd:gMonth | Month and optional time zone | --12 |
xsd:gMonthDay | Month, day, and optional time zone | --12-17 |
xsd:gDay | Day and optional time zone | ---17 |
edtf:EDTF | Complex temporal value in EDTF syntax | 2024~ |
edtf:EDTF-level0 | EDTF limited to level 0 features | 2010-12 |
edtf:EDTF-level1 | EDTF limited to level 0 and 1 features | 2010-12? |
edtf:EDTF-level2 | EDTF with all features up to level 2 | 15XX-?12 |
@prefix crm: <http://www.cidoc-crm.org/cidoc-crm/> .
@prefix edtf: <http://id.loc.gov/datatypes/edtf/> .
@prefix unit: <http://qudt.org/vocab/unit/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
<TitanicSinking> a crm:E81_Transformation ;
  crm:P124_transformed <RMSTitanic> ;
  crm:P123_resulted_in <TitanicWreck> ;
  crm:P4_has_time-span
    "1912-04-15"^^xsd:date ,                          # date only
    "1912-04-15T02:38:00-05:18"^^xsd:dateTimeStamp ,  # precise date and time with time zone
    "1912-04-15?"^^edtf:EDTF .                        # uncertain date with EDTF
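How a data provider might pick one of these data types for a given lexical value can be sketched as follows. The function name `guess_datatype` and the fallback to edtf:EDTF are assumptions of this example; the patterns cover only four of the XSD types and four-digit years, and the sketch does not validate EDTF syntax.

```python
import re

# Optional time zone: "Z" or an offset such as "-05:00"
TZ = r"(?:Z|[+-]\d\d:\d\d)?"

# Ordered from most to least specific lexical form
PATTERNS = [
    ("xsd:dateTime", r"-?\d{4}-\d\d-\d\dT\d\d:\d\d:\d\d(?:\.\d+)?" + TZ),
    ("xsd:date", r"-?\d{4}-\d\d-\d\d" + TZ),
    ("xsd:gYearMonth", r"-?\d{4}-\d\d" + TZ),
    ("xsd:gYear", r"-?\d{4}" + TZ),
]


def guess_datatype(value):
    """Return a datatype name for a temporal string (hypothetical helper).

    Values that match none of the XSD patterns are assumed to be EDTF;
    this sketch does not check whether they really are valid EDTF.
    """
    for datatype, pattern in PATTERNS:
        if re.fullmatch(pattern, value):
            return datatype
    return "edtf:EDTF"
```

A stricter implementation would reject impossible dates (such as month 13) and would parse EDTF properly instead of using it as a catch-all.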
More complex temporal values MAY be expressed as instances of time:TemporalEntity or its subclasses from Time Ontology as discussed by Nys, Ruymbeke, and Billen (2018) and described in EDTF in RDF. RDF literals are preferred because it is easier to derive time:TemporalEntity from them than the other way round.
<WW1> a time:Interval ;                                    # instead of E52_Time-Span
  edtfo:hasEDTFDateTimeDescription "1914/1918" ;
  time:hasBeginning [ time:inXSDgYear "1914"^^xsd:gYear ] ;
  time:hasEnd [ time:inXSDgYear "1918"^^xsd:gYear ] .
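The derivation of a time:Interval from an EDTF literal can be sketched for the simplest case, an interval between two four-digit years. The function name `edtf_interval_to_turtle` is hypothetical, and the restriction to year-only intervals is an assumption of this example; full EDTF needs a real parser.

```python
import re


def edtf_interval_to_turtle(subject, edtf):
    """Derive Turtle for a time:Interval from a year/year EDTF interval.

    Only intervals such as "1914/1918" are handled in this sketch;
    anything else raises ValueError.
    """
    parts = edtf.split("/")
    if len(parts) != 2 or not all(re.fullmatch(r"\d{4}", p) for p in parts):
        raise ValueError(f"unsupported EDTF interval: {edtf!r}")
    begin, end = parts
    return "\n".join([
        f"<{subject}> a time:Interval ;",
        f'  edtfo:hasEDTFDateTimeDescription "{edtf}" ;',
        f'  time:hasBeginning [ time:inXSDgYear "{begin}"^^xsd:gYear ] ;',
        f'  time:hasEnd [ time:inXSDgYear "{end}"^^xsd:gYear ] .',
    ])


print(edtf_interval_to_turtle("WW1", "1914/1918"))
```

Going the other way, from a time:Interval back to a single literal, requires inspecting several triples and deciding how to combine them, which illustrates why literals are the preferred starting point.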
Support for simple temporal data with XSD data types is not part of the SPARQL 1.2 specification (see proposal SEP-0002) but is included in most SPARQL processors, so typed date values, in contrast to plain strings, can be used directly in calculations.
Doerr, Light, and Hiebel (2020) recommended using additional properties for temporal intervals (E52 Time-Span):
P81a_end_of_the_begin together with P81b_begin_of_the_end instead of P81 ongoing throughout
P82a_begin_of_the_begin together with P82b_end_of_the_end instead of P82 at some time within
The use of these properties may lead to a false assumption of precision, and it introduces a CIDOC-specific solution to a problem that is also addressed outside of CIDOC. For these reasons, these additional properties should not be used; EDTF and/or Time Ontology should be used instead.
Temporal CRM properties SHOULD be expressed with corresponding properties from Time Ontology:
Spatial data
Instances of E94 Space Primitive should be expressed using GeoSPARQL Ontology as instances of geo:Geometry, compatible with various geographic data formats (WKT, GeoJSON, GML…).1 CRM Property P168 place is defined by should be expressed with RDF property geo:hasGeometry. CRM Properties P171 at some place within and P172 contains can be used as RDF properties to link places (E53 Place) to outer and inner geometries, but geo:hasBoundingBox and geo:hasCentroid should be preferred, if applicable.
The preferred serialization of spatial coordinates is WKT because this allows for spatial queries with GeoSPARQL. GeoJSON can be derived automatically for display in web applications. For simple WKT POINT coordinates in WGS 84 coordinate system, data providers MAY use the Basic Geo (WGS84 lat/long) Vocabulary in addition.
@prefix crm: <http://www.cidoc-crm.org/cidoc-crm/> .
@prefix geo: <http://www.opengis.net/ont/geosparql#> .
@prefix wgs84: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
<TitanicWreckLocation> a crm:E53_Place ;
  crm:P89_falls_within <AtlanticOcean> ;
  geo:hasGeometry [
    a geo:Geometry ;
    geo:asWKT "POINT (-49.946944 41.7325 -3803)"^^geo:wktLiteral ;
    geo:asGeoJSON '{"type": "Point","coordinates": [-49.946944,41.7325,-3803]}'^^geo:geoJSONLiteral ;
    wgs84:long "-49.946944" ; wgs84:lat "41.7325" ; wgs84:alt "-3803"
  ] .
GeoSPARQL properties geo:hasMetricSpatialResolution and/or geo:hasSpatialAccuracy can be used to indicate the level of detail.
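Deriving GeoJSON from WKT, as mentioned above for display in web applications, can be sketched for the simplest case of a single POINT. The function name `wkt_point_to_geojson` is hypothetical; the sketch assumes coordinates in longitude, latitude (and optional altitude) order and ignores other geometry types and any leading CRS URI.

```python
import json
import re


def wkt_point_to_geojson(wkt):
    """Convert a WKT POINT with 2 or 3 coordinates to a GeoJSON geometry dict."""
    match = re.fullmatch(
        r"\s*POINT\s*(?:Z\s*)?\(\s*([^()]+?)\s*\)\s*", wkt, flags=re.IGNORECASE
    )
    if not match:
        raise ValueError(f"not a WKT POINT: {wkt!r}")
    # Coordinates are separated by whitespace in WKT
    coordinates = [float(number) for number in match.group(1).split()]
    if len(coordinates) not in (2, 3):
        raise ValueError("a point needs 2 or 3 coordinates")
    return {"type": "Point", "coordinates": coordinates}


geometry = wkt_point_to_geojson("POINT (-49.946944 41.7325 -3803)")
print(json.dumps(geometry))
```

In practice an established geometry library would be used for this conversion; the sketch only shows that the WKT literal carries everything needed to produce the GeoJSON form.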
Geotemporal data
The CRM class E95 Spacetime Primitive and its corresponding property P169i spacetime volume is defined by MUST NOT be used in RDF. Their purpose in CRM is to define the time and place of an abstract E92 Spacetime Volume or one of its subclasses. In RDF this can be done by combination of P4 has time-span or P160 has temporal projection for time (see Temporal data) and P161 has spatial projection or geo:hasGeometry for place (see Spatial data):
@prefix crm: <http://www.cidoc-crm.org/cidoc-crm/> .
@prefix geo: <http://www.opengis.net/ont/geosparql#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
<AssassinationOfArchdukeFranzFerdinand>
  crm:P4_has_time-span "1914-06-28"^^xsd:date ;
  geo:hasGeometry [
    # 43°51'28.5"N 18°25'43.9"E
    geo:asWKT "POINT (18.4283426 43.8576859)"^^geo:wktLiteral
  ] .
Applications MAY assume:
crm:E92_Spacetime_Volume rdfs:subClassOf
  geo:SpatialObject ,
  time:TemporalEntity .
The CRMgeo extension of CRM combines CRM and GeoSPARQL in a similar but more complex way (Hiebel 2016).
4 CRM Classes to use with caution
E58 Measurement Unit
Defining new instances of E58 Measurement Unit should be avoided. Units should instead either be taken from an established vocabulary of units such as QUDT or be expressed as an RDF value with the UCUM datatype.2
@prefix crm: <http://www.cidoc-crm.org/cidoc-crm/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix unit: <http://qudt.org/vocab/unit/> .
@prefix cdt: <https://w3id.org/cdt/> .
<TitanicSinking>
  crm:P191_had_duration [ a crm:E54_Dimension ;
    crm:P90_has_value 160 ; crm:P91_has_unit unit:MIN ;  # value and QUDT unit
    rdf:value "160 min"^^cdt:ucum                        # UCUM string
  ] .
E41 Appellation
E41 Appellation and its subclasses (E35 Title and E42 Identifier) should be avoided (see above for additional subclasses E61 Time Primitive, E94 Space Primitive, and E95 Spacetime Primitive), unless a name cannot uniquely be identified with a sequence of Unicode characters and an optional language tag:
<RMSTitanic>
  crm:P102_has_title "RMS Titanic"@en ;
  crm:P1_is_identified_by [
    rdf:value "MGY" ;
    crm:P2_has_type <http://www.wikidata.org/entity/Q353659>  # call sign
  ] .
If there are multiple names with one preferred name per language and optional aliases, use skos:prefLabel and skos:altLabel:
<RMSTitanic>
  skos:prefLabel "RMS Titanic"@en ;
  skos:altLabel "Titanic"@en, "Royal Mail Steamship Titanic"@en .
The RDF property skos:prefLabel should not be confused with P48 has preferred identifier, which is to be used for identifiers only.
If information about the act of naming is required, use E13 Attribute Assignment for simple appellations or E15 Identifier Assignment for identifiers.
If an E42 Identifier is a URI meant to identify an RDF resource, do not use plain strings but resource URIs in RDF. If a resource happens to have multiple equivalent URIs, choose a preferred URI and use owl:sameAs to record aliases:
<RMSTitanic> a crm:E18_Physical_Thing ;
  owl:sameAs
    <http://www.wikidata.org/entity/Q3018259> ,
    <http://kbpedia.org/kko/rc/RMS-Titanic-TheShip> .
instead of
<RMSTitanic> a crm:E18_Physical_Thing ;
  crm:P1_is_identified_by
    [ a crm:E42_Identifier ;
      crm:P190_has_symbolic_content "http://www.wikidata.org/entity/Q3018259" ] ,
    [ a crm:E42_Identifier ;
      crm:P190_has_symbolic_content "http://kbpedia.org/kko/rc/RMS-Titanic-TheShip" ] .
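When converting existing records, the distinction between URI identifiers and plain identifiers can be sketched as below. The function names are hypothetical, and treating every http(s) identifier as an equivalent resource URI is an assumption of this example; whether a URI really identifies the same resource has to be decided per data source. String escaping is omitted for brevity.

```python
from urllib.parse import urlsplit


def is_http_uri(value):
    """Return True if the value looks like an absolute http(s) URI."""
    parts = urlsplit(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def identifier_statements(subject, identifiers):
    """Build one Turtle statement per identifier of a resource.

    URIs become owl:sameAs links, everything else stays an E42 Identifier
    with its symbolic content as a plain string.
    """
    statements = []
    for value in identifiers:
        if is_http_uri(value):
            statements.append(f"<{subject}> owl:sameAs <{value}> .")
        else:
            statements.append(
                f"<{subject}> crm:P1_is_identified_by "
                f'[ a crm:E42_Identifier ; crm:P190_has_symbolic_content "{value}" ] .'
            )
    return statements


for line in identifier_statements(
    "RMSTitanic", ["http://www.wikidata.org/entity/Q3018259", "MGY"]
):
    print(line)
```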
5 Deprecated CRM classes
CRM is constantly evolving, so some CRM classes have been renamed or replaced. Outdated classes and properties MUST be supported nevertheless to integrate data that has already been published.
See “Versions of the CIDOC-CRM” (2025) for a list of CRM versions.
6 Bibliographic References
The encoding of bibliographic data is out of the scope of CRM. LRMoo (formerly known as FRBRoo) extends CRM to express the IFLA Library Reference Model (LRM) for bibliographic information managed by libraries (Aalberg 2024). The model is based on four levels of description called WEMI (Work, Expression, Manifestation, Item) instead of one class, so the model is not practical for simple bibliographic references (citation data). As long as bibliographic entities are not the core object of investigation, it is enough to express publications as instances of E31 Document and to express details with an established RDF ontology for citation data. The preferred choice is the Bibliographic Ontology (BIBO). Additional ontologies exist for more details, for instance the Citation Typing Ontology (CiTO) for citations between publications.
Data providers MUST NOT create their own classes and properties to model bibliographic references with CRM but use BIBO. Applications MAY use the following CRM classes and statements to link BIBO and its corresponding ontologies with CRM:
dcterms:Agent rdfs:subClassOf crm:E77_Persistent_Item .
foaf:Agent rdfs:subClassOf crm:E77_Persistent_Item .
crm:E39_Actor rdfs:subClassOf dcterms:Agent .
crm:E39_Actor rdfs:subClassOf foaf:Agent .
foaf:Person owl:equivalentClass crm:E74_Person .
event:Event owl:equivalentClass crm:E5_Event .
bibo:Document rdfs:subClassOf crm:E31_Document .
bibo:Collection rdfs:subClassOf crm:E31_Document .
BIBO refers to individual classes and properties from other ontologies (FOAF, DCTERMS, PRISM…). Data providers MUST use these classes and properties in bibliographic references but they MAY include additional RDF statements with corresponding classes and properties from CRM.
Entities (authors, publishers…) SHOULD be referenced by established URIs (DOI, ORCID, ROR…) as shown in the following example of a proceedings article:
@prefix bibo: <http://purl.org/ontology/bibo/> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
<https://doi.org/10.1145/3543873.3585579> a bibo:Article ;
  dct:title "Wikidata: The Making Of" ;
  dct:creator
    <https://orcid.org/0000-0002-9593-2294> ,
    <https://orcid.org/0000-0002-3939-2115> ,
    <https://orcid.org/0000-0002-9172-2601> ;
  bibo:authorList (
    <https://orcid.org/0000-0002-9593-2294>
    <https://orcid.org/0000-0002-3939-2115>
    <https://orcid.org/0000-0002-9172-2601>
  ) ;
  bibo:pages "615–624" ;
  dct:isPartOf <https://doi.org/10.1145/3543873> .
<https://doi.org/10.1145/3543873> a bibo:Proceedings ;
  dct:date "2023-03-30"^^xsd:date ;
  dct:publisher <https://ror.org/03wsadn68> ;
  bibo:isbn13 "978-1-4503-9419-2" ;
  dct:title "Companion Proceedings of the ACM Web Conference 2023" ;
  dct:isPartOf <https://dblp.org/streams/conf/www> .
<https://ror.org/03wsadn68> a foaf:Organization ;
  foaf:name "Association for Computing Machinery" .
<https://dblp.org/streams/conf/www> a bibo:Series ;
  dct:title "WWW '23 Companion" .
<https://orcid.org/0000-0002-9593-2294> a foaf:Person ;
  foaf:familyName "Vrandečić" ; foaf:givenName "Denny" .
<https://orcid.org/0000-0002-3939-2115> a foaf:Person ;
  foaf:familyName "Pintscher" ; foaf:givenName "Lydia" .
<https://orcid.org/0000-0002-9172-2601> a foaf:Person ;
  foaf:familyName "Krötzsch" ; foaf:givenName "Markus" .
If structured data is not available, bibliographic references can also be expressed with blank nodes having a plain string value of rdfs:label:
_:123 a bibo:Document ;
  rdfs:label "D. Vrandečić, L. Pintscher, and M. Krötzsch. 2023. Wikidata ..." .
The citation management software Zotero can import a large number of formats and export BIBO RDF.
To link bibliographic references to other CRM entities use P70 documents.
7 Differences to the official encoding of CRM in RDF
An official encoding of CRM in RDF has been published since version 7.1.2, managed at https://gitlab.isl.ics.forth.gr/cidoc-crm/cidoc_crm_rdf/ (Doerr, Light, and Hiebel (2020)). The encoding of CRM in RDF described in this document differs by the introduction of SKOS:
E55 Type has been replaced by Concept and the latter is defined subclass of E28 Conceptual Object (instead of defining Concept a subclass of E55 Type).
E32 Authority Document has been replaced by ConceptScheme and the latter is defined subclass of E31 Document.
P71 lists has been replaced by inScheme and the latter is defined subproperty of P67 refers to.
P127 has broader term has been replaced by broader (instead of defining the former superproperty of the latter).
P127i has narrower term has been replaced by narrower (instead of defining the former superproperty of the latter).
In addition, the use of non-standard temporal properties such as P81a_end_of_the_begin and P82b_end_of_the_end will likely be discouraged in favour of Time Ontology and EDTF.
Rationales: integration with terminologies and simplification of queries.
8 References
Footnotes
See also the CRMgeo draft at http://www.cidoc-crm.org/extensions/crmgeo/, defining superclasses of geo:Geometry.