CIDOC-CRM in RDF Application Profile
Background and Motivation
The CIDOC Conceptual Reference Model (CRM) is a conceptual data model used in the cultural heritage domain. The Resource Description Framework (RDF) is a graph-based data format. Both CRM and RDF have been created independently for integration of information. RDF is a good fit to express CRM in data and it has been used to do so. The expression of CRM in RDF is not trivial though, so some guidelines are needed. This document is going to provide an application profile to best use CRM in RDF for integration with other RDF data within NFDI4Objects. The document is going to be compared and aligned with similar recommendations such as Doerr, Light, and Hiebel (2020).
The document is managed in a git repository at https://github.com/nfdi4objects/crm-rdf-ap. Contributions and feedback is very welcome!
Difficulties in expressing CRM and RDF
CRM defines abstract types of entities (CRM classes) such as events, measurements, places, and actors with relationship types (CRM properties) to connect instances of these entity types. RDF and its most common extensions define how to identify entities (resources), entity types (RDF classes) and relationship types (CRM properties) with URIs and values with Unicode strings (RDF literals) optionally having a language or a data type to encode values such as numbers and dates. RDF is used with ontologies that define RDF classes, properties, and constraints. CRM looks like an ontology or like it could directly be mapped to an RDF ontology, but this is not the case. CRM is agnostic to data formats: CRM classes are not RDF classes and CRM has no concept of data types and values, so any expression of CRM in RDF comes with choices of design. It is possible to express the same information modeled with CRM in different forms of RDF, so data cannot be integrated flawlessly.
Guidelines
Primitive values
E59 Primitive Value and its subclasses are not expressed as RDF classes. Instead
instances of E62 String are expressed as RDF literals with optional language tag, and
instances of E60 Number are expressed as RDF literals with numeric data type such as
xsd:integer
The CRM classes E61 Time Primitive, E94 Space Primitive, and E95 Spacetime Primitive are both subclasses of E59 Primitive Value and of E41 Appellation, so the latter can be used when a mapping to established RDF data types is not applicable.
Instances of E61 Time Primitive and E52 Time-Span are better expressed as RDF literals of type xsd:date
, xsd:time
, xsd:dateTime
, or xsd:dateTimeStamp
if applicable. More complex time values should be expressed using the Extended Date/Time Format (EDTF) but there is no established method to calculate with dates in RDF yet.1 CRM includes its own classes and properties to model more complex temporal values so this has not been decided yet.
@prefix edtf: <http://id.loc.gov/datatypes/edtf/>
@prefix unit: <http://qudt.org/vocab/unit/> .
<TitanticSinking> a crm:E81_Transformation ;
crm:P124_transformed <RMSTitanic> ;
crm:P123_resulted_in <TitanticWreck> .
crm:P4_has_time-span
"1912-04-15"^^xsd:date , # or ^^edtf:EDTF (subsumes xsd:date)
[
a crm:E52_Time-Span ;
crm:P82_at_some_time_within "1912-04-15"^^xsd:date
] ;
# TODO: add exact time of sinking (02:38–05:18 GMT)
.
Instances of E94 Space Primitive should be expressed using GeoSPARQL Ontology as instance of geo:hasGeometry
, compatible with various geographic data formats (WKT, GeoJSON, GML…).2 CRM Property P168 place is defined by should be expressed with RDF property geo:hasGeometry
. CRM Properties P171 at some place within, and P172 contains can be used as RDF properties to link places (E53 Place) to outer and inner geometries but geo:hasBoundingBox
and geo:hasCentroid
should be preferred, if applicable.
<TitanticWreckLocation> a crm:E53_Place ;
crm:P89_falls_within <AtlanticOcean> ;
geo:hasGeometry [ a geo:Geometry ;
geo:asGeoJSON '{"type": "Point","coordinates": [-49.946944,41.7325,-3803]}' ;
geo:asWKT "POINT (-49.946944 41.7325 -3803)" ;
] .
GeoSPARQL properties geo:hasMetricSpatialResolution
and/or geo:hasSpatialAccuracy
can be used to indicate level of detail.
Instances of E95 Spacetime Primitive … (TODO)
CRM Classes to use with caution
E58 Measurement Unit
Defintion of instances of E58 Measurement Unit should be avoided but either taken from an established vocabulary of units such as QUDT or expressed as RDF value with UCUM datatype.3
@prefix unit: <http://qudt.org/vocab/unit/> .
@prefix cdt: <https://w3id.org/cdt/> .
<TitanticSinking>
crm:P191_had_duration [ a crm:E54_Dimension ;
crm:P90_has_value 160 ; crm:P91_has_unit unit:MIN ; # value and QUDT unit
rdf:value "7 min"^^cdt:ucum # UCUM string
] .
E41 Appellation
E41 Appellation and its subclasses (E35 Title and E42 Identifier) should be avoided (see above for additional subclasses E61 Time Primitive, E94 Space Primitive, and E94 Space Primitive), unless a name cannot uniquely be identified with a sequence of Unicode characters and an optional language tag:
<RMSTitantic>
crm:P102_has_title "RMS Titanic"@en ;
crm:P1_is_identified_by [
rdfs:value "MGY" ;
crm:P2_has_type <http://www.wikidata.org/entity/Q353659> # call sign
] .
If there are multiple names with one preferred name per language and optional name alias, use skos:prefLabel
and skos:altLabel
:
<RMSTitantic>
skos:prefLabel "RMS Titanic"@en ;
skos:altLabel "Titanic"@en, "Royal Mail Steamship Titanic"@en .
The RDF property skos:prefLabel
should not be confused with [P48 has preferred identifier] to be used for identifiers only.
If information about the act of naming is required, use E13 Attribute Assignment for simple appelations or E15 Identifier Assignment for identifiers.
If an identifier E42 Identifier is an URI meant to identify an RDF resource, dont use plain strings but resource URIs in RDF. If a resource happens to have multiple equivalent URIs, choose a preferred URI and use owl:sameAs
to record aliases:
<RMSTitantic> a crm:E18_Physical Thing ;
owl:sameAs
<http://www.wikidata.org/entity/Q3018259> ,
<http://kbpedia.org/kko/rc/RMS-Titanic-TheShip> .
instead of
<RMSTitanic> a crm:E18_Physical Thing .
crm:P1_is_identified_by
[ a crm:E42_Identifier ;
crm:P190_has_symbolic_content "http://www.wikidata.org/entity/Q3018259" ] ,
[ a crm:E42_Identifier ;
crm:P190_has_symbolic_content "http://kbpedia.org/kko/rc/RMS-Titanic-TheShip" ] .
Deprecated CRM classes
CRM is constantly evolving, so some CRM classes have been renamed or replaced. Outdated classes and properties MUST be supported nevertheless.
References
Footnotes
See discussion to extend SPARQL for simple dates and EDTF in RDF.↩︎
See also CRM Geo draft at http://www.cidoc-crm.org/extensions/crmgeo/, defining superclasses of
geo:Geometry
.↩︎