CIDOC-CRM in RDF Application Profile

Background and Motivation

The CIDOC Conceptual Reference Model (CRM) is a conceptual data model used in the cultural heritage domain. The Resource Description Framework (RDF) is a graph-based data format. CRM and RDF were created independently of each other, both for the integration of information. RDF is a good fit to express CRM in data and has been used to do so. The expression of CRM in RDF is not trivial though, so some guidelines are needed. This document aims to provide an application profile to best use CRM in RDF for integration with other RDF data within NFDI4Objects. It is going to be compared and aligned with similar recommendations such as Doerr, Light, and Hiebel (2020).

The document is managed in a git repository at https://github.com/nfdi4objects/crm-rdf-ap. Contributions and feedback are very welcome!

Difficulties in expressing CRM in RDF

CRM defines abstract types of entities (CRM classes) such as events, measurements, places, and actors with relationship types (CRM properties) to connect instances of these entity types. RDF and its most common extensions define how to identify entities (resources), entity types (RDF classes), and relationship types (RDF properties) with URIs, and how to express values as Unicode strings (RDF literals), optionally with a language tag or a data type to encode values such as numbers and dates. RDF is used with ontologies that define RDF classes, properties, and constraints. CRM looks like an ontology, or like something that could be mapped directly to an RDF ontology, but this is not the case. CRM is agnostic to data formats: CRM classes are not RDF classes and CRM has no concept of data types and values, so any expression of CRM in RDF involves design choices. The same information modeled with CRM can be expressed in different forms of RDF, so data from different sources cannot be integrated without further alignment.

Guidelines

Primitive values

E59 Primitive Value and its subclasses are not expressed as RDF classes. Instead:

  • instances of E62 String are expressed as RDF literals with optional language tag, and

  • instances of E60 Number are expressed as RDF literals with numeric data type such as xsd:integer.
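
The two rules above can be illustrated with a minimal sketch. The base IRI and the resource name TitanicSinkingDuration are assumptions made for this example only, and the German note is merely a translation of the English one:

```turtle
@base <http://example.org/> .            # hypothetical base IRI for relative names
@prefix crm: <http://www.cidoc-crm.org/cidoc-crm/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<TitanicSinkingDuration> a crm:E54_Dimension ;
  # E62 String: literals, with a language tag where the language is known
  crm:P3_has_note "Duration of the sinking in minutes"@en ,
                  "Dauer des Untergangs in Minuten"@de ;
  # E60 Number: literal with a numeric data type
  crm:P90_has_value "160"^^xsd:integer .
```

Neither literal is typed as crm:E62_String or crm:E60_Number. The RDF data type or language tag carries that information.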

The CRM classes E61 Time Primitive, E94 Space Primitive, and E95 Spacetime Primitive are subclasses of both E59 Primitive Value and E41 Appellation, so the latter can be used when a mapping to established RDF data types is not applicable.

Instances of E61 Time Primitive and E52 Time-Span should preferably be expressed as RDF literals of type xsd:date, xsd:time, xsd:dateTime, or xsd:dateTimeStamp, if applicable. More complex time values should be expressed using the Extended Date/Time Format (EDTF), but there is no established method to calculate with dates in RDF yet.1 CRM also includes its own classes and properties to model more complex temporal values, so the choice between these approaches has not been decided yet.

@prefix edtf: <http://id.loc.gov/datatypes/edtf/> .
@prefix unit: <http://qudt.org/vocab/unit/> .

<TitanicSinking> a crm:E81_Transformation ;
  crm:P124_transformed <RMSTitanic> ;
  crm:P123_resulted_in <TitanicWreck> ;
  crm:P4_has_time-span
    "1912-04-15"^^xsd:date , # or ^^edtf:EDTF (subsumes xsd:date)
    [
      a crm:E52_Time-Span ;
      crm:P82_at_some_time_within "1912-04-15"^^xsd:date
    ] ;
    # TODO: add exact time of sinking (02:38–05:18 GMT)
.
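
A time value that does not fit the XSD date types could be given as an EDTF literal. The following sketch is an illustration only: the resource name TitanicConstruction and the base IRI are assumptions, and the interval is given with year precision.

```turtle
@base <http://example.org/> .            # hypothetical base IRI for relative names
@prefix crm:  <http://www.cidoc-crm.org/cidoc-crm/> .
@prefix edtf: <http://id.loc.gov/datatypes/edtf/> .

<TitanicConstruction> a crm:E12_Production ;
  crm:P108_has_produced <RMSTitanic> ;
  # EDTF interval "start/end"; such a value cannot be expressed as xsd:date
  crm:P4_has_time-span "1909/1912"^^edtf:EDTF .
```

As noted above, standard SPARQL has no functions for EDTF values, so such literals can be compared as strings but not used in date calculations.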

Instances of E94 Space Primitive should be expressed using the GeoSPARQL Ontology as instances of geo:Geometry, compatible with various geographic data formats (WKT, GeoJSON, GML…).2 CRM property P168 place is defined by should be expressed with RDF property geo:hasGeometry. CRM properties P171 at some place within and P172 contains can be used as RDF properties to link places (E53 Place) to outer and inner geometries, but geo:hasBoundingBox and geo:hasCentroid should be preferred, if applicable.

<TitanicWreckLocation> a crm:E53_Place ;
  crm:P89_falls_within <AtlanticOcean> ;
  geo:hasGeometry [ a geo:Geometry ;
    geo:asGeoJSON '{"type": "Point","coordinates": [-49.946944,41.7325,-3803]}'^^geo:geoJSONLiteral ;
    geo:asWKT "POINT (-49.946944 41.7325 -3803)"^^geo:wktLiteral ;
  ] .

GeoSPARQL properties geo:hasMetricSpatialResolution and/or geo:hasSpatialAccuracy can be used to indicate level of detail.
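
As a sketch of how a preferred GeoSPARQL property and a level-of-detail statement might be combined: the base IRI and the resolution of 100 metres are assumptions chosen for illustration, not measured values.

```turtle
@base <http://example.org/> .            # hypothetical base IRI for relative names
@prefix crm: <http://www.cidoc-crm.org/cidoc-crm/> .
@prefix geo: <http://www.opengis.net/ont/geosparql#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<TitanicWreckLocation> a crm:E53_Place ;
  # centroid instead of an inner geometry linked with P172 contains
  geo:hasCentroid [ a geo:Geometry ;
    geo:asWKT "POINT (-49.946944 41.7325)"^^geo:wktLiteral ;
    # hypothetical value: coordinates are resolved to about 100 metres
    geo:hasMetricSpatialResolution "100"^^xsd:double
  ] .
```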

Instances of E95 Spacetime Primitive … (TODO)

Authority files and types

CRM class E32 Authority Document and CRM property P71 lists MUST NOT be used in RDF. Use the corresponding SKOS class skos:ConceptScheme and SKOS property skos:inScheme instead. Applications MAY define skos:ConceptScheme as subclass of E31 Document and skos:inScheme as subproperty of P67i is referred to by, because skos:inScheme points from the listed entity to the scheme, like the inverse property P71i is listed in.
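
A minimal sketch of an authority file expressed with SKOS instead of E32 and P71. The scheme ShipTypes, the concept OceanLiner, and the base IRI are hypothetical names introduced for this example:

```turtle
@base <http://example.org/> .            # hypothetical base IRI for relative names
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

<ShipTypes> a skos:ConceptScheme ;       # instead of crm:E32_Authority_Document
  skos:prefLabel "Ship types"@en .

<OceanLiner> a skos:Concept ;
  skos:prefLabel "ocean liner"@en ;
  skos:inScheme <ShipTypes> .            # instead of crm:P71_lists, in inverse direction
```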

CRM also defines class E55 Type with properties P127 has broader term and P127i has narrower term. The class, used with CRM properties P2, P137, P177, P135, P125, P32, and P42, MUST NOT be used in RDF but MUST be mapped to

  • skos:Concept and skos:broader/skos:narrower or to
  • individual RDF classes, connected with rdfs:subClassOf.
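
Both mappings can be sketched side by side. All resource names (OceanLiner, Ship, OceanLinerClass, ShipClass) and the base IRI are hypothetical, and placing the class hierarchy under E18 Physical Thing is an assumption made for this example:

```turtle
@base <http://example.org/> .            # hypothetical base IRI for relative names
@prefix crm:  <http://www.cidoc-crm.org/cidoc-crm/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

# Option 1: the type is a SKOS concept, the hierarchy uses skos:broader
<RMSTitanic> crm:P2_has_type <OceanLiner> .
<OceanLiner> a skos:Concept ; skos:broader <Ship> .
<Ship> a skos:Concept .

# Option 2: the type is an RDF class, the hierarchy uses rdfs:subClassOf
<RMSTitanic> a <OceanLinerClass> .
<OceanLinerClass> rdfs:subClassOf <ShipClass> .
<ShipClass> rdfs:subClassOf crm:E18_Physical_Thing .
```

Option 1 keeps types as data that can be maintained like any other vocabulary. Option 2 makes them part of the schema, so RDFS reasoners can infer class membership.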

CRM Classes to use with caution

E58 Measurement Unit

Defining new instances of E58 Measurement Unit should be avoided. Units should instead either be taken from an established vocabulary of units such as QUDT or be expressed as an RDF value with UCUM datatype.3

@prefix unit: <http://qudt.org/vocab/unit/> .
@prefix cdt: <https://w3id.org/cdt/> .

<TitanicSinking>
  crm:P191_had_duration [ a crm:E54_Dimension ;
    crm:P90_has_value 160 ; crm:P91_has_unit unit:MIN ;   # value and QUDT unit
    rdf:value "160 min"^^cdt:ucum                         # same duration as UCUM string
  ] .

E41 Appellation

E41 Appellation and its subclasses (E35 Title and E42 Identifier) should be avoided (see above for the additional subclasses E61 Time Primitive, E94 Space Primitive, and E95 Spacetime Primitive), unless a name cannot uniquely be identified with a sequence of Unicode characters and an optional language tag:

<RMSTitanic>
  crm:P102_has_title "RMS Titanic"@en ;
  crm:P1_is_identified_by [
    rdf:value "MGY" ;
    crm:P2_has_type <http://www.wikidata.org/entity/Q353659> # call sign
  ] .

If there are multiple names with one preferred name per language and optional aliases, use skos:prefLabel and skos:altLabel:

<RMSTitanic>
  skos:prefLabel "RMS Titanic"@en ;
  skos:altLabel "Titanic"@en, "Royal Mail Steamship Titanic"@en .

The RDF property skos:prefLabel should not be confused with CRM property P48 has preferred identifier, which is to be used for identifiers only.

If information about the act of naming is required, use E13 Attribute Assignment for simple appellations or E15 Identifier Assignment for identifiers.

If an E42 Identifier is a URI meant to identify an RDF resource, do not use plain strings but resource URIs in RDF. If a resource happens to have multiple equivalent URIs, choose a preferred URI and use owl:sameAs to record aliases:

<RMSTitanic> a crm:E18_Physical_Thing ;
  owl:sameAs
    <http://www.wikidata.org/entity/Q3018259> ,
    <http://kbpedia.org/kko/rc/RMS-Titanic-TheShip> .

instead of

<RMSTitanic> a crm:E18_Physical_Thing ;
  crm:P1_is_identified_by
    [ a crm:E42_Identifier ;
      crm:P190_has_symbolic_content "http://www.wikidata.org/entity/Q3018259" ] ,
    [ a crm:E42_Identifier ;
      crm:P190_has_symbolic_content "http://kbpedia.org/kko/rc/RMS-Titanic-TheShip" ] .

Deprecated CRM classes

CRM is constantly evolving, so some CRM classes have been renamed or replaced. Outdated classes and properties MUST be supported nevertheless.
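
This profile does not prescribe how outdated names are to be supported. One possible approach, shown here only as an assumption, is to declare renamed classes equivalent so that data using either name can be queried together. The example uses two classes that were renamed from "Man-Made" to "Human-Made" in CRM version 7:

```turtle
@prefix crm: <http://www.cidoc-crm.org/cidoc-crm/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

# old class name                  # current class name
crm:E22_Man-Made_Object           owl:equivalentClass crm:E22_Human-Made_Object .
crm:E24_Physical_Man-Made_Thing   owl:equivalentClass crm:E24_Physical_Human-Made_Thing .
```

These statements only take effect in applications that apply OWL reasoning or rewrite queries accordingly. Other applications would need to normalize class names on import.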

References

Doerr, Martin, Richard Light, and Gerald Hiebel. 2020. “Implementing the CIDOC Conceptual Reference Model in RDF.” https://cidoc-crm.org/sites/default/files/Implementing%20the%20CIDOC%20Conceptual%20Reference%20Model%20in%20RDF.pdf.

Footnotes

  1. See discussion to extend SPARQL for simple dates and EDTF in RDF.↩︎

  2. See also CRM Geo draft at http://www.cidoc-crm.org/extensions/crmgeo/, defining superclasses of geo:Geometry.↩︎

  3. See cdt:ucum and QUDT.↩︎