
Imprecision in E60 Number and E61 Time Primitive

Vladimir's questions

Sent to CRM SIG crm-sig@ics.forth.gr
Very often in the museum domain measurements are imprecise, so dimensions must be expressed as an interval.

Imprecise Dimension

E54 Dimension says "The properties of the class E54 Dimension allow for expressing the numerical approximation of the values of an instance of E54 Dimension".
My understanding is that this can only happen through: E54 Dimension. P90 has value: E60 Number
E60 Number says "... including intervals of these values to express limited precision".
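
To make the reading above concrete, here is a minimal sketch of a dimension whose value is an interval. The class names `NumberInterval` and `Dimension` are hypothetical stand-ins, not part of CIDOC CRM or any of its implementations; they only illustrate one value (P90) carrying a lower and upper limit, with a single unit (P91).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NumberInterval:
    """Hypothetical stand-in for an E60 Number with limited precision."""
    low: float
    high: float

    def __post_init__(self):
        # An interval with low > high cannot describe any value.
        if self.low > self.high:
            raise ValueError("low must not exceed high")

    def contains(self, value: float) -> bool:
        """True if value lies within the interval (limits included)."""
        return self.low <= value <= self.high


@dataclass(frozen=True)
class Dimension:
    """Hypothetical stand-in for E54 Dimension."""
    value: NumberInterval  # plays the role of P90 has value
    unit: str              # plays the role of P91 has unit


# "between 35 and 45 cm": one interval, one unit
height = Dimension(NumberInterval(35.0, 45.0), "cm")
print(height.value.contains(40.0))
```

Because the unit sits on the Dimension and not on each limit, the two limits cannot end up with different units in this sketch.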

Time Spans

Regarding time spans, CIDOC CRM allows imprecision to be expressed in two ways:

Imprecise Duration

E52 Time-Span. P83 had at least duration. E54 Dimension
E52 Time-Span. P84 had at most duration. E54 Dimension

IMHO this pair of properties is unnecessary, since:

  • E54 Dimension already accommodates (or should accommodate) imprecision, see 1
  • If we have this pair, then shouldn't we also split P43 has dimension in two (has minimum dimension, has maximum dimension)?
  • The pair allows "P91 has unit" of the two Dimensions to differ, which I think is unnecessary
    ("between 1 and 2 cm" is used often, but who'd say "between 1 cm and 1 meter"?)

Imprecise Start/End

As depicted in the CRM Tutorial (online at slide27@crmt), two properties allow expressing the Outer & Inner bounds of a Time-Span:
E52 Time-Span. P81 ongoing throughout: E61 Time Primitive (inner bound)
E52 Time-Span. P82 at some time within: E61 Time Primitive (outer bound)

Each of the bounds has start/end. This is confirmed by the spec:
E61 Time Primitive says "... interval logic to express date ranges"
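
A minimal sketch of the two bounds, reading the property names literally: the time-span was ongoing throughout the P81 interval and falls at some time within the P82 interval, so the P81 interval has to lie inside the P82 interval. The classes `DateRange` and `TimeSpan` and the example dates are hypothetical and only illustrate that each bound has its own start and end.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DateRange:
    """Hypothetical stand-in for an E61 Time Primitive given as a date range."""
    start: date
    end: date


@dataclass(frozen=True)
class TimeSpan:
    """Hypothetical stand-in for E52 Time-Span with both bounds."""
    throughout: DateRange  # plays the role of P81 ongoing throughout
    within: DateRange      # plays the role of P82 at some time within

    def is_consistent(self) -> bool:
        # The "throughout" range must lie inside the "within" range.
        return (self.within.start <= self.throughout.start
                <= self.throughout.end <= self.within.end)


span = TimeSpan(
    throughout=DateRange(date(1850, 1, 1), date(1855, 12, 31)),
    within=DateRange(date(1845, 1, 1), date(1860, 12, 31)),
)
print(span.is_consistent())
```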

RDFS/OWL implementations

Let's see what the current RDFS/OWL implementations of CIDOC CRM offer
(neither one allows E54 Dimension to express a numerical approximation, i.e. item 1):

OWL2 DL proposal

http://bloody-byte.net/rdf/cidoc-crm/core_5.0.1.rdf

OWL DL

http://erlangen-crm.org/current
P90_has_value is a Data Property

RDFS

http://www.cidoc-crm.org/rdfs/cidoc_crm_v5.0.2_english_label.rdfs
"The primitive values "E60 Number"... are interpreted as rdf: literal.

BMX

Seme4 defined a CRM extension for the British Museum (called BMX), see http://crm.rkbexplorer.com.
It defines several extension properties (prefix PX):

PX.min_value, PX.max_value

PX.min_value, PX.max_value are defined as subPropertyOf P90F.has_value.

  • If you assert e.g. min_value=35 and max_value=45, that would infer
    both has_value=35 and has_value=45, which I think is strange.
    Instead I'd leave has_value independent, and set it to the average of min_value and max_value using some calculation
  • This implements the requirement Imprecise Dimension, but is it faithful to CIDOC CRM?
    CIDOC CRM says the imprecision should be captured in the domain of P90.has_value, not through parallel properties
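
The inference described in the first bullet can be sketched with a hand-rolled version of the RDFS subPropertyOf rule (rdfs7). This is an illustration only, not how a triple store implements it; the subject name `dim1` is hypothetical, and the property names are taken from the text above.

```python
# Sub-property declarations as described for BMX above.
SUBPROPERTY_OF = {
    "PX.min_value": "P90F.has_value",
    "PX.max_value": "P90F.has_value",
}


def infer_superproperties(triples):
    """Apply rdfs7 once: (s p o) and (p subPropertyOf q) gives (s q o)."""
    inferred = set(triples)
    for s, p, o in triples:
        if p in SUBPROPERTY_OF:
            inferred.add((s, SUBPROPERTY_OF[p], o))
    return inferred


# Asserted data: only min_value and max_value.
asserted = {
    ("dim1", "PX.min_value", 35),
    ("dim1", "PX.max_value", 45),
}

closure = infer_superproperties(asserted)
has_values = sorted(o for s, p, o in closure if p == "P90F.has_value")
print(has_values)  # [35, 45]
```

Both limits show up as values of has_value, which is the effect questioned above.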

PX.time-span_earliest, PX.time-span_latest

PX.time-span_earliest, PX.time-span_latest are defined as properties of E52.Time-Span.

  • (Actually these are defined merely as rdf:Property and don't specify the domain and range).
  • these properties are superfluous, given P81 ongoing throughout and P82 at some time within
  • they don't allow capturing the outer & inner bounds, as per 3
  • they are unrelated to CIDOC CRM properties, so the extension is not CRM Compatible.
    A compatibility condition from the CRM Intro is:
    "all properties of the extension are either subsumed by CRM properties, or are part of a path for which a CRM property is a shortcut"

Looking for Solution

CIDOC CRM leaves an important question (imprecise dimensions) unspecified,
hidden in the scope notes of primitives E60 Number and E61 Time Primitive.
This shouldn't be dismissed as a "mere RDF implementation issue" since it is important for practical CRM interoperability.

What would be the best way to represent imprecision?

Proposal

If we define E60 Number and E61 Time Primitive as RDF classes, that would imply minimal changes to CIDOC CRM.

  • E60.Number with dataProperties crm:min_value, crm:max_value, and rdf:value (average or expected)
  • E61.Time_Primitive with dataProperties crm:min_date, crm:max_date, and maybe rdf:value
  • (see 2) The pair P83.had_at_least_duration and P84.had_at_most_duration should be merged to one property has_duration
    Is there a better way?

Disadvantage

I'm sure that people who expect P57 has number of parts to be a simple xsd:integer
will be very unhappy to suddenly find a class E60.Number (and rightly so!)
But E60.Number also gives examples of complex numbers, 3D coordinates, etc... So it really is not a literal, it needs to be a class

Your comments/advice will be appreciated.
I googled "E60 site:http://lists.ics.forth.gr/pipermail/crm-sig" and couldn't find relevant discussion.

Martin's response

Dear Vladimir,
Thank you very much for your important questions. As a general remark I'd like to remind you that the CIDOC CRM as a standard is an ontology in the narrower sense, a formal model approximating a human conceptualization, and not a standard database schema. Any implementation, in particular any RDF Schema, is again an approximation of this conceptualization. The CRM has a much wider scope and longer life-cycle than RDF. In Relational Databases, quite different issues occur.
The Definition of the CIDOC CRM makes very clear that "Primitive Values" are dependent on the capabilities of the respective IT infrastructure.
These details cannot be standardized in the same way as the CRM, because they change over shorter periods of time than the ones for which we want interoperability. What we want is conceptual interoperability, not bitwise interoperability.
Therefore the CRM refers loosely to concepts of time and number in a mathematical sense. So far, no database implementation is compatible with all mathematical numerical systems. Rather, we can make mathematical models of the database implementations and by that devise algorithms to mediate between different implementations.

E60 Number says "... including intervals of these values to express limited precision".

This means that you have, according to your application, to specialize the respective concepts and available primitive values. Different Dimensions need different numeric systems.

2. Imprecise Duration

E52 Time-Span. P83 had at least duration. E54 Dimension

E52 Time-Span. P84 had at most duration. E54 Dimension

IMHO this pair of properties is unnecessary, since:

- E54 Dimension already accommodates (or should accommodate) imprecision, see

Good point! We'll make this an issue.

If we have this pair, then shouldn't we also split P43 has dimension in two (has minimum dimension, has maximum dimension)?

Not so good, because "Dimension" is the concept of the actual dimension of something at some time, and the interval is the uncertainty about it. It is not that the dimension itself would vary. In cases of multi-dimensional values, such as color vectors (HSI) etc., the uncertainty may be an odd area. Restricting that to minimum-maximum in the ontology would make such more complex cases incompatible with the CRM. Time, in contrast, has one dimension (except in science fiction).
The CRM follows the principle of "minimal commitment" by Thomas Gruber here.

3. Imprecise Start/End

E52 Time-Span. P81 ongoing throughout: E61 Time Primitive (inner bound)

E52 Time-Span. P82 at some time within: E61 Time Primitive (outer bound)

Each of the bounds has start/end. This is confirmed by the spec:

E61 Time Primitive says "... interval logic to express date ranges"

There are several large-scale Relational Databases that have implemented precisely that.

4. OWL2 DL proposal

This is not work of CRM-SIG or ISO, but adequate, see below.

5. OWL DL

This is not work of CRM-SIG or ISO either

6. RDFS

"The primitive values "E60 Number"... are interpreted as rdf: literal.

This is work of CRM-SIG. It is adequate for data transport, because RDF does not have the necessary constructs, and in a literal we can encode any numbering system. This is the standard way in which, for instance, xsd:dateTime is added to RDF.

Seme4 defined a CRM extension for the British Museum (called BMX). It defines several extension properties (prefix PX):

7. PX.min_value, PX.max_value as subPropertyOf P90F.has_value.

- If you assert e.g. min_value=35 and max_value=45, that would infer both has_value=35 and has_value=45, which I think is strange.

This is not strange. If you read the Definition of the CRM carefully, it is clearly stated that in an implementation multiple values of the same unique property have to be interpreted as alternatives. Hence, the result is correct. Both values are possible, like multiple fathers...

Instead I'd leave has_value independent, and set it to the average of min_value and max_value using some calculation

An average of an uncertainty interval does not, in general, make sense. It only makes sense if a hypothesis about the nature of the deviation from the true value exists, which requires knowledge of the measurement process.

- This implements the requirement 1, but is it faithful to CIDOC CRM?

CIDOC CRM says the imprecision should be captured in the domain of P90.has_value, not through parallel properties

Sure, but we do not have (any more) the machines that provide interval values. Necessarily, we can only write transformation algorithms between different solutions. The CRM does not intend to standardize the impossible.

8. PX.time-span_earliest, PX.time-span_latest as properties of E52.Time-Span.

- they are unrelated to CIDOC CRM properties, so the extension is not CRM Compatible.

A compatibility condition from the CRM Intro is:

"all properties of the extension are either subsumed by CRM properties, or are part of a path for which a CRM property is a shortcut"

The CRM does not prescribe any property. Not implementing inner bounds does not violate compatibility.
Note that the subsumption requirement ends at primitive values, because they are out of scope of the CRM (this should maybe be stated more explicitly).
"Subsumed by CRM properties" must be seen algorithmically, since the CRM is not bound to a particular KR language. We can write an algorithm, that transforms instances of pairs of PX.time-span_earliest, PX.time-span_latest into instances of P81, encoding the interval into a literal with the intended meaning of a Time Primitive. Thereby data transport and data transformation is supported.

If we want to query in addition a real database implementation for dates, we need a practical implementation.
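
The transformation described above (pairs of PX.time-span_earliest and PX.time-span_latest into one property whose literal encodes the interval) could look roughly like the sketch below. The target property name follows the text above and is left as a parameter. The literal format (ISO 8601 `start/end`), the subject name `ts1`, and the function names are assumptions made for illustration. `date.fromisoformat` only accepts years 0001 to 9999, so BC dates would need a different parser.

```python
from datetime import date


def encode_interval(earliest: str, latest: str) -> str:
    """Encode two ISO dates as one 'start/end' literal (assumed format)."""
    start = date.fromisoformat(earliest)
    end = date.fromisoformat(latest)
    if start > end:
        raise ValueError("earliest must not be after latest")
    return f"{start.isoformat()}/{end.isoformat()}"


def transform(triples, target="P81"):
    """Collapse each subject's earliest/latest pair into one target triple."""
    earliest = {s: o for s, p, o in triples if p == "PX.time-span_earliest"}
    latest = {s: o for s, p, o in triples if p == "PX.time-span_latest"}
    # Only subjects that have both ends can be transformed.
    return {
        (s, target, encode_interval(earliest[s], latest[s]))
        for s in earliest.keys() & latest.keys()
    }


source = {
    ("ts1", "PX.time-span_earliest", "1850-01-01"),
    ("ts1", "PX.time-span_latest", "1860-12-31"),
}
result = transform(source)
print(result)
```

A literal like this supports data transport, but a condition on the start or end still requires parsing the literal, which is the querying concern raised in the next paragraph.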

CIDOC CRM leaves an important question (imprecise dimensions) unspecified, hidden in the scope notes of primitives E60 Number and E61 Time Primitive. This shouldn't be dismissed as a "mere RDF implementation issue" since it is important for practical CRM interoperability.

Practical interoperability is a task of applications. The CRM-SIG does not "dismiss" that. It is highly interested in that. But it will definitely not propose a standard serving a particular encoding form and database, which causes then incompatibilities with other implementations.

It is a task for particular implementer communities to provide their solutions and suggest them for adoption by others. If a consensus is achieved on this level, CRM-SIG will make recommendations.

If we define E60 Number and E61 Time Primitive as RDF classes, that would imply minimal changes to CIDOC CRM.

- E60.Number with dataProperties crm:min_value, crm:max_value, and rdf:value (average or expected)

- E61.Time_Primitive with dataProperties crm:min_date, crm:max_date, and maybe rdf:value

This causes the maximal number of joins, which is highly inefficient for querying and data entry, and it introduces at the end of the chain properties that have no possible subsumption under existing CRM properties. In my eyes this is the worst case: retrieving one instance of P90 with a query, but not being able to write a SPARQL condition directly on its value, solves nothing except for the paper exercise of "minimal change" to the CRM.

- (see 2) The pair P83.had_at_least_duration and P84.had_at_most_duration should be merged to one property has_duration

yes

10. I'm sure that people who expect P57 has number of parts to be a simple xsd:integer will be very unhappy to suddenly find a class E60.Number (and rightly so!)

Please note that what the user finds in a user interface is explicitly not the concern of the CRM. ONLY because such concerns have been excluded, the CRM could ever be standardized.

Your GUI has to provide the adequate filters. The CRM has NEVER been recommended as a data entry form!

But E60.Number also gives examples of complex numbers, 3D coordinates, etc... So it really is not a literal, it needs to be a class

Exactly. This is why an implementation has to specialize E60 Number on a case by case basis. The CRM does not want to deal with that.

I googled "E60 site:http://lists.ics.forth.gr/pipermail/crm-sig" and couldn't find relevant discussion.

Most discussions are in the meetings. You may like to read the meeting minutes.

Best wishes and thank you for your comments!

Proposed recommendation

Martin: We do have a recommendation for RDF implementations of P81, P82, which is out for vote.
See attached: Recommendation_time_spans.docx, time_spans.rdfs

How to represent start/finish (min/max) times of an Event

Using P116 starts (is started by)@crm is wrong. It just brings another E2 into the picture, without getting us closer to capturing the time. P116 says: "This property allows the starting point for a E2 Temporal Entity to be situated by reference to the starting point of another temporal entity of longer duration. This property is only necessary if the time span is unknown"

We have the following options:

  1. Use "P82a begin of the begin", "P82b end of the end" (domain E52.Time-Span, range xsd:dateTime), as defined by Martin in response to my questions.
    It is important to note that xsd:dateTime can express years BC by allowing a negative sign
  2. Use PX.time-span_earliest_int, PX.time-span_latest_int (domain E52.Time-Span, range xsd:integer) from BMX.
    Inferior solution since the properties are not standard, nor is the used range
  3. Define a class Time Primitive with properties min_date, max_date (my proposal).
    Inferior solution, since it involves a useless level of indirection
  4. Mariana:
    Time edges
    crm:E2_Temporal_Entity crm:P116_starts crm:E2_Temporal_Entity .
    crm:E2_Temporal_Entity crm:P4_has_time_span crm:E52_Time-Span .
    crm:E52_Time-Span crm:P81_ongoing_throughout crm:E61_Time_Primitive/(String, or xsd:date) .
    crm:E2_Temporal_Entity crm:P115_finishes crm:E2_Temporal_Entity .
    crm:E2_Temporal_Entity crm:P4_has_time_span crm:E52_Time-Span .
    crm:E52_Time-Span crm:P81_ongoing_throughout crm:E61_Time_Primitive/(String, or xsd:date) .

no new properties have been introduced.

Guess we'll go with 1

Adopted Solution

For the time being we are going with option 1.
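
A minimal sketch of what option 1 could look like in data, assuming the two properties are spelled `P82a_begin_of_the_begin` and `P82b_end_of_the_end` (the exact identifiers in time_spans.rdfs may differ) and using a hypothetical helper `xsd_datetime` to build the literal. Python's `datetime` cannot represent years before 1, so the lexical form is built by hand. Note that how a negative year maps to a BC year differs between XSD versions (in XSD 1.0 `-0001` is 1 BC and year `0000` is not allowed, while XSD 1.1 follows ISO 8601 and treats `0000` as 1 BC), so check which one your store follows.

```python
def xsd_datetime(year: int, month: int = 1, day: int = 1,
                 end_of_day: bool = False) -> str:
    """Build the lexical form of an xsd:dateTime; negative years get a '-'."""
    sign = "-" if year < 0 else ""
    time = "23:59:59" if end_of_day else "00:00:00"
    return f"{sign}{abs(year):04d}-{month:02d}-{day:02d}T{time}"


def time_span_triples(span_id: str, begin: str, end: str):
    """Two triples giving the earliest start and latest end of a time-span."""
    return [
        (span_id, "P82a_begin_of_the_begin", begin),
        (span_id, "P82b_end_of_the_end", end),
    ]


# Hypothetical example: "some time between 300 BC and 250 BC"
triples = time_span_triples(
    "ts1",
    xsd_datetime(-300),
    xsd_datetime(-250, 12, 31, end_of_day=True),
)
for t in triples:
    print(t)
```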
