
PROPOSAL: subclass Linear Dimension

Discussion on imprecise dimensions

In response to my question "How to represent imprecision?", Martin made the excellent proposal "Recommendation_time_spans.docx", which answers the question re "Imprecise Start/End" of time-spans.
Martin also raised the issue that P83 and P84 should be merged once E54 has its own imprecision definition.

Christian-Emil Ore objected:

I think it will be very unwise to remove P83 had at least duration, P84 had at most duration. They model the basic way historians work or field archaeologists for that matter.

I am not arguing to take away CRM expressive power. I am arguing to dispense it uniformly.

Martin:

In cases of multi-dimensional values, such as color vectors (HSI) etc., the uncertainty may be an odd area. Restricting that to minimum-maximum in the ontology would make such complex cases incompatible with the CRM. Time, in contrast, has one dimension (except in science fiction).

Sure, Duration merits standardized representation of imprecision. But there are many other Dimensions that merit the same!
Many important dimensions can be represented as a number and a unit (see Examples in the proposal below).
I could argue that composite dimensions (eg point in space) should be broken down into their component dimensions (eg lat,long,alt).
But I'd not be on firm ground there, so instead I'd like to propose a new subclass of E54.Dimension: E?? Linear Dimension (see below)

Proposal Linear Dimension

CIDOC CRM Class Declarations

E?? Linear Dimension

Subclass of: E54 Dimension
Scope Note: This class comprises dimensions that can be approximated by a number and an optional numeric interval of indeterminacy enclosing the assumed true value. Such dimensions are the prevailing common case: height, width, depth, mass, time duration, etc. In fact, all of SI and its ontology formalization QUDT are based on the idea that any quantity can be represented as the product of a real number and the 7 base physical quantities (Length, Mass, Time, Electric Current, Temperature, Amount of Substance, Luminous Intensity) raised to rational powers. This combination of powers is called "dimensionality".

  • "Dimensionless" is a special case where each of the base quantities is raised to power 0. Specific dimensionless units include Plane Angle (eg Radian), Percentage, etc.
  • Specific units (often bearing the names of prominent physicists) are characterized by dimensionality and a conversion multiplier.

See http://qudt.org/ (heading Quantity Dimensions and table The SI System) for a comprehensive treatment.

Examples

  • Energy: 10 Joule (which has dimensionality Length^2 Mass^1 Time^-2)
  • Energy: 10 GeV (Giga electron Volt, which has the same dimensionality and conversion multiplier 1.6021765314E-10)
  • Plane Angle: 3.14159 Radian (which is Dimensionless)
    <move ALL E54 examples here, since they are in fact Linear Dimensions>
    <come up with new E54 examples that are not Linear Dimensions>
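The idea of "dimensionality plus conversion multiplier" from the scope note can be sketched in Python as follows. This is a minimal illustration, not part of the proposal: the names `Unit`, `dims`, `convert` and `is_dimensionless` are hypothetical, and the only figures used are the ones given in the examples above.

```python
from dataclasses import dataclass
from fractions import Fraction

# The 7 SI base quantities, in a fixed order (names chosen for this sketch).
BASE = ("Length", "Mass", "Time", "ElectricCurrent", "Temperature",
        "AmountOfSubstance", "LuminousIntensity")


def dims(**powers):
    """Build a dimensionality vector: one rational exponent per base quantity."""
    unknown = set(powers) - set(BASE)
    if unknown:
        raise ValueError(f"unknown base quantities: {sorted(unknown)}")
    return tuple(Fraction(powers.get(b, 0)) for b in BASE)


@dataclass(frozen=True)
class Unit:
    name: str
    exponents: tuple   # dimensionality, in BASE order
    multiplier: float  # factor that converts a value to the coherent SI unit


# Units taken from the examples above.
JOULE = Unit("Joule", dims(Length=2, Mass=1, Time=-2), 1.0)
GEV = Unit("GeV", dims(Length=2, Mass=1, Time=-2), 1.6021765314e-10)
RADIAN = Unit("Radian", dims(), 1.0)


def is_dimensionless(unit):
    """True when every base quantity is raised to power 0."""
    return all(e == 0 for e in unit.exponents)


def convert(value, src, dst):
    """Convert between two units; only allowed when dimensionalities match."""
    if src.exponents != dst.exponents:
        raise ValueError(f"{src.name} and {dst.name} have different dimensionality")
    return value * src.multiplier / dst.multiplier


energy_in_joule = convert(10, GEV, JOULE)  # the "10 GeV" example in Joule
```

Exponents are stored as `Fraction` because the scope note allows rational powers; comparing the two exponent tuples is what decides whether a conversion is meaningful.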

Properties

<all inherited from Dimension, including P90 has value: E60 Number>
P?? has min value: E60 Number
P?? has max value: E60 Number

CIDOC CRM Property Declarations

P83=P84 had duration (was duration of)

Domain: E52 Time-Span
Range: E?? Linear Dimension
Scope Note: This property describes the length of time covered by an E52 Time-Span. It allows an E52 Time-Span to be associated with an E?? Linear Dimension representing its duration (including any imprecision) independent from the actual beginning and end.

P?? has min value (is min value of)

Domain: E?? Linear Dimension
Range: E60 Number
Scope Note: This property describes the minimum of an E?? Linear Dimension. Together with "P90 has value" and "P?? has max value", it allows capturing the expected value and an interval of imprecision. "Min" and "max" are not declared as sub-properties of "value", so that "value" can be set independently of them. This supports use cases such as:

  • PERT (Program Evaluation and Review Technique) is a statistical tool, used in project management, that is designed to analyze and represent the tasks involved in completing a given project, taking into account uncertainty. It models a task using 3 durations: O (optimistic), M (most likely) and P (pessimistic). These correspond to an asymmetric Beta distribution, which is often approximated with a Triangular distribution. The expected value is calculated as (O+4M+P)/6.

  • Assume the dimensions of a painting are given as Height=(29.9,30.1) cm and Width=(39.9,40.1) cm. We have two digital images of the whole painting (eg normal light vs X-Ray) with image annotations. We want to correlate the pixels of the images to the dimensions of the painting, so we can correlate the image annotations residing on the different images. For that it is useful to assume fixed mid-value dimensions of the painting (eg the average of min & max, or Height=30 cm and Width=40 cm), despite the uncertainty.
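The two calculations mentioned in these use cases can be sketched in Python. The function names `pert_expected` and `midpoint` are hypothetical and only illustrate that "value" may be derived from, or chosen independently of, "min" and "max".

```python
def pert_expected(optimistic, most_likely, pessimistic):
    """PERT expected duration: (O + 4M + P) / 6."""
    return (optimistic + 4 * most_likely + pessimistic) / 6


def midpoint(min_value, max_value):
    """Fixed mid-value of an imprecision interval (average of min and max)."""
    return (min_value + max_value) / 2


# A task with O=3, M=4, P=6 days: the expected value (25/6, about 4.17 days)
# differs from the most likely value 4, so an application has to choose
# which of the two it records as "has value".
expected_days = pert_expected(3, 4, 6)

# Painting dimensions in cm, using the intervals from the use case above.
height_cm = midpoint(29.9, 30.1)
width_cm = midpoint(39.9, 40.1)
```

Because the choice between most likely value, expected value and midpoint is application-specific, the sketch keeps them as separate functions rather than fixing one rule.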

Examples

  • Conservation Task 1 (E7) has time-span (E52) had duration (E??) has unit day (E58) has min value 3 (optimistic duration), has value 4 (most likely duration), has max value 6 (pessimistic duration)
  • Painting 1 (E22) has dimension (E??) has type Height (E55) has unit cm (E58) has min value 29.9, has value 30, has max value 30.1
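As a rough illustration, the two examples above could be held in a small Python record. This is a sketch under assumptions: the class name `LinearDimension`, its field names, and the check that the interval encloses the value (taken from the class scope note) are not part of any CRM implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LinearDimension:
    unit: str                          # E58 Measurement Unit, e.g. "day" or "cm"
    value: float                       # P90 has value
    min_value: Optional[float] = None  # proposed "has min value"
    max_value: Optional[float] = None  # proposed "has max value"
    dimension_type: Optional[str] = None  # E55 Type, e.g. "Height"

    def _bounds(self):
        # A missing bound falls back to the value itself (no imprecision).
        lo = self.value if self.min_value is None else self.min_value
        hi = self.value if self.max_value is None else self.max_value
        return lo, hi

    def __post_init__(self):
        lo, hi = self._bounds()
        # Assumption from the scope note: the interval encloses the value.
        if not lo <= self.value <= hi:
            raise ValueError("value must lie within [min_value, max_value]")

    def imprecision(self):
        """Width of the interval of indeterminacy."""
        lo, hi = self._bounds()
        return hi - lo


# Conservation Task 1: duration in days (optimistic 3, most likely 4, pessimistic 6).
duration = LinearDimension(unit="day", value=4, min_value=3, max_value=6)

# Painting 1: height in cm.
height = LinearDimension(unit="cm", value=30, min_value=29.9, max_value=30.1,
                         dimension_type="Height")
```

Note that the duration example is asymmetric (4 is not the midpoint of 3 and 6), which is why "value" is stored separately instead of being derived from the bounds.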

P?? has max value (is max value of)

Domain: E?? Linear Dimension
Range: E60 Number
Scope Note: see "P?? has min value"

Discussion on has_value

> http://www.cidoc-crm.org/rdfs/cidoc_crm_v5.0.2_english_label.rdfs

> "The primitive values "E60 Number"... are interpreted as rdf: literal.

It is adequate for data transport, because rdf does not have the necessary constructs, and in a literal we can encode any numbering system.

Do you mean something like "(29.9,30.1)"? I guess by the extensibility of rdf data types I could even tack on a notation
"(30.5,31)"^^myext:MinMaxInterval,
and be able to recognize what the string means.

But I have several objections to such an approach:

  • Normalization principles say "keep separate values in separate fields": processing this string is more complicated than processing separate fields
  • SPARQL cannot compare such numbers, at least not efficiently
  • Almost any developer will read "...are interpreted as rdf: literal" to mean "xsd:integer" or "xsd:double", not "a string using non-standard notation"
  • The chance of two CRM systems using this exact representation is nearly zero, unless CIDOC standardizes it
  • Case in point: BMX does it with extension properties PX.min_value, PX.max_value
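The first two objections can be illustrated with a small Python sketch. The record layout, the identifiers `painting1` and `painting2`, and the function names are made up for illustration; the packed literal format is the "(min,max)" string discussed above.

```python
import re

# Accepts "(29.9,30.1)" style literals; any other notation is rejected.
_INTERVAL = re.compile(
    r"^\(\s*([-+]?\d+(?:\.\d+)?)\s*,\s*([-+]?\d+(?:\.\d+)?)\s*\)$"
)


def parse_interval(literal):
    """Unpack a "(min,max)" string into two floats."""
    match = _INTERVAL.match(literal)
    if match is None:
        raise ValueError(f"not a (min,max) literal: {literal!r}")
    return float(match.group(1)), float(match.group(2))


# Same (illustrative) data in both representations.
packed = {"painting1": "(29.9,30.1)", "painting2": "(30.5,31)"}
separate = {
    "painting1": {"min": 29.9, "max": 30.1},
    "painting2": {"min": 30.5, "max": 31.0},
}


def max_below_packed(records, limit):
    # Every record has to be parsed before it can be compared.
    return sorted(k for k, lit in records.items() if parse_interval(lit)[1] < limit)


def max_below_separate(records, limit):
    # With separate fields the comparison is a plain numeric test.
    return sorted(k for k, rec in records.items() if rec["max"] < limit)
```

Both functions return the same answer, but the packed variant needs a parser that every consuming system would have to agree on, whereas the separate-field variant maps directly to a numeric filter.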

> PX.min_value, PX.max_value as subPropertyOf P90F.has_value.

> that would infer both has_value=29.9 and has_value=30.1, which I think is strange.

This is not strange. Multiple values of the same unique property have to be interpreted as alternatives. Hence, the result is correct.

From the semantics of min_value, max_value, we should assert a continuum of values (all reals between min and max), not just the end-points. So the inference may be sound, but it's not complete.

An average of an uncertainty interval does in general not make sense.

True: that should be decided by the specific conversion/application: it should be free to set it to a middle value that is reasonable for the specific case.
There are viable use cases that require the value to be calculated (see Scope notes of P?? has min value).
I think that the value being forced to both min & max (i.e. multivalued) doesn't make sense in most applications.

Kindest regards and thanks for all the help to everyone on CRM SIG, and especially Martin! Vladimir
