
Specification for migrating Rembrandt data to CRM

Limitations

In RS3.1 we don't migrate the following fields:

  • Various Remarks, since they map to Annotations and we have not yet decided how to model them (see Alternatives in Property Types and Annotations)
    • By way of example, we have mapped <toeschrijving> Attribution, which is the most complex case:
      it includes author (source), qualification, remark and date, while the other fields have only a remark
  • Group <link_sample_record> (12 fields), since it maps to an image annotation
  • Several fields of unknown meaning, typically containing "x" (see susana.ttl for details):
    <positie_signatuur_ref>, <reference_image.front>, <reference_image.back>
  • Key fields that lie outside the corresponding record (so they cannot be correlated reliably) and whose relation is unclear:
    <link_research_record>.<link_documentation_record.lref>
  • File location fields, which are missing in some records and contain "NGL" in one record and "RKD" in all others (we will use only our own IIP server):
    <file.image.location>, <file.application.location>

Special Selection

<collectie> Collection

The last record (CURRENT COLLECTION) receives special treatment; see susana.ttl. In 05_Susanna, the records are in chronological order.
Optional checks:

  • If the <begindatum_in_collectie> values are not in chronological order, throw an exception
  • If the <einddatum_in_collectie> values are not in chronological order, throw an exception
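The optional order checks could be done with a minimal Python sketch like the one below. It assumes the values have already been collected per field and normalized to zero-padded ISO-like dates (yyyy, yyyy-mm or yyyy-mm-dd), which sort correctly as plain strings; `check_chronological` is a hypothetical helper, not part of the existing migration code.

```python
def check_chronological(values, field):
    """Raise ValueError if the non-empty values of `field` are out of chronological order.

    Hypothetical helper (sketch). Assumes zero-padded ISO-like dates
    (yyyy, yyyy-mm, yyyy-mm-dd), which compare correctly as strings
    once "/" has been replaced with "-".
    """
    dates = [v.strip().replace("/", "-") for v in values if v and v.strip()]
    for previous, current in zip(dates, dates[1:]):
        if current < previous:
            raise ValueError(
                f"<{field}> not in chronological order: {previous!r} then {current!r}"
            )
```

Missing or blank values are skipped, so only the dates actually present are compared.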

<toeschrijving> Attribution

The first <toeschrijving> record says only "Rembrandt". The second has more data, and the third is bogus (checked in XMLs 02..07). Therefore:

  • If there is more than one, use the second; otherwise use the only one.
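A minimal sketch of this selection rule with Python's standard ElementTree. It assumes the <toeschrijving> elements are direct children of the record element; `select_attribution` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def select_attribution(record):
    """Pick the <toeschrijving> element to migrate, or None if there is none.

    Hypothetical helper (sketch); assumes <toeschrijving> elements are direct
    children of `record`. The first one only says "Rembrandt" and the third is
    bogus, so the second is used whenever more than one is present.
    """
    attributions = record.findall("toeschrijving")
    if not attributions:
        return None
    return attributions[1] if len(attributions) > 1 else attributions[0]
```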

Special Field Handling

Duplicate fields

Many fields are duplicated: a NL tag at the outer level and an EN tag within <object_number_RKDtechnical>.
These pairs are listed in the comments column of Rembrandt data#Reduced Sample Record, and in comments in susana.ttl.
They are handled in one of two ways, described below.

Merge

For multi-value fields (marked "," in the comment), emit both values.

Overwrite

For single-value fields (marked "=" in the comment), emit only the NL field and ignore the EN field.

(here we deal with two fields at once)
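The merge and overwrite rules can be sketched in Python as follows. This is a sketch under assumptions: the NL and EN values of one pair have already been read into lists, and each value is paired with the language of its source tag (which matters for free-text fields, see Text fields). `resolve_duplicate` is a hypothetical helper.

```python
def resolve_duplicate(nl_values, en_values, kind):
    """Combine the values of a duplicated NL/EN field pair (hypothetical helper).

    kind ","  -> multi-value field: emit both (merge)
    kind "="  -> single-value field: emit only the NL value (overwrite)
    Each value is returned together with the language of the tag it came from.
    """
    nl = [(value, "nl") for value in nl_values]
    en = [(value, "en") for value in en_values]
    if kind == ",":
        return nl + en
    if kind == "=":
        return nl  # the EN value is ignored
    raise ValueError(f"unknown duplicate marker: {kind!r}")
```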

For thesaurus fields, coordinate with Maria on whether to use the NL or the EN field; the preferences differ:

  • we prefer the bilingual thesaurus (RKDtechnical)
  • RKD prefers the Dutch thesaurus (RKDimages), since it is more authoritative (RKD is in the process of cleaning up and merging its thesauri)

Text fields

  • Language tags: for a free-text field coming from a NL tag, emit @nl; for one coming from an EN tag, emit @en
  • If a text field includes quotes or newlines, emit it as a Turtle long string (delimited by triple quotes, """...""")
  • This applies to the following fields, which contain quotes:
    <file.application>, <literature>, <literatuur>, <research.reason_objective><value>
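A minimal Python sketch of these text-field rules. It assumes the caller passes the language ("nl" or "en") of the source tag; `turtle_literal` is a hypothetical helper. Backslashes are always escaped. Inside a long string, a quote is escaped only when it is followed by another quote or ends the text, so that no run of three quotes can close the literal early.

```python
import re


def turtle_literal(value, lang):
    """Serialize free text as a language-tagged Turtle literal (hypothetical helper)."""
    escaped = value.replace("\\", "\\\\")  # a backslash is always escaped
    if '"' in value or "\n" in value or "\r" in value:
        # Long string: raw quotes and line breaks are allowed. Conservatively
        # escape any quote followed by another quote or at the very end, so it
        # cannot merge with the closing """ delimiter.
        escaped = re.sub(r'"(?="|\Z)', r'\\"', escaped)
        return f'"""{escaped}"""@{lang}'
    return f'"{escaped}"@{lang}'
```

For example, `turtle_literal("a\nb", "nl")` keeps the newline verbatim inside `"""a` / `b"""@nl`.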

Dates

Extract <datering>

The painting's date appears in two fields, which are not always consistent and may include text:

file | <datering> | <date>
02_Aristoteles | 1653 | gedateerd 1653 (dated)
03_Batseba | 1643 | 1643 (dated)
04_HermanDoomer | 1640 | gedateerd 1640 (dated)
05_BadendeSusana | 1636 | 1636 (dated)
06_Flora | 1635 | gedateerd 1635
07_man_met_baret | rond het midden of in de tweede helft van de jaren 1630 | ca. 1635-1640
08_NicolaesTulp | 1632 | gedateerd 1632 (dated)
09_man_in_orientaalse | 1632 | gedateerd 1632 (dated)
10_oude_vrouw | na ca. 1631 | ca. 1660
11_Andromeda | 1630/1631 | 1630/1631
12_lachende_man | 1629/1630 | 1629/1630

To extract a useful date:

  • use <datering> and ignore <date> (this choice is arbitrary)
  • look for numbers (digit sequences) in <datering>
  • emit as "YYYY"^^xsd:gYear
  • handle 1 or 2 dates (P82 vs P82a&P82b)
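The extraction steps can be sketched in Python as follows. `datering_years` is a hypothetical helper. Treating more than two numbers as an error is an assumption: the rules above only cover one or two dates. Choosing between P82 and P82a/P82b is left to the caller.

```python
import re


def datering_years(datering):
    """Extract the year(s) from a <datering> value as xsd:gYear literals.

    Hypothetical helper (sketch). Returns 0, 1 or 2 literals; more than two
    numbers is treated as an error (an assumption, not stated in the spec).
    """
    years = re.findall(r"\d+", datering)  # digit sequences, eg "1630/1631"
    if len(years) > 2:
        raise ValueError(f"unexpected <datering>: {datering!r}")
    return [f'"{year}"^^xsd:gYear' for year in years]
```

With the table above, "na ca. 1631" yields one literal and "1630/1631" yields two.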

Other Dates

For other date fields:

  • Replace "/" with "-" (eg "1758/05/23" is not a valid xsd:date lexical value)
  • Assign the type xsd:date, xsd:gYearMonth or xsd:gYear according to the form of the date (yyyy-mm-dd, yyyy-mm or yyyy respectively)
    (More elaborate handling of date vs gYearMonth vs gYear is out of scope for RS3.1)
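A minimal Python sketch of these two rules. It assumes zero-padded months and days. Raising an error for any other form is an assumption; `typed_date` is a hypothetical helper.

```python
import re

# Date form -> XSD datatype, checked from most to least specific.
_FORMS = [
    (re.compile(r"\d{4}-\d{2}-\d{2}"), "xsd:date"),
    (re.compile(r"\d{4}-\d{2}"), "xsd:gYearMonth"),
    (re.compile(r"\d{4}"), "xsd:gYear"),
]


def typed_date(raw):
    """Return a typed Turtle literal for a date field (hypothetical helper, sketch)."""
    value = raw.strip().replace("/", "-")  # "1758/05/23" -> "1758-05-23"
    for pattern, datatype in _FORMS:
        if pattern.fullmatch(value):
            return f'"{value}"^^{datatype}'
    # Assumption: other forms (eg unpadded months) are rejected, not guessed.
    raise ValueError(f"unrecognized date form: {raw!r}")
```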

P82 vs P82a&P82b

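Several fields can contain one or two dates:

  • if there is one date, emit it with crm:P82_at_some_time_within
  • if there are two dates, emit the first with crm:P82a_begin_of_the_begin and the second with crm:P82b_end_of_the_end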

This applies to:

  • <datering>: painting (this single field can hold 2 dates, see Extract <datering>)
  • <begindatum_lijst>, <einddatum_lijst>: frame
  • <begindatum_tentoonstelling>, <einddatum_tentoonstelling>: exhibition
  • <begindatum_veiling>, <einddatum_veiling>: auction
  • <research.date_begin>, <research.date_end>: research
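The choice between P82 and P82a/P82b can be sketched in Python as below, under stated assumptions. The input is a list of already typed date literals: either a begin/end pair, or the output of the <datering> extraction. The subject time-span node is not shown. The function also treats a pair with only one value present as "one date"; that is an interpretation, not stated in the rules above. `p82_statements` is a hypothetical helper.

```python
def p82_statements(dates):
    """Choose CIDOC CRM time-span properties for one or two date literals.

    Hypothetical helper (sketch). Empty begin/end values are dropped first,
    so a pair with only one value present is treated as one date (an assumption).
    Returns (property, literal) pairs for the time-span node.
    """
    present = [d for d in dates if d]
    if not present:
        return []
    if len(present) == 1:
        return [("crm:P82_at_some_time_within", present[0])]
    if len(present) == 2:
        return [
            ("crm:P82a_begin_of_the_begin", present[0]),
            ("crm:P82b_end_of_the_end", present[1]),
        ]
    raise ValueError(f"expected one or two dates, got {len(present)}")
```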

Numbers

  • emit integer fields as bare Turtle numbers, eg <bedrag>=157 as 157, which is equivalent to "157"^^xsd:integer
  • emit floating-point fields, eg <hoogte>=47,2, as "47.2"^^xsd:double
    • convert the decimal comma to a dot
  • emit keys as strings, even if they are numeric
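A minimal Python sketch of the three number rules. The function names are hypothetical. Validating the decimal form with a regular expression is an assumption: the spec only asks to convert the comma.

```python
import re


def integer_literal(raw):
    """<bedrag>=157 -> 157, a bare Turtle integer (hypothetical helper)."""
    return str(int(raw.strip()))


def double_literal(raw):
    """<hoogte>=47,2 -> "47.2"^^xsd:double (hypothetical helper)."""
    value = raw.strip().replace(",", ".")  # decimal comma -> dot
    # Assumption: reject anything that is not a plain decimal number.
    if not re.fullmatch(r"[+-]?\d+(\.\d+)?", value):
        raise ValueError(f"not a decimal number: {raw!r}")
    return f'"{value}"^^xsd:double'


def key_literal(raw):
    """Keys are emitted as plain strings, even when numeric (hypothetical helper)."""
    return f'"{raw.strip()}"'
```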

Trim white space

Trim leading and trailing whitespace from all fields, preferably with a parser option.
This handles:

  • a space before the image file name (06_Flora.xml)
  • empty fields (containing just a newline)

Missing or Empty fields

Missing or empty fields MUST NOT emit any RDF. This includes:

  • missing elements
  • completely empty elements
  • elements containing only whitespace

    This is very important: otherwise invalid or inconsistent TTL will result.
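Trimming and the missing/empty rule can be combined in one accessor. The Python sketch below uses the standard ElementTree parser, which has no built-in trim option, so trimming is done by hand. It assumes simple-content fields that are direct children of the record; `field_value` is a hypothetical helper, and callers emit nothing when it returns None.

```python
import xml.etree.ElementTree as ET


def field_value(record, tag):
    """Return the trimmed text of child element `tag`, or None if nothing must be emitted.

    Hypothetical helper (sketch) for simple-content fields that are direct
    children of `record`. None covers a missing element, a completely empty
    element (<tag/>) and an element holding only whitespace, eg a lone newline.
    """
    element = record.find(tag)
    if element is None or element.text is None:
        return None
    text = element.text.strip()
    return text or None
```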

Missing Frame

If none of the frame fields (<begindatum_lijst>, <einddatum_lijst>, <naam_lijstenmaker>, <lijstmateriaal>) is present, then:

  • don't emit any statements related to part/2
  • crm:P57_has_number_of_parts should be 1, not 2
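A minimal Python sketch of the frame check. Following the missing/empty rule, whitespace-only frame fields count as absent. It searches the whole record with `iter`, so it does not assume the nesting depth of the frame fields. `has_frame` and `number_of_parts` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

FRAME_FIELDS = ("begindatum_lijst", "einddatum_lijst", "naam_lijstenmaker", "lijstmateriaal")


def has_frame(record):
    """True if any frame field occurs with non-blank text anywhere in the record (sketch)."""
    return any(
        element.text and element.text.strip()
        for tag in FRAME_FIELDS
        for element in record.iter(tag)
    )


def number_of_parts(record):
    """Value for crm:P57_has_number_of_parts: 2 with a frame, else 1 (hypothetical helper)."""
    return 2 if has_frame(record) else 1
```

When `has_frame` is false, the caller would also skip every part/2 statement.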

Fake Values

Treat the following values as missing (i.e. don't emit them):

  • <naam_koper> = "-"
  • <sample.name_number> = "x" (for RS3.3)

The following fields always have an empty or fake value, so they are simply ignored:

  • <collectie_afdrukken> <oorspronkelijke_lijst> <reference_image.back> <reference_image.front>
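Both lists can be kept as data and checked in one place, as in this Python sketch. The table names and `effective_value` are hypothetical. The sketch assumes the raw element text has already been read, and it returns None for anything that must not be emitted.

```python
# Values that mean "missing" for particular fields (the "x" rule is for RS3.3).
FAKE_VALUES = {
    "naam_koper": {"-"},
    "sample.name_number": {"x"},
}

# Fields that always hold an empty or fake value and are skipped outright.
IGNORED_FIELDS = {
    "collectie_afdrukken",
    "oorspronkelijke_lijst",
    "reference_image.back",
    "reference_image.front",
}


def effective_value(tag, text):
    """Return the value to emit for `tag`, or None if it counts as missing (sketch)."""
    if tag in IGNORED_FIELDS or text is None:
        return None
    text = text.strip()
    if not text or text in FAKE_VALUES.get(tag, ()):
        return None
    return text
```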

Files

The XMLs include references to various files; see Documentation, Files, Images#File types for details.
They are handled according to the following decision table ("content" means checking whether the element content starts with this value):

source tag | content | target property | extra actions (and justification)
<file.image> | (any) | rso:P3_has_image_file | steps 1-4 below
<file.application> | &lt | rso:P3_has_html | decode the HTML entities &lt; &gt; &amp;
<file.application> | http | rso:P3_has_url | (none)
<file.application> | (anything else) | (none) | throw an exception, printing the content

Extra actions for <file.image>:
  1. Split the content on " / ". Some bright mind put two images in one element, eg
     06_Flora.xml: <file.image>N-4930-00-000096-017-PYR.tif / N-4930-00-000096-018-PYR.tif</file.image>
  2. Replace ".tiff?" with ".jpg"
  3. If there is no ".tif", append ".jpg". Some values lack a file extension, eg
     08_NicolaesTulp.xml: <file.image>mh0146_front_nldetail_1997_038</file.image>
  4. Remove any path name (.*/), eg
     06_Flora.xml: <file.image>/pics/pyramids/careofthecollection/For%20Conservation/Rembrandt%20Project/N-4930-00-000052-PYR.tif</file.image>
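The decision table can be sketched in Python as two hypothetical helpers. They rest on these assumptions: ".tif"/".tiff" is matched only at the end of the name, and only the three entities named in the table are decoded, with &amp; replaced last so that an encoded "&amp;lt;" is not decoded twice.

```python
import re

TIF = re.compile(r"\.tiff?$")  # assumption: the extension is at the end of the name


def image_files(content):
    """Apply the <file.image> steps and return the image file names (sketch)."""
    names = []
    for part in content.split(" / "):            # 1. two images in one element
        name = re.sub(r".*/", "", part.strip())  # 4. remove any path name
        if TIF.search(name):
            name = TIF.sub(".jpg", name)         # 2. .tif/.tiff -> .jpg
        else:
            name += ".jpg"                       # 3. missing extension
        names.append(name)
    return names


def application_statement(content):
    """Map <file.application> content to (property, value) per the table (sketch)."""
    if content.startswith("&lt"):
        decoded = content.replace("&lt;", "<").replace("&gt;", ">")
        return ("rso:P3_has_html", decoded.replace("&amp;", "&"))  # &amp; last
    if content.startswith("http"):
        return ("rso:P3_has_url", content)
    raise ValueError(f"unexpected <file.application> content: {content!r}")
```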

Counters

Some XML elements can repeat, which results in several nodes. We use counters to generate the URIs for these nodes.
The counters are reset to 1 at the start of every object and incremented across the whole object, regardless of nesting:

  • object: obj/priref (root of these below)
  • parts: part/n (1=painting, 2=frame)
  • <andere_benaming>, <title.other_older>: title/other/n (both XML elements use one counter)
  • <artistiek>: related/n
  • <literatuur>, <bronnen>: reference/n (both XML elements use one counter)
  • <collectie>: collection/n
  • <tentoonstellingen>: exhibition/n
  • <veiling>: acquisition/n
  • <link_research_record>: research/n
  • <link_documentation_record>: document/n
  • <link_file_record>: file/n
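A minimal Python sketch of these counters, assuming that node URIs are formed relative to the object URI obj/priref (the base namespace is omitted). The fixed parts part/1 and part/2 are not counted here. `COUNTER_SEGMENT` and `ObjectCounters` are hypothetical names. Creating a new instance per object gives the reset to 1.

```python
from collections import defaultdict

# URI segment per repeated XML element; elements mapped to the same segment share a counter.
COUNTER_SEGMENT = {
    "andere_benaming": "title/other",
    "title.other_older": "title/other",
    "artistiek": "related",
    "literatuur": "reference",
    "bronnen": "reference",
    "collectie": "collection",
    "tentoonstellingen": "exhibition",
    "veiling": "acquisition",
    "link_research_record": "research",
    "link_documentation_record": "document",
    "link_file_record": "file",
}


class ObjectCounters:
    """Per-object counters (sketch); create one instance per object so they restart at 1."""

    def __init__(self, priref):
        self.root = f"obj/{priref}"
        self._next = defaultdict(lambda: 1)

    def node_uri(self, element):
        """Relative URI for the next node of `element`, regardless of nesting depth."""
        segment = COUNTER_SEGMENT[element]
        n = self._next[segment]
        self._next[segment] = n + 1
        return f"{self.root}/{segment}/{n}"
```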

Thesauri

See Thesaurus Lookup function for details!

  • For thesaurus fields, call these:
    • LookupInThesaurusByLabel (String field, String label)
      for fields with simple content (eg <drager>)
    • LookupInThesaurusByLabels (String field, StringWithLang[] labels)
      for fields with <value> elements (eg <object.support>), which include multiple labels with language tags
  • for <iconclass_code>, generate the URI yourself
    Remove spaces, replace "(...)" with "_..._", then prepend "_" and the rst-iconclass: namespace prefix
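A minimal Python sketch of the <iconclass_code> rule, returning a Turtle prefixed name. It assumes parentheses are not nested. `iconclass_uri` is a hypothetical helper, and the codes in the usage note are illustrative only.

```python
import re


def iconclass_uri(code):
    """Build the rst-iconclass: prefixed name for an <iconclass_code> value (sketch)."""
    compact = code.replace(" ", "")                   # remove spaces
    local = re.sub(r"\(([^)]*)\)", r"_\1_", compact)  # "(...)" -> "_..._"
    return "rst-iconclass:_" + local                  # prepend "_" and the prefix
```

For an illustrative code "25 F 23 (LION)" this yields rst-iconclass:_25F23_LION_.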