
Cranach is out, so this page is kept for historical purposes only.
We received PDF reports on two paintings, together with the corresponding XML files (20110601_FR001.xml, 20110601_FR053B.XML).

Cranach Web Site Migration:

  • Data Migration
    • Both systems are Microsoft SQL Server, but the data will be exported to XML.
    • Cranach: stored in a Gallery Systems TMS collection system. It will have around 300 records with about 100 object fields, 100 document fields and 100 image fields. The XML is exported from Crystal Reports and is poorly suited to conversion (see Concerns).

Concerns:

  • The XMLs are meant for a printed report (Crystal Reports) and are not very suitable for CIDOC-CRM conversion. A few of the problems I can see from a cursory examination:
  • Many values are unstructured, e.g.
    [Alice Hoppe-Harnoncourt, KHM Vienna, 31.03.2010]
    #
    Dimensions of support: 58.2 x 45.4-45.9 x 4-6 cm
    The original size has on the whole been preserved
  • Tags are report-oriented (Header, Section, Text, Value) and not related to the business domain
  • Consider this sample:
  • Related elements are not related in the XML except by adjacency (e.g. DESCRIPTION1 and its following Text1 that describes its purpose).
    This will make for a very brittle conversion, will likely preclude the use of XPath, and may require SAX parsing with complex state handling
  • FieldName attributes are not used consistently. E.g. DESCRIPTION (DE description) vs LONGTEXT3 (EN description)
  • Recognizing the purpose of a field is based on brittle keywords such as "Beschreibung" vs "Beschreibung (engl"
  • Data is neither structured (it is entered as free text) nor consistent (DESCRIPTION vs LONGTEXT3)
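
To illustrate the adjacency concern, here is a minimal sketch of the kind of stateful SAX handling it implies. The XML fragment, the `Field`/`Text` element nesting, the `Name` attribute and the sample values are all hypothetical (the real Crystal Reports export may be laid out differently); only the names DESCRIPTION1, LONGTEXT3, Text1 and the "Beschreibung" keyword come from the notes above.

```python
import xml.sax

# Hypothetical fragment: element nesting and values are assumptions,
# not copied from the real export.
SAMPLE = b"""<Section>
  <Field FieldName="DESCRIPTION1"><Value>Halbfigur nach links</Value></Field>
  <Text Name="Text1"><Value>Beschreibung</Value></Text>
  <Field FieldName="LONGTEXT3"><Value>Half-length figure facing left</Value></Field>
  <Text Name="Text2"><Value>Beschreibung (engl.)</Value></Text>
</Section>"""


class AdjacencyHandler(xml.sax.ContentHandler):
    """Pairs each Field with the Text label that immediately follows it."""

    def __init__(self):
        super().__init__()
        self.pairs = []        # (label, field_name, value)
        self._pending = None   # last Field seen, still waiting for its label
        self._kind = None      # "Field" or "Text" while inside one
        self._field_name = None
        self._in_value = False
        self._buf = []

    def startElement(self, name, attrs):
        if name in ("Field", "Text"):
            self._kind = name
            self._field_name = attrs.get("FieldName")
        elif name == "Value":
            self._in_value = True
            self._buf = []

    def characters(self, content):
        if self._in_value:
            self._buf.append(content)

    def endElement(self, name):
        if name == "Value":
            self._in_value = False
            text = "".join(self._buf).strip()
            if self._kind == "Field":
                self._pending = (self._field_name, text)
            elif self._kind == "Text" and self._pending is not None:
                field_name, value = self._pending
                self.pairs.append((text, field_name, value))
                self._pending = None
        elif name in ("Field", "Text"):
            self._kind = None


def pair_by_adjacency(xml_bytes):
    handler = AdjacencyHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.pairs
```

The pairing depends entirely on document order: a single missing or reordered label silently attaches the wrong purpose to a value, which is why this approach is brittle.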

Questions

  • Is there a chance the Cranach data will be exported in a better format?
  • Regarding the poor suitability of the Cranach XML export, will it be possible to work directly with an MS SQL dump?
  • Regarding unstructured data (e.g. "Dimensions of support: 58.2 x 45.4-45.9 x 4-6 cm"):
    • Is this entered in separate fields in the application (e.g. width from/to, height from/to, depth from/to, unit cm)?
    • If it's entered as free text, who will take responsibility to clean up data quality problems?
  • Are we to download the images that may be available via the Cranach website and import them into the ResearchSpace asset management system? (approx. number?)
  • What other content is expected to be on the Cranach website and is anticipated to migrate to ResearchSpace? (any user contributions, annotations or discussions, etc.?)
  • We would also like clarification of Item 24, "Web Publication of Cranach using RDF data source and ResearchSpace plug-ins and components". The Stage 3 Proposal states that Cranach will be a one-off publication, but does not elaborate sufficiently.
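
If the dimensions turn out to be free text, the clean-up would involve something like the following sketch. It assumes the pattern seen in the one sample value above (a label, a colon, "x"-separated numbers or ranges, and a trailing unit); the function name `parse_dimensions` and the output shape are illustrative only. It deliberately does not guess which number is height, width or depth.

```python
import re

_NUM = r"\d+(?:\.\d+)?"
_RANGE = re.compile(rf"({_NUM})(?:\s*-\s*({_NUM}))?")


def parse_dimensions(text):
    """Parse 'Label: a x b-c x d-e unit' into ordered (low, high) ranges.

    Returns None when the text does not follow that pattern, so that
    unparseable values can be routed to manual clean-up.
    """
    label, _, rest = text.partition(":")
    m = re.fullmatch(r"(.+?)\s+([A-Za-z]+)", rest.strip())
    if m is None:
        return None
    body, unit = m.groups()
    ranges = []
    for part in body.split("x"):
        rm = _RANGE.fullmatch(part.strip())
        if rm is None:
            return None
        low = float(rm.group(1))
        # A single number is treated as a degenerate range (low == high).
        high = float(rm.group(2)) if rm.group(2) else low
        ranges.append((low, high))
    return {"label": label.strip(), "unit": unit, "ranges": ranges}


parsed = parse_dimensions("Dimensions of support: 58.2 x 45.4-45.9 x 4-6 cm")
```

Values that return None (such as the free-text remark "The original size has on the whole been preserved") would still need a human decision, which is the point of the data quality question above.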

Answers

  • The Cranach sample was extracted using Crystal Reports by the project themselves. The purpose of the sample was to provide a general idea as to the scope of the data conversion.
  • We have not anticipated any pre-contract data testing.
  • SQL dumps will be available during the project if required.
  • The Cranach site is currently under construction and is incomplete. It is due for completion in Germany in December.
  • Cranach images will be served from Germany from an IIP Image server, http://iipimage.sourceforge.net/
  • Rembrandt and Cranach data will be very similar, and both projects are currently talking to each other about bringing their models closer together. The Rembrandt model is representative of (but not identical to) both projects.
  • The Rembrandt and Cranach web sites are resource web sites with limited budgets for web development. Their priority is to publish the information they have collected, not to act as a collaboration environment. Rembrandt is a more ‘advanced’ web site in terms of user interface but still relatively straightforward. Screenshots are attached and are representative of what the Cranach project is doing, albeit on a smaller budget. The idea is that further development of the sites beyond a simple online resource, and indeed into a collaborative resource, requires migration to the ResearchSpace environment first.
  • Both systems start with existing SQL databases. In the case of Cranach, this is Gallery Systems' TMS. Rembrandt uses the RKD database resources held on an ADLIB system. Both are Microsoft SQL. However, the base collection systems do not provide all the information that is required for publication. This means that data is exported to databases that support the web application and also hold the additional information that the projects have collected. In the case of Cranach this is a MySQL database. For Rembrandt it is a Microsoft SQL database.
  • These databases will be available to the ResearchSpace project and will be converted to the CIDOC-CRM RDF standard as part of the contract (see points 10 and 24 in the ITT). The web sites will then be “ported” as far as possible, changing the data layers used from SQL ones to RDF ones. The port of the Rembrandt database (to a website) is not in this particular contract stage.
  • The Cranach database has far fewer resources for developing its web application than the Rembrandt system (it is effectively being done by a part-time student). However, the Rembrandt screenshots provide an insight into what both web sites do, which is essentially to publish data, documents and images. As such, item 24 should read, “Web Publication of Cranach using RDF as the data source based on the UI being constructed by the Cranach project”, and will be amended. The site will be further developed as ResearchSpace develops in later stages.
  • In the event that the site is not available for migration then it will be removed from the project.
  • Cranach images will remain in Germany on an IIP server (as already indicated). The site on ResearchSpace would therefore need to call that server to retrieve images.
  • Publication using ResearchSpace plugins refers to the ability to use ResearchSpace tools in a public web site environment. Once a plugin or tool has been developed, it should be available (see the specification) to place on a web template for use by web site users as opposed to ResearchSpace users. It is assumed that a working tool (e.g. searching) can be used on a public web site against a narrower set of data just as it could within ResearchSpace against a wider set of data. A tool that allows annotation of images could equally be available to a published site for use by anyone on a set of images provided by the web site. However, stage 3 will concentrate on a straight port of the Cranach web site as the main priority. There will be no ResearchSpace discussion or annotation material available at publication of Cranach (since ResearchSpace does not yet exist) and therefore no additional requirement to publish this type of material.
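
Since the images stay on the IIP server in Germany, the ported site would have to build image requests itself. The sketch below shows roughly what such a request looks like using the IIP protocol's FIF (image path), WID (output width) and CVT (output format) commands. The server URL and image path are placeholders, not the real Cranach endpoints, and `iip_jpeg_url` is a hypothetical helper.

```python
from urllib.parse import quote


def iip_jpeg_url(server_url, image_path, width=None):
    """Build an IIP request for a JPEG rendering of one image.

    server_url and image_path are assumptions supplied by the caller;
    the actual values would come from the Cranach project.
    """
    parts = ["FIF=" + quote(image_path, safe="/")]
    if width is not None:
        parts.append(f"WID={int(width)}")
    # CVT is placed last, after the size parameters it applies to.
    parts.append("CVT=jpeg")
    return server_url + "?" + "&".join(parts)


# Placeholder host and path, for illustration only.
url = iip_jpeg_url(
    "http://images.example.org/fcgi-bin/iipsrv.fcgi",
    "/cranach/FR001.tif",
    width=600,
)
```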

Attachments

  • 20110601_FR001.xml (XML file, 158 kB), added by Vladimir Alexiev, Jun 22, 2011 19:45
  • 20110601_FR053B.PDF (PDF file, 69 kB), added by Vladimir Alexiev, Jun 22, 2011 19:45
  • 20110601_FR053B.XML (XML file, 63 kB), added by Vladimir Alexiev, Jun 22, 2011 19:45
  • 20110601_FR001.pdf (PDF file, 104 kB), added by Vladimir Alexiev, Jun 22, 2011 19:45