Old mapping

Archive renamed to BM-data-full-old.rar

  • Includes thesauri: biographical, bibliographical, thesaurusAndplace
  • Test with one file (PrintsAndDrawings_133.rdf): 23576123 bytes RDF, 37302191 bytes NT (1.5822x more)
    • unique subjects and objects (nodes and literals) extracted from the NT file with:
      perl -ne "/^(.*?) (.*?) (.*) \.$/; print qq{$1\n$3\n}"|sort|uniq
    • this file is 1.13156 x bigger than average
    • museum objects: 2440 (estimated total: 371.9k Prints and Drawings objects)
    • triples: 306386
    • thesaurus nodes: 2186 (person-institution, department, thesauri, unit, dimension, event, exhibition, bibliography, series)
    • unique literals: 18612
    • object-related nodes: 20922
    • blank nodes: 15066
  • ratios
    • 76.95 bytes RDF per triple
    • 125.57 triples per object
    • about 25 nodes per object
  • Totals
    • 14251190272 bytes RDF/XML (14 GB unzipped, 338 MB rar)
    • 684 files: 20835073.5 bytes per file
    • 185.2 M triples
    • 36.87 M nodes
    • 1.475 M objects
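
The ratios and totals above can be re-derived from the sample-file counts. A minimal Python sketch, assuming the totals were extrapolated by dividing the full byte size by the sample's bytes per triple and then by its triples per object (the function and key names are illustrative, not part of the original notes):

```python
def old_mapping_estimates():
    """Re-derive the old-mapping ratios and totals from the sample file."""
    # Figures copied from the notes above.
    sample_rdf_bytes = 23_576_123     # PrintsAndDrawings_133.rdf
    sample_triples = 306_386
    sample_objects = 2_440
    total_rdf_bytes = 14_251_190_272  # all RDF/XML files, unzipped
    total_files = 684

    bytes_per_triple = sample_rdf_bytes / sample_triples
    triples_per_object = sample_triples / sample_objects
    bytes_per_file = total_rdf_bytes / total_files

    # Assumed extrapolation: scale the sample ratios to the full dump.
    est_triples = total_rdf_bytes / bytes_per_triple
    est_objects = est_triples / triples_per_object

    return {
        "bytes_per_triple": bytes_per_triple,
        "triples_per_object": triples_per_object,
        "bytes_per_file": bytes_per_file,
        "sample_vs_average": sample_rdf_bytes / bytes_per_file,
        "est_triples": est_triples,
        "est_objects": est_objects,
    }


if __name__ == "__main__":
    for name, value in old_mapping_estimates().items():
        print(f"{name}: {value:,.4f}")
```

The extrapolation is only as good as the sample: it assumes the Prints and Drawings file is representative of the whole dump in bytes per triple and triples per object.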

New mapping

http://dl.dropbox.com/u/57052428/P%26D.rar (renamed to BM-data-PrintsAndDrawings.rar), only Prints and Drawings objects

riot --validate "P&D_133.rdf"
riot "P&D_133.rdf" > "P&D_133.nt"  
perl -ne "/^(.*?) (.*?) (.*) \.$/; print qq{$1\n$3\n}" "P&D_133.nt" | sort | uniq > "P&D-subjects.txt"
perl -ne "/^(.*?) (.*?) (.*) \.$/; print qq{$2\n}" "P&D_133.nt" | sort | uniq > "P&D-props.txt"
  • counts
    • size: 25.7 MB, just about average
    • museum objects: 2318 (estimated total: 387.1k Prints and Drawings objects)
    • triples: 251404
    • unique literals: 22468
    • object-related nodes: 36730+787 (codex/object + title)
    • blank nodes: 11897
  • averages
    • 25723*1024/251404 = 104.77 bytes RDF per triple
    • 251404/2318 = 108.46 triples per object
  • Data growth
    • 167/153 files = 1.092
    • 4278/3594 Gb = 1.190
    • 387.1/371.9 k Prints and Drawings objects = 1.04
    • new estimated total: 1.534 M objects
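
The averages and growth factors above can be checked with a few lines of arithmetic. A minimal Python sketch, under two assumptions that the notes do not state explicitly: that the 387.1k Prints and Drawings estimate is the sample's object count multiplied by the 167 files, and that the new total applies the object growth rounded to two decimals (1.04). The function and key names are illustrative:

```python
def new_mapping_estimates():
    """Re-derive the new-mapping averages and the data-growth figures."""
    # Figures copied from the notes above.
    sample_kb = 25_723            # size of P&D_133.rdf in KB
    sample_triples = 251_404
    sample_objects = 2_318
    files_new, files_old = 167, 153
    size_new, size_old = 4278, 3594
    pd_objects_old_k = 371.9      # old Prints and Drawings estimate (thousands)
    old_total_objects_m = 1.475   # old estimated total (millions)

    bytes_per_triple = sample_kb * 1024 / sample_triples
    triples_per_object = sample_triples / sample_objects

    # Assumption: the P&D estimate is objects in the sample times file count.
    pd_objects_new_k = sample_objects * files_new / 1000
    object_growth = pd_objects_new_k / pd_objects_old_k

    # Assumption: the notes applied the growth factor rounded to 1.04.
    new_total_objects_m = old_total_objects_m * round(object_growth, 2)

    return {
        "bytes_per_triple": bytes_per_triple,
        "triples_per_object": triples_per_object,
        "file_growth": files_new / files_old,
        "size_growth": size_new / size_old,
        "pd_objects_new_k": pd_objects_new_k,
        "object_growth": object_growth,
        "new_total_objects_m": new_total_objects_m,
    }


if __name__ == "__main__":
    for name, value in new_mapping_estimates().items():
        print(f"{name}: {value:.4f}")
```

Using the unrounded growth factor instead of 1.04 shifts the new total slightly in the last digit, so the final figure should be read as an estimate.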

Newest mapping

Count statements and unique entities (nodes & literals). Works for ttl and trig files:

ls -1 *.t* | xargs -n 1 riot.bat >> 0statements.nt
wc -l 0statements.nt 
perl -ne "/^(.*?) (.*?) (.*) \.$/; print qq{$1\n$3\n}" 0statements.nt | sort | uniq > 0subjects.txt
wc -l 0subjects.txt
  • 8000 objects (1k per department: AES AOA ASIA CM GR ME PD PE)
  • 710184 triples
  • 194207 unique entities (nodes and literals)
  • 89 triples per object
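
For reference, the same counting can be sketched in Python. This is a rough equivalent of the perl one-liner plus sort | uniq, not the method actually used for the numbers above; it assumes line-based N-Triples input as written by riot (one statement per line, terms separated by single spaces), and the function name count_entities is hypothetical:

```python
import re
import sys

# Subject, predicate, then everything up to the final " ." as the object.
TRIPLE = re.compile(r"^(\S+) (\S+) (.*) \.$")


def count_entities(lines):
    """Return (statement count, unique subject/object count) for N-Triples lines."""
    statements = 0
    entities = set()
    for line in lines:
        match = TRIPLE.match(line.strip())
        if match is None:
            continue  # skip blank lines and comments
        statements += 1
        entities.add(match.group(1))  # subject
        entities.add(match.group(3))  # object (IRI, blank node or literal)
    return statements, len(entities)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as handle:
        total, unique = count_entities(handle)
    print(f"{total} statements, {unique} unique entities")
```

Unlike the one-liner, this skips lines that do not match the pattern instead of printing whatever was captured last, so counts may differ slightly on files that contain blank or malformed lines.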

Full Set

  • 0.9 GB zipped, 24 GB unzipped, 5929 files
  • about 2M objects

Thesauri (occurrences of skos:Concept):

  • BM-data/thesauri:
    • bibliography: 8425
    • biography: 176449
    • dimensionunits: 2
    • flatauthorities: 1467
    • thesaurusandplace: 182333
    • inline: 26917.
      Note: when N objects use the same term, it's repeated N times in the same named graph.
  • RS and RKD thesauri*.ttl: 27157
  • TOTAL: 422750
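
How the occurrences were counted is not recorded here. One way to reproduce such a count is to scan N-Triples or N-Quads output for rdf:type skos:Concept statements. The sketch below is an assumption about the method, with a hypothetical function name; it reports both raw occurrences and distinct subjects, which makes the repetition of inline terms visible:

```python
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
SKOS_CONCEPT = "<http://www.w3.org/2004/02/skos/core#Concept>"


def count_concepts(lines):
    """Return (occurrences, distinct subjects) of rdf:type skos:Concept."""
    occurrences = 0
    subjects = set()
    for line in lines:
        parts = line.strip().split(" ", 2)
        if len(parts) != 3:
            continue  # blank or malformed line
        subject, predicate, rest = parts
        # 'rest' is "<object> ." for triples or "<object> <graph> ." for quads.
        if predicate == RDF_TYPE and rest.startswith(SKOS_CONCEPT + " "):
            occurrences += 1
            subjects.add(subject)
    return occurrences, len(subjects)
```

A term typed as skos:Concept in several statements or named graphs is counted once per statement in the first number and once overall in the second.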