Old mapping
Archive renamed to BM-data-full-old.rar
- Includes thesauri: biographical, bibliographical, thesaurusAndplace
- Test with one file (PrintsAndDrawings_133.rdf): 23576123 bytes RDF, 37302191 bytes NT (1.5822x larger)
perl -ne "/^(.*?) (.*?) (.*) \.$/; print qq{$1\n$3\n}"|sort|uniq
- this file is 1.13156x the average file size
- museum objects: 2440 (estimated total: 371.9k Prints and Drawings objects)
- triples: 306386
- thesaurus nodes: 2186: person-institution, department, thesauri, unit, dimension, event, exhibition, bibliography, series
- unique literals: 18612
- object-related nodes: 20922
- blank nodes: 15066
- ratios
- 76.95 bytes RDF per triple
- 125.57 triples per object
- about 25 nodes per object
- Totals
- 14251190272 bytes RDF/XML (14 GB unzipped, 338 MB rar)
- 684 files: 20835073.5 bytes per file
- 185.2 M triples
- 36.87 M nodes
- 1.475 M objects
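The ratios and totals above can be re-derived from the sample file with a short script. This is only a sketch of the extrapolation as it appears to have been done: all inputs are the figures listed in these notes, the variable names are illustrative, and the 25 nodes per object is the rounded figure from the ratios list.

```python
# Figures copied from the notes (old mapping, sample file PrintsAndDrawings_133.rdf).
sample_rdf_bytes = 23_576_123
sample_triples = 306_386
sample_objects = 2_440
total_rdf_bytes = 14_251_190_272
total_files = 684
nodes_per_object = 25  # rounded value taken from the notes, not recomputed

# Per-sample ratios.
bytes_per_triple = sample_rdf_bytes / sample_triples
triples_per_object = sample_triples / sample_objects
bytes_per_file = total_rdf_bytes / total_files
size_vs_average = sample_rdf_bytes / bytes_per_file

# Extrapolation to the full data set, assuming the sample is representative.
total_triples = total_rdf_bytes / bytes_per_triple
total_objects = total_triples / triples_per_object
total_nodes = total_objects * nodes_per_object

print(f"{bytes_per_triple:.2f} bytes RDF per triple")
print(f"{triples_per_object:.2f} triples per object")
print(f"{bytes_per_file:.1f} bytes per file, sample is {size_vs_average:.5f}x average")
print(f"{total_triples / 1e6:.1f} M triples")
print(f"{total_objects / 1e6:.3f} M objects")
print(f"{total_nodes / 1e6:.2f} M nodes")
```

The extrapolation scales purely by byte size, so it holds only to the extent that other files have a similar bytes-per-triple ratio.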
New mapping
http://dl.dropbox.com/u/57052428/P%26D.rar (renamed to BM-data-PrintsAndDrawings.rar), only Prints and Drawings objects
riot --validate "P&D_133.rdf"
riot "P&D_133.rdf" > "P&D_133.nt"
perl -ne "/^(.*?) (.*?) (.*) \.$/; print qq{$1\n$3\n}" "P&D_133.nt" | sort | uniq > "P&D-subjects.txt"
perl -ne "/^(.*?) (.*?) (.*) \.$/; print qq{$2\n}" "P&D_133.nt" | sort | uniq > "P&D-props.txt"
- counts
- size: 25.7 MB, about average
- museum objects: 2318 (estimated total: 387.1k Prints and Drawings objects)
- triples: 251404
- unique literals: 22468
- object-related nodes: 36730+787 (codex/object + title)
- blank nodes: 11897
- averages
- 25723*1024/251404 = 104.77 bytes RDF per triple
- 251404/2318 = 108.46 triples per object
- Data growth
- 167/153 files = 1.092
- 4278/3594 MB = 1.190
- 387.1/371.9 k Prints and Drawings objects = 1.04
- new estimated total: 1.534 M objects
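The growth figures can be checked with a few lines of arithmetic. This is a sketch using only numbers from these notes. Two points are inferences rather than stated facts: the 387.1k estimate is consistent with multiplying the sample's 2318 objects by 167 files, and the new total of 1.534 M appears to use the growth factor rounded to 1.04.

```python
# Figures copied from the notes; variable names are illustrative.
old_files, new_files = 153, 167
old_size, new_size = 3594, 4278              # size figures as given in the notes
old_pd_objects_k, new_pd_objects_k = 371.9, 387.1
new_sample_objects = 2318                    # objects in the new sample file
old_total_objects_m = 1.475                  # old-mapping estimate, in millions

file_growth = new_files / old_files
size_growth = new_size / old_size
object_growth = new_pd_objects_k / old_pd_objects_k

# Assumption: the new Prints and Drawings estimate is sample objects x file count.
estimated_pd_objects = new_sample_objects * new_files

# Assumption: the notes scale the old total by the growth factor rounded to 2 places.
new_total_objects_m = old_total_objects_m * round(object_growth, 2)

print(f"files:   {file_growth:.3f}")
print(f"size:    {size_growth:.3f}")
print(f"objects: {object_growth:.2f}")
print(f"P&D estimate: {estimated_pd_objects / 1000:.1f}k")
print(f"new total: {new_total_objects_m:.3f} M objects")
```

With the unrounded factor (about 1.0409) the new total would come out nearer 1.535 M; the difference is well within the uncertainty of the estimate.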
Newest mapping
Count statements and unique entities (nodes & literals). Works for ttl and trig files:
ls -1 *.t* | xargs -n 1 riot.bat >> 0statements.nt
wc -l 0statements.nt
perl -ne "/^(.*?) (.*?) (.*) \.$/; print qq{$1\n$3\n}" 0statements.nt | sort | uniq > 0subjects.txt
wc -l 0subjects.txt
- 8000 objects (1k per department: AES AOA ASIA CM GR ME PD PE)
- 710184 triples
- 194207 unique entities (nodes and literals)
- 89 triples per object
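For reference, the same counting can be sketched in Python. It is not the pipeline used for the numbers above, just an equivalent of the perl one-liner under the same assumption: each N-Triples line has the form `subject predicate object .`, with single spaces separating the first two terms. The sample triples and `example.org` URIs are made up for illustration.

```python
import re

# Same pattern as the perl one-liner: subject, predicate, object, trailing " ."
TRIPLE = re.compile(r"^(.*?) (.*?) (.*) \.$")

def count_entities(lines):
    """Return (statements, unique subjects+objects, unique properties)."""
    statements = 0
    entities = set()    # subjects and objects together: nodes and literals
    properties = set()
    for line in lines:
        match = TRIPLE.match(line.strip())
        if not match:
            continue    # skip blank or malformed lines
        statements += 1
        subj, prop, obj = match.groups()
        entities.update((subj, obj))
        properties.add(prop)
    return statements, len(entities), len(properties)

# Hypothetical sample data, not taken from the BM files.
sample = [
    '<http://example.org/object/1> <http://www.w3.org/2000/01/rdf-schema#label> "Print one" .',
    '<http://example.org/object/1> <http://example.org/madeBy> <http://example.org/person/1> .',
    '<http://example.org/object/2> <http://example.org/madeBy> <http://example.org/person/1> .',
    '_:b0 <http://www.w3.org/2000/01/rdf-schema#label> "Print one" .',
]

statements, unique_entities, unique_properties = count_entities(sample)
print(statements, unique_entities, unique_properties)
```

To run it over a real file, pass an open file handle instead of the list, e.g. `count_entities(open("0statements.nt", encoding="utf-8"))`. Holding all entities in a set uses more memory than `sort | uniq` on very large files.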
Full Set
- 0.9 GB zipped, 24 GB unzipped, 5929 files
- about 2M objects
Thesauri (occurrences of skos:Concept):
- BM-data/thesauri:
- bibliography: 8425
- biography: 176449
- dimensionunits: 2
- flatauthorities: 1467
- thesaurusandplace: 182333
- inline: 26917. Note: when N objects use the same term, it is repeated N times in the same named graph.
- RS and RKD thesauri*.ttl: 27157
- TOTAL: 422750
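The total can be verified by summing the per-source counts. A minimal check, with the counts copied from the list above and dictionary keys chosen only for readability:

```python
# skos:Concept occurrences per source, as listed in the notes.
concept_counts = {
    "bibliography": 8425,
    "biography": 176449,
    "dimensionunits": 2,
    "flatauthorities": 1467,
    "thesaurusandplace": 182333,
    "inline": 26917,
    "RS and RKD thesauri": 27157,
}

total_concepts = sum(concept_counts.values())
print(total_concepts)
```

Because inline terms are repeated once per object that uses them, this total counts occurrences, not distinct concepts.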