Counting and analysis of repository content
Counting
- Total statements
select (count(*) as ?c) {?s ?p ?o}
- Statements per property.
The maximum limit is 200, so we fetch them in two portions:
select ?p (count(*) as ?c) {?s ?p ?o} group by ?p order by ?p
select ?p (count(*) as ?c) {?s ?p ?o} group by ?p order by ?p offset 200
- Class instances (one instance has many rdf:type values!)
select ?t (count(*) as ?c) {?s rdf:type ?t} group by ?t order by ?t
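The two-portion retrieval above can be sketched as a small helper that generates one query per page. This is only an illustration: the function name `paged_queries` is hypothetical, the explicit `limit` clause is added here for clarity (the queries above rely on the endpoint's maximum of 200), and the figure of 400 rows is an assumption that matches "two portions".

```python
def paged_queries(base_query, total_rows, page_size=200):
    """Return one query string per page of at most page_size rows."""
    queries = []
    for offset in range(0, total_rows, page_size):
        # "order by" in the base query keeps the pages stable and non-overlapping
        queries.append(f"{base_query} limit {page_size} offset {offset}")
    return queries


PER_PROPERTY = "select ?p (count(*) as ?c) {?s ?p ?o} group by ?p order by ?p"

# Assumption: no more than 400 distinct properties, hence two pages.
for query in paged_queries(PER_PROPERTY, total_rows=400):
    print(query)
```

The same helper would work for the per-class query if the number of classes exceeds the limit.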
Analysis
We provide historic data, but focus on the latest data (BM-triples.xls of 2012-12).
Properties
Without sameAs expansion: 89995389 (2.9M=3.1% fewer triples)
- rdf:type=58426160 is 62.9% of all triples (see breakdown below)
- Object (business) & thesauri triples are 26.0+4.9=30.9%, of which we can assume objects are 21% and thesauri 10%.
- FRs=5751214 are 6.2% of all triples, or 29% of business triples
- bmo:PX_physical_description=25584 ~ rso:FC70_Thing=23993 is 3x more than the 8k objects!? Due to owl:sameAs
- owl:sameAs=72010 is 9x more than the 8k objects.
Each object has 3 sameAs URIs (a,b,c), which causes 9 statements: aa bb cc ab bc ca ba cb ac
That's what an equivalence relation will do to you.
- skos:inScheme=357283 ~ skos:Concept=357318 is the total number of thesaurus terms
- skos:exactMatch=4495 come from RKD. E.g. rkd-plaats:renaix and rkd-plaats:renaix give 4 triples (2 symmetric, 2 reflexive)
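The owl:sameAs count in the list above follows from closing an equivalence relation over each group of URIs: n equivalent URIs yield n*n statements. A minimal sketch, where the function name `same_as_statements` is hypothetical and the URIs are plain strings:

```python
from itertools import product


def same_as_statements(uris):
    """All ordered pairs over one group of equivalent URIs.

    Reflexivity gives the self-pairs (aa, bb, cc); symmetry and
    transitivity give every other ordered pair.
    """
    return list(product(uris, repeat=2))


pairs = same_as_statements(["a", "b", "c"])
print(len(pairs))          # 3 URIs -> 9 statements
print(len(pairs) * 8000)   # 72000, close to the observed owl:sameAs=72010
```

With two URIs the same closure gives 4 statements, which matches the "2 symmetric, 2 reflexive" count noted for skos:exactMatch.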
Types
- _:nodeXX=23528903: 40.3% useless OWL DL restriction types
crm:En_Whatever rdf:type [owl:Restriction...]
We could eliminate these (24% of all triples) by:
- Delete such statements after loading the ontologies and before loading the data
delete where {?e rdfs:subClassOf ?t. ?t a owl:Restriction}
- Write a Perl script to cut down ECRM to RDFS+inverse (what Doerr wanted) + transitive
- CRM classes=30864964: 52.8%: this is broken down into a decreasing number down the class hierarchy (ok):
owl:Thing=3627096 ~ crm:E1_CRM_Entity=3626903
crm:E77_Persistent_Item=3092726
crm:E2_Temporal_Entity=240162
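The percentages in this section appear to be shares of the rdf:type count given under Properties (58426160), not of all triples. A quick check under that assumption, with hypothetical variable names:

```python
RDF_TYPE_TOTAL = 58_426_160  # rdf:type statements, from the Properties section

type_counts = {
    "_:nodeXX (OWL restriction types)": 23_528_903,
    "CRM classes": 30_864_964,
}

# Share of each group among all rdf:type statements, in percent
shares = {
    name: round(100 * count / RDF_TYPE_TOTAL, 1)
    for name, count in type_counts.items()
}

for name, share in shares.items():
    print(f"{name}: {share}%")
```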
Statements and MB
Objects | 0 | 8,000 | 115,000 | 1,500,000 |
Note | thes.only | w.sameAs | current | estimated |
Explicit statements | 4,444,431 | 14,697,760 | 24,749,779 | 269,296,796 |
Total statements | 21,992,273 | 99,139,808 | 179,458,815 | 2,075,903,690 |
Expansion ratio | 4.95 | 6.75 | 7.25 | 7.71 |
Note | | calculated | calculated | |
Explicit per object | | 1,282 | 177 | |
Total per object | | 9,643 | 1,369 | |
Expansion ratio | | 7.52 | 7.75 | |
Note | | | actual | estimated |
FTS size, MB | | 18 | 276 | 3,600 |
Storage, MB | | | 7,000 | 80,973 |
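The per-object rows and the 1,500,000-object estimate are consistent with a simple calculation: subtract the thesaurus-only baseline, divide by the number of objects, then extrapolate linearly from the current (115,000 objects) rate. The sketch below reproduces the table figures under that assumption; the function names are hypothetical and the method is inferred from the numbers, not stated in the source.

```python
THES_ONLY = {"explicit": 4_444_431, "total": 21_992_273}  # 0 objects
LOADS = {
    8_000: {"explicit": 14_697_760, "total": 99_139_808},
    115_000: {"explicit": 24_749_779, "total": 179_458_815},
}


def per_object(objects, kind):
    """Statements per object, excluding the thesaurus-only baseline."""
    return (LOADS[objects][kind] - THES_ONLY[kind]) / objects


def estimate(objects, kind, basis=115_000):
    """Linear extrapolation from the per-object rate of the basis load."""
    return round(THES_ONLY[kind] + objects * per_object(basis, kind))


for n in LOADS:
    ratio = per_object(n, "total") / per_object(n, "explicit")
    print(n, round(per_object(n, "explicit")), round(per_object(n, "total")),
          round(ratio, 2))

est_explicit = estimate(1_500_000, "explicit")
est_total = estimate(1_500_000, "total")
print(est_explicit, est_total, round(est_total / est_explicit, 2))

# Storage scaled by total statements: 7,000 MB at the current load
est_storage = round(7_000 * est_total / LOADS[115_000]["total"])
print(est_storage)
```

The 8,000-object load carries far more statements per object than the 115,000-object one, so the choice of basis matters; the table's estimate matches the 115,000-object basis.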