
{excerpt}Measure the speed of the different steps in the [Repository Creation] application{excerpt}

h2. Old
The 1st dataset has 6k objects, the 2nd has 44k objects (with lots of sameAs though, so only about half of them show as Museum objects in the UI).
The 2 test runs were measured on different machines (Mitac's laptop vs. Cr4 server), so a new clean nightly run on Cr4 with the small set is needed. Still there are some obvious points:
* thesauri index takes about a minute, it has ~250k objects
* main index has 6k (or 44k) objects and takes 0.5h (2.5h) because of the larger molecules and because each molecule should be navigated
* The speed of Lucene indexing is
|| item || size || time || speed ||
| thesauri | 250k | 1min | 250k/min |
| 6k objects | 6k | 30min | 200/min |
| 44k objects | 44k | 2.5h | 300/min |
* The speed of adding objects is: 700/min vs. 1300/min for the larger dataset; this also includes the overhead of parsing
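The indexing speeds can be rechecked from the sizes and times quoted in the bullets above. This is only a sketch: the {{runs}} list and the {{per_minute}} helper are illustrative names, not part of the [Repository Creation] application, and the 2.5h run is taken as 150 minutes.

```python
# Sizes and times come from the notes above; times are converted to minutes.
# The names `runs` and `per_minute` are illustrative only.
runs = [
    ("thesauri", 250_000, 1),
    ("6k objects", 6_000, 30),
    ("44k objects", 44_000, 150),  # 2.5h = 150min
]


def per_minute(size, minutes):
    """Return indexing throughput in objects per minute."""
    return size / minutes


for item, size, minutes in runs:
    print(f"{item}: {per_minute(size, minutes):.0f}/min")
# thesauri: 250000/min
# 6k objects: 200/min
# 44k objects: 293/min
```

The 44k run works out to roughly 293/min, which the table rounds to 300/min.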

h2. New Mapping
- objects: the 8k repo has sameAs, so the objects are tripled. E.g. Lucene indexes 24k objects, not 8k
- statement expansion (total vs. explicit statements) has grown from 5.5-6.5x to 8x; this needs investigation
- the 115k repo uses the new objects, but old thesauri/images files
- FTS indexing is quite fast, but the FTS size is still too large

h2. Full Set
See [BM Data Volumetrics#Full Set]
- Storage location was on a RAM drive, which took 55G out of 64G. According to previous experiments on other servers, using a RAM drive for the repo load is multiple times faster
- storage size: 50+GB
- adding BM objects: start speed 132 obj/s, end speed 26 obj/s. Approx. 20h total time
- ~2M BM objects according to the nuxeo ID file (not 1.5M as we said before)
- (?) lots of DBPedia thesauri items, w/o label; don't know where those came from
- ~407,000 thesauri items, indexed in 3min
- nuxeo ids added in 2600s (<1h)
- failed w/ Exception during Rembrandt paintings
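As a rough consistency check on the load figures above, the average rate can be compared with the measured start and end speeds. This sketch assumes that the ~20h covers only the loading of the ~2M BM objects (the notes do not say so explicitly); {{average_rate}} and {{duration_hours}} are illustrative helpers.

```python
# Figures from the notes above: ~2M objects, ~20h, 132 obj/s at start, 26 obj/s at end.
# Assumption: the ~20h covers only the BM object load.
TOTAL_OBJECTS = 2_000_000


def average_rate(objects, hours):
    """Average load speed in objects per second."""
    return objects / (hours * 3600)


def duration_hours(objects, rate_per_s):
    """Hours needed to load `objects` at a constant rate."""
    return objects / rate_per_s / 3600


print(f"average: {average_rate(TOTAL_OBJECTS, 20):.1f} obj/s")             # average: 27.8 obj/s
print(f"at start speed: {duration_hours(TOTAL_OBJECTS, 132):.1f}h")        # at start speed: 4.2h
print(f"at end speed: {duration_hours(TOTAL_OBJECTS, 26):.1f}h")           # at end speed: 21.4h
```

Under that assumption the average (about 28 obj/s) sits close to the end speed, which suggests the load ran near the slow rate for most of its duration.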