{excerpt}Measure the speed of the different steps in the [Repository Creation] application{excerpt}
{toc}

h2. Old
[^rs-kbgen-times.xlsx].
The first dataset has 6k objects and the second has 44k objects (many of them are linked by sameAs, though, so only about half show up as Museum objects in the UI).
The two test runs were measured on different machines (Mitac's laptop vs. the Cr4 server), so a new clean nightly run on Cr4 with the small set is needed. Still, some points are obvious:
* the thesauri index has ~250k objects and takes about a minute
* the main index has 6k (or 44k) objects and takes 0.5h (or 2.5h), because its molecules are larger and each molecule has to be navigated
* The speed of Lucene indexing is:
|| item || size || time || speed ||
| thesauri | 250k | 1min | 250k/min |
| 6k objects | 6k | 30min | 200/min |
| 44k objects | 44k | 150min | 300/min |
* The speed of adding objects is 700/min for the smaller dataset vs. 1300/min for the larger one; this includes the overhead of parsing
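
The indexing speeds can be rechecked from the sizes and durations quoted in the bullets above (about a minute for ~250k thesauri items, 0.5h for 6k objects, 2.5h for 44k objects). The following is only a minimal sketch of that arithmetic; the function and variable names are hypothetical and are not part of the Repository Creation application.

```python
# Recompute items-per-minute from the sizes and durations quoted above.
# All names here are illustrative assumptions, not application code.

def per_minute(items: int, minutes: float) -> float:
    """Return the average indexing rate in items per minute."""
    return items / minutes

# (label, number of items, duration in minutes), taken from the notes above
runs = [
    ("thesauri", 250_000, 1),      # ~250k objects in about a minute
    ("6k objects", 6_000, 30),     # 0.5h
    ("44k objects", 44_000, 150),  # 2.5h
]

rates = {label: per_minute(items, minutes) for label, items, minutes in runs}

for label, rate in rates.items():
    print(f"{label}: {rate:.0f}/min")
```

The 44k run works out to roughly 293/min, which the table rounds to 300/min.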

h2. New Mapping
[^BM-loading-time.xls]
{viewxls:BM-loading-time.xls}
- objects: the 8k repo has sameAs, so the objects are tripled. E.g. Lucene indexes 24k objects, not 8k
- statement expansion (explicit:total) has grown from 5.5-6.5x to 8x; this needs investigating
- the 115k repo uses the new objects, but old thesauri/images files
- FTS indexing is quite fast, but the FTS size is still too large

h2. Full Set
See [BM Data Volumetrics#Full Set]
- Storage location was a RAM drive; the repo took 55G out of 64G. According to previous experiments on other servers, using a RAM drive for repo load is several times faster
- storage size: 50+GB
- adding BM objects: start speed 132 obj/s, end speed 26 obj/s; ~20h total time
- ~2M BM objects according to the nuxeo ID file (not 1.5M as we said before)
- (?) lots of DBPedia thesauri items, w/o label; don't know where those came from
- ~407,000 thesauri items, indexed in 3min
- nuxeo ids added in 2600s (<1h)
- failed w/ Exception during Rembrandt paintings
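
As a rough sanity check on the object-loading figures above, the sketch below compares the ~20h total with what a constant start or end speed would give. It assumes that the ~2M objects and the ~20h refer to the same complete load, which these notes do not confirm (the run failed with an Exception); all names are hypothetical.

```python
# Sanity check of the load figures quoted above.
# Assumption: ~2M objects and ~20h describe the same, complete load.

OBJECTS = 2_000_000   # ~2M BM objects (nuxeo ID file)
START_RATE = 132      # obj/s at the start of the load
END_RATE = 26         # obj/s at the end of the load
TOTAL_HOURS = 20      # approximate total time

def hours_at(objects: int, rate_per_s: float) -> float:
    """Hours needed to load `objects` at a constant rate in obj/s."""
    return objects / rate_per_s / 3600

fastest = hours_at(OBJECTS, START_RATE)   # if the start speed had held
slowest = hours_at(OBJECTS, END_RATE)     # if the whole load ran at end speed
average = OBJECTS / (TOTAL_HOURS * 3600)  # obj/s implied by the ~20h total

print(f"at start speed: {fastest:.1f} h")
print(f"at end speed:   {slowest:.1f} h")
print(f"implied average: {average:.1f} obj/s")
```

Under these assumptions the load would take about 4.2h at the start speed and about 21.4h at the end speed. The ~20h total implies an average of about 27.8 obj/s, which suggests the speed dropped close to its end value early in the load.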