h1. Hardware Server Spec

Current proposal:
- Dell PowerEdge R720
- 1x [Intel Xeon E5-2650|] 2.00GHz, 20M Cache, 8 Cores
- 128GB RAM (8x 16GB 1600 MHz)
- VFlash, 8GB SD Card for iDRAC Enterprise
- 3x 300GB SAS 6Gbps, 2.5-in, 15K RPM Hard Drive
- 4x 256GB SSD

h2. Server Component Details
- 1x Intel Xeon E5-2650 2.00GHz, 20M Cache, 8.0GT/s QPI, Turbo, 8C, 95W, DDR3-1600MHz
- 3x 300GB, SAS 6Gbps, 2.5-in, 15K RPM Hard Drive (Hot-plug)
- 4x 200GB, SSD SATA Value MLC 3G, 2.5-in Hard Drive (Hot-plug) - Limited Warranty Only
- 8x 16GB RDIMM, 1600 MHz, Standard Volt, Dual Rank, x4

- 1x iDRAC7 Enterprise
- 1x VFlash, 8GB SD Card for iDRAC Enterprise
- 2x 2M Rack Power Cord C13/C14 12A
- 2x Heat Sink for PowerEdge R720 and R720xd
- 1x 16X DVD-ROM Drive SATA
- 1x 2.5" Chassis with up to 16 Hard Drives
- 1x Bezel
- 1x Broadcom 5720 QP 1Gb Network Daughter Card
- 1x Connectivity for CFI Use Only, Select Raid Required from AutoRaid Selection
- 1x DIMM Blanks for Systems with 2 Processors
- 1x Dual, Hot-plug, Redundant Power Supply (1+1), 750W
- 1x PERC H710p Integrated RAID Controller, 1GB NV Cache
- 1x ReadyRails Sliding Rails With Cable Management Arm
- 1x Risers with up to 6, x8 PCIe Slots + 1, x16 PCIe Slot

h1. Software Server Considerations

h2. RS Software Servers

RS needs to run the following servers (some more details in sibling pages).
(!) TODO Jana: complete this
- OWLIM SE 5.3: semantic repository
- ???: Web Application Server
- Nuxeo v.???: Document Management (we're gradually removing dependencies)
- PostgreSQL 9.1: database used by Nuxeo
- IIPImage server 0.9.9: DeepZoom image server

h2. OS Virtualization

Discussion started at [RS 20110926 kick-off#Environments|RS 20110926 kick-off#Environments]. The current plan is to run on one physical machine (quite large server).

We could use Unix OS containers (kernel-level virtualization) to split it into several software servers: RDF, DMS, web app server.
Containers were originally developed for Linux and later ported to other Unices, eg OpenSolaris.
- Pro: it's easier/more efficient than VMware or Xen virtualization, yet provides separation between the servers
- Pro: kernel-level virtualization will be much better at balancing load between the software servers, than our "random" splitting into several hardware servers. Eg if we see that the web app server needs more RAM, we just reconfigure its Container
- Pro: networking between the software servers will be very fast because it'll be virtualized (no physical network involved)
- Con: no redundancy. However, that's not a goal of RS3, because true redundancy requires a SAN, an OWLIM cluster, a DMS cluster, a web app server cluster, etc.

Mitac: *disagree*, I see no added value in this kernel-level virtualization. Another option is to run all servers in the same OS.

h2. OWLIM Hardware Requirements

OWLIM will run on any modern JVM, but is optimized for certain deployment environments.
- Disk performance is very important. Most often OWLIM is deployed on solid state disks (SSD).
-- Since SSDs are small/expensive, a compressing filesystem (ZFS) is often used to increase the amount of data you can store on SSD. ZFS originated in OpenSolaris and has been ported to Linux.
-- Ontotext is currently experimenting with ZFS on Ubuntu Linux.
- Our preferred OS is Ubuntu Linux, but if ZFS doesn't run well on it then we may go with OpenSolaris
- The amount of RAM is also important. With enough RAM (64G+) OWLIM can support hundreds of millions (<1B) statements, which could be enough until late in the project

Because of SSD and high RAM requirements, a typical production server runs to 10-15k EUR.

h2. OWLIM Licensing and CPU Cores

RS has 8 OWLIM SE licenses (8 CPU core licenses of OWLIM Standard Edition).
There's a nuance in how cores are counted, i.e. the number that should be put in the license file:
- If we deploy on a hyperthreading server (e.g. 2 Intel Xeons * 4 cores), the JVM reports double the cores (2*8=16), so the license file should specify 16 cores
- If we deploy on a non-hyperthreading server (e.g. AMD Opteron), the license file should specify 8 cores

(!) TODO Barry or Mitac: the current specs are for 2 Intel Xeon * 8 cores, which is double the number allowed by the license.
- We can limit the number of cores that OWLIM uses (eg bind OWLIM's JVM to use only one of the CPUs).
Naso: but Java has some bugs in this regard. Since we trust the client, we could just give them a bigger license.
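One way to limit the cores OWLIM's JVM sees, as suggested above, is to pin its process to a CPU set at the OS level. A minimal Linux-only sketch using Python's `os.sched_setaffinity` (the same mechanism `taskset` uses); the 8-core limit mirrors the license count, and per Naso's caveat, whether a given JVM version fully respects the affinity mask is not guaranteed:

```python
import os

LICENSED_CORES = 8  # OWLIM SE licenses cover 8 CPU cores

def pin_to_licensed_cores(pid=0):
    """Restrict a process (0 = the current one) to at most 8 logical CPUs.

    Linux-only sketch: sched_setaffinity is what `taskset` uses under the hood.
    """
    allowed = sorted(os.sched_getaffinity(pid))   # CPUs we may currently run on
    wanted = set(allowed[:LICENSED_CORES])        # keep only the first 8
    os.sched_setaffinity(pid, wanted)
    return sorted(os.sched_getaffinity(pid))

print(pin_to_licensed_cores())
```

Launching with `taskset -c 0-7 java ...` achieves the same without code. Note the hyperthreading caveat above: 8 physical cores appear to the JVM as 16 logical CPUs, so pinning to 8 logical CPUs is only half of 8 physical cores.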

Naso: OWLIM works a bit faster without hyperthreading.

h2. Volumetrics

This section describes the important factors that determine OWLIM performance.

h3. Data Size

Most of the data in RS3 comes from:
- RKD thesauri: 0.2M explicit statements
- BM thesauri: 4.6M explicit statements
- BM image URLs: [RS-772@jira] \#4,#7
-- 648k objects have images, 958k images (1.5 per object, 6% are shared)
-- 1.4GB TriG, 9 statements per image, total 8.6M explicit statements
- BM objects: sample of 8k objects: 0.71M explicit statements (see [BM Data Volumetrics] for details)

FTS indexes:
- 40Mb for thesauri
- 433Mb for objects (!) This is too much, need to investigate FTS molecule again

Current [repository|] (with 8k sample objects):
- Expected: 0.2+4.6+8.6+0.71 = 14.1M explicit statements
- Excerpt from (OLDER):
-- NumberOfStatements=53,572,421
-- NumberOfExplicitStatements=10,073,344
-- NumberOfEntities=4,414,823 (URIs and literals)
- Expansion ratio total:explicit = 5.35 (more than usual because we use complex ontologies, see next section)
- counting with Sesame Workbench: 92M statements (same result with/without "Include inferred statements")
-- this expansion is due to duplicate statements in the empty vs named contexts, and still needs to be investigated
-- see [Repository Volumetrics] for details.

Estimated total:
- thesauri and images will be as above
- objects will grow from 8k to 1.5M: from 0.71M to 133M explicit statements
- explicit statements: 147M
- total statements: 147*5.35 = 786M
- Some of the literals are pretty big\! Eg:
-- the description of 20 cameos is a couple of pages of text (duplicated to several Images)
-- the brief biography of Rembrandt in a scope note in the Person thesaurus
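The estimate above can be reproduced with a quick back-of-the-envelope calculation, using the figures quoted earlier (linear scaling of the 8k sample and the measured 5.35 expansion ratio; all numbers are rough):

```python
# Explicit statements in millions, from the figures above
rkd_thesauri = 0.2
bm_thesauri = 4.6
bm_images = 8.6
sample_explicit = 0.71          # for the 8k-object sample
sample_size, full_size = 8_000, 1_500_000

# Scale the object sample linearly to the full collection
objects_full = sample_explicit * full_size / sample_size    # ~133M

explicit_total = rkd_thesauri + bm_thesauri + bm_images + objects_full
expansion_ratio = 5.35          # total:explicit, measured on the current repository
total_statements = explicit_total * expansion_ratio

print(f"explicit ~ {explicit_total:.0f}M, total ~ {total_statements:.0f}M")
```

This gives ~147M explicit and ~784M total; the 786M figure above comes from multiplying the already-rounded 147M.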

h3. Data Complexity

- CIDOC CRM is the main ontology and is fairly complex:
-- 86 classes, 264 properties (125 inverse pairs, 14 literal properties)
-- class inheritance: 10 levels deep, eg
-- property inheritance: 4 levels deep, eg
-- we use an OWL-DL 1.0 [axiomatization|] that adds anonymous Restriction classes.
We don't use them (no OWL reasoning), but this perhaps doubles the number of rdf:type statements
- RSO is an application ontology that adds 7 classes and 35 properties (1 level deep)
- BMO is an application ontology that adds 4 classes and 25 properties
- We also use a few aspects of external ontologies: SKOS (thesauri), OAC (annotation), BIBO (bibliography), QUDT (units)

(?) Barry, do you mean something else by "Complexity"? Eg aspects of the graph structure?

h3. Inference Complexity (Ruleset)

RS uses a custom ruleset FR-Implementation.pie that includes:
- builtin_RdfsRules-optimized.pie
- owl:TransitiveProperty (28 properties)
- owl:inverseOf (about 125 property pairs)
- 86 custom rules that infer indexing relations (Fundamental Relations) used for search (see [FR Implementation-old]). They use up to 4 variables, eg (premises above the line, conclusion below):
{noformat}
Id: FR0086
  x <rdf:type> <rso:FC70_Thing>
  x <rso:FRT_46_106_148> y
  y <crm:P1_is_identified_by> z
  z <crm:P3_has_note> t
  -----------------------------
  x <rso:FR1_identified_by> t
{noformat}

- Our rules add approximately 6.2% of all triples, or 29% of business triples (see [Repository Volumetrics]).
- They don't slow down [Update-Delete Performance|Update-Delete Performance]
- What is the impact on loading, i.e. [Repository Creation Performance]?
Regarding rule complexity, a very useful thing to know would be a comparison of loading the data with no ruleset vs with the custom ruleset.
Just because the rules only add a few percent to the overall number of triples doesn't mean they don't introduce significant processing overhead, e.g. if many statements match many rule premises but only some later premise fails. It might not be this way, but it could be.
If the difference in loading time between the 'empty' ruleset and the custom ruleset is small, then this is not a problem.

h3. Query Complexity

We're still working on performance optimization {jira:RS-938} and have not analyzed all queries. Sample queries are at [Performance Test].

We think the most expensive queries are:
- Semantic Search: uses 1-10 of the "Fundamental Relations" described above to find objects, limits to 500 (or 1000).
Fast on 8k objects, will test on 100k soon, expected to be ok on 1.5M
- Get Display Fields of search results: need to query for about 10 fields of the objects found, each with a couple of alternatives, each 5-8 statements in the pattern. This is the slowest one, and we tried several alternatives:
-- SELECT: not good since it does cartesian product (eg 3 techniques * 5 materials * 3 production places = 45 rows per object)
-- CONSTRUCT: OWLIM experiences exponential slowdown with the number of fields (1-2 is ok, 10 is way too slow). Returns many duplicate triples
-- GROUP_CONCAT: OWLIM experiences exponential slowdown with the number of fields
-- run 10 queries (1 per display field) that restrict to the 500 URIs (the objects found). Can be called "vertical sharding".
--- This is the current choice. Takes 4 sec for 500 objects.
- Get Complete Object: drills down into the object using 72 "properties of interest", cutting off at thesaurus terms and museum collections.
Does a depth-first traversal, uses Sesame getStatement (analog of CONSTRUCT), perhaps 150 per object.
- Full Text Search: custom "focused" indexing, using a similar traversal as above.
- Thesaurus Autocomplete: uses Lucene index. Reasonably fast
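The "vertical sharding" approach above (one query per display field, each restricted to the result URIs) can be sketched as simple query generation. The field names and property paths below are invented placeholders, not the real RS field definitions:

```python
# Hypothetical sketch: build one SPARQL query per display field,
# each restricted to the (up to 500) object URIs already found.
DISPLAY_FIELDS = {  # field name -> property path (placeholders, not real fields)
    "title": "<crm:P102_has_title>/<crm:P3_has_note>",
    "material": "<crm:P45_consists_of>/<skos:prefLabel>",
}

def field_queries(object_uris, fields=DISPLAY_FIELDS, limit=500):
    """Yield (field_name, query) pairs, one query per display field."""
    values = " ".join(f"<{u}>" for u in object_uris[:limit])
    for name, path in fields.items():
        yield name, (
            "SELECT ?obj ?val WHERE {\n"
            f"  VALUES ?obj {{ {values} }}\n"
            f"  ?obj {path} ?val .\n"
            "}"
        )

for name, query in field_queries(["http://example.org/object/1"]):
    print(name, query, sep="\n")
```

Binding the found URIs with VALUES keeps each query flat (one field, one result column), which is what avoids the cartesian-product and multi-field slowdown problems listed above.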

TODO Mitac: It would be nice to know how long typical queries are taking on the hardware you are using now.
Mitac has done a lot of measurements related to performance, but they need to be presented in a more systematic way.

h3. Query Rate, Concurrent Clients

For planning purposes assume 10 concurrent clients.
Assume 50 queries per second.

h3. Update Rate

Assume 1 update per second.
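Under the planning assumptions above (10 concurrent clients, 50 queries per second), Little's law gives the average latency the system must sustain per query:

```python
concurrent_clients = 10     # planning assumption from above
queries_per_second = 50     # planning assumption from above

# Little's law: concurrency = throughput * average latency,
# so the average latency budget per query is clients / qps
latency_budget_ms = concurrent_clients / queries_per_second * 1000
print(f"average query must finish in <= {latency_budget_ms:.0f} ms")  # 200 ms
```

A 200 ms average budget is a useful sanity check against the measured times above (e.g. 4 sec for display fields of 500 objects would need to be amortized across many result rows or cached).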

h2. BBC Server Configuration

Here's the BBC server specification for comparison.

- 4 servers in a data centre at one location, run in clustered configuration (OWLIM EE) controlled by a ZXTM load balancer
- server 1: 16 CPU cores with 16GB RAM and 300GB disk
- servers 2 & 3: 16 CPU cores with 64GB RAM and 400GB disk

- 3 servers in a second data centre at a second location, run in a clustered configuration controlled by a ZXTM load balancer.
- server 1: 16 CPU cores with 16GB RAM and 300GB disk space
- servers 2 & 3: 16 CPU cores with 64GB RAM and 400GB disk space

Please note that BBC is not really comparable, since:
- BBC so far has used much smaller datasets than RS
- BBC has huge query rates that are not relevant for RS

h2. Typical Onto Servers

Hardware discussion at DOM internal meeting 20121123:
- SSD cost: $1.3 per GB
- RAM cost: $10 per GB
- BG-assembled server with a Taiwanese motherboard for $10k: 256GB RAM, 1TB SSD disk
- A similar brand-name server (eg Dell, IBM) would cost 1.5-2x more
- SSDs are used in a striped (non-redundant) configuration. A RAID5 (redundant) configuration slows down their performance by 40%, since the CPU cannot keep up with computing the RAID checksum. Therefore critical data should be kept on (or synced/flushed to) hard disk
- SSDs have a limit on the number of write cycles, but even consumer-grade SSDs are good enough: 1 of 6 disks failed in 3 years.

h1. Recommendations

- Given 4*300GB disks using RAID 5, you will have 900GB usable space
- Assuming you only need basic indexing (no context or predicate indices), a good rule of thumb is 200 bytes per statement
-- (?) How about entity space (URIs and literals)? Some of BM's literals are quite big
-- (?) How about the Lucene index? (Note: it works best when it can be cached in memory)
-- 765M statements will require 153GB storage space (single OWLIM instance)
- That's just enough for 1 database, 1 copy, and other stuff (OS, other servers, sample DeepZoom images)
- However, 1TB of SSD is not that expensive nowadays. You will get much better performance with SSDs, especially during loading (up to 10 times faster).
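The storage figures above follow directly from the 200-bytes-per-statement rule of thumb and the usual RAID 5 capacity formula (decimal GB assumed):

```python
BYTES_PER_STATEMENT = 200           # rule of thumb, basic indexes only

def raid5_usable_gb(disks, disk_gb):
    # RAID 5 spends one disk's worth of capacity on parity
    return (disks - 1) * disk_gb

def statement_storage_gb(statements):
    # decimal GB (1 GB = 1e9 bytes)
    return statements * BYTES_PER_STATEMENT / 1e9

print(raid5_usable_gb(4, 300))       # 900 GB usable
print(statement_storage_gb(765e6))   # 153.0 GB for 765M statements
```

Note the estimate in the Volumetrics section was ~786M total statements; at 765M-786M the result stays close to 153-157GB, well within the 900GB array.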

- 128GB RAM is a lot and allows you to cache a large part of the database. This means good parallelism for queries.
- However, it is not possible to say how fast your queries will run without knowing exactly what they are. I strongly suggest comparing with what happens now in your dev environments.

Barry Bishop: Basically, this is a pretty good machine, but SSDs would be nice.
SSDs make a tremendous difference. This is my most important comment on the whole thing. I've said it several times and Barry Norton mentioned
it a very long time ago as well.

Mitac: I think the machine is good, BUT
- I recommend we postpone the decision for buying the machine until we load 100k objects and have more reliable data on triples, storage and performance

Hi Mitac - when will this happen?