Background
- The OWLIM repository is created from the files in trunk/data, which must first be updated to the latest versions
- entity-api is the project that is responsible for the repository creation
- we use two separate scripts, one to generate the repository and another to deploy it, because repository creation can fail for various reasons and the result should be checked before it is deployed
- the creation script also updates entity-api and data projects to the latest version
- the SVN projects are checked out to /nidata/researchspace/trunk on the dev server
Summary
The whole update is executed as follows (but read the next section for the details first)
- log in to the dev server
- run the following commands (checking if each step went ok):
Update instructions
- Log in (ssh) to the dev server; see [0 Contacts] for login details
- Go to entity-api folder
- Create the repository
- Check that it says "BUILD SUCCESSFUL" at the end
The successful output looks like this:
- Add the annotation points (takes about 2.5h; this is why we start it with "nohup")
- Check if everything went ok
- It should say "BUILD SUCCESSFUL" at the end
- If both the creation and the annotations went OK, deploy either as "susana" or as "susana.new":
- For "susana"
- For "susana.new"
Note. Both deploy scripts stop and then start the Tomcat where the openrdf-workbench is deployed.
Note. If you are building on a machine with less memory, you may benefit from reducing the -Xmx12g JVM parameter in the create-repo script to around -Xmx3g.
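The "BUILD SUCCESSFUL" checks in the steps above can be automated. The following is a minimal sketch, assuming the script output has been captured as text (for example from nohup.out, the file nohup writes to by default when output is not redirected). The function name and the tail_lines parameter are hypothetical and not part of the project scripts.

```python
def build_succeeded(log_text: str, tail_lines: int = 20) -> bool:
    """Return True if "BUILD SUCCESSFUL" appears near the end of the log.

    Only the last `tail_lines` non-empty lines are inspected, so a success
    message left over from an earlier run at the top of the file is ignored.
    """
    lines = [line for line in log_text.splitlines() if line.strip()]
    return any("BUILD SUCCESSFUL" in line for line in lines[-tail_lines:])
```

Deploy only when this returns True for both the creation log and the annotation log.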
Important steps in BM Repository Creation
- Set "FR-Implementation" as BM Repository ruleset;
- Insert both the ontology and BM data within SystemTransaction;
- Add the ontology to the BM repository:
Loading the ontology *.ttl files (listed in String array A) from the main loading directory, ../data
- Fixing QUDT units:
Executing the SPARQL update: insert {?u a skos:Concept; skos:inScheme unit:}
where {?u qudt:symbol ?s}
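To make the effect of this update concrete, here is a small in-memory sketch. It is not how the repository executes the update; it models triples as plain (subject, predicate, object) tuples of prefixed names, and the function name is hypothetical.

```python
def fix_qudt_units(triples):
    """Mimic: insert {?u a skos:Concept; skos:inScheme unit:} where {?u qudt:symbol ?s}.

    Every subject that has a qudt:symbol gets two extra statements;
    all existing statements are kept unchanged.
    """
    result = set(triples)
    for subject, predicate, _obj in triples:
        if predicate == "qudt:symbol":
            result.add((subject, "rdf:type", "skos:Concept"))
            result.add((subject, "skos:inScheme", "unit:"))
    return result
```

Resources without a qudt:symbol are left untouched, so only actual units become concepts in the unit: scheme.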
- Add main BM Thesauri
Loading thesauri *.trig files from server directory: data/BM/thesauri;
- Simplify ECRM: remove owl:Restriction:
Executing the query "delete where {?e rdfs:subClassOf ?t. ?t a owl:Restriction}",
following "RS owl:Restriction" in RS-1279 (Optimize repo loading):
delete the blank nodes of type owl:Restriction (will reduce size by 24%).
Note: The DELETE doesn't work well, so RS-1279 "variant 2 becomes the preferred one". RS-1370 makes an ontology without owl:Restriction, for which this DELETE step is not needed (https://confluence.ontotext.com/display/ResearchSpace/ECRM+Simplification)
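The following in-memory sketch illustrates which statements a DELETE WHERE of this shape removes under standard SPARQL 1.1 Update semantics. Triples are modelled as (subject, predicate, object) tuples of prefixed names and the function name is hypothetical; this is an illustration, not the project code.

```python
def delete_restriction_superclasses(triples):
    """Mimic: delete where {?e rdfs:subClassOf ?t. ?t a owl:Restriction}.

    For each match, exactly the two matched statements are removed:
    the rdfs:subClassOf link and the rdf:type owl:Restriction statement.
    """
    triples = set(triples)
    restrictions = {
        s for (s, p, o) in triples
        if p == "rdf:type" and o == "owl:Restriction"
    }
    matched = set()
    for (s, p, o) in triples:
        if p == "rdfs:subClassOf" and o in restrictions:
            matched.add((s, p, o))
            matched.add((o, "rdf:type", "owl:Restriction"))
    return triples - matched
```

Note that other statements about the restriction node (for example owl:onProperty) are not matched by the pattern and therefore remain after the delete.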
- Fix thesauri labels:
It is now fixed (does not generate extra pref labels). See RS-1040.
- Create thesauri index:
Selecting by rdf:type skos:Concept to .uri files,
applying Lucene parameters and creating index via ASK queries included in "thes.lucene":
luc:moleculeSize = "2";
luc:index = "uris";
luc:includeEntities = "";
luc:includePredicates = "<http://www.w3.org/2000/01/rdf-schema#label>";
luc:languages = "en,nl,none";
luc:useRDFRank = "yes";
- Add main BM data
Loading from /BM/data/data.zip
- Add paintings
Executing /InsertMainImageQueries.sparql, which inserts rdf:type rso:E38_Main_Image statements
for certain listed ".jpg" files reached via rso:P3_has_image_file
- Fix RS-1375 issues:
Delete orphan Image (asset) statements;
Currently disabled.
- Adding BM images
Loading from /BM/data/images.zip
- Create main index
Replaced by the following steps:
- Total Object counting
Counting rdf:type rso:FC70_Thing objects
- Calculating Thesaurus Counts
Saving them in HashMap for further usage
- Saving Thesaurus counts to RDF Knowledge Base repository
- Creating Autocomplete Index
Using StandardAnalyzer Version.LUCENE_35
Iterating over the results of a SPARQL query that selects thesaurus terms by skos:inScheme and, optionally, crm:P3_has_note, skos:scopeNote, skos:altLabel and rso:numberOfUses
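The counting steps above can be sketched in memory as follows. This is a simplified illustration under stated assumptions, not the entity-api implementation: triples are (subject, predicate, object) tuples, a "use" is assumed to be any direct statement from an rso:FC70_Thing object to a thesaurus concept, and storing the result under rso:numberOfUses is also an assumption. All function names are hypothetical.

```python
from collections import Counter


def count_objects(triples):
    """Count distinct subjects typed rso:FC70_Thing."""
    return len({s for (s, p, o) in triples
                if p == "rdf:type" and o == "rso:FC70_Thing"})


def thesaurus_counts(triples, concepts):
    """Count, per concept, how many distinct objects refer to it (assumed meaning of a 'use')."""
    things = {s for (s, p, o) in triples
              if p == "rdf:type" and o == "rso:FC70_Thing"}
    pairs = {(s, o) for (s, p, o) in triples if s in things and o in concepts}
    return Counter(concept for (_thing, concept) in pairs)


def counts_to_triples(counts):
    """Turn the in-memory counts into statements that could be saved to the repository."""
    return {(concept, "rso:numberOfUses", n) for concept, n in counts.items()}
```

Keeping the counts in a dictionary-like structure first mirrors the HashMap step, and writing them out as statements afterwards mirrors the step that saves them to the RDF repository.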