Home
Welcome to the Metadata_Analytics wiki!
Metadata_Analytics is a Java-based statistical analyzer for metadata repositories and federations.
The latest JDK should be installed from here
Download the most current implementation from here
Metadata_Analytics can be used for statistical analysis over XML records. It can analyze records that reside either on the filesystem or behind an OAI-PMH target. It is schema independent, meaning that it can analyze all kinds of metadata in XML format. Furthermore, you can filter the XML records before analysis by using XPath expressions.
The analysis results are stored as CSV files in a folder named Analysis_Results, located in the same directory as the Metadata_Analytics jar.
For analysis of XML records located on the filesystem, a specific folder structure is followed:
It supports statistical analysis over:
Repository:
- All/Selected Record elements.
- Record element attributes.
- Specific element vocabulary analysis.
- General repository analysis.
Federation:
- All/Selected Record elements.
- Record element attributes.
- Specific element vocabulary analysis.
- General federation analysis.
Statistical metrics calculated:
- For element based analysis:
- Frequency.
- Completeness.
- Dimensions.
- Importance.
- Entropy.
- For attribute based analysis:
- Frequency.
- For specific element vocabulary analysis:
- Frequency.
- For general statistical analysis (repository, federation):
- Number of records analyzed.
- Average file size.
- Informativeness.
- Approximate storage requirements (bytes).
- Schema
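The wiki names the element metrics but does not define them. As a rough illustration only, the sketch below computes three of them under assumed, commonly used definitions: frequency as the total number of occurrences of an element, completeness as the percentage of records that contain the element at least once, and entropy as the Shannon entropy (in bits) of the element's values. The class name, method names, and input shape are hypothetical and are not taken from the Metadata_Analytics code. Dimensions and importance are left out because their definitions cannot be inferred from this page.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: each inner list holds the values one record has for a single element.
class ElementMetricsSketch {

    // Assumed definition: total number of occurrences of the element over all records.
    static int frequency(List<List<String>> valuesPerRecord) {
        int total = 0;
        for (List<String> values : valuesPerRecord) {
            total += values.size();
        }
        return total;
    }

    // Assumed definition: percentage of records in which the element occurs at least once.
    static double completeness(List<List<String>> valuesPerRecord) {
        if (valuesPerRecord.isEmpty()) {
            return 0.0;
        }
        int filled = 0;
        for (List<String> values : valuesPerRecord) {
            if (!values.isEmpty()) {
                filled++;
            }
        }
        return 100.0 * filled / valuesPerRecord.size();
    }

    // Assumed definition: Shannon entropy (base 2) of the distribution of values.
    static double entropy(List<List<String>> valuesPerRecord) {
        Map<String, Integer> counts = new HashMap<>();
        int total = 0;
        for (List<String> values : valuesPerRecord) {
            for (String value : values) {
                counts.merge(value, 1, Integer::sum);
                total++;
            }
        }
        double h = 0.0;
        for (int count : counts.values()) {
            double p = (double) count / total;
            h -= p * (Math.log(p) / Math.log(2));
        }
        return h;
    }

    public static void main(String[] args) {
        // Four made-up records; the third one does not contain the element at all.
        List<List<String>> records = Arrays.asList(
                Arrays.asList("school"),
                Arrays.asList("school", "higher education"),
                Collections.<String>emptyList(),
                Arrays.asList("higher education"));
        System.out.println("frequency    = " + frequency(records));
        System.out.println("completeness = " + completeness(records));
        System.out.println("entropy      = " + entropy(records));
    }
}
```

The values actually written to the CSV files may be computed differently, so treat these numbers as a way to read the metric names, not as a reference for the tool's output.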
The configure.properties file contains all the configuration parameters of Metadata_Analytics.
The properties contained are the following:
Input parameters:
For filesystem based analysis:
analytics.input.data
This parameter defines the class that handles the input. The possible values are:
- analytics.input.FSInput (for FS based analysis)
- analytics.input.OAITargetInput (for OAI-PMH based analysis)
analytics.mdstore.path
This parameter is used only for filesystem input and defines the path where the metadata records are located.
Tip: on Windows use \\ instead of \ in the path.
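The reason behind this tip can be sketched with java.util.Properties, which treats a backslash as an escape character when loading a file. This assumes the tool reads configure.properties through that class, which the wiki does not state; the path C:\metadata\store and the class name are made up for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical demo of how backslashes in a properties value are parsed.
class WindowsPathDemo {

    // Parses the given properties text and returns the value of analytics.mdstore.path.
    static String loadPath(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("analytics.mdstore.path");
    }

    public static void main(String[] args) {
        // File content: analytics.mdstore.path=C:\\metadata\\store
        String doubled = "analytics.mdstore.path=C:\\\\metadata\\\\store";
        // File content: analytics.mdstore.path=C:\metadata\store
        String single = "analytics.mdstore.path=C:\\metadata\\store";

        // Doubled backslashes survive as single ones.
        System.out.println(loadPath(doubled));
        // Single backslashes are consumed as escapes and disappear from the path.
        System.out.println(loadPath(single));
    }
}
```

With doubled backslashes the loaded value is the intended Windows path; with single backslashes the separators are dropped and the path no longer points anywhere useful.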
repositories.analyze=*
This parameter can be used to analyze only specific repositories contained in a federation folder. To analyze all the contained repositories, use the * character.
example: repositories.analyze=REPOSITORY1,REPOSITORY2
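One way to read this setting is sketched below: the value is either * (everything) or a comma-separated list of repository folder names. The class and method names are hypothetical, and details such as trimming whitespace and exact, case-sensitive matching are assumptions, since the wiki does not describe how the tool compares names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that interprets a repositories.analyze value.
class RepositorySelection {

    // Splits the setting on commas and drops surrounding whitespace and empty entries.
    static List<String> parseSelection(String setting) {
        List<String> names = new ArrayList<>();
        for (String part : setting.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    // A repository is selected when the setting is * or lists its folder name.
    static boolean isSelected(String setting, String repositoryName) {
        List<String> names = parseSelection(setting);
        return names.contains("*") || names.contains(repositoryName);
    }

    public static void main(String[] args) {
        System.out.println(isSelected("*", "REPOSITORY3"));
        System.out.println(isSelected("REPOSITORY1,REPOSITORY2", "REPOSITORY2"));
        System.out.println(isSelected("REPOSITORY1,REPOSITORY2", "REPOSITORY3"));
    }
}
```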
analytics.xmlHandler.input.class
Possible values are the following:
- xmlHandling.FS2XMLInput (for FS based analysis)
- xmlHandling.OAI2XMLInput (for OAI-PMH based analysis)
analytics.initializer.class
Possible values are the following:
- initializers.FSInitializer (for FS based analysis)
- initializers.OAIInitializer (for OAI-PMH based analysis)
analytics.repositories.list
This property is used only for OAI-PMH based analysis and contains the URL(s) of the repositories to be analyzed, separated by commas.
example: analytics.repositories.list=http://vegas.univ-tlse3.fr:8080/oaitarget/OAIHandler, http://www.rural-observatory.eu/RIHandler
analytics.repositories.metadataFormat
This property is used only for OAI-PMH based analysis. If a single value is given, it is used as the metadata prefix for all the repositories defined in the analytics.repositories.list property. If the number of values equals the number of repositories defined in analytics.repositories.list, the first repository is paired with the first prefix, the second with the second, and so on.
example: analytics.repositories.metadataFormat=lom,dc
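The pairing rule described above can be sketched as follows. The class and method names are hypothetical, and throwing an exception when the counts do not match is an assumption, because the wiki does not say how the tool reacts to that case.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of pairing repository URLs with metadata prefixes.
class MetadataFormatPairing {

    // Returns a map from repository URL to the metadata prefix used for it.
    static Map<String, String> pair(String repositoryList, String formatList) {
        String[] repositories = repositoryList.trim().split("\\s*,\\s*");
        String[] formats = formatList.trim().split("\\s*,\\s*");

        // Assumption: any other count is treated as a configuration error.
        if (formats.length != 1 && formats.length != repositories.length) {
            throw new IllegalArgumentException(
                    "Expected 1 or " + repositories.length + " prefixes, got " + formats.length);
        }

        Map<String, String> pairs = new LinkedHashMap<>();
        for (int i = 0; i < repositories.length; i++) {
            // A single prefix applies to every repository; otherwise pair by position.
            pairs.put(repositories[i], formats.length == 1 ? formats[0] : formats[i]);
        }
        return pairs;
    }

    public static void main(String[] args) {
        String repositories = "http://vegas.univ-tlse3.fr:8080/oaitarget/OAIHandler, "
                + "http://www.rural-observatory.eu/RIHandler";
        System.out.println(pair(repositories, "lom,dc"));
        System.out.println(pair(repositories, "lom"));
    }
}
```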
analytics.storage
This property defines the class that will handle the output of the results. Currently the only option supported is:
- analytics.storage.store2csv
analytics.mdstore.data.handler
This property defines how the metadata will be parsed. Currently the only option supported is XML parsing:
- analytics.analyzer.handlers.XMLHandler
analytics.element.values
This property defines the elements for which you want specific analysis. The values should be expressed in XPath, separated by commas. To analyze all elements, use * instead.
example: analytics.element.values=lom.educational.context.value,lom.educational.intendedenduserrole.value
Note: no namespace declaration is needed here, even if namespaces are used in the respective records.
vocabulary.element.values
This property defines the elements for which you want vocabulary analysis. The values should be expressed in XPath, separated by commas. Note: no namespace declaration is needed here, even if namespaces are used in the respective records.
example:
vocabulary.element.values =lom.educational.interactivityType.value
repositories.federated.analysis=false
This property toggles federation based analysis on or off. The possible values are:
- true
- false
analytics.filtering=true
Toggle filtering on or off.
analytics.filtering.xpath.expression= xpath expression
The XPath expression used by the filtering mechanism, when filtering is enabled.
example:
analytics.filtering.xpath.expression=/lom/classification/purpose/value[text()='discipline']
Info: you don't need to define a namespace here.
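To show what such a filter expression selects, here is a minimal sketch that evaluates it against one record with the standard javax.xml.xpath API. It is not the tool's own filtering code: the class name and sample records are made up, and the records deliberately carry no namespace, since the wiki does not explain how namespaces are ignored internally.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical sketch: decide whether a record passes an XPath filter.
class XPathFilterSketch {

    // Returns true when the expression selects at least one node in the record.
    static boolean matches(String xml, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            Document document = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // A node-set result converts to true when it is non-empty.
            return (Boolean) xpath.evaluate(expression, document, XPathConstants.BOOLEAN);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate filter: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String filter = "/lom/classification/purpose/value[text()='discipline']";
        String kept = "<lom><classification><purpose><value>discipline</value>"
                + "</purpose></classification></lom>";
        String dropped = "<lom><classification><purpose><value>competency</value>"
                + "</purpose></classification></lom>";
        System.out.println(matches(kept, filter));
        System.out.println(matches(dropped, filter));
    }
}
```

Under this reading, only records for which the expression selects something would take part in the analysis.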
Configuration samples for:
Depending on your operating system, open cmd (on Windows) or a bash terminal (on Linux), change to the directory where Metadata_Analytics.jar is located, and enter the following:
java -jar Metadata_Analytics.jar
Tip: For large numbers of XML files, add the -Xmx argument to the command above to set the maximum heap size the Java virtual machine may use, like this:
java -Xmx4096m -jar Metadata_Analytics.jar
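If you want to confirm that a heap setting is picked up on your machine, a small standalone program such as the one below prints the maximum heap the JVM will try to use. It is independent of Metadata_Analytics; the class name is made up, and the reported figure can be somewhat lower than the -Xmx value depending on the JVM.

```java
// Hypothetical helper, unrelated to the Metadata_Analytics code base.
class HeapCheck {

    // Maximum heap in megabytes, as reported by the running JVM.
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MB): " + maxHeapMegabytes());
    }
}
```

After compiling it with javac, run it with the same flag, for example java -Xmx4096m HeapCheck, and compare the printed value with the one you requested.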
Some results samples for:
- Repository based analysis:
- Federation based analysis: