Konstantinos edited this page May 1, 2015 · 79 revisions

Welcome to the Metadata_Analytics wiki!


What is it?

Metadata_Analytics is a Java-based statistical analyzer for metadata repositories and federations.


Requirements

The latest JDK should be installed from here

Download the most current implementation from here


Features

Metadata_Analytics performs statistical analysis over XML records. It can analyze records that reside on the filesystem or are harvested from an OAI-PMH target. It is schema independent, meaning that it can analyze any kind of metadata in XML format. Furthermore, you can filter the XML records before analysis by using XPath expressions.

The analysis results are stored as CSV files in a folder named Analysis_Results, located in the same directory as the Metadata_Analytics jar.

For analysis of XML records located on the filesystem, a specific folder structure is expected:

[Image not available: folder structure]

It supports statistical analysis over:

Repository:

  • All/Selected Record elements.
  • Record element attributes.
  • Specific element vocabulary analysis.
  • General repository analysis.

Federation:

  • All/Selected Record elements.
  • Record element attributes.
  • Specific element vocabulary analysis.
  • General federation analysis.

Statistical metrics calculated:

  • For element based analysis:
    1. Frequency.
    2. Completeness.
    3. Dimensions.
    4. Importance.
    5. Entropy.
  • For attribute based analysis:
    1. Frequency.
  • For specific element vocabulary analysis:
    1. Frequency.
  • For general statistical analysis (repository, federation):
    1. Number of records analyzed.
    2. Average file size.
    3. Informativeness.
    4. Approximate storage requirements (bytes).
    5. Schema.
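The wiki does not define the element metrics listed above. As a rough illustration only, the sketch below computes three of them under assumed definitions: frequency as the total number of occurrences of an element, completeness as the percentage of records containing the element at least once, and entropy as the Shannon entropy (in bits) of the element's value distribution. The class and method names are hypothetical and are not taken from the Metadata_Analytics source; Dimensions and Importance are left out because their definitions cannot be inferred from this page.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative metric definitions (assumed, not taken from Metadata_Analytics). */
final class ElementMetricsSketch {

    /** Frequency: total number of occurrences of the element across all records. */
    static int frequency(List<List<String>> valuesPerRecord) {
        int total = 0;
        for (List<String> values : valuesPerRecord) {
            total += values.size();
        }
        return total;
    }

    /** Completeness: percentage of records in which the element occurs at least once. */
    static double completeness(List<List<String>> valuesPerRecord) {
        if (valuesPerRecord.isEmpty()) {
            return 0.0;
        }
        long filled = valuesPerRecord.stream().filter(v -> !v.isEmpty()).count();
        return 100.0 * filled / valuesPerRecord.size();
    }

    /** Entropy: Shannon entropy, in bits, of the distribution of the element's values. */
    static double entropy(List<List<String>> valuesPerRecord) {
        Map<String, Integer> counts = new HashMap<>();
        int total = 0;
        for (List<String> values : valuesPerRecord) {
            for (String value : values) {
                counts.merge(value, 1, Integer::sum);
                total++;
            }
        }
        double h = 0.0;
        for (int count : counts.values()) {
            double p = (double) count / total;
            h -= p * (Math.log(p) / Math.log(2)); // log base 2 gives bits
        }
        return h;
    }

    public static void main(String[] args) {
        // One inner list per record: the values of a single element in that record.
        List<List<String>> records = Arrays.asList(
                Arrays.asList("school"),
                Arrays.asList("school", "higher education"),
                Collections.<String>emptyList(),
                Arrays.asList("higher education"));
        System.out.println("frequency    = " + frequency(records));
        System.out.println("completeness = " + completeness(records));
        System.out.println("entropy      = " + entropy(records));
    }
}
```

Each inner list holds the values one record carries for the element under analysis, so an empty list stands for a record where the element is missing.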

Configuration

The configure.properties file contains all the configuration parameters of Metadata_Analytics.

The properties contained are the following:

Input parameters:

For filesystem based analysis:

analytics.input.data

This parameter defines the class that handles the input. The possible values are:

  • analytics.input.FSInput (for FS based analysis)
  • analytics.input.OAITargetInput (for OAI-PMH based analysis)

analytics.mdstore.path

This parameter is used only for filesystem input and defines the path where the metadata are located.

Tip: on Windows, use \\ instead of \ in the path.
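The reason for the doubled backslash can be shown with java.util.Properties, assuming (this page does not confirm it) that configure.properties is read with that class. In the properties format a backslash starts an escape sequence, so a single backslash before an ordinary character is silently dropped. The path C:\metadata\store and the class name below are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Shows how java.util.Properties treats backslashes in a Windows path. */
final class WindowsPathSketch {

    /** Parses one properties line and returns the value of analytics.mdstore.path. */
    static String mdstorePath(String propertiesLine) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesLine));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("analytics.mdstore.path");
    }

    public static void main(String[] args) {
        // Java string literals double each backslash once more:
        // "\\\\" below is the two characters \\ as they would appear in the file.
        String escaped = "analytics.mdstore.path=C:\\\\metadata\\\\store";
        String unescaped = "analytics.mdstore.path=C:\\metadata\\store";

        System.out.println(mdstorePath(escaped));   // prints C:\metadata\store
        System.out.println(mdstorePath(unescaped)); // prints C:metadatastore
    }
}
```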

repositories.analyze=*

This parameter can be used to analyze only specific repositories contained in a federation folder. To analyze all the contained repositories, use just the * character.

example: repositories.analyze=REPOSITORY1,REPOSITORY2
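One way such a value could be interpreted is sketched below. The handling of whitespace and of names that do not match an existing repository folder is an assumption, since the wiki does not describe it, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical interpretation of the repositories.analyze property. */
final class RepositorySelectionSketch {

    /**
     * Returns the repository folder names to analyze.
     * "*" selects every available repository; otherwise the value is read as a
     * comma-separated list. Unknown names are skipped (an assumption).
     */
    static List<String> select(String propertyValue, List<String> available) {
        if ("*".equals(propertyValue.trim())) {
            return new ArrayList<>(available);
        }
        List<String> selected = new ArrayList<>();
        for (String name : propertyValue.split(",")) {
            String trimmed = name.trim();
            if (available.contains(trimmed)) {
                selected.add(trimmed);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> folders = Arrays.asList("REPOSITORY1", "REPOSITORY2", "REPOSITORY3");
        System.out.println(select("*", folders));
        System.out.println(select("REPOSITORY1,REPOSITORY2", folders));
    }
}
```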

analytics.xmlHandler.input.class

Possible values are the following:

  • xmlHandling.FS2XMLInput (for FS based analysis)
  • xmlHandling.OAI2XMLInput (for OAI-PMH based analysis)

analytics.initializer.class

Possible values are the following:

  • initializers.FSInitializer (for FS based analysis)
  • initializers.OAIInitializer (for OAI-PMH based analysis)

analytics.repositories.list

This property is used only for OAI-PMH based analysis. It contains the URL(s) of the repositories to be analyzed, separated by commas.

example: analytics.repositories.list=http://vegas.univ-tlse3.fr:8080/oaitarget/OAIHandler, http://www.rural-observatory.eu/RIHandler

analytics.repositories.metadataFormat

This property is used only for OAI-PMH based analysis. If a single value is given, it is used as the metadata prefix for all the repositories defined in the analytics.repositories.list property. If the number of values equals the number of repositories defined in analytics.repositories.list, the values are paired by position: the first repository uses the first prefix, the second repository the second prefix, and so on.

example: analytics.repositories.metadataFormat=lom,dc
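The pairing rule described above can be sketched as follows. The class and method names are hypothetical, and rejecting any other number of prefixes with an exception is an assumption, because the wiki does not say what the tool does in that case. The second method builds a standard OAI-PMH ListRecords request from a base URL and a metadata prefix; whether Metadata_Analytics harvests with exactly this verb is also an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical pairing of repository URLs with OAI-PMH metadata prefixes. */
final class MetadataFormatPairingSketch {

    /** Maps each repository URL to its metadata prefix, keeping the configured order. */
    static Map<String, String> pair(String repositoriesList, String metadataFormats) {
        String[] urls = repositoriesList.trim().split("\\s*,\\s*");
        String[] prefixes = metadataFormats.trim().split("\\s*,\\s*");
        if (prefixes.length != 1 && prefixes.length != urls.length) {
            // Assumption: any other count is treated as a configuration error.
            throw new IllegalArgumentException(
                    "Expected 1 or " + urls.length + " prefixes, got " + prefixes.length);
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < urls.length; i++) {
            // A single prefix applies to every repository; otherwise pair by position.
            result.put(urls[i], prefixes.length == 1 ? prefixes[0] : prefixes[i]);
        }
        return result;
    }

    /** Builds an OAI-PMH ListRecords request URL for one repository. */
    static String listRecordsUrl(String baseUrl, String metadataPrefix) {
        return baseUrl + "?verb=ListRecords&metadataPrefix=" + metadataPrefix;
    }

    public static void main(String[] args) {
        String repos = "http://vegas.univ-tlse3.fr:8080/oaitarget/OAIHandler, "
                + "http://www.rural-observatory.eu/RIHandler";
        for (Map.Entry<String, String> e : pair(repos, "lom,dc").entrySet()) {
            System.out.println(listRecordsUrl(e.getKey(), e.getValue()));
        }
    }
}
```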

analytics.storage

This property defines the class that will handle the output of the results. Currently the only option supported is:

  • analytics.storage.store2csv

analytics.mdstore.data.handler

This property defines how the metadata will be parsed. Currently the only option supported is XML parsing:

  • analytics.analyzer.handlers.XMLHandler

analytics.element.values

This property defines the elements for which you want specific analysis. The values should be expressed in XPath, separated by commas. If an analysis of all elements is needed, use * instead.

example: analytics.element.values=lom.educational.context.value,lom.educational.intendedenduserrole.value

*No namespace declaration is needed here, although namespaces might be used in the respective records.

vocabulary.element.values

This property defines the elements for which you want vocabulary analysis. The values should be expressed in XPath, separated by commas. *No namespace declaration is needed here, although namespaces might be used in the respective records.

example:

vocabulary.element.values=lom.educational.interactivityType.value

repositories.federated.analysis=false

This property toggles federation based analysis on or off. The possible values are:

  • true
  • false

analytics.filtering=true

Toggle filtering on or off.

analytics.filtering.xpath.expression= xpath expression

The XPath expression used by the filtering mechanism, if filtering is enabled.

example:

analytics.filtering.xpath.expression=/lom/classification/purpose/value[text()='discipline']

Info: you don't need to declare namespaces here.
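To illustrate what such a filter expression selects, the sketch below evaluates an XPath expression against a single record with the standard javax.xml.xpath API. It is not the tool's own filtering code: the class name, the sample record, and the assumption that a record is kept when the expression matches at least one node are all illustrative. The sample record deliberately uses no namespaces.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Illustrative XPath-based record filter (not the Metadata_Analytics implementation). */
final class XPathFilterSketch {

    /** Returns true when the expression selects at least one node in the record. */
    static boolean matches(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // Evaluating a node-set as BOOLEAN yields true if the node-set is non-empty.
            return (Boolean) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.BOOLEAN);
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot evaluate filter expression", e);
        }
    }

    public static void main(String[] args) {
        String record = "<lom><classification><purpose>"
                + "<value>discipline</value>"
                + "</purpose></classification></lom>";
        String filter = "/lom/classification/purpose/value[text()='discipline']";
        System.out.println(matches(record, filter)); // prints true
    }
}
```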

Configuration samples for:


Run it

Depending on your operating system, open cmd (on Windows) or a bash terminal (on Linux), change to the directory where Metadata_Analytics.jar is located, and enter the following:

java -jar Metadata_Analytics.jar

Tip: For large numbers of XML files, add the -Xmx argument to the command above to set the maximum heap size that the Java virtual machine may use, like this:

java -Xmx4096m -jar Metadata_Analytics.jar


Results

Some results samples for:

References

Globe Metadata Analysis
