
ImporterCodeSpecEng

Victor Agroskin edited this page Mar 10, 2014 · 7 revisions

0. Using an example

Download the 1.5alpha version of dot15926 Editor, which was developed to support data import extensions.

An example extension that accesses the Google Directions API (https://developers.google.com/maps/documentation/directions/?hl=en) is published in this repository. Download the code, copy the extension folder GoogleExt to the extensions folder and the pattern file patterns_external.py to the patterns folder. Then start the Editor, enable the extension, create a new data source, open it and execute:

show(id = builder.collect(type=patterns.GoogleRoute.base, name_loc1= 'Лескова ул., 7, Moscow, Russia 127349', name_loc2= 'ulitsa Kedrova, 8к1, Москва, город Москва, Russia 117292'))

or

show(id = builder.collect(type=patterns.GoogleRoute.base, name_loc1= 'Малая Никитская 2/1, Москва', name_loc2= 'Novo-Peredelkino, Корпус 1, улица Лукинская, 1, Moscow, Russia, 119634'))

See the extension and pattern code for illustrations of the specifications published below.

1. Search for external data

A search for external data is initiated from the console. To access external data and create an RDF data set in the triple store following some pattern, use:

builder.collect(type=patterns.<PatternName>.<PatternOptionName>, <role1> = 'value1', <role2> = 'value2', …)

A call to builder.collect returns only the newly created elements: across consecutive imports, an element is created just once to avoid duplication.

2. Search execution

The external search is performed by Python functions that are called by pattern name. To make them available, search functions are registered with the public decorator, for example:

@public('collectors.<PatternName>')
def func(roles_in, pattern_name):
    # Access the external database here, parse the replies and generate unique IDs.
    return [roles_out1, roles_out2, …]

It is also possible to register a function with the decorator @public('collectors'). Such a function is called on every attempt to import data, with any pattern; in this case the pattern name is passed to the function as the pattern_name parameter.
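The two registration forms can be pictured with a small stand-in. The sketch below is not the Editor's implementation: `public` here is a hypothetical substitute that only records functions under a key, and `dispatch`, `ExamplePattern` and the `source` role are names invented for illustration. Calling both kinds of collector for one import, and in this order, is also an assumption.

```python
from collections import defaultdict

# Hypothetical stand-in for the Editor's `public` decorator: it only records
# each function under the key it was registered with.
_registry = defaultdict(list)

def public(key):
    def decorator(func):
        _registry[key].append(func)
        return func
    return decorator

@public('collectors.ExamplePattern')
def example_collector(roles_in, pattern_name):
    # Called only for imports that use ExamplePattern (an invented name).
    return [dict(roles_in, source='specific')]

@public('collectors')
def catch_all_collector(roles_in, pattern_name):
    # Called for every import; pattern_name says which pattern is in use.
    return [dict(roles_in, source='catch-all:' + pattern_name)]

def dispatch(pattern_name, roles_in):
    # Assumed order: pattern-specific collectors first, then catch-all ones.
    results = []
    funcs = _registry['collectors.' + pattern_name] + _registry['collectors']
    for func in funcs:
        results.extend(func(roles_in, pattern_name))
    return results
```

With this stand-in, `dispatch('ExamplePattern', {'name': 'x'})` runs both collectors, while a pattern without its own collector reaches only the catch-all function.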

The roles_in parameter is a dictionary of search conditions for the roles that are searchable in the external database. Currently only strings are supported as condition values.

The function returns a list of dictionaries. Each dictionary contains 'role': 'value' pairs, where role is a role of the pattern and value is either imported from the external database or a generated unique ID.

Dictionaries can also be returned one at a time via yield.

Roles received through the roles_in dictionary that are not changed in the function are passed back to the pattern builder as they are.
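A collector written as a generator might look like the sketch below. The `lookup_routes` helper and the output roles `id_route` and `distance` are assumptions made for illustration, and the canned records stand in for a real external request. Each yielded dictionary starts as a copy of `roles_in`, so unchanged roles go back to the builder as received.

```python
import uuid

def lookup_routes(origin, destination):
    # Hypothetical stand-in for a request to an external service;
    # it returns canned records instead of calling a real API.
    return [{'distance': '12.4 km'}, {'distance': '15.0 km'}]

def route_collector(roles_in, pattern_name):
    # roles_in holds string search conditions keyed by role name.
    origin = roles_in.get('name_loc1', '')
    destination = roles_in.get('name_loc2', '')
    for record in lookup_routes(origin, destination):
        roles_out = dict(roles_in)  # pass unchanged roles through
        roles_out['distance'] = record['distance']
        # Generated unique ID (role name is an assumption).
        roles_out['id_route'] = 'route_' + uuid.uuid4().hex
        yield roles_out
```

In the Editor such a function would additionally carry the public decorator; it is left out here so that the sketch runs on its own.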

3. RDF data creation

The pattern builder takes the reply from the external search and writes RDF following the pattern, using the rules for pattern creation documented in [Volume 4 of the documentation](http://techinvestlab.ru/files/V4/dot15926Editor14_Vol4_PatternsAndMapping.pdf). Unique IDs (if received instead of full URIs) are used as fragment identifiers to form URIs in the namespace designated for new entities in the receiving data source.
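As a rough illustration of the fragment-identifier rule, the sketch below builds a URI from a namespace and a received value. The `make_uri` helper, the example namespace and the test used to recognise a full URI are assumptions, not the Editor's code.

```python
def make_uri(namespace, value):
    # Assumption: a value that already looks like a full URI is used unchanged.
    if value.startswith(('http://', 'https://')):
        return value
    # Otherwise treat the value as a fragment identifier in the namespace.
    return namespace.rstrip('#') + '#' + value
```

For example, `make_uri('http://example.org/new', 'route_1')` gives `http://example.org/new#route_1`.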

The pattern builder checks whether the content received for the first part of each pattern element description is unique. If the data received for this part is identical to data already contained in the data source (imported earlier), the builder does not create a copy of the existing element and uses the existing URI instead.

As a result, data received for any subsequent parts of the pattern that describe the same element will overwrite the existing data for that element, if any.
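The reuse-and-overwrite behaviour can be modelled with a toy store. Everything in the sketch below (the `ElementStore` class, its methods, the way an identity key is built from the first part, and the generated URIs) is a simplifying assumption meant only to show the effect, not how the pattern builder is implemented.

```python
class ElementStore(object):
    """Toy model of a data source keyed by first-part content."""

    def __init__(self, namespace):
        self.namespace = namespace
        self.by_key = {}      # first-part content -> URI
        self.properties = {}  # URI -> data written by later parts
        self.counter = 0

    def get_or_create(self, first_part):
        # Identical first-part content maps to the same existing URI.
        key = tuple(sorted(first_part.items()))
        if key in self.by_key:
            return self.by_key[key], False
        self.counter += 1
        uri = '%s#id%d' % (self.namespace, self.counter)
        self.by_key[key] = uri
        self.properties[uri] = {}
        return uri, True

    def apply_later_part(self, uri, part):
        # Data from later parts overwrites whatever was stored before.
        self.properties[uri].update(part)
```

Importing the same first-part content twice returns the same URI with a `False` creation flag the second time, and a later part applied on the second import replaces the values stored by the first.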

The pattern builder processes partially returned role dictionaries as described in the main documentation, attempting to create as many pattern elements as possible.
