About this article


Figure 1

RDFAdaptor framework.

Figure 2

Front-end interface screenshot.

Figure 3

Workflow of RDF data generation with RDFZier.

Figure 4

Configuration template of RDFTranslatorAndLoader.

Figure 5

Configuration template of SPARQLIn and SPARQLUpdate.

Figure 6

Dump All AGROVOC RDF Triples from SPARQL Endpoint to Local Files.

Parameters defined in RDFTranslatorAndLoader.

Group | Parameter | Description
Input | Source | RDF triples to be converted or loaded
Input | Source Type | data source, such as the local file system, a remote URL, or a string stream
Input | Source RDF Format | format of the input RDF data; the common RDF formats are fully supported
Input | Large Input Triples | selector for whether the input is large; if it is, the output step cannot count, merge, or split the triples
Advance | BaseIRI | Base IRI against which relative IRIs in the RDF data are resolved
Advance | BNode | selector for preserving BNode IDs
Advance | Verify URI syntax, Verify relative URIs, Verify language tags, Verify datatypes | selectors for checking URI syntax, relative URIs, language tags, and datatypes; a failure log is returned when the corresponding errors occur
Advance | Language tags, Datatype | selectors for language tags and datatypes, covering whether parsing fails when a language tag or datatype is not recognised and whether recognised language tag or datatype values are normalized
Output | Target RDF Format | RDF format of the converted output
Output | Commit or Split Size | number of RDF triples written to each RDF file or submitted to the store in each batch; the default value is 0, which means all the input data is processed at one time
Output | Local File Setting | options for file system storage, including three selectors, “Save to File System”, “Keep Source FileName”, and “Merge to Single File” (which takes precedence over “Commit or Split Size”), plus file name and location
Output | TripleStore Setting | options for the RDF store, including a selector for “Save to Store”, Triple Store, Server URL, Database/RepositoryID/NameSpace (the database identifier, which differs between triple stores), UserName, Password, and Graph URI
Output | Stream Setting | options for the string stream used for further data transfer, including a selector for “Save to Stream” and the Result Field
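
The batching behaviour described for “Commit or Split Size” can be sketched as follows. This is only an illustration of the stated semantics (0 means everything is processed at one time), not the plugin's actual code; the function name `split_into_batches` and the sample triples are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

Triple = Tuple[str, str, str]


def split_into_batches(triples: Iterable[Triple], size: int = 0) -> Iterator[List[Triple]]:
    """Yield lists of at most `size` triples; size 0 yields one batch with everything."""
    if size < 0:
        raise ValueError("size must be >= 0")
    batch: List[Triple] = []
    for triple in triples:
        batch.append(triple)
        # A size of 0 disables splitting, so the batch keeps growing.
        if size and len(batch) == size:
            yield batch
            batch = []
    if batch:
        # Emit the remainder (or the single full batch when size is 0).
        yield batch


# Hypothetical sample data: five triples.
triples = [(f"ex:s{i}", "ex:p", f"ex:o{i}") for i in range(5)]
batch_sizes = [len(b) for b in split_into_batches(triples, size=2)]
single = [len(b) for b in split_into_batches(triples, size=0)]
print(batch_sizes, single)
```

Each yielded batch would correspond to one output file or one commit to the store, depending on the selected output target.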

Parameters defined in SparqlUpdate.

Group | Parameter | Description
SPARQL Setting | Query Endpoint Url From Field? | checkbox; if checked, the URL of the SPARQL Query Endpoint comes from Kettle's previous steps and its value is taken from the “Query Endpoint Url Field”
SPARQL Setting | Query Endpoint Url Field | drop-down list of input fields, used only when the option “Query Endpoint Url From Field” is selected
SPARQL Setting | Query Endpoint Url | value of the Query Endpoint URL, used when “Query Endpoint Url From Field” is unchecked
SPARQL Setting | Update Endpoint Url From Field? | checkbox; if checked, the URL of the SPARQL Update Endpoint comes from Kettle's previous steps and its value is taken from the “Update Endpoint Url Field”
SPARQL Setting | Update Endpoint Url Field | drop-down list of input fields, used only when the option “Update Endpoint Url From Field” is selected
SPARQL Setting | Update Endpoint Url | value of the Update Endpoint URL, used when “Update Endpoint Url From Field” is unchecked
SPARQL Setting | Query From Field? | checkbox; if checked, the SPARQL Update Query comes from Kettle's previous steps and its value is taken from the “Query Field Name”
SPARQL Setting | Query Field Name | used only when the option “Query From Field” is selected
SPARQL Setting | Base URI | Base IRI against which relative IRIs in the RDF data are resolved
SPARQL Setting | SPARQL Update Query | JavaScript programming for graph update, used only when the option “Query From Field” is disabled
Output Setting | Result Field Name | field specified for file saving
Http Auth | HTTP UserID | user ID for the SPARQL endpoint, if any
Http Auth | HTTP Password | password for the SPARQL endpoint, if a UserID exists
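
To make the “From Field” options and the HTTP authentication settings concrete, here is a minimal sketch that resolves the endpoint either from an incoming row or from a fixed value, then builds (without sending) a SPARQL 1.1 Protocol update request. It is an assumption about how such a step could behave, not the plugin's implementation; the helper names, the field name `endpoint_field`, the URLs, and the credentials are all hypothetical.

```python
import base64
import urllib.request
from typing import Optional


def resolve(from_field: bool, row: dict, field_name: str, fixed_value: str) -> str:
    """Use the incoming row's field when the checkbox is set, else the fixed value."""
    return row[field_name] if from_field else fixed_value


def build_update_request(endpoint: str, update: str,
                         user: Optional[str] = None,
                         password: str = "") -> urllib.request.Request:
    """Build a direct POST update request as defined by the SPARQL 1.1 Protocol."""
    request = urllib.request.Request(
        endpoint,
        data=update.encode("utf-8"),
        headers={"Content-Type": "application/sparql-update; charset=utf-8"},
        method="POST",
    )
    if user:
        # HTTP Basic authentication, added only when a user ID is configured.
        token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
        request.add_header("Authorization", f"Basic {token}")
    return request


# Hypothetical row passed in from a previous step.
row = {"endpoint_field": "http://localhost:3030/ds/update"}
endpoint = resolve(True, row, "endpoint_field", "http://example.org/update")
request = build_update_request(
    endpoint,
    'INSERT DATA { <http://example.org/s> <http://example.org/p> "o" }',
    user="alice",
    password="secret",
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; that call is left out so the sketch runs without a live endpoint.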

RDF data generation/translation and loading.

Data Source | Data Format | Number of Records | Number of Mapped Fields | Number of RDF Triples Generated | Total Time
MongoDB | JSON | 1,948,268 | 17 | 37,038,563 | 32 min 18 s
SqlServer | RDB | 336,831 | 5 | 1,159,687 | 38.6 s
 | | 798,389 | 9 | 7,521,876 | 5 min 4 s
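
The throughput implied by these figures can be derived with a few lines of arithmetic. The sketch below only restates the numbers from the table; the row labels are shorthand chosen here, and the derived rates are not reported in the source.

```python
# (label, records, triples generated, total time in seconds), taken from the table.
runs = [
    ("17 mapped fields", 1_948_268, 37_038_563, 32 * 60 + 18),
    ("5 mapped fields", 336_831, 1_159_687, 38.6),
    ("9 mapped fields", 798_389, 7_521_876, 5 * 60 + 4),
]

triples_per_second = []
triples_per_record = []
for label, records, triples, seconds in runs:
    rate = triples / seconds
    density = triples / records
    triples_per_second.append(rate)
    triples_per_record.append(density)
    print(f"{label}: {rate:,.0f} triples/s, {density:.2f} triples/record")
```

Under these numbers the three runs work out to roughly 19,000, 30,000, and 25,000 triples per second, and about 19, 3.4, and 9.4 triples per record.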

Parameters defined in RDFizer.

Group | Parameter | Description
Namespace | Prefix | collections of names identified by URI references
Namespace | Namespace | different prefixes depending on the required namespaces
Mapping Setting | Subject URI | HTTP URI template for the Subject/Resource; the placeholder {sid} is replaced by the UniqueKey
Mapping Setting | Class Types | the classes to which the resource belongs; multiple class types are supported (separated by semicolons), such as skos:Concept; foaf:Person
Mapping Setting | UniqueKey | the unique and stable primary key of the resource, used as part of the Subject URI
Mapping Setting | Fields Mapping Parameters | a list of field mappings from the selected data source to the target RDF schema, including the input Stream Field, Predicates, Object URIs, Multi-Values Separator, Data Type, and Lang Tag
Dataset Metadata | Meta Subject URI | URI pattern of the generated dataset
Dataset Metadata | Meta Class Types | the classes to which the resource belongs
Dataset Metadata | Parameters | a list of descriptions of the generated dataset, including PropertyType, Predicates, Object Values, DataType, and Lang Tag
Output Setting | File system setting | options for file system storage, including Filename and RDF format
Output Setting | RDF store setting | options for the RDF store, including triple store name, server URL, Repository ID, Username (if any), Password, and Graph URI
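
The mapping settings above can be illustrated with a small sketch that turns one input record into N-Triples lines: the {sid} placeholder is filled from the UniqueKey, class types are split on semicolons, and a multi-value separator splits one field into several literals. This is an assumed, simplified reading of the parameters, not RDFizer's code; the function names, the example record, the example.org URI, and the choice of skos:altLabel are hypothetical, and datatypes and object URIs are left out.

```python
from typing import Dict, List, Tuple

PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}


def expand(curie: str, prefixes: Dict[str, str]) -> str:
    """Expand a prefixed name such as skos:Concept into a full IRI."""
    prefix, _, local = curie.partition(":")
    return prefixes[prefix] + local


def escape(value: str) -> str:
    """Escape the characters that N-Triples literals require escaping."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def rdfize(record: dict, subject_template: str, unique_key: str, class_types: str,
           field_map: Dict[str, Tuple[str, str, str]],
           prefixes: Dict[str, str]) -> List[str]:
    """Map one record to N-Triples; field_map values are (predicate, separator, lang)."""
    subject = subject_template.replace("{sid}", str(record[unique_key]))
    lines = []
    for class_type in class_types.split(";"):
        if class_type.strip():
            lines.append(f"<{subject}> <{expand('rdf:type', prefixes)}> "
                         f"<{expand(class_type.strip(), prefixes)}> .")
    for field, (predicate, separator, lang) in field_map.items():
        raw = record.get(field)
        if raw is None or raw == "":
            continue
        values = str(raw).split(separator) if separator else [str(raw)]
        for value in (v.strip() for v in values):
            if value:
                literal = f'"{escape(value)}"' + (f"@{lang}" if lang else "")
                lines.append(f"<{subject}> <{expand(predicate, prefixes)}> {literal} .")
    return lines


record = {"id": 42, "label": "maize|corn"}  # hypothetical input row
lines = rdfize(record, "http://example.org/concept/{sid}", "id", "skos:Concept",
               {"label": ("skos:altLabel", "|", "en")}, PREFIXES)
print("\n".join(lines))
```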

Parameters defined in SparqlIn.

Group | Parameter | Description
SPARQL Setting | Accept URL from field | checkbox; if checked, the URL of the SPARQL Endpoint comes from Kettle's previous steps and its value is taken from the “URL field name”
SPARQL Setting | URL field name | drop-down list of input fields, used only when the option “Accept URL from field” is selected
SPARQL Setting | SPARQL Endpoint URL | endpoint URL queried when “Accept URL from field” is disabled
SPARQL Setting | Query Type | query type, with two options: Graph query or Tuple query
SPARQL Setting | Limit | limit on the amount of data to be processed, if necessary
SPARQL Setting | Offset | the starting position of data processing
Output Setting | Result Field Name | field specified for file saving
Output Setting | RDF Format | target local data format: JSON, XML, CSV, or TSV for a SELECT query; RDF formats only for a CONSTRUCT query
Output Setting | Max Rows | maximum size of the output file; empty or 0 means all the triples are retrieved
Http Auth | HTTP UserID | user ID for the SPARQL endpoint, if any
Http Auth | HTTP Password | password for the SPARQL endpoint, if a UserID exists
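
The Limit and Offset parameters suggest paging through a large result set. A minimal sketch of that idea follows, building SPARQL Protocol GET URLs with LIMIT and OFFSET appended to the query text. It assumes the query carries no LIMIT or OFFSET of its own and that the endpoint URL has no existing query string; the function name, the endpoint `http://example.org/sparql`, and the page size are hypothetical, and nothing is sent over the network.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit


def paged_query_url(endpoint: str, query: str,
                    limit: Optional[int] = None, offset: int = 0) -> str:
    """Return a GET URL whose `query` parameter carries the paged SPARQL query."""
    paged = query.strip()
    if limit:
        paged += f"\nLIMIT {int(limit)}"
    if offset:
        paged += f"\nOFFSET {int(offset)}"
    return endpoint + "?" + urlencode({"query": paged})


endpoint = "http://example.org/sparql"  # hypothetical endpoint
query = "CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o }"
page_size = 1000

# URLs for the first three pages: offsets 0, 1000, and 2000.
urls = [paged_query_url(endpoint, query, limit=page_size, offset=i * page_size)
        for i in range(3)]

# Decode the last URL to show the query text an endpoint would receive.
last_query = parse_qs(urlsplit(urls[-1]).query)["query"][0]
print(last_query)
```

A dump loop would request each URL in turn and stop when a page comes back empty. Without an ORDER BY clause the order of solutions is not guaranteed to be stable between requests, so pages may overlap or miss triples on some stores.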