Open Data in the electronics industry
Author
John Walker (NXP Semiconductors)
In the electronics industry, getting accurate, timely product data in front of the customer is an important factor in the overall business process. Furthermore, enabling the customer to easily compare and select the right product for their application from hundreds, or even thousands, of candidates can reduce the overall time and cost of the purchasing process.
Typically this product data has its source at the manufacturer, where it is stored in multiple systems in a variety of structured and unstructured formats. Often the data is duplicated in multiple places via manual processes, leading to additional work and significant inconsistencies. Eventually the product data is published in formats such as PDF and HTML.
The data is then typically scraped, or manually captured, by data aggregation companies, who align the data from different manufacturers and sell it on to distributors, who in turn use the data on their own websites, in paper catalogues, and so on. Often the distributors also perform additional data capture to supplement the purchased product data.
Our goal is to simplify the overall process to reduce the time, effort and complexity required to manage, publish and use the product data, thereby reducing the costs of doing business and allowing manufacturers to get the latest information to the customer more quickly.
The approach is to provide a single, trusted source of product and product-related data in semantically rich formats that can be used to communicate the data and to generate the multiple publication deliverables. Opening up access to the data is a key component, whether that means freeing the data from existing silos for use within the organization or making the data available to third parties. Finally, to facilitate the aggregation of data from multiple parties, it is important to agree on a common schema that can be used to describe the products and enable easy mapping between schemata.
A key part of the approach is a Component Data Dictionary based on the ISO 13584 data model and the IEC 61360 standard. This dictionary is essentially an ontology: it provides a set of classes and properties that can be used to describe instances of electrical/electronic components. The dictionary then acts as a schema that can be used to validate the data, but, crucially, it also describes and defines the data's meaning. This highly structured data can then be used to generate publications such as PDF data sheets, web pages, selection tables and mobile apps. For less structured, natural-language content we use the DITA XML standard from OASIS.
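To make the idea concrete, the following Turtle sketch shows how a dictionary class and property, and a component instance described against them, might look. All URIs, names and values here are invented for illustration; the actual dictionary uses ISO 13584/IEC 61360 identifiers rather than these ad hoc ones.

```turtle
# Hypothetical sketch; the namespaces, class and property names below are
# invented for illustration and are not the real dictionary identifiers.
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix cdd:  <http://example.org/cdd/> .
@prefix prod: <http://example.org/product/> .

# Dictionary side: a class and a typed property
cdd:BipolarTransistor a rdfs:Class ;
    rdfs:label "bipolar transistor"@en .

cdd:collectorCurrentMax a rdf:Property ;
    rdfs:label  "maximum collector current"@en ;
    rdfs:domain cdd:BipolarTransistor ;
    rdfs:range  xsd:decimal .                    # value in amperes

# Product side: an instance described (and validatable) against the dictionary
prod:ExampleTransistor a cdd:BipolarTransistor ;
    cdd:collectorCurrentMax "0.1"^^xsd:decimal .
```

Because the classes and properties carry labels, domains and ranges, the same triples serve both as a validation schema and as documentation of what each value means.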
For the past several years we have mainly used XML-based technologies (XSLT, XPath, XQuery, XSL-FO) to store and publish the data. We have had a great deal of success with this approach, but we have realized that XML is not always the ideal way to represent the data, as the model is essentially a graph. Working with a proprietary XML schema is also a barrier to third-party access to, and understanding of, the data. We have therefore begun experimenting with RDF and Linked Data.
The initial problem space we tackled was the integration and publication of disparate data sets, to enable a BW/BI solution to make sense of, and connect, data from several digital marketing systems in order to drive customer insights. The challenge in the BW/BI solution was that the supplied data sets were effectively disconnected: without any way to relate the data sources, there was no way to connect, for example, information about a customer's product interests with information about the order lines on which they had placed sample orders. Our approach was to publish the connecting data as RDF Linked Data, with URIs defined for the various resources of interest and the identifiers used by other systems included as literal values, enabling the BW/BI solution to reconcile the data sources and create business-critical dashboard reports.
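The pattern can be sketched in Turtle as follows: each resource of interest gets a stable URI, and the keys by which the other systems know that resource are attached as plain literals so the BW/BI layer can join on them. All prefixes, property names and identifier values here are invented for illustration.

```turtle
# Hypothetical sketch; prefixes, property names and identifier values
# are invented for illustration.
@prefix ex:  <http://example.org/id/> .
@prefix exs: <http://example.org/schema/> .

ex:customer-4711 a exs:Customer ;
    exs:crmId        "CRM-0004711" ;        # key used by the CRM export
    exs:webAccountId "WEB-98765" .          # key used by the web analytics feed

ex:order-2014-1234 a exs:SampleOrder ;
    exs:placedBy    ex:customer-4711 ;
    exs:orderLineId "SO-2014-1234/1" .      # key used by the ordering system
```

The URIs carry the connections between resources, while the literals let each downstream system look up records by whichever identifier it already holds.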
The RDF is generated from XML and CSV sources on a scheduled basis. The XML sources are already stored in an XML database, and we use XQuery to generate an RDF dump file per class of resource; the CSV sources are transformed to RDF using XSLT. The RDF data is regenerated each hour and loaded into an externally hosted RDF store, from which we expose a SPARQL 1.1 endpoint. We also manage stored SELECT queries, exposed as REST services, from which internal and external consumers can pull simple tabular data in XML, JSON and CSV/TSV formats. Finally, we have configured a front-end application that makes the URIs dereferenceable and supports content negotiation, including a vanilla HTML representation.
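A stored query in this kind of setup is a plain SPARQL 1.1 SELECT. A hypothetical example (resource and property names invented for illustration) that joins customers to their sample-order lines and returns a simple two-column table:

```sparql
PREFIX exs: <http://example.org/schema/>

# For each CRM customer key, list the order-line keys of their sample orders
SELECT ?crmId ?orderLineId
WHERE {
  ?customer a exs:Customer ;
            exs:crmId ?crmId .
  ?order a exs:SampleOrder ;
         exs:placedBy    ?customer ;
         exs:orderLineId ?orderLineId .
}
ORDER BY ?crmId
```

The result set serializes directly to SPARQL Results XML/JSON or to CSV/TSV, which is exactly the kind of tabular output such REST services expose.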
So far, most of our success has been within the enterprise, but we would now like to put more focus on the broader ecosystem, with data flowing in both directions: how can the parties involved provide, and make use of, more open access to the data? As we begin to use Linked Data, we can apply its basic principles to allow the data to be accessed over the web. However, this raises a number of interesting questions.