…and how we are tackling them
In EW-Shopp we are developing a cross-domain data integration platform that aims to help businesses and business ecosystems leverage the huge amounts of data available today to build relevant custom insights and business knowledge around weather- and events-based data.
Today, we know that in Big Data applications the ‘Big’ is only partly about quantity (i.e. volumes of data): what matters much more is quality. In EW-Shopp we are dealing with multi-form, multi-source data types: proprietary enterprise and customer data, market data and intelligence, open data and external data, as well as multilingual content. Because of the variety and richness of this business-technical ecosystem, we also need to manage diverse data paradigms such as traditional tabular documents, relational and NoSQL databases, and various data formats.
Integrating multiple data sources and formats, and therefore harmonising and reconciling these huge amounts of data from a technical point of view, is an important result which we strive to obtain, and we are integrating the best tools and techniques in the domain. But that is only one side of the data integration coin. In EW-Shopp we also want to enable businesses to gather meaningful weather- and events-based insights about their customers, markets and operations by providing useful and usable services.
To this end we are developing, and testing out, a data integration methodology which helps us incorporate both technical and business-analytical aspects in an effective workflow. We are applying this methodology in each of our Pilots, which are testing EW-Shopp with real-world business cases and data.
The methodology foresees a workflow with four well-identifiable building blocks, or stages, each of which can be managed by a different actor (a person, a company, etc.), allowing for a degree of flexibility and modularity.
The stages are Documentation, Ingestion, Enrichment and Analytics.
The figure below summarises a typical EW-Shopp integration scenario which comprises the four stages.
Documentation is often the most overlooked stage, yet it is crucial in complex integration scenarios. We have probably all had the experience of opening a bunch of (database) tables and having no idea what certain columns represented or why. And while some simple databases or datasets may well be ‘self-documented’, this is not usually the case with heterogeneous (and even multilingual) data. It is also important to keep documentation updated, including documentation on the following stages.
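One lightweight way to keep such documentation checkable is a small data dictionary that travels with the dataset. The sketch below is only an illustration, not part of the EW-Shopp tooling: the `ColumnDoc` record, the `undocumented_columns` helper and the sample column names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnDoc:
    """One data-dictionary entry describing a single column."""
    name: str
    description: str
    unit: str = ""
    language: str = "en"


def undocumented_columns(header, docs):
    """Return the columns in `header` that have no entry in `docs`."""
    documented = {d.name for d in docs}
    return [col for col in header if col not in documented]


# Hypothetical data dictionary for a hypothetical sales table.
sales_docs = [
    ColumnDoc("date", "Day of the sale, ISO 8601"),
    ColumnDoc("store_city", "City where the store is located"),
    ColumnDoc("units_sold", "Number of items sold", unit="items"),
]

header = ["date", "store_city", "units_sold", "promo_flag"]
missing = undocumented_columns(header, sales_docs)
print(missing)  # ['promo_flag']
```

Running a check like this whenever a dataset changes is one way to notice that the documentation has fallen behind.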
The Ingestion stage, as the name suggests, is where all of the data is brought together, cleaned, converted if needed, and harmonised as much as possible with a view to the analytics that will follow.
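As a rough illustration of what harmonising can involve, the sketch below maps rows from two sources with different column names, date formats and decimal separators onto one common shape. The source names, column names and formats are assumptions made up for this example, not the actual EW-Shopp data.

```python
from datetime import datetime

# Hypothetical per-source settings: column renames and date format.
SOURCES = {
    "shop_a": {
        "rename": {"Datum": "date", "Stadt": "city", "Umsatz": "revenue"},
        "date_format": "%d.%m.%Y",
    },
    "shop_b": {
        "rename": {"day": "date", "town": "city", "sales": "revenue"},
        "date_format": "%Y/%m/%d",
    },
}


def harmonise(row, source):
    """Convert one raw row (all values are strings) to the common schema."""
    cfg = SOURCES[source]
    # Rename columns and trim surrounding whitespace.
    out = {cfg["rename"].get(key, key): value.strip() for key, value in row.items()}
    # Normalise dates to ISO 8601.
    out["date"] = datetime.strptime(out["date"], cfg["date_format"]).date().isoformat()
    # Normalise city capitalisation.
    out["city"] = out["city"].title()
    # Accept a decimal comma; thousands separators are not handled here.
    out["revenue"] = float(out["revenue"].replace(",", "."))
    out["source"] = source
    return out


row_a = {"Datum": "03.07.2018", "Stadt": " milano ", "Umsatz": "1250,50"}
row_b = {"day": "2018/07/03", "town": "LJUBLJANA", "sales": "980.00"}

clean_a = harmonise(row_a, "shop_a")
clean_b = harmonise(row_b, "shop_b")
print(clean_a)
print(clean_b)
```

Keeping the per-source rules in a configuration table rather than in the function body makes it easier to add a new source without touching the conversion logic.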
Enrichment is really the ‘heart’ of the process, where all of the ingested data is integrated and (as the term emphasises) given its added value. In this stage, data which appear distant and unrelated come together, ready to be fully used in the next stage. An important aspect of this stage is that it is heavily informed by the business and analytics experts, who ultimately shape the insights they want to get.
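A minimal sketch of this idea, assuming the ingested sales and weather records already share a date and a city key, is a left join that attaches weather observations to each sales row. The field names and sample values are hypothetical and serve only to illustrate the step.

```python
# Hypothetical, already-ingested records.
sales = [
    {"date": "2018-07-03", "city": "Milano", "units_sold": 120},
    {"date": "2018-07-04", "city": "Milano", "units_sold": 95},
]
weather = [
    {"date": "2018-07-03", "city": "Milano", "max_temp_c": 31.0, "rain_mm": 0.0},
]


def enrich(sales_rows, weather_rows):
    """Left-join weather onto sales by (date, city).

    Sales rows without a matching observation are kept, with the
    weather fields set to None.
    """
    index = {(w["date"], w["city"]): w for w in weather_rows}
    enriched = []
    for sale in sales_rows:
        match = index.get((sale["date"], sale["city"]))
        row = dict(sale)  # copy so the input rows stay untouched
        row["max_temp_c"] = match["max_temp_c"] if match else None
        row["rain_mm"] = match["rain_mm"] if match else None
        enriched.append(row)
    return enriched


enriched_sales = enrich(sales, weather)
for row in enriched_sales:
    print(row)
```

A left join is used here so that no sales record is lost when a weather observation is missing; whether to keep or drop such rows is a choice for the business and analytics experts.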
Analytics is where the actual business analytics take place, providing all of the expected features such as advanced data visualization, smart intelligence and performance evaluation, based on the business case of each Pilot.
In EW-Shopp one of the tools used to support the process, and in particular the latter stages, is the Knowage Open Source Business Analytics suite developed by the Engineering Group (EW-Shopp consortium partner).
Data integration, with the methodology described here, is only one of the scenarios and building blocks we are developing and continuously improving in EW-Shopp. If you would like to know more and keep up to date with the evolution of the project, follow us here on LinkedIn and on all our other online channels.
You can also get in touch with the EW-Shopp team at firstname.lastname@example.org