Streamlining and improving the EW-Shopp dataflow methodology

                                                                         To attain knowledge, add things everyday. To attain wisdom, remove things everyday


In a previous blog post we talked about the challenges of integrating multiple data sources, formats and types, and about some of the strategies proposed in EW-Shopp. In this post we share some updates on our data integration methodology: how we have been updating it, streamlining it and applying it to our demonstration pilot Business Cases.

A popular quote attributed to ancient Chinese philosopher Laozi goes along the lines of “To attain knowledge, add things everyday. To attain wisdom, remove things everyday.” So while our data sources, the ‘knowledge’ used for building custom insights and business knowledge, are constantly growing in quantity and quality (for example through our enrichment tools such as DataGraft and ASIA), our methodology has been streamlined and somewhat reduced in order to be more effective, usable and sharable.

Figure 1: Statue of Lao Tzu in Quanzhou – credit: Tommy Wong through Flickr – CC-BY-2.0

As mentioned in previous posts, one of the challenging features of our project is the inclusion of very diverse Business Cases. At the same time, we committed to adopting a common methodology for evaluation, and therefore to consolidating a common dataflow and applying it to all of these Business Cases.

In turn, this meant designing a methodology that is flexible and open to different scenarios while sharing a common baseline logic.

The figure below shows the current general dataflow logical model developed in EW-Shopp:

Figure 2: Example of a general EW-Shopp dataflow model

Through various iterations we have defined four key steps, all of which are supported by the EW-Shopp tools and involve all of the different actors. The main steps in the adopted dataflow model are:

  1. INGESTION. This step, typically involving business partners, covers the collection (ingestion) of all the relevant data needed for a Business Case. This includes company/business data from the various business partners and stakeholders (e.g., historic data, real-time sensor data, data from ERPs, etc.), as well as data from other providers. In EW-Shopp these data include weather data provided by specific providers/databases (e.g., MARS historical data), as well as event data. Some preliminary data filtering/manipulation is also performed at this stage, for instance anonymization when certain data is confidential or protected by IPR.
  2. ENRICHMENT. This is one of the core steps carried out in EW-Shopp and represents all of the activities related to data manipulation. Given the heterogeneity of the input data coming from the previous step, here the datasets are harmonised and prepared for the later development of analytic models, predictions, etc. Depending on the use case, the enrichment phase may involve, for instance, data filtering, reconciliation, transformation and aggregation.
  3. ANALYTICS. In this phase of the dataflow, analyses, models and predictions are produced based on the data prepared in the previous steps. Several algorithms and tools are used depending on the scenario and Business Case.
  4. VISUALISATION. This step is specifically dedicated to the visualisation and navigation of the data output by the above analyses and models, by means of specific Business Intelligence reports and visualisations. This means that in this phase, the actual results are presented to final users in a meaningful, understandable and effective way. The objective of this step is, therefore, to abstract data models and provide the ‘user interface’ for presentation of data. In various EW-Shopp Business Cases we are using the Knowage Open Source suite to develop interactive visualisations.

Figure 3: Visualisation example in Knowage for the Measurence Business Case
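
To make the four steps above more concrete, here is a minimal sketch in Python of how they might chain together. It is not the EW-Shopp implementation: the function names, the record fields (`date`, `units`, `customer_id`, `temp_c`), the sample values and the 20 °C threshold are all assumptions made up for illustration, and the real flows rely on tools such as DataGraft, ASIA and Knowage rather than on hand-written functions.

```python
from statistics import mean


def ingest(sales, weather):
    """Ingestion: collect business and weather records.

    The hypothetical 'customer_id' field is dropped here as a stand-in
    for the anonymization mentioned in step 1.
    """
    clean_sales = [
        {key: value for key, value in row.items() if key != "customer_id"}
        for row in sales
    ]
    return clean_sales, weather


def enrich(sales, weather):
    """Enrichment: join each sales record with the weather of the same date."""
    weather_by_date = {w["date"]: w for w in weather}
    return [
        {**row, "temp_c": weather_by_date[row["date"]]["temp_c"]}
        for row in sales
        if row["date"] in weather_by_date
    ]


def analyse(rows, threshold_c=20.0):
    """Analytics: compare average units sold on warm and cool days."""
    warm = [r["units"] for r in rows if r["temp_c"] >= threshold_c]
    cool = [r["units"] for r in rows if r["temp_c"] < threshold_c]
    return {
        "warm_avg": mean(warm) if warm else None,
        "cool_avg": mean(cool) if cool else None,
    }


def visualise(summary):
    """Visualisation: render the summary as a plain-text report."""
    return "\n".join(f"{name}: {value}" for name, value in sorted(summary.items()))


def run_dataflow(sales, weather):
    """Run ingestion, enrichment, analytics and visualisation in order."""
    clean_sales, weather = ingest(sales, weather)
    enriched = enrich(clean_sales, weather)
    summary = analyse(enriched)
    return enriched, summary, visualise(summary)


# Made-up sample data, purely illustrative.
sales = [
    {"date": "2019-06-01", "units": 10, "customer_id": "c1"},
    {"date": "2019-06-02", "units": 30, "customer_id": "c2"},
    {"date": "2019-06-03", "units": 20, "customer_id": "c3"},
]
weather = [
    {"date": "2019-06-01", "temp_c": 15.0},
    {"date": "2019-06-02", "temp_c": 25.0},
    {"date": "2019-06-03", "temp_c": 22.0},
]

enriched, summary, report = run_dataflow(sales, weather)
print(report)
```

Each function only depends on the output of the previous one, which is what lets a different actor or tool take over any single step.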

While developing and fine-tuning this methodology we focused on an effective evaluation tool for all of our Business Cases, but the adoption of a common baseline dataflow model within EW-Shopp also presents some advantages which we think might be interesting beyond the scope of the project and in particular:

  • Multi-actor by design. The EW-Shopp model allows multiple stakeholders to collaborate and manage/use/manipulate data along the various steps, spanning from business partners to technology providers, data analysts and operators. Additionally, each of the four stages can have a responsible party or point of reference who is not necessarily from the same organisation. This multi-actor model has already been tried out internally in the actual data management work among EW-Shopp partners working on the various Pilots.
  • Flexibility in terms of automation vs. human in the loop. The dataflows in EW-Shopp are heterogeneous in terms of data and complex in terms of flows. In some steps of the flow, high levels of automation are foreseeable, while at other points human intervention is needed or desirable (e.g., for quality control, customisation, etc.). Our model does not enforce either and allows both to coexist along the workflow.
  • The model is essentially agnostic about time granularity and time discreteness. This means that a full dataflow could happen continuously in near real-time (provided, of course, that it is also highly automated), or at fixed time intervals (e.g., once a day). The model allows both scenarios to happen and coexist.
  • Tool modularity, allowing for the potential inclusion within flows of tools external to EW-Shopp. The first, basic requirement for such a feature is at the ingestion stage, where business partners may need to use their own (potentially legacy) tools to extract data from their own systems or from those of other business partners. At other stages of the flow this also provides a degree of flexibility that is highly compatible with EW-Shopp’s open source approach, making it possible, for instance, to use domain-specific tools for analytics tasks.
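
The properties listed above can be illustrated with a small Python sketch. It is only a toy under stated assumptions, not EW-Shopp code: `run_flow`, the stage functions, the `reviewer` hook and the sample readings are all hypothetical. Stages are plain callables, so an external tool can be wrapped and swapped in; the review hook is optional, so a flow can be fully automatic or include a check that stands in for a human operator; and the same flow can be fed one daily batch or one record at a time.

```python
def run_flow(data, stages, review=None):
    """Run named stages in order; call the optional review hook after each one."""
    for name, stage in stages:
        data = stage(data)
        if review is not None:
            data = review(name, data)
    return data


def drop_missing(rows):
    """Hypothetical ingestion stage: discard missing readings."""
    return [r for r in rows if r is not None]


def to_celsius(rows):
    """Hypothetical enrichment stage: convert Fahrenheit readings to Celsius."""
    return [round((f - 32) * 5 / 9, 1) for f in rows]


def reviewer(stage_name, rows):
    """Stand-in for a human quality check: reject implausible temperatures."""
    if stage_name == "enrichment":
        return [r for r in rows if -60 <= r <= 60]
    return rows


# Any callable with the same shape could replace a stage, e.g. a wrapper
# around a partner's own extraction tool.
stages = [("ingestion", drop_missing), ("enrichment", to_celsius)]

readings = [68.0, None, 212.0, 50.0]  # made-up sample data

# Fully automatic run over one batch (e.g. once a day).
automatic = run_flow(readings, stages)

# Same flow with the review hook enabled.
batch = run_flow(readings, stages, review=reviewer)

# Same flow applied record by record, as in a near real-time setting.
stream = [value for r in readings for value in run_flow([r], stages, review=reviewer)]
```

Because these stages work record by record, the batch and record-by-record runs give the same result; stages that aggregate across records would need more care when moving between the two modes.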

Want to know more?

If you have any questions about the methodology or would like to get in touch, please contact us through our online contact form or start a conversation on our LinkedIn or Twitter profiles. We would love to hear how this could be useful in your scenarios and whether you have any ideas about it!
