
Configuration pipeline exploiting Mustache template. #24

Open
skarampatakis opened this issue May 20, 2017 · 19 comments
@skarampatakis
Collaborator

So far, the CSV2RDF pipeline uses a "version" of Mustache for reconfiguration. Almost 240 edits have to be made in order to reconfigure the pipeline.

Manual editing would probably introduce typos or other kinds of errors. The number of required edits comes down to only(!) 20 unique ones, since in most cases they are duplicates.

The idea was to have the variables use a "Mustache-like" template, such as {{@@variable_name@@}}, trying not to interfere with the JSON-LD notation used by the pipeline. So the current workflow is to use a text editor, a shell script, or any other possible means to automatically replace all these entries with the required variable values. The list of the variables that need to be changed can be found here. Since LP already has a Mustache component, how could we use it to reconfigure the pipeline?
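The shell-script workflow described above can be sketched as follows. This is a minimal illustration, assuming the placeholder form @@variable_name@@; the variable names and the sample pipeline snippet are hypothetical, not taken from the real CSV2RDF pipeline:

```python
import re

def fill_placeholders(text: str, values: dict) -> str:
    """Replace every @@name@@ placeholder in text with its value."""
    def repl(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for placeholder {name!r}")
        return values[name]
    return re.sub(r"@@(\w+)@@", repl, text)

# Illustrative fragment of an exported pipeline; the real file is JSON-LD.
pipeline_text = '{"dataset": "@@raw_dataset_uri@@", "year": "@@fiscal_year@@"}'
filled = fill_placeholders(pipeline_text, {
    "raw_dataset_uri": "thessaloniki-2016-revenue.csv",
    "fiscal_year": "2016",
})
print(filled)
```

Raising on a missing placeholder, rather than leaving it in place, catches exactly the kind of silent typo that manual editing introduces.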

At the moment the only possible way is to create a new pipeline, upload it to LP, and execute it. I think this solution may have scalability issues. Would it be possible to just reconfigure the pipeline and execute it, as happens with the FDP2RDF pipeline?

I believe this kind of approach could save us a lot of time, as these parameters are already gathered by the OS Packager. So we could use the OS Packager to configure both LP and OS pipelines.

@jakubklimek could you please have a look and provide a kickstart guide?

@skodapetr

@skarampatakis Do I understand correctly that the scalability issue is related to a potentially huge number of pipelines and executions? So the desire is to limit the number of executions and instances to a minimum?

@skarampatakis
Collaborator Author

@skodapetr
I believe yes. At the moment, every time a user configures a pipeline, it has to be uploaded and run. In real-world applications this would create a huge number of pipelines (followed by their garbage). Even a user who creates a custom pipeline for a complex data structure, but reuses it for the same use case every year, say, would have to reconfigure the pipeline. We have seen this with our use case of Greek municipalities, where we had to create a new pipeline for each year and each municipality, while all of them have the same structure.

Another possible solution I was thinking of, and used recently, would be to use a YAML (or similar) descriptor with a script to reconfigure the pipeline internally. But I don't know whether this is even possible for LP pipelines.

@jindrichmynarz
Contributor

jindrichmynarz commented May 30, 2017

Why not leverage the fact that LP-ETL pipelines are serialized in RDF? One solution might be to put placeholder resources (e.g., blank nodes instantiating sp:Variable identified with sp:varName) in places that require configuration. Consequently, there can be a SPARQL Update that replaces these variables with concrete data given configuration in RDF (e.g., provided via the Text holder component). This would enable higher-level operations than basic text manipulation with Mustache. What do you think?

@skarampatakis
Collaborator Author

Most of the configurations that need to change are inside the SPARQL queries of SPARQL Update or Construct components. Do we have a solution for how we can change these with your method? Can we have a minimal working example so I can take a look?

Maybe I am missing something here.

@jindrichmynarz
Contributor

If the variable parts are inside SPARQL, then this method does not provide much help. You could think of using SPIN to represent SPARQL in RDF and then render it to literals using the SPIN API, but that would likely be overkill. I was thinking more about variables such as simple values in the configuration of LP-ETL components. If you need to generate SPARQL instead, there are many approaches, some of which I covered in my post last year. Using Mustache might be a fine solution for that.

@skarampatakis
Collaborator Author

Hi @jindrichmynarz ,
I have read your post (which, by the way, is great, as all of them are), and that was the "eureka" moment I had about using the Mustache component to reconfigure the pipeline. I have also tried to play with the demo pipeline on your public LP instance but, to be honest, I was a bit confused. That's why I asked for your help: if you can provide a kickstart, I can do the rest. We have the pipeline, and we also have the variables that need to be changed. Now we only have to somehow "feed" these variables with values and replace them in the pipeline.

I have also looked at SPIN at some point but, to be honest, I think that, as you said, it would be overkill.
The overall problem is that we have a pipeline template that is designed to be used by the data uploaders of the platform. The people responsible for this job do not, and should not need to, have even minimal knowledge of RDF or SW technologies.

So I believe we need a solution that is easy to produce and easy to consume.

The simplest thing I would like to have would be a file with variable name-value pairs. Then "something" reads that file, configures the pipeline by replacing the variables with their values, and finally executes it! This file could be written either by hand or by the Packager. So it has to be simple and, in any case, not use RDF syntax. YAML looks like a good candidate. The Mustache component of LP uses an RDF flavor of the Mustache template, so it may be a bit confusing. Please correct me if I got something wrong here.

Just thoughts, there may be better or more elegant solutions than this.
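The file-plus-script idea above can be sketched like this. To keep the sketch standard-library-only, it assumes a plain "name = value" text file instead of full YAML; the file contents, variable names, and template are illustrative, not the real pipeline:

```python
import re
import tempfile
from pathlib import Path

def load_variables(path):
    """Read name = value pairs, skipping blank lines and # comments."""
    values = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        values[name.strip()] = value.strip()
    return values

def configure_pipeline(template, values):
    """Replace @@name@@ placeholders in the template with loaded values."""
    return re.sub(r"@@(\w+)@@", lambda m: values[m.group(1)], template)

# Demo: write a sample variables file and apply it to a toy template.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("# pipeline variables\n"
            "raw_dataset_uri = thessaloniki-2016-revenue.csv\n"
            "fiscal_year = 2016\n")

variables = load_variables(f.name)
configured = configure_pipeline(
    "dataset: @@raw_dataset_uri@@, year: @@fiscal_year@@", variables)
print(configured)
```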

@jindrichmynarz
Contributor

Since the data model Mustache operates on is basically YAML, one option for solving this may be to implement a component that does standard Mustache rendering: using a template and YAML data to render its output.

However, YAML still requires some tech-savviness, so it may require a UI-based solution.
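The "standard Mustache rendering" proposed above boils down to something like the following toy renderer. Real Mustache also supports sections, partials, and escaping, which this sketch omits; a plain Python dict stands in for the parsed YAML data:

```python
import re

def render(template, data):
    """Substitute {{name}} tags with values from the data mapping."""
    return re.sub(r"\{\{\s*([\w.-]+)\s*\}\}",
                  lambda m: str(data.get(m.group(1), "")), template)

# Values like these could come from the FDP descriptor / a YAML file.
data = {"name": "europe-greece-municipality-thessaloniki-2016-revenue",
        "fiscalYear": 2016}
print(render("Dataset {{name}}, fiscal year {{fiscalYear}}", data))
```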

@skarampatakis
Collaborator Author

Yes, that was my thought as well when I considered YAML. The UI-based solution could be the Packager itself. If you look at a descriptor produced by the Packager, most if not all of the required variables are already there. So we could re-purpose the descriptor as an LP pipeline descriptor. It already has that role for OS platform pipelines, at least to my understanding.

@skarampatakis
Collaborator Author

Please find attached a sample OS descriptor:

{
  "model": {
    "dimensions": {
      "functional-classification": {
        "dimensionType": "classification",
        "primaryKey": [
          "functional_classification_generic_code"
        ],
        "attributes": {
          "functional_classification_generic_code": {
            "source": "Κ.Α.",
            "title": "Κ.Α."
          },
          "functional_classification_generic_label": {
            "source": "Περιγραφή",
            "title": "Περιγραφή",
            "labelfor": "functional_classification_generic_code"
          }
        },
        "classificationType": "functional"
      },
      "date": {
        "dimensionType": "datetime",
        "primaryKey": [
          "date_fiscal_year"
        ],
        "attributes": {
          "date_fiscal_year": {
            "source": "Έτος",
            "title": "Έτος"
          }
        }
      }
    },
    "measures": {
      "value": {
        "source": "Προϋπολογισθέντα",
        "title": "Προϋπολογισθέντα",
        "currency": "EUR",
        "direction": "revenue",
        "phase": "proposed"
      },
      "value_2": {
        "source": "Διαμορφωθέντα",
        "title": "Διαμορφωθέντα",
        "currency": "EUR",
        "direction": "revenue",
        "phase": "adjusted"
      },
      "value_3": {
        "source": "Βεβαιωθέντα",
        "title": "Βεβαιωθέντα",
        "currency": "EUR",
        "direction": "revenue",
        "phase": "approved"
      },
      "value_4": {
        "source": "Εισπραχθέντα",
        "title": "Εισπραχθέντα",
        "currency": "EUR",
        "direction": "revenue",
        "phase": "executed"
      }
    }
  },
  "regionCode": "eu",
  "countryCode": "GR",
  "cityCode": "Thessaloniki",
  "fiscalPeriod": {
    "start": "2016-01-01",
    "end": "2016-12-31"
  },
  "title": "Municipality of Thessaloniki, Greece Revenue Budget fot the fiscal year 2016",
  "name": "europe-greece-municipality-thessaloniki-2016-revenue",
  "description": "Municipality of Thessaloniki, Greece Revenue Budget fot the fiscal year 2016.",
  "resources": [
    {
      "name": "thessaloniki-2016-revenue",
      "format": "csv",
      "path": "thessaloniki-2016-revenue.csv",
      "mediatype": "text/csv",
      "bytes": 44711,
      "dialect": {
        "csvddfVersion": 1,
        "delimiter": ",",
        "lineTerminator": "\n"
      },
      "encoding": "utf-8",
      "schema": {
        "fields": [
          {
            "title": "Κ.Α.",
            "name": "Κ.Α.",
            "slug": "functional_classification_generic_code",
            "type": "string",
            "format": "default",
            "osType": "functional-classification:generic:code",
            "conceptType": "functional-classification"
          },
          {
            "title": "Περιγραφή",
            "name": "Περιγραφή",
            "slug": "functional_classification_generic_label",
            "type": "string",
            "format": "default",
            "osType": "functional-classification:generic:label",
            "conceptType": "functional-classification"
          },
          {
            "title": "Προϋπολογισθέντα",
            "name": "Προϋπολογισθέντα",
            "slug": "value",
            "type": "number",
            "format": "default",
            "osType": "value",
            "conceptType": "value",
            "decimalChar": ",",
            "groupChar": "."
          },
          {
            "title": "Διαμορφωθέντα",
            "name": "Διαμορφωθέντα",
            "slug": "value_2",
            "type": "number",
            "format": "default",
            "osType": "value",
            "conceptType": "value",
            "decimalChar": ",",
            "groupChar": "."
          },
          {
            "title": "Βεβαιωθέντα",
            "name": "Βεβαιωθέντα",
            "slug": "value_3",
            "type": "number",
            "format": "default",
            "osType": "value",
            "conceptType": "value",
            "decimalChar": ",",
            "groupChar": "."
          },
          {
            "title": "Εισπραχθέντα",
            "name": "Εισπραχθέντα",
            "slug": "value_4",
            "type": "number",
            "format": "default",
            "osType": "value",
            "conceptType": "value",
            "decimalChar": ",",
            "groupChar": "."
          },
          {
            "title": "Έτος",
            "name": "Έτος",
            "slug": "date_fiscal_year",
            "type": "integer",
            "format": "default",
            "osType": "date:fiscal-year",
            "conceptType": "date"
          }
        ],
        "primaryKey": [
          "Κ.Α.",
          "Έτος"
        ]
      }
    }
  ],
  "@context": "http://schemas.frictionlessdata.io/fiscal-data-package.jsonld",
  "owner": "mple",
  "author": "Sotiris Karampatakis <[email protected]>",
  "count_of_rows": 269
}

datapackage.json.txt

@jindrichmynarz
Contributor

If you can map the variables required by your pipeline to something like JSON Path in the FDP descriptor, then it may be used in a Mustache template.

@skarampatakis
Collaborator Author

You mean something like this?

raw_dataset_uri  = $.resources[*].path 
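A mapping like this can be resolved against the descriptor without a full JSON Path library; a small lookup over dotted paths is enough for the paths discussed here. In this sketch the "$." prefix is dropped and "[*]" is written as "*"; the variable names are illustrative:

```python
def lookup(doc, path):
    """Resolve a dotted path like 'resources.*.path' against nested
    dicts/lists; '*' takes every element of a list. Returns a list of
    matches."""
    current = [doc]
    for part in path.split("."):
        next_level = []
        for node in current:
            if part == "*":
                next_level.extend(node)
            else:
                next_level.append(node[part])
        current = next_level
    return current

# Trimmed-down stand-in for the FDP descriptor attached above.
descriptor = {"resources": [{"path": "thessaloniki-2016-revenue.csv"}],
              "fiscalPeriod": {"start": "2016-01-01"}}
variables = {"raw_dataset_uri": lookup(descriptor, "resources.*.path")[0],
             "fiscal_year_start": lookup(descriptor, "fiscalPeriod.start")[0]}
print(variables)
```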

@jindrichmynarz
Contributor

Yes. Just to get an idea of how many of the variables your pipeline requires can be served from the FDP descriptor.

@skarampatakis
Collaborator Author

Let's suppose that we can map all of them. What would be the next step?

@jindrichmynarz
Contributor

The next step could be testing if the pipeline can be generated using standard Mustache.

@skodapetr

skodapetr commented May 31, 2017

@skarampatakis We got some new functionality (runtime configuration for more components, x-httpRequest) in the develop branch. With that functionality I created a prototype:

It consists of two pipelines: Instance and Metapipeline. The Metapipeline retrieves the Instance, uses t-mustache for placeholder substitution and executes the pipeline.

What is missing:

  1. Input from e-pipelineInput, so you can start the execution with a custom configuration (variables) via an HTTP POST request. I think it is not necessary to have this for demonstration purposes.
  2. Deleting previous executions. Here I need to know: should we delete all previous executions of the pipeline, or should we keep the executions that failed?

It would be great if you could take a look and give some feedback, especially on whether this solution would be suitable for your use case.

@jindrichmynarz
Contributor

By the way, @skodapetr, I get load timeout for the demo instance of LP-ETL:

[screenshot: screen shot 2017-05-31 at 16 18 24]

@jakubklimek

@jindrichmynarz We were updating the instance, just try again.

@jindrichmynarz
Contributor

The problem persists after refreshes and in other browsers too. While the 1.2 MB angular-material.js loads, the screen remains blank.

[screenshot: screen shot 2017-06-01 at 12 40 31]

@jakubklimek

jakubklimek commented Jun 1, 2017 via email
