Pipeline configuration using a Mustache template #24
Comments
@skarampatakis Do I understand correctly that the scalability issue is related to a potentially huge number of pipelines and executions? So the goal is to keep the number of executions and instances to a minimum?
@skodapetr Another possible solution, which I was thinking of and have used recently, would be to use a YAML (or similar) descriptor with a script that reconfigures the pipeline internally. But I don't know whether this is even possible for LP pipelines.
Why not leverage the fact that LP-ETL pipelines are serialized in RDF? One solution might be to put placeholder resources (e.g., blank nodes instantiating …
Most of the configuration that needs to change is inside the SPARQL queries of SPARQL Update or SPARQL Construct components. Do we have a solution for changing these with your method? Could we have a minimal working example so I can take a look? Maybe I am missing something here.
If the variable parts are inside SPARQL, then this method does not help much. You could use SPIN to represent SPARQL in RDF and then render it to literals using the SPIN API, but that would likely be overkill. I was thinking more about variables such as simple configuration values of LP-ETL components. If you need to generate SPARQL instead, there are many approaches, some of which I covered in my post last year. Using Mustache might be a fine solution for that.
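When values are substituted into SPARQL text with Mustache or a similar mechanism, they have to be escaped as SPARQL string literals, otherwise a value containing a quote or a newline breaks the query. The Python sketch below is a minimal illustration, not part of LP-ETL: `fill_query` and `sparql_string_literal` are hypothetical helpers, only simple `{{name}}` tags are recognized, and every value is assumed to become a plain double-quoted string literal.

```python
import re

# Escape sequences permitted in SPARQL 1.1 string literals (the ECHAR rule).
SPARQL_ESCAPES = {
    "\\": "\\\\",
    '"': '\\"',
    "\n": "\\n",
    "\r": "\\r",
    "\t": "\\t",
    "\b": "\\b",
    "\f": "\\f",
}

# A tag is {{name}}, optionally with spaces inside the braces.
TAG = re.compile(r"\{\{\s*([\w.-]+)\s*\}\}")


def sparql_string_literal(value: str) -> str:
    """Return value as a double-quoted SPARQL string literal."""
    escaped = "".join(SPARQL_ESCAPES.get(ch, ch) for ch in value)
    return f'"{escaped}"'


def fill_query(template: str, values: dict) -> str:
    """Replace each {{name}} tag in a SPARQL template with an escaped literal."""
    def replace(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value for template variable {name!r}")
        return sparql_string_literal(str(values[name]))

    return TAG.sub(replace, template)


query = fill_query(
    "SELECT ?s WHERE { ?s rdfs:label {{label}} }",
    {"label": 'Κ.Α. "code"'},
)
print(query)  # SELECT ?s WHERE { ?s rdfs:label "Κ.Α. \"code\"" }
```

Because a tag needs a name character right after `{{` (spaces aside), nested SPARQL group patterns such as `{{ ?s ?p ?o }}` are left untouched.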
Hi @jindrichmynarz, I have also looked at SPIN at some point, but to be honest, as you said, I think it would be overkill. So I believe we need a solution that is easy to produce and easy to consume. The simplest thing I would like to have is a file with variable name/value pairs. Then "something" reads that file, configures the pipeline by replacing the variables with their values, and finally executes it! This file could either be written by hand or produced by the Packager. So it has to be simple and, in any case, must not use RDF syntax. YAML looks like a good candidate. The Mustache component of LP uses an RDF flavor of Mustache templates, so it may be a bit confusing. Please correct me if I got something wrong here. These are just thoughts; there may be better or more elegant solutions.
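A file of variable name/value pairs could be read with something as small as the following Python sketch. It assumes a hypothetical flat format of one `name: value` pair per line with `#` comments, which is a strict subset of YAML (no nesting, lists, or quoting rules); a real implementation would more likely use a full YAML parser such as PyYAML's `yaml.safe_load`.

```python
def parse_variables(text: str) -> dict:
    """Parse `name: value` lines into a dict of strings.

    Blank lines and lines starting with '#' are skipped. Only the first
    colon separates name from value, so values may contain colons (IRIs).
    """
    variables = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition(":")
        name = name.strip()
        if not sep or not name:
            raise ValueError(f"line {lineno}: expected 'name: value', got {raw!r}")
        if name in variables:
            raise ValueError(f"line {lineno}: duplicate variable {name!r}")
        variables[name] = value.strip()
    return variables


# Hypothetical variables for one dataset; the names are illustrative only.
config = """
dataset_name: thessaloniki-2016-revenue
currency: EUR
base_iri: http://example.org/budget/
"""
print(parse_variables(config))
# {'dataset_name': 'thessaloniki-2016-revenue', 'currency': 'EUR', 'base_iri': 'http://example.org/budget/'}
```

Rejecting duplicate names catches a common copy-paste error when the file is written by hand.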
Since the data model Mustache operates on is basically YAML, one option may be to implement a component that does standard Mustache rendering, using a template and YAML data to render its output. However, YAML still requires some tech-savviness, so it may call for a UI-based solution.
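Standard Mustache rendering of a template against nested data could be sketched as below. This is a deliberately small subset of Mustache written for illustration, not an existing LP-ETL component: it handles `{{name}}` (HTML-escaped, here via `html.escape`), `{{{name}}}` and `{{& name}}` (unescaped), and dotted names, but no sections, partials, or delimiter changes. The `data` dict stands in for parsed YAML, and, as in Mustache, a missing name renders as an empty string.

```python
import html
import re

# {{{name}}} is unescaped; {{name}} is HTML-escaped unless written {{& name}}.
TAG = re.compile(r"\{\{\{\s*([\w.-]+)\s*\}\}\}|\{\{(&?)\s*([\w.-]+)\s*\}\}")


def lookup(data, dotted_name: str):
    """Resolve a dotted name like 'dataset.currency' in nested dicts."""
    value = data
    for part in dotted_name.split("."):
        if not isinstance(value, dict) or part not in value:
            return None  # Mustache renders missing names as empty strings
        value = value[part]
    return value


def render(template: str, data: dict) -> str:
    """Render the supported subset of Mustache tags against data."""
    def replace(match):
        triple_name, ampersand, name = match.groups()
        if triple_name is not None:
            value, escape = lookup(data, triple_name), False
        else:
            value, escape = lookup(data, name), ampersand == ""
        text = "" if value is None else str(value)
        return html.escape(text) if escape else text

    return TAG.sub(replace, template)


data = {"dataset": {"name": "revenue & expenditure", "currency": "EUR"}}
print(render('{"title": "{{dataset.name}}"}', data))    # {"title": "revenue &amp; expenditure"}
print(render('{"title": "{{{dataset.name}}}"}', data))  # {"title": "revenue & expenditure"}
```

When the output is a JSON-LD pipeline rather than HTML, the HTML escaping of `{{name}}` (e.g. `&` becoming `&amp;`) is usually unwanted, so the unescaped forms are the safer choice there.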
Yes, that was my thought too when I considered YAML. The UI-based solution could be the Packager itself. If you look at a descriptor produced by the Packager, most, if not all, of the required variables are already there. So we could repurpose the descriptor as an LP pipeline descriptor. As far as I understand, it already has that role for OS platform pipelines.
Please find attached a sample OS descriptor:
{
"model": {
"dimensions": {
"functional-classification": {
"dimensionType": "classification",
"primaryKey": [
"functional_classification_generic_code"
],
"attributes": {
"functional_classification_generic_code": {
"source": "Κ.Α.",
"title": "Κ.Α."
},
"functional_classification_generic_label": {
"source": "Περιγραφή",
"title": "Περιγραφή",
"labelfor": "functional_classification_generic_code"
}
},
"classificationType": "functional"
},
"date": {
"dimensionType": "datetime",
"primaryKey": [
"date_fiscal_year"
],
"attributes": {
"date_fiscal_year": {
"source": "Έτος",
"title": "Έτος"
}
}
}
},
"measures": {
"value": {
"source": "Προϋπολογισθέντα",
"title": "Προϋπολογισθέντα",
"currency": "EUR",
"direction": "revenue",
"phase": "proposed"
},
"value_2": {
"source": "Διαμορφωθέντα",
"title": "Διαμορφωθέντα",
"currency": "EUR",
"direction": "revenue",
"phase": "adjusted"
},
"value_3": {
"source": "Βεβαιωθέντα",
"title": "Βεβαιωθέντα",
"currency": "EUR",
"direction": "revenue",
"phase": "approved"
},
"value_4": {
"source": "Εισπραχθέντα",
"title": "Εισπραχθέντα",
"currency": "EUR",
"direction": "revenue",
"phase": "executed"
}
}
},
"regionCode": "eu",
"countryCode": "GR",
"cityCode": "Thessaloniki",
"fiscalPeriod": {
"start": "2016-01-01",
"end": "2016-12-31"
},
"title": "Municipality of Thessaloniki, Greece Revenue Budget for the fiscal year 2016",
"name": "europe-greece-municipality-thessaloniki-2016-revenue",
"description": "Municipality of Thessaloniki, Greece Revenue Budget for the fiscal year 2016.",
"resources": [
{
"name": "thessaloniki-2016-revenue",
"format": "csv",
"path": "thessaloniki-2016-revenue.csv",
"mediatype": "text/csv",
"bytes": 44711,
"dialect": {
"csvddfVersion": 1,
"delimiter": ",",
"lineTerminator": "\n"
},
"encoding": "utf-8",
"schema": {
"fields": [
{
"title": "Κ.Α.",
"name": "Κ.Α.",
"slug": "functional_classification_generic_code",
"type": "string",
"format": "default",
"osType": "functional-classification:generic:code",
"conceptType": "functional-classification"
},
{
"title": "Περιγραφή",
"name": "Περιγραφή",
"slug": "functional_classification_generic_label",
"type": "string",
"format": "default",
"osType": "functional-classification:generic:label",
"conceptType": "functional-classification"
},
{
"title": "Προϋπολογισθέντα",
"name": "Προϋπολογισθέντα",
"slug": "value",
"type": "number",
"format": "default",
"osType": "value",
"conceptType": "value",
"decimalChar": ",",
"groupChar": "."
},
{
"title": "Διαμορφωθέντα",
"name": "Διαμορφωθέντα",
"slug": "value_2",
"type": "number",
"format": "default",
"osType": "value",
"conceptType": "value",
"decimalChar": ",",
"groupChar": "."
},
{
"title": "Βεβαιωθέντα",
"name": "Βεβαιωθέντα",
"slug": "value_3",
"type": "number",
"format": "default",
"osType": "value",
"conceptType": "value",
"decimalChar": ",",
"groupChar": "."
},
{
"title": "Εισπραχθέντα",
"name": "Εισπραχθέντα",
"slug": "value_4",
"type": "number",
"format": "default",
"osType": "value",
"conceptType": "value",
"decimalChar": ",",
"groupChar": "."
},
{
"title": "Έτος",
"name": "Έτος",
"slug": "date_fiscal_year",
"type": "integer",
"format": "default",
"osType": "date:fiscal-year",
"conceptType": "date"
}
],
"primaryKey": [
"Κ.Α.",
"Έτος"
]
}
}
],
"@context": "http://schemas.frictionlessdata.io/fiscal-data-package.jsonld",
"owner": "mple",
"author": "Sotiris Karampatakis <[email protected]>",
"count_of_rows": 269
}
If you can map the variables required by your pipeline to something like JSONPath expressions in the FDP descriptor, then they may be used in a Mustache template.
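Mapping pipeline variables to locations in the FDP descriptor could be sketched in Python as below. Dotted paths (e.g. `fiscalPeriod.start`, `resources.0.name`) stand in for JSONPath expressions such as `$.fiscalPeriod.start`; the helper names and the variable names on the left of the mapping are hypothetical, while the paths point at fields of the sample descriptor above.

```python
import json


def resolve(descriptor, path: str):
    """Follow a dotted path such as 'fiscalPeriod.start' or 'resources.0.name'.

    Numeric segments index into lists; other segments are object keys.
    Raises KeyError/IndexError if the path does not exist.
    """
    value = descriptor
    for part in path.split("."):
        value = value[int(part)] if isinstance(value, list) else value[part]
    return value


def variables_from_descriptor(descriptor: dict, mapping: dict) -> dict:
    """Turn {variable_name: path} into {variable_name: value}."""
    return {name: resolve(descriptor, path) for name, path in mapping.items()}


# Excerpt of the sample FDP descriptor above.
descriptor = json.loads("""
{
  "countryCode": "GR",
  "fiscalPeriod": {"start": "2016-01-01", "end": "2016-12-31"},
  "resources": [{"name": "thessaloniki-2016-revenue", "encoding": "utf-8"}]
}
""")

# Hypothetical pipeline variable names mapped to descriptor paths.
mapping = {
    "country": "countryCode",
    "period_start": "fiscalPeriod.start",
    "resource_name": "resources.0.name",
}
variables = variables_from_descriptor(descriptor, mapping)
print(variables)
# {'country': 'GR', 'period_start': '2016-01-01', 'resource_name': 'thessaloniki-2016-revenue'}
```

Variables that cannot be resolved raise immediately, which gives a quick count of how many of the pipeline's variables the descriptor can actually serve. A full JSONPath implementation could replace `resolve` if filters or wildcards are needed.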
You mean something like that?
Yes. Just to get an idea of how many of the variables your pipeline requires can be served from the FDP descriptor.
Let's suppose that we can map all of them. What would be the next step?
The next step could be testing whether the pipeline can be generated using standard Mustache.
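Since the CSV2RDF pipeline already marks its variables as `{{@@variable_name@@}}` (see the issue description), a first test of standard Mustache could be a mechanical rewrite of those placeholders into standard tags. The Python sketch below assumes variable names consist of letters, digits, `_`, `.` and `-`; it emits triple-brace tags so that a standard renderer does not HTML-escape values inside the JSON-LD, and it also lists the distinct variable names.

```python
import re

# Placeholder syntax of the CSV2RDF pipeline, e.g. {{@@variable_name@@}}.
CUSTOM_TAG = re.compile(r"\{\{@@([A-Za-z0-9_.-]+)@@\}\}")


def placeholder_names(pipeline_text: str) -> list:
    """Return the distinct placeholder names, sorted."""
    return sorted(set(CUSTOM_TAG.findall(pipeline_text)))


def to_standard_mustache(pipeline_text: str) -> str:
    """Rewrite {{@@name@@}} as the unescaped Mustache tag {{{name}}}."""
    return CUSTOM_TAG.sub(lambda m: "{{{" + m.group(1) + "}}}", pipeline_text)


# Hypothetical fragment of a JSON-LD pipeline with repeated placeholders.
pipeline = '{"@id": "{{@@base_iri@@}}{{@@dataset@@}}", "label": "{{@@dataset@@}}"}'
print(placeholder_names(pipeline))     # ['base_iri', 'dataset']
print(to_standard_mustache(pipeline))  # {"@id": "{{{base_iri}}}{{{dataset}}}", "label": "{{{dataset}}}"}
```

The converted template could then be rendered with any standard Mustache implementation, for example `chevron.render(template, data)` in Python, and the result parsed with `json.loads` to confirm that it is still valid JSON.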
@skarampatakis We have some new functionality (runtime configuration for more components, x-httpRequest) in the develop branch. Using that functionality, I created a prototype. It consists of two pipelines: Instance and Metapipeline. The … What is missing:
It would be great if you could take a look and give some feedback, especially on whether this solution would be suitable for your use case.
By the way, @skodapetr, I get a load timeout on the demo instance of LP-ETL.
@jindrichmynarz We were updating the instance; just try again.
Ok. We can open another issue for this. This is then probably caused by Angular timeouts. Is there anything in the console?
On Thu, Jun 1, 2017, 12:42 Jindřich Mynarz wrote:
The problem persists after refreshing, and in other browsers too. While the 1.2 MB of angular-material.js loads, the screen remains blank.
[image: screen shot 2017-06-01 at 12 40 31]
<https://cloud.githubusercontent.com/assets/198642/26676355/ac2adf16-46c7-11e7-9893-580d2b8577a0.png>
So far, the CSV2RDF pipeline is reconfigured using a "version" of Mustache. Almost 240 edits are needed to reconfigure the pipeline.
Manual editing would probably introduce typos or other errors. Since most of these edits are duplicates, they reduce to only(!) 20 unique ones.
The idea was to mark the variables with a "Mustache-like" template syntax, namely {{@@variable_name@@}}, so as not to interfere with the JSON-LD notation used by the pipeline. The current workflow is therefore to use a text editor, a shell script, or any other means to automatically replace all these entries with the required variable values. The list of the variables that need to be changed can be found here. Since LP already has a Mustache component, how could we use it to reconfigure the pipeline?
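The replace-all workflow could also be scripted rather than done in a text editor. The Python sketch below is one way to do it, assuming every `{{@@variable_name@@}}` placeholder sits inside a JSON string value of the pipeline file; `configure_pipeline` is a hypothetical helper. It refuses to produce a half-configured pipeline when a value is missing, and it JSON-escapes each value so that quotes or backslashes cannot break the JSON-LD.

```python
import json
import re

# Placeholder syntax described above: {{@@variable_name@@}}.
PLACEHOLDER = re.compile(r"\{\{@@([A-Za-z0-9_.-]+)@@\}\}")


def configure_pipeline(pipeline_text: str, values: dict) -> str:
    """Replace every {{@@name@@}} placeholder with values[name].

    Assumes each placeholder sits inside a JSON string, so each value is
    JSON-escaped (quotes, backslashes, newlines) before insertion.
    """
    used = set(PLACEHOLDER.findall(pipeline_text))
    missing = sorted(used - set(values))
    if missing:
        # Fail before substituting anything rather than emit a half-configured pipeline.
        raise KeyError(f"no value for placeholders: {', '.join(missing)}")

    def replace(match):
        # json.dumps adds surrounding quotes; strip them to keep only the escaped body.
        return json.dumps(str(values[match.group(1)]), ensure_ascii=False)[1:-1]

    return PLACEHOLDER.sub(replace, pipeline_text)


template = '{"@id": "{{@@base_iri@@}}dataset", "title": "{{@@title@@}}"}'
configured = configure_pipeline(
    template,
    {"base_iri": "http://example.org/", "title": 'Revenue "2016"'},
)
print(configured)  # {"@id": "http://example.org/dataset", "title": "Revenue \"2016\""}
```

Because all ~240 occurrences are driven by one dict of ~20 values, each value is typed once, which removes the duplicated manual edits.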
At the moment, the only possible way is to create a new pipeline, upload it to LP, and execute it. I think this solution may have scalability issues. Could we instead just reconfigure the pipeline and execute it, as happens with the FDP2RDF pipeline?
I believe this approach could save us a lot of time, as these parameters are already gathered by the OS Packager. So we could use the OS Packager to configure both LP and OS pipelines.
@jakubklimek could you please have a look and provide a kickstart guide?