OS Viewer: Download Data Package Function #43

Open
skarampatakis opened this issue Jun 2, 2017 · 15 comments

@skarampatakis

Currently, the "Download Data Package" button redirects the user to the dumps folder.
It would make more sense for it to offer the dataset in several formats (FDP, RDF, ZIP).

@liyakun

liyakun commented Jun 8, 2017

@skarampatakis I think this probably should be done by openspending, as os-explorer on OpenBudgets is deployed by directly using the code from https://github.com/openspending/os-explorer.

@skarampatakis
Author

I think this is configured somewhere here.

Original OS Viewer behavior is to show two links, one for the raw data and one for the datapackage.

image

@pwalsh
Contributor

pwalsh commented Jun 8, 2017

Hi.

We have a low-priority backlog item for creating ZIP downloads on OpenSpending, but we likely will not do this ourselves, as it can be very resource-intensive for little gain.

For RDF, the vast majority of data used in the Viewer does not have an RDF representation, so we wouldn't add it as a general option in the Viewer. It is, however, the type of thing we would accept a pull request for, adding the functionality under certain conditions (such as on the Fraunhofer server, where you know there is an RDF version of the data).

@skarampatakis
Author

I also see ZIP files as low priority, but the rest is easily doable. Just add another

<i class="os-icon os-icon-download"></i><a href="http://eis-openbudgets.iais.fraunhofer.de/dumps" target="_blank">{{ 'Download Data Package' | i18n }}</a>

to the template and somehow feed the link's href with the proper path.

Something like that.

I agree that this is OBEU-specific, and that is the purpose of the obeu-specific folder in the OS-Viewer docker config.

@liyakun

liyakun commented Jul 7, 2017

@skarampatakis sorry for the delay in replying, and thanks for pointing this out. I checked the obeu-specific folder but did not find a way to get the exact dataset link. If the title of the page were the same as the dataset name, it would still be possible to use JavaScript to read the title and locate the dataset. Unfortunately, the title of the page in the viewer is not always the same as the title of the dataset.

@skarampatakis
Author

> Unfortunately, the title of the page in viewer is not always the same as the title of the dataset.

What do you mean by this? Every dataset id in rudolf->OS Viewer is the same as its filename minus the hash.

Perhaps you could only change the folder structure, as you can't know exactly where that file is: some datasets come from custom pipelines, where the user can define whatever folder he/she wants, while datasets coming from the FDP2RDF pipeline go to the fromfdp folder.

Unless you run a find on the server, but that is a hack.

These kinds of problems could be eliminated if every dataset used what was decided in D1.5 for metadata. At least the FDP2RDF pipeline could use this to define the distribution metadata. We are doing this in the CSV2RDF pipeline template:

<http://data.openbudgets.eu/resource/dataset/europe/greece/municipalities/thessaloniki/2016/budget/revenue/distribution> <http://www.w3.org/ns/dcat#accessURL> <http://data.openbudgets.eu/resource/dataset/europe/greece/municipalities/thessaloniki/2016/revenue/dataset.nt> .

So you could ask the triplestore or Rudolf could provide this kind of information.
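
A minimal sketch of what "asking the triplestore" could look like, assuming the metadata follows the DCAT pattern shown above (dataset, then distribution, then dcat:accessURL). The helper name `build_access_url_query` is hypothetical and not part of Rudolf or OS Viewer; it only builds the SPARQL text and does not contact any endpoint.

```python
DCAT = "http://www.w3.org/ns/dcat#"


def build_access_url_query(dataset_iri: str) -> str:
    """Return a SPARQL query that looks up the dump URL(s) of one dataset."""
    # Refuse characters that are not allowed inside an IRI reference,
    # so the value cannot break out of the <...> delimiters.
    if any(ch in dataset_iri for ch in '<>"{}|^`\\ '):
        raise ValueError(f"not a usable IRI: {dataset_iri!r}")
    return (
        f"PREFIX dcat: <{DCAT}>\n"
        "SELECT ?dumpUrl WHERE {\n"
        f"  <{dataset_iri}> dcat:distribution ?distribution .\n"
        "  ?distribution dcat:accessURL ?dumpUrl .\n"
        "}"
    )


print(build_access_url_query(
    "http://data.openbudgets.eu/resource/dataset/armenia-sample-test"
))
```

The resulting string could then be sent to whatever SPARQL endpoint the deployment exposes; which endpoint that is remains outside this sketch.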

Another solution would be to create the dump on demand, store it somewhere for future use, and send it to the user. This would also be a good solution, as you could create the dumps however you like, in whatever format. At least for the datasets we transformed in the early days, it is a fact that they have separate files for datasets, DSDs and codelists.

So, by reconstructing the dump, you could provide the full dump of the dataset together with the other triples that describe it, rather than a partial one.
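
The "create on demand, store, then serve" idea could be sketched roughly as below. Everything here is an assumption for illustration: `get_or_create_dump`, the cache directory layout and the `build_dump` callback are hypothetical names, and the callback stands in for whatever would actually assemble the dataset, DSD and codelist triples.

```python
from pathlib import Path
from typing import Callable


def get_or_create_dump(
    dataset_id: str,
    fmt: str,
    cache_dir: Path,
    build_dump: Callable[[str, str], bytes],
) -> Path:
    """Return the cached dump file, building and storing it on first request."""
    # Assumes dataset_id is a plain name without path separators.
    target = cache_dir / f"{dataset_id}.{fmt}"
    if not target.exists():
        cache_dir.mkdir(parents=True, exist_ok=True)
        # Write to a temporary name first so a half-written file is never served.
        tmp = target.with_name(target.name + ".part")
        tmp.write_bytes(build_dump(dataset_id, fmt))
        tmp.replace(target)
    return target
```

A second request for the same dataset and format returns the stored file without calling `build_dump` again, which is the "for future use" part of the proposal.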

@liyakun

liyakun commented Jul 7, 2017

@skarampatakis

> What do you mean by this? Every dataset id in rudolf->OS Viewer is the same as its filename minus the hash.

For example, the dataset on the page
http://apps.openbudgets.eu/viewer/aragon-2016-expenditure__9121b?lang=en
has the title
http://apps.openbudgets.eu/dumps/aragon-expenditure-2016.ttl.

The title on the page and the title of the dataset do not match. If they followed the same pattern, then of course it would still be possible to locate the dataset.
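
To make the mismatch concrete, here is a small sketch that applies the "filename minus the hash" rule to the two URLs above. The function names are hypothetical, and the assumption that the hash is whatever follows the last `__` in the viewer id is inferred from this one example only.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def dataset_name_from_viewer_url(url: str) -> str:
    """Last path segment of a viewer URL with the trailing '__<hash>' removed."""
    last_segment = PurePosixPath(urlparse(url).path).name
    name, _sep, _hash = last_segment.rpartition("__")
    # rpartition leaves `name` empty when there is no '__' at all.
    return name or last_segment


def dump_stem_from_url(url: str) -> str:
    """File name of a dump URL without its extension."""
    return PurePosixPath(urlparse(url).path).stem


viewer_name = dataset_name_from_viewer_url(
    "http://apps.openbudgets.eu/viewer/aragon-2016-expenditure__9121b?lang=en"
)
dump_name = dump_stem_from_url(
    "http://apps.openbudgets.eu/dumps/aragon-expenditure-2016.ttl"
)
print(viewer_name)               # aragon-2016-expenditure
print(dump_name)                 # aragon-expenditure-2016
print(viewer_name == dump_name)  # False
```

For this dataset the two names differ, so a lookup based purely on the viewer id would not find the dump file.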

@marek-dudas

The FDPtoRDF pipeline now follows what @skarampatakis described. The URL of the data dump is now stored in dataset metadata as ?dataset dcat:accessURL ?dumpUrl, e.g. <http://data.openbudgets.eu/resource/dataset/armenia-sample-test> <http://www.w3.org/ns/dcat#accessURL> <http://apps.openbudgets.eu/dumps/fromfdp/armenia-sample-test.nt>. The same URL is also duplicated to dcat:downloadURL, just to be sure.

FYI, since it was a small change, I made it directly on the Fraunhofer server to speed up the process. But I of course also updated the pipeline on GitHub.

@skarampatakis
Author

skarampatakis commented Jul 27, 2017

Hi @marek-dudas,
the domain of dcat:accessURL and dcat:downloadURL should be a dcat:Distribution, not the dataset itself.

That means that the triples should be:

?dataset dcat:distribution ?distribution .
?distribution a dcat:Distribution .
?distribution dcat:accessURL ?dumpUrl .
?distribution dct:format ?format .

e.g.

<http://data.openbudgets.eu/resource/dataset/europe/greece/municipalities/thessaloniki/2016/budget/revenue> <http://www.w3.org/ns/dcat#distribution> <http://data.openbudgets.eu/resource/dataset/europe/greece/municipalities/thessaloniki/2016/budget/revenue/distribution> .
<http://data.openbudgets.eu/resource/dataset/europe/greece/municipalities/thessaloniki/2016/budget/revenue/distribution> <http://purl.org/dc/terms/modified> "2017-03-30"^^<http://www.w3.org/2001/XMLSchema#date> .
<http://data.openbudgets.eu/resource/dataset/europe/greece/municipalities/thessaloniki/2016/budget/revenue/distribution> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/ns/dcat#Distribution> .
<http://data.openbudgets.eu/resource/dataset/europe/greece/municipalities/thessaloniki/2016/budget/revenue/distribution> <http://purl.org/dc/terms/format> <http://publications.europa.eu/resource/authority/file-type/RDF_N_QUADS> .
<http://data.openbudgets.eu/resource/dataset/europe/greece/municipalities/thessaloniki/2016/budget/revenue/distribution> <http://www.w3.org/ns/dcat#accessURL> <http://data.openbudgets.eu/dumps/resource/dataset/europe/greece/municipalities/thessaloniki/2016/> .
<http://publications.europa.eu/resource/authority/file-type/RDF_N_QUADS> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/dc/terms/MediaTypeOrExtent> .

as can be seen here
dataset.nt.txt
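
A rough way to check a dump for the domain problem described above is sketched here. It assumes one triple per line with IRI subjects and predicates (no blank nodes), uses a simple regular expression rather than a full N-Triples parser, and treats "no explicit `rdf:type dcat:Distribution` triple" as a warning sign only. The function names are hypothetical.

```python
import re

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCAT = "http://www.w3.org/ns/dcat#"

# <subject> <predicate> object .   (object is an IRI or a literal)
TRIPLE = re.compile(r"^<([^>]+)>\s+<([^>]+)>\s+(.+?)\s*\.\s*$")


def parse_triples(text):
    """Parse simple N-Triples lines into (subject, predicate, object) tuples."""
    triples = []
    for line in text.splitlines():
        match = TRIPLE.match(line.strip())
        if match:
            s, p, o = match.groups()
            # Strip the angle brackets from IRI objects, keep literals as written.
            triples.append((s, p, o.strip("<>") if o.startswith("<") else o))
    return triples


def misplaced_access_urls(triples):
    """Subjects with dcat:accessURL/downloadURL that are not typed dcat:Distribution."""
    distributions = {
        s for s, p, o in triples if p == RDF_TYPE and o == DCAT + "Distribution"
    }
    url_props = {DCAT + "accessURL", DCAT + "downloadURL"}
    return sorted({s for s, p, _ in triples if p in url_props and s not in distributions})


sample = """
<http://data.openbudgets.eu/resource/dataset/armenia-sample-test> <http://www.w3.org/ns/dcat#accessURL> <http://apps.openbudgets.eu/dumps/fromfdp/armenia-sample-test.nt> .
"""
print(misplaced_access_urls(parse_triples(sample)))
# ['http://data.openbudgets.eu/resource/dataset/armenia-sample-test']
```

Run against triples shaped like the Thessaloniki example, where the subject of dcat:accessURL is explicitly typed as a dcat:Distribution, the check returns an empty list.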

@marek-dudas

Thanks, I apparently misread the documentation, will fix it.

@marek-dudas

It should now be corrected, according to the description above.

@larjohn larjohn self-assigned this Aug 10, 2017
@liyakun

liyakun commented Aug 16, 2017

@larjohn @marek-dudas, do you have any idea how to fix this? Thanks.

@larjohn
Contributor

larjohn commented Aug 16, 2017

It should be fixed now - it will be visible in the next refresh.

@skarampatakis
Author

I tried this dataset: http://apps.openbudgets.eu/viewer/aragon-2016-expenditure__9121b?lang=en
Now the download button only redirects to the following page, where I get an error:
http://datastore.openspending.org//aragon-2016-expenditure__9121b/datapackage.json

<Error>
   <Code>AccessDenied</Code>
   <Message>Access Denied</Message>
   <RequestId>5BED139A1DC2127E</RequestId>
   <HostId>2ICL8pguNf6RXvHG+0z1CWt9v/OHDI5IMF3jNk/mizIRUSwd4cAdEbLp+nIsT2yFYBg2SivGKWw=</HostId>
</Error>

This dataset was generated by a custom pipeline. Is this the root of the problem?
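
One observable detail in the failing URL is the double slash right after the host, i.e. an empty first path segment. Whether that is connected to the AccessDenied response is not established in this thread; the sketch below only shows how such empty segments could be detected. The function name is hypothetical.

```python
from typing import List
from urllib.parse import urlparse


def empty_path_segments(url: str) -> List[int]:
    """Return the zero-based positions of empty segments in the URL path."""
    # The path starts with '/', so drop the empty string before the first slash.
    # Note that a trailing slash also counts as an empty final segment.
    segments = urlparse(url).path.split("/")[1:]
    return [i for i, segment in enumerate(segments) if segment == ""]


print(empty_path_segments(
    "http://datastore.openspending.org//aragon-2016-expenditure__9121b/datapackage.json"
))  # [0]
```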

@larjohn
Contributor

larjohn commented Sep 2, 2017


5 participants