Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide a flag to open data packages with not safe paths? #171

Closed
vitorbaptista opened this issue Sep 14, 2017 · 5 comments · Fixed by #262
Closed

Provide a flag to open data packages with not safe paths? #171

vitorbaptista opened this issue Sep 14, 2017 · 5 comments · Fixed by #262
Labels

Comments

@vitorbaptista
Copy link
Contributor

vitorbaptista commented Sep 14, 2017

Overview

We could allow to open data package with not safe paths if user explicitly choose it:

package = Package(descriptor, safe=False)

From @vitorbaptista

This is what I'm trying to do:

import datapackage

descriptor = {
    'name': 'my-datapackage',
    'resources': [
      {
        'name': 'data',
        'path': '/tmp/my-data.csv'
      }
    ]
}
dp = datapackage.Package(descriptor)
# *** tableschema.exceptions.DataPackageException: Local path "/tmp/my-data.csv" is not safe

Reading the helpers.is_safe_path() method:

def is_safe_path(path):
"""Check if path is safe and allowed.
"""
if os.path.isabs(path):
return False
if '..%s' % os.path.sep in path:
return False
return True

Makes me think that what I'm trying to do isn't possible anymore. Any known workarounds?

(cc @roll)

@roll roll added the {current} label Sep 18, 2017
@roll roll self-assigned this Sep 18, 2017
@roll
Copy link
Member

roll commented Sep 18, 2017

@vitorbaptista
There was a long discussion on path security in Specs Working Group that has been finished with this resolution (http://specs.frictionlessdata.io/data-resource/#path-data-in-files):

POSIX paths (unix-style with / as separator) are supported for referencing local files, with the security restraint that they MUST be relative siblings or children of the descriptor. Absolute paths (/) and relative parent paths (../) MUST NOT be used, and implementations SHOULD NOT support these path types.

/ (absolute path) and ../ (relative parent path) are forbidden to avoid security vulnerabilities when implementing data package tools. These limitations on resource path ensure that resource paths only point to files within the data package directory and its subdirectories. This prevents data package tools being exploited by a malicious user to gain unintended access to sensitive information. For example, suppose a data package hosting service stores packages on disk and allows access via an API. A malicious user uploads a data package with a resource path like /etc/passwd. The user then requests the data for that resource and the server naively opens /etc/passwd and returns that data to the caller.

So for now a valid data package descriptor can't have path /tmp/my-data.csv. And I think it's really good - to protect all users by default. But as an implementation we could provide an additional functionality like a flag to support unsafe paths. Something like this:

package = Package(descriptor, safe=False)

And disable this is_safe_path checks for this case.

@roll roll changed the title How to create a new datapackage with a resource in the local filesystem? Provide a flag to open data packages with not safe paths? Sep 18, 2017
@roll roll added enhancement and removed question labels Sep 18, 2017
@roll roll removed their assignment Sep 18, 2017
@vitorbaptista
Copy link
Contributor Author

@roll Taking a step back, what I want to do is create a datapackage with files from the local filesystem. Apparently, the way I can do this with the current code is by adding the resources in a folder, and passing that as a base_path when creating the Package() object. Although that makes my job a bit harder, I think it's an OK-ish compromise. I would suggest not implementing the safe=False flag for now if others haven't complained about it.

vitorbaptista added a commit to openspending/datapackage-pipelines-fiscal that referenced this issue Oct 5, 2017
Currently, `datapackage-py` only supports "safe" resources, which when they're
local resources it means that they must be located somewhere inside the
datapackage.json's directory.

I tried setting the DataPackage's base path as the directory where temporary
files are created (e.g. `/tmp`), but that didn't work because the DP already
has other local resources that don't live in `/tmp`. As the DataPackage that
I'm loading is from a zip file, the directory its data live in exist only in
memory, so I can't add new files to it.

In the end, the cleanest solution was monkey patching (yeah, I know...). I
simply mock the `datapackage.helpers.is_safe_path()` to always return true when
saving the DataPackage, and everything works.

This is a temporary solution. I created an issue on
frictionlessdata/datapackage-py#171 to find a way
that won't require monkey patching.

openspending/openspending#1222
@vitorbaptista
Copy link
Contributor Author

@roll So, it didn't work out. I have a datapackage that already has a local resource (it's inside a zip file), and I need to add more local resources. I tried changing the base_path to the new local resources' base path, but that broke the original resource the DP has.

In the end, I had to mock out datapackage.helpers.is_safe_path. Check openspending/datapackage-pipelines-fiscal@20bbf6e.

@davibicudo
Copy link

This is also an issue for me, as I described in the unnecessary duplicate issue: #261
Thanks @vitorbaptista for the mock workaround, I'll copy it for now in my code.

@roll
Copy link
Member

roll commented Mar 31, 2020

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants