Scalability issues #247

Merged on Oct 21, 2019 (8 commits)
Changes from 3 commits
datapackage/group.py: 22 changes (22 additions, 0 deletions)
@@ -1,4 +1,5 @@
 from itertools import chain
+from datapackage import exceptions


 # Module API
@@ -52,3 +53,24 @@ def read(self, limit=None, **options):
             if count == limit:
                 break
         return rows

+    def check_relations(self):

Review comment (roll, Member, Oct 9, 2019):
Could you please add a test for this method (https://coveralls.io/builds/26209815/source?filename=datapackage%2Fgroup.py#L62)? Our coverage dropped below its validity threshold.

Reply (PR author):
Yep, almost done.

+        # This function mimics resource.check_relations but optimizes it for groups.
+        # It is also a prototype proposing a faster check_relations process for all resources.
+
+        # Optimization: relations should be loaded only once for the whole group.
+        foreign_keys_values = self.__resources[0].get_foreign_keys_values()
+
+        # Alternative to check_relations from tableschema-py
+        for resource in self.__resources:
+            try:
+                resource.check_relations(foreign_keys_values=foreign_keys_values)
+            except exceptions.DataPackageException as exception:
+                print('in %s: ' % resource.name)
+                if exception.multiple:
+                    for error in exception.errors:
+                        print(error)

Review comment (Member):
Is this a leftover debug print? It should probably raise instead.

Reply (PR author):
Indeed, raising looks like the right thing to do.

+                else:
+                    print(exception)

Review comment (Member):
Is this a leftover debug print? It should probably raise instead.

Reply (PR author):
Indeed, raising looks like the right thing to do.

+        return True
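The review threads above settle on raising instead of printing. Below is a minimal sketch of that behavior, not the code that was eventually merged: `RelationError`, `FakeResource` and `check_group_relations` are hypothetical stand-ins (in the diff the method lives on `Group` and catches `exceptions.DataPackageException`). It keeps the per-resource `check_relations(foreign_keys_values=...)` call from the diff, prefixes every error with the resource name as the debug `print` did, and raises a single exception once all resources have been checked.

```python
class RelationError(Exception):
    """Hypothetical stand-in for exceptions.DataPackageException.

    Mirrors the two attributes the diff relies on: `errors` and `multiple`.
    """

    def __init__(self, message, errors=None):
        super().__init__(message)
        self.errors = list(errors or [])
        self.multiple = bool(self.errors)


class FakeResource:
    """Hypothetical resource whose relation check fails with preset errors."""

    def __init__(self, name, errors=()):
        self.name = name
        self._errors = list(errors)

    def check_relations(self, foreign_keys_values=None):
        if self._errors:
            raise RelationError("Foreign key violations", self._errors)
        return True


def check_group_relations(resources, foreign_keys_values=None):
    """Check every resource, then raise one error that lists all failures."""
    collected = []
    for resource in resources:
        try:
            resource.check_relations(foreign_keys_values=foreign_keys_values)
        except RelationError as exception:
            # Prefix each error with the resource name, as the debug print did.
            errors = exception.errors if exception.multiple else [exception]
            collected.extend("in %s: %s" % (resource.name, error) for error in errors)
    if collected:
        raise RelationError("Group relation check failed", collected)
    return True


if __name__ == "__main__":
    group = [FakeResource("a"), FakeResource("b", ["row 3: missing key 7"])]
    try:
        check_group_relations(group)
    except RelationError as exc:
        print(exc.errors)  # ['in b: row 3: missing key 7']
```

Collecting errors before raising means one failing resource does not hide problems in the others; raising on the first failure would be simpler, but would need several runs to surface everything.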
datapackage/resource.py: 20 changes (15 additions, 5 deletions)
@@ -185,7 +185,7 @@ def iter(self, relations=False, **options):

         return self.__get_table().iter(relations=relations, **options)

-    def read(self, relations=False, **options):
+    def read(self, relations=False, foreign_keys_values=False, **options):
         """https://github.com/frictionlessdata/datapackage-py#resource
         """

@@ -195,17 +195,23 @@ def read(self, relations=False, **options):
             raise exceptions.DataPackageException(message)

         # Get relations
-        if relations:
+        if relations and not foreign_keys_values:
             relations = self.__get_relations()

-        return self.__get_table().read(relations=relations, **options)
+        return self.__get_table().read(relations=relations, foreign_keys_values=foreign_keys_values,
+                                       **options)

-    def check_relations(self):
+    def check_relations(self, foreign_keys_values=False):
         """https://github.com/frictionlessdata/datapackage-py#resource
         """
-        self.read(relations=True)
+        self.read(relations=True, foreign_keys_values=foreign_keys_values)
         return True

+    def drop_relations(self):
+        # Storing relation datasets consumes memory, so we may need to discard them.
+        self.__relations = False
+        return self.__relations is False

     def raw_iter(self, stream=False):
         """https://github.com/frictionlessdata/datapackage-py#resource
         """
@@ -419,6 +425,10 @@ def __get_relations(self):

         return self.__relations

+    def get_foreign_keys_values(self):
+        # Exposed so that groups can access it for optimization.
+        return self.__get_table().index_foreign_keys_values(self.__get_relations())

     # Deprecated

     @property
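The speed-up in this PR comes from building the foreign-key index once (`Group.check_relations` calls `get_foreign_keys_values()` on the first resource) and passing it to every resource, which then skips `__get_relations()` inside `Resource.read`. The sketch below shows the same build-once, look-up-many pattern in plain Python. It is an illustration under assumptions: `build_fk_index` and `check_rows` are hypothetical helpers, and the dict of key-tuple sets is not necessarily the structure that tableschema-py's `index_foreign_keys_values` returns.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical index layout: referenced resource name -> set of key tuples.
FKIndex = Dict[str, Set[Tuple]]


def build_fk_index(relations: Dict[str, List[dict]],
                   key_fields: Dict[str, List[str]]) -> FKIndex:
    """Build the lookup index once for the whole group (hypothetical helper)."""
    index: FKIndex = {}
    for name, rows in relations.items():
        fields = key_fields[name]
        index[name] = {tuple(row[field] for field in fields) for row in rows}
    return index


def check_rows(rows: List[dict], fk_fields: List[str], reference: str,
               index: FKIndex) -> List[str]:
    """Report rows whose foreign key is missing from the shared index."""
    errors = []
    for number, row in enumerate(rows, start=1):
        key = tuple(row[field] for field in fk_fields)
        if key not in index[reference]:
            errors.append("row %d: %r not found in %s" % (number, key, reference))
    return errors


if __name__ == "__main__":
    relations = {"people": [{"id": 1}, {"id": 2}]}
    index = build_fk_index(relations, {"people": ["id"]})  # built once
    group = {"orders": [{"person": 1}, {"person": 3}], "visits": [{"person": 2}]}
    for name, rows in group.items():
        # Every resource reuses the same index instead of rebuilding it.
        print(name, check_rows(rows, ["person"], "people", index))
```

Each check is an average O(1) set lookup, so a resource's cost depends on its own row count instead of also rebuilding the index for every resource. Once the check is done, the cached relation data can be released with the `drop_relations()` method this PR adds to `Resource`.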