Convenience functions to insert / fetch when an attach field is in table definition #1156

@MaxFBurg

Feature Request

Problem

When inserting into a table that has a field result : attach@minio, the table's insert method expects a file path. Similarly, fetch stores a file and returns a file path. This is often inconvenient, because (i) the data saved in the file is needed as an object in the Python script being executed, and (ii) the saved or downloaded files remain on local storage even after the script has terminated.

Requirements

Possible solution: introduce a parameter to insert that automatically saves the data to be inserted to a file, inserts it into the table, and then removes that file. Similarly, fetch could download the file, load it, and return the loaded data to the Python script.
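The round trip described above can be sketched with the standard library alone. This is only an illustration under stated assumptions, not DataJoint code: insert_via_tempfile, fetch_via_tempfile, and FakeAttachStore are hypothetical names, and the fake store merely stands in for a table whose insert takes a file path and whose fetch writes a file into a download directory and returns its path.

```python
import os
import pickle
import tempfile
from typing import Any, Callable, Optional


def insert_via_tempfile(obj: Any, insert_path: Callable[[str], None]) -> None:
    """Pickle obj to a temporary file, hand the path to insert_path, then clean up."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "payload.pkl")
        with open(path, "wb") as f:
            pickle.dump(obj, f)
        # The backend must read or copy the file before this call returns,
        # because the directory is deleted when the with-block exits.
        insert_path(path)


def fetch_via_tempfile(fetch_to_dir: Callable[[str], str]) -> Any:
    """Let fetch_to_dir download into a temporary directory and return the loaded object."""
    with tempfile.TemporaryDirectory() as tmp:
        path = fetch_to_dir(tmp)
        with open(path, "rb") as f:
            return pickle.load(f)


class FakeAttachStore:
    """Hypothetical stand-in for a table with an attach field (keeps the bytes in memory)."""

    def __init__(self) -> None:
        self.name: Optional[str] = None
        self.blob: Optional[bytes] = None

    def insert(self, path: str) -> None:
        # Mimics an insert that expects a file path.
        self.name = os.path.basename(path)
        with open(path, "rb") as f:
            self.blob = f.read()

    def fetch(self, download_path: str) -> str:
        # Mimics a fetch that writes the file locally and returns its path.
        path = os.path.join(download_path, self.name)
        with open(path, "wb") as f:
            f.write(self.blob)
        return path


store = FakeAttachStore()
insert_via_tempfile({"weights": [1, 2, 3]}, store.insert)
result = fetch_via_tempfile(store.fetch)
```

Because both helpers work inside a TemporaryDirectory, no file is left behind once they return, which addresses point (ii) of the problem description.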

Justification

See the Problem section.

Alternative Considerations

Currently I am using an AttachMixin as a workaround, i.e. my table would be defined as class MyTable(AttachMixin, dj.Computed). The mixin could serve as the basis for the suggested feature, although it would need some improvement.

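import os
import pickle
import tempfile
import uuid
from itertools import product
from typing import Any, Dict, Iterable, List, Optional, Union

import datajoint as dj
import numpy as np


class AttachMixin:

    def attach_insert(self, keys: Iterable[Dict[str, Any]], attach_keys: List[str]) -> None:
        if not isinstance(attach_keys, list):
            raise ValueError("attach_keys must be a list")

        # Copy the rows so the caller's dicts are not overwritten with temp paths.
        rows = [dict(key) for key in keys]

        with tempfile.TemporaryDirectory(dir=os.environ.get("TMP", ".")) as temp_dir:
            for row, ak in product(rows, attach_keys):
                # Random file name; stands in for the undefined create_random_str() helper.
                path = os.path.join(temp_dir, uuid.uuid4().hex + ".pkl")

                with open(path, "wb") as f:
                    pickle.dump(row[ak], f)
                row[ak] = path

            # Insert while the temporary files still exist.
            self.insert(rows)

    def attach_insert1(self, key: Dict[str, Any], attach_keys: List[str]) -> None:
        self.attach_insert([key], attach_keys)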

    def attach_fetch(
        self,
        *attrs: str,
        key: Optional[Dict[str, Any]] = None,
        **kwargs,
    ) -> Union[Dict[str, Any], List]:
        key = key or {}

        with tempfile.TemporaryDirectory(dir=os.environ.get("TMP", ".")) as temp_dir:
            ret = (self & key).fetch(*attrs, download_path=temp_dir, **kwargs)  # array, list[dict]

            if isinstance(ret, dict):
                ret = self._load_from_dict(ret)

            elif isinstance(ret, Iterable):
                ret = np.array(ret)

                for i, value in enumerate(ret):
                    if isinstance(value, dict):
                        ret[i] = self._load_from_dict(value)

                    elif self._is_pkl_path(value):
                        with open(value, "rb") as f:
                            ret[i] = pickle.load(f)

                    else:
                        raise NotImplementedError(f"Value {value} is not a dict or a pkl path")

            elif self._is_pkl_path(ret):
                with open(ret, "rb") as f:
                    ret = pickle.load(f)

            else:
                raise NotImplementedError(f"Return value {ret} is not a dict, Iterable, or a pkl path")

        return ret

    def attach_fetch1(
        self,
        *attrs: str,
        key: Optional[Dict[str, Any]] = None,
        **kwargs,
    ) -> Union[Dict[str, Any], List]:
        ret = self.attach_fetch(*attrs, key=key, **kwargs)
        if len(ret) != 1:
            raise dj.DataJointError(f"fetch1 should return exactly one tuple. {len(ret)} tuples were found")
        return ret[0]

    def _load_from_dict(self, d: dict[str, str]) -> dict[str, Any]:
        for key, value in d.items():
            if self._is_pkl_path(value):
                with open(value, "rb") as f:
                    d[key] = pickle.load(f)
        return d

    def _is_pkl_path(self, value):
        return (
            isinstance(value, str) and value.endswith(".pkl") and os.path.isfile(value)
        )

Related

These issues might be (loosely) related:
#1109
#1099

If you think such a feature would be a helpful addition to datajoint, I would be happy to help implement it.

Labels: enhancement, stale
