Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

neuprint.utils.connection_table_to_matrix fix: #51

Open
markuspleijzier opened this issue Oct 19, 2023 · 2 comments
Open

neuprint.utils.connection_table_to_matrix fix: #51

markuspleijzier opened this issue Oct 19, 2023 · 2 comments

Comments

@markuspleijzier
Copy link

markuspleijzier commented Oct 19, 2023

Error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[104], line 1
----> 1 connection_table_to_matrix(et)

File ~/opt/miniconda3/envs/py39/lib/python3.9/site-packages/neuprint/utils.py:304, in connection_table_to_matrix(conn_df, group_cols, weight_col, sort_by, make_square)
    301 dtype = conn_df[weight_col].dtype
    303 agg_weights_df = conn_df.groupby([col_pre, col_post], sort=False)[weight_col].sum().reset_index()
--> 304 matrix = agg_weights_df.pivot(col_pre, col_post, weight_col)
    305 matrix = matrix.fillna(0).astype(dtype)
    307 if sort_by:

TypeError: pivot() takes 1 positional argument but 4 were given

Fix: specify column, index and values positional arguments:

matrix = agg_weights_df.pivot(index = col_pre, columns = col_post, values = weight_col)

Fix in function:

connection_table_to_matrix()
def connection_table_to_matrix(conn_df, group_cols='bodyId', weight_col='weight', sort_by=None, make_square=False):
    """
    Given a weighted connection table, produce a weighted adjacency matrix.

    Args:
        conn_df:
            A DataFrame with columns for pre- and post- identifiers
            (e.g. bodyId, type or instance), and a column for the
            weight of the connection.

        group_cols:
            Which two columns to use as the row index and column index
            of the returned matrix, respetively.
            Or give a single string (e.g. ``"body"``, in which case the
            two column names are chosen by appending the suffixes
            ``_pre`` and ``_post`` to your string.

            If a pair of pre/post values occurs more than once in the
            connection table, all of its weights will be summed in the
            output matrix.

        weight_col:
            Which column holds the connection weight, to be aggregated for each unique pre/post pair.

        sort_by:
            How to sort the rows and columns of the result.
            Can be two strings, e.g. ``("type_pre", "type_post")``,
            or a single string, e.g. ``"type"`` in which case the suffixes are assumed.

        make_square:
            If True, insert rows and columns to ensure that the same IDs exist in the rows and columns.
            Inserted entries will have value 0.0

    Returns:
        DataFrame, shape NxM, where N is the number of unique values in
        the 'pre' group column, and M is the number of unique values in
        the 'post' group column.

    Example:

        .. code-block:: ipython

            In [1]: from neuprint import fetch_simple_connections, NeuronCriteria as NC
               ...: kc_criteria = NC(type='KC.*')
               ...: conn_df = fetch_simple_connections(kc_criteria, kc_criteria)
            In [1]: conn_df.head()
            Out[1]:
               bodyId_pre  bodyId_post  weight type_pre type_post instance_pre instance_post                                       conn_roiInfo
            0  1224137495   5813032771      29      KCg       KCg          KCg    KCg(super)  {'MB(R)': {'pre': 26, 'post': 26}, 'gL(R)': {'...
            1  1172713521   5813067826      27      KCg       KCg   KCg(super)         KCg-d  {'MB(R)': {'pre': 26, 'post': 26}, 'PED(R)': {...
            2   517858947   5813032943      26   KCab-p    KCab-p       KCab-p        KCab-p  {'MB(R)': {'pre': 25, 'post': 25}, 'PED(R)': {...
            3   642680826   5812980940      25   KCab-p    KCab-p       KCab-p        KCab-p  {'MB(R)': {'pre': 25, 'post': 25}, 'PED(R)': {...
            4  5813067826   1172713521      24      KCg       KCg        KCg-d    KCg(super)  {'MB(R)': {'pre': 23, 'post': 23}, 'gL(R)': {'...

            In [2]: from neuprint.utils import connection_table_to_matrix
               ...: connection_table_to_matrix(conn_df, 'type')
            Out[2]:
            type_post   KC  KCa'b'  KCab-p  KCab-sc     KCg
            type_pre
            KC           3     139       6        5     365
            KCa'b'     154  102337     245      997    1977
            KCab-p       7     310   17899     3029     127
            KCab-sc      4    2591    3975   247038    3419
            KCg        380    1969      79     1526  250351
    """
    if isinstance(group_cols, str):
        group_cols = (f"{group_cols}_pre", f"{group_cols}_post")

    assert len(group_cols) == 2, \
        "Please provide two group_cols (e.g. 'bodyId_pre', 'bodyId_post')"

    assert group_cols[0] in conn_df, \
        f"Column missing: {group_cols[0]}"

    assert group_cols[1] in conn_df, \
        f"Column missing: {group_cols[1]}"

    assert weight_col in conn_df, \
        f"Column missing: {weight_col}"

    col_pre, col_post = group_cols
    dtype = conn_df[weight_col].dtype

    agg_weights_df = conn_df.groupby([col_pre, col_post], sort=False)[weight_col].sum().reset_index()
    matrix = agg_weights_df.pivot(index = col_pre, columns = col_post, values = weight_col)
    matrix = matrix.fillna(0).astype(dtype)

    if sort_by:
        if isinstance(sort_by, str):
            sort_by = (f"{sort_by}_pre", f"{sort_by}_post")

        assert len(sort_by) == 2, \
            "Please provide two sort_by column names (e.g. 'type_pre', 'type_post')"

        pre_order = conn_df.sort_values(sort_by[0])[col_pre].unique()
        post_order = conn_df.sort_values(sort_by[1])[col_post].unique()
        matrix = matrix.reindex(index=pre_order, columns=post_order)
    else:
        # No sort: Keep the order as close to the input order as possible.
        pre_order = conn_df[col_pre].unique()
        post_order = conn_df[col_post].unique()
        matrix = matrix.reindex(index=pre_order, columns=post_order)

    if make_square:
        matrix, _ = matrix.align(matrix.T).fillna(0.0).astype(matrix.dtype)
        matrix = matrix.rename_axis('bodyId_pre', axis=0).rename_axis('bodyId_post', axis=1)
        matrix = matrix.loc[sorted(matrix.index), sorted(matrix.columns)]

    return matrix
@stuarteberg
Copy link
Collaborator

This is a duplicate of #50, but I'll leave this open since that one was closed prematurely. The error is due to a change in the pandas API. Our master branch is already fixed, but I haven't made a release. A release (with other changes) will be coming in the next 3 weeks, so I'm not going to push one out today.

@markuspleijzier
Copy link
Author

ah ok cool, thank you @stuarteberg

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants