Implementation of an equivalent to the R function base::merge #134

Open
ZekeMarshall opened this issue Feb 8, 2024 · 2 comments

@ZekeMarshall

ZekeMarshall commented Feb 8, 2024

Hi,

I was wondering whether there were any plans to implement a function similar to the R function base::merge?

I would be happy to help develop such a function and have started to do so, but I'd first like to check whether you would be amenable to the idea.

Below is a barebones function I created for a project I'm working on. It aligns the columns of two NamedArrays.NamedArray objects and could form the start of a merge-like function with the addition of hcat and vcat steps. There is probably a much better way to do this!

using NamedArrays
function align_array_columns(x::NamedArray, y::NamedArray, colorder::String = "x")

    # Check which columns are missing from x and y
    x_missing_cols = setdiff(Set(names(y)[2]), Set(names(x)[2]))
    y_missing_cols = setdiff(Set(names(x)[2]), Set(names(y)[2]))

    # If there are missing columns in the x matrix
    x_mat = copy(x)
    if length(x_missing_cols) != 0
        x_mat_missing = NamedArray(zeros(size(x,1), length(x_missing_cols)), names = (vec(names(x)[1]), collect(x_missing_cols)))
        x_mat_colnames = names(x)[2]
        x_mat = [x x_mat_missing]
        setnames!(x_mat, [x_mat_colnames; collect(x_missing_cols)], 2)
    end

    # If there are missing columns in the y matrix
    y_mat = copy(y)
    if length(y_missing_cols) != 0
        y_mat_missing = NamedArray(zeros(size(y,1), length(y_missing_cols)), names = (vec(names(y)[1]), collect(y_missing_cols)))
        y_mat_colnames = names(y)[2]
        y_mat = [y y_mat_missing]
        setnames!(y_mat, [y_mat_colnames; collect(y_missing_cols)], 2)
    end

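    # Reorder the columns of one matrix to match the other
    if colorder == "x"
        y_mat = y_mat[:, names(x_mat)[2]]
    elseif colorder == "y"
        x_mat = x_mat[:, names(y_mat)[2]]
    else
        # Fail loudly rather than silently returning unaligned matrices
        throw(ArgumentError("colorder must be \"x\" or \"y\""))
    end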

    aligned_mats = (x = x_mat, y = y_mat)

    return aligned_mats

end

Apologies if I have missed something!

Cheers,

Zeke

@davidavdav
Owner

Hi, I am not really familiar with what R's merge() does. Does it stack data where the labels are the same in one of the dimensions?

In general I think we'd want an interface to such a merge function that can operate on any dimension.

@ZekeMarshall
Author

Hi @davidavdav, thanks for your quick reply!

I agree, such a function would need to be able to operate on any or all dimensions.

Here is an example of the R function base::merge():

txt1 <- "column1   column2   column3   column4
        row1   0         1         0         0
        row2   0         0         1         0
        row3   1         0         0         1
        "

txt2 <- "column5   column6   column7   column8
        row4   0         1         0         0
        row5   0         0         1         0
        row6   1         0         0         1
        "
dat1 <- read.table(textConnection(txt1), header = TRUE)  |> as.matrix()
dat2 <- read.table(textConnection(txt2), header = TRUE)  |> as.matrix()

merge(x = dat1, y = dat2, by = "row.names", all = TRUE)

This returns a data frame, which can then be converted back into a matrix:

  Row.names column1 column2 column3 column4 column5 column6 column7 column8
1      row1       0       1       0       0      NA      NA      NA      NA
2      row2       0       0       1       0      NA      NA      NA      NA
3      row3       1       0       0       1      NA      NA      NA      NA
4      row4      NA      NA      NA      NA       0       1       0       0
5      row5      NA      NA      NA      NA       0       0       1       0
6      row6      NA      NA      NA      NA       1       0       0       1
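
For comparison, here is a minimal sketch of the same outer merge on row names in plain Julia. It uses only Base and works on a matrix plus separate label vectors rather than on NamedArray objects. The function name `outer_merge_rows` and its argument layout are hypothetical, not part of NamedArrays.jl. Unlike R, it assumes the two inputs have no column names in common, and it keeps rows in first-seen order instead of sorting them by key.

```julia
# Hypothetical helper (not a NamedArrays.jl function): full outer merge of two
# labelled matrices on their row labels, filling absent cells with `missing`.
# Assumption: xcols and ycols are disjoint; R would add .x/.y suffixes otherwise.
function outer_merge_rows(x::AbstractMatrix, xrows, xcols,
                          y::AbstractMatrix, yrows, ycols)
    rows = union(xrows, yrows)            # x's rows first, then rows only in y
    cols = vcat(xcols, ycols)             # columns are simply concatenated
    T = Union{Missing, promote_type(eltype(x), eltype(y))}
    out = Matrix{T}(missing, length(rows), length(cols))

    # Map each row label to its position in the merged result
    rowpos = Dict(r => i for (i, r) in enumerate(rows))

    # Copy x into the left block of columns
    for (i, r) in enumerate(xrows), j in eachindex(xcols)
        out[rowpos[r], j] = x[i, j]
    end

    # Copy y into the right block of columns
    offset = length(xcols)
    for (i, r) in enumerate(yrows), j in eachindex(ycols)
        out[rowpos[r], offset + j] = y[i, j]
    end

    return (data = out, rows = rows, cols = cols)
end

# The two matrices from the R example above
dat = [0 1 0 0; 0 0 1 0; 1 0 0 1]
merged = outer_merge_rows(dat, ["row1", "row2", "row3"], ["column$i" for i in 1:4],
                          dat, ["row4", "row5", "row6"], ["column$i" for i in 5:8])
```

Here `merged.data` is a 6×8 matrix with `missing` where R prints NA. The result could presumably be wrapped back into a labelled array with `NamedArray(merged.data, (merged.rows, merged.cols))`, although I have not checked how the rest of the package handles `missing` elements.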

What I would find useful is a NamedArray-only, merge-like function, together with a set of functions that align the labels of two arrays along selected dimensions, filling any new entries with zero or missing, as the function above does for columns only.
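
To make the "selected dimensions" idea concrete, here is a rough sketch of alignment along an arbitrary dimension. It is written against plain arrays and label vectors using only Base, so it does not depend on any particular NamedArrays.jl behaviour. The names `align_dim`, `align_pair` and `fillvalue` are hypothetical.

```julia
# Hypothetical helper: re-label `data` along dimension `dim` so that its labels
# become `target`. Slices whose label is absent from `labels` are filled with
# `fillvalue`; slices whose label is absent from `target` are dropped.
function align_dim(data::AbstractArray, labels::AbstractVector,
                   target::AbstractVector, dim::Integer; fillvalue = missing)
    size(data, dim) == length(labels) ||
        throw(DimensionMismatch("labels must match size(data, dim)"))
    newsize = ntuple(d -> d == dim ? length(target) : size(data, d), ndims(data))
    T = promote_type(eltype(data), typeof(fillvalue))
    out = Array{T}(undef, newsize)
    fill!(out, fillvalue)
    pos = Dict(l => i for (i, l) in enumerate(labels))
    for (j, l) in enumerate(target)
        haskey(pos, l) || continue
        # Copy the whole slice for label `l` into its new position
        selectdim(out, dim, j) .= selectdim(data, dim, pos[l])
    end
    return out
end

# Hypothetical helper: align two arrays to the union of their labels along `dim`.
function align_pair(x, xlabels, y, ylabels, dim; fillvalue = missing)
    target = union(xlabels, ylabels)   # x's labels first, then labels only in y
    return (x = align_dim(x, xlabels, target, dim; fillvalue = fillvalue),
            y = align_dim(y, ylabels, target, dim; fillvalue = fillvalue),
            labels = target)
end

# Align columns (dim = 2), filling with zero
r = align_pair([1 2; 3 4], ["a", "b"], [5 6; 7 8], ["b", "c"], 2; fillvalue = 0)
# r.labels == ["a", "b", "c"]
# r.x == [1 2 0; 3 4 0]
# r.y == [0 5 6; 0 7 8]
```

Applying `align_pair` once per dimension would align several dimensions in turn. With the default `fillvalue = missing`, the element type widens to allow `missing`, which is closer to the NA behaviour of R's merge.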

Let me know your thoughts and thanks again!

Zeke
