Stream merging fails with gemini #358

Open
mrembert opened this issue Mar 7, 2025 · 12 comments · May be fixed by #391
Labels
bug an unexpected problem or unintended behavior
Comments

@mrembert

mrembert commented Mar 7, 2025

I encountered an error when attempting to create a tool definition for the get_pums function from the tidycensus package using the create_tool_def function from ellmer.

Reproduction steps:

  1. Load the tidycensus and ellmer libraries:

    library(tidycensus)
    library(ellmer)
  2. Create a chat_gemini object:

    chat <- chat_gemini("gemini-2.0-flash")
  3. Attempt to create a tool definition for get_pums:

    create_tool_def(get_pums, chat)

Expected behavior:

I expected create_tool_def to successfully generate a tool definition for the get_pums function, allowing it to be used within the ellmer framework.

Actual behavior:

Instead of creating the tool definition, the code produced the following response/error:

tool(
  tidycensus::get_pums,
  "Load data from the American Community Survey Public Use Microdata Series API",
  variables = type_array(
    "A vector of variables from the PUMS API. Use `View(pums_variables)` to browse variable options.",
    items = type_string()
  ),
  state = type_string(
    "A state, or vector of states, for which you would like to request data. The entire US can be requested with `state = \"all\"` - though be 
patient with the data download!",
    required = FALSE # TODO: could also be a vector, unclear how to Error in merge_func(left, right, path) : is.list(left) is not TRUE
Error during wrapup: 'S4SXP': should not happen - please report
Error: no more error handlers available (recursive errors?); invoking 'abort' restart

Additional Information:

  • I've tried create_tool_def with other functions (e.g. sum) and it works, so the problem appears to be specific to tidycensus::get_pums.
@hadley
Member

hadley commented Mar 8, 2025

Can you please provide a minimal reprex (reproducible example)? The goal of a reprex is to make it as easy as possible for me to recreate your problem so that I can fix it: please help me help you! If you've never heard of a reprex before, start by reading about the reprex package, including the advice further down the page. Please make sure your reprex is created with the reprex package as it gives nicely formatted output and avoids a number of common pitfalls.

@walkerke
Contributor

I've seen the same error, Error in merge_func(left, right, path) : is.list(left) is not TRUE, when using Gemini without tidycensus or tools. I get it at times with streaming text output; setting echo = FALSE resolves the issue. I'll try to put together a reprex when I see it again (I can't reproduce with anything shareable, unfortunately).
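
For anyone hitting this in the meantime, the echo = FALSE workaround mentioned above could be wrapped in a fallback. This is only a sketch: with_fallback() is a hypothetical helper, not part of ellmer, and the two stand-in functions simulate the failing and working calls so the example runs without an API key.

```r
# Hypothetical helper (not an ellmer function): try the streaming call
# first and fall back to a non-streaming call if it throws an error.
with_fallback <- function(streaming, non_streaming) {
  tryCatch(
    streaming(),
    error = function(e) {
      message("Streaming failed (", conditionMessage(e), "); retrying without echo")
      non_streaming()
    }
  )
}

# Stand-ins that simulate the two code paths without calling Gemini.
fake_stream <- function() stop("is.list(left) is not TRUE")
fake_plain <- function() "full response"

result <- with_fallback(fake_stream, fake_plain)
```

With a real chat object, the two arguments would presumably be function() chat$chat(prompt) and function() chat$chat(prompt, echo = FALSE). Whether a failed streaming call leaves the chat history in a usable state is untested here, so creating a fresh chat object for the retry may be safer.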

@hadley
Member

hadley commented Mar 10, 2025

@walkerke to be clear, I don't think this is a tidycensus problem, this is some bug in an edge case of our stream merging code. But it's impossible to fix without a reprex 😞
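
To illustrate how an edge case in stream merging can produce this exact message, here is a minimal sketch. It is not ellmer's implementation: merge_chunks() and the chunk shapes are assumptions made up for illustration. The idea is that streamed chunks are merged field by field, and the merge asserts that both sides are lists, so a field that arrives as a string in one chunk and as a list in a later one trips the assertion.

```r
# Hypothetical recursive merge of two streamed chunks (not ellmer's code).
merge_chunks <- function(left, right, path = character()) {
  if (is.null(left)) return(right)
  if (is.null(right)) return(left)
  if (is.character(left) && is.character(right)) {
    # Text deltas are concatenated.
    return(paste0(left, right))
  }
  # Anything else is assumed to be a named list on both sides.
  stopifnot(is.list(left), is.list(right))
  for (name in names(right)) {
    left[[name]] <- merge_chunks(left[[name]], right[[name]], c(path, name))
  }
  left
}

# Well-behaved chunks: the text deltas are joined.
a <- list(content = list(text = "The tidy"))
b <- list(content = list(text = "verse"))
merged <- merge_chunks(a, b)

# Edge case: the same (made-up) field is a string in one chunk and a
# list in the next, so the is.list(left) assertion fails.
c1 <- list(meta = "none")
c2 <- list(meta = list(source = "x"))
res <- tryCatch(merge_chunks(c1, c2), error = function(e) conditionMessage(e))
```

Because the failure depends on what a particular chunk happens to contain, a bug of this kind would only show up for some responses.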

@mrembert
Author

mrembert commented Mar 10, 2025

Let me know if this covers what you need.

library(tidycensus)
library(ellmer)


chat <- chat_gemini("gemini-2.0-flash")
#> Using model = "gemini-2.0-flash".
create_tool_def(get_pums,chat)

#> [1] "```R\ntool(\n  get_pums,\n  \"Load data from the American Community Survey Public Use Microdata Series API\",\n  variables = type_array(\n    \"A vector of variables from the PUMS API. Use `View(pums_variables)` to browse variable options.\",\n    items = type_string()\n  ),\n  state = type_string(\n    \"A state, or vector of states, for which you would like to request data. The entire US can be requested with `state = \\\"all\\\"` - though be patient with the data download!\"\n  ),\n  puma = type_array( # TODO: Can be a vector or a named vector, need to express this better\n    \"A vector of PUMAs from a single state, for which you would like to request data. To get data from PUMAs in more than one state, specify a named vector of state/PUMA pairs and set `state = \\\"multiple\\\"`.\",\n    items = type_string()\n  ),\n  year = type_integer(\n    \"The data year of the 1-year ACS sample or the endyear of the 5-year sample. Defaults to 2023.\",\n    required = FALSE\n  ),\n  survey = type_string(\n    \"The ACS survey; one of either '\\\"acs1\\\"' or '\\\"acs5\\\"' (the default). Defaults to `\\\"acs5\\\"`.\",\n    required = FALSE\n  ),\n  variables_filter = type_object( # TODO: Need to express the named list format with particular types in the filter\n    \"A named list of filters you'd like to return from the PUMS API. For example, passing `list(AGE = 25:50, SEX = 1)` will return only males aged 25 to 50 in your output dataset. Defaults to `NULL`, which returns all records. 
If a housing-only dataset is required, use `list(SPORDER = 1)` to only return householder records (taking care in your analysis to use the household weight `WGTP`).\",\n    required = FALSE\n  ),\n  rep_weights = type_string( # TODO: Should be an enum of \"person\", \"housing\", or \"both\"\n    \"Whether or not to return housing unit, person, or both housing and person-level replicate weights for calculation of standard errors; one of '\\\"person\\\"', '\\\"housing\\\"', or '\\\"both\\\"'.\",\n    required = FALSE\n  ),\n  recode = type_boolean(\n    \"If TRUE, recodes variable values using Census data dictionary and creates a new '*_label' column for each variable that is recoded. Available for 2017 - 2022 data. Defaults to FALSE.\",\n    required = FALSE\n  ),\n  return_vacant = type_boolean(\n    \"If TRUE, makes a separate request to the Census API to retrieve microdata for vacant housing units, which are handled differently in the API as they do not have person-level characteristics. All person-level columns in the returned dataset will be populated with NA for vacant housing units. Defaults to FALSE.\",\n    required = FALSE\n  ),\n  show_call = type_boolean(\n    \"If TRUE, display call made to Census API. This can be very useful in debugging and determining if error messages returned are due to tidycensus or the Census API. Copy to the API call into a browser and see what is returned by the API directly. Defaults to FALSE.\",\n    required = FALSE\n  ),\n  key = type_string(\n    \"Your Census API key.\"\n  )\n)\n```"

@hadley
Member

hadley commented Mar 10, 2025

@mrembert I don't see an error there?

@mrembert
Author

mrembert commented Mar 10, 2025

Yes, I noticed that too. I still get the error when I run it in the console.

Image

@hadley
Member

hadley commented Mar 11, 2025

Can you try it with echo = TRUE?

@mrembert
Author

mrembert commented Mar 11, 2025

Sure. When I include echo, I now get the error in the reprex:

library(tidycensus)
library(ellmer)


chat <- chat_gemini("gemini-2.0-flash")
#> Using model = "gemini-2.0-flash".
create_tool_def(get_pums,chat, echo = TRUE)
#> 
#> tool(
#>   get_pums,
#>   "Load data from the American Community Survey Public Use Microdata Series 
#> API",
#>   variables = type_array(
#>     "A vector of variables from the PUMS API.",
#>     required = FALSE
#>   ),
#>   state = type_string(
#>     "A state, or vector of states, for which you would like to request data. 
#> The entire US can be requested with 'state = \"all\"'.",
#>     required = FALSE
#>   ),
#>   puma = type_array(
#>     "A vector of PUMAs from a single state, for which you would like to request
#> data. To get data from PUMAs in more than one

#> Error in merge_func(left, right, path): is.list(left) is not TRUE

@hadley hadley added this to the 0.1.2 milestone Mar 12, 2025
@hadley hadley changed the title from "create_tool_def Error with tidycensus::get_pums: "Error in merge_func(left, right, path) : is.list(left) is not TRUE"" to "Stream merging fails with gemini" Mar 13, 2025
@walkerke
Contributor

walkerke commented Mar 25, 2025

I was able to get a reprex as well - I'm seeing it with Gemini Pro 2.0 Experimental when long outputs are requested:

library(ellmer)

chat <- chat_gemini(model = "gemini-2.0-pro-exp-02-05")

chat$chat("Write me a 10-paragraph essay on the history of the tidyverse.")
#> ## The Rise of the Tidyverse: A Revolution in R Data Wrangling
#> 
#> The R programming language, born in the 1990s, quickly gained traction among 
#> statisticians and data analysts for its powerful statistical capabilities and 
#> open-source nature. However, early R code could often be complex and 
#> inconsistent, particularly when it came to data manipulation. Different 
#> packages offered overlapping functionalities with varying syntax, leading to a 
#> steep learning curve and a lack of code readability.  This fragmented landscape
#> presented a significant barrier to efficient data analysis, particularly for 
#> those new to the language.
#> 
#> Enter Hadley Wickham, a statistician and R developer who recognized the need 
#> for a more cohesive and user-friendly approach to data wrangling.  Inspired by 
#> the principles of "tidy data," where each variable forms a column, each 
#> observation forms a row, and each type of observational unit forms a table, 
#> Wickham began developing a collection of packages designed to work together 
#> seamlessly.  These packages,
#> Error in merge_func(left, right, path): is.list(left) is not TRUE

Created on 2025-03-25 with reprex v2.1.0

Oddly, though, it does not always fail; sometimes it ends up working.

@hadley
Member

hadley commented Mar 25, 2025

@walkerke that's super helpful, thanks! I just got back from vacation and I'm catching up on meetings this week, but will hopefully fix it next week.

@walkerke
Contributor

thanks @hadley!

@hadley
Member

hadley commented Mar 27, 2025

Looks like the problem was with citations, which Gemini sometimes provides. I should have a fix momentarily.

hadley added a commit that referenced this issue Mar 27, 2025
@hadley hadley linked a pull request Mar 27, 2025 that will close this issue
@hadley hadley added the bug an unexpected problem or unintended behavior label Mar 27, 2025
3 participants