Stream merging fails with gemini #358

mrembert · 2025-03-07T15:29:49Z

I encountered an error when attempting to create a tool definition for the get_pums function from the tidycensus package using the create_tool_def function from ellmer.

Reproduction steps:

Load the tidycensus and ellmer libraries:
```
library(tidycensus)
library(ellmer)
```
Create a chat_gemini object:
```
chat <- chat_gemini("gemini-2.0-flash")
```
Attempt to create a tool definition for get_pums:
```
create_tool_def(get_pums, chat)
```

Expected behavior:

I expected create_tool_def to successfully generate a tool definition for the get_pums function, allowing it to be used within the ellmer framework.

Actual behavior:

Instead of creating the tool definition, the code produced the following response/error:

tool(
  tidycensus::get_pums,
  "Load data from the American Community Survey Public Use Microdata Series API",
  variables = type_array(
    "A vector of variables from the PUMS API. Use `View(pums_variables)` to browse variable options.",
    items = type_string()
  ),
  state = type_string(
    "A state, or vector of states, for which you would like to request data. The entire US can be requested with `state = \"all\"` - though be patient with the data download!",
    required = FALSE # TODO: could also be a vector, unclear how to
Error in merge_func(left, right, path) : is.list(left) is not TRUE
Error during wrapup: 'S4SXP': should not happen - please report
Error: no more error handlers available (recursive errors?); invoking 'abort' restart

Additional Information:

create_tool_def works with the other functions I've tried (e.g. sum), so the problem seems to be specific to the tidycensus::get_pums function.

hadley · 2025-03-08T18:32:49Z

Can you please provide a minimal reprex (reproducible example)? The goal of a reprex is to make it as easy as possible for me to recreate your problem so that I can fix it: please help me help you! If you've never heard of a reprex before, start by reading about the reprex package, including the advice further down the page. Please make sure your reprex is created with the reprex package as it gives nicely formatted output and avoids a number of common pitfalls.

walkerke · 2025-03-10T15:36:12Z

I've seen the same error (Error in merge_func(left, right, path) : is.list(left) is not TRUE) when using Gemini without tidycensus or tools. I get it at times with streaming text output; setting echo = FALSE resolves the issue. I'll try to put together a reprex when I see it again (so far I can't reproduce it with anything shareable, unfortunately).

hadley · 2025-03-10T19:05:44Z

@walkerke to be clear, I don't think this is a tidycensus problem, this is some bug in an edge case of our stream merging code. But it's impossible to fix without a reprex 😞

mrembert · 2025-03-10T19:56:20Z

Let me know if this covers what you need.

library(tidycensus)
library(ellmer)


chat <- chat_gemini("gemini-2.0-flash")
#> Using model = "gemini-2.0-flash".
create_tool_def(get_pums,chat)

#> [1] "```R\ntool(\n  get_pums,\n  \"Load data from the American Community Survey Public Use Microdata Series API\",\n  variables = type_array(\n    \"A vector of variables from the PUMS API. Use `View(pums_variables)` to browse variable options.\",\n    items = type_string()\n  ),\n  state = type_string(\n    \"A state, or vector of states, for which you would like to request data. The entire US can be requested with `state = \\\"all\\\"` - though be patient with the data download!\"\n  ),\n  puma = type_array( # TODO: Can be a vector or a named vector, need to express this better\n    \"A vector of PUMAs from a single state, for which you would like to request data. To get data from PUMAs in more than one state, specify a named vector of state/PUMA pairs and set `state = \\\"multiple\\\"`.\",\n    items = type_string()\n  ),\n  year = type_integer(\n    \"The data year of the 1-year ACS sample or the endyear of the 5-year sample. Defaults to 2023.\",\n    required = FALSE\n  ),\n  survey = type_string(\n    \"The ACS survey; one of either '\\\"acs1\\\"' or '\\\"acs5\\\"' (the default). Defaults to `\\\"acs5\\\"`.\",\n    required = FALSE\n  ),\n  variables_filter = type_object( # TODO: Need to express the named list format with particular types in the filter\n    \"A named list of filters you'd like to return from the PUMS API. For example, passing `list(AGE = 25:50, SEX = 1)` will return only males aged 25 to 50 in your output dataset. Defaults to `NULL`, which returns all records. 
If a housing-only dataset is required, use `list(SPORDER = 1)` to only return householder records (taking care in your analysis to use the household weight `WGTP`).\",\n    required = FALSE\n  ),\n  rep_weights = type_string( # TODO: Should be an enum of \"person\", \"housing\", or \"both\"\n    \"Whether or not to return housing unit, person, or both housing and person-level replicate weights for calculation of standard errors; one of '\\\"person\\\"', '\\\"housing\\\"', or '\\\"both\\\"'.\",\n    required = FALSE\n  ),\n  recode = type_boolean(\n    \"If TRUE, recodes variable values using Census data dictionary and creates a new '*_label' column for each variable that is recoded. Available for 2017 - 2022 data. Defaults to FALSE.\",\n    required = FALSE\n  ),\n  return_vacant = type_boolean(\n    \"If TRUE, makes a separate request to the Census API to retrieve microdata for vacant housing units, which are handled differently in the API as they do not have person-level characteristics. All person-level columns in the returned dataset will be populated with NA for vacant housing units. Defaults to FALSE.\",\n    required = FALSE\n  ),\n  show_call = type_boolean(\n    \"If TRUE, display call made to Census API. This can be very useful in debugging and determining if error messages returned are due to tidycensus or the Census API. Copy to the API call into a browser and see what is returned by the API directly. Defaults to FALSE.\",\n    required = FALSE\n  ),\n  key = type_string(\n    \"Your Census API key.\"\n  )\n)\n```"

hadley · 2025-03-10T20:17:33Z

@mrembert I don't see an error there?

mrembert · 2025-03-10T20:34:34Z

Yes, I noticed that too. I still get the error when I run it in the console.

hadley · 2025-03-11T06:05:31Z

Can you try it with echo = TRUE?

mrembert · 2025-03-11T14:04:46Z

Sure. When I include echo, I now get the error in the reprex:

library(tidycensus)
library(ellmer)


chat <- chat_gemini("gemini-2.0-flash")
#> Using model = "gemini-2.0-flash".
create_tool_def(get_pums,chat, echo = TRUE)
#> 
#> tool(
#>   get_pums,
#>   "Load data from the American Community Survey Public Use Microdata Series 
#> API",
#>   variables = type_array(
#>     "A vector of variables from the PUMS API.",
#>     required = FALSE
#>   ),
#>   state = type_string(
#>     "A state, or vector of states, for which you would like to request data. 
#> The entire US can be requested with 'state = \"all\"'.",
#>     required = FALSE
#>   ),
#>   puma = type_array(
#>     "A vector of PUMAs from a single state, for which you would like to request
#> data. To get data from PUMAs in more than one

#> Error in merge_func(left, right, path): is.list(left) is not TRUE

walkerke · 2025-03-25T16:45:13Z

I was able to get a reprex as well - I'm seeing it with Gemini Pro 2.0 Experimental when long outputs are requested:

library(ellmer)

chat <- chat_gemini(model = "gemini-2.0-pro-exp-02-05")

chat$chat("Write me a 10-paragraph essay on the history of the tidyverse.")
#> ## The Rise of the Tidyverse: A Revolution in R Data Wrangling
#> 
#> The R programming language, born in the 1990s, quickly gained traction among 
#> statisticians and data analysts for its powerful statistical capabilities and 
#> open-source nature. However, early R code could often be complex and 
#> inconsistent, particularly when it came to data manipulation. Different 
#> packages offered overlapping functionalities with varying syntax, leading to a 
#> steep learning curve and a lack of code readability.  This fragmented landscape
#> presented a significant barrier to efficient data analysis, particularly for 
#> those new to the language.
#> 
#> Enter Hadley Wickham, a statistician and R developer who recognized the need 
#> for a more cohesive and user-friendly approach to data wrangling.  Inspired by 
#> the principles of "tidy data," where each variable forms a column, each 
#> observation forms a row, and each type of observational unit forms a table, 
#> Wickham began developing a collection of packages designed to work together 
#> seamlessly.  These packages,
#> Error in merge_func(left, right, path): is.list(left) is not TRUE

Created on 2025-03-25 with reprex v2.1.0

Oddly, though, it does not always fail; sometimes it ends up working.

hadley · 2025-03-25T19:54:50Z

@walkerke that's super helpful, thanks! I just got back from vacation and I'm catching up on meetings this week, but will hopefully fix it next week.

walkerke · 2025-03-26T20:41:02Z

thanks @hadley!

hadley · 2025-03-27T18:58:34Z

Looks like the problem was with citations, which Gemini sometimes provides. I should have a fix shortly.
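
For readers unfamiliar with this class of bug, the error message appears to come from an assertion in code that merges streamed chunks. The base R sketch below is only an illustration of how such an assertion can fire when a field arrives in an unexpected shape. Everything in it is hypothetical: `merge_chunks()` and the layout of the `citations` field are invented for this example and are not ellmer's actual code or Gemini's actual response format.

```r
# Hypothetical recursive merge of two streamed chunks.
# NOT ellmer's implementation; an illustrative sketch only.
merge_chunks <- function(left, right) {
  if (is.null(left)) return(right)
  if (is.null(right)) return(left)

  # Text fragments are concatenated as they stream in.
  if (is.character(left) && is.character(right)) {
    return(paste0(left, right))
  }

  # Everything else is assumed to be a named list and is merged field by field.
  stopifnot(is.list(left))
  for (name in names(right)) {
    left[[name]] <- merge_chunks(left[[name]], right[[name]])
  }
  left
}

# Ordinary text chunks merge without trouble.
ok <- merge_chunks(list(text = "Hello, "), list(text = "world"))

# An invented chunk pair where the same field changes shape between chunks:
# the left value is neither text nor a list, so the assertion fails.
bad <- tryCatch(
  merge_chunks(list(citations = 1L), list(citations = list(source = "a"))),
  error = function(e) conditionMessage(e)
)
```

Because a failure like this depends on whether a particular chunk contains the unexpected field, it would be intermittent, which is consistent with the reports above that the same prompt sometimes succeeds.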

hadley added this to the 0.1.2 milestone Mar 12, 2025

hadley changed the title ~~create_tool_def Error with tidycensus::get_pums: "Error in merge_func(left, right, path) : is.list(left) is not TRUE"~~ Stream merging fails with gemini Mar 13, 2025

hadley added a commit that referenced this issue Mar 27, 2025

Correctly implement citation merging for gemini

5d09ba3

Fixes #358

hadley linked a pull request Mar 27, 2025 that will close this issue

Correctly implement citation merging for gemini #391

Open

hadley added the bug (an unexpected problem or unintended behavior) label Mar 27, 2025
