Stream merging fails with gemini #358
Comments
Can you please provide a minimal reprex (reproducible example)? The goal of a reprex is to make it as easy as possible for me to recreate your problem so that I can fix it: please help me help you! If you've never heard of a reprex before, start by reading about the reprex package, including the advice further down the page. Please make sure your reprex is created with the reprex package as it gives nicely formatted output and avoids a number of common pitfalls. |
I've seen the same error |
@walkerke to be clear, I don't think this is a tidycensus problem, this is some bug in an edge case of our stream merging code. But it's impossible to fix without a reprex 😞 |
Let me know if this covers what you need.
library(tidycensus)
library(ellmer)
chat <- chat_gemini("gemini-2.0-flash")
#> Using model = "gemini-2.0-flash".
create_tool_def(get_pums,chat)
#> [1] "```R\ntool(\n get_pums,\n \"Load data from the American Community Survey Public Use Microdata Series API\",\n variables = type_array(\n \"A vector of variables from the PUMS API. Use `View(pums_variables)` to browse variable options.\",\n items = type_string()\n ),\n state = type_string(\n \"A state, or vector of states, for which you would like to request data. The entire US can be requested with `state = \\\"all\\\"` - though be patient with the data download!\"\n ),\n puma = type_array( # TODO: Can be a vector or a named vector, need to express this better\n \"A vector of PUMAs from a single state, for which you would like to request data. To get data from PUMAs in more than one state, specify a named vector of state/PUMA pairs and set `state = \\\"multiple\\\"`.\",\n items = type_string()\n ),\n year = type_integer(\n \"The data year of the 1-year ACS sample or the endyear of the 5-year sample. Defaults to 2023.\",\n required = FALSE\n ),\n survey = type_string(\n \"The ACS survey; one of either '\\\"acs1\\\"' or '\\\"acs5\\\"' (the default). Defaults to `\\\"acs5\\\"`.\",\n required = FALSE\n ),\n variables_filter = type_object( # TODO: Need to express the named list format with particular types in the filter\n \"A named list of filters you'd like to return from the PUMS API. For example, passing `list(AGE = 25:50, SEX = 1)` will return only males aged 25 to 50 in your output dataset. Defaults to `NULL`, which returns all records. 
If a housing-only dataset is required, use `list(SPORDER = 1)` to only return householder records (taking care in your analysis to use the household weight `WGTP`).\",\n required = FALSE\n ),\n rep_weights = type_string( # TODO: Should be an enum of \"person\", \"housing\", or \"both\"\n \"Whether or not to return housing unit, person, or both housing and person-level replicate weights for calculation of standard errors; one of '\\\"person\\\"', '\\\"housing\\\"', or '\\\"both\\\"'.\",\n required = FALSE\n ),\n recode = type_boolean(\n \"If TRUE, recodes variable values using Census data dictionary and creates a new '*_label' column for each variable that is recoded. Available for 2017 - 2022 data. Defaults to FALSE.\",\n required = FALSE\n ),\n return_vacant = type_boolean(\n \"If TRUE, makes a separate request to the Census API to retrieve microdata for vacant housing units, which are handled differently in the API as they do not have person-level characteristics. All person-level columns in the returned dataset will be populated with NA for vacant housing units. Defaults to FALSE.\",\n required = FALSE\n ),\n show_call = type_boolean(\n \"If TRUE, display call made to Census API. This can be very useful in debugging and determining if error messages returned are due to tidycensus or the Census API. Copy to the API call into a browser and see what is returned by the API directly. Defaults to FALSE.\",\n required = FALSE\n ),\n key = type_string(\n \"Your Census API key.\"\n )\n)\n```" |
@mrembert I don't see an error there? |
Can you try it with |
Sure. When I include echo, I now get the error in the reprex:
library(tidycensus)
library(ellmer)
chat <- chat_gemini("gemini-2.0-flash")
#> Using model = "gemini-2.0-flash".
create_tool_def(get_pums,chat, echo = TRUE)
#>
#> tool(
#> get_pums,
#> "Load data from the American Community Survey Public Use Microdata Series
#> API",
#> variables = type_array(
#> "A vector of variables from the PUMS API.",
#> required = FALSE
#> ),
#> state = type_string(
#> "A state, or vector of states, for which you would like to request data.
#> The entire US can be requested with 'state = \"all\"'.",
#> required = FALSE
#> ),
#> puma = type_array(
#> "A vector of PUMAs from a single state, for which you would like to request
#> data. To get data from PUMAs in more than one
#> Error in merge_func(left, right, path): is.list(left) is not TRUE |
I was able to get a reprex as well. I'm seeing it with Gemini Pro 2.0 Experimental when long outputs are requested:
library(ellmer)
chat <- chat_gemini(model = "gemini-2.0-pro-exp-02-05")
chat$chat("Write me a 10-paragraph essay on the history of the tidyverse.")
#> ## The Rise of the Tidyverse: A Revolution in R Data Wrangling
#>
#> The R programming language, born in the 1990s, quickly gained traction among
#> statisticians and data analysts for its powerful statistical capabilities and
#> open-source nature. However, early R code could often be complex and
#> inconsistent, particularly when it came to data manipulation. Different
#> packages offered overlapping functionalities with varying syntax, leading to a
#> steep learning curve and a lack of code readability. This fragmented landscape
#> presented a significant barrier to efficient data analysis, particularly for
#> those new to the language.
#>
#> Enter Hadley Wickham, a statistician and R developer who recognized the need
#> for a more cohesive and user-friendly approach to data wrangling. Inspired by
#> the principles of "tidy data," where each variable forms a column, each
#> observation forms a row, and each type of observational unit forms a table,
#> Wickham began developing a collection of packages designed to work together
#> seamlessly. These packages,
#> Error in merge_func(left, right, path): is.list(left) is not TRUE
Created on 2025-03-25 with reprex v2.1.0
Oddly, though, it does not always fail; sometimes it ends up working. |
@walkerke that's super helpful, thanks! I just got back from vacation and I'm catching up on meetings this week, but will hopefully fix it next week. |
thanks @hadley! |
Looks like the problem was with citations, which Gemini sometimes provides. I should have a fix momentarily. |
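For readers wondering how a message like `is.list(left) is not TRUE` can come out of a stream merge, here is a minimal sketch. It is an assumption-laden illustration, not ellmer's actual code: `merge_chunks()` and the `extra` field are hypothetical names, and the real shape of Gemini's citation data is not shown in this thread. The idea is only that a recursive merge which expects a list on both sides will stop as soon as one chunk carries a nested field where an earlier chunk had a plain value.

```r
# Hypothetical sketch: NOT ellmer's actual merge code or Gemini's real payload.
# It recursively merges two streamed chunks represented as nested lists.
merge_chunks <- function(left, right) {
  # A field missing on one side is taken from the other side.
  if (is.null(left)) return(right)
  if (is.null(right)) return(left)

  # Text fragments are concatenated as the stream progresses.
  if (is.character(left) && is.character(right)) {
    return(paste0(left, right))
  }

  # Anything else is assumed to be a named list on both sides.
  stopifnot(is.list(left), is.list(right))
  for (name in names(right)) {
    left[[name]] <- merge_chunks(left[[name]], right[[name]])
  }
  left
}

# Ordinary text chunks merge without trouble.
ok <- merge_chunks(
  list(text = "These packages, "),
  list(text = "including dplyr")
)

# Assumed edge case: a later chunk carries a nested field (standing in for
# citation metadata) where the earlier chunk had a plain string.
bad <- tryCatch(
  merge_chunks(list(extra = "n/a"), list(extra = list(source = "example"))),
  error = function(e) conditionMessage(e)
)
```

In this sketch `ok$text` is the concatenated text, while `bad` captures the failed `is.list(left)` check. That would also be consistent with the intermittent failures reported above, since the error would only appear on responses that happen to include the extra field.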
I encountered an error when attempting to create a tool definition for the get_pums function from the tidycensus package using the create_tool_def function from ellmer.
Reproduction steps:
Load the tidycensus and ellmer libraries:
Create a chat_gemini object:
Attempt to create a tool definition for get_pums:
Expected behavior:
I expected create_tool_def to successfully generate a tool definition for the get_pums function, allowing it to be used within the ellmer framework.
Actual behavior:
Instead of creating the tool definition, the code produced the following response/error:
Additional Information: