Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL temp function being recognized as update query, returning no result #2359

Open
AndhikaWB opened this issue Aug 4, 2024 · 1 comment
Open
Labels
feature Feature requests SQL engine

Comments

@AndhikaWB
Copy link

Some SQL engines such as BigQuery provide a way to define temporary user function using the CREATE TEMP FUNCTION syntax. It's similar as create temporary table with the WITH table_name AS syntax but for function

Here I'm replicating WIDTH_BUCKET function from PostgreSQL which isn't available natively in BigQuery:

{sql}

CREATE TEMP FUNCTION WIDTH_BUCKET(
  value NUMERIC,
  min_value NUMERIC,
  max_value NUMERIC,
  num_buckets NUMERIC
) AS (
  RANGE_BUCKET(
    value,
    GENERATE_ARRAY(
      min_value,
      max_value,
      (max_value - min_value) / num_buckets
    )
  )
);

SELECT
  x,
  WIDTH_BUCKET(x, 0, MAX(x) OVER(), 4) as rank_equ_range,
  NTILE(4) OVER(ORDER BY x) AS rank_equ_member
FROM
  UNNEST([1, 1, 2, 5, 6, 7, 8, 9, 48, 100]) AS x

However, it's being executed with dbExecute because it match the CREATE keyword here

grepl('^\\s*(INSERT|UPDATE|DELETE|CREATE|DROP|ALTER).*', query, ignore.case = TRUE)

DBI::dbExecute(conn, query)

Hence returning no result if I use SQL code chunk directly

Currently, I'm using this workaround to execute it directly in R

{r}

library(DBI)
library(bigrquery)

cred_file <- paste0(
  'docker/root/.config/gcloud/legacy_credentials/',
  Sys.getenv('USER_EMAIL'), '/adc.json'
)

Sys.setenv(GOOGLE_APPLICATION_CREDENTIALS = cred_file)
bq_auth()

con <- dbConnect(
  bigquery(), 
  dataset = 'sales_data', 
  project = 'mumu-431300', 
  use_legacy_sql = FALSE
)

query = 'CREATE TEMP FUNCTION WIDTH_BUCKET(
  value NUMERIC,
  -- min_value is inclusive
  min_value NUMERIC,
  -- max_value is exclusive
  max_value NUMERIC,
  num_buckets NUMERIC
) AS (
  RANGE_BUCKET(
    value,
    GENERATE_ARRAY(
      min_value,
      max_value,
      (max_value - min_value) / num_buckets
    )
  )
);

SELECT
  x,
  WIDTH_BUCKET(x, 0, MAX(x) OVER(), 4) as rank_equ_range,
  NTILE(4) OVER(ORDER BY x) AS rank_equ_member
FROM
  UNNEST([1, 1, 2, 5, 6, 7, 8, 9, 48, 100]) AS x'

dbGetQuery(con, query)

Which has an actual output instead of empty like dbExecute

# A tibble: 10 × 3
       x rank_equ_range rank_equ_member
   <int>          <int>           <int>
 1     1              1               1
 2     1              1               1
 3     2              1               1
 4     5              1               2
 5     6              1               2
 6     7              1               2
 7     8              1               3
 8     9              1               3
 9    48              2               4
10   100              5               4

The possible fix is to exclude the CREATE FUNCTION or CREATE TEMP FUNCTION keyword from the matcher, something like this (I'm bad at regex) CREATE[\s]+(TEMP[\s]+FUNCTION|FUNCTION) but using exclude rule

Or perhaps making a new cell option to force treat it as non-update query, which I'm not sure how to do it

@AndhikaWB
Copy link
Author

After a few hours of tinkering, I finally created a working hook to force show the result using this code:

``` {r}
#| cache: false

knitr::knit_hooks$set(force_result = function(before, options) {
  if (before & options$force_result) {
    knitr::opts_hooks$set(sql = options$code)
    knitr::opts_hooks$set(code = c(''))
  } else if (options$force_result) {
    sql <- paste(knitr::opts_hooks$get('sql'), collapse = '\n')
    result <- dbGetQuery(options$connection, sql)
    paste(knitr::kable(result), collapse = '\n')
  }
})
```

Then call my previous SQL query using the new cell option

``` {sql}
#| force_result: true

CREATE TEMP FUNCTION WIDTH_BUCKET(
  value NUMERIC,
  -- min_value is inclusive
  min_value NUMERIC,
  -- max_value is exclusive
  max_value NUMERIC,
  num_buckets NUMERIC
) AS (
  RANGE_BUCKET(
    value,
    GENERATE_ARRAY(
      min_value,
      max_value,
      (max_value - min_value) / num_buckets
    )
  )
);

SELECT
  x,
  WIDTH_BUCKET(x, 0, MAX(x) OVER(), 4) as rank_equ_range,
  NTILE(4) OVER(ORDER BY x) AS rank_equ_member
FROM
  UNNEST([1, 1, 2, 5, 6, 7, 8, 9, 48, 100]) AS x
```

And the output will now appear

But still, a native solution would be great

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Feature requests SQL engine
Projects
None yet
Development

No branches or pull requests

2 participants