Run soda-core locally against a (local) delta table #2207

Open
sherl0ck- opened this issue Feb 2, 2025 · 1 comment

Comments

sherl0ck- commented Feb 2, 2025

Hi,

I'm trying to run soda-core checks against a locally stored Delta table, using a Spark DataFrame (spark_df) data source.

from soda.scan import Scan

# `spark` is an existing SparkSession that can read Delta tables (setup not shown)
df = spark.read.format("delta").load("employees_delta")
df.createOrReplaceTempView("employees_delta")
# I can confirm that the temp view exists and can be queried, e.g. using spark.sql()

scan = Scan()
scan.set_data_source_name("my_spark")
scan.add_spark_session(spark, data_source_name="my_spark")
scan.add_configuration_yaml_file(file_path="/Users/jjovan/data_eng_spark/work/soda.yml")
scan.add_sodacl_yaml_file("/Users/jjovan/data_eng_spark/work/checks.yml")
# Note: I tried different orderings here, to no avail

scan.execute()

The scan fails with the following exception:

Query execution error in 40.my_spark.employees_delta.aggregation[0]: 'NoneType' object has no attribute 'sql'
SELECT 
  COUNT(*) 
FROM employees_delta
  | 'NoneType' object has no attribute 'sql'
  | Stacktrace:
  | Traceback (most recent call last):
  |   File "/Users/jjovan/anaconda3/lib/python3.10/site-packages/soda/execution/query/query.py", line 145, in _execute_cursor
  |     cursor.execute(self.sql)
  |   File "/Users/jjovan/anaconda3/lib/python3.10/site-packages/soda/data_sources/spark_df_cursor.py", line 16, in execute
  |     self.df = self.spark_session.sql(sqlQuery=sql)
  | AttributeError: 'NoneType' object has no attribute 'sql'

[00:16:26] Metrics 'row_count' were not computed for check 'row_count > 0'
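
Reading the traceback, the cursor's spark_session appears to be None when the query runs, which would mean the session passed to add_spark_session never reaches the cursor. A minimal plain-Python sketch of that failure mode follows. FakeSparkDfCursor and FakeSession are hypothetical stand-ins made up for illustration; they are not Soda's actual classes and only mirror the one line shown in the traceback.

```python
class FakeSession:
    """Hypothetical minimal session exposing only a .sql() method."""

    def sql(self, sqlQuery):
        return f"result of: {sqlQuery}"


class FakeSparkDfCursor:
    """Hypothetical stand-in for a cursor that wraps a Spark session."""

    def __init__(self, spark_session=None):
        # If no session is handed over, this stays None.
        self.spark_session = spark_session
        self.df = None

    def execute(self, sql):
        # Same shape as the failing line in the traceback.
        self.df = self.spark_session.sql(sqlQuery=sql)


def try_execute(cursor, sql):
    """Run the query and return either the result or the error message."""
    try:
        cursor.execute(sql)
        return cursor.df
    except AttributeError as exc:
        return str(exc)


QUERY = "SELECT COUNT(*) FROM employees_delta"
without_session = try_execute(FakeSparkDfCursor(), QUERY)
with_session = try_execute(FakeSparkDfCursor(FakeSession()), QUERY)
```

Here without_session holds the same AttributeError message as in the log above, while with_session holds a result. This only illustrates the symptom; it does not show where the session gets lost inside soda-core.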

My soda.yml configuration:

data_source my_spark:
  type: spark_df

My checks.yml:

checks for employees_delta:
  - row_count > 0

Versions, as reported by pip list | grep soda:

soda-core                              3.4.4
soda-core-spark                        3.4.4
soda-core-spark-df                     3.4.4
@tools-soda

CLOUD-9157
