
Patch/abhi tablename: Adding import query type property and modifying tableName property for Redshift and Postgres plugins #607

Open · wants to merge 3 commits into base: develop
4 changes: 4 additions & 0 deletions amazon-redshift-plugin/docs/Redshift-batchsource.md
@@ -31,6 +31,10 @@ contain the '$CONDITIONS' string. For example, 'SELECT * FROM table WHERE $CONDITIONS'
The '$CONDITIONS' string will be replaced by 'splitBy' field limits specified by the bounding query.
The '$CONDITIONS' string is not required if numSplits is set to one.

**Import Query Type:** Determines how data is extracted, either by using a Table Name or a custom Import Query.

**Table Name:** Extracts data directly from a specified database table.
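For example, if Import Query Type is set to 'Table Name' and Table Name is set to 'employees' (an illustrative name), the plugin reads the whole 'employees' table, which is typically equivalent to the import query 'SELECT * FROM employees WHERE $CONDITIONS'.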

**Bounding Query:** Bounding Query should return the min and max of the values of the 'splitBy' field.
For example, 'SELECT MIN(id),MAX(id) FROM table'. Not required if numSplits is set to one.

@@ -111,6 +111,8 @@ protected void setConnectorSpec(ConnectorSpecRequest request, DBConnectorPath path)
}
sourceProperties.put(RedshiftSource.RedshiftSourceConfig.IMPORT_QUERY,
getTableQuery(path.getDatabase(), schema, table));
sourceProperties.put(RedshiftSource.RedshiftSourceConfig.PROPERTY_IMPORT_QUERY_TYPE,
RedshiftSource.RedshiftSourceConfig.IMPORT_QUERY);
sourceProperties.put(Constants.Reference.REFERENCE_NAME, ReferenceNames.cleanseReferenceName(table));
}

@@ -22,7 +22,6 @@
import io.cdap.plugin.db.CommonSchemaReader;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
@@ -56,34 +55,12 @@ public RedshiftSchemaReader(String sessionID) {
public Schema getSchema(ResultSetMetaData metadata, int index) throws SQLException {
String typeName = metadata.getColumnTypeName(index);
int columnType = metadata.getColumnType(index);
int precision = metadata.getPrecision(index);
String columnName = metadata.getColumnName(index);
int scale = metadata.getScale(index);
boolean isSigned = metadata.isSigned(index);

if (STRING_MAPPED_REDSHIFT_TYPES_NAMES.contains(typeName)) {
return Schema.of(Schema.Type.STRING);
}
if (typeName.equalsIgnoreCase("INT")) {
return Schema.of(Schema.Type.INT);
}
if (typeName.equalsIgnoreCase("BIGINT")) {
return Schema.of(Schema.Type.LONG);
}

// If it is a numeric type without precision then use the Schema of String to avoid any precision loss
if (Types.NUMERIC == columnType) {
int precision = metadata.getPrecision(index);
if (precision == 0) {
LOG.warn(String.format("Field '%s' is a %s type without precision and scale, "
+ "converting into STRING type to avoid any precision loss.",
metadata.getColumnName(index),
metadata.getColumnTypeName(index)));
return Schema.of(Schema.Type.STRING);
}
}

if (typeName.equalsIgnoreCase("timestamp")) {
return Schema.of(Schema.LogicalType.DATETIME);
}

return super.getSchema(metadata, index);
return getSchema(typeName, columnType, precision, scale, columnName, isSigned, true);
}

@Override
@@ -114,4 +91,45 @@ public List<Schema.Field> getSchemaFields(ResultSet resultSet) throws SQLException
return schemaFields;
}

/**
* Returns the CDAP {@link Schema} for a database column based on JDBC metadata.
* Handles Redshift-specific and common JDBC types:
* Maps Redshift string types to {@link Schema.Type#STRING}.
* Maps "INT" to {@link Schema.Type#INT}.
* Maps "BIGINT" to {@link Schema.Type#LONG}.
* Maps NUMERIC with zero precision to {@link Schema.Type#STRING} and logs a warning.
* Maps "timestamp" to {@link Schema.LogicalType#DATETIME}.
* Delegates to the parent plugin for all other types.
* @param typeName SQL type name (e.g. "INT", "BIGINT", "timestamp")
* @param columnType JDBC type code (see {@link java.sql.Types})
* @param precision column precision (for numeric types)
* @param scale column scale (for numeric types)
* @param columnName column name
* @param isSigned whether the column is signed
* @param handleAsDecimal whether to handle as decimal
* @return the mapped {@link Schema} type
*/
Reviewer comment (Contributor): It is there in the parent class, you can use that one; instead, you can call the getSchema method of this class inside it.

Reviewer comment (@vikasrathee-cs, Jun 19, 2025): No update on this; also, the Javadoc is not updated.

Reviewer comment (Contributor): Update the Javadoc.

@Override
public Schema getSchema(String typeName, int columnType, int precision, int scale, String columnName,
boolean isSigned, boolean handleAsDecimal) {
if (STRING_MAPPED_REDSHIFT_TYPES_NAMES.contains(typeName)) {
Reviewer comment (Contributor): This code is getting repeated from the public Schema getSchema(ResultSetMetaData metadata, int index) method; extract that common code, and do the same in Postgres.
return Schema.of(Schema.Type.STRING);
}
if ("INT".equalsIgnoreCase(typeName)) {
return Schema.of(Schema.Type.INT);
}
if ("BIGINT".equalsIgnoreCase(typeName)) {
return Schema.of(Schema.Type.LONG);
}
if (Types.NUMERIC == columnType && precision == 0) {
LOG.warn(String.format("Field '%s' is a %s type without precision and scale," +
" converting into STRING type to avoid any precision loss.",
columnName, typeName));
return Schema.of(Schema.Type.STRING);
}
if ("timestamp".equalsIgnoreCase(typeName)) {
return Schema.of(Schema.LogicalType.DATETIME);
}
return super.getSchema(typeName, columnType, precision, scale, columnName, isSigned, handleAsDecimal);
}
}
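For illustration, a minimal sketch of how the new getSchema overload maps a few representative columns. The column names and metadata values below are hypothetical, and the sketch assumes it is compiled alongside RedshiftSchemaReader (same package or an appropriate import):

import io.cdap.cdap.api.data.schema.Schema;

import java.sql.Types;

public class RedshiftSchemaMappingSketch {
  public static void main(String[] args) {
    RedshiftSchemaReader reader = new RedshiftSchemaReader();
    // "BIGINT" is mapped to Schema.Type.LONG.
    Schema idSchema = reader.getSchema("BIGINT", Types.BIGINT, 19, 0, "id", true, true);
    // NUMERIC with zero precision falls back to STRING to avoid precision loss (a warning is logged).
    Schema amountSchema = reader.getSchema("numeric", Types.NUMERIC, 0, 0, "amount", true, true);
    // "timestamp" is mapped to the DATETIME logical type.
    Schema createdAtSchema = reader.getSchema("timestamp", Types.TIMESTAMP, 29, 6, "created_at", true, true);
    System.out.println(idSchema + ", " + amountSchema + ", " + createdAtSchema);
  }
}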
@@ -17,13 +17,16 @@
package io.cdap.plugin.amazon.redshift;

import com.google.common.annotations.VisibleForTesting;
import com.google.common.base.Strings;
import io.cdap.cdap.api.annotation.Description;
import io.cdap.cdap.api.annotation.Macro;
import io.cdap.cdap.api.annotation.Metadata;
import io.cdap.cdap.api.annotation.MetadataProperty;
import io.cdap.cdap.api.annotation.Name;
import io.cdap.cdap.api.annotation.Plugin;
import io.cdap.cdap.etl.api.FailureCollector;
import io.cdap.cdap.etl.api.PipelineConfigurer;
import io.cdap.cdap.etl.api.StageConfigurer;
import io.cdap.cdap.etl.api.batch.BatchSource;
import io.cdap.cdap.etl.api.batch.BatchSourceContext;
import io.cdap.cdap.etl.api.connector.Connector;
@@ -34,12 +37,17 @@
import io.cdap.plugin.db.config.AbstractDBSpecificSourceConfig;
import io.cdap.plugin.db.source.AbstractDBSource;
import io.cdap.plugin.util.DBUtils;
import io.cdap.plugin.util.ImportQueryType;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;

import java.util.Collections;
import java.util.Map;
import javax.annotation.Nullable;

import static io.cdap.plugin.db.config.AbstractDBSpecificSourceConfig.IMPORT_QUERY;
import static io.cdap.plugin.db.config.AbstractDBSpecificSourceConfig.PROPERTY_IMPORT_QUERY_TYPE;
import static io.cdap.plugin.db.config.AbstractDBSpecificSourceConfig.TABLE_NAME;

/**
* Batch source to read from an Amazon Redshift database.
*/
@@ -59,6 +67,30 @@ public RedshiftSource(RedshiftSourceConfig redshiftSourceConfig) {
this.redshiftSourceConfig = redshiftSourceConfig;
}

@Override
public void configurePipeline(PipelineConfigurer pipelineConfigurer) {
FailureCollector collector = pipelineConfigurer.getStageConfigurer().getFailureCollector();
StageConfigurer stageConfigurer = pipelineConfigurer.getStageConfigurer();
if (sourceConfig.containsMacro(TABLE_NAME) || sourceConfig.containsMacro(IMPORT_QUERY)) {
if (sourceConfig.getSchema() != null) {
stageConfigurer.setOutputSchema(sourceConfig.getSchema());
}
return;
}
validateTableNameAndImportQuery(collector);
super.configurePipeline(pipelineConfigurer);
}

@Override
public void prepareRun(BatchSourceContext context) throws Exception {
FailureCollector collector = context.getFailureCollector();
Reviewer comment (Contributor): Move this common logic of configurePipeline and prepareRun to its parent class as a separate method and call the same from Postgres also (a possible shape is sketched after this method).
if (sourceConfig.containsMacro(TABLE_NAME) || sourceConfig.containsMacro(IMPORT_QUERY)) {
return;
}
validateTableNameAndImportQuery(collector);
super.prepareRun(context);
}
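One possible shape for that refactor, sketched as a hypothetical protected helper on the shared parent source class. The method name, placement, and the assumption that sourceConfig, the TABLE_NAME/IMPORT_QUERY constants, and validateTableNameAndImportQuery are visible there are mine, not part of this PR:

// Hypothetical helper in the common parent class (sketch only); both configurePipeline
// and prepareRun could call it instead of duplicating the macro check.
protected boolean validateTableNameAndImportQueryIfNoMacros(FailureCollector collector) {
  if (sourceConfig.containsMacro(TABLE_NAME) || sourceConfig.containsMacro(IMPORT_QUERY)) {
    // Defer validation until macros are resolved at runtime.
    return false;
  }
  validateTableNameAndImportQuery(collector);
  return true;
}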

@Override
protected SchemaReader getSchemaReader() {
return new RedshiftSchemaReader();
48 changes: 48 additions & 0 deletions amazon-redshift-plugin/widgets/Redshift-batchsource.json
@@ -108,6 +108,30 @@
{
"label": "SQL Query",
"properties": [
{
"widget-type": "radio-group",
"label": "Import Query Type",
"name": "importQueryType",
"widget-attributes": {
"layout": "inline",
"default": "nativeQuery",
"options": [
{
"id": "nativeQuery",
"label": "Native Query"
},
{
"id": "namedTable",
"label": "Named Table"
}
]
}
},
{
"widget-type": "textbox",
"label": "Table Name",
"name": "tableName"
},
{
"widget-type": "textarea",
"label": "Import Query",
@@ -229,6 +253,30 @@
}
]
},
{
"name": "ImportQuery",
"condition": {
"expression": "importQueryType != 'tableName'"
},
"show": [
{
"type": "property",
"name": "importQuery"
}
]
},
{
"name": "NativeTableName",
"condition": {
"expression": "importQueryType == 'tableName'"
},
"show": [
{
"type": "property",
"name": "tableName"
}
]
}
],
"jump-config": {
"datasets": [
@@ -192,7 +192,23 @@ Feature: CloudMySql source- Verify CloudMySql source plugin design time validation
| connectionName |
| database |
| referenceName |
| importQuery |

@CloudMySql_Required
Scenario: To verify CloudSQLMySQL source plugin validation error message with blank import query
Given Open Datafusion Project to configure pipeline
When Expand Plugin group in the LHS plugins list: "Source"
When Select plugin: "CloudSQL MySQL" from the plugins list as: "Source"
Then Navigate to the properties page of plugin: "CloudSQL MySQL"
Then Select dropdown plugin property: "select-jdbcPluginName" with option value: "driverName"
Then Select radio button plugin property: "instanceType" with value: "public"
Then Replace input plugin property: "connectionName" with value: "connectionName" for Credentials and Authorization related fields
Then Replace input plugin property: "user" with value: "username" for Credentials and Authorization related fields
Then Replace input plugin property: "password" with value: "password" for Credentials and Authorization related fields
Then Enter input plugin property: "referenceName" with value: "sourceRef"
Then Replace input plugin property: "database" with value: "DatabaseName"
Then Click on the Validate button
Then Verify that the Plugin Property: "importQuery" is displaying an in-line error message: "errorMessageImportQuery"


@CloudMySql_Required
Scenario: To verify CloudSQLMySQL source plugin validation error message with invalid connection name with public instance
@@ -24,3 +24,4 @@ errorLogsMessageInvalidBoundingQuery=Spark program 'phase-1' failed with error:
errorMessageInvalidPassword=SQL error while getting query schema: Error: Access denied for user
errorMessagePrivateConnectionName=Enter the internal IP address of the Compute Engine VM cloudsql proxy is running on, to connect to a private
errorMessageWithBlankPassword=Exception while trying to validate schema of database table
errorMessageImportQuery=Import Query cannot be null. Please specify the Import Query.
24 changes: 24 additions & 0 deletions cloudsql-mysql-plugin/widgets/CloudSQLMySQL-batchsource.json
@@ -127,6 +127,30 @@
{
"label": "CloudSQL Properties",
"properties": [
{
"widget-type": "hidden",
"label": "Import Query Type",
"name": "importQueryType",
"widget-attributes": {
"layout": "inline",
"default": "nativeQuery",
"options": [
{
"id": "nativeQuery",
"label": "Native Query"
},
{
"id": "namedTable",
"label": "Named Table"
}
]
}
},
{
"widget-type": "hidden",
"label": "Table Name",
"name": "tableName"
},
{
"widget-type": "textarea",
"label": "Import Query",
@@ -127,6 +127,30 @@
{
"label": "CloudSQL Properties",
"properties": [
{
"widget-type": "hidden",
"label": "Import Query Type",
"name": "importQueryType",
"widget-attributes": {
"layout": "inline",
"default": "nativeQuery",
"options": [
{
"id": "nativeQuery",
"label": "Native Query"
},
{
"id": "namedTable",
"label": "Named Table"
}
]
}
},
{
"widget-type": "hidden",
"label": "Table Name",
"name": "tableName"
},
{
"widget-type": "textarea",
"label": "Import Query",