All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog and this project adheres to Semantic Versioning.
- Updating all `APIARY_EXTENSIONS` modules to `8.1.15` (was `8.1.13`). Improved glue listener.
- Updating all `APIARY_EXTENSIONS` modules to `8.1.13` (was `8.1.12`). Improved glue listener.
- Updating all `APIARY_EXTENSIONS` modules to `8.1.12` (was `8.1.11`). Improved glue listener.
- Updating all `APIARY_EXTENSIONS` modules to `8.1.11` (was `8.1.7`). Improved glue listener.
- Fix Maven download repository and upgrade Maven to `3.9.11` (was `3.9.4`).
- Updating all `APIARY_EXTENSIONS` modules to `8.1.7` (was `8.1.4`). Updated kafka-clients to latest version.
- Updating all `APIARY_EXTENSIONS` modules to `8.1.4` (was `8.1.3`). Improved RENAME support in Glue listener.
- Updating all `APIARY_EXTENSIONS` modules to `8.1.3` (was `8.1.2`). Added RENAME support in Glue listener.
- Updating `APIARY_EXTENSIONS_VERSION` to `8.1.2` (was `8.1.0`). Fixes in apiary extensions multiple listener functionality.
- Updating `APIARY_EXTENSIONS_VERSION` to `8.1.0` (was `8.0.2`). Supports MSK cluster.
- Option to start gluesync listener without initializing glue databases.
- Upgrade Apiary extensions to 8.0.2 (was 7.3.9). (Glue Listener fix)
- Upgrade yum repos from EMR-5.36.2 (latest EMR 5 version)
- Upgrade HMS to 2.3.9 (was 2.3.7)
- Added `datanucleus.connectionPoolingType` to hive-site.xml, defaults: `BoneCP`
- Added `DATANUCLEUS_CONNECTION_POOLING_TYPE` to support changing the database connection pooling. Valid options are `BoneCP`, `DBCP`, `DBCP2`, `C3P0`, `HikariCP`.
- Added `DATANUCLEUS_CONNECTION_POOL_MAX_POOLSIZE` - Maximum pool size for the connection pool.
- Added `DATANUCLEUS_CONNECTION_POOL_MIN_POOLSIZE` - Minimum pool size for the connection pool.
- Added `DATANUCLEUS_CONNECTION_POOL_INITIAL_POOLSIZE` - Initial pool size for the connection pool (C3P0 only).
- Added `DATANUCLEUS_CONNECTION_POOL_MAX_IDLE` - Maximum idle connections for the connection pool.
- Added `DATANUCLEUS_CONNECTION_POOL_MIN_IDLE` - Minimum idle connections for the connection pool.
- Added `DATANUCLEUS_CONNECTION_POOL_MIN_ACTIVE` - Maximum active connections for the connection pool (DBCP/DBCP2 only).
- Added `DATANUCLEUS_CONNECTION_POOL_MAX_WAIT` - Maximum wait time for the connection pool (DBCP/DBCP2 only).
- Added `DATANUCLEUS_CONNECTION_POOL_VALIDATION_TIMEOUT` - Validation timeout for the connection pool (DBCP/DBCP2/HikariCP only).
- Added `DATANUCLEUS_CONNECTION_POOL_LEAK_DETECTION_THRESHOLD` - Leak detection threshold for the connection pool (HikariCP only).
- Added `DATANUCLEUS_CONNECTION_POOL_LEAK_MAX_LIFETIME` - Maximum lifetime for the connection pool (HikariCP only).
- Added `DATANUCLEUS_CONNECTION_POOL_AUTO_COMMIT` - Auto commit for the connection pool (HikariCP only).
- Added `DATANUCLEUS_CONNECTION_POOL_IDLE_TIMEOUT` - Idle timeout for the connection pool (HikariCP only).
- Added `DATANUCLEUS_CONNECTION_POOL_CONNECTION_WAIT_TIMEOUT` - Connection wait timeout for the connection pool (HikariCP only).
- Added `DATANUCLEUS_CONNECTION_POOL_READ_ONLY` - Read only mode for the connection pool (HikariCP only).
- Added `DATANUCLEUS_CONNECTION_POOL_NAME` - Connection pool name (HikariCP only).
- Added `DATANUCLEUS_CONNECTION_POOL_CATALOG` - Connection pool catalog (HikariCP only).
- Added `DATANUCLEUS_CONNECTION_POOL_REGISTER_MBEANS` - Register MBeans for the connection pool (HikariCP only).
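
As an illustration of how a deployment script might check `DATANUCLEUS_CONNECTION_POOLING_TYPE` before starting the container, here is a minimal Python sketch. The helper name `resolve_pooling_type` is hypothetical and is not part of this image; only the variable name, the list of valid options, and the `BoneCP` default are taken from the entries above.

```python
import os

# Valid values and default, as listed in the changelog entries above.
VALID_POOLING_TYPES = ("BoneCP", "DBCP", "DBCP2", "C3P0", "HikariCP")
DEFAULT_POOLING_TYPE = "BoneCP"


def resolve_pooling_type(env=None):
    """Return the configured pooling type, or the default when unset.

    Hypothetical helper for illustration only. Raises ValueError when the
    variable is set to a value outside the documented options.
    """
    env = os.environ if env is None else env
    value = env.get("DATANUCLEUS_CONNECTION_POOLING_TYPE", "").strip()
    if not value:
        return DEFAULT_POOLING_TYPE
    if value not in VALID_POOLING_TYPES:
        raise ValueError(
            f"Unsupported pooling type {value!r}; expected one of {VALID_POOLING_TYPES}"
        )
    return value
```

Passing a plain dictionary instead of `os.environ` makes it possible to check a planned configuration without touching the real environment.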
- Added `MYSQL_DRIVER_JAR` to add the driver connector JAR to the system classpath. By default it is now using `/usr/share/java/mysql-connector-java.jar`.
- Switch from mariadb driver to default mysql driver. (Override settings to keep using mariadb driver).
- Added `MYSQL_CONNECTION_DRIVER_NAME` to support using a different connection driver, defaults: `com.mysql.jdbc.Driver`.
- Added `MYSQL_TYPE` to support using a different type of MySQL, defaults: `mysql`.
- Added `mysql-connector-java` to support using the driver `com.mysql.jdbc.Driver`.
- Upgraded `APIARY_EXTENSIONS_VERSION` to `7.3.9` (was `7.3.8`).
- Upgraded `APIARY_GLUESYNC_LISTENER_VERSION` to `7.3.9` (was `7.3.8`).
- Enables JMX (Java Management Extensions) on Hadoop clients, allowing for remote monitoring and management of JVM-related metrics
- CloudWatch metrics in favour of JMX Prometheus Exporter.
- Enable prometheus jmx agent when running on ECS by exporting `EXPORTER_OPTS`
- Added snapshot.yaml for pushing docker image from feature branch.
- Safeguard AWS account id call to prevent incorrect DB locations.
- Upgrade Maven version from `3.9.3` to `3.9.4` as the older version is no longer supported (https://dlcdn.apache.org/maven/maven-3/).
- issue-118 Added variable `ENABLE_HIVE_LOCK_HOUSE_KEEPER` to support hive lock house keeper. See more details here: apache/iceberg#2301
- Added variable `MAX_REQUEST_SIZE` to optionally increase the request size when sending records to Kafka.
- Upgraded `APIARY_EXTENSIONS_VERSION` to `7.3.8` (was `7.3.7`).
- Upgraded `APIARY_GLUESYNC_LISTENER_VERSION` to `7.3.8` (was `7.3.7`).
- Added variable `KAFKA_COMPRESSION_TYPE` to optionally add compression type when sending Metastore events to Kafka through apiary-metastore-listener library.
- Upgraded `APIARY_EXTENSIONS_VERSION` to `7.3.7` (was `7.3.4`).
- Upgraded `APIARY_GLUESYNC_LISTENER_VERSION` to `7.3.7` (was `7.3.6`).
- Added variable `LIMIT_PARTITION_REQUEST_NUMBER` to protect the cluster; this controls how many partitions can be scanned for each partitioned table. The default value "-1" means no limit. The limit on partitions does not affect metadata-only queries.
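
The semantics of the `-1` default can be sketched as follows. This is only an illustration in Python, not the metastore's implementation: the function name `within_partition_limit` is hypothetical, and treating a request exactly equal to the limit as allowed is an assumption.

```python
NO_LIMIT = -1  # default value described in the entry above


def within_partition_limit(requested_partitions: int, limit: int = NO_LIMIT) -> bool:
    """Return True when a request for this many partitions would be allowed.

    Hypothetical helper: a limit of -1 disables the check entirely;
    otherwise the request must not exceed the limit (assumed inclusive).
    """
    if limit == NO_LIMIT:
        return True
    return requested_partitions <= limit
```

For example, with a limit of 1000, a query touching 1000 partitions passes under this sketch while one touching 1001 does not.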
- Upgraded github actions ubuntu runner to `22.04` (was `18.04`).
- Set `amazonlinux` version to `2` (was `latest`).
- Upgraded mvn version to `3.9.3` (was `3.6.3`).
- Variable `MYSQL_SECRET_USERNAME_KEY` for pulling aws credentials where the key is set to something other than `username`. Defaults to `username`.
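
A rough Python sketch of the lookup this variable enables is shown below. It assumes the secret is stored as a JSON object, which the entry above does not state, and the function name `username_from_secret` is hypothetical; only the variable name and its `username` default come from the changelog.

```python
import json
import os


def username_from_secret(secret_string, env=None):
    """Extract the database username from a JSON secret string.

    Hypothetical helper: the key to read is taken from
    MYSQL_SECRET_USERNAME_KEY and falls back to "username" when unset.
    """
    env = os.environ if env is None else env
    key = env.get("MYSQL_SECRET_USERNAME_KEY", "username")
    secret = json.loads(secret_string)  # assumes a JSON object payload
    return secret[key]
```

With the default key, a secret such as `{"username": "hive"}` yields `hive`. Setting `MYSQL_SECRET_USERNAME_KEY=user` reads the `user` field instead.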
- Upgraded `APIARY_GLUESYNC_LISTENER_VERSION` to `7.3.6` (was `7.3.5`). It fixes a bug in sortOrders when syncing up Iceberg tables.
- Upgraded `APIARY_GLUESYNC_LISTENER_VERSION` to `7.3.5` (was `7.3.4`). It fixes a bug in parsing the table parameter `lastAccessTime` when syncing up Iceberg tables.
- Upgraded `APIARY_EXTENSIONS_VERSION` to `7.3.4` (was `6.0.1`).
- Upgraded `APIARY_GLUESYNC_LISTENER_VERSION` to `7.3.4` (was `7.3.0`).
- LDAP Credentials now can be loaded directly using `LDAP_USERNAME` and `LDAP_PASSWORD`, this is useful to load them from Vault.
- Upgrade `apiary-gluesync-listener` version to `7.3.0` (was `4.2.0`).
- Add ability to configure size of HMS MySQL connection pool, and configure stats computation on table/partition creation.
- Upgrade EMR repository to version `5.31.0` (was `5.30.2`) so the `AWS SDK for Java` library is upgraded to `1.11.852`, which enables AWS web identity token file authentication using hadoop and public constructors.
- Enable authentication via `WebIdentityTokenCredentialsProvider`.
- Upgrade EMR repository to version `5.30.2` (was `5.24.0`) so the `AWS SDK for Java` library is upgraded to `1.11.759`, which supports authentication using an IAM role via an OIDC web identity token file (https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts-minimum-sdk.html).
- Modified log4j2 security script to reduce container startup time.
- Added script to find and remove vulnerable log4j2 classes in order to mitigate security issue CVE-2021-44228.
- Allow override of `hive.metastore.disallow.incompatible.col.type.changes=true` property.
- Remove Atlas MetaStore listener in favor of internal processes that subscribe to the Kafka HMS event listener and push changes to Ranger.
Note: This release is a BREAKING change that removes all support for the Apache Atlas HMS listener.
- Enabled ranger audit log summarization.
- Add `allow-grant.sh` to main container.
- Add `db-iam-user.sh` to main container.
- Removed `initContainer` in favor of a single image.
- Issue-165 Add init container dockerfile for supporting air-gapped environments.
Create Hive database apiary_system on startup. Data for Ranger access logs goes to bucket <prefix>-apiary-system in Parquet format.
This is pre-work to prepare for Ranger access-log Hive tables in a future version of Apiary.
- Enable caller to set min and max size of the Hive metastore thread pool. If not set, defaults to 200/1000 (Hive defaults).
- If S3 access logs are enabled in `apiary-data-lake`, create Hive database `s3_logs_hive` on startup. Raw logs go to bucket `<prefix>-s3-logs` and Hive Parquet data to bucket `<prefix>-s3-logs-hive`. This is pre-work to prepare for S3 access-log Hive tables in a future version of Apiary.
- Updated `apiary-metastore-listener` version to `6.0.1` (was `6.0.0`).
- If S3 Inventory is enabled in `apiary-data-lake`, create Hive `s3_inventory` database on startup.
- Add script `/s3_inventory_repair.sh` which can be used as the entrypoint of this Docker image to create and repair S3 inventory tables in the inventory database (if S3 inventory is enabled). The intent is to run the image this way on a scheduled basis in Kubernetes after AWS creates new inventory partition files in S3 each day.
- Updated `apiary-metastore-listener` and `kafka-metastore-listener` versions to `6.0.0` (was `5.0.2`).
- Enable Prometheus exporter when running on Kubernetes instead of sending metrics to CloudWatch.
- Added an optional Apiary metastore listener which can be used to send Hive metadata events to a Kafka topic.
- Updated `apiary-metastore-listener` version to `5.0.2` (was `4.2.0`).
- Set EKS hostname to ECS_TASK_ID required for enabling metastore metrics.
- Update using https for maven central repository as it no longer supports insecure communication over plain HTTP.
- Fix Ranger Solr auditing by upgrading `apiary-extensions` version to `5.0.1` (was `5.0.0`).
- Atlas cluster name is set to Apiary `ATLAS_CLUSTER_NAME` env variable when using Atlas plugin. If not set, will default to `INSTANCE_NAME` var.
- Update Ranger version to `2.0.0` (was `1.1.0`).
- Update Ranger metastore plugin to `5.0.0` (was `4.2.0`).
- Support Ranger audit-only mode for read-only HMS endpoint when audit destination is SOLR.
- Add Atlas hive-bridge metastore listener, to send metadata events to Kafka.
- Set DefaultAWSCredentialsProviderChain as default hadoop-aws credential provider.
- Updated `emr-apps.repo` to `5.24.0` (was `5.15.0`).
- Updated `emr-platform.repo` to `1.17.0` (was `1.6.0`).
- Upgrade Hive to `2.3.4` (was `2.3.3`) in order to fix https://issues.apache.org/jira/browse/HIVE-18767 - see #59 (Hive version is controlled by the version of `emr-apps.repo`).
- If Ranger is configured on the metastore, the read-only instance of the metastore will be configured for audit-only by using `ApiaryRangerAuthAllAccessPolicyProvider` in apiary-metastore-ranger-plugin.
- ReadOnlyAuth Pre Event Listener to manage Hive database whitelist in read-only metastores apiary-metastore-extensions.
- Support for `_` in `HIVE_DB_NAMES` variable. Fixes [#5] (ExpediaGroup/apiary#5).
- Updated apiary-metastore-listener to 4.0.0 (was 1.1.0).
- Updated apiary-gluesync-listener to 4.0.0 (was 1.1.0).
- Updated apiary-ranger-plugin to 4.0.0 (was 1.1.0).
- Updated apiary-metastore-metrics to 4.0.0 (was 1.1.0).
- Updated apiary-metastore-auth to 4.0.0 (was 1.1.0).
- Auto configure Hive metastore heapsize when running on ECS.
- Replace EMRFS with hadoop-aws S3A libraries.
- Option to send metastore metrics to CloudWatch - see #4.
- Refactor Environment variable names.
- Migrate secrets from Hashicorp Vault to AWS SecretsManager.
- Update startup script to configure Log4j, to fix sending Hive Metastore logs to CloudWatch.
- Deploy RangerAuth Pre Event Listener from apiary-metastore-extensions.
- Deploy GlueSync Listener from apiary-metastore-extensions.
- Deploy SNS Listener from apiary-metastore-extensions.
- Additional check to support external MySQL database for Hive Metastore, required to implement #48.
- Fix to update cacerts for Java.
- Fix Hive Metastore logging.