Mvn2sbt (#232)
* allow config driver overriding

* kick off updated build

* update for release

* update versions

* update versions

* Update sparkler and new build versions

* update for 0.4.4

* update log4j levels

* update version

* update version

* update to next snapshot

* Add version sniffing (#40)

* Interpreter Interoperability (#41)

* Update FetcherChrome with interpreter interoperability

* Update htmlCrawl

* Fixed build chain

* update tika detection

* update version

* allow for non-Selenium or Magnesium execution

* make naming a bit more intelligent

* fix id

* fix id

* fix id

* remove plugin examples

* update version

* fix crawl hookup

* fix crawl hookup

* update version tag

* update path creation

* update path creation

* add conf overload from file

* fix mimetype lookup

* update version

* Update FetcherChrome.java

* Update version.sbt

* Update PluginDependencies.scala

* Update version.sbt

* Update DatabricksAPI.java

* Update version.sbt

* Update Ms integration

* Update version.sbt

* Version bump for Ms upgrade

* Version bump for critical Ms patch

* update SeleniumScripter to use maven central repo dependency path (#57)

* remove old ci

* revert Docker changes

* Ms version bump to 0.2.0

* add new generic process, snapshots and catch missing mimetypes

* update version

* update version

* fix bug

* init checkpoints

* init checkpoints

* add dns support to chrome

* remove proxy from config

* update config

* update config

* update other config

* update version

* Extended logging support for slog4j

* Extended logging support for slog4j

* Extended logging support for slog4j

* migrate fetcher default to apache httpclient and add proxy support

* update version

* clean up

* update version

* fix status

* fix content type lookup

* trigger build

* update version

* fix ssl lookup

* add gitpod stuff

* revert

* update version

* Update version.sbt

* update scripter to 1.7.9

* update version

* fix sbt install

* Update version.sbt

* Update PluginDependencies.scala

* fix critical parsing bug

* fix critical parsing bug

* various bug fixes

* update version

* update version

* add more checkpoints

* put stuff in the right place

* more checkpoints

* more checkpoints

* more checkpoints

* more checkpoints

* more checkpoints

* more checkpoints

* Update version.sbt

* update json implementation for fetcher chrome

* Update version.sbt

* Update .gitpod.Dockerfile

* Update .gitpod.yml

* Update .gitpod.yml

* Update .gitpod.Dockerfile

* add proxy code

* add proxy code

* add proxy code

* add proxy code

* add proxy code

* add more logging

* remove prune for now

* update version

* try and work out removal issue

* change log level

* fix logger

* fix logger

* fix logger

* fix logger

* log title

* log title

* Fix the log level option so Sparkler works without providing this argument

* Fix loggable issue

* stick snapshot version

* Fix prod error

* Fix prod error

* Detecting the breaking point

* Detecting the breaking point

* Detecting the breaking point

* Detecting the breaking point

* Update README.md

triggering git workflow

* Detecting the breaking point

* Detecting the breaking point

* Fix "ClassCastException" exception in definable debug levels

* Fix "ClassCastException" exception in definable debug levels

* use logback-classic Logger instead of slf4j logger

* add jobid file support to crawl and injector

* loop until no records left

* Update version.sbt

* update restlet repo because of ssl cert expiry

* Update version.sbt

* remove banana

* remove banana

* remove restlet

* remove restlet

* fix build

* update version

* add release workflow

* update version

* update samehost filter to allow subdomains

* update version

* add idf to crawler

* add idf to crawler

* fix samehost config catch

* fix samehost config catch

* fix samehost config catch

* updates to resolve npe

* update version

* update version

* add null check

* finish basic merge

Co-authored-by: Dmitri McGuckin <[email protected]>
Co-authored-by: dmitri-mcguckin <[email protected]>
Co-authored-by: Pankaj Raturi <[email protected]>
Co-authored-by: pankaj-tripat <[email protected]>
5 people authored Jan 18, 2022
1 parent 2416058 commit 867f811
Showing 139 changed files with 8,094 additions and 4,847 deletions.
92 changes: 0 additions & 92 deletions .github/workflows/build.yaml

This file was deleted.

99 changes: 99 additions & 0 deletions .github/workflows/dev-deploy.yaml
@@ -0,0 +1,99 @@
name: dev-deploy

on:
  push:
    branches: [ master ]

env:
  PKG_NAME: sparkler-app
  PKG_VERSION: N/A
  PKG_PATH: sparkler-core
  GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
  DATABRICKS_HOST: https://dbc-abaef56e-ca8a.cloud.databricks.com
  DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_DEV_TOKEN }}
  DATABRICKS_RELEASE_PATH: dbfs:/FileStore/release

jobs:
  standalone:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v2

      - name: Set up JDK 8
        uses: actions/setup-java@v2
        with:
          java-version: '8'
          distribution: adopt

      - name: Install Databricks CLI
        run: pip install databricks-cli

      - name: Create the release folder
        run: databricks fs mkdirs ${{ env.DATABRICKS_RELEASE_PATH }}

      - name: Set package version
        run: echo "PKG_VERSION=$(grep version version.sbt | cut -d'"' -f2)" >> $GITHUB_ENV
        working-directory: ${{ env.PKG_PATH }}

      - name: Build "Standalone" package
        run: sbt package assembly -Dsparkprovided=false -Dmaven.javadoc.skip=true
        working-directory: ${{ env.PKG_PATH }}

      - name: Remove library jars
        run: rm -r build/${{ env.PKG_NAME }}-${{ env.PKG_VERSION }}
        working-directory: ${{ env.PKG_PATH }}

      - name: Zip the Sparkler build
        run: zip -r ${{ env.PKG_NAME }}-${{ env.PKG_VERSION }}.zip *
        working-directory: ${{ env.PKG_PATH }}/build

      - name: Deploy "Standalone" to Databricks
        run: databricks fs cp ${{ env.SRC_ZIP }} ${{ env.DEST_ZIP }}
        working-directory: ${{ env.PKG_PATH }}/build
        env:
          SRC_ZIP: ${{ env.PKG_NAME }}-${{ env.PKG_VERSION }}.zip
          DEST_ZIP: ${{ env.DATABRICKS_RELEASE_PATH }}/${{ env.PKG_NAME }}-${{ env.PKG_VERSION }}-$GITHUB_SHA.zip

  submit:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v2

      - name: Set up JDK 8
        uses: actions/setup-java@v2
        with:
          java-version: '8'
          distribution: adopt

      - name: Install Databricks CLI
        run: pip install databricks-cli

      - name: Create the release folder
        run: databricks fs mkdirs ${{ env.DATABRICKS_RELEASE_PATH }}

      - name: Set package version
        run: echo "PKG_VERSION=$(grep version version.sbt | cut -d'"' -f2)" >> $GITHUB_ENV
        working-directory: ${{ env.PKG_PATH }}

      - name: Create the plugins folder
        run: databricks fs mkdirs ${{ env.DATABRICKS_RELEASE_PATH }}/plugins/plugins-${{ env.PKG_VERSION }}

      - name: Build "Submit" package
        run: sbt clean package assembly -Dsparkprovided=true -Dmaven.javadoc.skip=true
        working-directory: ${{ env.PKG_PATH }}

      - name: Deploy "Submit" to Databricks
        run: databricks fs cp ${{ env.SRC_JAR }} ${{ env.DEST_JAR }}
        working-directory: ${{ env.PKG_PATH }}
        env:
          SRC_JAR: build/${{ env.PKG_NAME }}-${{ env.PKG_VERSION }}.jar
          DEST_JAR: ${{ env.DATABRICKS_RELEASE_PATH }}/${{ env.PKG_NAME }}-${{ env.PKG_VERSION }}-$GITHUB_SHA.jar

      - name: Deploy plugins to Databricks
        run: databricks fs cp --overwrite --recursive ${{ env.SRC_JARS }} ${{ env.DEST_DIR }}
        working-directory: ${{ env.PKG_PATH }}
        env:
          SRC_JARS: build/plugins/
          DEST_DIR: ${{ env.DATABRICKS_RELEASE_PATH }}/plugins/plugins-${{ env.PKG_VERSION }}/
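The "Set package version" step above reads the version out of sbt's version.sbt with a grep | cut pipeline: it grabs the line containing "version" and takes the second double-quote-delimited field. A minimal sketch of that pipeline, using an assumed version.sbt content for illustration (the real file's exact contents are not shown in this diff):

```shell
# Hypothetical version.sbt contents; sbt stores the version as a quoted
# string in an assignment like this.
printf 'ThisBuild / version := "0.4.4"\n' > version.sbt

# Same extraction as the workflow step: field 2 with '"' as the delimiter
# is the text between the first pair of double quotes.
PKG_VERSION=$(grep version version.sbt | cut -d'"' -f2)
echo "$PKG_VERSION"

rm version.sbt
```

Note this picks the first quoted field on any line mentioning "version", so it relies on version.sbt containing only that single assignment.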
100 changes: 100 additions & 0 deletions .github/workflows/release-deploy.yaml
@@ -0,0 +1,100 @@
name: release-deploy

on:
  release:
    types:
      - created

env:
  PKG_NAME: sparkler-app
  PKG_VERSION: N/A
  PKG_PATH: sparkler-core
  GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
  DATABRICKS_HOST: https://dbc-6b70bbcd-c212.cloud.databricks.com
  DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TEST_TOKEN }}
  DATABRICKS_RELEASE_PATH: dbfs:/FileStore/release

jobs:
  standalone:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v2

      - name: Set up JDK 8
        uses: actions/setup-java@v2
        with:
          java-version: '8'
          distribution: adopt

      - name: Install Databricks CLI
        run: pip install databricks-cli

      - name: Create the release folder
        run: databricks fs mkdirs ${{ env.DATABRICKS_RELEASE_PATH }}

      - name: Set package version
        run: echo "PKG_VERSION=$(grep version version.sbt | cut -d'"' -f2)" >> $GITHUB_ENV
        working-directory: ${{ env.PKG_PATH }}

      - name: Build "Standalone" package
        run: sbt package assembly -Dsparkprovided=false -Dmaven.javadoc.skip=true
        working-directory: ${{ env.PKG_PATH }}

      - name: Remove library jars
        run: rm -r build/${{ env.PKG_NAME }}-${{ env.PKG_VERSION }}
        working-directory: ${{ env.PKG_PATH }}

      - name: Zip the Sparkler build
        run: zip -r ${{ env.PKG_NAME }}-${{ env.PKG_VERSION }}.zip *
        working-directory: ${{ env.PKG_PATH }}/build

      - name: Deploy "Standalone" to Databricks
        run: databricks fs cp ${{ env.SRC_ZIP }} ${{ env.DEST_ZIP }}
        working-directory: ${{ env.PKG_PATH }}/build
        env:
          SRC_ZIP: ${{ env.PKG_NAME }}-${{ env.PKG_VERSION }}.zip
          DEST_ZIP: ${{ env.DATABRICKS_RELEASE_PATH }}/${{ env.PKG_NAME }}-${{ env.PKG_VERSION }}-$GITHUB_SHA.zip

  submit:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v2

      - name: Set up JDK 8
        uses: actions/setup-java@v2
        with:
          java-version: '8'
          distribution: adopt

      - name: Install Databricks CLI
        run: pip install databricks-cli

      - name: Create the release folder
        run: databricks fs mkdirs ${{ env.DATABRICKS_RELEASE_PATH }}

      - name: Set package version
        run: echo "PKG_VERSION=$(grep version version.sbt | cut -d'"' -f2)" >> $GITHUB_ENV
        working-directory: ${{ env.PKG_PATH }}

      - name: Create the plugins folder
        run: databricks fs mkdirs ${{ env.DATABRICKS_RELEASE_PATH }}/plugins/plugins-${{ env.PKG_VERSION }}

      - name: Build "Submit" package
        run: sbt clean package assembly -Dsparkprovided=true -Dmaven.javadoc.skip=true
        working-directory: ${{ env.PKG_PATH }}

      - name: Deploy "Submit" to Databricks
        run: databricks fs cp ${{ env.SRC_JAR }} ${{ env.DEST_JAR }}
        working-directory: ${{ env.PKG_PATH }}
        env:
          SRC_JAR: build/${{ env.PKG_NAME }}-${{ env.PKG_VERSION }}.jar
          DEST_JAR: ${{ env.DATABRICKS_RELEASE_PATH }}/${{ env.PKG_NAME }}-${{ env.PKG_VERSION }}-$GITHUB_SHA.jar

      - name: Deploy plugins to Databricks
        run: databricks fs cp --overwrite --recursive ${{ env.SRC_JARS }} ${{ env.DEST_DIR }}
        working-directory: ${{ env.PKG_PATH }}
        env:
          SRC_JARS: build/plugins/
          DEST_DIR: ${{ env.DATABRICKS_RELEASE_PATH }}/plugins/plugins-${{ env.PKG_VERSION }}/
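The deploy steps in both workflows build a unique destination name by joining the release path, package name, package version, and commit SHA, so each push or release lands as a distinct object in DBFS. A sketch of that name composition with placeholder values (the version and SHA here are illustrative, not taken from a real run):

```shell
# Placeholder values standing in for the workflow's env vars and the
# GITHUB_SHA that Actions injects at runtime.
DATABRICKS_RELEASE_PATH="dbfs:/FileStore/release"
PKG_NAME="sparkler-app"
PKG_VERSION="0.4.4"
GITHUB_SHA="867f811"

# Same interpolation the "Deploy" steps perform for their DEST_JAR env var.
DEST_JAR="${DATABRICKS_RELEASE_PATH}/${PKG_NAME}-${PKG_VERSION}-${GITHUB_SHA}.jar"
echo "$DEST_JAR"
```

Suffixing the SHA means two builds of the same version never overwrite each other, which is why the plugins copy is the only step that needs `--overwrite`.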
10 changes: 9 additions & 1 deletion .gitignore
@@ -13,6 +13,14 @@
# See the License for the specific language governing permissions and
# limitations under the License.

# Custom
.bloop
**/scalastyle-config
**/.scalafmt.conf
**/.metals
**/.bsp

# Standard Java
*.class
*.log
*.jar
@@ -50,7 +58,7 @@ sjob-**
# MAC Files
.DS_Store

Application Files
# Application Files
/resources
felix-cache/

61 changes: 0 additions & 61 deletions .gitlab-ci.yml

This file was deleted.

13 changes: 13 additions & 0 deletions .gitpod.Dockerfile
@@ -0,0 +1,13 @@
FROM registry.gitlab.com/spiculedata/custom-gitpod-full:latest

RUN sudo sh -c '(echo "#!/usr/bin/env sh" && curl -L https://github.com/lihaoyi/Ammonite/releases/download/2.0.4/2.13-2.0.4) > /usr/local/bin/amm && chmod +x /usr/local/bin/amm'

RUN brew install scala

RUN brew install coursier/formulas/coursier sbt scalaenv

RUN sudo env "PATH=$PATH" coursier bootstrap org.scalameta:scalafmt-cli_2.12:2.4.2 -r sonatype:snapshots -o /usr/local/bin/scalafmt --standalone --main org.scalafmt.cli.Cli

RUN scalaenv install scala-2.12.11 && scalaenv global scala-2.12.11

RUN brew install expect
