azure-code-repo/etl-toolkit

Introduction

TODO: Give a short introduction of your project. Let this section explain the objectives or the motivation behind this project.

Getting Started

TODO: Guide users through getting your code up and running on their own system. In this section you can talk about:

  1. Installation process
  2. Software dependencies
  3. Latest releases
  4. API references

Build and Test

TODO: Describe and show how to build your code and run the tests.

Documentation

sphinx-apidoc -f -o docs src/ananta -l -d 5 -M
make html

  • Output is written to docs/
  • Open docs/index.html in a browser

Contribute

VS Code tasks

The following tasks can be added to .vscode/tasks.json:

{
    "version": "2.0.0",
    "tasks": [
        {
            "label": "build wheel",
            "type": "shell",
            "command": "python -m build"
        },
        {
            "label": "build docs",
            "type": "shell",
            "command": "sphinx-apidoc -f -o docs src/ananta -l -d 5 -M"
        }
    ]
}

Local installation for development

Install Java

version = jre1.8.0_341

Install Spark

version = spark-3.2.1-bin-hadoop3.2-scala2.13.tgz (download link)

Install Python

version = 3.8.13

Get winutils for Hadoop on Windows

https://github.com/cdarlint/winutils

Set up PATH and environment variables on Windows (user or system scope)

Set the following user variables under Windows > Environment Variables:

  • HADOOP_HOME
    • location of the extracted winutils for Hadoop 3.2.1
  • LOCAL_DEV
    • TRUE
  • PYSPARK_PYTHON
    • path to the Python executable (if you use a conda/virtual environment, use its Python path)
  • PYTHONPATH
    • path to the Python executable (if you use a conda/virtual environment, use its Python path)
  • SPARK_HOME
    • location of the downloaded and extracted Spark
  • SPARK_LOCAL_DIRS (optional)
  • SPARK_VERSION
    • 3.2.1

Add the following entries to the Path variable:

  • %SPARK_HOME%\bin
  • %SPARK_HOME%\python
  • %SPARK_HOME%\python\lib\py4j-0.10.9.5-src.zip
  • %HADOOP_HOME%\bin
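
To confirm the variables are picked up, a quick check from Python can help (a minimal sketch; the variable names follow the lists above):

# Sanity check for the local Spark environment variables configured above
import os

for var in ("HADOOP_HOME", "SPARK_HOME", "SPARK_VERSION", "LOCAL_DEV",
            "PYSPARK_PYTHON", "PYTHONPATH"):
    print(f"{var} = {os.environ.get(var, '<not set>')}")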

Set up local Spark with Delta Lake

pyspark --packages io.delta:delta-core_2.12:1.2.0 --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
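
Alternatively, the same configuration can be done from Python (a minimal sketch; it assumes the delta-spark PyPI package is installed, e.g. pip install delta-spark==1.2.0, and the app name and output path are only examples):

# Create a local SparkSession with Delta Lake enabled
from pyspark.sql import SparkSession
from delta import configure_spark_with_delta_pip

builder = (
    SparkSession.builder
    .appName("etl-toolkit-local")  # example name
    .master("local[*]")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)

# configure_spark_with_delta_pip adds the delta-core Maven package to spark.jars.packages
spark = configure_spark_with_delta_pip(builder).getOrCreate()

# Smoke test: write and read back a small Delta table
spark.range(5).write.format("delta").mode("overwrite").save("/tmp/delta-smoke-test")
print(spark.read.format("delta").load("/tmp/delta-smoke-test").count())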

Fix Bugs

PySpark delta-core with Scala 2.13

  • The session is created with configure_spark_with_delta_pip(config).getOrCreate()
  • Go to configure_spark_with_delta_pip, which is defined in pip_utils.py
  • Scroll down to around line 77, where the Scala version is hard-coded
  • Change 12 => 13 so it matches the Scala 2.13 Spark build installed above
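
If you need to find the file, one way (assuming the usual delta-spark layout, where configure_spark_with_delta_pip lives in delta/pip_utils.py) is:

# Print the path of pip_utils.py in the installed delta-spark package
import delta.pip_utils
print(delta.pip_utils.__file__)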

Ananta

The Sphinx build above produces the "Ananta - Data Onboarder 0.0.1" documentation (Read the Docs theme).
