apachespark

This repository helps you learn Databricks concepts through examples. It covers the important topics a data engineer needs in real-world work, using PySpark and Spark SQL for development. The course ends with a few case studies.

sql database spark hive hadoop etl pyspark data-engineering spark-streaming data-analysis databricks datalake spark-sql timetravel apachespark etl-pipeline deltalake

Updated Jul 28, 2024
Python

funkyminds / cleanframes

type-class based data cleansing library for Apache Spark SQL

scala spark bigdata shapeless sparksql sparkscala apachespark

Updated Jun 23, 2019
Scala

josephmachado / docker_for_data_engineers

Code for blog at: https://www.startdataengineering.com/post/docker-for-de/

docker docker-compose pyspark pyspark-notebook apachespark

Updated Apr 29, 2024
C

propelledanalytics / SparkSQL.jl

SparkSQL.jl enables Julia programs to work with Apache Spark data using just SQL.

spark julia-language julialang apachespark

Updated Jan 29, 2024
Julia

tspannhw / FLiPStackWeekly

FLaNK AI Weekly covering Apache NiFi, Apache Flink, Apache Kafka, Apache Spark, Apache Iceberg, Apache Ozone, Apache Pulsar, and more...

streaming cloudera apachespark apachekafka timspann apachenifi lakehouse apacheflink apacheiceberg

Updated Nov 12, 2024

aravinthsci / Spark_Delta_Lake

Delta Lake Examples

spark datalake apachespark delta-lake deltalake

Updated Apr 24, 2020
Jupyter Notebook

SmartDataAnalytics / MA-INF-4223-DBDA-Lab

Repository for Lab “Distributed Big Data Analytics” (MA-INF 4223), University of Bonn

machine-learning university rdf semantics bigdata teaching bonn apachespark sansa

Updated Aug 11, 2022
Jupyter Notebook

SandeepAswathnarayana / professional-certificate-programs

This repository contains all the projects and labs I worked on while pursuing professional certificate programs, specializations, and bootcamps. [Areas: Deep Learning, Machine Learning, Applied Data Science].

Updated Oct 13, 2020
Jupyter Notebook

datumbrain / gossub

Trigger spark-submit in Golang. A Go implementation of the famous SparkLauncher.java.

go golang spark apachespark

Updated Oct 31, 2020
Go

sfrechette / spark-jdbc-mssql

Connect to SQL Server using Apache Spark

scala sql-server spark apache-spark jdbc-driver sqlserver apachespark

Updated Sep 10, 2016
Scala

CarolinaNicasio / APACHESPARK-PYSPARK-2023

PySpark is a distributed data processing library for Python that processes large volumes of data on clusters using the Apache Spark framework, offering high performance and an integrated set of tools for large-scale data analysis and handling.

python data-science spark apache python3 pyspark dataframe rdd apachespark github-actions

Updated Jun 27, 2023

lensesio / lenses-jdbc-spark

Apache Spark with Kafka via JDBC !!!

kafka jdbc-driver apachespark

Updated Jun 1, 2018
Java

ashkrit / sparkmicroservices

Microservices for Spark application

microservice apachespark

Updated Jul 16, 2023
Java

funkyminds / cleanframes-examples

Example usages of the cleanframes library

scala spark bigdata shapeless sparksql apachespark

Updated Jun 15, 2019
Scala

sahith / Link-Prediction-for-Citation-Networks-using-Apache-Spark

Link prediction is about predicting future connections in a graph. In this project, it means predicting whether two authors will collaborate on a future paper, given the graph of authors who have collaborated on at least one paper together.

emr aws scala big-data bigdata s3 dataframes databricks link-prediction big-data-analytics linkprediction apachespark awsemr

Updated Dec 10, 2019
Scala

AbdelmajidLh / spark-functionality-repo

This GitHub repository contains a detailed document on the basics of the Scala language.

scala spark apache python3 pyspark databricks databricks-notebooks apachespark

Updated Apr 8, 2024

divithraju / divith-raju-Immigration-Data-Engineering

A Capstone Project that covers several aspects of Data Engineering (Data Exploration, Cleaning, Modeling, Pipelining, Processing)

sql bigdata pandas dataset datapipeline datalake dataprocessing dataengineering capstone-project apachespark datacleaning bigdataproject datamodeling datawherehouse dataschema bigdataprocessing

Updated Dec 25, 2022
Jupyter Notebook

apachespark

Here are 45 public repositories matching this topic...

DataExpert-io / data-engineer-handbook

apache / hudi

holdenk / sparkProjectTemplate.g8

martandsingh / ApacheSpark
