Commit 1c26921 by Joshua Cook: "begin documentation" (1 parent: d258918)

1 file changed: README.md, +71 lines, -0 lines
# Databricks Project Template

This project template is designed to facilitate the development, testing, and deployment of Apache Spark data engineering pipelines across environments, from local development in your preferred IDE to deployment on your Databricks cluster.

## Project Structure

This project has the following structure, shown to a depth of 2:
```
.
├── Makefile
├── README.md
├── docker-compose.yml
├── env
│   └── docker
├── example
├── scripts
│   └── development.py
├── src
│   ├── config.py
│   ├── operations.py
│   └── utility.py
└── tests
    ├── data
    └── spark
```

- **`Makefile`** - defines common commands for the repo, including launching a local development server and running tests
- **`doc`** - contains documentation associated with this project
- **`docker-compose.yml`** - defines the local development Docker services
- **`env/docker`** - contains the `Dockerfile` and `requirements.txt` used to define the Python environment for local development
- **`example`** - contains a built-out example of how to use this project structure
- **`scripts`** - contains Python scripts used for exploration and development (**TODO**: discuss how to use these with Databricks and with JupyterLab)
- **`src`** - contains source code
- **`tests/data`** - contains fixture data used during testing
- **`tests/spark`** - contains unit and integration tests

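The `src` layout above lends itself to small, importable modules. As a purely illustrative sketch (the actual file contents are not part of this commit, and every name below is an assumption), `src/config.py` might centralize environment-dependent settings so the same pipeline code can run both locally and on Databricks:

```python
# Hypothetical sketch of src/config.py; not part of this commit.
# Centralizing settings lets the same pipeline code run on the local
# Docker-based cluster and on a Databricks cluster.
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    app_name: str = "example-pipeline"    # assumed application name
    master: str = "local[*]"              # local single-node cluster by default
    fixture_data_path: str = "tests/data"


def load_config(env: str = "local") -> Config:
    """Return settings for the given environment (hypothetical helper)."""
    if env == "databricks":
        # On Databricks, the cluster manager is supplied by the platform,
        # so no explicit master URL is needed.
        return Config(master="")
    return Config()
```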
## Development

### Launch Local Development Server

Local development is facilitated by Docker and Docker Compose, and is built as an extension of the `jupyter/pyspark-notebook` Docker image.

To begin developing, start the development server using the following command:

```
make launch-test-server
```

This will launch a local single-node Spark cluster. The password is `"local spark cluster"`.

This cluster can be interacted with using JupyterLab at [localhost:10000](http://localhost:10000).

The cluster is used for running local tests against the PySpark package being developed.

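Tests that use this cluster typically share a single `SparkSession`. A minimal sketch of what a test file under `tests/spark` could look like, assuming pytest is the test runner (the fixture and test names are hypothetical; the actual tests are not shown in this commit):

```python
# Hypothetical sketch of a test file under tests/spark/; assumes pytest.
import pytest
from pyspark.sql import SparkSession


@pytest.fixture(scope="session")
def spark():
    # Build (or reuse) a session against the local single-node cluster.
    session = (
        SparkSession.builder
        .master("local[*]")
        .appName("local-tests")
        .getOrCreate()
    )
    yield session
    session.stop()


def test_row_count(spark):
    # A trivial end-to-end check that the session works.
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])
    assert df.count() == 2
```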
### Running Tests Locally

Run tests against the local package using the `make` commands below.

Run a single test file:

```
make run-test testfile=<PATH_TO_TEST_FILE>
```

Run all tests:

```
make run-tests
```
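For example, to run a single hypothetical test file (the path below is an illustration; substitute a real file under `tests/spark`):

```
make run-test testfile=tests/spark/test_operations.py
```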
