Commit 3b2dca7

334 handle spark session creation when already exists (#335)
* Enhance demo notebook and SparkModel.js for improved Spark session management
  - Added a new code cell to the demo notebook that initializes a Spark session with detailed configuration settings, including the application ID and a Spark UI link.
  - Updated SparkModel.js to check for existing Spark application IDs before storing new session information, improving error handling and preventing duplicate entries.
  - Enhanced logging for better visibility into Spark session creation and management.

* Update Dockerfile to install npm dependencies with the legacy peer dependencies flag
  - Modified the npm install command to include the --legacy-peer-deps flag, ensuring compatibility with older peer dependencies during the React application's build.

* Update Dockerfile to use Node 18 and enhance the build process
  - Upgraded Node.js from 14 to 18 for improved performance and compatibility.
  - Cleared the npm cache before installing dependencies to ensure a clean environment.
  - Added installation of @jridgewell/gen-mapping to support additional functionality.
  - Increased memory for the build by setting NODE_OPTIONS to 4096 MB.

* Update Dockerfile to use Node 14 and streamline the build process
  - Downgraded Node.js from 18 to 14 for compatibility.
  - Simplified npm installation by removing cache cleaning and the legacy peer dependencies flag.
  - Removed the increased build memory allocation, keeping the Dockerfile straightforward.

* Update Dockerfile to use Node 18 and optimize the build process
  - Upgraded Node.js from 14 to 18 for improved performance.
  - Implemented a clean install of npm dependencies with legacy peer dependencies support.
  - Added specific package installations for @jridgewell/gen-mapping and @babel/generator.
  - Increased memory for the build by setting NODE_OPTIONS to 4096 MB.

* Refactor Dockerfile for improved npm dependency management and build process
  - Updated the npm installation commands to set legacy peer dependencies and install packages in a specific order.
  - Cleaned the npm cache and rebuilt before running the build command to ensure a fresh environment.
  - Improved the clarity and efficiency of the web application's Dockerfile.

* Enhance Spark session management and update the demo notebook
  - Updated the demo notebook to reflect a successful Spark session execution, including updated execution metadata and the application ID.
  - Refactored Spark session creation in the backend to streamline the process, removing unnecessary parameters and improving error handling.
  - Modified SparkModel.js to ensure proper session initialization and validation of Spark application IDs.
  - Improved logging for better visibility during Spark session creation and management.

* Refactor demo notebook and SparkModel.js for improved Spark session handling
  - Removed outdated code cells from the demo notebook to improve clarity and usability.
  - Updated SparkModel.js to improve validation of Spark application IDs, ensuring they start with 'app-' and are correctly extracted from the HTML.
  - Simplified the logic for storing Spark session information in the Notebook component.

* Refactor Notebook.js and SparkModel.js for improved Spark app ID handling
  - Updated Notebook.js to extract and store the Spark app ID more efficiently, ensuring it is only stored if valid.
  - Enhanced logging to give clearer visibility of the extracted Spark app ID.
  - Added a console log in SparkModel.js to confirm successful extraction of the Spark app ID, improving debuggability.

* Refactor Notebook.js to streamline Spark app ID logging
  - Removed the redundant console log for the Spark app ID, retaining a single log statement for clarity.
  - Enhanced error handling in the Notebook component for better debugging during cell execution.

* Implement Spark app status endpoint and enhance logging in SparkModel.js
  - Added a new endpoint in spark_app.py that retrieves the status of a Spark application by its ID.
  - Enhanced logging in SparkModel.js for better visibility while storing Spark application information, including status checks and error handling.
  - Improved validation of Spark application IDs so that only valid IDs are processed.

* Refactor Spark app status retrieval and enhance error handling
  - Moved the status retrieval logic from the route handler in spark_app.py to a static method on the SparkApp service for better separation of concerns.
  - Improved error handling and logging in the new get_spark_app_status method, with clearer responses for "application not found" and internal errors.
  - Simplified the route handler to return the SparkApp method's response directly.

* Enhance notebook path handling in the getSparkApps method
  - Safely handle notebook paths by simplifying them when they match the pattern work/user@domain/notebook.ipynb.
  - Log the simplified notebook path for better debugging and visibility.

* Refactor the Spark app route and simplify notebook path handling
  - Removed unused JWT and user identification decorators from the Spark app route in spark_app.py.
  - Simplified notebook path handling in the getSparkApps method of NotebookModel.js by removing the path simplification logic and using the provided notebook path directly.

* Add create_spark_app endpoint and enhance error handling in the SparkApp service

* Enhance the create_spark_app endpoint with user authentication and error handling
  - Added JWT authentication and user identification decorators to the create_spark_app route in spark_app.py so that only authenticated users can create Spark applications.
  - Implemented user context validation in the SparkApp service, returning a 401 response if the user is not found.
  - Added a database rollback on error during Spark app creation to maintain data integrity.

* Enhance Spark session management and update the demo notebook
  - Updated the demo notebook with details of a successful Spark session execution, including execution metadata and the application ID.
  - Removed the create_spark_session endpoint from spark_app.py to streamline session management.
  - Removed the create_spark_session method from the SparkApp service, as session creation is now handled directly in the notebook.
  - Improved SparkModel.js to ensure proper validation and storage of Spark application IDs, with enhanced logging.
1 parent 02e3486 commit 3b2dca7

File tree

6 files changed, +192 −131 lines changed


examples/[email protected]/demo.ipynb

Lines changed: 33 additions & 0 deletions
@@ -10,6 +10,39 @@
     "- This is just a demo notebook\n",
     "- For testing only"
    ]
+  },
+  {
+   "cell_type": "code",
+   "isExecuted": true,
+   "lastExecutionResult": "success",
+   "lastExecutionTime": "2024-12-10 10:26:03",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "\n",
+       " <div style=\"border: 1px solid #e8e8e8; padding: 10px;\">\n",
+       " <h3>Spark Session Information</h3>\n",
+       " <p><strong>Config:</strong> {'spark.driver.memory': '1g', 'spark.driver.cores': 1, 'spark.executor.memory': '1g', 'spark.executor.cores': 1, 'spark.executor.instances': 1, 'spark.dynamicAllocation.enabled': False}</p>\n",
+       " <p><strong>Application ID:</strong> app-20241210080310-0003</p>\n",
+       " <p><strong>Spark UI:</strong> <a href=\"http://localhost:18080/history/app-20241210080310-0003\">http://localhost:18080/history/app-20241210080310-0003</a></p>\n",
+       " </div>\n",
+       " "
+      ],
+      "text/plain": [
+       "Custom Spark Session (App ID: app-20241210080310-0003) - UI: http://0edb0a63b2fb:4040"
+      ]
+     },
+     "execution_count": 11,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "spark = create_spark(\"work/[email protected]/demo.ipynb\")\n",
+    "spark"
+   ]
   }
  ],
  "metadata": {

server/app/routes/spark_app.py

Lines changed: 11 additions & 23 deletions
@@ -8,14 +8,6 @@
 
 logging.basicConfig(level=logging.INFO)
 
-@spark_app_blueprint.route('/spark_app/<path:spark_app_id>', methods=['POST'])
-def create_spark_app(spark_app_id):
-  data = request.get_json()
-  notebook_path = data.get('notebookPath', None)
-  return SparkApp.create_spark_app(spark_app_id=spark_app_id, notebook_path=notebook_path)
-
-# @jwt_required()
-# @identify_user
 @spark_app_blueprint.route('/spark_app/<path:notbook_path>/config', methods=['GET'])
 def get_spark_app_config(notbook_path):
   logging.info(f"Getting spark app config for notebook path: {notbook_path}")
@@ -27,20 +19,16 @@ def update_spark_app_config(notbook_path):
   data = request.get_json()
   return SparkApp.update_spark_app_config_by_notebook_path(notbook_path, data)
 
-@spark_app_blueprint.route('/spark_app/session', methods=['POST'])
-def create_spark_session():
+@spark_app_blueprint.route('/spark_app/<spark_app_id>/status', methods=['GET'])
+def get_spark_app_status(spark_app_id):
+  logging.info(f"Getting spark app status for app id: {spark_app_id}")
+  return SparkApp.get_spark_app_status(spark_app_id)
+
+@spark_app_blueprint.route('/spark_app/<spark_app_id>', methods=['POST'])
+@jwt_required()
+@identify_user
+def create_spark_app(spark_app_id):
+  logging.info(f"Creating spark app with id: {spark_app_id}")
   data = request.get_json()
   notebook_path = data.get('notebookPath')
-  spark_config = data.get('config')
-
-  try:
-    spark_app_id = SparkApp.create_spark_session(notebook_path, spark_config)
-    return jsonify({
-      'status': 'success',
-      'sparkAppId': spark_app_id
-    })
-  except Exception as e:
-    return jsonify({
-      'status': 'error',
-      'message': str(e)
-    }), 500
+  return SparkApp.create_spark_app(spark_app_id=spark_app_id, notebook_path=notebook_path)
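
For orientation, here is a hedged usage sketch of the two routes defined above. The base URL and token handling are assumptions, not part of this commit; only the POST route carries @jwt_required():

// Usage sketch for the routes above (base URL and token storage assumed).
const BASE = 'http://localhost:5000'; // assumed server address

// POST /spark_app/<spark_app_id> — requires a JWT per @jwt_required().
async function createSparkApp(sparkAppId, notebookPath, token) {
  const res = await fetch(`${BASE}/spark_app/${encodeURIComponent(sparkAppId)}`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${token}`,
    },
    body: JSON.stringify({ notebookPath }),
  });
  return res.json(); // spark_app.to_dict() on success
}

// GET /spark_app/<spark_app_id>/status — {"status": ...} on 200, 404 if unknown.
async function getSparkAppStatus(sparkAppId) {
  const res = await fetch(`${BASE}/spark_app/${encodeURIComponent(sparkAppId)}/status`);
  return res.status === 404 ? null : res.json();
}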

server/app/services/spark_app.py

Lines changed: 62 additions & 41 deletions
@@ -1,7 +1,7 @@
 from app.models.spark_app import SparkAppModel
 from app.models.notebook import NotebookModel
 from app.models.spark_app_config import SparkAppConfigModel
-from flask import Response
+from flask import g, Response
 from datetime import datetime
 import json
 from database import db
@@ -143,45 +143,66 @@ def update_spark_app_config_by_notebook_path(notebook_path: str = None, data: di
       status=200)
 
   @staticmethod
-  def create_spark_app(spark_app_id: str = None, notebook_path: str = None):
-    logger.info(f"Creating spark app with id: {spark_app_id} for notebook path: {notebook_path}")
-
-    if spark_app_id is None:
-      logger.error("Spark app id is None")
-      return Response(
-        response=json.dumps({'message': 'Spark app id is None'}),
-        status=404)
-
-    if notebook_path is None:
-      logger.error("Notebook path is None")
-      return Response(
-        response=json.dumps({'message': 'Notebook path is None'}),
-        status=404)
-
+  def get_spark_app_status(spark_app_id: str):
+    logger.info(f"Getting spark app status for app id: {spark_app_id}")
     try:
-      # Get the notebook id
-      notebook = NotebookModel.query.filter_by(path=notebook_path).first()
-      notebook_id = notebook.id
-
-      # Create the spark app
-      spark_app = SparkAppModel(
-        spark_app_id=spark_app_id,
-        notebook_id=notebook_id,
-        user_id=notebook.user_id,
-        created_at=datetime.now().strftime("%Y-%m-%d %H:%M:%S")
-      )
-
-      db.session.add(spark_app)
-      db.session.commit()
-
-      logger.info(f"Spark app created: {spark_app}")
+      spark_app = SparkAppModel.query.filter_by(spark_app_id=spark_app_id).first()
+      if spark_app is None:
+        logger.error("Spark application not found")
+        return Response(
+          response=json.dumps({'message': 'Spark application not found'}),
+          status=404
+        )
+      return Response(
+        response=json.dumps({'status': spark_app.status}),
+        status=200
+      )
     except Exception as e:
-      logger.error(f"Error creating spark app: {e}")
-      return Response(
-        response=json.dumps({'message': 'Error creating spark app: ' + str(e)}),
-        status=404)
-
-    return Response(
-      response=json.dumps(spark_app.to_dict()),
-      status=200
-    )
+      logger.error(f"Error getting spark app status: {e}")
+      return Response(
+        response=json.dumps({'message': str(e)}),
+        status=500
+      )
+
+  @staticmethod
+  def create_spark_app(spark_app_id: str, notebook_path: str):
+    logger.info(f"Creating spark app with id: {spark_app_id} for notebook: {notebook_path}")
+    try:
+      if not g.user:
+        logger.error("User not found in context")
+        return Response(
+          response=json.dumps({'message': 'User not authenticated'}),
+          status=401
+        )
+
+      # Get the notebook
+      notebook = NotebookModel.query.filter_by(path=notebook_path).first()
+      if notebook is None:
+        logger.error("Notebook not found")
+        return Response(
+          response=json.dumps({'message': 'Notebook not found'}),
+          status=404
+        )
+
+      # Create new spark app
+      spark_app = SparkAppModel(
+        spark_app_id=spark_app_id,
+        notebook_id=notebook.id,
+        user_id=g.user.id,
+        created_at=datetime.now()
+      )
+
+      db.session.add(spark_app)
+      db.session.commit()
+
+      return Response(
+        response=json.dumps(spark_app.to_dict()),
+        status=200
+      )
+    except Exception as e:
+      logger.error(f"Error creating spark app: {e}")
+      db.session.rollback()  # Add rollback on error
+      return Response(
+        response=json.dumps({'message': str(e)}),
+        status=500
+      )
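
Since get_spark_app_status returns a single {'status': ...} snapshot, a caller that wants to wait for a state change has to poll. A minimal polling sketch follows; the terminal state names, interval, and timeout are assumptions, not defined by this commit:

// Hypothetical polling helper built on GET /spark_app/<id>/status.
async function waitForSparkAppStatus(sparkAppId, {
  terminalStates = ['FINISHED', 'FAILED', 'KILLED'], // assumed state names
  intervalMs = 2000,
  timeoutMs = 60000,
} = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const res = await fetch(`/spark_app/${sparkAppId}/status`);
    if (res.ok) {
      const { status } = await res.json();
      if (terminalStates.includes(status)) return status;
    }
    // Not terminal yet (or 404 while the record is being created): wait and retry.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Timed out waiting for Spark app ${sparkAppId}`);
}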

webapp/Dockerfile

Lines changed: 19 additions & 3 deletions
@@ -1,10 +1,26 @@
 # Stage 1: Build the React application
-FROM node:14 as build
+FROM node:18 as build
 WORKDIR /app
+
+# Copy package files
 COPY package*.json ./
-RUN npm install
+
+# Clean and setup npm
+RUN npm cache clean --force && \
+    npm set legacy-peer-deps=true
+
+# Install dependencies in a specific order
+RUN npm install && \
+    npm install @jridgewell/[email protected] && \
+    npm install @babel/[email protected] && \
+    npm install @babel/[email protected]
+
+# Copy the rest of the application
 COPY . .
-RUN npm run build
+
+# Build with increased memory limit
+ENV NODE_OPTIONS="--max-old-space-size=4096"
+RUN npm rebuild && npm run build
 
 # Stage 2: Serve the app with nginx
 FROM nginx:stable-alpine

webapp/src/components/notebook/Notebook.js

Lines changed: 8 additions & 8 deletions
@@ -257,11 +257,13 @@ function Notebook({
 
       // Check if contains a spark app id
       if (result[0] && result[0].data && result[0].data['text/html'] && SparkModel.isSparkInfo(result[0].data['text/html'])) {
-        setSparkAppId(SparkModel.extractSparkAppId(result[0].data['text/html']));
-        SparkModel.storeSparkInfo(SparkModel.extractSparkAppId(result[0].data['text/html']), notebook.path)
+        const appId = SparkModel.extractSparkAppId(result[0].data['text/html']);
+        setSparkAppId(appId);
+        if (appId) {
+          SparkModel.storeSparkInfo(appId, notebook.path);
+        }
+        console.log('Spark app id:', appId);
       }
-      console.log('Spark app id:', sparkAppId);
-
     } catch (error) {
       console.error('Failed to execute cell:', error);
     }
@@ -288,7 +290,7 @@ function Notebook({
   const handleCreateSparkSession = async () => {
     console.log('Create Spark session clicked');
     try {
-      const { sparkAppId, initializationCode } = await SparkModel.createSparkSession(notebookState.path);
+      const { initializationCode } = await SparkModel.createSparkSession(notebookState.path);
 
       // Create a new cell with the initialization code
       const newCell = {
@@ -306,12 +308,10 @@ function Notebook({
         content: { ...notebookState.content, cells }
       });
 
-      // Execute the cell (now need to use the last index)
+      // Execute the cell
      const newCellIndex = cells.length - 1;
       await handleRunCodeCell(newCell, CellStatus.IDLE, (status) => setCellStatus(newCellIndex, status));
 
-      console.log('Spark session created with ID:', sparkAppId);
-      setSparkAppId(sparkAppId);
     } catch (error) {
       console.error('Failed to create Spark session:', error);
       alert('Failed to create Spark session. Please check the configuration.');
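
Notebook.js now guards the call with if (appId), and the commit message says SparkModel.js additionally checks for existing Spark application IDs before storing new session information. A sketch of how storeSparkInfo could combine the status and create endpoints for that duplicate check (the body is an assumption; the actual SparkModel.js diff is not shown on this page):

// Hypothetical sketch of storeSparkInfo; the real SparkModel.js changes
// are part of this commit but not reproduced here.
async function storeSparkInfo(sparkAppId, notebookPath) {
  // Only IDs of the form "app-..." are considered valid (per the commit notes).
  if (!sparkAppId || !sparkAppId.startsWith('app-')) {
    console.warn('Skipping invalid Spark app id:', sparkAppId);
    return null;
  }

  // Duplicate check: the status route answers 200 for known apps, 404 otherwise.
  const existing = await fetch(`/spark_app/${sparkAppId}/status`);
  if (existing.ok) {
    console.log('Spark app already stored, skipping:', sparkAppId);
    return existing.json();
  }

  console.log('Storing Spark app info:', sparkAppId, notebookPath);
  const res = await fetch(`/spark_app/${sparkAppId}`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' }, // JWT header omitted; see the route sketch above
    body: JSON.stringify({ notebookPath }),
  });
  if (!res.ok) throw new Error(`Failed to store Spark app info: ${res.status}`);
  return res.json();
}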
