
Implementation of MicroWaveDeviceInventory v2.0.1.f_impl

@PrathibaJee released this 20 Jan 13:12 (commit ce9a5ed)
| S.No | Issue Number | Status |
|------|--------------|--------|
| 1 | #1501 | Fixed |
| 2 | #1464 | Fixed |
| 3 | #1350 | Fixed |
| 4 | #1349 | Fixed |
| 5 | #1348 | Fixed |
| 6 | #1347 | Fixed |
| 7 | #1082 | Fixed |
| 8 | #1065 | Fixed |
| 9 | #1057 | Fixed |
| 10 | #901 | Fixed |
| 11 | #1504 | Fixed |
| 12 | #1503 | Fixed |
| 13 | #1501 | Fixed |
| 14 | #1500 | Fixed |
| 15 | #1498 | Fixed |
| 16 | #1494 | Fixed |
| 17 | #1377 | Fixed |
| 18 | #1505 | Fixed |
| 19 | #1048 | Fixed |
| 20 | #1507 | Fixed |
| 21 | #1494 | Fixed |
| 22 | #923 | Fixed |

Fix for the performance problem:

Reference: #1505 (comment)

Node.js executes JavaScript on a single thread, so our REST server and the 4 high-CPU tasks share the same event loop. When those CPU-intensive tasks run, they block the event loop, preventing it from quickly handling incoming HTTP requests. As a result, API responses slow down because the event loop is busy processing the continuous cyclic operations instead of processing REST requests.

We therefore moved all 4 high-CPU tasks into separate worker threads, each allocated its own heap space and event loop. The business logic of the application remains the same, but each background task now runs as a separate thread inside a single application.

MWDI – Current State, Challenges, and v2.0.1 Enhancements

Current Production State (MWDI v1.2.0)

  • MWDI v1.2.0 is currently running in production.
  • Configured with a sliding window size of 700.
  • The application consists of:
    • REST Interface (asynchronous, event‑driven)
    • Background Sliding Window Process (continuous)

Performance Snapshot

  • Cache updates complete in around 3 hours (approximately 38K devices processed).
  • When REST traffic increases, overall performance degrades.
  • Notification Processing Disabled
    • v1.2.0 and earlier versions could not handle notification load.
    • Therefore, notification processing is disabled in production.

New Features Introduced in v2.0.1

Version 2.0.1 introduces multiple new background processes:

1. Kafka Consumer

  • Consumes messages from Kafka topics.
  • Continuous background process.

2. DeviceMetaDataList Update Process

  • Periodic background process.

3. Cache Quality Measurement

  • Periodic background task to evaluate cache health.

Total Processes in v2.0.1

  • REST Server
  • 2 Periodic High‑CPU Tasks
  • 2 Continuous High‑CPU Tasks

Total: 5 parallel processes


Root Cause Analysis (Node.js Limitation)

As referenced in:
#1505 (comment)

Key points:

  • Node.js executes JavaScript on a single thread.
  • All processes share the same event loop:
    • REST Server
    • 2 periodic high‑CPU tasks
    • 2 continuous high‑CPU background tasks

Impact

  • CPU‑intensive background loops block the event loop.
  • Incoming HTTP requests slow down significantly.
  • REST APIs become slow or unresponsive under load.

In traditional multithreaded environments (e.g., Java), such tasks would naturally run on separate threads, avoiding contention.
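The blocking behaviour described above is easy to reproduce. The snippet below is a hypothetical demonstration (not MWDI code): a synchronous loop on the main thread stalls everything else queued on the event loop, including any pending HTTP request handlers, for as long as it runs.

```javascript
// Demonstration: a synchronous CPU loop blocks the whole event loop.
// While busyWork() runs, no timers, I/O callbacks, or HTTP handlers
// on this thread can execute.
function busyWork(iterations) {
  let acc = 0;
  for (let i = 0; i < iterations; i++) acc += i;
  return acc;
}

const t0 = process.hrtime.bigint();
busyWork(1e8);  // stand-in for one high-CPU background task
const blockedMs = Number(process.hrtime.bigint() - t0) / 1e6;
console.log(`event loop blocked for ~${blockedMs.toFixed(0)} ms`);
```

A Java-style service would run such a task on its own thread; on Node's single main thread it monopolizes the event loop instead.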


Solution Approach in v2.0.1.f – Worker Threads

To overcome Node.js event loop limitations:

  • Worker Threads introduced for all background tasks.
  • Each background process receives:
    • Its own execution thread
    • Its own heap memory
    • No contention with REST APIs

Expected Outcome

  • REST APIs remain responsive.
  • Background processing runs independently.
  • Overall throughput and stability improve.

This solution must be validated in pre‑production with ~40K devices to confirm real‑world performance gains.


Key Challenges Faced During Development

1. Notification Processing Was Never Tested in Production

  • Disabled from day one due to performance issues.
  • Notification processing logic was untested.
  • The real bottleneck existed inside the application’s notification processing loop.
  • Required a complete rewrite (currently under testing).

2. Large Effort Estimation Gap

  • Kafka integration + total redesign of notification processing.
  • Initial estimates did not account for this complexity.

Testing Constraints

  • Development and test environments initially lacked notification simulation.
  • Multiple test builds were released for partial functionality testing in pre‑prod:
    • test_alarm_fix_1_v2.0.1
    • test_slidingW_analysis_1_v2.0.1
    • test_slidingwindow_analysis_2_v2.0.1
    • test_slidingwindow_analysis_3_v2.0.1
  • Pre‑production environment could not be disturbed.
  • Testing limited to:
    • Master Controller‑3
    • Up to 17k devices

Summary

  • v1.2.0 limitations stem from single‑threaded execution and disabled notification handling.
  • v2.0.1 introduces multiple high‑CPU background processes, revealing Node.js scalability limits.
  • v2.0.1.f addresses these challenges using Worker Threads, properly isolating workloads.
  • The solution is architecturally sound but needs large‑scale pre‑production validation.
  • Significant development effort was required due to:
    • Missing load simulation environments
    • Necessary redesign of core processing logic
    • MWDI being a mega service rather than a microservice; long‑term fixes require breaking it into smaller, isolated applications to eliminate scalability bottlenecks.