| S.No | Issue Number | Status |
|---|---|---|
| 1 | #1501 | Fixed |
| 2 | #1464 | Fixed |
| 3 | #1350 | Fixed |
| 4 | #1349 | Fixed |
| 5 | #1348 | Fixed |
| 6 | #1347 | Fixed |
| 7 | #1082 | Fixed |
| 8 | #1065 | Fixed |
| 9 | #1057 | Fixed |
| 10 | #901 | Fixed |
| 11 | #1504 | Fixed |
| 12 | #1503 | Fixed |
| 13 | #1501 | Fixed |
| 14 | #1500 | Fixed |
| 15 | #1498 | Fixed |
| 16 | #1494 | Fixed |
| 17 | #1377 | Fixed |
| 18 | #1505 | Fixed |
| 19 | #1048 | Fixed |
| 20 | #1507 | Fixed |
| 21 | #1494 | Fixed |
| 22 | #923 | Fixed |
Fix for the performance problem:
Reference: #1505 (comment)
Node.js executes JavaScript on a single thread, so our REST server and the 4 high-CPU tasks share the same event loop.
When those CPU-intensive tasks run, they block the event loop and prevent it from handling incoming HTTP requests promptly.
As a result, API responses slow down because the event loop is busy with the continuous cyclic operations instead of REST requests.
To fix this, all 4 high-CPU tasks now run in separate worker threads, each with its own heap space and event loop. The business logic of the application is unchanged; each background task simply runs as a separate thread inside the single application.
MWDI – Current State, Challenges, and v2.0.1 Enhancements
Current Production State (MWDI v1.2.0)
- MWDI v1.2.0 is currently running in production.
- Configured with a sliding window size of 700.
- The application consists of:
  - REST Interface (asynchronous, event‑driven)
  - Background Sliding Window Process (continuous)
Performance Snapshot
- Cache updates complete in around 3 hours (approximately 38K devices processed).
- When REST traffic increases, overall performance degrades.
- Notification Processing Disabled
  - v1.2.0 and earlier versions could not handle notification load.
  - Therefore, notification processing is disabled in production.
New Features Introduced in v2.0.1
Version 2.0.1 introduces multiple new background processes:
1. Kafka Consumer
   - Consumes messages from Kafka topics.
   - Continuous background process.
2. DeviceMetaDataList Update Process
   - Periodic background process.
3. Cache Quality Measurement
   - Periodic background task to evaluate cache health.
Total Processes in v2.0.1
- REST Server
- 2 Periodic High‑CPU Tasks
- 2 Continuous High‑CPU Tasks
Total: 5 parallel processes
Root Cause Analysis (Node.js Limitation)
As referenced in:
#1505 (comment)
Key points:
- Node.js executes JavaScript on a single thread.
- All processes share the same event loop:
  - REST Server
  - 2 periodic high‑CPU tasks
  - 2 continuous high‑CPU background tasks
Impact
- CPU‑intensive background loops block the event loop.
- Incoming HTTP requests slow down significantly.
- REST APIs become slow or unresponsive under load.
In traditional multithreaded environments (e.g., Java), such tasks would naturally run on separate threads, avoiding contention.
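The blocking effect described above can be reproduced in a few lines. This is a minimal sketch, not MWDI code: `busyWork` is a hypothetical stand-in for one high-CPU background loop, and `measureTimerLag` is an illustrative helper that reports how late a zero-delay timer fires when CPU work occupies the same thread.

```javascript
// Hypothetical stand-in for a high-CPU background task: spin for `ms` milliseconds.
function busyWork(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // burn CPU on the main thread
  }
}

// Schedule a timer, then block the thread. The timer callback (like an
// incoming HTTP request handler) cannot run until busyWork returns.
function measureTimerLag(blockMs) {
  return new Promise((resolve) => {
    const scheduledAt = Date.now();
    setTimeout(() => resolve(Date.now() - scheduledAt), 0);
    busyWork(blockMs);
  });
}

measureTimerLag(200).then((lag) => {
  console.log(`timer fired ~${lag} ms after it was scheduled`);
});
```

The reported lag is at least as long as the blocking work, which is the same delay a REST request would see if it arrived while a background loop was running.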
Solution Approach in v2.0.1.f – Worker Threads
To overcome Node.js event loop limitations:
- Worker Threads are introduced for all background tasks.
- Each background process receives:
  - Its own execution thread
  - Its own heap memory and event loop
- Background work no longer blocks the REST server's event loop.
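The approach can be sketched with the built-in `worker_threads` module. This is an illustration under stated assumptions, not the v2.0.1.f implementation: `runInWorker`, the task name, and the summation job are all hypothetical, and the worker source is inlined as a string only to keep the example in one file (a real service would more likely point `Worker` at a separate module).

```javascript
const { Worker } = require('node:worker_threads');

// Hypothetical CPU-heavy job, inlined so the sketch stays in a single file.
const workerSource = `
  const { parentPort, workerData } = require('node:worker_threads');
  let sum = 0;
  for (let i = 1; i <= workerData.n; i++) sum += i; // stand-in for real work
  parentPort.postMessage({ task: workerData.task, sum });
`;

// Run one background task on its own thread and resolve with its result message.
function runInWorker(task, n) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, {
      eval: true,                 // treat workerSource as code, not a file path
      workerData: { task, n },    // copied into the worker's own heap
    });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}

// The main thread (where the REST server would live) stays free while this runs.
runInWorker('cacheQualityMeasurement', 1e7).then((msg) => console.log(msg));
```

Because the loop runs in the worker, timers and request handlers on the main thread continue to be serviced; only the final result crosses the thread boundary via `postMessage`.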
Expected Outcome
- REST APIs remain responsive.
- Background processing runs independently.
- Overall throughput and stability improve.
This solution must be validated in pre‑production with ~40K devices to confirm real‑world performance gains.
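One way such a validation could be quantified is by sampling event-loop delay on the REST thread before and after the change. The sketch below uses Node's built-in `perf_hooks.monitorEventLoopDelay`; the helper name `sampleLoopDelay`, the sampling window, and the choice of reported statistics are assumptions for illustration, not part of MWDI.

```javascript
const { monitorEventLoopDelay } = require('node:perf_hooks');

// Sample event-loop delay for `durationMs` and report it in milliseconds.
// The histogram records values in nanoseconds, hence the division by 1e6.
function sampleLoopDelay(durationMs) {
  return new Promise((resolve) => {
    const histogram = monitorEventLoopDelay({ resolution: 10 });
    histogram.enable();
    setTimeout(() => {
      histogram.disable();
      resolve({
        meanMs: histogram.mean / 1e6,
        p99Ms: histogram.percentile(99) / 1e6,
        maxMs: histogram.max / 1e6,
      });
    }, durationMs);
  });
}

sampleLoopDelay(500).then((stats) => console.log(stats));
```

If background work still runs on the main thread, `maxMs` and `p99Ms` rise with the length of each blocking stretch; with the work moved to workers, they should stay close to the sampling resolution.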
Key Challenges Faced During Development
1. Notification Processing Was Never Tested in Production
   - Disabled from day one due to performance issues.
   - Notification processing logic was untested.
   - The real bottleneck existed inside the application’s notification processing loop.
   - Required a complete rewrite (currently under testing).
2. Large Effort Estimation Gap
   - Kafka integration + total redesign of notification processing.
   - Initial estimates did not account for this complexity.
Testing Constraints
- Development and test environments initially lacked notification simulation.
- Multiple test builds were released for partial functionality testing in pre‑prod:
  - test_alarm_fix_1_v2.0.1
  - test_slidingW_analysis_1_v2.0.1
  - test_slidingwindow_analysis_2_v2.0.1
  - test_slidingwindow_analysis_3_v2.0.1
- Pre‑production environment could not be disturbed.
- Testing limited to:
  - Master Controller‑3
  - Up to 17k devices
Summary
- v1.2.0 limitations stem from single‑threaded execution and disabled notification handling.
- v2.0.1 introduces multiple high‑CPU background processes, revealing Node.js scalability limits.
- v2.0.1.f addresses these challenges using Worker Threads, properly isolating workloads.
- The solution is architecturally sound but needs large‑scale pre‑production validation.
- Significant development effort was required due to:
  - Missing load simulation environments
  - Necessary redesign of core processing logic
  - MWDI being a mega service (not a microservice); long‑term fixes require breaking it into smaller, isolated applications to eliminate scalability bottlenecks.