Skip to content

priyavrat7/Distributed-Task-Execution-Framework

Repository files navigation

Distributed Task Execution Framework

A distributed task execution system implemented in Java using Apache ZooKeeper for master-worker coordination, featuring event-driven task queue management, fault-tolerant worker load balancing, and multi-threaded middleware support for TCP/IP and RMI protocols.

πŸ“‹ Overview

This framework implements a distributed computing system where tasks are submitted by clients, coordinated by a master node, and executed by worker nodes. The system uses Apache ZooKeeper for distributed coordination, ensuring fault tolerance, service discovery, and efficient task distribution across multiple worker nodes.

✨ Features

  • Master-Worker Architecture: Automatic master election using ZooKeeper ephemeral znodes
  • Event-Driven Task Management: Real-time task assignment using ZooKeeper watchers
  • Fault Tolerance: Automatic worker failure detection and task reassignment
  • Load Balancing: Intelligent task distribution across idle workers
  • Multi-Protocol Support: Middleware server supporting both TCP/IP and RMI protocols
  • Service Discovery: Dynamic worker registration and discovery using ephemeral znodes
  • Automated Deployment: Shell scripts for ZooKeeper cluster setup and management

πŸ—οΈ Architecture

System Components

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”         β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”         β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Client  │────────►│    Master    │────────►│  Worker  β”‚
β”‚          β”‚         β”‚  (ZooKeeper) β”‚         β”‚          β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜         β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜         β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                            β”‚
                            β–Ό
                    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                    β”‚   ZooKeeper   β”‚
                    β”‚    Cluster    β”‚
                    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

ZNode Structure

The system uses the following ZooKeeper znode hierarchy:

/dist27/
β”œβ”€β”€ /master              # Master election node
β”œβ”€β”€ /tasks/              # Task submission queue
β”‚   β”œβ”€β”€ task-0000000001 # Sequential task nodes
β”‚   β”œβ”€β”€ task-0000000002
β”‚   └── ...
β”œβ”€β”€ /workers/            # Worker registration
β”‚   β”œβ”€β”€ worker-id-1      # Ephemeral worker nodes
β”‚   β”œβ”€β”€ worker-id-2
β”‚   └── ...
└── /idle/               # Idle worker tracking
    β”œβ”€β”€ worker-id-1      # Ephemeral idle nodes
    └── ...

Key Components

  1. Master Node:

    • First process to successfully create /master znode
    • Monitors /tasks for new task submissions
    • Tracks idle workers via /idle znodes
    • Assigns tasks to available workers
  2. Worker Nodes:

    • Register themselves under /workers with unique IDs
    • Create ephemeral /idle nodes when available
    • Watch for task assignments
    • Execute tasks and write results to /tasks/{task-id}/result
  3. Client:

    • Submits tasks by creating sequential nodes under /tasks
    • Retrieves results from /tasks/{task-id}/result

πŸ› οΈ Technology Stack

  • Language: Java (80.3%)
  • Coordination: Apache ZooKeeper 3.6.2
  • Protocols: TCP/IP, RMI (Java Remote Method Invocation)
  • Build: Shell scripts for compilation and deployment
  • Infrastructure: Multi-server ZooKeeper cluster

πŸš€ Getting Started

Prerequisites

  • Java Development Kit (JDK) 8 or higher
  • Apache ZooKeeper 3.6.2 or compatible version
  • Multiple servers for ZooKeeper cluster (minimum 3 for production)
  • Network connectivity between all nodes

Installation

  1. Clone the repository

    git clone https://github.com/priyavrat7/Distributed-Task-Execution-Framework.git
    cd Distributed-Task-Execution-Framework
  2. Set up ZooKeeper cluster

    Update available_servers.txt with your server addresses:

    server.1=hostname1:22299:22399
    server.2=hostname2:22299:22399
    server.3=hostname3:22299:22399
    
  3. Configure ZooKeeper

    Copy and customize the ZooKeeper configuration:

    cp zoo-base.cfg zoo.cfg
    # Edit zoo.cfg with your server addresses
  4. Initialize ZooKeeper data directories

    ./server-setup.sh

Running the System

  1. Start ZooKeeper cluster

    ./server-start.sh
  2. Compile the Java code

    ./compile.sh
  3. Start Master/Worker nodes

    # On each node, run the Java application
    # The first node to start becomes the master
    java -cp <classpath> MainClass
  4. Submit tasks via client

    ./client-create-nodes.sh
  5. Stop the system

    ./server-stop.sh

πŸ“– Usage

Master-Worker Coordination

The system automatically handles master election:

  • Each node attempts to create the /master znode
  • The first successful node becomes the master
  • Other nodes become workers
  • If master fails, workers can re-elect a new master

Task Submission

Tasks are submitted by creating sequential znodes under /tasks:

  • Each task contains serialized task data
  • Master monitors new tasks via watchers
  • Tasks are assigned to idle workers automatically

Worker Management

  • Workers register with unique IDs (format: PORT@LAB)
  • Idle workers create ephemeral nodes under /idle
  • Master assigns tasks by updating worker znode data
  • Worker deletion of idle node triggers task execution
  • Results are written to /tasks/{task-id}/result

πŸ”§ Configuration

ZooKeeper Configuration

Edit zoo-base.cfg or create zoo.cfg:

tickTime=2000
dataDir=/path/to/zookeeper/data
clientPort=2181
initLimit=5
syncLimit=2
server.1=hostname1:22299:22399
server.2=hostname2:22299:22399
server.3=hostname3:22299:22399

Server Configuration

Update available_servers.txt with your cluster servers:

server.1-lab2-25.cs.mcgill.ca:22299:22399
server.2-lab2-26.cs.mcgill.ca:22299:22399
server.3-lab2-29.cs.mcgill.ca:22299:22399

πŸ“ Project Structure

Distributed-Task-Execution-Framework/
β”œβ”€β”€ zk/                          # ZooKeeper related files
β”œβ”€β”€ compile.sh                   # Compilation script
β”œβ”€β”€ server-setup.sh              # ZooKeeper cluster setup
β”œβ”€β”€ server-start.sh              # Start ZooKeeper servers
β”œβ”€β”€ server-stop.sh               # Stop ZooKeeper servers
β”œβ”€β”€ server-availability.sh       # Check server availability
β”œβ”€β”€ client-create-nodes.sh       # Client task submission script
β”œβ”€β”€ available_servers.txt         # Server configuration
β”œβ”€β”€ zoo-base.cfg                 # Base ZooKeeper configuration
β”œβ”€β”€ zoo-base-source.cfg          # Source ZooKeeper configuration
β”œβ”€β”€ README.md                    # This file
└── [Java source files]          # Main implementation

πŸ” Fault Tolerance

  • Master Failure: Automatic re-election when master node fails
  • Worker Failure: Ephemeral znodes automatically removed on worker failure
  • Task Recovery: Master can reassign tasks from failed workers
  • Network Partitions: ZooKeeper handles split-brain scenarios

πŸ§ͺ Testing

  1. Start ZooKeeper cluster on multiple servers
  2. Launch multiple worker nodes across different machines
  3. Submit tasks using the client script
  4. Monitor task execution via ZooKeeper CLI or Java API
  5. Test fault tolerance by killing master or worker nodes

πŸ“ Key Algorithms

Master Task Assignment

// Pseudo-code
onTaskCreated() {
    taskQueue.add(newTask);
    assignTasksToIdleWorkers();
}

onIdleWorkerAvailable() {
    idleWorkerCache.update();
    assignTasksToIdleWorkers();
}

assignTasksToIdleWorkers() {
    while (!taskQueue.isEmpty() && !idleWorkerCache.isEmpty()) {
        task = taskQueue.poll();
        worker = idleWorkerCache.get();
        assignTask(worker, task);
        deleteIdleNode(worker);
    }
}

Worker Task Execution

// Pseudo-code
onIdleNodeDeleted() {
    taskId = getTaskIdFromWorkerNode();
    task = getTaskData(taskId);
    result = executeTask(task);
    writeResult(taskId, result);
    recreateIdleNode();
}

🀝 Contributing

This project was developed as part of COMP512 (Distributed Systems) coursework at McGill University.

Contributors:

  • Priyavrat Dev Sharma

πŸ“„ License

This project is part of academic coursework. Please refer to the license file for details.

πŸ™ Acknowledgments

  • McGill University, Department of Computer Science
  • Apache ZooKeeper project for excellent distributed coordination framework

πŸ“š Related Documentation

πŸ”— Links


Note: This framework is designed for educational and research purposes. For production use, ensure proper security configurations and error handling.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors