Skip to content
/ UDB Public
forked from russlank/UDB

Light-weigh database engine C++ library (b-tree indexing engine with dynamic heap-structured data records' storage implementation)

Notifications You must be signed in to change notification settings

digixoil/UDB

Β 
Β 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

7 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

UDB - Ultra Database Engine

A lightweight, high-performance B-Tree indexing engine with dynamic heap-structured record storage implementation.

πŸ“‹ Overview

UDB (Ultra Database) is a modernized version of a classic database indexing library, originally developed for DOS/Windows 3.1 and now updated for Visual Studio 2022/2025 and modern C++20.

This project serves as both a practical database library and an educational resource for understanding:

  • B-Tree and B+ Tree data structures
  • File-based indexing algorithms
  • Heap file storage management
  • Space reclamation techniques

Key Features

  • B-Tree Indexing: Efficient key-based access with O(log n) search, insert, and delete operations
  • Multiple Indexes: Support for multiple indexes per file with different key types
  • Heap File Storage: Dynamic variable-length record storage with space reclamation
  • Data Integrity: XOR checksums on all data structures for corruption detection
  • Multiple Key Types: Strings, integers, blocks, characters, and logical values
  • Sequential Access: Efficient cursor-based navigation (first, next, previous)
  • Thread Safety: Recursive mutex protection for concurrent access

πŸ§ͺ Build and Test Status

Component Status
Build βœ… Compiles successfully
Unit Tests βœ… 45/45 tests passing (100%)
File I/O βœ… All tests passing
B-Tree Navigation βœ… Working correctly
B-Tree Find/Insert/Delete βœ… All operations working correctly
Thread Safety βœ… Concurrent read/write tests passing
Persistence βœ… Data persists correctly across file reopen

Recent Fixes (v2.1.0)

The following issues have been resolved:

  1. Binary Search Bug: Fixed binarySearchNode() to correctly return position numItems+1 when the search key is greater than all existing keys in a node. Previously, it incorrectly returned the last position, causing keys to be inserted at wrong positions during node splits.

  2. UNIQUE Attribute Check: Fixed the UNIQUE constraint check in append() which incorrectly used bitwise OR (|) instead of AND (&), causing all indexes to behave as if they had the UNIQUE attribute.

  3. Node Split Logic: The node split and balance operations now correctly maintain the B-tree invariants after the binary search fix.

πŸš€ Quick Start

Building the Project

  1. Open UDBE.sln in Visual Studio 2022/2025
  2. Select your configuration (Debug/Release, x64)
  3. Restore NuGet packages (right-click solution β†’ "Restore NuGet Packages")
  4. Build the solution (F7 or Build β†’ Build Solution)

Running Tests

The project uses Google Test for unit testing:

# Build and run tests
msbuild UDBE.sln /p:Configuration=Debug /p:Platform=x64
.\bin\x64\Debug\UDBETests.exe

Expected output:

[==========] 45 tests from 2 test cases ran.
[  PASSED  ] 45 tests.

Basic Usage

#include "udb.h"
using namespace udb;

// Create a new index file with one index
MultiIndex index("mydata.ndx", 1);

// Initialize the index for string keys
index.setActiveIndex(1);
index.initIndex(
    KeyType::STRING,              // Key type
    50,                           // Key size (bytes)
    IndexAttribute::ALLOW_DELETE, // Attributes
    5,                            // Keys per node (B-Tree order)
    100,                          // Pre-allocate nodes
    100                           // Pre-allocate leaves
);

// Add some records
index.append("Alice", 1000);   // Key "Alice", data position 1000
index.append("Bob", 2000);
index.append("Charlie", 3000);

// Find a record
int64_t pos = index.find("Bob");
if (pos >= 0) {
    std::cout << "Found Bob at position " << pos << std::endl;
}

// Iterate through all records
char key[51];
pos = index.getFirst(key);
while (pos >= 0 && !index.isEOF()) {
    std::cout << "Key: " << key << " Position: " << pos << std::endl;
    pos = index.getNext(key);
}

πŸ“ Project Structure

UDB/
β”œβ”€β”€ UDBE/                    # Main library project
β”‚   β”œβ”€β”€ udb.h                # Main include file (include this in your projects)
β”‚   β”œβ”€β”€ udb_common.h         # Common definitions, types, and utilities
β”‚   β”œβ”€β”€ udb_file.h/cpp       # File I/O abstraction layer
β”‚   β”œβ”€β”€ udb_heap.h/cpp       # Heap file management (space allocation)
β”‚   β”œβ”€β”€ udb_btree.h/cpp      # B-Tree index implementation
β”‚   └── udb_sync.h           # Thread synchronization primitives
β”œβ”€β”€ UDBEApp/                 # Interactive test application
β”‚   └── dbe.cpp              # Command-line test interface
β”œβ”€β”€ UDBETests/               # Unit tests (Google Test)
β”‚   β”œβ”€β”€ test_btree.cpp       # B-Tree tests (33 tests)
β”‚   β”œβ”€β”€ test_file.cpp        # File I/O tests (12 tests)
β”‚   β”œβ”€β”€ test_heap.cpp        # Heap file tests
β”‚   └── test_main.cpp        # Test entry point
β”œβ”€β”€ doc/                     # Documentation
β”‚   β”œβ”€β”€ btree.md             # B-Tree algorithm explanation
β”‚   β”œβ”€β”€ heap-file.md         # Heap file structure explanation
β”‚   └── architecture.md      # Architecture overview
β”œβ”€β”€ UDBE.sln                 # Visual Studio solution
β”œβ”€β”€ README.md                # This file
└── TODO.md                  # Future development roadmap

Source File Descriptions

File Purpose
udb.h Main header - includes all other headers, provides version info
udb_common.h Error codes, key types, constants, utility functions, exceptions
udb_file.h/cpp Platform-independent file I/O with thread safety
udb_heap.h/cpp Heap file for variable-length record storage with holes management
udb_btree.h/cpp B+ Tree index implementation with multi-index support
udb_sync.h Thread synchronization primitives (mutexes, locks)

πŸ”§ Configuration Options

Key Types

Type Description Size Comparison
STRING Null-terminated string Variable (up to keySize) strcmp
INTEGER 16-bit signed integer 2 bytes Numeric
LONG_INT 32-bit signed integer 4 bytes Numeric
BLOCK Raw byte block Variable Byte-by-byte (MSB first)
NUM_BLOCK Numeric block (big-endian) Variable Byte-by-byte (LSB first)
CHARACTER Single character 1 byte ASCII
LOGICAL Boolean value 1 byte false < true

Index Attributes

Attribute Description
NONE No special attributes
UNIQUE Keys must be unique (duplicates rejected)
ALLOW_DELETE Enable key deletion with node rebalancing

Note: ALLOW_DELETE affects tree structure. When enabled, nodes are balanced during insert/delete operations, resulting in more predictable performance but slightly higher overhead.

πŸ“š API Reference

MultiIndex Class

The main class for B-Tree index operations.

Construction

// Create new file with N indexes
MultiIndex(const std::string& filename, uint16_t numIndexes);

// Open existing file
explicit MultiIndex(const std::string& filename);

Index Management

void setActiveIndex(uint16_t indexNo);    // Set active index (1-based)
uint16_t getActiveIndex() const;          // Get active index number
uint16_t getNumIndexes() const;           // Get total number of indexes
void initIndex(KeyType keyType,           // Initialize an index
               uint16_t keySize,
               IndexAttribute attributes,
               uint16_t numItems,         // B-Tree order (keys per node)
               int64_t freeCreateNodes,   // Nodes to pre-allocate
               int64_t freeCreateLeaves); // Leaves to pre-allocate

Key Operations

bool append(const void* key, int64_t dataPos);  // Add key-data pair
int64_t find(const void* key);                   // Find key, returns data position
bool deleteKey(const void* key);                 // Delete all entries with key
int64_t deleteCurrent();                         // Delete current entry

Navigation

int64_t getFirst(void* key = nullptr);   // Move to first entry
int64_t getNext(void* key = nullptr);    // Move to next entry
int64_t getPrev(void* key = nullptr);    // Move to previous entry
int64_t getCurrent(void* key = nullptr); // Get current entry
bool isEOF() const;                       // At end of index?
bool isBOF() const;                       // At beginning of index?

Error Handling

ErrorCode getError() const;   // Get current error code
bool hasError() const;        // Check if there's an error
void clearError();            // Clear error state

HeapFile Class

For variable-length record storage with space reclamation.

// Create new heap file
HeapFile(const std::string& filename, uint16_t holesTableSize);

// Open existing heap file
explicit HeapFile(const std::string& filename);

// Allocate space for a record
int64_t allocateSpace(size_t size);

// Free space occupied by a record
void freeSpace(int64_t position, size_t size);

πŸ§ͺ Running the Test Application

The test application (UDBEApp) provides an interactive interface:

> C                # Create new index file
> A Hello          # Append key "Hello"
> A World          # Append key "World"
> F Hello          # Find key "Hello"
> L                # List all keys
> .                # Go to first key
> +                # Next key
> -                # Previous key
> T                # Show current key
> R                # Remove current
> D Hello          # Delete key "Hello"
> N 1000           # Fill with 1000 sequential keys
> S                # Show statistics
> C 2              # Switch to index 2
> X                # Exit

πŸ“– Documentation

Detailed documentation is available in the doc/ directory:

  • B-Tree Algorithm - Comprehensive explanation of B-Tree/B+ Tree structure, operations (search, insert, delete), node splitting/merging, and performance characteristics
  • Heap File Structure - How variable-length records are stored, holes tables, space allocation/deallocation strategies
  • Architecture Overview - Library design, class hierarchy, error handling, thread safety model

πŸŽ“ Educational Value

This project is designed to be educational. Key learning points:

B-Tree Concepts

  • Why B-Trees are optimal for disk-based storage
  • Difference between B-Trees and B+ Trees
  • Node splitting and merging algorithms
  • How sequential access is optimized via leaf linking

File Organization

  • How to manage variable-length records
  • Space reclamation using holes tables
  • Checksum-based data integrity verification
  • File structure design for efficient I/O

Software Engineering

  • Modern C++ practices (RAII, smart pointers, exceptions)
  • Separation of concerns in class design
  • How to migrate legacy code to modern standards
  • Thread-safe design with recursive mutexes

πŸ”„ Migration from Legacy Code

This is a modernized version of classic DOS/Windows 3.1 database code. Key changes:

Aspect Original (DOS) Modern (C++20)
Memory farmalloc/farfree std::vector, RAII
Files DOS handles (_open, _read) std::fstream
Types int, long int64_t, uint16_t
Errors Global error codes Exceptions + error codes
Threading None std::recursive_mutex protection
Headers Custom header files #pragma once, include guards
Tests Manual testing Google Test framework

⚠️ Limitations

  1. Not for production use: This is an educational/reference implementation
  2. No transaction support: No ACID guarantees, no crash recovery
  3. Basic thread safety: Coarse-grained locking, not optimized for high concurrency
  4. No compaction: Heap file fragmentation accumulates over time
  5. Fixed endianness: Assumes little-endian (x86/x64)

πŸ› οΈ Completed Features

  • Unit test suite (Google Test) - 45 tests
  • Thread safety (recursive mutexes)
  • B-tree find() after node splits - FIXED
  • UNIQUE attribute constraint - FIXED
  • Node split and balance operations - FIXED

πŸ“œ License

This project is provided as-is for educational and reference purposes.

🀝 Contributing

Contributions are welcome! Please ensure:

  • Code follows the existing style and documentation patterns
  • All changes compile without warnings
  • All 45 tests pass (run the test suite)
  • Documentation is updated as needed

See TODO.md for the development roadmap and areas where contributions are welcome.

πŸ“ž Support

For questions or issues, please open an issue on the project repository.

About

Light-weigh database engine C++ library (b-tree indexing engine with dynamic heap-structured data records' storage implementation)

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • C++ 100.0%