BitNet Inference Web UI 🧠

A modern web interface for running Microsoft's BitNet models efficiently on CPU. This project provides a user-friendly way to download, manage, and run inference with 1-bit quantized language models.

[Screenshot: BitNet Inference UI]

🌟 Features

  • Easy Model Management

    • One-click downloads from Hugging Face
    • Direct model uploads (GGUF format)
    • Real-time download progress tracking
    • Quick access to popular models
  • Efficient Inference

    • CPU-optimized inference
    • Support for 1-bit quantized models
    • Conversation mode
    • Adjustable parameters (temperature, max tokens)
  • Modern UI/UX

    • Clean, responsive interface
    • Dark/Light theme support
    • Real-time status updates
    • System logs viewer
  • Technical Features

    • FastAPI backend
    • Async model downloads
    • Automatic fallback mechanisms
    • Progress monitoring system

🚀 Getting Started

Prerequisites

  • Python 3.8 or higher
  • pip package manager
  • CPU with AVX2 support (recommended)
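
To check whether your CPU exposes AVX2 before installing, here is a quick Linux-only sketch that parses /proc/cpuinfo (on macOS, `sysctl -a | grep -i avx2` gives the same answer):

```python
# Quick AVX2 check (Linux only: reads the CPU flag list from /proc/cpuinfo).
from pathlib import Path

flags = Path("/proc/cpuinfo").read_text()
print("AVX2 supported" if "avx2" in flags else "AVX2 not detected")
```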

Installation

  1. Clone the repository:
git clone https://github.com/mindscope-world/bitnet-inference.git
cd bitnet-inference
  2. Install dependencies:
pip install -r requirements.txt
  3. Run the application:
python app.py

The web interface will be available at http://localhost:8000
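
Once the server is running, a quick way to confirm it is reachable (assumes the `requests` package is installed):

```python
import requests

# The root path serves the web UI; a 200 status means the server is up.
resp = requests.get("http://localhost:8000")
print(resp.status_code)
```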

💻 Usage

Downloading Models

  1. Navigate to the "Download Model" tab
  2. Enter a model name or HuggingFace path (e.g., microsoft/BitNet-b1.58-2B-4T)
  3. Click "Download Model"
  4. Monitor the download progress in real-time
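
If you prefer to fetch weights outside the UI, the same models can be pulled with `huggingface_hub` directly. A minimal sketch using the model path from step 2 (the local directory is an assumption based on the project layout; adjust to taste):

```python
# Download a full model snapshot from Hugging Face into a local directory.
from huggingface_hub import snapshot_download

path = snapshot_download(
    repo_id="microsoft/BitNet-b1.58-2B-4T",          # model path from step 2
    local_dir="app/models/BitNet-b1.58-2B-4T",       # assumed layout; see Architecture below
)
print(f"Model files saved to {path}")
```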

Running Inference

  1. Ensure a model is loaded
  2. Enter your prompt in the text area
  3. Adjust generation parameters if needed:
    • Temperature (0.1 - 1.5)
    • Max Tokens (10 - 2048)
    • Conversation Mode (on/off)
  4. Click "Generate"
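
The same parameters can be sent to the backend programmatically. A hedged sketch with Python's `requests`; the `/generate` endpoint name and payload fields here are assumptions mirroring the UI controls above, not a documented API:

```python
import requests

# Hypothetical endpoint and field names, mirroring the UI controls above.
payload = {
    "prompt": "Explain 1-bit quantization in one sentence.",
    "temperature": 0.7,    # 0.1 - 1.5
    "max_tokens": 256,     # 10 - 2048
    "conversation": False,
}
resp = requests.post("http://localhost:8000/generate", json=payload)
print(resp.json())
```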

Model Compatibility

The application supports various BitNet models, including:

  • BitNet-b1.58-2B-4T
  • bitnet_b1_58-large
  • bitnet_b1_58-3B

🛠️ Technical Details

Architecture

bitnet-inference/
├── app/
│   ├── static/
│   │   ├── imgs/
│   │   ├── css/
│   │   └── js/
│   ├── templates/
│   └── models/
├── app.py
├── setup_env.py
├── simple_model_server.py
└── requirements.txt

Key Components

  • FastAPI Backend: Handles model management and inference requests
  • Async Downloads: Non-blocking model downloads with progress tracking (see the sketch after this list)
  • Fallback System: Automatic switching between optimized and standard inference
  • Theme System: Dynamic theme switching with system preference detection
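
As an illustration of the async-download pattern, here is a minimal FastAPI sketch that starts a download in the background and exposes its progress. It is illustrative only, not the project's actual code; the endpoint paths and the simulated download loop are assumptions:

```python
# Minimal sketch of non-blocking downloads with progress tracking.
# Illustrative only; endpoint paths and field names are assumptions.
import asyncio

from fastapi import BackgroundTasks, FastAPI

app = FastAPI()
progress = {}  # model name -> fraction complete (0.0 to 1.0)

async def download_model(name: str) -> None:
    # Stand-in for streaming file chunks from a remote host.
    for step in range(1, 101):
        await asyncio.sleep(0.1)
        progress[name] = step / 100

@app.post("/download/{name}")
async def start_download(name: str, background_tasks: BackgroundTasks):
    progress[name] = 0.0
    background_tasks.add_task(download_model, name)
    return {"status": "started", "model": name}

@app.get("/progress/{name}")
async def get_progress(name: str):
    return {"model": name, "progress": progress.get(name, 0.0)}
```

Because the download runs as a background task, the POST returns immediately and the UI can poll the progress endpoint without blocking other requests.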

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Contributors

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

📞 Support

For support, please open an issue in the GitHub repository or contact @mindscope-world.

🔮 Future Plans

  • Add batch processing support
  • Implement model fine-tuning interface
  • Add more visualization options
  • Support for custom quantization
  • API documentation interface
  • Docker deployment support

Made with ❤️ by @mindscope-world
