jmitrevs
Contributor

Description

This attempts to create a oneAPI accelerator backend based on #1246, leaving the standard oneAPI backend largely unchanged. The oneAPI Accelerator backend:

  • Utilizes sideband signals (sop and eop) in StreamingBeat for multi-kernel synchronization.
  • Adds a wrapper to the top-level function to make it an always-run kernel.
  • Introduces compile-time type extraction utilities for streamlined template handling.
  • Adds DMA-based data movement for generic execution.
  • Automatically generates the code with three communicating kernels.

Sideband Signal Support

  • Added start-of-packet (sop) and end-of-packet (eop) signals for kernel synchronization.
  • The following alias declaration is generated for each inter-kernel pipe and hostpipe, so that multiple kernels can operate in sync:
    using InputBeatT = sycl::ext::intel::experimental::StreamingBeat<
        data_T, // Data type
        true,   // Enable the packet signals (sop and eop)
        true>;  // Enable the empty signal
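
To illustrate how the sop and eop flags delimit a packet, here is a minimal sketch in plain C++. MockBeat, frame_packet, and count_packets are hypothetical names used only for this illustration; MockBeat stands in for StreamingBeat so that the example compiles without the oneAPI toolchain, and it is not the generated code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for StreamingBeat with packet signals enabled:
// one payload value plus the two sideband flags.
template <class T> struct MockBeat {
    T data;
    bool sop; // start-of-packet: set on the first beat only
    bool eop; // end-of-packet: set on the last beat only
};

// Frame a non-empty sequence of values as a single packet.
template <class T> std::vector<MockBeat<T>> frame_packet(const std::vector<T> &values) {
    assert(!values.empty());
    std::vector<MockBeat<T>> beats;
    beats.reserve(values.size());
    for (std::size_t i = 0; i < values.size(); ++i) {
        const bool first = (i == 0);
        const bool last = (i + 1 == values.size());
        beats.push_back(MockBeat<T>{values[i], first, last});
    }
    return beats;
}

// Count complete packets (an sop followed by a matching eop) in a beat stream.
template <class T> std::size_t count_packets(const std::vector<MockBeat<T>> &beats) {
    std::size_t packets = 0;
    bool in_packet = false;
    for (const auto &beat : beats) {
        if (beat.sop)
            in_packet = true;
        if (beat.eop && in_packet) {
            ++packets;
            in_packet = false;
        }
    }
    return packets;
}
```

A consumer that watches these two flags can tell where one input set ends and the next begins without knowing the packet length in advance, which is what lets independently launched kernels stay aligned.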

Utilizes a while loop for always-on kernel execution.

  • Uses sop/eop sideband signals for synchronization.
  • In io_parallel mode, the while loop is added to the main kernel code, myproject.cpp.
  • In io_stream mode, two wrapper layers, SidebandExtraction and SidebandMerging, are added to handle the sideband signals at the beginning and end of the loop. The standard layers remain unchanged and do not use sideband signals. The sidebands are passed directly from the SidebandExtraction layer to the SidebandMerging layer via a separate pipe. This adds some overhead, but the pipe payload is only 2 bits wide to keep that overhead small.
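
The io_stream arrangement described above can be sketched in plain C++ as follows. This is only an illustration under stated assumptions: std::queue stands in for the SYCL pipes, MockBeat, Sideband, sideband_extraction, scale_layer, and sideband_merging are hypothetical names, and the sketch assumes the layers in between produce exactly one output beat per input beat, which the actual layers need not do.

```cpp
#include <cassert>
#include <queue>

// Hypothetical stand-in for a StreamingBeat carrying sop/eop.
template <class T> struct MockBeat {
    T data;
    bool sop;
    bool eop;
};

// The 2-bit payload forwarded on the separate sideband pipe.
struct Sideband {
    bool sop;
    bool eop;
};

// Strip the flags off each beat: data goes to the standard layers,
// flags go to the bypass queue.
template <class T>
void sideband_extraction(std::queue<MockBeat<T>> &in, std::queue<T> &data_out,
                         std::queue<Sideband> &side_out) {
    while (!in.empty()) {
        const MockBeat<T> beat = in.front();
        in.pop();
        data_out.push(beat.data);
        side_out.push(Sideband{beat.sop, beat.eop});
    }
}

// Stand-in for the unchanged standard layers: they never see the sidebands.
template <class T> void scale_layer(std::queue<T> &in, std::queue<T> &out, T factor) {
    while (!in.empty()) {
        out.push(in.front() * factor);
        in.pop();
    }
}

// Re-attach the bypassed flags to the processed data.
template <class T>
void sideband_merging(std::queue<T> &data_in, std::queue<Sideband> &side_in,
                      std::queue<MockBeat<T>> &out) {
    assert(data_in.size() == side_in.size()); // one-to-one assumption of this sketch
    while (!data_in.empty()) {
        const Sideband flags = side_in.front();
        side_in.pop();
        out.push(MockBeat<T>{data_in.front(), flags.sop, flags.eop});
        data_in.pop();
    }
}
```

Keeping the flags on their own narrow channel is what allows the existing layer implementations to be reused without modification.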

Added DMA Kernels for Hardware Execution

  • DMA-based data movement for improved memory transfer:

    • DMA_convert_data and DMA_convert_data_back move data between host and FPGA efficiently.
    template <class srcType, class dest_pipe, size_t num_iterations> struct DMA_convert_data {};
    template <class src_pipe, class dstType, size_t num_iterations> struct DMA_convert_data_back {};
  • Modification to the way the testbench launches the kernels:

    q.single_task(DMA_convert_data<float, Conv1DInputPipe, num_iterations>{vals_ptr});
    q.single_task(Myproject{});
    q.single_task(DMA_convert_data_back<Layer4OutPipe, float, num_iterations>{output_ptr}).wait();
  • Modification of the Python bridge is under way (but not yet complete). It uses the same three kernels as above, including the DMA transfers. However, at the moment the bridge is still compiled for software emulation only, and each function call handles one input set at a time. Support will be added for calling the function with a sequence of inputs and for compiling the bridge to run on hardware.
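
The three-kernel data flow (DMA in, main kernel, DMA out) can be mimicked on the host with the following plain C++ sketch. Everything here is hypothetical and for illustration only: MockPipe is a queue-backed stand-in for a SYCL pipe, DmaToPipeSketch and DmaFromPipeSketch merely mirror the template parameter order of DMA_convert_data and DMA_convert_data_back, the packet_size member and the scalar float payload are assumptions, and MockProjectSketch processes a fixed number of beats rather than running forever.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>

template <class T> struct MockBeat {
    T data;
    bool sop;
    bool eop;
};

// Hypothetical stand-in for a SYCL pipe: one static FIFO per tag type.
template <class Tag, class T> struct MockPipe {
    static std::queue<T> &fifo() {
        static std::queue<T> q;
        return q;
    }
    static void write(const T &value) { fifo().push(value); }
    static T read() {
        assert(!fifo().empty());
        T value = fifo().front();
        fifo().pop();
        return value;
    }
};

struct InputPipeTag;
struct OutputPipeTag;
using MockInputPipe = MockPipe<InputPipeTag, MockBeat<float>>;
using MockOutputPipe = MockPipe<OutputPipeTag, MockBeat<float>>;

// Host memory -> pipe: each iteration is framed as one packet.
template <class srcType, class dest_pipe, std::size_t num_iterations> struct DmaToPipeSketch {
    const srcType *src;
    std::size_t packet_size; // values per iteration (assumption of this sketch)
    void operator()() const {
        for (std::size_t i = 0; i < num_iterations; ++i)
            for (std::size_t j = 0; j < packet_size; ++j)
                dest_pipe::write(MockBeat<float>{static_cast<float>(src[i * packet_size + j]),
                                                 j == 0, j + 1 == packet_size});
    }
};

// Stand-in for the main kernel: doubles each value and forwards the flags.
template <class in_pipe, class out_pipe> struct MockProjectSketch {
    std::size_t total_beats;
    void operator()() const {
        for (std::size_t i = 0; i < total_beats; ++i) {
            MockBeat<float> beat = in_pipe::read();
            beat.data *= 2.0f;
            out_pipe::write(beat);
        }
    }
};

// Pipe -> host memory.
template <class src_pipe, class dstType, std::size_t num_iterations> struct DmaFromPipeSketch {
    dstType *dst;
    std::size_t packet_size;
    void operator()() const {
        for (std::size_t i = 0; i < num_iterations; ++i)
            for (std::size_t j = 0; j < packet_size; ++j)
                dst[i * packet_size + j] = static_cast<dstType>(src_pipe::read().data);
    }
};
```

On the host the three functors are simply called one after another, which works because the mock FIFOs are unbounded; on hardware the corresponding kernels run concurrently and only the last one is waited on.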

Utility Functions for Compile-Time Type Extraction

  • Added helper structs to extract data types from pipes and StreamingBeat.
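
The helper structs themselves are not reproduced in this description. As a rough idea of what compile-time type extraction of this kind can look like, here is a minimal sketch using partial template specialization. All names (MockBeat, MockPipe, ExtractPipeType, ExtractDataType, PipeDataT) are hypothetical, and the mock types stand in for the SYCL pipe and StreamingBeat so that the example compiles without the oneAPI toolchain.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical stand-ins for the SYCL types.
template <class T, bool UsePackets, bool UseEmpty> struct MockBeat {
    T data;
    bool sop;
    bool eop;
};
template <class Name, class T, std::size_t MinCapacity = 0> struct MockPipe {};

// Extract the element type carried by a pipe.
// The primary template is left undefined so that non-pipe types fail to compile.
template <class Pipe> struct ExtractPipeType;
template <class Name, class T, std::size_t MinCapacity>
struct ExtractPipeType<MockPipe<Name, T, MinCapacity>> {
    using value_type = T;
};

// Extract the payload type from a beat; plain types pass through unchanged.
template <class T> struct ExtractDataType {
    using value_type = T;
};
template <class T, bool UsePackets, bool UseEmpty>
struct ExtractDataType<MockBeat<T, UsePackets, UseEmpty>> {
    using value_type = T;
};

// Convenience alias: pipe -> payload type, with or without a beat wrapper.
template <class Pipe>
using PipeDataT =
    typename ExtractDataType<typename ExtractPipeType<Pipe>::value_type>::value_type;

// Example pipes used to check the traits at compile time.
struct BeatPipeID;
struct PlainPipeID;
using BeatPipe = MockPipe<BeatPipeID, MockBeat<float, true, true>>;
using PlainPipe = MockPipe<PlainPipeID, int>;

static_assert(std::is_same<PipeDataT<BeatPipe>, float>::value,
              "payload type is recovered through the beat wrapper");
static_assert(std::is_same<PipeDataT<PlainPipe>, int>::value,
              "plain payloads pass through unchanged");
```

With traits like these, a templated kernel only needs the pipe type as a parameter and can derive the beat and payload types itself, which keeps the generated template argument lists short.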

Type of change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change which adds functionality)

Tests

Testing is still in progress and not yet automated. We should make sure we do not break other tests, though.

Checklist

  • I have read the guidelines for contributing.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation.
  • My changes generate no new warnings.
  • I have installed and run pre-commit on the files I edited or added.
  • I have added tests that prove my fix is effective or that my feature works.

jmitrevs added the "please test" label (Trigger testing by creating local PR branch) on Aug 26, 2025