Enhance test and document for DPI and tweak trait name. #4380

Draft
wants to merge 2 commits into
base: main
1 change: 1 addition & 0 deletions docs/src/explanations.md
Expand Up @@ -41,3 +41,4 @@ read these documents in the following order:
* [Deep Dive into Legacy Connection Operators](explanations/connection-operators)
* [Properties](explanations/properties)
* [Layers](explanations/layers)
* [Calling Native Functions from Chisel (DPI)](explanations/dpi)
189 changes: 189 additions & 0 deletions docs/src/explanations/dpi.md
@@ -0,0 +1,189 @@
---
layout: docs
title: "Calling Native Functions from Chisel (DPI)"
section: "chisel3"
---

# Calling Native Functions from Chisel (DPI)

## DPI Basics

Chisel's DPI API allows you to integrate native code into your Chisel hardware designs. This enables you to leverage existing libraries or implement functionality that is difficult to express directly in Chisel.

Here's a simple example that demonstrates printing a message from a C++ function:
```c++
#include <iostream>

extern "C" void hello()
{
  std::cout << "hello from c++\n";
}
```

To call this function from Chisel, we need to define a corresponding DPI object.

```scala mdoc:silent
import chisel3._
import chisel3.util.circt.dpi._

object Hello extends DPIVoidFunctionImport {
  override val functionName = "hello"
  override val clocked = true
  final def apply() = super.call()
}

class HelloTest extends Module {
  Hello() // Print
}
```

Output Verilog:

```scala mdoc:verilog
chisel3.docs.emitSystemVerilog(new HelloTest)
```

Explanation:

* `Hello` extends `DPIVoidFunctionImport`, which declares a DPI function whose C++ return type is `void`.
* `functionName` specifies the C-linkage name of the C++ function.
* `clocked = true` means the function is called at the positive edge of the clock.
* We define an `apply` method for function-like syntax, which lets us call the DPI function as `Hello()`.

## Type ABI

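Values that cross the DPI boundary follow a fixed ABI, unlike types in the normal Chisel compilation flow. This section describes how Chisel types are lowered to the SystemVerilog types that the native function sees.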
Contributor

I'm not sure what this means. Could you elaborate?

Contributor

It may also be useful to explicitly point out that what we're doing in this section is showing how Chisel types map to C types.


### Argument Types

* Operand and result types must be passive.
Contributor

If we have another document that explains what "passive" means, we should link to it here. This is a rather technical term, and will otherwise cause confusion.

* A `Vec` is lowered to an *unpacked* *open* array type, e.g., `a: Vec<4, UInt>` to `byte a []`.
* A `Bundle` is lowered to a packed struct.
* `UInt`, `SInt`, `Clock`, and `Reset` types are lowered to 2-state types.
  Small integer types (up to 64 bits) must be compatible with C types and are passed by value. Users must use one of the specific small integer widths shown in the table below. Larger integers are lowered to `bit` vectors and passed by reference.


| Width | Verilog Type | Argument Passing Modes |
| ----- | ------------ | ---------------------- |
| 1 | bit | value |
| 8 | byte | value |
| 16 | shortint | value |
| 32 | int | value |
| 64 | longint | value |
| > 64 | bit [w-1:0] | reference |
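
As a rough sketch of how the table plays out on the C++ side, the function below takes one argument for each row. The name `widths_demo`, its parameters, and the check helper are hypothetical, and the two `typedef`s are stand-ins so that the snippet compiles without a simulator; in a real build these types come from the simulator's `svdpi.h`.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for the svdpi.h typedefs (assumption: a real build includes
// <svdpi.h> from the simulator instead of declaring these by hand).
typedef uint8_t  svBit;        // 1-bit 2-state value
typedef uint32_t svBitVecVal;  // one 32-bit chunk of a packed bit vector

// Hypothetical DPI function with one input per row of the table.
// Widths up to 64 bits arrive by value; the 128-bit input arrives by
// reference as an array of 32-bit chunks, least significant chunk first.
extern "C" void widths_demo(svBit flag,              // bit
                            char b,                  // byte
                            short s,                 // shortint
                            int i,                   // int
                            long long l,             // longint
                            const svBitVecVal* wide, // bit [127:0]
                            long long* result)       // output, passed last
{
    long long total = b + s + i + l + wide[0];
    *result = flag ? total : 0;
}

// Host-side check that can be called from any test harness.
inline void check_widths_demo() {
    const svBitVecVal wide[4] = {5, 0, 0, 0};
    long long result = -1;
    widths_demo(1, 1, 2, 3, 4, wide, &result);
    assert(result == 15);
}
```

Only the lowest chunk of the wide argument is read here to keep the sketch short; a real function would walk all `w / 32` chunks (rounded up).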

### Function Types
The type of DPI object you need depends on the characteristics of your C++ function:
Contributor

The type of DPI object I need for what?

Contributor

Reading ahead, I think this section would be better typed out in prose, rather than summarized as a bulleted list.

"There are several base classes which are used to define DPI functions in Chisel. Which one you use depends on two factors: the return type of the function in C++ and how the DPI is called in relation to the clock."

Then outline the information in the bullet points in two paragraphs after that.


* Return type:
  * For functions that don't return a value (like `hello`), use `DPIVoidFunctionImport`.
  * For functions with a return type (e.g., integer addition), use `DPINonVoidFunctionImport[T]`, where `T` is the return type.
  * The output argument must be the last argument of the DPI function.
* Clocked vs. unclocked:
  * Clocked: The function call is evaluated at the associated clock's positive edge. By default, Chisel uses the current module's clock. For custom clocks, use `withClock(clock) {..}`. If the function has a return value, it becomes available in the next clock cycle.
  * Unclocked: The function call is evaluated immediately.
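
The difference in timing between the two call styles can be illustrated with a toy C++ model. This is only a sketch of the observable behavior, not how a simulator implements DPI; `AddModel`, `step`, and `check_add_model` are hypothetical names, and the input values are illustrative.

```cpp
#include <cassert>

// Native function in the DPI shape: inputs first, output argument last.
extern "C" void add(int lhs, int rhs, int* result) { *result = lhs + rhs; }

// Toy model of the two call styles (hypothetical, not simulator code).
struct AddModel {
    int a = 0, b = 0;     // current input values
    int clocked_reg = 0;  // result captured at the last clock edge

    // Unclocked: evaluated from the current inputs every time it is read.
    int unclocked() const { int r = 0; add(a, b, &r); return r; }

    // Clocked: only the value latched at the last edge is visible.
    int clocked() const { return clocked_reg; }

    // Positive clock edge: call the function and latch its result.
    void step() { add(a, b, &clocked_reg); }
};

inline void check_add_model() {
    AddModel m;
    m.a = 24; m.b = 36;
    assert(m.unclocked() == 60);  // visible immediately
    m.step();                     // the clock edge latches 60
    m.a = 24; m.b = 12;
    assert(m.clocked() == 60);    // still the value from the previous edge
    assert(m.unclocked() == 36);  // already follows the new inputs
}
```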

## Example: Adding Two Numbers
Here's an example of a DPI function that calculates the sum of two numbers:

```c++
extern "C" void add(int lhs, int rhs, int* result)
Contributor

It surprised me to see here that the int* result is an out-parameter. Is that always the case? (If so, maybe it needs to be called out earlier. Or maybe I just missed it?)

Contributor Author

Thanks, I thought I clarified that but apparently it was dropped somewhere. I'll add explanation for that.

{
  *result = lhs + rhs;
}
```
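
Because the result is delivered through the trailing pointer argument rather than a C++ return value, it can be convenient to wrap the function when exercising it on the host. The wrapper below is a minimal sketch; `call_add` is a hypothetical helper and is not part of the Chisel API or of `svdpi.h`.

```cpp
#include <cassert>

// Same signature as in the example: the output is the last argument.
extern "C" void add(int lhs, int rhs, int* result)
{
    *result = lhs + rhs;
}

// Hypothetical host-side wrapper that turns the out-parameter into a
// return value, which makes unit tests read naturally.
inline int call_add(int lhs, int rhs)
{
    int out = 0;
    add(lhs, rhs, &out);
    return out;
}

inline void check_call_add()
{
    assert(call_add(24, 36) == 60);
}
```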

```scala mdoc:silent
trait AddBase extends DPINonVoidFunctionImport[UInt] {
  override val functionName = "add"
  override val ret = UInt(32.W)
  override val inputNames = Some(Seq("lhs", "rhs"))
  override val outputName = Some("result")
  final def apply(lhs: UInt, rhs: UInt): UInt = super.call(lhs, rhs)
}

object AddClocked extends AddBase {
  override val clocked = true
}

object AddUnclocked extends AddBase {
  override val clocked = false
}

class AddTest extends Module {
  val io = IO(new Bundle {
    val a = Input(UInt(32.W))
    val b = Input(UInt(32.W))
    val c = Output(UInt(32.W))
    val d = Output(UInt(32.W))
    val en = Input(Bool())
  })

  // Call DPI only when `en` is true.
  when (io.en) {
    io.c := AddClocked(io.a, io.b)
    io.d := AddUnclocked(io.a, io.b)
  } .otherwise {
    io.c := 0.U(32.W)
    io.d := 0.U(32.W)
  }
}
```

```scala mdoc:verilog
chisel3.docs.emitSystemVerilog(new AddTest)
```


Explanation:

* `AddBase` extends `DPINonVoidFunctionImport[UInt]` because `add` returns a 32-bit unsigned integer.
* `ret` specifies the return type.
* `clocked` selects whether the call is clocked: `AddClocked` sets it to `true` and `AddUnclocked` sets it to `false`.
Contributor

but this is set to false so isn't NOT a clocked function call?

* `inputNames` and `outputName` provide optional names for the function's arguments and return value (these are just for Verilog readability).

## Example: Sum of an array
Chisel vectors are converted into SystemVerilog open arrays when used with DPI. Since memory layout can vary between simulators, access array elements through `svSize` and `svGetBitArrElemVecVal`, which are declared in `svdpi.h`, rather than reading the underlying memory directly.

```c++
extern "C" void sum(const svOpenArrayHandle array, int* result) {
Contributor

What is svOpenArrayHandle?

I guess, more generally, what libraries does Chisel depend on in the C++ DPI? We can't simply assume that a general Chisel user will have whatever library exposes DPI.h or whatever it is.

It would be good to have a section at the very start that gives the C++ prerequisites.

Contributor Author

Thank you for the suggestion, that makes sense. The SV spec (Section 35) defines svdpi.h, which declares these functions and types (including svOpenArrayHandle). These are expected to be implemented by simulators. I'll create a section for that.

  // Get the length of the open array.
  int size = svSize(array, 1);
  // Initialize the result value.
  *result = 0;
  // Use an int index so the comparison with `size` is not signed/unsigned.
  for (int i = 0; i < size; ++i) {
    svBitVecVal vec;
    svGetBitArrElemVecVal(&vec, array, i);
    *result += vec;
  }
}
```
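
To try the `sum` logic on the host without a simulator, one option is to mock the two `svdpi.h` calls it uses. The sketch below does that with a `std::vector`; the mock bodies and the `sum_of` helper are hypothetical, and the `typedef`s are stand-ins for the definitions in `svdpi.h`. A real build must take all of these from the simulator instead of redefining them.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-ins for svdpi.h (assumption: only for host-side testing).
typedef uint32_t svBitVecVal;
typedef void* svOpenArrayHandle;

// Mock: the handle points at a std::vector<svBitVecVal>.
static int svSize(const svOpenArrayHandle h, int /*dimension*/) {
    return static_cast<int>(
        static_cast<const std::vector<svBitVecVal>*>(h)->size());
}
static void svGetBitArrElemVecVal(svBitVecVal* d, const svOpenArrayHandle h,
                                  int index) {
    *d = (*static_cast<const std::vector<svBitVecVal>*>(h))[index];
}

// The function under test, written against the open array API only.
extern "C" void sum(const svOpenArrayHandle array, int* result) {
    int size = svSize(array, 1);
    *result = 0;
    for (int i = 0; i < size; ++i) {
        svBitVecVal element;
        svGetBitArrElemVecVal(&element, array, i);
        *result += element;
    }
}

// Hypothetical helper: wraps a vector as a handle and returns the sum.
inline int sum_of(std::vector<svBitVecVal> values) {
    int result = 0;
    sum(&values, &result);
    return result;
}

inline void check_sum() {
    assert(sum_of({24, 36, 24}) == 84);
    assert(sum_of({24, 12, 24}) == 60);
}
```

Because `sum` touches the array only through `svSize` and `svGetBitArrElemVecVal`, the same function body works unchanged when the real simulator implementations are linked in.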

```scala mdoc:silent

object Sum extends DPINonVoidFunctionImport[UInt] {
  override val functionName = "sum"
  override val ret = UInt(32.W)
  override val clocked = false
Contributor

should clocked be true?

Contributor Author

For the open array example both clocked and unclocked calls work, so I didn't pick one explicitly in the description. It depends on whether the user wants the value evaluated whenever an input changes or at the clock's posedge.
Here is an example for Add:

dpi.io.add_clocked_result.peek()
dpi.io.add_clocked_result.expect(60)
dpi.io.add_unclocked_result.peek()
dpi.io.add_unclocked_result.expect(36)

  override val inputNames = Some(Seq("array"))
  override val outputName = Some("result")
  final def apply(array: Vec[UInt]): UInt = super.call(array)
}

class SumTest extends Module {
  val io = IO(new Bundle {
    val a = Input(Vec(3, UInt(32.W)))
    val sum_a = Output(UInt(32.W))
  })

  io.sum_a := Sum(io.a) // compute a[0] + a[1] + a[2]
}
```

```scala mdoc:verilog
chisel3.docs.emitSystemVerilog(new SumTest)
```

# FAQ

Contributor

can you show an example where having clocked = false makes sense? Is that done in a when context...?

Contributor Author

Unclocked calls are useful when the user wants to replace pure (but expensive) combinational logic with a DPI call (Add is actually a good example of such a use case).
An unclocked call under when(cond) is lowered into always_comb + if(cond), so the DPI function is invoked conditionally as well. I updated the Add example to include both of them.

class AddTest extends Module {
  val io = IO(new Bundle {
    val a = Input(UInt(32.W))
    val b = Input(UInt(32.W))
    val c = Output(UInt(32.W))
    val d = Output(UInt(32.W))
    val en = Input(Bool())
  })

  // Call DPI only when `en` is true.
  when (io.en) {
    io.c := AddClocked(io.a, io.b)
    io.d := AddUnclocked(io.a, io.b)
  } .otherwise {
    io.c := 0.U(32.W)
    io.d := 0.U(32.W)
  }
}
module AddTest(
  input         clock,
                reset,
  input  [31:0] io_a,
                io_b,
  output [31:0] io_c,
                io_d,
  input         io_en
);

  logic [31:0] _add_0;
  reg   [31:0] _GEN;
  always @(posedge clock) begin
    if (io_en) begin
      add(io_a, io_b, _add_0);
      _GEN <= _add_0;
    end
  end // always @(posedge)
  reg   [31:0] _GEN_0;
  always_comb begin
    if (io_en) begin
      add(io_a, io_b, _GEN_0);
    end
    else
      _GEN_0 = 32'bx;
  end // always_comb
  assign io_c = io_en ? _GEN : 32'h0;
  assign io_d = io_en ? _GEN_0 : 32'h0;

* Can we export functions? -- No, not currently. Consider using a black box for such functionality.
Contributor

What do you mean by "export" here?

* Can we call a DPI function in an initial block? -- No, not currently. Consider using a black box for initialization.
* Can we make two clocked DPI calls and pass the result of one to the other within the same clock? -- No, not currently. Please merge the DPI functions into a single function.
Contributor

I'm not sure what this last point means.

It feels like what we actually want is a light explanation of the underlying "framework" for the DPI calls, and that the above comment is actually a restriction imposed on this framework.

For example, is this true?: "All DPI calls on the same clock are dispatched at the same time. Because of this, it is not possible to use the result of one DPI call as an argument to another."

Contributor Author

All DPI calls on the same clock are dispatched at the same time. Because of this, it is not possible to use the result of one DPI call as an argument to another

Yes, that's correct. This is a fundamental restriction for side-effecting operations in Chisel, but DPI is the first example that has both side effects and results.
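
The workaround suggested in the FAQ, merging the DPI functions into one, could look roughly like this on the C++ side. The functions `scale` and `add_then_scale` are hypothetical and only illustrate the idea: the chaining moves into native code so that the hardware side makes a single DPI call.

```cpp
#include <cassert>

// Two native functions. Calling both as clocked DPI imports and feeding
// the first result into the second within one clock is not supported.
extern "C" void add(int lhs, int rhs, int* result) { *result = lhs + rhs; }
extern "C" void scale(int value, int factor, int* result) {
    *result = value * factor;
}

// Merged function (hypothetical): the intermediate value stays in C++,
// so Chisel only needs to import and call this one function.
extern "C" void add_then_scale(int lhs, int rhs, int factor, int* result) {
    int sum = 0;
    add(lhs, rhs, &sum);
    scale(sum, factor, result);
}

inline void check_add_then_scale() {
    int out = 0;
    add_then_scale(24, 36, 2, &out);
    assert(out == 120);
}
```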

37 changes: 34 additions & 3 deletions src/main/scala/chisel3/util/circt/DPI.scala
Expand Up @@ -119,8 +119,25 @@ trait DPIFunctionImport {
def inputNames: Option[Seq[String]] = None
}

// Base trait for a non-void function that returns `T`.
trait DPINonVoidFunctionImport[T <: Data] extends DPIFunctionImport {

/** Base trait for a non-void function that returns `T`.
*
* @tparam T Return type
* @see Please refer to [[https://www.chisel-lang.org/docs/explanations/dpi]] for more details.
* @example {{{
* object Add extends DPINonVoidFunctionImport[UInt] {
* override val functionName = "add"
* override val ret = UInt(32.W)
* override val clocked = false
* override val inputNames = Some(Seq("lhs", "rhs"))
* override val outputName = Some("result")
* final def apply(lhs: UInt, rhs: UInt): UInt = super.call(lhs, rhs)
* }
*
* Add(a, b) // call a native `add` function.
* }}}
*/
def ret: T
def clocked: Boolean
def outputName: Option[String] = None
Expand All @@ -133,8 +150,22 @@ trait DPINonVoidFunctionImport[T <: Data] extends DPIFunctionImport {
final def call(data: Data*): T = callWithEnable(true.B, data: _*)
}

// Base trait for a clocked void function.
trait DPIClockedVoidFunctionImport extends DPIFunctionImport {
trait DPIVoidFunctionImport extends DPIFunctionImport {

/** Base trait for a void function.
*
* @see Please refer to [[https://www.chisel-lang.org/docs/explanations/dpi]] for more details.
* @example {{{
* object Hello extends DPIVoidFunctionImport {
* override val functionName = "hello"
* override val clocked = true
* final def apply() = super.call()
* }
*
* Hello() // call a native `hello` function.
* }}}
*/
def clocked: Boolean
final def callWithEnable(enable: Bool, data: Data*): Unit =
RawClockedVoidFunctionCall(functionName, inputNames)(Module.clock, enable, data: _*)
final def call(data: Data*): Unit = callWithEnable(true.B, data: _*)
Expand Down
31 changes: 30 additions & 1 deletion src/test/scala/chiselTests/DPISpec.scala
Expand Up @@ -17,6 +17,7 @@ private object EmitDPIImplementation {
val dpiImpl = s"""
|#include <stdint.h>
|#include <iostream>
|#include <svdpi.h>
Contributor

To reiterate an earlier point: Let's add a prerequisites section near the top. Let's make sure to name what library(ies?) are needed for Chisel DPI, and let's be sure to include the svdpi.h header in our examples, since otherwise, the code doesn't work.

Contributor

Let's also link to the documentation so that Chisel users can quickly cross-reference the DPI primitives, such as svBitVecVal or svOpenArrayHandle.

|
|extern "C" void hello()
|{
Expand All @@ -27,6 +28,16 @@ private object EmitDPIImplementation {
|{
| *result = lhs + rhs;
|}
|
|extern "C" void sum(const svOpenArrayHandle array, int* result) {
| int size = svSize(array, 1);
| *result = 0;
| for(size_t i = 0; i < size; ++i) {
| svBitVecVal vec;
| svGetBitArrElemVecVal(&vec, array, i);
| *result += vec;
| }
|}
""".stripMargin

class DummyDPI extends BlackBox with HasBlackBoxInline {
Expand Down Expand Up @@ -62,8 +73,9 @@ class DPIIntrinsicTest extends Module {
io.add_unclocked_result := result_unclocked
}

object Hello extends DPIClockedVoidFunctionImport {
object Hello extends DPIVoidFunctionImport {
override val functionName = "hello"
override val clocked = true
final def apply() = super.call()
}

Expand All @@ -85,22 +97,35 @@ object AddUnclocked extends DPINonVoidFunctionImport[UInt] {
final def apply(lhs: UInt, rhs: UInt): UInt = super.call(lhs, rhs)
}

object Sum extends DPINonVoidFunctionImport[UInt] {
override val functionName = "sum"
override val ret = UInt(32.W)
override val clocked = false
override val inputNames = Some(Seq("array"))
override val outputName = Some("result")
final def apply(array: Vec[UInt]): UInt = super.call(array)
}

class DPIAPITest extends Module {
val io = IO(new Bundle {
val a = Input(UInt(32.W))
val b = Input(UInt(32.W))
val add_clocked_result = Output(UInt(32.W))
val add_unclocked_result = Output(UInt(32.W))
val sum_result = Output(UInt(32.W))
})

EmitDPIImplementation()

Hello()

val result_clocked = AddClocked(io.a, io.b)
val result_unclocked = AddUnclocked(io.a, io.b)
val result_sum = Sum(VecInit(Seq(io.a, io.b, io.a)))

io.add_clocked_result := result_clocked
io.add_unclocked_result := result_unclocked
io.sum_result := result_sum
}

class DPISpec extends AnyFunSpec with Matchers {
Expand Down Expand Up @@ -151,6 +176,8 @@ class DPISpec extends AnyFunSpec with Matchers {
dpi.io.b.poke(36.U)
dpi.io.add_unclocked_result.peek()
dpi.io.add_unclocked_result.expect(60)
dpi.io.sum_result.peek()
dpi.io.sum_result.expect(84)

dpi.clock.step()
dpi.io.a.poke(24.U)
Expand All @@ -159,6 +186,8 @@ class DPISpec extends AnyFunSpec with Matchers {
dpi.io.add_clocked_result.expect(60)
dpi.io.add_unclocked_result.peek()
dpi.io.add_unclocked_result.expect(36)
dpi.io.sum_result.peek()
dpi.io.sum_result.expect(60)
}
.result

Expand Down