Add TT-Mesh Programming example demonstrating MeshTrace and Multi-MeshCQ #18128
base: main
// in the Virtual Mesh
SubDevice sub_device_1(std::array{CoreRangeSet(CoreRange({0, 0}, {0, 0}))});
SubDevice sub_device_2(std::array{CoreRangeSet(CoreRange({1, 1}, {1, 1}))});
auto sub_device_manager = mesh_device->create_sub_device_manager({sub_device_1, sub_device_2}, 3200);
what is 3200 here?
Can someone explain why the developer needs to call this sequence?
mesh_device->create_sub_device_manager
mesh_device->load_sub_device_manager
// =========== Step 3: Create Workloads to run on the Virtual Mesh ===========
// Specify Device Ranges on which the Workloads will run
LogicalDeviceRange all_devices({0, 0}, {mesh_device->num_cols() - 1, mesh_device->num_rows() - 1});
I assume {0, 0} is a Device Coordinate in the mesh. Correct?
DeviceRange vs LocalDeviceRange - why do we need to be explicit about "Local"?
s/Local/Logical?
Fwiw, I am about to nuke this in favor of ND ranges. New terminology is MeshCoordinate, MeshCoordinateRange, MeshShape. There is a constructor to create a range that spans the entire shape - it can be used here.
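As a rough illustration of the "range that spans the entire shape" idea, here is a minimal sketch. The types Shape2D, Coord2D and CoordRange2D are hypothetical stand-ins written for this example, not the MeshShape / MeshCoordinate / MeshCoordinateRange classes mentioned above, and the (row, col) ordering is an assumption.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for illustration only; not the TT-Mesh types.
struct Shape2D {
    std::size_t rows;
    std::size_t cols;
};

struct Coord2D {
    std::size_t row;
    std::size_t col;
};

// Inclusive rectangular range of coordinates.
class CoordRange2D {
public:
    // Explicit corners, as in the LogicalDeviceRange call under review.
    CoordRange2D(Coord2D start, Coord2D end) : start_(start), end_(end) {}

    // Convenience constructor: span the whole shape.
    // Assumes a non-empty shape (rows >= 1 and cols >= 1).
    explicit CoordRange2D(Shape2D shape)
        : start_{0, 0}, end_{shape.rows - 1, shape.cols - 1} {}

    // True if the coordinate lies inside the inclusive range.
    bool contains(Coord2D c) const {
        return c.row >= start_.row && c.row <= end_.row &&
               c.col >= start_.col && c.col <= end_.col;
    }

    // Number of coordinates covered by the range.
    std::size_t size() const {
        return (end_.row - start_.row + 1) * (end_.col - start_.col + 1);
    }

private:
    Coord2D start_;
    Coord2D end_;
};
```

With such a constructor the caller no longer has to write the "- 1" arithmetic on each dimension by hand, which is where off-by-one and swapped-axis mistakes tend to creep in.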
auto trace_id = BeginTraceCapture(mesh_device.get(), 0);
EnqueueMeshWorkload(mesh_device->mesh_command_queue(), add_mesh_workload, false);
EnqueueMeshWorkload(mesh_device->mesh_command_queue(), multiply_and_subtract_mesh_workload, false);
EndTraceCapture(mesh_device.get(), 0, trace_id);
Reminder: this is error-prone if an exception is thrown between BeginTraceCapture and EndTraceCapture.
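One common way to make a paired begin/end call exception-safe is an RAII scope guard. The sketch below is only an illustration under stated assumptions: ScopeGuard, FakeDevice and capture_trace are hypothetical names invented for this example, and the assignments to `capturing` merely stand in for BeginTraceCapture and EndTraceCapture. It does not show how the PR or the library handles this.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <utility>

// Generic scope guard: runs `on_exit` when the enclosing scope is left,
// whether by normal return or by stack unwinding after a throw.
class ScopeGuard {
public:
    explicit ScopeGuard(std::function<void()> on_exit) : on_exit_(std::move(on_exit)) {}
    ~ScopeGuard() {
        if (on_exit_) {
            on_exit_();
        }
    }
    ScopeGuard(const ScopeGuard&) = delete;
    ScopeGuard& operator=(const ScopeGuard&) = delete;

private:
    std::function<void()> on_exit_;
};

// Hypothetical stand-in for a device with an open/closed capture state.
struct FakeDevice {
    bool capturing = false;
    int enqueued = 0;
};

// Returns true if the capture finished normally, false if enqueueing threw.
// Either way, the capture is closed before the function returns.
bool capture_trace(FakeDevice& device, bool fail_midway) {
    try {
        device.capturing = true;  // stands in for BeginTraceCapture
        ScopeGuard end_capture([&device] { device.capturing = false; });  // stands in for EndTraceCapture

        ++device.enqueued;  // first workload
        if (fail_midway) {
            throw std::runtime_error("enqueue failed");
        }
        ++device.enqueued;  // second workload
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```

The guard is constructed immediately after the capture is opened, so there is no window in which a throw could leave the capture open.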
EnqueueWriteMeshBuffer(mesh_device->mesh_command_queue(data_movement_cq), add_src0_buf, add_src0_vec);
EnqueueWriteMeshBuffer(mesh_device->mesh_command_queue(data_movement_cq), add_src1_buf, add_src1_vec);
EnqueueWriteMeshBuffer(mesh_device->mesh_command_queue(data_movement_cq), mul_sub_src0_buf, mul_sub_src0_vec);
EnqueueWriteMeshBuffer(mesh_device->mesh_command_queue(data_movement_cq), mul_sub_src1_buf, mul_sub_src1_vec);
// Synchronize
EnqueueRecordEvent(mesh_device->mesh_command_queue(data_movement_cq), write_event);
EnqueueWaitForEvent(mesh_device->mesh_command_queue(workload_cq), write_event);
// =========== Step 8: Run MeshTrace on MeshCQ0 ===========
ReplayTrace(mesh_device.get(), workload_cq, trace_id, true);
// Synchronize
EnqueueRecordEvent(mesh_device->mesh_command_queue(workload_cq), trace_event);
EnqueueWaitForEvent(mesh_device->mesh_command_queue(data_movement_cq), trace_event);
// =========== Step 9: Read Outputs on MeshCQ1 ===========
std::vector<bfloat16> add_dst_vec = {};
std::vector<bfloat16> mul_sub_dst_vec = {};
EnqueueReadMeshBuffer(mesh_device->mesh_command_queue(data_movement_cq), add_dst_vec, add_output_buf);
EnqueueReadMeshBuffer(mesh_device->mesh_command_queue(data_movement_cq), mul_sub_dst_vec, mul_sub_output_buf);
see how things look natively for c++ consumers
CC @pgkeller @davorchap
auto workload_cq = mesh_device->command_queue(0);
auto data_movement_cq = mesh_device->command_queue(1);
data_movement_cq.write_buffer(add_src0_buf, add_src0_vec);
data_movement_cq.write_buffer(add_src1_buf, add_src1_vec);
data_movement_cq.write_buffer(mul_sub_src0_buf, mul_sub_src0_vec);
data_movement_cq.write_buffer(mul_sub_src1_buf, mul_sub_src1_vec);
// Synchronize
data_movement_cq.record_event(write_event);
workload_cq.wait_for_event(write_event);
// =========== Step 8: Run MeshTrace on MeshCQ0 ===========
auto replay_status = workload_cq.replay(trace_id);
replay_status.wait();
// Synchronize
workload_cq.record_event(trace_event);
data_movement_cq.wait_for_event(trace_event);
// =========== Step 9: Read Outputs on MeshCQ1 ===========
std::vector<bfloat16> add_dst_vec = {};
std::vector<bfloat16> mul_sub_dst_vec = {};
data_movement_cq.read_buffer(add_dst_vec, add_output_buf);
data_movement_cq.read_buffer(mul_sub_dst_vec, mul_sub_output_buf);
I removed the Enqueue prefix here: these are calls on a command queue, so it should be clear that they are async.
Also, instead of record_event I'd maybe use place_event, and instead of wait_for_event, synchronize.
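To make the record/wait semantics under discussion concrete, here is a small single-threaded simulation of two in-order queues synchronized by events. Everything in it (Event, InOrderQueue, step, run_two_queue_pipeline) is a hypothetical model written for this note, not the MeshCommandQueue API; it only assumes that each queue executes its commands in order and that a wait blocks the queue until the event has been recorded.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical event: set once the recording queue reaches it.
struct Event {
    bool recorded = false;
};

// Hypothetical in-order queue: commands run strictly front to back.
class InOrderQueue {
public:
    void enqueue(std::string name, std::vector<std::string>& log) {
        commands_.push_back({nullptr, [name, &log] { log.push_back(name); }});
    }
    void record_event(Event& event) {
        commands_.push_back({nullptr, [&event] { event.recorded = true; }});
    }
    void wait_for_event(Event& event) {
        commands_.push_back({&event, [] {}});
    }
    // Runs the front command unless it waits on an unrecorded event.
    // Returns true if a command was executed.
    bool step() {
        if (commands_.empty()) {
            return false;
        }
        if (commands_.front().wait_on != nullptr && !commands_.front().wait_on->recorded) {
            return false;  // blocked
        }
        std::function<void()> action = std::move(commands_.front().action);
        commands_.pop_front();
        action();
        return true;
    }

private:
    struct Command {
        const Event* wait_on;
        std::function<void()> action;
    };
    std::deque<Command> commands_;
};

// Mirrors the write -> replay -> read ordering of the example.
std::vector<std::string> run_two_queue_pipeline() {
    std::vector<std::string> log;
    Event write_event;
    Event trace_event;
    InOrderQueue data_movement_cq;
    InOrderQueue workload_cq;

    data_movement_cq.enqueue("write inputs", log);
    data_movement_cq.record_event(write_event);
    data_movement_cq.wait_for_event(trace_event);
    data_movement_cq.enqueue("read outputs", log);

    workload_cq.wait_for_event(write_event);
    workload_cq.enqueue("replay trace", log);
    workload_cq.record_event(trace_event);

    // Poll the workload queue first to show that it really is held back.
    bool progressed = true;
    while (progressed) {
        progressed = workload_cq.step();
        progressed = data_movement_cq.step() || progressed;
    }
    return log;
}
```

Even though the workload queue is polled first on every iteration, the trace replay only runs after the writes, and the reads only run after the replay. Whatever the methods end up being called, this ordering is the contract the names need to convey.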
infra portion looks fine
Ticket
No Ticket.
Problem description
Missing programming examples for MeshTrace and Multi-CQ execution on a MeshDevice.
What's changed
Add programming example tracing eltwise binary MeshWorkloads, using 2 CQs for data-movement and compute + using MeshEvents for synchronization.
The workloads target different SubDevices, thus running simultaneously on the MeshDevice.
Checklist