FlatRecon: Flat Placement Reconstruction Full Legalizer #3193

haydar-c · 2025-07-08T11:34:47Z

This PR brings the FlatRecon full legalizer. It aims to reconstruct a given flat placement with minimum disturbance to the given solution. It can be used with a solution read from an '.fplace' file or with Global Placement output. However, it expects the given solution to be close to legal in both cases.

It has beed run on the MCNC, VTR Chain, Koios, and Titan benchmarks. It also been tested on the elfPlace placements generated from the aug-elfPlace on Titan benchmarks. [Results to be appended]

I have added 2 regression tests for that legalizer. The first into the vtr_reg_strong with fast checks on MCNC benchmarks and the second one into the vtr_reg_nightly_test7 with some of the Titan benchmarks.

The logged output of that legalizer looks like below for an example of the LU32PEEng.v from VTR Chain:

This works well for the reconstruction of provided flat placement. However, if used directly after GP output, if we have too many orphan molecules the runtime goes really high.

The legalizations strategy of cluster_legalizer is set to SKIP_LB_ROUTING for the first passs. After first pass, each checked and illegal ones destroyed and their molecules are passed to second pass. The legalization strategy is set to FULL after first pass in that commit. Be careful with the place where you compress the legalizer and extract the atom lookup for placement.

Before that, the strategy was converted to full after the first reconstruction pass. After that, I was doing a neighbour search with full strategy. Now, I am doing neighbour search with SKIP_LB_ROUTING as well. After that, it requires last neighbour pass with FULL strategy. Also added ugly timer prints to be cleaned.

Changed the way of processing mainly in the first pass. In this version, they are first sorted by external pin number and then grouped into tiles. Then processed similar to naive by tiles but this one still checks the compatibility and capacity. Cleaning after each tile clusters created in that pass. Then, checking each cluster legality after they are created in the first neighbour pass. Second is already done with FULL strategy. The memory usage is decreased by nearly by %35. We lost runtime but it seems near 1-2%. TODO for that one: I added an if in neighbour pass that checks if the molecule is already clustered before checking its compatibility. This works now but ideally my algorithm should not try to add an already clustered molecule to another one. When removed, it failed on vtr_chain_largest for only LU32PEEng.v.

This is the version that results are presented in 2 May Friday meeting. The memory improved version is presented with master merged. We have things to try on top of that version. Handling the illegal clusters right away. Also added a 0.5f buffer into fixed blocks in partial placement verification to pass the assert. This should be handled more explicitely later.

In this version, after doing my packing for reconstruction, I am calling the initial_placement for ap flow. The results seems promising for this version and being added to the presentation. Especially reduces the max displacement as expected.

The displacement for each atom was being calculated as distance to the head of the tile. Now it considers the offset of the tile as well. That affects the percent of atoms displaced, max atom displacemetn and average atom displacement. Did no touched the cluster error for now. This can also be rewritten. Instead of being calculated solely on cluster location and centroid, we can try to use cluster information.

…terId

first pass if any failing cluster occurs In the first pass of reconstruction, creating clusters at a tile and checking whether they are legal or not. If illegal, try to cluster with FULL strategy there before going to next tile. Molecules being still unclustered passed to neighbour pass.

…terId

Prioritizing chained molecules in the reconstruction pass. Ensuring initial placer placing the clusters created in reconstruction pass first. Its current sorting also results in the same ordering due to standart deviation and size ordering. However, if used in reconstruction legalizer, I want to ensure that clusters created in reconstruction pass processed first.

…terId

Prioritizing the long carry chains in the reconstruction pass.

…terId

reading interface

…_legalizer and its dependances

… update documentations

…terId

AlexandreSinger

Hi @haydar-c

Thank you so much for these changes. I have some comments on the code. Two major things that must be resolved before this can come in is a quadratic run time loop in your algorithm which can be resolved and an issue with how you are compressing the cluster legalizer.

AlexandreSinger · 2025-07-08T20:59:25Z

vpr/src/analytical_place/analytical_placement_flow.cpp

+                g_vpr_ctx.mutable_atom().mutable_flat_placement_info().blk_x_pos[atom_blk_id] = g_vpr_ctx.device().grid.width() / 2.0f;
+                g_vpr_ctx.mutable_atom().mutable_flat_placement_info().blk_y_pos[atom_blk_id] = g_vpr_ctx.device().grid.height() / 2.0f;
+                g_vpr_ctx.mutable_atom().mutable_flat_placement_info().blk_layer[atom_blk_id] = 0;
+                g_vpr_ctx.mutable_atom().mutable_flat_placement_info().blk_sub_tile[atom_blk_id] = 0;


Mutating the global flat placement info here is odd to me. Why does it need to be updated? Can we not modify the flat placement object coming in? Or even better leave the flat placement object as it is and only use the partial placement object?

AlexandreSinger · 2025-07-08T21:04:10Z

vpr/src/analytical_place/analytical_placement_flow.cpp

@@ -245,6 +264,15 @@ void run_analytical_placement_flow(t_vpr_setup& vpr_setup) {
    // Print the device utilization
    print_device_utilization(target_device_utilization);

+    // Write out a flat placement file at the end of Full Legalization if the


By moving this to here, you have changed the meaning of the write_flat_place option:

I am ok with this change; in the future, if we want the post-placement flat placement, we can create another option. You should update the documentation for this option accordingly.

Since this has become AP-specific though, you should probably change this option to something like "--ap_write_post_fl_flat_place" or something to make this more clear.

It seems strange to not be able to write out the final placement; maybe we need another command line option to specify this.

AlexandreSinger · 2025-07-08T21:11:38Z

vpr/src/analytical_place/analytical_placement_flow.cpp

-                atom_loc_layer = current_loc_layer;
-                atom_loc_sub_tile = current_loc_sub_tile;
-                found_valid_atom = true;
+                if (current_loc_x != -1 && current_loc_y != -1 && current_loc_layer != -1 && current_loc_sub_tile != -1) {


This line concerns me a little. Here, -1 means that the user has left that part of the location unspecified.

For example, it is common for people to leave the layer or sub_tile unspecified instead of forcing it to 0. I think that there should be special handling for that. If not, this code will put a lot of blocks in the center of the device!

AlexandreSinger · 2025-07-08T21:13:04Z

vpr/src/analytical_place/full_legalizer.h

+ * both cases it expects the given flat placement to be close to legal.
+ *
+ * Before packing, it sorts the molecules such that long carry chain elements
+ * are being first, and then molecules with highest external input pin numbers


"are being first" -> "are packed first"

AlexandreSinger · 2025-07-08T21:13:53Z

vpr/src/analytical_place/full_legalizer.h

+ *
+ * Before packing, it sorts the molecules such that long carry chain elements
+ * are being first, and then molecules with highest external input pin numbers
+ * becoming first among long carry chain elements and the others. It then


What does this mean? Which is packed first, long carry chains or high external input pin? I think this sentence needs to be cleaned up.

AlexandreSinger · 2025-07-08T22:53:01Z

vpr/src/analytical_place/full_legalizer.cpp

+    //       how reconstruction pass clusters created, so there is no need to explicitely
+    //       prioritize these clusters. However, if it changes in time, the atoms clustered
+    //       in neighbor pass and atoms misplaced might not match exactly.
+    if (g_vpr_ctx.atom().flat_placement_info().valid) {


Why even use the global flat placement info here? Why not just always just cast the p_placement object into the flat placement info? This would simplify the code and it would also remove your reliance on this global variable.

AlexandreSinger · 2025-07-08T22:56:41Z

vpr/src/base/load_flat_place.cpp

    for (AtomBlockId atom_blk_id : atom_netlist.blocks()) {
+        if (!atom_blk_id.is_valid()) {


Why is this check needed? I do not think the atom blocks can ever be invalid...

AlexandreSinger · 2025-07-08T22:59:19Z

vpr/src/base/load_flat_place.cpp

+        // Then compute the L1 distance from the block location to the projected
+        // position. This will be the minimum distance this point needs to move.
+        float dx = std::abs(proj_x - blk_x);
+        float dy = std::abs(proj_y - blk_y);


I really like this change. I would call the method though instead of inlining the code:

vtr-verilog-to-routing/vpr/src/base/flat_placement_utils.h

Lines 24 to 50 in 1ed4a94

/**

* @brief Returns the L1 distance something at the given flat location would

* need to move to be within the bounds of a tile at the given tile loc.

*/

inline float get_manhattan_distance_to_tile(const t_flat_pl_loc& src_flat_loc,

const t_physical_tile_loc& tile_loc,

const DeviceGrid& device_grid) {

// Get the bounds of the tile.

// Note: The get_tile_bb function will not work in this case since it

// subtracts 1 from the width and height.

auto tile_type = device_grid.get_physical_type(tile_loc);

float tile_xmin = tile_loc.x - device_grid.get_width_offset(tile_loc);

float tile_xmax = tile_xmin + tile_type->width;

float tile_ymin = tile_loc.y - device_grid.get_height_offset(tile_loc);

float tile_ymax = tile_ymin + tile_type->height;

// Get the closest point in the bounding box (including the edges) to

// the src_flat_loc. To do this, we project the point in L1 space.

float proj_x = std::clamp(src_flat_loc.x, tile_xmin, tile_xmax);

float proj_y = std::clamp(src_flat_loc.y, tile_ymin, tile_ymax);

// Then compute the L1 distance from the src_flat_loc to the projected

// position. This will be the minimum distance this point needs to move.

float dx = std::abs(proj_x - src_flat_loc.x);

float dy = std::abs(proj_y - src_flat_loc.y);

return dx + dy;

}

AlexandreSinger · 2025-07-08T23:04:00Z

...r_reg_nightly_test7/ap_reconstruction/constraints/cholesky_bdti_stratixiv_arch_timing.fplace

How big are these files? These files look massive. I do not recommend commiting them into the git repostory if they are large.

I would recommend compressing them, and then in the CI test uncompressing them. See what I did for the Z1000 architecture for example. The issue is that these files add a lot of bloat to the repository which we do not want.

If this blocks you, I would recommend keeping these tests behind for now and adding them later.

As long as we don't do enormous circuits these files shouldn't be too big. One line per primitive, so 100,000 lines for neuron; if the primitive names are long that could be 10 MB. Not 0, but should be OK as long as we don't do the biggest designs or too many. If we do those we probably have to bite the bullet and compress them as @AlexandreSinger suggests.

AlexandreSinger · 2025-07-08T23:05:11Z

.../regression_tests/vtr_reg_strong/strong_ap/flatrecon_full_legalizer/constraints/frisc.fplace

This one is fine in size; but we want to keep the file sizes for these to on the order of MBs if possible.

vaughnbetz

Just did a quick review since it looks like there will be some changes due to @AlexandreSinger 's comments.
The output messages look good, except I suggest adding something to make clear what reconstruction pass and neighbour pass mean (or change the terminology to something that's more obvious to the reader). You want to make clear that 'reconstruction pass' means it went where you said it should go, and 'neighbour pass' means it was modified (moved and/or grouped with other primitives than specified).

vaughnbetz · 2025-07-09T18:04:30Z

doc/src/vpr/command_line_usage.rst

@@ -1286,6 +1286,12 @@ Analytical Placement is generally split into three stages:

    * ``appack`` Use APPack, which takes the Packer in VPR and uses the flat atom placement to create better clusters.

+    * ``flat-recon`` Use the Flat Placement Reconstruction Full Legalizer which tries to reconstruct a clustered placement
+      that is as close to the incoming flat placement as possible. It can be used to read flat placement from an '.fplace' file


read flat -> read a flat
from an '.fplace' -> from a '.fplace'

vaughnbetz · 2025-07-09T18:05:32Z

doc/src/vpr/command_line_usage.rst

+      that is as close to the incoming flat placement as possible. It can be used to read flat placement from an '.fplace' file
+      or with Global Placement output. In both cases, it expects the given solution to be close to legal. If used with
+      an '.fplace' file, each atom of a molecule should share same location information. If none of the atoms in a molecule
+      have a specified location in the file, the molecule will be assigned to the center of the device.


Have to explain what this means. You won't really leave it in the center of the device. I suggest just saying it is legal to leave some molecules unconstrained; the reconstruction phase will choose where to place them but does not attempt to optimize those locations.

Need a link to the file format (which has to be described somewhere).

vaughnbetz · 2025-07-09T18:07:41Z

vpr/src/analytical_place/analytical_placement_flow.cpp

@@ -245,6 +264,15 @@ void run_analytical_placement_flow(t_vpr_setup& vpr_setup) {
    // Print the device utilization
    print_device_utilization(target_device_utilization);

+    // Write out a flat placement file at the end of Full Legalization if the


It seems strange to not be able to write out the final placement; maybe we need another command line option to specify this.

soheilshahrouz · 2025-07-10T17:17:00Z

vpr/src/analytical_place/partial_placement.cpp

+                 * In gen_ap_netlist_from_atoms, we add 0.5f to each atom but do not consider
+                 * it here. Adding these buffers here as well if the flat placement provided.
+                 * TODO: This should be handled more explicitely.
+                 */


To be consistent with the coding style guide, use // for implementation-related comments.

haydar-c added 30 commits March 28, 2025 16:08

pass 1st, 2nd, 3rd with a rough implementation of grids

c27bacc

tile compability with destroying incompatible ones

1535d9a

tile compability with checking before creating cluster

ef8c354

instepting speed with simple pass

3d26bfb

simplified first pass and search grids in manhattan

cdff896

sorting before processing

c67da89

added placement step

db36eff

reporting total clusters

e737efa

corrected reporting

2c04563

added neigbour search for cluster creation as well

41229b5

added neighbour search for orphan clusters as well

924073f

This works well for the reconstruction of provided flat placement. However, if used directly after GP output, if we have too many orphan molecules the runtime goes really high.

just added parsin max_rss for FL and whole run

72760c0

Merged master into this branch, solved conflict in qor parsin for ap

37a42fa

Makes my changes compilable after master merge

5761983

Merge branch 'master' into reconstruction_grids_with_LegalizationClus…

c02d48c

…terId

Merge branch 'master' into reconstruction_grids_with_LegalizationClus…

942274a

…terId

[AP][FL] Added Stats for debugging block num increase

d7a95b9

Stat reporting for reconstruction

a2efbe3

[AP][FL] Corrected displacement tile and atom reporting

1183393

Debugging the discrepancy between initial_placement and first pass

28b5836

Merge branch 'master' into reconstruction_grids_with_LegalizationClus…

5ba4a95

…terId

[AP] Reconstruction Legalizer

5ad9f8d

Prioritizing the long carry chains in the reconstruction pass.

haydar-c added 18 commits July 5, 2025 13:11

Merge branch 'master' into reconstruction_grids_with_LegalizationClus…

0e4e928

…terId

Reconstruction Legalizer: Clean physical_types.h

419aaf7

Reconstruction Legalizer: Clean analytical_placement_flow flat placement

de6285a

reading interface

Reconstruction Legalizer: Clean includes and args in full legalizer

21555fd

Reconstruction Legalizer: document get_molecule_logical_block_type

0285f59

Reconstruction Legalizer: Class structure

11fcc17

Reconstruction Legalizer: Clean full legalizer spaces

f1d1f97

Reconstruction Legalizer: full legalizer header spaces

95b8974

Reconstruction Legalizer: fix forward declarations in FL header

40b7786

Reconstruction Legalizer: Big Picture comment

41beb52

Reconstruction Legalizer: Class data types clean&document

70d9e92

Reconstruction Legalizer: FL header commenting finished

0471d36

Reconstruction Legalizer: clean partial_placement update

b708d33

Reconstruction Legalizer: clean load flat place

8936f3d

Reconstruction Legalizer: Add vtr_reg_strong/strong_ap/flatrecon_full…

ebb73b4

…_legalizer and its dependances

Reconstruction Legalizer: Added vtr_reg_nightly_test7/ap_reconstruction

1994250

Reconstruction Legalizer: Conver BasicMinDisturbance to FlatRecon and…

fdbb13c

… update documentations

Merge branch 'master' into reconstruction_grids_with_LegalizationClus…

876e6b2

…terId

github-actions bot added VPR VPR FPGA Placement & Routing Tool libarchfpga Library for handling FPGA Architecture descriptions docs Documentation lang-cpp C/C++ code lang-python Python code labels Jul 8, 2025

haydar-c marked this pull request as ready for review July 8, 2025 11:37

haydar-c added 3 commits July 8, 2025 07:48

make format

2e2299f

Revert librtlnumber internal_bits.hpp make format change

c94043a

Update std:sort and results with better ones

d414ab6

AlexandreSinger requested changes Jul 8, 2025

View reviewed changes

vaughnbetz requested changes Jul 9, 2025

View reviewed changes

soheilshahrouz reviewed Jul 10, 2025

View reviewed changes

		for (AtomBlockId atom_blk_id : atom_netlist.blocks()) {
		if (!atom_blk_id.is_valid()) {

	/**
	* @brief Returns the L1 distance something at the given flat location would
	* need to move to be within the bounds of a tile at the given tile loc.
	*/
	inline float get_manhattan_distance_to_tile(const t_flat_pl_loc& src_flat_loc,
	const t_physical_tile_loc& tile_loc,
	const DeviceGrid& device_grid) {
	// Get the bounds of the tile.
	// Note: The get_tile_bb function will not work in this case since it
	// subtracts 1 from the width and height.
	auto tile_type = device_grid.get_physical_type(tile_loc);
	float tile_xmin = tile_loc.x - device_grid.get_width_offset(tile_loc);
	float tile_xmax = tile_xmin + tile_type->width;
	float tile_ymin = tile_loc.y - device_grid.get_height_offset(tile_loc);
	float tile_ymax = tile_ymin + tile_type->height;

	// Get the closest point in the bounding box (including the edges) to
	// the src_flat_loc. To do this, we project the point in L1 space.
	float proj_x = std::clamp(src_flat_loc.x, tile_xmin, tile_xmax);
	float proj_y = std::clamp(src_flat_loc.y, tile_ymin, tile_ymax);

	// Then compute the L1 distance from the src_flat_loc to the projected
	// position. This will be the minimum distance this point needs to move.
	float dx = std::abs(proj_x - src_flat_loc.x);
	float dy = std::abs(proj_y - src_flat_loc.y);
	return dx + dy;
	}

FlatRecon: Flat Placement Reconstruction Full Legalizer #3193

Are you sure you want to change the base?

FlatRecon: Flat Placement Reconstruction Full Legalizer #3193

Uh oh!

Conversation

haydar-c commented Jul 8, 2025

Uh oh!

AlexandreSinger left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

vaughnbetz left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!