@haydar-c haydar-c commented Jul 8, 2025

This PR brings the FlatRecon full legalizer. It aims to reconstruct a given flat placement with minimum disturbance to the given solution. It can be used with a solution read from an '.fplace' file or with Global Placement output. However, it expects the given solution to be close to legal in both cases.

It has been run on the MCNC, VTR Chain, Koios, and Titan benchmarks. It has also been tested on the elfPlace placements generated by aug-elfPlace on the Titan benchmarks. [Results to be appended]

I have added 2 regression tests for this legalizer. The first is in vtr_reg_strong, with fast checks on MCNC benchmarks, and the second is in vtr_reg_nightly_test7, with some of the Titan benchmarks.

The logged output of this legalizer is shown below, using LU32PEEng.v from VTR Chain as an example:
[Image: lu32_for_PR]

haydar-c added 30 commits March 28, 2025 16:08
This works well for reconstructing a provided flat
placement. However, when used directly on GP output,
the runtime becomes very high if there are too many
orphan molecules.
The legalization strategy of cluster_legalizer is set to
SKIP_LB_ROUTING for the first pass. After the first pass, each
cluster is checked; illegal ones are destroyed and their
molecules are passed to the second pass.

The legalization strategy is set to FULL after first pass
in that commit.

Be careful with the place where you compress the legalizer
and extract the atom lookup for placement.
Before that, the strategy was converted to FULL after the
first reconstruction pass. After that, I was doing a neighbour
search with the FULL strategy. Now, I am doing the neighbour search
with SKIP_LB_ROUTING as well. After that, it requires a last
neighbour pass with the FULL strategy.

Also added ugly timer prints to be cleaned.
Changed the way of processing, mainly in the first pass. In this
version, molecules are first sorted by external pin number and then
grouped into tiles. They are then processed tile by tile, similar to
the naive approach, but this one still checks compatibility and
capacity. Cleaning up after each tile's clusters are created in that
pass. Then, checking each cluster's legality after they are created
in the first neighbour pass. The second is already done with the FULL
strategy. The memory usage is decreased by nearly 35%. We lost
runtime, but it seems to be about 1-2%.

TODO for that one: I added an if in neighbour pass that checks if
the molecule is already clustered before checking its compatibility.
This works now but ideally my algorithm should not try to add an
already clustered molecule to another one. When removed, it failed
on vtr_chain_largest for only LU32PEEng.v.
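
As a rough illustration of the ordering described in this commit message (sort by external pin number, then group into tiles), a minimal C++ sketch follows. The names `Molecule` and `group_by_tile`, the descending sort order, and the unit-sized tiles are all assumptions made for illustration; they are not taken from the PR's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a molecule with a desired flat-placement position.
struct Molecule {
    int id;
    int num_ext_pins; // number of external pins
    double x, y;      // position taken from the flat placement
};

using TileCoord = std::pair<int, int>; // (x, y) of a tile, assuming 1x1 tiles

// Sort molecules by external pin count (descending order is an assumption),
// then bucket them by the tile their position falls into. Within each tile,
// the sorted order is preserved.
std::map<TileCoord, std::vector<Molecule>> group_by_tile(std::vector<Molecule> mols) {
    std::stable_sort(mols.begin(), mols.end(),
                     [](const Molecule& a, const Molecule& b) {
                         return a.num_ext_pins > b.num_ext_pins;
                     });
    std::map<TileCoord, std::vector<Molecule>> tiles;
    for (const Molecule& m : mols) {
        assert(m.x >= 0.0 && m.y >= 0.0); // positions must lie on the grid
        TileCoord key{static_cast<int>(std::floor(m.x)),
                      static_cast<int>(std::floor(m.y))};
        tiles[key].push_back(m);
    }
    return tiles;
}
```

Each tile's bucket could then be processed in turn, with compatibility and capacity checks applied per molecule before it joins a cluster.
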
This is the version whose results are presented in the Friday,
2 May meeting. The memory-improved version is presented with master
merged. We have things to try on top of that version.

Handling the illegal clusters right away. Also added a 0.5f buffer
to fixed blocks in partial placement verification to pass the
assert. This should be handled more explicitly later.
In this version, after doing my packing for reconstruction, I am
calling the initial_placement for the AP flow. The results seem
promising for this version and are being added to the presentation.
It especially reduces the max displacement, as expected.
The displacement for each atom was being calculated as the distance
to the head of the tile. Now it considers the offset of the tile
as well. That affects the percentage of atoms displaced, the max atom
displacement, and the average atom displacement.

Did not touch the cluster error for now. This can also be
rewritten. Instead of being calculated solely from cluster location
and centroid, we can try to use cluster information.
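
One possible reading of the displacement change above can be sketched in C++ as follows. This is a sketch under stated assumptions: `AtomMove`, `atom_displacement`, and `summarize` are hypothetical names, Manhattan distance is an assumed metric, the legalized position is assumed to be the tile root plus the offset within the tile, and the average is assumed to be taken over all atoms.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical record of where an atom came from and where it ended up.
struct AtomMove {
    double orig_x, orig_y;        // position in the incoming flat placement
    int tile_root_x, tile_root_y; // root (head) location of the final tile
    int offset_x, offset_y;       // offset of the atom's location within the tile
};

struct DisplacementStats {
    double percent_displaced; // percentage of atoms with non-zero displacement
    double max_displacement;
    double avg_displacement;  // averaged over all atoms (an assumption)
};

// Manhattan distance (assumed metric) from the original position to the
// tile root plus the offset, rather than to the tile root alone.
double atom_displacement(const AtomMove& m) {
    const double new_x = m.tile_root_x + m.offset_x;
    const double new_y = m.tile_root_y + m.offset_y;
    return std::abs(new_x - m.orig_x) + std::abs(new_y - m.orig_y);
}

DisplacementStats summarize(const std::vector<AtomMove>& moves) {
    assert(!moves.empty());
    double total = 0.0;
    double max_disp = 0.0;
    std::size_t num_displaced = 0;
    for (const AtomMove& m : moves) {
        const double d = atom_displacement(m);
        total += d;
        max_disp = std::max(max_disp, d);
        if (d > 0.0) ++num_displaced;
    }
    const double n = static_cast<double>(moves.size());
    return {100.0 * num_displaced / n, max_disp, total / n};
}
```

With the offset included, an atom that lands inside a large tile at its original coordinates counts as not displaced, which is why all three reported metrics change.
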
first pass if any failing cluster occurs

In the first pass of reconstruction, creating clusters at a tile
and checking whether they are legal or not. If illegal, try to
cluster with the FULL strategy there before going to the next tile.
Molecules that are still unclustered are passed to the neighbour pass.
Prioritizing chained molecules in the reconstruction pass.
Ensuring the initial placer places the clusters created in the
reconstruction pass first. Its current sorting also results
in the same ordering due to standard deviation and size ordering.
However, if used in the reconstruction legalizer, I want to ensure that
clusters created in the reconstruction pass are processed first.
Prioritizing the long carry chains in the reconstruction pass.
@github-actions github-actions bot added build Build system lang-make CMake/Make code labels Aug 18, 2025
@vaughnbetz

The output looks nice, except the neighbour clustering step doesn't print detailed stats while the other two steps do. I suspect that is an oversight @haydar-c ?

@haydar-c

> The output looks nice, except the neighbour clustering step doesn't print detailed stats while the other two steps do. I suspect that is an oversight @haydar-c ?

Good catch. This was intentional, but I see how it reads like an omission.
In the neighbor clustering stage we never create new clusters; we only join molecules to existing clusters. So I included its contribution only in the “molecules clustered in each stage” breakdown and left “clusters created” blank.

To make this explicit, maybe I should update the summary to always show “0 clusters created” for this stage.
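
A minimal C++ sketch of what that could look like is below. The `StageStats` struct, the `format_summary` function, and the output wording are all hypothetical and only illustrate printing every stage uniformly, including a stage that creates no clusters; they do not reproduce the actual log format.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical per-stage counters for the summary.
struct StageStats {
    std::string name;
    int molecules_clustered;
    int clusters_created; // stays 0 for a stage that only joins existing clusters
};

// Print the same two fields for every stage, so a stage that creates
// no clusters shows an explicit 0 instead of a blank.
std::string format_summary(const std::vector<StageStats>& stages) {
    std::ostringstream out;
    for (const StageStats& s : stages) {
        assert(s.molecules_clustered >= 0 && s.clusters_created >= 0);
        out << s.name << ": " << s.molecules_clustered
            << " molecules clustered, " << s.clusters_created
            << " clusters created\n";
    }
    return out.str();
}
```
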


@vaughnbetz vaughnbetz left a comment

Looks good overall; some suggested changes though.
After resolving the comments, you should re-run the full tests / QoR runs (nothing should change much, but we should be sure).
I suggest adding a small VTR design (spree) to the flat_recon tests in the basic tests. spree is very small (1229 primitives) but has DSP and RAM as well as logic so it gives good code coverage. Putting it in the basic tests also means we'll run the sanitized tests on it. Check it is fast though, as the sanitized tests slow down by 10x or more.


* The x, y, and sub_tile location of the cluster that contains this atom.
* The flat site index of this atom in its cluster. The flat site index is a
linearized ID of primitive locations in a cluster. This may be used as a

I think we should remove flat site index.
Add the file format (with a short example) to the file formats list in the appropriate .rst.

@@ -1291,6 +1303,12 @@ Analytical Placement is generally split into three stages:

* ``appack`` Use APPack, which takes the Packer in VPR and uses the flat atom placement to create better clusters.

* ``flat-recon`` Use the Flat Placement Reconstruction Full Legalizer which tries to reconstruct a clustered placement that is
as close to the incoming flat placement as possible. It can be used to read a flat placement from a ``.fplace`` file (see :option:`--read_flat_place`)

Add .fplace format description to the file format documentation.

@@ -1291,6 +1303,12 @@ Analytical Placement is generally split into three stages:

* ``appack`` Use APPack, which takes the Packer in VPR and uses the flat atom placement to create better clusters.

* ``flat-recon`` Use the Flat Placement Reconstruction Full Legalizer which tries to reconstruct a clustered placement that is
as close to the incoming flat placement as possible. It can be used to read a flat placement from a ``.fplace`` file (see :option:`--read_flat_place`)
or with Global Placement output. In both cases, it expects the given solution to be close to legal. If used with a ``.fplace`` file (see :option:`--read_flat_place`),

or on the (in memory) output of VTR's integrated Global Placement algorithm

* ``flat-recon`` Use the Flat Placement Reconstruction Full Legalizer which tries to reconstruct a clustered placement that is
as close to the incoming flat placement as possible. It can be used to read a flat placement from a ``.fplace`` file (see :option:`--read_flat_place`)
or with Global Placement output. In both cases, it expects the given solution to be close to legal. If used with a ``.fplace`` file (see :option:`--read_flat_place`),
each atom of a molecule should share same location information. It is legal to leave some molecules unconstrained; the reconstruction phase will choose where

each atom of a molecule should share --> each atom in a molecule should have compatible location information.

*/
friend bool operator<(const t_physical_tile_loc& lhs, const t_physical_tile_loc& rhs) {
if (lhs.layer_num != rhs.layer_num) return lhs.layer_num < rhs.layer_num;
if (lhs.x != rhs.x) return lhs.x < rhs.x;

Split into two lines:
if (lhs.x != rhs.x)
    return lhs.x < rhs.x;
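
For illustration, the comparison under review can be sketched in C++ in both the split-line form suggested here and an equivalent `std::tie` form. `TileLoc` is a hypothetical stand-in for `t_physical_tile_loc`, and the final comparison on `y` is an assumption, since the quoted diff only shows the `layer_num` and `x` steps.

```cpp
#include <cassert>
#include <tuple>

// Hypothetical stand-in for t_physical_tile_loc.
struct TileLoc {
    int x;
    int y;
    int layer_num;
};

// Explicit lexicographic ordering: layer, then x, then y (y step assumed).
inline bool less_explicit(const TileLoc& lhs, const TileLoc& rhs) {
    if (lhs.layer_num != rhs.layer_num)
        return lhs.layer_num < rhs.layer_num;
    if (lhs.x != rhs.x)
        return lhs.x < rhs.x;
    return lhs.y < rhs.y;
}

// The same ordering using std::tie, which compares tuples lexicographically.
inline bool less_tie(const TileLoc& lhs, const TileLoc& rhs) {
    return std::tie(lhs.layer_num, lhs.x, lhs.y)
         < std::tie(rhs.layer_num, rhs.x, rhs.y);
}

// Sanity check: both forms must agree for any pair of locations.
inline void check_same_order(const TileLoc& a, const TileLoc& b) {
    assert(less_explicit(a, b) == less_tie(a, b));
}
```

The `std::tie` form is shorter, while the explicit form makes each comparison step visible; either gives a strict weak ordering suitable for `std::map` keys or `std::sort`.
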


// Cast the partial placement to flat placement here. So that it can be
// used to guide the initial placer and for logging results. This enables
// the flow to be used with direct output of GP as well.

Explain that p_placement has been set by the GP (if using internal VTR AP), or by reading in the flat placement file. Cast / copy it to the flat_placement data structures so we can always use them.

}

// Run the initial placer on the clusters created.
// TODO: Currently, the way initial placer sort the blocks to place is aligned

sort -> sorts


// Run the initial placer on the clusters created.
// TODO: Currently, the way initial placer sort the blocks to place is aligned
// how self clustering pass clusters created, so there is no need to explicitely

pass -> passes the
explicitely -> explicitly

@@ -1841,6 +1841,11 @@ argparse::ArgumentParser create_arg_parser(const std::string& prog_name, t_optio
"VPR's (or reconstructed external) placement solution in flat placement file format; this file lists cluster and intra-cluster placement coordinates for each atom and can be used to reconstruct a clustering and placement solution.")
.show_in(argparse::ShowIn::HELP_ONLY);

file_grp.add_argument(args.write_legalized_flat_place_file, "--write_legalized_flat_place")
.help(
"VPR's (or reconstructed external) placement solution after legalization and before anneal in flat placement file format; this file lists cluster and intra-cluster placement coordinates for each atom and can be used to reconstruct a clustering and placement solution.")

lists (x, y, layer) coordinates for each atom
(we aren't using intra-cluster coordinates)

@github-actions github-actions bot added the infra Project Infrastructure label Aug 26, 2025
@vaughnb-cerebras vaughnb-cerebras left a comment

Looks good, thanks! This is a great new feature.

@vaughnbetz vaughnbetz merged commit 4eec9fe into master Aug 28, 2025
30 checks passed
@vaughnbetz vaughnbetz deleted the reconstruction_grids_with_LegalizationClusterId branch August 28, 2025 17:06
Labels
build Build system docs Documentation infra Project Infrastructure lang-cpp C/C++ code lang-make CMake/Make code lang-netlist lang-python Python code libarchfpga Library for handling FPGA Architecture descriptions VPR VPR FPGA Placement & Routing Tool
5 participants