FlatRecon: Flat Placement Reconstruction Full Legalizer #3193
This works well for reconstructing a provided flat placement. However, when used directly on GP output, the runtime becomes very high if there are too many orphan molecules.
The legalization strategy of cluster_legalizer is set to SKIP_LB_ROUTING for the first pass. After the first pass, each cluster is checked; illegal ones are destroyed and their molecules are passed to the second pass. In this commit, the legalization strategy is set to FULL after the first pass. Be careful about where you compress the legalizer and extract the atom lookup for placement.
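The hand-off between the two passes can be sketched roughly as follows. This is only an illustration: `Cluster`, `MoleculeId`, and `drop_illegal_clusters` are hypothetical stand-ins, not the cluster_legalizer API, and the legality check is passed in as a callback rather than modelling the real FULL-strategy check.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; these are not the VPR types.
using MoleculeId = int;
struct Cluster {
    std::vector<MoleculeId> molecules;
};

// Keeps the clusters that pass the full legality check and returns the
// molecules of the destroyed clusters, so that a second pass (run with the
// FULL strategy) can try to cluster them again.
std::vector<MoleculeId> drop_illegal_clusters(
    std::vector<Cluster>& clusters,
    const std::function<bool(const Cluster&)>& is_fully_legal) {
    std::vector<MoleculeId> leftover;
    std::vector<Cluster> kept;
    for (Cluster& c : clusters) {
        if (is_fully_legal(c)) {
            kept.push_back(std::move(c));
        } else {
            // The cluster is dropped; its molecules go to the next pass.
            leftover.insert(leftover.end(), c.molecules.begin(), c.molecules.end());
        }
    }
    clusters = std::move(kept);
    return leftover;
}
```

Building clusters with the cheaper strategy first and verifying them afterwards means the expensive check runs once per cluster rather than once per molecule insertion.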
Previously, the strategy was switched to FULL after the first reconstruction pass, and the neighbour search then ran with the FULL strategy. Now the neighbour search also runs with SKIP_LB_ROUTING, which requires a final neighbour pass with the FULL strategy. Also added ugly timer prints, to be cleaned up.
Changed the processing, mainly in the first pass. In this version, molecules are first sorted by external pin count and then grouped into tiles. They are then processed tile by tile, similar to the naive approach, but this version still checks compatibility and capacity. The clusters created in that pass are cleaned after each tile. The legality of each cluster is then checked after the clusters are created in the first neighbour pass. The second neighbour pass already uses the FULL strategy. Memory usage decreased by nearly 35%. Runtime got worse, but only by about 1-2%. TODO: I added an if in the neighbour pass that checks whether the molecule is already clustered before checking its compatibility. This works for now, but ideally my algorithm should not try to add an already clustered molecule to another cluster. When the check was removed, vtr_chain_largest failed, but only for LU32PEEng.v.
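A minimal sketch of the sort-then-group step, under stated assumptions: `Molecule`, `TileKey`, and `group_by_tile` are hypothetical names, the sort is taken to be descending by external pin count (the commit message does not state the direction), and a molecule is assigned to the tile obtained by flooring its flat-placement coordinates.

```cpp
#include <algorithm>
#include <cmath>
#include <map>
#include <tuple>
#include <vector>

// Hypothetical molecule record; not the VPR data structure.
struct Molecule {
    int id;
    int num_ext_pins;
    double x;
    double y;
};

// Integer tile coordinate used as the grouping key.
struct TileKey {
    int x;
    int y;
    bool operator<(const TileKey& o) const {
        return std::tie(x, y) < std::tie(o.x, o.y);
    }
};

// Sorts molecules by external pin count (assumed descending), then buckets
// them by tile. Because the buckets are filled in sorted order, each tile's
// list keeps the pin-count ordering.
std::map<TileKey, std::vector<int>> group_by_tile(std::vector<Molecule> mols) {
    std::stable_sort(mols.begin(), mols.end(),
                     [](const Molecule& a, const Molecule& b) {
                         return a.num_ext_pins > b.num_ext_pins;
                     });
    std::map<TileKey, std::vector<int>> tiles;
    for (const Molecule& m : mols) {
        TileKey key{static_cast<int>(std::floor(m.x)),
                    static_cast<int>(std::floor(m.y))};
        tiles[key].push_back(m.id);
    }
    return tiles;
}
```

Each tile's list can then be processed on its own, which fits the tile-by-tile clustering and per-tile cleanup described above.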
This is the version whose results were presented in the Friday, 2 May meeting. The memory-improved version was presented with master merged. There are things to try on top of this version, such as handling illegal clusters right away. Also added a 0.5f buffer to fixed blocks in partial placement verification to pass the assert. This should be handled more explicitly later.
In this version, after doing my packing for reconstruction, I call initial_placement for the AP flow. The results seem promising and are being added to the presentation. In particular, it reduces the max displacement, as expected.
The displacement of each atom was being calculated as the distance to the head of the tile. Now it considers the offset of the tile as well. That affects the percentage of atoms displaced, the max atom displacement, and the average atom displacement. Did not touch the cluster error for now. This can also be rewritten: instead of being calculated solely from the cluster location and centroid, we can try to use cluster information.
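One possible reading of this change, sketched under assumptions: for a tile that spans several grid locations, measuring only to the tile's head (root) location overstates the displacement of atoms that sit elsewhere inside the tile. The sketch below instead measures the Manhattan distance from the atom's flat position to the tile's full extent. `TileRect`, `axis_gap`, and `atom_displacement` are hypothetical names, and the choice of Manhattan distance is an assumption rather than something the commit message states.

```cpp
// Hypothetical tile extent: root (head) location plus width and height.
struct TileRect {
    double root_x;
    double root_y;
    double width;
    double height;
};

// Distance along one axis from p to the closed interval [lo, hi];
// zero when p lies inside the interval.
double axis_gap(double p, double lo, double hi) {
    if (p < lo) return lo - p;
    if (p > hi) return p - hi;
    return 0.0;
}

// Manhattan distance from an atom's flat-placement position to the tile's
// extent, so an atom anywhere inside a multi-location tile counts as
// undisplaced instead of being measured against the root alone.
double atom_displacement(double atom_x, double atom_y, const TileRect& t) {
    return axis_gap(atom_x, t.root_x, t.root_x + t.width)
         + axis_gap(atom_y, t.root_y, t.root_y + t.height);
}
```

With a root-only measurement, an atom near the top of a tall tile would report a displacement of several grid units even though it never left its tile.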
[...] first pass if any failing cluster occurs. In the first pass of reconstruction, clusters are created at a tile and checked for legality. If any cluster is illegal, clustering is retried there with the FULL strategy before moving on to the next tile. Molecules that are still unclustered are passed to the neighbour pass.
Prioritizing chained molecules in the reconstruction pass. Ensuring that the initial placer places the clusters created in the reconstruction pass first. Its current sorting already results in the same ordering due to the standard deviation and size ordering. However, when used in the reconstruction legalizer, I want to ensure that clusters created in the reconstruction pass are processed first.
Prioritizing the long carry chains in the reconstruction pass.
The output looks nice, except the neighbour clustering step doesn't print detailed stats while the other two steps do. I suspect that is an oversight, @haydar-c?
Good catch. This was intentional but I see how it reads like an omission. To make this explicit, maybe I should update the summary to always show “0 clusters created” for this stage.
Looks good overall; some suggested changes though.
After resolving the comments, you should re-run the full tests / QoR runs (nothing should change much, but we should be sure).
I suggest adding a small VTR design (spree) to the flat_recon tests in the basic tests. spree is very small (1229 primitives) but has DSP and RAM as well as logic so it gives good code coverage. Putting it in the basic tests also means we'll run the sanitized tests on it. Check it is fast though, as the sanitized tests slow down by 10x or more.
doc/src/vpr/command_line_usage.rst

* The x, y, and sub_tile location of the cluster that contains this atom.
* The flat site index of this atom in its cluster. The flat site index is a
linearized ID of primitive locations in a cluster. This may be used as a
I think we should remove flat site index.
Add the file format (with a short example) to the file formats list in the appropriate .rst.
doc/src/vpr/command_line_usage.rst
@@ -1291,6 +1303,12 @@ Analytical Placement is generally split into three stages:

* ``appack`` Use APPack, which takes the Packer in VPR and uses the flat atom placement to create better clusters.

* ``flat-recon`` Use the Flat Placement Reconstruction Full Legalizer which tries to reconstruct a clustered placement that is
as close to the incoming flat placement as possible. It can be used to read a flat placement from a ``.fplace`` file (see :option:`--read_flat_place`)
Add .fplace format description to the file format documentation.
doc/src/vpr/command_line_usage.rst
@@ -1291,6 +1303,12 @@ Analytical Placement is generally split into three stages:

* ``appack`` Use APPack, which takes the Packer in VPR and uses the flat atom placement to create better clusters.

* ``flat-recon`` Use the Flat Placement Reconstruction Full Legalizer which tries to reconstruct a clustered placement that is
as close to the incoming flat placement as possible. It can be used to read a flat placement from a ``.fplace`` file (see :option:`--read_flat_place`)
or with Global Placement output. In both cases, it expects the given solution to be close to legal. If used with a ``.fplace`` file (see :option:`--read_flat_place`),
or on the (in memory) output of VTR's integrated Global Placement algorithm
doc/src/vpr/command_line_usage.rst
* ``flat-recon`` Use the Flat Placement Reconstruction Full Legalizer which tries to reconstruct a clustered placement that is
as close to the incoming flat placement as possible. It can be used to read a flat placement from a ``.fplace`` file (see :option:`--read_flat_place`)
or with Global Placement output. In both cases, it expects the given solution to be close to legal. If used with a ``.fplace`` file (see :option:`--read_flat_place`),
each atom of a molecule should share same location information. It is legal to leave some molecules unconstrained; the reconstruction phase will choose where
each atom of a molecule should share --> each atom in a molecule should have compatible location information.
*/
friend bool operator<(const t_physical_tile_loc& lhs, const t_physical_tile_loc& rhs) {
if (lhs.layer_num != rhs.layer_num) return lhs.layer_num < rhs.layer_num;
if (lhs.x != rhs.x) return lhs.x < rhs.x;
Split into two lines:
if (lhs.x != rhs.x)
    return lhs.x < rhs.x;
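For reference, a self-contained sketch of this comparison with each `return` on its own line, next to a `std::tie` variant that gives the same lexicographic ordering. `TileLoc` is a hypothetical stand-in for `t_physical_tile_loc`, and the final comparison on `y` is an assumption, since the quoted diff stops after the `x` comparison.

```cpp
#include <tuple>

// Hypothetical stand-in for t_physical_tile_loc.
struct TileLoc {
    int x;
    int y;
    int layer_num;
};

// Explicit form, with each return statement on its own line.
bool less_explicit(const TileLoc& lhs, const TileLoc& rhs) {
    if (lhs.layer_num != rhs.layer_num)
        return lhs.layer_num < rhs.layer_num;
    if (lhs.x != rhs.x)
        return lhs.x < rhs.x;
    // Assumed final tie-breaker; not shown in the quoted diff.
    return lhs.y < rhs.y;
}

// Same ordering via std::tie, which compares the tuples lexicographically.
bool less_tie(const TileLoc& lhs, const TileLoc& rhs) {
    return std::tie(lhs.layer_num, lhs.x, lhs.y)
         < std::tie(rhs.layer_num, rhs.x, rhs.y);
}
```

The `std::tie` form is shorter and harder to get wrong when fields are added, while the explicit form is easier to step through in a debugger.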

// Cast the partial placement to flat placement here. So that it can be
// used to guide the initial placer and for logging results. This enables
// the flow to be used with direct output of GP as well.
Explain that p_placement has been set by the GP (if using internal VTR AP), or by reading in the flat placement file. Cast / copy it to the flat_placement data structures so we can always use them.
}

// Run the initial placer on the clusters created.
// TODO: Currently, the way initial placer sort the blocks to place is aligned
sort -> sorts

// Run the initial placer on the clusters created.
// TODO: Currently, the way initial placer sort the blocks to place is aligned
// how self clustering pass clusters created, so there is no need to explicitely
pass -> passes the
explicitely -> explicitly
vpr/src/base/read_options.cpp
@@ -1841,6 +1841,11 @@ argparse::ArgumentParser create_arg_parser(const std::string& prog_name, t_optio
"VPR's (or reconstructed external) placement solution in flat placement file format; this file lists cluster and intra-cluster placement coordinates for each atom and can be used to reconstruct a clustering and placement solution.")
.show_in(argparse::ShowIn::HELP_ONLY);

file_grp.add_argument(args.write_legalized_flat_place_file, "--write_legalized_flat_place")
.help(
"VPR's (or reconstructed external) placement solution after legalization and before anneal in flat placement file format; this file lists cluster and intra-cluster placement coordinates for each atom and can be used to reconstruct a clustering and placement solution.")
lists (x, y, layer) coordinates for each atom
(we aren't using intra-cluster coordinates)
site index from flat placement info data structure.
Looks good, thanks! This is a great new feature.
This PR brings the FlatRecon full legalizer. It aims to reconstruct a given flat placement with minimum disturbance to the given solution. It can be used with a solution read from an '.fplace' file or with Global Placement output. However, it expects the given solution to be close to legal in both cases.
It has been run on the MCNC, VTR Chain, Koios, and Titan benchmarks. It has also been tested on the elfPlace placements generated from aug-elfPlace on the Titan benchmarks. [Results to be appended]
I have added two regression tests for this legalizer: the first in vtr_reg_strong, with fast checks on MCNC benchmarks, and the second in vtr_reg_nightly_test7, with some of the Titan benchmarks.
For LU32PEEng.v from VTR Chain, for example, the logged output of the legalizer looks like this:
