The timing-driven optimization tried to measure actual physical performance, rather than node/depth, by techmapping each partition individually and comparing STA results.
The big flaw in the method was its failure to account for timing burdens from parasitics and fanout between partitions. K-means partitioning tries to minimize the number of connections between partitions, which creates partitions with high internal connectivity and critical high-fanout connections that span partitions. Without accounting for that fanout or timing, the results were little improvement over node/depth.
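To make the cross-partition fanout concrete, here is a minimal sketch that counts, per driver, how many sinks stay inside its partition and how many land in another one. The edge-list netlist, the `part_of` mapping and the function name are all made up for illustration; none of this is the project's actual data model.

```python
from collections import defaultdict

def cross_partition_fanout(edges, part_of):
    """Return {driver: (internal_fanout, external_fanout)}.

    edges:   iterable of (driver, sink) pairs (hypothetical toy netlist).
    part_of: dict mapping node name -> partition id.
    """
    internal = defaultdict(int)
    external = defaultdict(int)
    for driver, sink in edges:
        if part_of[driver] == part_of[sink]:
            internal[driver] += 1
        else:
            external[driver] += 1
    drivers = set(internal) | set(external)
    return {d: (internal[d], external[d]) for d in drivers}

# Toy example: "a" drives one sink in its own partition and two in the other.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("c", "d")]
part_of = {"a": 0, "b": 0, "c": 1, "d": 1}
fanout = cross_partition_fanout(edges, part_of)
print(fanout["a"])  # (1, 2)
```

A per-partition techmap only ever sees the first number of each pair, which is one way to picture why the isolated STA results looked better than the whole circuit behaves.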
If we do a full-circuit techmap or optimization, we have to reverse engineer the boundary of each partition afterwards, which is a problem if optimization or mapping eliminates that original point in the graph. A solution is to create a PO for each wire at a partition boundary, which forces the logical function at that point to be maintained, so it cannot be eliminated the way an intermediate result can. SDC files can then target those points for timing constraints even after flattening.
Follow-up on #110 (yosys integration): add a yosys command `capture_wires` which takes each wire between partition sub-modules and adds a pseudo-PO. These could be marked with an attribute or id. Then, if the circuit is flattened (e.g. to pass to abc), the partitions can be reconstructed afterwards, perhaps with an `unflatten` command that reverses the flattening.
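A rough Python sketch of the idea, not a yosys pass: `capture_wires` tags every wire whose driver and sink sit in different partitions, and `unflatten` rebuilds the node-to-partition map from those tags alone by walking fan-in backwards until it hits another tagged wire. The edge-list netlist is hypothetical, and the sketch assumes real primary outputs carry the same partition tag and that the flattened netlist has not been restructured yet.

```python
from collections import defaultdict

def capture_wires(edges, part_of):
    """Tag each driver that has at least one sink in another partition.

    Returns {driver: source_partition}. In a real flow each entry would
    become a pseudo-PO carrying the partition id as an attribute.
    """
    return {d: part_of[d] for d, s in edges if part_of[d] != part_of[s]}

def unflatten(edges, tagged):
    """Rebuild node -> partition using only the tagged wires."""
    fanin = defaultdict(list)
    for d, s in edges:
        fanin[s].append(d)
    recovered = dict(tagged)
    for wire, part in tagged.items():
        stack = [wire]
        while stack:
            node = stack.pop()
            for pred in fanin[node]:
                # Tagged wires are already in `recovered`, so they act as
                # boundaries and the walk never crosses into another partition.
                if pred not in recovered:
                    recovered[pred] = part
                    stack.append(pred)
    return recovered

edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("c", "d")]
part_of = {"a": 0, "b": 0, "c": 1, "d": 1}

tagged = capture_wires(edges, part_of)   # {"a": 0, "b": 0}
tagged["d"] = part_of["d"]               # assumption: real POs are tagged too
recovered = unflatten(edges, tagged)
print(recovered == part_of)              # True
```

The backward walk works here because an untagged node, by construction, only drives sinks in its own partition. Once abc rewrites the logic between the tagged points, the interior nodes will differ, but the tagged wires should still delimit the same regions.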
A secondary problem is that the partition graph is possibly cyclic. To do a piecewise techmap, you would need to know the input delay coming from the previous partition (including register/input fanout!). If you treat the partition graph as a set of blackboxes, you could potentially calculate subcircuit behavior piecewise: given the connection properties of the other blackboxes, the SDC file can be given those relative delays on the inputs, which would preserve the timing. Capturing the fanout requirements on partition outputs that feed other partitions is trickier. It may be possible to create a number of dummy buffer gates equal to the fanout so that abc has to account for a real fanout, but those might get optimized out. Another option is to explicitly add in the first layer of gates from the other partitions to be realistic. I don't think that yosys currently has direct timing integration. Cycles in the partition graph make it difficult to reason about timing or to calculate it incrementally. I'm not sure how OpenSTA handles cycles in the graph, but it has to do it somehow.
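As a sketch of the blackbox idea, the snippet below propagates arrival times through the partition graph in topological order, so each partition gets an input delay derived from its drivers, and it fails loudly when the graph is cyclic. The partition names and delay numbers are invented; a real flow would take per-partition delays from STA and would still have to model fanout load, which this ignores. It uses the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter, CycleError

def input_arrivals(deps, delay):
    """Latest arrival time at the inputs of each partition.

    deps:  {partition: set of partitions that drive it}
    delay: {partition: worst internal delay} (hypothetical numbers)
    Raises CycleError if the partition graph contains a cycle.
    """
    arrival_in, arrival_out = {}, {}
    # static_order() yields every partition after all of its drivers.
    for p in TopologicalSorter(deps).static_order():
        drivers = deps.get(p, ())
        arrival_in[p] = max((arrival_out[q] for q in drivers), default=0.0)
        arrival_out[p] = arrival_in[p] + delay[p]
    return arrival_in

# Acyclic case: P0 feeds P1 and P2, and P1 also feeds P2.
deps = {"P1": {"P0"}, "P2": {"P0", "P1"}}
delay = {"P0": 1.0, "P1": 2.0, "P2": 0.5}
arrivals = input_arrivals(deps, delay)
print(arrivals["P2"])  # 3.0

# Cyclic case: two partitions that feed each other.
try:
    input_arrivals({"A": {"B"}, "B": {"A"}}, {"A": 1.0, "B": 1.0})
    cyclic_detected = False
except CycleError:
    cyclic_detected = True
print(cyclic_detected)  # True
```

Note that a cycle at the partition level does not by itself mean a combinational loop at the gate level, since two partitions can feed each other through unrelated wires. Handling that case would probably mean working per boundary wire rather than per partition, which this sketch does not attempt.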