Wednesday, March 7, 2012

Wed., Mar. 7th

Continue fiddling with CTU time resolution.  

I didn't yet try 56 bits WITH the enable fanout pipeline.  364-390 MHz.  Finally, a good bit better than originally.

Let's try setting Fitter -> Optimize Multi-Corner Timing.  347-379 MHz.  This actually did worse!  Possibly it is doing better at the fast corner.  Anyway, we probably want to optimize the worst case instead (especially since the timing analyzer isn't even calculating Fmax for the fast corner).

Let's try Analysis & Synthesis -> Perform WYSIWYG Primitive Resynthesis.  Still 364-390.  No improvement; so we'll turn that option back off.

Turn on Fitter -> More Settings -> Optimize Timing for ECOs.  Also in more fitter settings, turn placement & routing effort multiplier up from 1.0 -> 2.0.  Also set Router Timing Optimization Level to Maximum.  

249-272!!  Quite a lot worse!  What did I do wrong?

Let's back off the routing stuff (routing effort multiplier back to 1.0, Router Timing Optimization Level back to Normal).  340-371 MHz.  Weird.  Still not quite back to where we were.  

Let's turn the placement effort multiplier back to 1.0.  Back to 364-390.  Even though "Optimize Timing for ECOs" didn't help, I'll leave it on since it sounds good.

What if we turn the placement effort multiplier up to 4.0?  373-406!  Finally, better!

Now, let's try the router effort multiplier at 4.0.  360-392.  Worse!  What about 8.0?  Tried 2.0 again, 363-395.  Back to 1.0.

Router Timing Optimization up to Maximum - 377-402.  The hot speed is a little faster but the cold speed is a little slower.  Still, 375 MHz --> 750 Msps, which is a 1+1/3 ns time resolution, or +/- 0.667 ns.  Not too shabby.
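For reference, the conversion from clock speed to time resolution can be sketched as below. This is just my arithmetic written out in Python; the function name and the samples_per_clock parameter are made up for illustration, and it assumes the dual-edge design takes exactly two samples per clock cycle.

```python
def time_resolution(fmax_mhz, samples_per_clock=2):
    """Return (sample rate in Msps, resolution in ns, +/- error in ns).

    Assumes evenly spaced samples, samples_per_clock of them per clock cycle.
    """
    msps = fmax_mhz * samples_per_clock   # million samples per second
    resolution_ns = 1000.0 / msps         # spacing between adjacent samples
    return msps, resolution_ns, resolution_ns / 2


msps, res_ns, err_ns = time_resolution(375.0)
print(f"{msps:.0f} Msps, {res_ns:.3f} ns resolution, +/- {err_ns:.3f} ns")
# prints: 750 Msps, 1.333 ns resolution, +/- 0.667 ns
```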

Let's now try a single-edge-triggered version.  Borrowing those files from the FEDM project.  

Wow, looks like Aarmondas didn't ever actually implement the sync_error output from the timing-sync datapath!  Commenting it out, and cs_bits.  Calling the new module se_rise_cap_56.vhd.

OK, the speed here is:
  • Hot (85C) - 876 MHz!! but restricted to 617 MHz due to high minimum pulse width
  • Cold (0C) - 951 MHz!! but restricted to 609 MHz due to high minimum pulse width
Given the restriction, neither is as fast in terms of samples per second as the dual-edge-triggered version.  But to be fair, that version didn't have the reset/enable function.  Let's add it.  Actually, I take that back; Darryl never finished adding the reset functionality to the pulse-cap module!  Really, I should add it to both the single- and dual-edge-triggered versions of rise_cap_56.vhd.  

OK, now the top-level file for the single-edge-triggered version is:
  • COSMICi_DE3_GPSapp_top_se.bdf
    • Hot:  799.36 MHz restricted to 612.75 MHz
    • Cold: 860.59 MHz restricted to 606.8 MHz
And for the pseudo-dual-edge-triggered version is:
  • COSMICi_DE3_GPSapp_top_pde.bdf
    • Hot:   343.64 MHz (687.28 Msps)
    • Cold: 376.22 MHz (752.44 Msps)
OK, we went down quite a bit compared to our earlier 377-402 result, probably due to adding the resets.  But supposing we could run at 350 MHz, that's 700 Msps, still better than the single-edge-triggered version.
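A quick way to put the two versions on equal footing is to take the slower corner of each and multiply by the number of samples per clock. The sketch below does that with the Fmax figures from the lists above; the dictionary layout and the function name are only for illustration, and it assumes the restricted Fmax is the usable one for the single-edge design.

```python
# Fmax figures (MHz) copied from the compile results above.
# The keys and structure are labels for this sketch only.
designs = {
    "single-edge": {"samples_per_clock": 1, "hot": 612.75, "cold": 606.8},
    "pseudo-dual-edge": {"samples_per_clock": 2, "hot": 343.64, "cold": 376.22},
}


def worst_case_msps(design):
    """Sample rate (Msps) at the slower of the two temperature corners."""
    worst_fmax = min(design["hot"], design["cold"])
    return worst_fmax * design["samples_per_clock"]


rates = {name: worst_case_msps(d) for name, d in designs.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.2f} Msps worst case")
```

Note that the slower corner is not the same for both designs: the single-edge version is limited at the cold corner (by the minimum pulse width restriction), while the pseudo-dual-edge version is limited at the hot corner.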

Try changing Router Timing Optimization back to Normal.  Worse, 337-366.  Turn it back to Max.

Tried Placement Effort Multiplier at 8.0.  Much worse.

Trying Placement Effort Multiplier at 1.0.  Better!  (358-389 MHz).  At least it's a solid 700 Msps, or 1.43 ns time resolution, or about +/- 0.7 ns time error.  Possibly that's the best we're going to get with the reset & enable fully in there.

Let's modify the PLL clock speed target to 350 MHz (35x multiplier from the 10 MHz OCXO clock), since we definitely seem to be able to achieve that at least.  Next we'll need to do some testing.
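As a sanity check on that target, here is a small sketch of the margin between the PLL output and the worst-case Fmax from the last compile (358 MHz). The function name is hypothetical, and the "slack" here is just the difference in clock periods, not the slack figure the timing analyzer would report.

```python
def pll_margin(ref_mhz, multiplier, worst_fmax_mhz):
    """Return (PLL target in MHz, period margin in ns against worst-case Fmax)."""
    target_mhz = ref_mhz * multiplier
    # Positive margin means the target period is longer than the shortest
    # period the design can run at, so the target is achievable.
    slack_ns = 1000.0 / target_mhz - 1000.0 / worst_fmax_mhz
    return target_mhz, slack_ns


target, slack = pll_margin(10.0, 35, 358.0)
print(f"target {target:.0f} MHz, period margin {slack:.4f} ns")
```

The margin comes out to only a few hundredths of a nanosecond, so there is not much room to lose speed as more logic gets added.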

One more thing to try first:  Take the enable-fanout pipeline out.  (You never know what permutations might speed things up.)  349-381 MHz.  No, that's worse; let's put it back in.

Just for fun, tried a placement effort of 2.0 again but it did worse.  Going back to 1.0.

OK, now changing the PLL clock speed target to 350 MHz.  We should be able to hit this target.

Now we've got 356-388 MHz.  Slightly worse than when we were shooting for 500 MHz, but still OK. At least we can meet the 350 target.

OK, now putting high_speed_logic_pde.bdf into the root LogicLock region.  Soon, we'll see if its speed keeps stable as we add more logic.

Compiling again with it in LogicLock just to make sure that didn't change anything.  Nope.

Now, what if I turn on "Compilation Process Settings -> Smart compilation" and "Comp. Proc. Settings -> Incremental Compilation -> Compatible placement and routing"?  Seems that might help since the Logic Locked stuff shouldn't be moved around anyway.

Shoot, I forgot to include the reset/enable logic in the dual-edge carry-save counter; need to do that now.  Will have to redo compile from scratch.

Also forgot the stuff to generate the timing-sync output pulse.  Added that back in too.

Currently we're AND'ing bits 6-17 of the "sum" outputs of the dual-edge-triggered carry-save counter; at 700 Mcps (million counts per second) this should be high for ~91.43 ns every ~374.5 us.  This will work even though the carry bits haven't yet been added in.  (As soon as the carries for the rollover start propagating in, the output of the AND will fall.)
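The pulse timing can be checked with a short sketch. This models the counter as a plain binary counter (an assumption: it ignores the carry-save representation entirely) and counts how many values in one 2^18-count period have bits 6-17 all set. The constant and function names are mine, not from the design files.

```python
COUNT_RATE_HZ = 700e6          # 700 Mcps, from the 350 MHz dual-edge clock
LOW_BIT, HIGH_BIT = 6, 17      # bits AND'ed together for the sync pulse


def sync_pulse(count):
    """True when bits LOW_BIT..HIGH_BIT of count are all 1."""
    mask = ((1 << (HIGH_BIT - LOW_BIT + 1)) - 1) << LOW_BIT
    return (count & mask) == mask


# The pattern repeats every 2^(HIGH_BIT + 1) counts.
period_counts = 1 << (HIGH_BIT + 1)
high_counts = sum(sync_pulse(c) for c in range(period_counts))

width_ns = high_counts / COUNT_RATE_HZ * 1e9
period_us = period_counts / COUNT_RATE_HZ * 1e6
print(f"high for {high_counts} counts = {width_ns:.2f} ns, "
      f"every {period_counts} counts = {period_us:.2f} us")
```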

Let's see what we get for speed now.  Met the target.  (355-382).

Oops, I added the Nios core in - need to take it back out & recompile.  Put it in my cut buffer temporarily.

Worse?  (327-351)  I swear, Quartus makes no frickin' sense sometimes...

OK, backed out from that change so we meet the target again.

Now putting the new version of high_speed_logic_pde back into the root LogicLock region.  No change in speed.  (355.49 MHz - 381.53 MHz).  Good.

I think that's a good stopping point for today.  Tomorrow I can finish wiring the Nios system back up to the high-speed module and test.  Oh, we also still need to add some version of cs_combine_56 (or whatever it was called) to add the sum and carry bits together; this is probably preferable to creating two more PLLs and doing it in software.
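To remind myself what the combine step has to do, here is a generic carry-save counter sketched in Python. This is textbook carry-save arithmetic, not the actual VHDL: it assumes the count is held as a (sum, carry) pair whose true value is sum + carry modulo 2^56, and the names cs_increment and cs_combine are hypothetical.

```python
WIDTH = 56
MASK = (1 << WIDTH) - 1


def cs_increment(s, c, inc=1):
    """One carry-save step: compress (s, c, inc) into a new (sum, carry) pair.

    Each bit position is an independent full adder, so no carry has to
    ripple across the word within a single step.
    """
    new_s = s ^ c ^ inc
    # Majority of the three inputs, shifted up one position as the carry.
    new_c = (((s & c) | (s & inc) | (c & inc)) << 1) & MASK
    return new_s, new_c


def cs_combine(s, c):
    """Resolve the pair into an ordinary binary count (the slow, full add)."""
    return (s + c) & MASK


s, c = 0, 0
for _ in range(1000):
    s, c = cs_increment(s, c)
total = cs_combine(s, c)
print(total)
```

The full-width add only happens in cs_combine, which is why it can sit outside the high-speed path.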
