Sunday, February 26, 2012

Sun., Feb. 26th

Came in to do a couple more improvements to the FEDM input-capture architecture.  Unfortunately the main building doors were locked, but fortunately the loading dock door wasn't.  I now think the only reason the main doors were unlocked yesterday is that there was some kind of symposium going on here; that also explains why the parking lot was full.  There is a sign in the parking lot that says "reserved for symposium attendees."  I ignored it and parked there anyway, since the lot is not very full today, so I conclude that the symposium was yesterday.  (It would be very unlikely for it to be held on a Sunday.)

NOTE:  If I put a new pipeline stage for the counter value in the pulse-capture datapaths, but don't put one in the timing-sync capture datapaths, then the timing-sync capture datapath will return time values that are 1 cycle ahead, relative to those obtained from the pulse-capture datapaths.  This can be corrected for in software, but it might be cleaner to do it in hardware if we have room to fit all 4 of the new 112-bit pipeline registers (instead of just 3).  I think I'll try it for now, but remove it if we have fitting problems later.
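To make the 1-cycle offset concrete, here is a minimal Python sketch (not the actual datapath) that models each extra pipeline stage as a plain register delaying the counter by one clock. It assumes both capture datapaths see the same event on the same clock edge and simply latch whatever counter value is at their input. The names `delayed` and `correct_sync_time` are hypothetical, and the modular wrap assumes a 56-bit counter.

```python
from collections import deque

COUNTER_BITS = 56                 # counter width (assumed from the *_56 modules)
MASK = (1 << COUNTER_BITS) - 1

def delayed(samples, stages):
    """Model `stages` plain registers in series (no reset/enable).

    Output on cycle t is the input from cycle t - stages; cycles before the
    pipeline fills yield None, standing in for unknown power-up contents.
    """
    pipe = deque([None] * stages)
    out = []
    for x in samples:
        pipe.append(x)                # value clocked in on this edge
        out.append(pipe.popleft())    # value emerging from the last stage
    return out

def correct_sync_time(sync_time, offset=1):
    """Software fix: shift a timing-sync time back by `offset` cycles, mod 2**56."""
    return (sync_time - offset) & MASK

# Counter value on each clock cycle (just the cycle number here).
counter = list(range(10))
pulse_view = delayed(counter, 1)  # pulse-capture path: has the new stage
sync_view = delayed(counter, 0)   # timing-sync path: no new stage

event = 6                         # both paths capture on the same cycle
print(pulse_view[event], sync_view[event], correct_sync_time(sync_view[event]))
# prints "5 6 5"
```

Putting the same stage into the timing-sync path instead makes both views identical, which is the hardware option the note prefers.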

OK, so I've now created the following new module in Q:\:
  • cscnt_pipeline_register_56.bdf - This just uses two 56-bit-wide Altera DFF megafunction instances to buffer the sum and carry values.  I didn't bother including any reset/enable functionality, since the output of this module should quickly reflect the counter's behavior (with only a 1-cycle delay) in any case.  If there are speed problems, I could try replacing it with a custom VHDL module, since that seems to work better in some cases (perhaps only when reset/enable bits are included).  But I doubt this pipeline stage will end up being a performance bottleneck on either its input or output side.  Its input is a register, and it fans out to only 5 places now (as opposed to 16 for the original counter), with only a little bit of logic delay (probably just 1 LAB's worth) in each place.
And I inserted an instance of this module at the counter input of each of the following three modules:
  • pmt_ic_datapath2_56.bdf
  • pmt_ic_datapath_v3_56.bdf
  • tsedge_datapath_v2_56.bdf
I guess I could have done it at the top level, but that schematic is getting kind of crowded.  Or, I could have done it in pulseform_cap_56.bdf and avoided having to put it in both versions of the pulse-capture datapath, but I wanted it not to be buried so deeply.  In any case, these schematics may all need to be reorganized later if it turns out that we have to split up the design in order to put the parts we want into a LogicLock region.  I'm hoping, though, that there might be a way to add instances into a LogicLock region without having to do that.  Maybe by adding them one at a time through the region's Properties dialog?  That might allow clicking down into substructures...  Check for this capability later.
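The notes don't spell out how the counter itself works, but the separate sum and carry buses suggest a carry-save counter whose count is (sum + carry) mod 2^56. Under that assumption, here is a rough Python behavioral model of such a counter feeding a cscnt_pipeline_register_56-style stage (two plain 56-bit registers, no reset/enable). The increment rule, the zero power-up value, and the names `cs_increment`, `cs_value` and `simulate` are illustrative assumptions, not taken from the actual schematic.

```python
N = 56
MASK = (1 << N) - 1

def cs_increment(s, c):
    """Add 1 to a carry-save pair whose count is (s + c) mod 2**N.

    Each bit position is an independent full adder over s, c and the constant 1,
    so no carry ripples across the word within a cycle; carries are simply
    shifted into the next bit position.
    Relies on a + b + d == (a ^ b ^ d) + 2 * majority(a, b, d).
    """
    one = 1
    new_s = (s ^ c ^ one) & MASK
    new_c = (((s & c) | (s & one) | (c & one)) << 1) & MASK
    return new_s, new_c

def cs_value(s, c):
    """Resolve a carry-save pair to an ordinary binary count."""
    return (s + c) & MASK

def simulate(cycles):
    """Return (counter value, pipeline-register value) for each cycle."""
    s, c = 0, 0        # counter state
    reg = (0, 0)       # pipeline register; power-up value assumed to be 0
    history = []
    for _ in range(cycles):
        history.append((cs_value(s, c), cs_value(*reg)))
        # Clock edge: the register captures the counter's current sum/carry
        # while the counter advances, so the register lags by one cycle.
        reg = (s, c)
        s, c = cs_increment(s, c)
    return history

print(simulate(4))
# prints "[(0, 0), (1, 0), (2, 1), (3, 2)]"
```

In this model the registered value is exactly the previous cycle's count, which matches the 1-cycle delay described for the new module.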

Anyway, for now, let's go ahead and try the compile...  This will take a while...  19 minutes.

Yikes, we're back down to 214.96 MHz, as opposed to 271 MHz yesterday!  Possibly the extra resource usage from the pipeline registers is forcing the fitter to stretch more in general.  Let's look at the bottlenecks...  The first one is:

inst23|inst3|inst3|inst3|fall_c_reg|\byte_arr:4:bit_arr:3:sedff_inst|prim_dffe_inst|datain

OK, so this node is apparently the input to the falling-edge time-capture register for the carry value.  Perhaps I was wrong to think that the output of the new pipeline register would no longer be a bottleneck?  What to do?  Add 15 more pipeline registers, one for each of the individual pulse-cap instances?  That will increase our resource usage quite a bit, and may lead to fitting problems.
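For a rough sense of scale, the sketch below converts the two fmax figures into clock periods and counts the extra flip-flops the 15-register option would cost. It assumes each pipeline register is 112 bits (56-bit sum plus 56-bit carry, as in cscnt_pipeline_register_56) and counts only register bits, ignoring routing and LAB packing, so it is only a first approximation. The helper names are hypothetical.

```python
def period_ns(fmax_mhz):
    """Clock period in nanoseconds for a given fmax in MHz."""
    return 1e3 / fmax_mhz

REG_BITS = 2 * 56   # sum + carry bits per pipeline register

def extra_flipflops(n_registers, bits=REG_BITS):
    """Flip-flops consumed by n additional pipeline registers."""
    return n_registers * bits

today = period_ns(214.96)        # about 4.652 ns
yesterday = period_ns(271)       # about 3.690 ns
print(f"worst-case period grew by about {today - yesterday:.3f} ns")
print(f"15 more registers: {extra_flipflops(15)} flip-flops")
# prints "worst-case period grew by about 0.962 ns"
# then   "15 more registers: 1680 flip-flops"
```

So the slip from 271 MHz amounts to a bit under 1 ns on the worst path, while duplicating the register per pulse-cap instance would add 1680 flip-flops on top of the 448 (4 × 112) used by the current pipeline registers.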

Maybe first I'll try LogicLocking just the high-speed components.  OK, it looks like all you have to do is drag a representative instance into the region (its parents will get chosen arbitrarily), and then edit the entries in the Properties window to appropriately wild-card portions of the instance name so as to capture all of the desired instances.  So, now I've got the high-speed counter and the 4 new pipeline registers and the 15 pulse-capture modules and the timing edge-capture module all assigned to the root region.  That should be it for the high-speed components.
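As a rough illustration of the wildcard idea, the Python sketch below uses the standard `fnmatch` module to show how one pattern with `*` can pick up a whole family of hierarchical instance names. The instance names are made up for the example (only their `|`-separated style follows the node name quoted above), and Quartus's own wildcard rules for LogicLock members may differ from `fnmatch`'s; this is not a Quartus API.

```python
from fnmatch import fnmatchcase

def region_members(instance_names, patterns):
    """Instance names matched by at least one wildcard pattern.

    Note: fnmatch's `*` also matches across the `|` hierarchy separator,
    and `[...]` is treated as a character class.
    """
    return [n for n in instance_names
            if any(fnmatchcase(n, p) for p in patterns)]

# Hypothetical hierarchy: two pulse-capture instances under inst23|inst3,
# one low-speed block, and a counter at the top level.
names = [
    "inst23|inst3|inst1",
    "inst23|inst3|inst2",
    "inst23|inst7",
    "inst5",
]
print(region_members(names, ["inst23|inst3|*", "inst5"]))
# prints "['inst23|inst3|inst1', 'inst23|inst3|inst2', 'inst5']"
```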

Let's retry the compile now...  What I'm hoping at the moment (at least, it would be nice) is that Quartus will first compile and optimize everything that goes in the root LogicLock region, lock it down in place, and then fill in the remaining parts of the design around it.  I don't know if it's really smart enough to do this.  If that doesn't work, then we'll have to do something more complicated, like dividing the entire design into high-speed and low-speed portions, stubbing out all the low-speed parts, THEN freshly putting the high-speed parts into the root LogicLock region, adding the low-speed parts back in, and doing an incremental recompilation.  That is straightforward but time-consuming, so I should probably leave it for the students to tackle.  Anyway, I am freezing cold in here, and need to leave soon.

Another thing to try is a different implementation of the pipeline registers, although you'd expect Altera's own DFF megafunction not to do TOO badly...

The new compile changed nothing, speed-wise:  Still 214.96 MHz.  Bottleneck still the same.  Darn, it looks like we'll still have to divide up the design.

One more thing to try: Allow Quartus to automatically insert pipeline stages as needed to meet timing constraints.  I've been avoiding this out of worry that it might mess up the timing of the high-speed logic.  But it might be worth a try.  Hm, looking...  So far, I've only found an option that adds pipeline stages for asynchronous reset signals.  That isn't the problem we're having.

Finally, one more thought:  We could remove the reset/enable inputs from the pulse-cap modules, in the hope of improving their performance.  If we resort to this, though, we'd need to think carefully about how it would affect the behavior of subsequent logic on startup/reset.
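To make the startup/reset concern concrete, here is a toy Python model of a capture register with and without a synchronous reset. It models only the reset half of the idea: the enable input is kept as the capture strobe, which is an assumption about how the pulse-cap modules use it. The class name, the reset-over-enable priority, and the use of `None` for an unknown power-up value are all illustrative choices.

```python
class CaptureReg:
    """Toy capture register; synchronous reset is optional."""

    def __init__(self, has_reset):
        self.has_reset = has_reset
        self.q = None                 # unknown power-up contents

    def clock(self, d, enable=False, reset=False):
        """Apply one clock edge and return the new output."""
        if self.has_reset and reset:
            self.q = 0                # reset wins over enable in this model
        elif enable:
            self.q = d                # capture the current time value
        return self.q

with_rst = CaptureReg(has_reset=True)
no_rst = CaptureReg(has_reset=False)
for reg in (with_rst, no_rst):
    reg.clock(1234, enable=True)      # a capture before the reset
    reg.clock(0, reset=True)          # system reset, no new capture
print(with_rst.q, no_rst.q)
# prints "0 1234"
```

Without the reset, downstream logic sees a stale capture from before the reset (or an arbitrary power-up value) until the next real event, so it would need some other way to tell that the value is not yet valid.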

Anyway, that's all for today, I'm going to get some lunch and work on my visa application...
