Tuesday, March 6, 2012

Tue., Mar. 6th

I am thinking today that it might be a good idea to work on speeding up the edge-capture datapath in the GPS app gelware.  Currently, I am just driving it directly from the 10 MHz OCXO clock.  But I should be able to use a 10x PLL to run it at 100 MHz, and maybe even a 50x PLL to run it at 500 MHz.  If I can get a dual-edge-triggered version of the high-speed logic to run at that clock rate, that would give us true 1-ns resolution for the arrival times of the PPS edges from the GPS.  That takes away one source of uncertainty (the coarse 100-ns resolution of our previous measurements) when characterizing the precise relationship between the OCXO and GPS timing references.  It will make that particular journal paper a lot better (if we ever finish writing that one).
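
As a quick sanity check on those numbers, here is a minimal Python sketch of the arithmetic.  The function name is just for illustration; the only inputs are the 10 MHz reference, the PLL multiplier, and whether both clock edges are used.

```python
def edge_resolution_ns(ref_mhz, pll_mult, dual_edge):
    """Time between successive capture opportunities, in nanoseconds."""
    # A dual-edge-triggered design samples on both clock edges,
    # so it gets two capture opportunities per clock period.
    sample_rate_mhz = ref_mhz * pll_mult * (2 if dual_edge else 1)
    return 1000.0 / sample_rate_mhz


OCXO_MHZ = 10.0  # reference frequency stated in the text

for mult, dual in [(1, False), (10, False), (50, False), (50, True)]:
    res = edge_resolution_ns(OCXO_MHZ, mult, dual)
    print(f"x{mult} PLL, dual_edge={dual}: {res:g} ns")
```

The last case (50x PLL, both edges) is the 1-ns target; the first case is the 100-ns resolution of the existing measurements.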

This can also be a good point of comparison for the LogicLock stuff the students are doing in the FEDM design.

I think I want to start by creating a new top-level file:  A schematic version of the current top-level file in Verilog, DE3_GPSapp.v.  This will allow us to drag-and-drop components into a LogicLock region.  I also just prefer a visual design for my top-level files.

Let's start by recreating all of the I/O ports, namely:

output          CLK_OUT;
input           EXT_CLK;
input           OSC1_50;
input           OSC2_50;
input           OSC_BA;
input           OSC_BB;
input           OSC_BC;
input           OSC_BD;

////////// LED //////////
output  [7:0]   LEDB;
output  [7:0]   LEDG;
output  [7:0]   LEDR;

////////// SEG7 //////////
output  [6:0]   HEX0;
output          HEX0_DP;
output  [6:0]   HEX1;
output          HEX1_DP;

////////// BUTTON //////////
input   [3:0]   Button;

////////// SW (SLIDE SWITCH) //////////
input   [3:0]   SW;

////////// DIP_SW (DIP SWITCH) //////////
input   [7:0]   DIP_SW;

////////// MAX1619 (TEMPERATURE SENSOR) //////////
output          TEMP_CLK;
inout           TEMP_DATA;
input           TEMP_INTn;

////////// GPIO0 (J13, GPIO 0) //////////
inout           GPIO0_CLKINn0;
inout           GPIO0_CLKINp0;
inout           GPIO0_CLKOUTn0;
inout           GPIO0_CLKOUTp0;
inout   [31:0]  GPIO0_D;

////////// GPIO1 (J14, GPIO 1) //////////
inout           GPIO1_CLKINn1;
inout           GPIO1_CLKINp1;
inout           GPIO1_CLKOUTn1;
inout           GPIO1_CLKOUTp1;
inout   [31:0]  GPIO1_D;
//inout [31:23] GPIO1_D;
//input [22] GPIO1_D;
//inout [21:0] GPIO1_D;

////////// REGULATOR //////////
output          JVC_CLK;
output          JVC_CS;
input           JVC_DATAIN;
output          JVC_DATAOUT;

Now adding the structural contents.  OK, I finished putting in that schematic.  Fixed one typo and it compiles.  Here is a bird's eye view of that schematic (hard to see w/o zooming in):

Initial version (v0.0) of the schematic version of the CTU gelware.

I really should test it, but we can always put that off until we are trying to diagnose problems in later versions of the design.  For now, I will just archive this version of the file as _top_0v0 and proceed with my modifications.

The next thing we want to do is create the PLL slaved to the OCXO clock.  Let's jump in and try a 50x multiplier.  OK, created that PLL (wizard says feasible) and having it drive CLK_OUT as a test.

P.S. - All this work today is happening in my local working directory C:\f\DE3\S3\SB+SOPC\GPS_FPGA_app\Quartus_II_Project\DE3_GPSapp\ on COSMICi.  I should make a backup on Dropbox soon.

OK, it compiles.  Need to get the timing analysis settled though.  Opened TimeQuest.  Created a new SDC file, COSMICi_DE3_GPSapp_RevC.sdc.  Looks like it did the right thing, based on the PLL parameters.   Added a "derive_clock_uncertainty" command, so we don't get warnings about the clock uncertainty not being set.  Let's redo the compilation & analysis with the new SDC file.  Oops, looks like the SDC file generator assumed that the period of the board clock was 100 ns instead of the actual 20 ns - fixed that.  Oops, that happened b/c I was feeding the wrong clock to the PLL!  OK, after some fiddling around with the SDC file, both input clocks are correctly entered now.
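
For reference, here is a minimal sketch of the kind of constraints involved, emitted from a small Python helper so the periods are computed rather than typed by hand.  This is not the actual contents of COSMICi_DE3_GPSapp_RevC.sdc.  The helper name make_sdc is made up, and OCXO_CLK_PORT is a hypothetical placeholder for whichever input port carries the 10 MHz OCXO clock; only OSC2_50 at 50 MHz comes from the notes above.

```python
def make_sdc(input_clocks):
    """Build SDC text for a list of (port_name, freq_mhz) input clocks."""
    lines = []
    for port, freq_mhz in input_clocks:
        period_ns = 1000.0 / freq_mhz  # e.g. 50 MHz -> 20 ns
        lines.append(
            f"create_clock -name {{{port}}} -period {period_ns:.3f} "
            f"[get_ports {{{port}}}]"
        )
    # Let TimeQuest derive the PLL output clocks from the PLL parameters,
    # and set clock uncertainty so the analysis does not warn about it.
    lines.append("derive_pll_clocks")
    lines.append("derive_clock_uncertainty")
    return "\n".join(lines) + "\n"


# OSC2_50 is the 50 MHz board clock; OCXO_CLK_PORT is a placeholder name.
sdc_text = make_sdc([("OSC2_50", 50.0), ("OCXO_CLK_PORT", 10.0)])
print(sdc_text)
```

Computing the period from the frequency makes the 100 ns vs. 20 ns mix-up harder to repeat: each port is listed next to the clock that actually drives it.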

For some reason, we are failing the timing analysis with a -0.877 ns hold time violation on OSC2_50 (the 50 MHz board clock).  Maybe I should take out all my logic and try again.  :)

OK, with all the core logic removed, we meet the timing constraints.  When I go through adding the logic back in, I will have to be careful about that.

The Fmax at 85C is 820.34 MHz for the 50 MHz clock.  How do I get it to report it for the PLL clock?
Ah, it isn't reported b/c no registers are being clocked by the PLL clock at the moment.

OK, next I am going to create a module called "high_speed_logic.bdf" that will contain all the high-speed parts of the design, to facilitate logic-locking.

OK, now that module has just the PLL and a 64-bit dual-edged carry-save counter.  The output of that module feeds to a couple of 64-input AND gates for dummy output purposes.
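
To illustrate why a carry-save counter can run this fast, here is a minimal behavioral sketch in Python.  It is an assumption about the general technique, not a transcription of the actual counter in high_speed_logic.bdf, and it does not model the dual-edge clocking at all.  The idea is to keep the count in redundant form (a sum word plus a carry word), so each increment is one layer of full adders with no carry chain rippling across all 64 bits.

```python
WIDTH = 64
MASK = (1 << WIDTH) - 1


def csa_increment(s, c):
    """One increment step in carry-save form; the count is (s + c) mod 2**WIDTH."""
    inc = 1  # the value added on every step
    # Per-bit full adder: sum bit stays in place, carry bit moves up one position.
    new_s = s ^ c ^ inc
    new_c = (((s & c) | (s & inc) | (c & inc)) << 1) & MASK
    return new_s, new_c


def csa_value(s, c):
    """Resolve the redundant form to an ordinary binary count (the slow step)."""
    return (s + c) & MASK


s, c = 0, 0
for _ in range(1000):
    s, c = csa_increment(s, c)
print(csa_value(s, c))
```

In hardware the resolving addition would only be needed on the captured value, off the critical path, which is presumably what makes the approach attractive for a time-capture design.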

OK, now Fmax for the PLL clock is 503.52 MHz at 85C and 545.85 MHz at 0C.

What happens if we turn on aggressive optimizations?

* Fitter - Go from Auto Fit to Standard Fit.
* Compilation Process/Physical Synthesis Optimizations - Turn on Perform Physical Synthesis for Combinational Logic, Perform Register Retiming, Extra Effort level, Perform Register Duplication.
* Analysis & Synthesis - Optimization for Speed, Timing-Driven Synthesis.

OK, after all that, the 85C speed is 548.25 MHz and the 0C speed is 590.32 MHz (would have been 603.14 MHz without the minimum pulse-width constraint).

In other words, since the counter is dual-edge triggered, this is reaching an update rate of almost 1.2 GHz (2 x 590.32 MHz at 0C; about 1.1 GHz at 85C).
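
The conversion from Fmax to update rate is just a doubling, since a dual-edge-triggered counter updates twice per clock period.  A minimal Python sketch using the two Fmax figures reported above (the function name is illustrative):

```python
def dual_edge_update_rate_msps(fmax_mhz):
    """Update rate of a dual-edge-triggered counter, in millions of updates per second."""
    return 2.0 * fmax_mhz


# Fmax figures after turning on the aggressive optimizations.
for label, fmax_mhz in [("85C", 548.25), ("0C", 590.32)]:
    rate = dual_edge_update_rate_msps(fmax_mhz)
    print(f"{label}: {rate:.2f} Msps, {1000.0 / rate:.3f} ns per update")
```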

However, we still need to add the input-capture logic.  Let's get the original version of pulse_cap.vhd from Q:\ and trim out the stuff to capture the falling edge.

pulse_cap needs pde_dff2, pde_dff_en, pde_reg_en, and pde_shift_reg.  Copy those too.

We might also need the stuff to pipeline the fanout of the enable signal; for this, refer to Q:\se_pulse_cap_tsedge_56.vhd.

OK, now I've added pulse_cap to the high-speed logic module.  Haven't pipelined the enable signal yet; also haven't taken out the falling-edge capture logic.  Let's try it though.

Oops, now we're down to 356-390 MHz.  Let's try pipelining that enable signal.

While I'm at it, I also removed the extra state & logic to capture the fall time, and renamed the module to pde_rise_cap_64.vhd.  (Pseudo-dual-edge-triggered rising-edge time-capture module, 64-bits wide.)

326-357 MHz.  We got worse, not better!  Ouch!  However, it's worth noting that the update rate for 350 MHz is 700 Msps, since this design is dual-edge triggered.  Anyway...  I guess I should back out from the pipelined enable, huh...

What the hell, let's stick this puppy in a logic-lock region, just to see if it makes any diff.  Nope, it didn't make any difference at all - guess I already had enough optimization settings turned on.

OK, taking out the pipelined enable now.  Some more things to try:  (1) Go back to single-edge triggered.  (2) Cut down from 64 bits width to 56 bits width.  Not sure why the latter would help tho.

Now we're up to 338-371 MHz, a bit better, but still less than before I removed the falling-edge capture logic?  This makes no sense at all.

Let's first try going to 56 bits, since that's a pretty easy change.

Now we got 357-393.  Only very slightly better than before I started "simplifying" the logic!

I think I'm going to have to revert to a single-edge-triggered design...  On the other hand, 700 Msps isn't too shabby...  It's a heckuva lot better than the 20 Msps (50 ns) we are getting currently.

Actually no, we're only getting 10 Msps (100 ns resolution) because we're not even using the pseudo-dual-edge registers in the current CTU design.  So it would be an improvement by a factor of 70.
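
To keep the day's results straight, here is a small Python recap of the Fmax figures recorded in this entry, converted to dual-edge update rates and compared against the current 10 Msps.  The labels are my shorthand for each experiment, and I am assuming each "low-high" pair is the 85C and 0C Fmax, as in the earlier runs.  Note that the factor of 70 above uses a round 350 MHz rather than any one measured value.

```python
CURRENT_MSPS = 10.0  # current CTU design: 100 ns resolution

# (experiment label, lower Fmax in MHz, higher Fmax in MHz)
RUNS = [
    ("counter only, default settings", 503.52, 545.85),
    ("counter only, aggressive optimizations", 548.25, 590.32),
    ("+ pulse_cap, 64 bits, both edges captured", 356.0, 390.0),
    ("rising edge only, pipelined enable", 326.0, 357.0),
    ("rising edge only, no pipelined enable", 338.0, 371.0),
    ("rising edge only, no pipelined enable, 56 bits", 357.0, 393.0),
]


def summarize(runs, baseline_msps=CURRENT_MSPS):
    """Return (label, worst-case Msps, improvement factor) for each run."""
    rows = []
    for label, fmax_lo, _fmax_hi in runs:
        msps = 2.0 * fmax_lo  # dual-edge: two updates per clock period
        rows.append((label, msps, msps / baseline_msps))
    return rows


for label, msps, factor in summarize(RUNS):
    print(f"{label}: {msps:.1f} Msps worst case, {factor:.1f}x current")
```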

