Monday, May 9, 2011

Phasers locked, captain...

What to do this week?

At some point, I should probably go back to working on the journal article. I was thinking for a while that if I did some analysis of the ring oscillator as a short-time-slice TDC element, that could be usefully integrated into the article. That may still be true, but it is taking a while.

Let's consider, for a moment, what the next steps would be along that path.

1. Create an input-capture circuit (with appropriate synchronizer stages) to register the OCXO edges against the ring-oscillator half-cycles. At 1.7 ns half-cycle for the RO, we would expect to see about 29.4 RO half-cycles per 50 ns OCXO half-cycle. 6 bits (unsigned values 0-63) would be adequate to encode these deltas. Or, if there isn't too much variance in the RO period, even just 4 bits (signed values -8 to +7) would be more than adequate to encode the discrepancies in the deltas relative to some "expected" value (say 30). With one 4-bit sample per OCXO half-cycle, that's 8 bits per 100 ns period. So it would take about 26.8 seconds to entirely fill up a 256 MB block of DDR SDRAM with 2^29 = 536,870,912 of those 4-bit samples.

2. Add a command to the firmware to (at a desired time) initiate one of these half-minute data-collection runs, and then stream the data to the server for processing. The data collection routine itself will probably have to execute in a custom state machine: the DE3 board has only a 50 MHz built-in clock, so we will generate a new data point once every 2.5 CPU clock cycles (50 ns), and this is almost certainly not enough cycles for a software loop (whether polling or interrupt-based) to pull the data from a PIO register and then write it to SDRAM using the type of HLL call used in the demo code. This gets tricky, because we have to replicate what such a call is doing in our own custom state machine. In other words, we need to create our own host device for the Avalon bus fabric.
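As a quick sanity check on the numbers in step 1, here is a minimal Python sketch of the 4-bit delta encoding and the buffer fill time. The function names, the two's-complement nibble encoding, and the choice to put the first sample in the low nibble are all assumptions made for illustration; nothing here is taken from the actual firmware.

```python
# Values from the notes above; all names are hypothetical.
OCXO_HALF_CYCLE_NS = 50.0    # half of the 100 ns (10 MHz) OCXO period
EXPECTED_COUNT = 30          # nominal RO half-cycles per OCXO half-cycle
SDRAM_BYTES = 256 * 2**20    # 256 MB block of DDR SDRAM


def encode_delta(count):
    """Encode a raw count as a 4-bit two's-complement offset from EXPECTED_COUNT."""
    delta = count - EXPECTED_COUNT
    if not -8 <= delta <= 7:
        raise ValueError(f"count {count} does not fit in a signed nibble")
    return delta & 0xF


def decode_delta(nibble):
    """Recover the raw count from a 4-bit two's-complement offset."""
    delta = nibble - 16 if nibble & 0x8 else nibble
    return EXPECTED_COUNT + delta


def pack_samples(counts):
    """Pack two 4-bit samples per byte (assumed order: first sample in low nibble)."""
    if len(counts) % 2:
        raise ValueError("need an even number of samples")
    out = bytearray()
    for lo, hi in zip(counts[0::2], counts[1::2]):
        out.append(encode_delta(lo) | (encode_delta(hi) << 4))
    return bytes(out)


def unpack_samples(data):
    """Inverse of pack_samples."""
    counts = []
    for byte in data:
        counts.append(decode_delta(byte & 0xF))
        counts.append(decode_delta(byte >> 4))
    return counts


# Two samples per byte, one sample per OCXO half-cycle.
samples_in_block = SDRAM_BYTES * 2
fill_time_s = samples_in_block * OCXO_HALF_CYCLE_NS * 1e-9

if __name__ == "__main__":
    print(samples_in_block, "samples,", round(fill_time_s, 2), "s to fill")
```

The range check in `encode_delta` is the part worth keeping in mind: if the RO period drifts by more than about a quarter of its nominal value, the signed nibble overflows and the 6-bit unsigned encoding would be needed instead.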

Also, at the 57,600 baud rate we're using for the serial comm. link to the EZURiO board, and with a minimum of 10 bit-periods per byte (8 data, start, stop), the data rate for the upload to the server is at best 5,760 bytes/sec, so a 256 MB data transfer will take ~46,603 secs. = 776.7 min. = 12.95 hr. = basically one overnight. To avoid this bottleneck, we should perhaps consider interfacing to an Ethernet card and thereby sending the data directly to the server in real time. Unfortunately, there isn't an Ethernet port already built into the DE3, so we would have to add a daughter card, like this one: http://www.terasic.com.tw/cgi-bin/page/archive.pl?Language=English&CategoryNo=71&No=355. If we added a little Wi-Fi client-mode router (like this: http://www.dlink.com/products/?pid=346), that would re-establish wireless connectivity to the server. But still, we would have to deal with all the complexity of interfacing to the network (using a whole TCP/IP stack and the like).
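The upload-time estimate is easy to re-derive. A small Python sketch, with a hypothetical helper name and the 10-bits-per-byte framing assumed as described above:

```python
def upload_time_s(n_bytes, baud=57_600, bits_per_byte=10):
    """Best-case serial transfer time: 8 data bits plus start and stop bits."""
    bytes_per_s = baud / bits_per_byte
    return n_bytes / bytes_per_s


raw_dump_s = upload_time_s(256 * 2**20)   # the full 256 MB capture buffer
raw_dump_hr = raw_dump_s / 3600

if __name__ == "__main__":
    print(f"{raw_dump_s:.0f} s = {raw_dump_s / 60:.1f} min = {raw_dump_hr:.2f} hr")
```

This ignores any protocol overhead, flow control, or retransmission on the wireless link, so the real figure could only be worse.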

Or, the other option is to forget about offloading the data to the server, and instead just do the desired data analysis directly in the embedded firmware. This should be pretty straightforward, and makes a lot more sense. It shouldn't take long. Then all we have to transmit to the server is, say, the Allan deviation results for (say) about a thousand points on a logarithmic time scale, ranging from 1 to 512M half-cycles (basically 9 orders of magnitude).
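To make the on-board analysis concrete, here is a rough Python sketch of the calculation (the firmware itself would of course be written in C or similar). It uses the simple non-overlapping Allan deviation estimator, and it assumes the raw counts have already been converted to fractional-frequency samples, for instance count/expected - 1. Both of those choices, and all the names, are assumptions for illustration.

```python
import math


def log_spaced_factors(max_factor, n_points):
    """Distinct integer averaging factors, roughly log-spaced over [1, max_factor]."""
    factors = set()
    for i in range(n_points):
        factors.add(round(max_factor ** (i / (n_points - 1))))
    return sorted(factors)


def allan_deviation(y, m):
    """Non-overlapping Allan deviation of samples y at averaging factor m.

    y is a sequence of fractional-frequency samples taken at a fixed
    interval; m is the number of consecutive samples averaged per block.
    """
    n_blocks = len(y) // m
    if n_blocks < 2:
        raise ValueError("need at least two full blocks of m samples")
    means = [sum(y[k * m:(k + 1) * m]) / m for k in range(n_blocks)]
    sq_diffs = [(b - a) ** 2 for a, b in zip(means, means[1:])]
    return math.sqrt(0.5 * sum(sq_diffs) / len(sq_diffs))


if __name__ == "__main__":
    # Toy data: a strictly alternating sequence, not real measurements.
    y = [0.0, 1.0] * 8
    for m in log_spaced_factors(8, 4):
        print(m, allan_deviation(y, m))
```

Note that the estimator needs at least two full blocks, so with 2^29 samples the largest usable averaging factor is 2^28, and the values at the long end of the time scale will rest on very few block differences.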

Finally, if we decide to actually use the ring oscillators for timing of individual sensor events on the FEDM board, then we may want to think about doing some calibration & measurement of their frequency variations on the fly in the sensor application.

One other thing to think about: Using the phase-locked-loop (PLL) module included in the Stratix II to create faster clocks that are synced-up with the board clock. We have an EP2S30 class FPGA which has "fast PLLs" 1-4 and "enhanced PLLs" 5-6. The "enhanced" PLLs support clock frequency multiplication up to 512x, and the "fast" PLLs support up to 32x.

This raises the possibility that we could use the PLLs to sync up the FEDM's clocks with the 409.6-us sync pulses from the CTU, in a simpler manner than by constantly registering all these multiple clocks against each other. A simple circuit using the built-in 10 MHz clock could convert the 409.6us-period, 100ns-pulse-width sync pulse into an approximately 50% duty-cycle clock (with a precisely-timed rising edge) suitable for feeding into a PLL. After going through one "enhanced" stage with a 512x multiplier, this gives us a 0.8-microsecond (1.25 MHz) clock slaved to the CTU. After a 2nd 512x "enhanced" stage, we have a 1.5625ns (640 MHz) clock. That is already about as fast as we could hope to use, so no further "fast" stage would be needed. Let's see if that's too fast for the FPGA.

OK, the EP2S30 is at a "-3" speed grade, and the minimum clock high and low times are 612 ps each (table 5-37 from Stratix III datasheet). This implies a minimum clock period of 1.224 ns, or a maximum frequency of about 817 MHz. The 640 MHz clock fits under that limit, with a period of 1.5625 ns and a half-cycle of about 0.78 ns. If we can get away with using that to drive a PDEDFF-based carry-save counter, then that would give us less than 1 ns time resolution on the input capture circuit that finds the level-crossing times in terms of the half-cycles of this fast clock. As long as the PLLs are doing their job, these times should be precisely defined relative to the master clock that comes from the CTU sync pulses.

Oh, actually, it's not going to be quite that good... The maximum PLL output frequency for the Stratix II is only 550 MHz. Still, counting both edges of a 550 MHz clock gives 1.1 billion half-cycles per second, which is still over 1 GHz. The Stratix III goes up to 600 MHz.

There may be an issue with the minimum frequency of the PLL. The minimum input clock frequency is 2 MHz, whereas the CTU's 409.6-us sync pulses come at only about 2.44 kHz. So, we cannot go directly from the sync pulses. However, we could base it off the 50 MHz TCXO board clock instead... If we multiply this by 11x, we get the max PLL output frequency of 550 MHz. The period is 1.82 ns and the half-period is 0.91 ns. Still a little better than 1 ns. And we can measure the sync pulse arrival time in units of that, and the cosmic ray shower pulse arrival time in units of that, and thereby get in the neighborhood of the desired accuracy.
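The PLL constraints can be checked with a few lines of Python. This is only a sketch of the arithmetic, using the 2 MHz minimum input and 550 MHz maximum output figures quoted above; the function name and the restriction to integer multipliers are assumptions, and a real ALTPLL configuration has more parameters than this.

```python
F_IN_MIN_HZ = 2_000_000       # minimum PLL input frequency quoted above
F_OUT_MAX_HZ = 550_000_000    # maximum PLL output frequency quoted above


def pll_summary(f_in_hz, f_in_min_hz=F_IN_MIN_HZ, f_out_max_hz=F_OUT_MAX_HZ):
    """Pick the largest integer multiplier that keeps the output within range.

    Returns (multiplier, output frequency in Hz, period in ns, half-period in ns).
    """
    if f_in_hz < f_in_min_hz:
        raise ValueError(f"input {f_in_hz:.1f} Hz is below the PLL minimum")
    multiplier = int(f_out_max_hz // f_in_hz)
    f_out_hz = f_in_hz * multiplier
    period_ns = 1e9 / f_out_hz
    return multiplier, f_out_hz, period_ns, period_ns / 2


SYNC_PULSE_HZ = 1 / 409.6e-6  # CTU sync pulse rate, about 2.44 kHz

if __name__ == "__main__":
    print(pll_summary(50_000_000))   # 50 MHz board clock
    try:
        pll_summary(SYNC_PULSE_HZ)
    except ValueError as err:
        print("sync pulse rejected:", err)
```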

OK, I instantiated an ALTPLL Megafunction variation for an 11x clock multiplier, and used it to generate a 550 MHz (later 600 MHz) clock from the 50 MHz board clock. That worked just fine, although as before, the waveform at that speed looked pretty rounded (sine-wave) - although again that may be just due to the board/probe cable. On-chip the signal may look better. The acid test will be to use this signal to drive the carry-save counter.

Phase-locked loop test on DE3 board. Top: 50 MHz board clock (digital trace).
Bottom: 12x (600 MHz) output from PLL (analog & digital traces superimposed).

Talked to Ray for a while about the strategic issue of whether to proceed with trying to reverse-engineer Sachin's stuff well enough that we figure out how to add more TDCs as needed for our absolute time measurements, or instead redo the design by just counting cycles (or half-cycles) of a single fast oscillator (like the 600 MHz one I just made with the PLL). Really, it comes down to the question of whether we really need any better than 1 ns resolution on the pulse width. Ray is going to look at the science (e.g., difference between shower front development in neutrino vs. hadron initiated shower) and give me an answer on that. However, his feeling is that pulse width differences below 1 ns probably aren't going to matter. In which case, we should probably proceed by just redoing the gelware with our own design. We can rip out much of what Sachin has done and redo it. We still need the ability to program the DACs, but everything else can be re-done from scratch in our own way. We can design a little input-capture circuit to get the rise/fall times of each pulse, and just replicate it for each of the threshold comparators (LVDS inputs). Then we can have firmware transfer the data to the PC however we want.

On tap for tomorrow: (1) Make sure that I can actually drive my pseudo-dual-edge triggered carry-save counter using this 600 MHz clock (i.e., at 1.2G counts per second, counting both clock edges). (2) Design input-capture circuit around that counter. (3) Use it to capture rise/fall times of an input pulse.
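For reference, here is a small Python sketch of how captured counter values would turn into a pulse width, assuming the counter advances on both edges of the 600 MHz clock. The 32-bit counter width, the wraparound handling, and the function name are hypothetical; the actual input-capture circuit hasn't been designed yet.

```python
CLOCK_HZ = 600_000_000
COUNTS_PER_SECOND = 2 * CLOCK_HZ      # dual-edge counting: 1.2e9 counts/s
TICK_NS = 1e9 / COUNTS_PER_SECOND     # one half-cycle, about 0.833 ns


def pulse_width_ns(rise_count, fall_count, counter_bits=32):
    """Pulse width from captured counter values, tolerating one counter wrap.

    counter_bits is an assumed width for the free-running counter.
    """
    modulus = 1 << counter_bits
    ticks = (fall_count - rise_count) % modulus
    return ticks * TICK_NS


if __name__ == "__main__":
    print(pulse_width_ns(100, 112))        # 12 half-cycles
    print(pulse_width_ns(2**32 - 4, 8))    # same width across a counter wrap
```

The modular subtraction only gives the right answer if the pulse is shorter than one full counter period, which at 32 bits and 1.2G counts per second is about 3.6 seconds.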
