Tuesday, May 31, 2011

Troubled Tuesday

The day started with my cellphone being half-dead. No data service from AT&T anywhere in the Tallahassee metro area. The day is two-thirds over now, and it is still down. Stupid AT&T!!

Next, spent $100 at the vet for some medication for the dog, AGAIN, for his persistent skin problem that never seems to go away, no matter how many treatments we give him. Stupid parasitic organisms! (Am I talking there about infectious bacteria, dogs, veterinarians, or humans in general? Hmm...)

Third, went to the auto shop, only to find that my car needs a new engine, $3,000, which is more than the car is worth. Should I fix or junk it? Buy a new car? Or used car? Every path has its risks... Stupid auto industry, they always manage to bleed you dry one way or another... Anyway, before I decide, need to find out first whether I still even will have a job this Fall...

Fourth, went to talk to someone to ask about a possible job for the Fall, only to find that he wasn't there today, and nobody knew where he was... Guess he forgot about our meeting... Guess I'll try again another day. But I am not optimistic. [long political rant elided]

On that cheery note, let's turn our attention to the tasks we need to do today... (Forgive the rant; I'm in somewhat of a dour mood today, as you can see...)

ACTUAL WORK STARTS HERE...

OK, the guys are here, and they are downloading an Ubuntu ISO file to install on a USB flash stick so we can boot the Acer into Linux from it, and then hopefully repair the problem we were having booting into XP and Linux.

They are also looking for output pins to use to test, say, the low 8 bits of the difference between rising and falling edge times.

Mike (or somebody) needs to figure out whether converting to integer through unsigned preserves nonstandard bit-position values. OK, I did a little test and it does not - the rightmost bit in the bit-vector is always bit 2^0 (1's place), and as you go left, the bit-values double. OK, so that's easy to fix, just multiply the carry bit-vector by 2 after converting it to integer.
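As a quick sanity check of that arithmetic (in Python rather than VHDL, with made-up example vectors), the sketch below treats the rightmost character of a bit string as the 1's place and doubles the carry vector before adding it to the sum vector. The function names are hypothetical and are not taken from our modules.

```python
def bits_to_int(bits: str) -> int:
    """Interpret a bit string the way the test showed: rightmost bit is 2^0."""
    value = 0
    for ch in bits:
        value = value * 2 + (1 if ch == "1" else 0)
    return value


def carry_save_value(sum_bits: str, carry_bits: str) -> int:
    """Combine sum and carry vectors; each carry bit weighs twice its position."""
    return bits_to_int(sum_bits) + 2 * bits_to_int(carry_bits)


# Hypothetical example vectors: sum = 0101 (5), carry = 0011 (3, worth 6 after doubling).
example = carry_save_value("0101", "0011")
```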

Mike still needs to finish writing the FIFO_reader module which he started (barely) last week.

The guys finished identifying an array of pins (nicely arranged in sequence) which could be used to output the trailing-minus-leading edge time difference. We hooked up Ray's pulse generator, set it to generate a 3.3V (Z=50 ohm) pulse at variable pulse width. We set up an 8-bit "bus" digital input configuration on the scope. They found an input pin we can use.

Unfortunately, two problems:
  1. Since we only have spaghetti wire connecting from coax cable (from pulse generator) to the board, there is lots of ringing and it is a very ugly pulse shape.

  2. There is no data showing on the 8-bit output bus (constant 00 value).
To eliminate the messy input signal as the cause of the problem, the guys are writing a little FPGA-internal pulse generator module in VHDL which will generate a clean 20-ns pulse (one board clock cycle) periodically. This can be used as a baseline for testing the pulse_cap (and pulse_cap_test) modules.

Back on the XP booting issue. We got Ubuntu onto the USB stick, booted into it, and copied that special folder with the 3-letter name from the Temp folder on that one partition back to the "System Reserved" (boot) partition where it was originally.

Now we can boot into Mint again, but still can't boot into XP. Emailed Juan for more assistance.

Thursday, May 26, 2011

Blasting them Bytes...

Yesterday I found a DB-25 parallel cable gender bender at Fouraker's. And the parallel-USB adapter arrived. Today I brought in these items & my old parallel cable from home.

Darryl is here, along with the others; getting him set up with accounts.

We tested the parallel cable with the ByteBlaster on Mike's computer (with our programming files from last Tuesday; the 450 MHz counter test); that worked.

Now the guys are installing the ByteBlaster driver on the Acer (Windows 7). Oops, there's a note that says that it doesn't support 64-bit Windows 7, and that's what we have! So now, getting the XP partition up and running again (assuming it's 32-bit Windows) is a bigger priority. Tried booting into Linux, but apparently the bootloader is what is complaining that the files in C:\NST aren't available. We may need to boot off a Linux boot CD. Emailed Juan for assistance. We'll set this aside for now.

After that, I think the first thing they will do today is just add an additional counter probe bit, say bit 32. Meanwhile, I am working on the FIFO_reader module.

Great, bit 32 works - its frequency, according to a scope measurement, is 104.8 mHz, which is 2^32 times slower than 450 MHz.


Top to Bottom: Counter bits 0 (violet), 4 (blue), 8 (yellow), and 32 (green) with horizontal scale set to 40 ns/div (first screenshot) and 2 s/div (second screenshot) respectively.
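The bit-32 figure can be reproduced with a couple of lines of arithmetic. This is only a sketch: it assumes bit 0 is the least significant counter bit and that the counter advances once per half clock cycle (dual-edge counting), so each bit's square wave runs at the count rate divided by 2^(bit+1). The function name is hypothetical.

```python
def counter_bit_frequency(count_rate_hz: float, bit: int) -> float:
    """Frequency of the square wave on counter bit `bit` (bit 0 = LSB)."""
    return count_rate_hz / 2 ** (bit + 1)


PLL_HZ = 450e6
COUNT_RATE_HZ = 2 * PLL_HZ  # dual-edge counting: one count per half clock cycle

bit32_hz = counter_bit_frequency(COUNT_RATE_HZ, 32)
print(f"bit 32: {bit32_hz * 1e3:.1f} mHz, period {1 / bit32_hz:.2f} s")
```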

Now they are working again on the pulse_cap_test module.

Spent some time talking with Darryl about the project and going through some of the code.

Wednesday, May 25, 2011

Following the Datapath...

Today the plan is just to continue with the detail design of the input capture datapaths.

I'm thinking that perhaps I could ask the students to do the detail design of the simpler IC datapath for the timing sync pulse, while I continue working on the more complicated one for capturing the PMT pulse waveform.

I brought a parallel cable in from home, since our ByteBlaster cable is not long enough, making testing awkward at present, but I need a F/F gender bender for it. Maybe I can pick one up at Fouraker's or Radio Shack on the way home this evening - need to leave early though to do that, since Fouraker's closes at 6, if I recall. If not I can swing by on my way in to work tomorrow.

Getting ready to write the FIFO_writer and FIFO_reader modules. Studying the FIFO datasheet (SCFIFO, synchronous variant). From the timing diagrams, it looks like the I/O should be done on the falling clock edge, so that all levels are valid on the rising edge when the FIFO module operates.

I finished writing the FIFO_writer module, although it is not yet tested. It is dual-edge-triggered, so that it can communicate with our other modules on the usual (rising) edge, while communicating with the FIFO on the falling edge as required by the FIFO datasheet. However, to enable me to still use variables & behavioral code in PROCESS statements, I compute the current state dynamically (i.e., combinationally) from a bunch of one-hot state flag variables which are each managed in pseudo-dual-edged fashion using the XOR trick.
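For readers unfamiliar with the XOR trick, here is a small behavioral model in Python. It assumes the usual construction: two ordinary single-edge registers, one clocked on each edge, with the visible output being their XOR. It is an illustration only and is not transcribed from the VHDL; the class and attribute names are hypothetical.

```python
class PseudoDualEdgeFF:
    """Behavioral model of a flip-flop that appears to update on both clock edges.

    Two ordinary single-edge registers are kept; the visible output is their XOR.
    """

    def __init__(self) -> None:
        self.rise_reg = 0  # updated only on rising edges
        self.fall_reg = 0  # updated only on falling edges

    @property
    def q(self) -> int:
        return self.rise_reg ^ self.fall_reg

    def rising_edge(self, d: int) -> None:
        # Choose rise_reg so that rise_reg XOR fall_reg equals d.
        self.rise_reg = d ^ self.fall_reg

    def falling_edge(self, d: int) -> None:
        # Choose fall_reg so that rise_reg XOR fall_reg equals d.
        self.fall_reg = d ^ self.rise_reg


ff = PseudoDualEdgeFF()
inputs = [1, 1, 0, 1, 0, 0, 1]
outputs = []
for i, d in enumerate(inputs):
    if i % 2 == 0:
        ff.rising_edge(d)
    else:
        ff.falling_edge(d)
    outputs.append(ff.q)
```

Each register only ever changes on its own edge, yet the XOR output follows the input on every edge, which is what lets ordinary single-edge PROCESS code behave as if it were dual-edged.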

The FIFO_reader module should be pretty easy, and rather similar to FIFO_writer in its internal design. All it needs to do is:
  1. Wait (on falling edges) for the FIFO to be non-empty; and then...
  2. Raise the RDREQ signal for one cycle to read out one data packet from the FIFO; then...
  3. Place the data on the output lines and assert a signal that (via a control-register PIO) generates an IRQ telling the CPU that there is data ready to read; then...
  4. Wait for the CPU to assert a bit (in a PIO control register) telling us that it has finished consuming the last data packet that we sent; then...
  5. Return to the initial state (waiting again for the FIFO to be non-empty).
It would probably be safe for FIFO_reader to be entirely falling-edge-triggered, since besides the FIFO, the only other thing it communicates with is the set of PIO devices on the Avalon bus, and those devices are designed to be able to handle asynchronous input anyway.
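The five steps above can be sketched as a small state machine. The Python model below is a rough behavioral stand-in, not the VHDL: one call to falling_edge represents one falling clock edge, the FIFO is a plain deque, and the state names, attribute names, and the toy CPU in the loop are all assumptions made for illustration.

```python
from collections import deque


class FifoReaderModel:
    """Step-by-step model of the five FIFO_reader steps (one call = one falling edge)."""

    def __init__(self, fifo: deque) -> None:
        self.fifo = fifo
        self.state = "WAIT_NONEMPTY"
        self.rdreq = False       # read request to the FIFO
        self.data_out = None     # value presented on the output lines
        self.data_ready = False  # would raise the IRQ through the control PIO

    def falling_edge(self, cpu_done: bool) -> None:
        if self.state == "WAIT_NONEMPTY":
            if self.fifo:                      # step 1: FIFO is non-empty
                self.rdreq = True              # step 2: raise RDREQ for one cycle
                self.state = "READ"
        elif self.state == "READ":
            self.rdreq = False
            self.data_out = self.fifo.popleft()
            self.data_ready = True             # step 3: present data, signal the CPU
            self.state = "WAIT_CPU"
        elif self.state == "WAIT_CPU":
            if cpu_done:                       # step 4: CPU has consumed the packet
                self.data_ready = False
                self.state = "WAIT_NONEMPTY"   # step 5: start over


fifo = deque(["packet A", "packet B"])
reader = FifoReaderModel(fifo)
delivered = []
for _ in range(12):
    # Hypothetical CPU: acknowledges on the edge after it sees data_ready.
    cpu_done = reader.data_ready
    if cpu_done:
        delivered.append(reader.data_out)
    reader.falling_edge(cpu_done)
```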

Tuesday, May 24, 2011

Pulse Width Capture Testing

Turned in my timesheet today but it was late again... They're due Mondays this summer, due to the compressed furlough schedule. :(

Finished placing the requisition for the SMA connectors we need, asked Ray to approve it ASAP.

Got an email from Vaibhav, saying he will be available to help going forward.

Tyler and David are here, and I've asked them to start designing a new module pulse_cap_test.vhd to test the pulse_cap module - simply by copying the input data to input registers, adding the sum and carry bits, and writing the result to output registers. Then we can send a pulse of known width into the signal input pads with the waveform generator, and look at (say) the low 8 bits of each output register, or (measuring one at a time) the low 16 bits. If, say, the input pulse is 10 ns wide, then at 450 MHz there should be 9 counts in between leading and trailing edge. Actually, even better, we can subtract (trailing - leading) in the test module itself, and then just observe the difference between these, as up to a 16-bit number. So in this test, we should be able to measure pulse widths up to about 72.8 microseconds (delta of 65,535 half-cycle time steps, each 1.11... ns at 450 MHz).
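The numbers in that test plan can be checked with a few lines of Python. This is a sketch of the arithmetic only, assuming one count per half cycle of the 450 MHz clock; the function names are hypothetical, and truncating the difference to 16 bits is shown as one way the subtraction could tolerate counter wraparound.

```python
PLL_HZ = 450e6
TICK_S = 1 / (2 * PLL_HZ)   # one count per half clock cycle, about 1.11 ns
WIDTH_BITS = 16


def expected_counts(pulse_width_s: float) -> int:
    """Number of counter ticks expected between leading and trailing edge."""
    return round(pulse_width_s / TICK_S)


def edge_difference(trailing: int, leading: int) -> int:
    """trailing - leading, truncated to 16 bits so counter wraparound is harmless."""
    return (trailing - leading) & ((1 << WIDTH_BITS) - 1)


def max_measurable_width_s() -> float:
    """Largest pulse width that still fits in the 16-bit difference."""
    return ((1 << WIDTH_BITS) - 1) * TICK_S
```

For example, expected_counts(10e-9) gives the 9 counts mentioned above, and edge_difference still returns 9 when the low 16 bits of the counter wrap between the two edges.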

Mike is doing some filing while Tyler & David are working on that.

OK, got my desk cleared off. Now I am working on implementing the next group of modules for the PMT datapath. Here is what I have envisioned currently: An intermediate-level module our_FIFO that includes 3 submodules:
  1. FIFO_writer: Consumes a data packet from pulseform_cap and inserts it at the tail of the FIFO queue. If the FIFO is full, asserts a complaint signal (BUF_FULL) which can be serviced by a CPU interrupt (to generate a warning in the serial text output stream to tell the system operators that some pulse data may have been lost).

  2. pulseform_FIFO: This is just a generic FIFO megafunction variation. It is 774 bits wide and (for now) 16 stages deep. 774 = 6x(1+128), since we have 6 thresholds, and for each threshold we have 1 "crossed" bit and 2*64=128 bits to indicate the leading/trailing edge time. Note that, if needed, we could compress the representation significantly (and thus, save memory in the FIFO) by storing a 64-bit "base" time, and then say 8-bit offsets from this for all the time values. This, however, assumes that we will never see a pulse more than 256 ticks long (about 284 ns). Is this a good assumption? Probably, but I'm not certain yet... Need to ask Ray at some point. Let's wait and see if we run into memory problems first though.

  3. FIFO_reader: Consumes a data packet from the FIFO, and then delivers it to the CPU via a set of PIO interfaces. The actual data can be delivered in a set of twelve 32-bit input-only PIOs, with all handshaking done via a single 8-bit bidirectional control PIO, as envisioned earlier. Oh, and the "crossed" bits can be delivered in another 8-bit input-only PIO.
So far, I created the FIFO megafunction variation, and started wiring up the top-level our_FIFO schematic. (Doing it as a schematic rather than VHDL since it itself doesn't require any logic.)
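To put numbers on the FIFO width discussion in item 2, here is a short Python sketch. The full-width figure follows directly from the text; the compressed layout (one shared 64-bit base time, then a "crossed" bit and two 8-bit offsets per threshold) is only one assumed way of arranging it, and the function names are hypothetical.

```python
THRESHOLDS = 6
TIME_BITS = 64          # one edge time
TICK_NS = 1e9 / 900e6   # about 1.11 ns per count at 450 MHz, dual-edge


def full_packet_bits() -> int:
    """One 'crossed' bit plus leading and trailing 64-bit times per threshold."""
    return THRESHOLDS * (1 + 2 * TIME_BITS)


def compressed_packet_bits(offset_bits: int = 8) -> int:
    """Assumed layout: one shared 64-bit base time, then per threshold a
    'crossed' bit and two small offsets (leading and trailing)."""
    return TIME_BITS + THRESHOLDS * (1 + 2 * offset_bits)


def max_pulse_ns(offset_bits: int = 8) -> float:
    """Longest pulse the offsets can span, using the 256-tick figure above."""
    return (1 << offset_bits) * TICK_NS
```

Under that assumed layout the packet shrinks from 774 bits to 166 bits, at the cost of the pulse-length limit discussed above.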

Monday, May 23, 2011

Capture of the Inputs

This week my goal is to finish implementing and testing the input-capture datapaths for the PMT inputs as well as the APD input. Below is the (greatly simplified) datapath for the APD input which provides the 409.6-us-period timing sync pulse. The pulse input capture module here can be the same as the one for the PMT input capture datapaths. But, because there is a guaranteed delay between subsequent pulses, we can do away with the FIFO, and rely on the CPU responding to the PIO edge-capture interrupt immediately after each pulse arrives, before the next one arrives.
First today, I got the pulse_cap.vhd module to the point of compiling, at least up to block symbol generation.

Next, Tyler and David were here, and they worked on getting the PLL+DECS counter ready to test on the FEDM board. We verified that bit 4 and bit 8 worked at up to 450 MHz PLL frequency (900 MHz count rate). This gets us very close to the desired 1 ns resolution (actually it is 1.11 ns), but it can perhaps be further optimized.

Later on, we should test additional counter bits, up to at least, say, bit 32. (At 900 MHz count rate, bit 32 should have a period of 9.54 seconds.)

Mike is ordering the SMA connectors which we will need for board testing.

Friday, May 20, 2011

Furlough Friday

Although FAMU's summer furlough does not officially apply to research staff (the lights are still on in the research building on Fridays), I am voluntarily restricting my work days to Monday-Thursday this summer, with the goal of saving gas money by only driving into Tallahassee four days a week instead of 5. I am only working half-time anyway, and it is easy enough to just work 5 hours a day on M-Th afternoons.

However, I may still work from home some Fridays, when I feel like it. I'm spending a little time now working on the design of the input capture pathway.

Just now, I investigated the FIFO device for Avalon ("On Chip FIFO Memory") in SOPC Builder. It assumes that both the producer and consumer of the data are on the bus fabric, so is not suitable for our application, in which the data producer is custom gelware outside of the SOPC-Builder-generated system. Instead, we will put a FIFO in our custom gelware, and just use PIOs for the communication to the processor.

Let's go ahead and start outlining the design in more detail. The following input capture path will need to be replicated in parallel for each input channel (that is up to 5 times, including one channel for the 409.6us timing sync input pulse, and one for each of the four PMTs at a site). However, all of the input capture paths will reference the same DECS (dual-edged carry-save) time counter (for consistency between their reported time values).

Current plan: For each input capture channel, have four 32-bit data input PIOs, as follows:
  • leading-low: Least significant 32 bits of the time value of the pulse leading edge.
  • leading-high: Most significant 32 bits of the time value of pulse leading edge.
  • trailing-low: Least significant 32 bits of the time value of the pulse trailing edge.
  • trailing-high: Most significant 32 bits of the time value of pulse trailing edge.
In addition, have one 8-bit bidirectional control/handshaking PIO called icpath_control, with the following bits:
  • Bit 0: nRESET (used as output only) - CPU sets this to 0 to reset the IC (input capture) path, 1 to let it operate normally.

  • Bit 1: nPAUSE_RUN (used as output only) - CPU sets this to 0 to freeze this input-capture channel, 1 to let it run freely. When paused, input pulses will be ignored entirely.

  • Bit 2: HANDSHAKE (used bidirectionally) - The FSM at the FIFO head sets this bit to 1 when data for a new pulse is in the PIOs and is ready for the CPU to consume. This causes an interrupt. The CPU then sets this bit to 0 when it has consumed the data; after this, the gelware is allowed to write new values to the PIO data input registers.

  • Bit 3: BUF_FULL (used as input only) - If an input pulse is captured but the FIFO is full because the CPU has not consumed previous pulses quickly enough, so that there is nowhere to put the pulse data, and the input capture circuit is stuck in dead time, the gelware asserts this signal to indicate to the CPU that any new input pulses will be inadvertently lost. This causes an interrupt, so that the firmware may report a warning about this condition. Later, when the CPU has consumed some pulse data, so that the input FIFO is no longer full, the gelware will deassert this signal.

  • Bits 4-7: Reserved for future expansion.
In addition to the above for each IC channel, we can have one more 8-bit output PIO called cntr_ctrl (counter control), which can be similar to the one I already am using in the GPS time application.
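From the firmware side, the PIO layout above might be handled roughly as follows. The sketch is in Python for readability (the real firmware would be C on the Nios), and every name in it is hypothetical; only the bit positions and the low/high word split come from the plan above.

```python
NRESET = 1 << 0       # 0 = hold IC path in reset, 1 = normal operation
NPAUSE_RUN = 1 << 1   # 0 = frozen, 1 = running
HANDSHAKE = 1 << 2    # set by gelware when data is ready, cleared by CPU
BUF_FULL = 1 << 3     # set by gelware while the FIFO is full


def decode_control(value: int) -> dict:
    """Break an icpath_control byte into named flags."""
    return {
        "in_reset": not (value & NRESET),
        "paused": not (value & NPAUSE_RUN),
        "data_ready": bool(value & HANDSHAKE),
        "buffer_full": bool(value & BUF_FULL),
    }


def join_time(low: int, high: int) -> int:
    """Rebuild a 64-bit edge time from its two 32-bit PIO words."""
    return ((high & 0xFFFFFFFF) << 32) | (low & 0xFFFFFFFF)


def pulse_width_ticks(leading_low, leading_high, trailing_low, trailing_high) -> int:
    """Trailing minus leading edge time, modulo 2^64."""
    leading = join_time(leading_low, leading_high)
    trailing = join_time(trailing_low, trailing_high)
    return (trailing - leading) & 0xFFFFFFFFFFFFFFFF
```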

The overall architecture of the input capture system is as follows:
One thing that is a bit odd about this architecture at the moment is that each threshold level has its own input capture path, which is strange because actually we expect the different thresholds to be crossed at around the same time.

What we could do, then, is modify the input capture FSM for the PMT inputs so that, after the first threshold is crossed, it collects level-crossing times for all of the thresholds, until we reach the trailing edge for the first threshold. Then, that packet of data (leading and trailing edge times for all 5 thresholds) is stuffed all at once into the FIFO. The width of the input capture data path is then 5x64 = 320 bits.

The only problem is, this makes the structure of the input capture FSM more complex, which means the state-update logic might have trouble executing within one PLL clock cycle. So we may need to hand-optimize the state-update logic to some extent.

OK, so here is the new architecture:
All that is different here is that the input capture path for the PMTs is 5x wider, but there are 5x fewer paths.

We need to come up with a representation for threshold levels that are not crossed in a given pulse. For each level above the lowest one, we can include a bit that indicates whether that level was crossed at all in the current pulse. This indicates whether the next 128 bits (denoting the 64-bit leading/trailing edge times) are valid, or should be ignored.
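One possible encoding of that representation is sketched below in Python. The bit ordering is an arbitrary choice made for illustration (a "crossed" bit in front of each 128-bit pair of edge times, for every level except the lowest), and the function names are hypothetical; the real packet layout has not been decided.

```python
from typing import List, Optional, Tuple

Edges = Tuple[int, int]  # (leading, trailing) 64-bit edge times
MASK64 = (1 << 64) - 1


def pack_pulse(levels: List[Optional[Edges]]) -> int:
    """Pack per-threshold edge times; levels[0] is the lowest threshold and
    must be present, higher levels may be None (not crossed)."""
    if levels[0] is None:
        raise ValueError("the lowest threshold defines the pulse and must be crossed")
    word = 0
    for index, edges in enumerate(levels):
        if index > 0:
            word = (word << 1) | (1 if edges is not None else 0)  # 'crossed' bit
        leading, trailing = edges if edges is not None else (0, 0)
        word = (word << 64) | (leading & MASK64)
        word = (word << 64) | (trailing & MASK64)
    return word


def unpack_pulse(word: int, count: int) -> List[Optional[Edges]]:
    """Inverse of pack_pulse for `count` threshold levels."""
    levels: List[Optional[Edges]] = []
    for index in reversed(range(count)):
        trailing = word & MASK64
        word >>= 64
        leading = word & MASK64
        word >>= 64
        crossed = True
        if index > 0:
            crossed = bool(word & 1)
            word >>= 1
        levels.append((leading, trailing) if crossed else None)
    levels.reverse()
    return levels
```

A consumer that sees a cleared "crossed" bit simply ignores the 128 bits that follow it, which is the behavior described above.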

Bad things will happen if there is glitching on any of the LVDS inputs during the leading edge - specifically, we might capture the glitch instead of the actual pulse. However, these bad data points can always be identified and thrown away in later data analysis.

One thing that will help our performance is to have a separate high-speed state machine for each threshold that just stuffs the pulse data in a register; and a slower state machine that gathers the data for all thresholds from the registers and stuffs them into the FIFO when everything is ready.

Just drafted the following needed modules:
  • pde_dff_en.vhd - Pseudo dual-edged D flip-flop with enable.
  • pde_reg_en.vhd - Pseudo dual-edged register with enable.
  • pde_shift_reg.vhd - Pseudo dual-edged shift register (for synchronizers & edge detectors).
  • pulse_cap.vhd - Module to capture rise/fall times for a single digital pulse signal (i.e., level-crossing times for a single analog signal to cross a single threshold level).
NOTE: Currently, pulse_cap.vhd uses a bidirectional single-wire handshaking signal to hand off data to its consumer. This could lead to problems since this signal may be left floating for extended periods unless we add a pulldown. But that may cause problems too, since we need it to stay high by itself for at least 20 ns until the next consumer clock edge. If we can't get this working, we should back off and go to a 2-wire handshaking protocol.

Next up, need to write the following modules:
  • pulseform_cap - A 50 MHz FSM (in behavioral VHDL) that (in parallel) gathers up the rise/fall data for all 5 thresholds for a given analog input signal, and writes that data to a bank of output registers which are then handed off to a consumer process.

  • FIFO_writer - Another 50 MHz FSM which takes the data packet from pulseform_cap and appends it to the tail of a FIFO queue. If/when the FIFO becomes full, it complains by asserting a BUF_FULL signal that alerts the CPU of the problem.

  • pulse_FIFO - This can be a standard LPM/ALT megafunction.

  • FIFO_reader (PIO feeder) - Pulls data items out of the FIFO and presents them to the PIO so that the CPU can consume them.
One more note: Actually, we can greatly simplify the architecture for the input capture datapath for the timing sync pulses. Since we only get one of those every 409.6 us, we can safely assume that the CPU can service the interrupt for each one before the next one occurs. So the FIFO and associated feeder modules are not necessary.

Here's the new design for the input capture datapath for just 1 of the 4 PMT inputs:

Thursday, May 19, 2011

Serial Savior

Adding the Sparkfun level shifter (powered by the +3.3VDOWN net, for LVTTL compatibility) to the FEDM serial port fixed the serial communication problem I was having yesterday. Did a successful output test at 9,600 baud. Recompiling gelware now for 57,600 baud. Not going to bother trying 115,200 baud, since I experienced intermittent problems with that speed previously (in the GPS-->DE3 link). Yes, 57600 works as well. Cleaned up the types/names of the serial port bits in the top-level gelware schematic.

David is here and working towards testing the PLL fast clock and DECS (dual-edged carry-save) counter on the FEDM. Tyler has a family emergency & will be here next week. I haven't heard from Darryl lately; if we don't hear from him soon, we may have to proceed to the next student on our list.

David and I are working to identify pins for viewing the high-speed clock output. Mike referred back to the fabrication drawing for instructions about how to identify controlled-impedance traces. We found a 6.5 mil controlled-impedance (50 ohm) trace going to 2-SIP header J79 pin 1 (FPGA pad H7), and a 5-mil controlled-impedance (100 ohm differential impedance) dual trace pair going to 2-SIP J77 (pin 1+2; FPGA pads C10 & D10). Using ball H7 we were able to confirm a 250 MHz output frequency from a 5x PLL on the scope (although the signal was highly distorted and not full-swing; probably in part because we didn't have a proper 50 ohm probe on it). However, when we tried looking at the same signal as an LVDS output on C10/D10, it was too noisy to even measure its frequency with any confidence. Perhaps shielding would have helped. Basically I think the problem is, we don't have probes capable of measuring LVDS signals. We should probably give up on directly measuring really high-speed signals from this board (anything over a couple of hundred MHz).

We did these tests from Mike's Dell Precision, since the ByteBlaster cable plugs into a parallel port, and the Acer doesn't have a parallel port. Mike ordered a USB-to-parallel cable from Buy.com (under $10) and this will enable David (and/or the other students) to run tests directly from his computer in the future.

Anyway, David's first job for next Monday is to look at the slower counter bits (bit 4 and bit 8) on the scope using some of the available pins that we identified today. Those should enable us to verify both the PLL output frequency and whether the counter is working. He can run the experiment to find the maximum speed at which the counter still works. Then, he can also work on porting the input capture circuit over as well. I explained to him about metastability and the likely need for synchronizer stages in a circuit like this that samples an asynchronous input.

After David left, I spent some time thinking about the design for the input capture circuit. It needs to use the PDE flip-flops. It should probably include a simple state machine which waits for and captures both the leading and trailing edge of the pulse, in case they occur in quick succession. Then it can signal to a consumer that the data (four 64-bit words, sum and carry bits for each of two 64-bit leading and trailing edge times, in units of the doubled fast clock) is ready, and when the consumer signals that the data has been consumed, the IC circuit can resume looking for the next pulse.

The consumer of the data can be another state machine with ordinary (single-edge-triggered) flip-flops with (say) a 50 MHz (20 ns) clock, which takes the data from the IC circuit and stuffs it into a FIFO. At the other end of the FIFO can be yet another circuit which passes the data to a set of eight 32-bit PIOs for the CPU to consume. Another PIO could signal back to this machine when the CPU is ready to consume another pulse. Alternatively, I could check whether there is a FIFO device that already can talk directly to the Avalon bus fabric.

Also, the sum and carry bits could be added together inside one of the 50 MHz state machines (the one at the tail end of the queue, say), thereby cutting the number of PIOs needed down to four, and cutting the memory requirements of the FIFO in half.
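To illustrate why the captured times come as separate sum and carry words, and what adding them saves, here is a Python model of a generic carry-save counter. It is a sketch under assumptions: it uses one half adder per bit with carry bit i weighted 2^(i+1), which is a common construction but not necessarily bit-for-bit what the DECS counter does, and the function names are hypothetical.

```python
WIDTH = 64
MASK = (1 << WIDTH) - 1


def carry_save_step(sum_bits: int, carry_bits: int) -> tuple:
    """Advance a carry-save counter by one count using only per-bit half adders.

    Carry bit i has weight 2^(i+1), so the represented count is sum + 2*carry.
    """
    incoming = ((carry_bits << 1) | 1) & MASK   # carries shifted up, plus the +1 at bit 0
    return sum_bits ^ incoming, sum_bits & incoming


def resolve(sum_bits: int, carry_bits: int) -> int:
    """The full addition that the slower 50 MHz logic would perform."""
    return (sum_bits + 2 * carry_bits) & MASK


def pios_needed(words_64bit: int) -> int:
    """Each 64-bit word occupies two 32-bit PIOs."""
    return 2 * words_64bit


s, c = 0, 0
for _ in range(1000):
    s, c = carry_save_step(s, c)
```

No carry ever ripples more than one bit position per step, which is what makes the fast counter feasible; the slow full addition in resolve can wait until the data reaches the 50 MHz side. Four 64-bit words (sum and carry for each edge) need pios_needed(4) = 8 PIOs, while the two resolved edge times need only 4.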

Wednesday, May 18, 2011

Serial Killer

Today my plan is to test serial communication with the FEDM board. First over the JTAG port (jtag_uart), then over the separate (DE9/RS-232) uart_0 connection.

We had a bad smell in the lab this morning. Smelled like an organic solvent. This has happened several times before. Fixed it by pouring water down drains to fill the traps. Ray says the traps were improperly installed - if they had been installed correctly, they would be kept wet automatically, and filling them manually would not be necessary. Then Ray spent most of the day calling around to various university, municipal, state, and federal offices to try to get someone to fix the problem permanently. Eventually, some guy in a suit from EHS came by and said that he would fix it, perhaps by sealing the drains. Hopefully, their doing this doesn't violate building codes (what if there was a leak and the lab flooded? Then the drains would be needed). The fire department told us, next time this happens, just call 911, and a hazmat team will be there in 5 minutes with sniffers.

JTAG I/O, the embedded Nios core, and interactive debugging from the IDE are working. Important note: One of the 10-pin headers is for Active Serial EEPROM programming, the other one is for JTAG.

Trying to get serial comm working. Output is garbled. At first I thought maybe it was an incorrect clock speed setting. However, then I verified that the board clock is the expected 50 MHz. During this process I discovered a bad pin assignment on the clock output was causing an intermittent short. There could be other shorts in the gelware as well, which could account for the high power consumption. Emailed Sachin & Vaibhav about the bad pin assignments.

On the way home, I realized that the DB9 connector was wired directly to the FPGA - it should be going through a level shifter to interface to RS-232 line protocol. Thus, the voltage levels are wrong, and inverted. This explains the garbled serial data. Fortunately, I still have the level shifter from SparkFun (as well as my breadboarded version of it), so I can easily fix that problem.

Tuesday, May 17, 2011

Savior of the Carries

Today I tried clocking my recursive pseudo-dual-edged carry-save counter with the 600 MHz clock from the PLL that I configured yesterday. No dice! (Of course, the single-edged version of the counter works just fine.)

I tried some tweaks: Inserting a CLKCTRL unit after the PLL, and inserting a CARRY_SAVE primitive on the carry/save outputs of each half-adder cell, to tell Quartus to use dedicated carry-chain resources. These seemed to help a little, but still no dice. Of course, after turning the PLL output frequency down to 300 MHz, it worked fine - that is, up to bit 4 of the counter; but then I still had trouble with bit 8!

After some more fiddling, I got up to 400 MHz (dual-edge). Weird, I'm finding that things work better if I take out my manual clock buffering. That means I don't even really need the recursive register design (with the clock buffer tree) any more. But, then I tried an array design and now it doesn't work again! Argh. Everything is so sensitive to seemingly irrelevant changes. Who knows, maybe the recursive design got fitted in a way that reduced local clock skew... Now I only seem able to get up to 350 MHz, even in the design that I thought got to 400 before...

OK, I got back up to 400 MHz now, after taking out KEEP attributes from the PDE_DFF. Let's try 500 MHz... OK, that works. Now 600 MHz (where we started out): OK, there it breaks down again. Let's try 550: That works.


The traces are, from top to bottom: (a) 50 MHz board clock (digital trace), (b) 550 MHz PLL output (analog trace, too fine to see anyway), (c) digital trace of the foregoing, (d) bit 4 of the counter (34.375 MHz digital trace), (e) bit 8 of the counter (2.148 MHz digital trace).

This is good, because reaching 550 MHz with the dual-edge counter means we can do 1.1 billion counts per second, and this translates to 0.9 ns time resolution (+/- 0.45 ns time uncertainty). This, then, meets our goal of better than 1 ns time resolution.
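The resolution figure is just the reciprocal of the count rate. A minimal Python check, assuming one count per half clock cycle (the function name is hypothetical):

```python
def time_resolution_ns(pll_hz: float) -> float:
    """Tick length when counting on both clock edges."""
    return 1e9 / (2 * pll_hz)


for mhz in (450, 550, 600):
    res = time_resolution_ns(mhz * 1e6)
    print(f"{mhz} MHz PLL -> {res:.3f} ns per count, +/- {res / 2:.3f} ns")
```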

Rip out its Guts and Start Over!

My plan for today is to start developing the new skeleton Quartus project for the FEDM board. Actually, instead of entirely ripping out the guts of the current design, I am thinking of initially just segregating it into a sub-module, so that we can quickly refer back to parts of it if/when that is needed.

First, though, I wrote a recommendation letter for David Grosby, which he needs to give to payroll for them to file along with his other employment paperwork for his summer appointment. (I'll wait to do the letter for the other intern, Darryl, until he confirms that he is joining us for sure.)

Earlier today I also found an email from one more prospective intern, Michael Sprouse, but unfortunately I had to tell him that we already had made the offers. However, I invited him to still volunteer to help out if he wished.

Darryl still needs to get with Dr. O'Neal to talk about his appointment - he missed him yesterday.

I stuffed Sachin's design into a submodule so we can still access it as needed in our design, while removing all the clutter from the top-level schematic.

I then verified that we can re-load the original design onto the FEDM board and it still works correctly. (Ditto for my new version where I put it in a sub-module, although I only tested the threshold-setting VI, not the high-speed data communication one.)

We then configured Quartus on the Acer XP partition to use the license server on COSMICi (Mike's Dell), so the students could start working.

Then I showed the students how to put together a basic Nios system design in SOPC Builder. I started with just a NiosII/f, 64K on-chip memory (128K wouldn't fit on the FPGA with Sachin's stuff still on there, although to be fair I don't know if he's actually using the on-chip memory), and regular+JTAG UARTs. (The goal here is to do a quick test of our serial communication capability (which Ray wanted to see) before we start doing more complicated stuff.)

We created the skeleton firmware development project for the Nios II IDE, based on a "hello world" template. Still need to insert code to open UART_0 and print to it.

I identified the pins needed for the serial port and created the pin assignments for them and wired them to the SOPC system symbol in the top-level schematic. Next, need to test this system within the Nios II IDE, and see if it prints "Hello World" to the console as expected.

After that, add code to print some text to the extra serial port, and view it in UwTerminal or something.

Monday, May 16, 2011

A New Week, New Students

This week, my goal is to get the new summer students (Tyler, David, and maybe Darryl) up and running with the summer development plan.

Tyler and David met me at 1 pm and I talked with them about the project. David is coming back at 3:30 pm to meet with Dr. O'Neal about his summer appointment paperwork.

Tyler is installing Quartus 9.1 on the Acer XP partition. Mike is going to email Tyler and David the key Quartus files from his present design, so they can begin studying them. OK, did that.

Next, Mike is going to create a Dropbox folder for working on the new project, and share it with Tyler and David. I have started on that. Then Mike needs to create the Quartus project framework for everyone to work in.

Darryl came by late in the afternoon and Mike spoke with him too. He still needs to meet with Dr. O'Neal.

Mike also needs to write David a letter of recommendation to feed to the payroll bureaucracy.

Blogger Blues

Blogger.com was down late last week so I was unable to post these notes and sent them to Ray and Tyler instead. Now Blogger is back up so I am posting them now!

Following is a list of major steps that need to be taken, including some new gelware components that need to be developed, and other related action items, based on the approach that we're basically going to throw away all of the programming work (Quartus gelware, LabView code) that Sachin has done and start over (instead of reverse-engineering everything):
  1. Port my dual-edge-triggered carry-save counter over to the FEDM board, and experimentally determine its max frequency on the Stratix II using PLL clock drivers.

  2. Instantiate a Nios system for the Stratix II, for use in firmware development.

  3. Create a simple serial interface module (probably just a PIO device) to allow programming the DAC voltages in firmware, write that firmware, and test it.

  4. Create input-capture circuits for measuring start/stop times of threshold crossings, with CPU interface (probably just a PIO again), and test them.

  5. Develop firmware that puts together all of this data for each PMT pulse received (including the absolute time information obtained from the sync pulses), and sends it over the serial port to the server (this can use our existing EZURiO Wi-Fi modules).

  6. Develop server-side code (in Python) for data analysis and visualization.
Items #1 and #2+3 are pretty independent of each other, so Tyler and/or the other student(s) can potentially help with (or be primarily responsible for the development of) one or both of these. Other than this, later items depend on earlier ones before they can be fully tested, but potentially the student(s) can help with these steps as well.

Overall, I think our new design will be much simpler than Sachin's, apart from its not being directly usable from LabView. But, we can always still develop a LabView interface for it if we want to (since we'll understand how it works, this should be easy to do).

Wednesday, May 11, 2011

Savior of the Carries

Today I tried clocking my recursive pseudo-dual-edged carry-save counter with the 600 MHz clock from the PLL that I configured yesterday. No dice! (Of course, the single-edged version of the counter works just fine.)

I tried some tweaks: Inserting a CLKCTRL unit after the PLL, and inserting a CARRY_SAVE primitive on the carry/save outputs of each half-adder cell, to tell Quartus to use dedicated carry-chain resources. These seemed to help a little, but still no dice. Of course, after turning the PLL output frequency down to 300 MHz, it worked fine - that is, up to bit 4 of the counter; but then I still had trouble with bit 8!

After some more fiddling, I got up to 400 MHz (dual-edge). Weird, I'm finding that things work better if I take out my manual clock buffering. That means I don't even really need the recursive register design (with the clock buffer tree) any more. But then I tried an array design, and now it doesn't work again! Argh. Everything is so sensitive to seemingly irrelevant changes. Who knows, maybe the recursive design got fitted in a way that reduced local clock skew... Now I can only seem to get up to 350 MHz, even in the design that I thought got to 400 before...

OK, I got back up to 400 MHz now, after taking out KEEP attributes from the PDE_DFF. Let's try 500 MHz... OK, that works. Now 600 MHz (where we started out): OK, there it breaks down again. Let's try 550: That works.

Monday, May 9, 2011

Phasers locked, captain...

What to do this week?

At some point, I should probably go back to working on the journal article. I was thinking for a while that if I did some analysis of the ring oscillator as a short-time-slice TDC element, that could be usefully integrated into the article. That may still be true, but it is taking a while.

Let's consider, for a moment, what the next steps would be along that path.

1. Create an input-capture circuit (with appropriate synchronizer stages) to register the OCXO edges against the ring-oscillator half-cycles. At 1.7 ns half-cycle for the RO, we would expect to see about 29.4 RO half-cycles per OCXO half-cycle. 6 bits (unsigned values 0-63) would be adequate to encode these deltas. Or, if there isn't too much variance in the RO period, even just 4 bits (signed values -8 to +7) would be more than adequate to encode the discrepancies in the deltas relative to some "expected" value (say 30). So that's 8 bits per 100 ns period. So it would take about 26.8 seconds to entirely fill up a 256 MB block of DDR SDRAM with 2^29 = 536,870,912 of those 4-bit samples.
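As a quick check of the numbers in step 1, here is a minimal Python sketch. It assumes a 10 MHz OCXO (inferred from the 100 ns period mentioned above), and the helper names `encode_delta`, `decode_delta`, and `fill_time_seconds` are hypothetical, made up for illustration only.

```python
# Assumptions (from the notes above): 10 MHz OCXO (50 ns half-cycle),
# 1.7 ns ring-oscillator half-cycle, expected delta of 30, 256 MB of SDRAM.
OCXO_HALF_CYCLE_NS = 50.0
RO_HALF_CYCLE_NS = 1.7
EXPECTED_DELTA = 30
SDRAM_BYTES = 256 * 2**20


def encode_delta(count):
    """Pack an RO half-cycle count as a signed 4-bit offset from EXPECTED_DELTA."""
    offset = count - EXPECTED_DELTA
    if not -8 <= offset <= 7:
        raise ValueError("delta does not fit in a signed 4-bit field")
    return offset & 0xF  # two's-complement nibble


def decode_delta(nibble):
    """Recover the RO half-cycle count from a two's-complement nibble."""
    offset = nibble - 16 if nibble >= 8 else nibble
    return EXPECTED_DELTA + offset


def fill_time_seconds():
    """Time to fill the SDRAM block at one 4-bit sample per OCXO half-cycle."""
    samples = SDRAM_BYTES * 2  # two nibbles per byte
    return samples * OCXO_HALF_CYCLE_NS * 1e-9


print(round(OCXO_HALF_CYCLE_NS / RO_HALF_CYCLE_NS, 1))  # RO half-cycles per OCXO half-cycle
print(round(fill_time_seconds(), 1))                    # seconds to fill 256 MB
```

The signed-offset encoding only works if the RO period stays within -8 to +7 half-cycles of the expected value; anything outside that range would need the 6-bit unsigned form instead.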

2. Add a command to the firmware to (at a desired time) initiate one of these half-minute data-collection runs, and then stream the data to the server for processing. The data collection routine itself will probably have to execute in a custom state machine: since the DE3 board has only a 50 MHz built-in clock, we will generate a new data point once every 2.5 CPU clock cycles (50 ns), and this is almost certainly not enough cycles for a software loop (whether polling or interrupt-based) to pull the data from a PIO register and then write it to SDRAM using the type of HLL call used in the demo code. This gets tricky, because we have to replicate what such a call is doing in our own custom state machine. In other words, we need to create our own host device for the Avalon bus fabric.

Also, at the 57,600 baud rate we're using for the serial comm. link to the EZURiO board, and with a minimum of 10 bit-periods per byte (8 data, start, stop), the data rate for the data upload to the server is at best 5,760 bytes per second, so a 256 MB data transfer will take ~46,603 secs. = 776.7 min. = 12.95 hr. = basically one overnight. To avoid this bottleneck, we should perhaps consider interfacing to an Ethernet card and thereby sending the data directly to the server in real time. Unfortunately, there isn't an Ethernet port already built into the DE3, so we would have to add a daughter card, like this one: http://www.terasic.com.tw/cgi-bin/page/archive.pl?Language=English&CategoryNo=71&No=355. If we added a little Wi-Fi client-mode router (like this: http://www.dlink.com/products/?pid=346), that would re-establish wireless connectivity to the server. But still, we have to deal with all the complexity of interfacing to the network (using a whole TCP/IP stack and the like).
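The serial-link estimate above can be reproduced with a few lines of Python. This is just a sanity check under the stated assumptions (10 bit-periods per byte, no protocol overhead); `upload_time_seconds` is a hypothetical helper name.

```python
def upload_time_seconds(n_bytes, baud=57600, bits_per_byte=10):
    """Best-case transfer time over an async serial link (start + 8 data + stop)."""
    return n_bytes * bits_per_byte / baud


seconds = upload_time_seconds(256 * 2**20)
print(round(seconds), "s")
print(round(seconds / 60, 1), "min")
print(round(seconds / 3600, 2), "hr")
```

Any flow control or retransmission on the Wi-Fi side would only make the real figure worse.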

Or, the other option is to forget about offloading the data to the server, and instead just do the desired data analysis directly in the embedded firmware. This should be pretty straightforward, and makes a lot more sense. It shouldn't take long. Then all we have to transmit to the server is, say, the Allan deviation results for (say) about a thousand points on a logarithmic time scale, ranging from 1 to 512M half-cycles (basically 9 orders of magnitude).
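For reference, here is a rough Python sketch of the kind of Allan deviation calculation meant here. It uses the simple non-overlapping estimator and octave-spaced averaging factors, which is a simplification of the "about a thousand points" idea; the function names are hypothetical, and the input `y` is assumed to be a list of fractional-frequency samples derived from the captured deltas (e.g. each delta divided by the mean delta, minus 1).

```python
import math


def allan_deviation(y, m):
    """Non-overlapping Allan deviation of samples y at averaging factor m."""
    n_blocks = len(y) // m
    if n_blocks < 2:
        raise ValueError("need at least two averaging blocks")
    # Average the samples in consecutive blocks of length m.
    means = [sum(y[k * m:(k + 1) * m]) / m for k in range(n_blocks)]
    # Half the mean squared difference of adjacent block averages.
    sq_diffs = [(b - a) ** 2 for a, b in zip(means, means[1:])]
    return math.sqrt(sum(sq_diffs) / (2 * len(sq_diffs)))


def octave_factors(n_samples):
    """Averaging factors 1, 2, 4, ... that still leave at least two blocks."""
    factors = []
    m = 1
    while n_samples // m >= 2:
        factors.append(m)
        m *= 2
    return factors


samples = [1.0, -1.0] * 8  # toy data: pure alternation
for m in octave_factors(len(samples)):
    print(m, allan_deviation(samples, m))
```

In firmware the same sums could be accumulated in integer arithmetic as the data streams in, so the full 256 MB record never needs to leave the board.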

Finally, if we decide to actually use the ring oscillators for timing of individual sensor events on the FEDM board, then we may want to think about doing some calibration & measurement of their frequency variations on the fly in the sensor application.

One other thing to think about: Using the phase-locked-loop (PLL) module included in the Stratix II to create faster clocks that are synced-up with the board clock. We have an EP2S30 class FPGA which has "fast PLLs" 1-4 and "enhanced PLLs" 5-6. The "enhanced" PLLs support clock frequency multiplication up to 512x, and the "fast" PLLs support up to 32x.

This raises the possibility that we could use the PLLs to sync up the FEDM's clocks with the 409.6-us sync pulses from the CTU, in a simpler manner than by constantly registering all these multiple clocks against each other. A simple circuit using the built-in 10 MHz clock could convert the 409.6us-period, 100ns-pulse-width pulse from the FEDM into an approximately 50% duty-cycle clock (with a precisely-timed rising edge) suitable for feeding into a PLL. After going through one "enhanced" stage with a 512x multiplier, this gives us a 0.8-microsecond (1.25 MHz) clock. After a 2nd 512x "enhanced" stage, we have a 1.5625ns (640 MHz) clock, so a third ("fast") stage isn't even needed. Let's see if that's too fast for the FPGA.

OK, the EP2S30 is at a "-3" speed grade, and the minimum clock high and low times are 612 ps each (table 5-37 from Stratix III datasheet). This implies a minimum clock period of 1.224 ns, or a maximum frequency of about 817 MHz. The 640 MHz clock is within that limit; its period is 1.5625 ns and its half-cycle is about 0.78 ns. If we can get away with using that to drive a PDEDFF-based carry-save counter, then that would give us less than 1 ns time resolution on the input capture circuit that finds the level-crossing times in terms of the half-cycles of this fast clock. As long as the PLLs are doing their job, these times should be precisely defined relative to the master clock that comes from the CTU sync pulses.

Oh, actually, it's not going to be quite that good... The maximum PLL output frequency for the Stratix II is only 550 MHz. Still, twice that (counting both clock edges) is over 1 GHz. The Stratix III goes up to 600 MHz.

There may be an issue with the minimum frequency of the PLL. The minimum input clock frequency is 2 MHz. So, we cannot go directly from the CTU's 409.6-us sync pulses. However, we could base it off the 50 MHz TCXO board clock instead... If we multiply this by 11x, we get the max PLL output frequency of 550 MHz. The period is 1.82 ns and the half-period is 0.91 ns. Still a little better than 1 ns. And we can measure the sync pulse arrival time in units of that, and the cosmic ray shower pulse arrival time in units of that, and thereby get in the neighborhood of the desired accuracy.
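The PLL constraints in this paragraph can be written down as a small Python check. The two limits are the Stratix II figures quoted in these notes (2 MHz minimum input, 550 MHz maximum output); `pll_output` and `half_period_ns` are hypothetical helper names, and this is only a sketch of the bookkeeping, not a model of the real ALTPLL parameter rules.

```python
# Limits as quoted in the notes for the Stratix II PLLs (assumed, not re-checked).
PLL_MIN_INPUT_HZ = 2e6
PLL_MAX_OUTPUT_HZ = 550e6


def pll_output(f_in_hz, multiplier):
    """Return the multiplied frequency, or raise if a quoted limit is violated."""
    if f_in_hz < PLL_MIN_INPUT_HZ:
        raise ValueError("input clock is below the PLL minimum input frequency")
    f_out = f_in_hz * multiplier
    if f_out > PLL_MAX_OUTPUT_HZ:
        raise ValueError("output clock is above the PLL maximum output frequency")
    return f_out


def half_period_ns(f_hz):
    """Half-cycle duration in nanoseconds, i.e. the dual-edge time resolution."""
    return 0.5e9 / f_hz


f_fast = pll_output(50e6, 11)             # 50 MHz board clock times 11
print(f_fast / 1e6, "MHz")
print(round(half_period_ns(f_fast), 2), "ns per half-cycle")

sync_hz = 1.0 / 409.6e-6                  # CTU sync pulse rate, about 2.44 kHz
try:
    pll_output(sync_hz, 512)
except ValueError as err:
    print("sync pulse as PLL input:", err)
```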

OK, I instantiated an ALTPLL Megafunction variation for an 11x clock multiplier, and used it to generate a 550 MHz (later 600 MHz) clock from the 50 MHz board clock. That worked just fine, although as before, the waveform at that speed looked pretty rounded (sine-wave) - although again that may be just due to the board/probe cable. On-chip the signal may look better. The acid test will be to use this signal to drive the carry-save counter.

Phase-locked loop test on DE3 board. Top: 50 MHz board clock (digital trace).
Bottom: 12x (600 MHz) output from PLL (analog & digital traces superimposed).

Talked to Ray for a while about the strategic issue of whether to proceed with trying to reverse-engineer Sachin's stuff well enough that we figure out how to add more TDCs as needed for our absolute time measurements, or instead just redo the design by just counting cycles (or half-cycles) of a single fast oscillator (like the 600 MHz one I just made with the PLL). Really, it comes down to the question of whether we really need any better than 1 ns resolution on the pulse width. Ray is going to look at the science (e.g., difference between shower front development in neutrino vs. hadron initiated shower) and give me an answer on that. However, his feeling is that pulse width differences below 1 ns probably aren't going to matter. In which case, we should probably proceed by just redoing the gelware with our own design. We can rip out much of what Sachin has done and redo it. We still need the ability to program the DACs, but everything else can be re-done from scratch in our own way. We can design a little input-capture circuit to get the rise/fall times of each pulse, and just replicate it for each of the threshold comparators (LVDS inputs). Then we can have firmware transfer the data to the PC however we want.

On tap for tomorrow: (1) Make sure that I can actually drive my pseudo-dual-edge triggered carry-save counter using this 600 MHz clock (i.e., @1.2G counts per second). (2) Design input-capture circuit around that counter. (3) Use it to capture rise/fall times of an input pulse.
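To make the input-capture idea concrete, here is a small Python sketch of how firmware or server code might turn two captured counter values into a pulse width. The 32-bit counter width and the name `pulse_width_ns` are assumptions made purely for illustration; the only figure taken from the notes is the 600 MHz clock counted on both edges.

```python
def pulse_width_ns(rise_count, fall_count, clock_hz=600e6, counter_bits=32):
    """Pulse width from two captured half-cycle counts, allowing one wraparound."""
    # Modular subtraction handles the counter rolling over between the edges.
    ticks = (fall_count - rise_count) % (1 << counter_bits)
    # One tick is one half-cycle of the fast clock (dual-edge counting).
    return ticks * 0.5e9 / clock_hz


print(pulse_width_ns(10, 22))            # 12 half-cycles of a 600 MHz clock
print(pulse_width_ns(2**32 - 4, 8))      # same width, across a counter rollover
```

The same subtraction would apply to the offset between a sync pulse capture and a PMT pulse capture, as long as the counter cannot wrap more than once in between.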

Friday, May 6, 2011

Ashes to Ashes

There's nothing in the Stratix III datasheet about ring oscillators, but the clock tree (of the C2) is supposed to handle speeds up to 730 MHz.

Spent a little time reading a thread on the Altera forum about ring oscillators. The posters recommend using an LCELL primitive and the assignment editor to control placement. Routing variation is still an issue, but they say that if all cells are in the same logic array block, and the RO is driving a register in that block, then the routing should stay consistent.

So, what I'm thinking now is that a pseudo dual-edged FF could be implemented in the same block as the ring oscillator, and configured in a T flip-flop configuration, so that its output would have the same frequency as the ring oscillator. Then the output of this PDEFF register could be sent to the destination logic (such as the carry-save counter) to hopefully well isolate the placement/routing within the RO block from that of the destination logic.
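A tiny behavioral model in Python shows why the T-configured pseudo dual-edged FF reproduces the input frequency. This is an idealized sketch (no delays, both flip-flops assumed to start at 0), and `pde_tff_output` is a hypothetical name.

```python
def pde_tff_output(clock_levels):
    """XOR of a rising-edge TFF and a falling-edge TFF, sampled after each clock level."""
    rq = fq = 0                      # assumed power-up state of both flip-flops
    prev = clock_levels[0]
    out = []
    for level in clock_levels:
        if prev == 0 and level == 1:
            rq ^= 1                  # rising edge toggles the rising-edge TFF
        elif prev == 1 and level == 0:
            fq ^= 1                  # falling edge toggles the falling-edge TFF
        prev = level
        out.append(rq ^ fq)
    return out


clock = [0, 1, 0, 1, 0, 1, 0, 1]
print(pde_tff_output(clock))
```

Each clock edge toggles exactly one of the two flip-flops, so the XOR output toggles on every edge and therefore has the same period as the clock that drives it.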

Another idea: Merge the ring oscillator with the PDEDFF, by using a slightly delayed version of the TFF output as the TFF clock. The advantage here is that we make sure that the ring oscillator does not run any faster than the PDEDFF can handle.

OK, I drew that circuit and simplified it. Basically it is just two T flip-flops (rising-edge triggered and falling-edge triggered) with their outputs XOR'ed together, and the output of that XOR (delayed slightly by a buffer, say) is used as the clock of both flip-flops.

I'm going to try building that now, as a schematic. I found the Technology Map Viewer is helpful to see exactly how the design compiles into cells. Here's what I came up with, after realizing I needed LCELL on both the rising and falling edge triggered T flip-flops (originally I didn't have an LCELL after the NOT):


Unfortunately, the Technology Map viewer reveals that Quartus is reorganizing the logic in some unexpected way. To get more control, I am redoing the design in VHDL as follows:

library ieee;
use ieee.std_logic_1164.all;
use work.rtl_attributes.all; -- Borrowed from the IEEE 1076.6 (2004) spec. Needed for KEEP attribute.

entity pde_tff_ro2 is
    port ( clk_out : out std_logic );
end entity pde_tff_ro2;

architecture impl of pde_tff_ro2 is

    signal int_clk    : std_logic; -- Internal clock signal.
    signal int_clk_d1 : std_logic; -- Internal clock signal, delayed by 1 LUT propagation delay.

    signal rq,fq : std_logic; -- Rising- and falling-edge TFF outputs.

    -- Prevent certain key signals from being optimized away.

    attribute KEEP of int_clk    : signal is True;
    attribute KEEP of int_clk_d1 : signal is True;
    attribute KEEP of clk_out    : signal is True;

begin

    int_clk    <= rq xor fq; -- Exclusive OR of rising & falling edge TFF outputs.
    int_clk_d1 <= int_clk;   -- Hoping this inserts an extra LCELL due to KEEP attribute.
    clk_out    <= int_clk;   -- Output gets another copy of the internal clock.

    -- Rising-edge-triggered toggle flip-flop.

    re_tff: process is begin
        wait until rising_edge(int_clk_d1);
        rq <= not rq;
    end process;

    -- Falling-edge-triggered toggle flip-flop.

    fe_tff: process is begin
        wait until falling_edge(int_clk_d1);
        fq <= not fq;
    end process;

end architecture impl;

OK, that seems to give me the design I want:


This image is from the Technology Map Viewer. The double-boxed elements are LUTs and the registers are individual flip-flops in the LABs.

I need to look at the design in the Chip Planner as well, to make sure all the LUTs are being placed in the same LAB. They seem to be, except that post-fitting it looks like the internal clock is being routed through a CLKCTRL module...

This could be good or bad. It is good in that it reduces skew (uses low-skew interconnect resources), but bad in that it can increase delay to get down there and back. So it may reduce the RO frequency.

I should perhaps use an ALTCLKCTRL megafunction to do regional clock generation from the main output of this module... Anyway, let's worry about that later.

DERP, I just realized that without an external kick, this clock will never get started because it will never generate its own edges. So, I need to design a start-up circuit for it.

OK, I tried to fix that problem by just gating the int_clk with an AND gate controlled by a slider switch, but still no dice.

It occurs to me that, even with the kicker to get it started, this design is perhaps fatally delicate, in that if it ever settles down, it will never spontaneously start going again.
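One possible way to see the start-up problem is a unit-delay behavioral model in Python. This is only a sketch under strong assumptions: one fixed delay around the loop, the clock node starting low, the AND-gate enable already switched on, and ideal flip-flops. The function name `count_edges` is hypothetical, and the real fitted circuit may well behave differently.

```python
def count_edges(rq, fq, n_steps, enable=1):
    """Count clock edges in a unit-delay model of the self-clocked TFF pair.

    Each step, the clock node takes the value (rq XOR fq) AND enable,
    and any resulting edge toggles the matching flip-flop.
    """
    clk = 0                          # assumed initial level of the clock node
    edges = 0
    for _ in range(n_steps):
        new_clk = (rq ^ fq) & enable
        if new_clk != clk:
            edges += 1
            if new_clk == 1:
                rq ^= 1              # rising edge toggles the rising-edge TFF
            else:
                fq ^= 1              # falling edge toggles the falling-edge TFF
        clk = new_clk
    return edges


print(count_edges(0, 0, 20))   # both flip-flops power up equal: XOR stays 0
print(count_edges(1, 0, 20))   # flip-flops power up different: loop keeps running
```

In this idealized model, the loop runs only if the two flip-flops start in different states; if both power up at the same value, the XOR output sits at 0 and gating it with an enable never produces an edge, which at least matches the "still no dice" result above.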

I tried various ways to fix the problem, but no dice. I think I'm going to back off from this whole register-based oscillator idea, and just revert to doing an ordinary ring oscillator.

Before I left, I did that, and it worked (with the PDEDFF-based carry-save counter); the half-cycle period is about 1.7 ns.

Wednesday, May 4, 2011

Ring Around the Rosie

Today, set up to do a direct test of the ring oscillator frequency/period. That seems to work fine. Here's the scope trace showing a ~1V logic swing "very rounded square wave" output measured via a 75ohm cable (I couldn't find any 50ohm ones lying around). The period is about 1.7ns, frequency about 580 MHz. Possibly it's smeared out in part because the board trace to the CLK_OUT connector isn't rated for signals at that high a frequency - there might not be enough impedance control. Or maybe the problem is the cable.

Let's try a 5-stage ring oscillator, and see if the longer period means the wave shape and amplitude will fare better. Ideally, the period should be 67% longer (5/3x). It turns out to be 2.5 ns, which is only about 50% longer, interesting, though possibly the discrepancy is due to measurement error. Oh and the amplitude is larger, as expected: 1.35V. The wave shape is different, too, and in an interesting way:

The lows are flatter than the highs, which makes sense since nFETs are faster than pFETs (since electron mobility is higher than hole mobility). Frequency is now 395 MHz, and meanwhile the period of bit 20 of the counter is 5.3 ms, which is consistent with the 2.5 ns for the ring oscillator period. I don't think the counter was working at all with the 3-stage ring oscillator on the last test, but I should probably check again to make sure. Nope, it wasn't. OK, now let's try 7-stage:


At 7 stages, the period is 3.42 ns (still only about 2x that of the 3-stage RO, instead of the ideal 7/3 = 2.33x) and frequency is 293MHz. Interesting. Amplitude is now about 1.6V (peak-to-peak). Period of bit 20 is 7.16 ms. So, if I did dual-edge-triggered registers on that signal, it would be about the same period as single-edge triggered with the original 1.7ns 3-stage period, and it would probably work more reliably due to the larger signal swing and more flat-topped waveform. So, that is probably in fact the best approach. However, I'm still baffled by why I was able to get *faster* ring oscillators on the DE2 board than on the DE3. I probably need to do some more experiments on the DE2 at home. And study the datasheets for both devices some more.
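The cross-check between the scope frequency and the counter's bit-20 period can be sketched in Python as follows. It assumes the counter advances once per ring-oscillator period and that bits are numbered from 0, so bit k has a period of 2^(k+1) counts; `counter_bit_period_ms` is a hypothetical helper name.

```python
def counter_bit_period_ms(ro_freq_hz, bit):
    """Period of counter bit `bit` (numbered from 0), one count per RO period."""
    ro_period_s = 1.0 / ro_freq_hz
    # Bit k toggles every 2**k counts, so a full cycle takes 2**(k + 1) counts.
    return ro_period_s * 2 ** (bit + 1) * 1e3


print(round(counter_bit_period_ms(395e6, 20), 2))   # 5-stage RO
print(round(counter_bit_period_ms(293e6, 20), 2))   # 7-stage RO
```

Agreement between these predictions and the measured bit-20 periods is a useful sign that the counter is not dropping counts at that frequency.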