Friday, March 30, 2012

Fri., Mar. 30th

David and Darryl were here today, and were thinking and talking with Dr. O'Neal about what to do next on the paper wrt updating the figures.

David went through part of a tutorial on PADS and learned about a lot of nice features.  Perhaps before the Senior Design students disappear, he can sit down with Samad and they can work together on designing the new OCXO board - this is an extremely simple board which will be good practice for both of them.

George stopped by with some SMA cables that he had picked up at Radio Shack, but they didn't have the right connectors to interface with our boards.  Brian thinks he can make his own cable, but I told George that they really need to order some off-the-shelf ones online as a backup, and also because making a good quality coax cable with no impedance nonuniformities (which can cause signal reflections) is in general quite difficult; making a cable that really offers a reliable high-quality connection suitable for GHz-bandwidth signals can be a challenge.  Brian supposedly has experience making coax cables, but we can't risk any more delays if they can't build a working cable.  We found 12" cables at Cables to Go (hopefully these will be long enough; 16" or 18" might also be suitable, if we can find them) and George says he will order them and they should arrive on Tuesday.  But hopefully he & Brian can make a cable before then that will work well enough to at least wire up & test the main electronics box assembly.

The main thing I worked on today was trying to configure the Wi-Fi connection using my Verizon MiFi 4G/LTE mobile hotspot.  We tried WEP 64-bit, but the modules wouldn't connect, perhaps because the MiFi would only let me enter a 10 hex digit (40-bit) key?  Norton might have been blocking the connection, but I switched off the relevant features and this didn't fix the problem.  If I continue having problems, I'll uninstall Norton; it only has 11 days before the demo license expires anyway.  I went ahead and configured rules in Windows Firewall to open the needed ports.

In the meantime, I changed the Wi-Fi script to turn on info-level output, so I can see exactly where it is having problems.  I'm also trying to configure it for WPA2 Personal since that is the most secure level supported by the MiFi - as long as we're trying different things, we might as well try the most secure mode first.  The EZURiO does support WPA2 Personal, according to the docs; however I haven't tested this mode on it before.  I'm working on augmenting the script to support this mode as an option selectable in the site configuration.

* * *

Working while on a trip this evening, I finished the script changes to support WPA2-Personal security, and got the script to compile.  (As usual when adding code, this involved moving more strings out to the strings.txt file.)  The changes still need to be tested on a real Wi-Fi board next time I am in the lab (or at home).

Thursday, March 29, 2012

Thu., Mar. 29th

It occurred to me this morning that we'll need a wireless access point at the Senior Design Fair. An easy solution would be my Verizon mobile hotspot. I can rename it "MikeRoCosm" (for Mike's Roaming COSMOS, or Micro-COSMOS), configure it with minimal or no security, and then modify the boards' Wi-Fi script to connect to it. Also, the server can run on my (or someone else's) laptop, which can also be set up to connect to the AP. I can test this in lab on Friday. We need to arrive real early on Thursday to start setting up, duct-tape cables to the floor, find a good spot for the GPS antenna and the detectors, run electrical power, etc. I need to ask Donte about the power arrangements. It might be possible to write a real quick-and-dirty visualization over the next few days; maybe I'll give that a shot. It will not be a permanent solution though.

Wednesday, March 28, 2012

Wed., Mar. 28th

Juan is here and working on timing optimization.  He found this nice tool called Timing Optimization Advisor that made suggestions of ways to improve the timing performance.  It had a couple of suggestions which looked promising that we are trying.

To help out, I am going to make a list of all the settings I have fiddled with:
  • Compilation Process Settings:
    • Use smart compilation - I don't know if this affects performance at all; it supposedly can improve compilation speed, so I've been leaving it turned on.
    • Incremental Compilation:
      • Incremental compilation = Full incremental compilation.  I think this is necessary for partitions & LogicLock to be helpful.
    • Physical Synthesis Optimizations:
      • Optimize for performance (physical synthesis):
        • Perform physical synthesis for combinational logic = ON
        • Perform register retiming = ON
        • Effort Level = Extra
      • Fitter netlist optimizations:
        • Perform register duplication = ON
  • Analysis & Synthesis Settings:
    • Optimization Technique = Speed
    • Timing-Driven Synthesis = ON
  • Fitter Settings:
    • Fitter effort = Standard fit (highest effort)
    • More Settings...:
      • Placement effort multiplier = 4.0
      • Router Timing Optimization Level = Maximum
At least, these are all the settings that I think were helping.

I'm also looking at possibly changing the state encoding for the pulse_cap FSM to a one-hot encoding (this was one of the things that the Timing Optimization Advisor suggested).  Not sure yet if this will help at all because that logic was, I think, pretty well-optimized already.

589.28 MHz - Baseline speed for high-speed datapath.
455.37 MHz - Speed after switching to the one-hot encoding.  Nope, it didn't help!

Continued working on server coding.  When I left off at the end of the day, I was about to write the GPS_Manager code that watches for the condition where the TRAIM is not eliminating all satellites from the solution and is reporting a non-null accuracy value (and maybe also verifies that the current time as reported by the GPS is consistent with the system clock).  In this case, it should inform the application's main entity (the CosmicIServer instance) that the GPS time is good by calling its yo_GPS_time_is_good() method, which will then relay this information to the RunManager instance, causing the RunManager to proceed to the next step in its startup sequence.  That step has not been implemented yet (empty method bodies for now); it will be to start up the Timekeeper and then the DataCollector (neither of which has been written yet), and then (next step) start the CTU.  That last step should be really easy to implement, though, since all the infrastructure for it is already there.  (We can go ahead soon and test this & the rest of the startup sequence, temporarily leaving empty method bodies in the routines that start up the Timekeeper and DataCollector.)
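
A rough sketch of that GPS check, to pin down the logic before writing the real thing.  Only yo_GPS_time_is_good() is an actual name from the server code; the GpsTimeWatcher class, the on_traim_reading() callback, its argument names, and the 2-second skew limit are all placeholders made up for illustration.

```python
import time


def traim_reading_is_good(satellites_used, accuracy_ns):
    """True if TRAIM kept at least one satellite and reported an accuracy value."""
    return satellites_used > 0 and accuracy_ns is not None


def clocks_agree(gps_unix_time, system_unix_time, max_skew_s=2.0):
    """True if GPS time and system time differ by no more than max_skew_s."""
    return abs(gps_unix_time - system_unix_time) <= max_skew_s


class GpsTimeWatcher:
    """Hypothetical helper: notifies the server once, on the first good reading."""

    def __init__(self, server, max_skew_s=2.0):
        self._server = server          # object providing yo_GPS_time_is_good()
        self._max_skew_s = max_skew_s  # assumed tolerance, not from the notes
        self._notified = False

    def on_traim_reading(self, satellites_used, accuracy_ns,
                         gps_unix_time, system_unix_time=None):
        """Check one TRAIM reading; return True if the server was notified."""
        if self._notified:
            return False
        if system_unix_time is None:
            system_unix_time = time.time()
        if not traim_reading_is_good(satellites_used, accuracy_ns):
            return False
        if not clocks_agree(gps_unix_time, system_unix_time, self._max_skew_s):
            return False
        self._notified = True
        self._server.yo_GPS_time_is_good()
        return True
```

Notifying only once keeps the RunManager from being poked again on every later reading; whether that is the right behavior after a GPS reset is still an open question.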

Tuesday, March 27, 2012

Tue., Mar. 27th

My plan for today:  Work on the startup sequence.  Maybe create a new class/worker thread called RunManager that is responsible for overall management of the startup sequence and maintenance of the data-collection run?

Michael Dean is supposed to come in today; he can continue working on the refactoring of the FEDM gelware into high-speed vs. low-speed modules.

Darryl (if he's feeling better) & David are also supposed to be here; I can maybe work with them to collect some higher-quality data for inclusion in the paper.

Sometime, David might also work on the PADS layout modifications.  The FlexNet license server seems to be in a good mood today and David was able to run PADS on the Acer with no complaints.  I told David he also needs to install the OrCAD demo version on the Acer sometime so that he can inspect the schematics.  To modify the schematics, though, we'll have to use the OrCAD at the College of Engineering since the demo version we're using here won't let us save changes.  Sometime I need to go over with David in more depth how to actually modify layouts in PADS.

More thoughts about RunManager:

Here's what it does:
  1. It waits for the CTU node to be created, and for its host to become ready.
  2. It waits for the FEDM (ShowerDetector) node to be created, and for its host to become ready.
  3. It waits for the GPS_Manager to report at least one valid TRAIM reading with finite accuracy and w/o all satellites eliminated from the solution.
  4. It tells the CTU host to start running (generating PPSCNTR time-reference data and timing sync pulses).  (At the same time, the TimeKeeper worker thread is started, which is responsible for archiving/visualizing the time data.  Also, the DataCollector worker thread is started, which is responsible for archiving/visualizing the FEDM data.)  After this point, subsequent data output by the FEDM will be tagged with meaningful absolute time-reference data (previous data will have all "0" values for the time-reference fields).
Started writing the file, runmgr.py.
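
For reference, here is one way the four steps above might be sketched with plain threading.Event objects.  This is not the actual contents of runmgr.py; the event names, the constructor arguments, and the optional timeout are assumptions for illustration, and the three start_* callbacks stand in for code that doesn't exist yet.

```python
import threading


class RunManager(threading.Thread):
    """Sketch of a worker thread that sequences the startup steps."""

    def __init__(self, start_ctu, start_timekeeper, start_datacollector,
                 timeout=None):
        super().__init__(name="RunManager", daemon=True)
        # One event per precondition; other parts of the server set these.
        self.ctu_ready = threading.Event()      # step 1: CTU node + host ready
        self.fedm_ready = threading.Event()     # step 2: FEDM node + host ready
        self.gps_time_good = threading.Event()  # step 3: valid TRAIM reading
        self.started = threading.Event()        # set once step 4 has completed
        self._start_ctu = start_ctu
        self._start_timekeeper = start_timekeeper
        self._start_datacollector = start_datacollector
        self._timeout = timeout                 # None means wait forever

    def run(self):
        # Steps 1-3: block until every precondition has been reported.
        for event in (self.ctu_ready, self.fedm_ready, self.gps_time_good):
            if not event.wait(self._timeout):
                return  # gave up waiting; leave 'started' unset
        # Step 4: start the archiving threads, then tell the CTU to run.
        self._start_timekeeper()
        self._start_datacollector()
        self._start_ctu()
        self.started.set()
```

With this shape, yo_GPS_time_is_good() would only need to call gps_time_good.set() on the RunManager instance, and the empty Timekeeper/DataCollector starters can be passed in as do-nothing functions for now.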

We managed to get some nice screenshots of PMT pulses crossing multiple thresholds for the paper:

Oscilloscope screenshot of a PMT pulse that crosses 4 out of 5 voltage thresholds,
spaced linearly at -200 mV intervals, together with the corresponding digital outputs from the threshold comparators.

We also have the data from this trace in spreadsheet form, so we can re-plot the traces for the figure in the paper if we want.  Darryl can work on incorporating this into the paper.

* * *

At home this evening, I spent a while making changes to the server code to parse FEDM messages.

Monday, March 26, 2012

Mon., Mar. 26th

David texted me earlier asking if he should come today or tomorrow - I told him tomorrow so he can work with Darryl on the paper, perhaps.

Ray stopped by to touch base on the project status.  We're hoping the ME students will get the wall/ceiling mounting brackets installed at CLC before the end of next week, and also get the electronics boards mounted to the chassis before the Senior Design Fair.

Juan is here, and I suggested he work on the Acer instead of the VirtualBox since it is faster, and loaned him my 8GB USB flash drive so he could transfer the files.  He is doing that.

Aarmondas is here; maybe I can start taking him through some things that need to be done on the Python code.  I re-shared the Server Code folder with him and it is downloading onto his laptop.  He's also installing Python v.3.1.4.

I saw some glitches in the 300 MHz pulseform-capture system when I tested it the other day; I wanted to work on that today (or soon).

Some things to do (coming up) on the server code:
  • Put FEDM messages into data structures; publish them using the Publisher interface.
  • Finish the startup sequence coding (automatically start CTU after FEDM is ready)
  • Write some simple early visualization modules - e.g. graphical display of CTU time data
Back to testing/debugging.  I re-soldered the jury-rigged SMA connector onto the OCXO output (it had come detached last time I was working with the CTU - I'll be really glad when Samad and I get the new OCXO board made).

Aha, forgot to add support for the LOST_PULSES message.  Did that (skeleton at least).

OK, we're still getting the glitches in the 300 MHz data.  I had an idea the other day to try to fix this, which was to add a synchronizer chain on the handshake return in pulse_cap.  Let me do that...

OK, I added a 4-stage synchronizer chain to the hs_cons inputs to both se_pulse_cap_56.vhd and se_pulse_cap_tsedge_56.vhd.  Hope 4 stages is enough!  We'll see.  Note: Adding this synchronizer increases slightly the minimum "dead time" between separate pulses we can detect on a single input channel.  At 500 MHz, the increase will be by 4*(2 ns) = 8 ns.  However, it is already quite a bit larger than this, as cs_combine already takes a few 20 ns cycles to return its handshake.
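
To double-check the dead-time arithmetic, here is a tiny behavioral model in Python (not the VHDL itself).  It treats the synchronizer as a plain shift register clocked by the fast clock and does not model metastability at all; the class and function names are made up for this note.

```python
def synchronizer_latency_ns(stages, clock_hz):
    """Delay added by an N-stage synchronizer chain, in nanoseconds."""
    return stages * 1e9 / clock_hz


class SynchronizerChain:
    """Behavioral model: N flip-flops in series, all on the same clock."""

    def __init__(self, stages=4, initial=0):
        self._regs = [initial] * stages

    def clock(self, async_input):
        """Apply one clock edge and return the last stage's new value."""
        self._regs = [async_input] + self._regs[:-1]
        return self._regs[-1]


if __name__ == "__main__":
    print(synchronizer_latency_ns(4, 500e6))  # 4 stages at 500 MHz
    print(synchronizer_latency_ns(4, 300e6))  # same chain at today's 300 MHz
    chain = SynchronizerChain(stages=4)
    print([chain.clock(1) for _ in range(5)])  # input held high for 5 edges
```

At the 300 MHz we're actually running today, the same 4 stages cost a bit more wall-clock time than at 500 MHz, since each stage is one clock period.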

Speed of new design:  587.89 MHz (high-speed components only); 353.23 MHz (whole design).

The change seems to have done the trick WRT eliminating the glitches.  The data we're collecting at 300 MHz looks clean now.

Aarmondas is looking at a couple of possible solutions for the database:  SQLlib (in Python), vs. MySQL.  I told him he should consider which approaches are most flexible in terms of allowing us to actively query the database from another process while the main server is still writing data to the database.  A separate database server process might be needed to do this (if we don't want to integrate a SQL server into the main server app).
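
As a quick experiment on the "query while still writing" question, here is a sketch that assumes the in-Python option means something like the standard library's sqlite3 module (that is my assumption; we haven't settled on it).  It puts the database file in write-ahead-log mode, then reads from a second connection while the first has an uncommitted insert pending.  The table and column names are made up.

```python
import os
import sqlite3
import tempfile


def open_db(path):
    """Open a connection and switch the database file to WAL journaling."""
    conn = sqlite3.connect(path, timeout=5.0)
    conn.execute("PRAGMA journal_mode=WAL")
    return conn


def demo_concurrent_read(path):
    """Return (rows seen mid-write, rows seen after commit) by a 2nd connection."""
    writer = open_db(path)
    reader = open_db(path)
    try:
        writer.execute("CREATE TABLE pulses (channel INTEGER, start_cycle INTEGER)")
        writer.execute("INSERT INTO pulses VALUES (1, 1026250529)")
        writer.commit()
        # Second insert is left uncommitted while the reader runs its query.
        writer.execute("INSERT INTO pulses VALUES (3, 1026250530)")
        during = reader.execute("SELECT COUNT(*) FROM pulses").fetchall()[0][0]
        writer.commit()
        after = reader.execute("SELECT COUNT(*) FROM pulses").fetchall()[0][0]
        return during, after
    finally:
        reader.close()
        writer.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(demo_concurrent_read(os.path.join(tmp, "demo.db")))
```

Here the two connections live in one process for convenience; a separate query process would open the same file the same way.  MySQL would handle this through its own server process instead, so this only covers one side of the comparison.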


Friday, March 23, 2012

Fri., Mar. 23rd

I'm not sure if anyone else is coming in today.  Darryl is out sick.  David isn't coming today.  Aarmondas usually comes in from 4-5 on Fridays, but I have to leave early (4:30) today.  If Aarmondas shows up before I leave we will go over the FEDM rewrite together and he can pick up where M.D. left off yesterday.  I think one of the ME guys might drop by at some point?  (They didn't.)

Meanwhile, I think today I will just continue working on the startup sequence, and the server-side code for parsing the FEDM data and reading it into data structures.

Samad came in and we are working to get started on the new OCXO board design.  I put the files for the old OCXO board design on Dropbox in COSMICi_devel/OCXO_board/Old_CTU and shared OCXO_board with Samad; the new board design will be created in another subfolder of it.

We first tried to see if he could use PADS remotely on the ACER through Windows' built-in Remote Terminal support, but apparently it doesn't support having multiple people logged in simultaneously.  Guess you'd need some Server version of Windows for that.

So next, Samad is installing PADS on the ACER, and we will see if it can use the license hosted by the FlexNet server on COSMICi, at port 27000 (lmgrd process).

I spent some time with Samad going through various details of the old OCXO board design in PADS, and showing him how to use various features of the PADS software.

Samad hasn't yet received the new instance of the power distribution board from Donte (the old one had some broken rings around the vias); when he gets the new one, he can solder the various headers onto it and then we can test it in practice.

Thursday, March 22, 2012

Thu., Mar. 22nd

Came in briefly from ~1:45 - 2:15 so that Michael Dean could get in to continue work on the restructured FEDM gelware.  Check Michael Dean's blog post for details on what he did.  Later in the day I left a comment on his blog post.  We probably need to go over the new design carefully next week and check it for correctness before we try compiling.

Stopped back by at ~6:15 pm to make sure door was locked.  It was.  Came in for a few minutes anyway to check email and take notes.

Wednesday, March 21, 2012

Wed., Mar. 21st

Juan is here, and working on finishing up Michael Dean's changes to put all of the high-speed components into a single entity under the top-level schematic.

I am going to install my current Q:\ compile onto the FEDM.  The PLL frequency is 300 MHz.  The Fmax from this particular compile is 340.02 MHz.

Spoke to Ray about forthcoming goals:

Hardware goal:  Mounting hardware/brackets installed in CLC by April 7th, also electronics mounted in case, cooling system installed but not glued to the chip (thermal paste OK if easily removed).  We may need to wait a little longer than this to actually move the electronics box to CLC, in case we are still doing development work on the system.

Today I will continue working on the server-side code to support the FEDM.  First trying to see if my changes to recognize the FEDM messages are working.

[ ] Still need to fix the GPS Manager so that it automatically turns POSHOLD mode back on after a manual reset.

The GPS is acting a little odd today - it keeps eliminating all satellites from the timing solution.

Having an issue that we're not seeing the HOST_STARTING message from the server b/c it's sent before the Wi-Fi is ready?  Let me try again.

For some reason, the UART bridge connection from the FEDM board keeps closing itself shortly after it opens.  Not sure how.  Maybe a buffer overflow in the bridge implementation in the Wi-Fi firmware, caused by too much data being streamed to the Wi-Fi board before the bridge is fully established?

Having trouble even getting that Wi-Fi board (node #1) to establish a connection now.  Wonder if the script got erased?  That happens sometimes.

OK, the disconnects were a server bug.  Fixed that.  Now I'm having another problem: a flaky connection from the OCXO, which has come unsoldered again, for like the 3rd time.  However, by holding it in place manually I managed to get some pulses.  Here are a couple of adjacent output lines from the period during which the OCXO output seems to have been connected:

NC_PULSES,26595,23828622948,23828752409,118,78,203
NC_PULSES,28763,24055954150,24056162251,135,84,201

Let's just look at the two time-reference data points (first two fields of each line).  The difference in sync pulse counts is 28,763 - 26,595 = 2,168.  Dividing that by the new nominal sync pulse frequency of 2,861.022,949,218,75 Hz gives a time interval of 0.757,770,922,67 seconds.  Meanwhile, the difference in PLL clock cycle counts was 24,055,954,150 - 23,828,622,948 = 227,331,202.  Dividing that by the current PLL clock frequency of 300 MHz gives 0.757,770,673 seconds.  The two clocks (OCXO vs. FEDM's TCXO) are thus out of calibration with each other by only about 0.33 ppm; not too shabby.
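
The same comparison in script form, so it can be rerun on other pairs of lines.  The function names are made up; only the first two numeric fields of each NC_PULSES line are used (sync pulse count and PLL cycle count), and the remaining fields are ignored.

```python
SYNC_FREQ_HZ = 2861.02294921875  # nominal sync pulse frequency
PLL_FREQ_HZ = 300e6              # PLL clock frequency in the current compile


def parse_nc_pulses(line):
    """Return (sync_pulse_count, pll_cycle_count) from an NC_PULSES line."""
    fields = line.strip().split(",")
    if fields[0] != "NC_PULSES":
        raise ValueError("not an NC_PULSES line: %r" % line)
    return int(fields[1]), int(fields[2])


def compare_clocks(line_a, line_b, sync_hz=SYNC_FREQ_HZ, pll_hz=PLL_FREQ_HZ):
    """Return (interval by sync count, interval by PLL count, mismatch in ppm)."""
    sync_a, pll_a = parse_nc_pulses(line_a)
    sync_b, pll_b = parse_nc_pulses(line_b)
    t_sync = (sync_b - sync_a) / sync_hz
    t_pll = (pll_b - pll_a) / pll_hz
    ppm = (t_sync - t_pll) / t_pll * 1e6
    return t_sync, t_pll, ppm


if __name__ == "__main__":
    a = "NC_PULSES,26595,23828622948,23828752409,118,78,203"
    b = "NC_PULSES,28763,24055954150,24056162251,135,84,201"
    print(compare_clocks(a, b))
```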

So, this verifies that the PLL cycle counter on the FEDM really is working at 300 MHz in the current compile, so that's good.  However, there were a few outputs from the pulseform-capture datapath that seem to have got corrupted somehow, e.g.:

Wed Mar 21 17:19:32 2012 + 697 ms: < CON_PULSE,0,0,1,1,1026250529,3,(0,(1,(2,6),7),7)
Wed Mar 21 17:19:32 2012 + 698 ms: < CON_PULSE,0,0,3,1,1026250530,4,(0,(0,(1,(3,5),8),6),11)
Wed Mar 21 17:19:32 2012 + 699 ms: < CON_PULSE,0,0,2,1,1026250531,3,(0,(1,(1,11),6),3)
Wed Mar 21 17:19:32 2012 + 701 ms: < CON_PULSE,0,0,2,2,1026250531,2,(0,(443296,7),4)

What's happening here?  First, in the first 3 lines, we see a nice-looking shower event where all three detectors cross the 1st threshold (-200 mV) within 2 PLL clock cycles (i.e., 6.7 ns) of each other, and all of them cross 3 or 4 thresholds with roughly similar-looking patterns.  Then, we get a spurious 2nd pulse from channel #2 that supposedly starts in the very same clock cycle as the previous one (how is that even possible?), and only crosses 2 thresholds, with an anomalous time delta (1.48 ms) between crossing the first and second threshold (this could very easily be the time delta between two completely separate pulses).  Something is clearly screwy there!  And this same kind of glitch seems to happen during most of the other shower events.  I didn't notice these kinds of glitches when we were running at 200 MHz, so possibly there is some kind of timing issue at work.
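
A small helper for spotting this kind of glitch automatically might look like the following.  It converts cycle counts to time at the current 300 MHz PLL clock and flags any (channel, start cycle) pair that shows up more than once.  Reading the channel from the 3rd argument and the start cycle from the 5th argument of CON_PULSE is my working interpretation of the lines above, so treat that as an assumption; the function names are made up.

```python
def cycles_to_seconds(cycles, clock_hz=300e6):
    """Convert a PLL cycle count to seconds at the given clock frequency."""
    return cycles / clock_hz


def find_repeated_starts(pulses):
    """Return (channel, start_cycle) pairs that occur more than once.

    'pulses' is an iterable of (channel, start_cycle) tuples in arrival order.
    """
    seen = set()
    repeats = []
    for key in pulses:
        if key in seen:
            repeats.append(key)
        else:
            seen.add(key)
    return repeats


if __name__ == "__main__":
    print(cycles_to_seconds(2) * 1e9, "ns")       # spread of 1st-threshold times
    print(cycles_to_seconds(443296) * 1e3, "ms")  # the anomalous delta
    event = [(1, 1026250529), (3, 1026250530),
             (2, 1026250531), (2, 1026250531)]
    print(find_repeated_starts(event))
```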

Hm, I wonder if I should add a synchronizer chain when feeding the low-speed data consumer's (cs_combine's) handshake-acknowledge signal back into the high-speed front-end pulse-capture module?  Without that, there are possible metastability issues that could come into play, which could conceivably destabilize the updating of the pulse-cap module's high-speed state machine temporarily, possibly causing the spurious extra outputs.  This might be worth a try.  However, the day's about over so we'll worry about that tomorrow.

The ME students came by at one point today and took some more measurements for the chassis construction.

Tuesday, March 20, 2012

Tue., Mar. 20th

David is here.  Dr. O'Neal brought in his Windows 7 CD so David is using it to install Windows on the new hard drive for the Acer.  Unfortunately it wouldn't accept the license key (the CD is Professional, the license key is for Home), so we might have to tweak the installation later to downgrade it from Professional to Home, or something.  David is downloading a new ISO of W7 Home Premium (non-SP1) which he will burn onto a DVD-R.  We seem to be having a problem where this download keeps getting interrupted right at the end.

Darryl should be here later, and we can all talk a little about the paper if needed.  We were going to collect some new data to include in the figures in the paper, but it might be best to wait until we finish boosting the FEDM speed to 500 MHz so that the data we collect can come from the actual system (as opposed to a simplified mock-up).  An exception might be the figure showing an input PMT pulse next to the comparator outputs, since we don't actually need the high-speed clocks for that one.

Michael Dean is here.  He & the other Senior Design computer engineering students can work on making a single top-level entity for all of the high-speed components, which I think is the next logical step, since the lack of such an entity is the only reason I can think of why the speed of the high-speed stuff still isn't getting preserved when we add the other stuff - it's possible that, even though the individual high-speed instances are logic-locked, the routes between them aren't getting logic-locked currently.

Meanwhile, I could work on the FEDM model/proxy code, which has barely been started so far.  Basically I first just need to add the handlers for the various FEDM-specific messages.

What to load onto the FEDM for testing?  Looks like all my compiles from yesterday had the PLL speed set to 250 MHz.  I should be able to turn it up to at least 300 MHz based on the last Fmax from yesterday.  What if I compile with PLL=500 MHz, does that affect Fmax at all?  Let's try it...

Fast stuff only:    552.49 MHz.
Add slow stuff:   307.22 MHz.  Boo.

On the FEDM model coding:  Currently, we need to add support for these 3 message types:
  • NC_PULSES - 6 arguments
  • FIFO_FULL - 4 arguments
  • CON_PULSE - 7 arguments (last one has nested commas in parens)
Modified fedm.py to dispatch these 3 message types to (currently still empty) handlers, so that at least we won't get error messages for them.  David tested COSMICi_server.py to make sure it runs on Windows 7 - it does (at least up to the starting screen).
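
Since the last CON_PULSE argument has commas nested inside parentheses, a plain split(",") won't work for it.  Below is a sketch of a splitter that only breaks on commas at nesting depth zero, plus an argument-count check against the table above.  This is not the actual fedm.py code; the function names and the handlers dictionary are placeholders.

```python
# Expected argument counts for the three FEDM message types listed above.
EXPECTED_ARGS = {"NC_PULSES": 6, "FIFO_FULL": 4, "CON_PULSE": 7}


def split_top_level(text, sep=","):
    """Split text on sep, ignoring separators inside parentheses."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced ')' in %r" % text)
        if ch == sep and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    if depth != 0:
        raise ValueError("unbalanced '(' in %r" % text)
    parts.append("".join(current))
    return parts


def parse_message(line):
    """Return (message_type, list_of_argument_strings)."""
    fields = split_top_level(line.strip())
    return fields[0], fields[1:]


def dispatch(line, handlers):
    """Validate the argument count, then call the handler for this type."""
    msg_type, args = parse_message(line)
    expected = EXPECTED_ARGS.get(msg_type)
    if expected is None:
        raise ValueError("unknown message type %r" % msg_type)
    if len(args) != expected:
        raise ValueError("%s: expected %d args, got %d"
                         % (msg_type, expected, len(args)))
    return handlers[msg_type](args)
```

The nested argument is left as a raw string here; turning it into a real data structure can wait until the handlers actually do something with it.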

Meanwhile, trying another compile on laptop.  That one had the pipeline registers.  Tried merging partitions after LogicLocking; that caused the multi-hier partition to get a LogicLock icon.  Maybe that will help.

Fast only:  564.33 MHz.
Add slow:  335.23 MHz.  (I think this is the best yet with the full system in there?)

The placement effort level on the laptop wasn't turned up all the way.  Fixed that; redoing.  Meanwhile, another compile on PC, with pipeline registers and entire merged partition in LogicLock:

Fast only:   582.07 MHz.
Add slow:  266.24 MHz

What did I do wrong?  Failed to recompile Top from source?  No, that only got us to 266.81 MHz.  Hm.  Not sure why results are so different between PC & laptop compiles right now.  Some minor settings difference?

Laptop:

Fast only:   596.66 MHz  (Is this about the best so far for just the high-speed components?)
Add slow:  345.9 MHz  (Is this the best so far with the full system?)

Now on laptop got 346.48 MHz for the whole system.  Can we hit 350 MHz?  We're so close!  No, blah, this time I only got 321.96 MHz!  Yuck!

Trying compile on desktop with less-aggressive settings for the top partition.  Maybe this will help now that the multi-hier is logic-locked?  No, blah.  Made desktop settings same as laptop.  Now getting 338.98 MHz.  At least it's over 300.


Monday, March 19, 2012

Mon., Mar. 19th

David is here.  The new hard drive came in and David installed it.  Now we just need to find the Windows 7 boot CD.  Ray has one at home - asked him to bring it in.

Juan is going to be here Wednesday instead of today due to a conflict with his RA job.

Aarmondas & Samad came in.  Samad had his PCB fabbed by Donte, but there were some fabrication problems so Donte is going to redo it.  Aarmondas is experimenting with compiles.  David is looking through Altera docs trying to find new things to try.

I heard the guys did some more work last Wednesday, but did not yet get the FEDM input-capture working at 500 MHz.  David is checking the group blog.  Juan says he tried changing the fitter placement effort but it only helped a little.  He said he didn't take out the extra pulse-cap instance yet.

The last test compile in Q:\ got 334.9 MHz.  Or no, that was with the high-speed components only?  Not sure; need to recompile.

Let's try:
  • Take out 6th pulse-cap from the pulseform-cap module.
  • Turn off "optimize multi-corner timing" (we only care about hot corner (worst-case) for now).
  • Change fitter placement effort multiplier from 2.0 to 4.0.
  • Change fitter routing effort multiplier from 2.0 to 1.0.  (These are the settings that worked best for the CTU app.)
  • Router timing optimization level:  Leave at Maximum.
OK, first I merged all the high-speed parts, and am compiling them from source with Top set to Empty.

Then (if that meets the speed target), plan is to unmerge them, logic-lock them, set them to compile from post-fit netlist (strict with placement/routing preservation), and set top to Source and recompile.

OK, we got 577.03 MHz for the high-speed stuff.  However, I'm not sure if this approach will really work - splitting the merged partition looks like it might be destroying the post-fit results.  If this doesn't work, I'll try it again and next time I won't split it; will instead add all the instances to LogicLock manually.

Another thing to try is turning down the aggressiveness of the compile for the non-time-critical parts.  It's possible that when it's aggressively optimizing those, it encroaches on the time-critical parts.

Interrupting the current compile b/c it seems to be stuck in the middle of analysis & synthesis - it's been at about 50% for 20 minutes.

OK, compiling just the high-speed stuff as a merged partition again.  Then I'll apply the LogicLock, manually using wildcards if I have to.  Then add in the low-speed stuff with relaxed compile settings.

I'm learning how to use wildcards more cleverly (I think) so that there are fewer separate entries that need to be added in the LogicLock list.

For the slow-speed logic, I relaxed the overall fitter setting from "Standard" to "Auto," reduced placement effort multiplier to 1.0, and turned router timing optimization level to Standard.

I could also try turning off physical synthesis optimization for speed in the low-speed compiles.

Bleh, 295.95 MHz.  Even worse than before!

Speed is currently 565.29 MHz for just the high-speed stuff.  Let's add the slow-speed stuff in without turning down the optimization settings at all.  Still got my high-speed stuff in LogicLock.  Compiling now...

Argh, 295.95 MHz again!!!

Trying one more thing:  Creating a reserved rectangular LogicLock region for the high-speed stuff.  This way it won't even be able to try optimizing other stuff by changing stuff in the LogicLock region.  So, hoping that might help.  However, I'm not sure the design will even still fit if a rectangular region is reserved - there might not be enough cells left, it was a really tight fit before - or that the high-speed part will still be fast enough.

OK, that last worry was unfounded - speed in rectangular region is 567.21 MHz.  It reserved a big region on the left side of the chip.  Now let's try adding the slow-speed stuff, outside that reserved region.

Nope, we are 160 RAM cells short.  I could try taking out the pipeline registers, there are 4x2x56 = 448 bits worth of those, so those might be enough to allow us to fit again.  However, I'm not sure whether it will run fast enough without them.  Worth a try though.

Another thing to try:  Do the compile without the merge, move all the instances individually into LogicLock, then turn down optimization for compiling the rest.

For some reason, analysis & synthesis always seems to hang on this computer when all the high-speed instances are in separate partitions.  (Something to do with parallelizing the compile on multiple CPUs, perhaps?)  So anyway, I'll skip that idea for now.  (It might work on another computer though.)

In the meantime, I'll try again taking out the pipeline registers at the front of the datapaths.  First, from source with no logic-lock.  Got 572.74 MHz.  So those front pipeline registers were never really necessary!

Now, from source w. HS elements in root logic-lock region.  Same speed.

Now, from source w. HS elements in their own logic-lock region (Region_0).  569.8 MHz.  No, wait, that logic-lock region came out the wrong size for some reason.  Redo so it's the right size.  Now 582.41 MHz!  Not sure what changed to get that.

Now, turn on Reserved, turn on post-fit (strict) for HS partition, switch Top partition from Empty to Source, and recompile...

Nope; still 160 RAM cells short!  I'm surprised that taking out the pipeline registers didn't seem to help at all.

Let's go back to putting things in the Root logic-lock region, non-reserved, and see what we get (haven't tried that since taking the pipeline registers out).  582.41 MHz still.  Adding low-speed parts & recompiling...

Another thing to try:  Put all the high-speed logic in a single entity.  It's possible that the instances in the logic-lock list are being preserved, but the routing between them isn't.  Putting them all into a single entity that is itself logic-locked might help.  This requires substantial reorganization of code though.  Still, it's a lot easier than revamping our whole data representation.  Probably makes sense as the next thing to try, if my current compile doesn't work.

304.23 MHz for the whole thing.  Still no dice.

I think the next logical step is to try putting all the high-speed stuff into a single entity under the top-level schematic, which can be in its own partition, and logic-locking just that entity.  It will have some humongous number of output ports, but no matter.  This is what worked for me when I was doing the other app (CTU GPS app on the DE3).

That's a big job, so this is a good stopping point for today.  Let the students work on it...

Tuesday, March 13, 2012

Tue., Mar. 13th

To do today:
  • [ ] Do a test run with new FW burned into FEDM. - Last night's Quartus compile stalled.  Finished it up.  Burned it onto board.  Got a high rate of garbage output - looks like PLL was set at 500 MHz.  Turned it down to 250 MHz (slightly above hot-corner speed of ~249 MHz) to see if that works; recompiling now.  Quartus keeps crashing!  I think my incremental compilation settings are confusing it.  More below.
  • [/] Ray should drop by after his class to sign timesheets.  Get it signed & turned in to Sonja. - We signed 'em and David ran 'em over.
  • [/] Reserve rental car for SEALER trip.  - Done (Avis), printed confirmation.  Hope they don't need a credit card - maybe I should call them later.  OK, the terms on the reservation say they do take debit cards at this location.  Printed out terms to bring with me in case there's any question.
  • [/] Installing Quartus on laptop. - Got full 9.1 base; adding SP2 now.
Other notes:

* Juan/Aarmondas's "high speed" partitions had some extra stuff in them.  Mike Dean is fixing that.  He also stubbed out the 6th pulse_prep in each pulseform-capture datapath to make more space & is putting the pipeline registers back in.

* I discovered that you can merge partitions.

* Seems to be necessary to use LogicLock also (not just partitions) to preserve performance - I tried using partitions without LogicLock, and Fmax slowed down when I added the less performance-critical logic back in.

* I tried dragging the merged partition into the LogicLock region, but it seemed to only add one instance.  I went through and added all the high-speed instances to the LogicLock region using wildcards.  Hope I did it right.

* After adding the slow-speed logic in Top to the merged partition, Fmax slowed down again to 300-something.  Guess I could go ahead and burn this version since I'm only asking for a PLL speed of 250 at the moment.

* Quartus has been acting really odd today.  It keeps hanging at specific places.  Maybe I need to reboot.

* Rebooted desktop; also copied Q:\ contents to laptop (under my desktop in q91sp2\FEDM\).  Amazingly, Quartus seems to run significantly faster on my new laptop than it does on my desktop!  Maybe it's the combination of the i3 processor and the SSD.

* Michael Dean left Quartus compiling under VirtualBox - this is a test of the logic-locked high-speed logic, in a post-fit (strict) partition preserving placement & routing, with the slower-speed stuff added back in around it.  Unfortunately, the Fmax at the hot corner only came out to 311.92 MHz.  So, something is still not right.  Did we forget to include one of the modules clocked by the PLL clock in the stuff that is included in the partitions & the LogicLock?  It looks to me like he got everything, although someone else should probably double-check.

One thing though:  The pulse_cap modules for the 6th threshold could be removed from the partitions and the LogicLock, since we are not using them anyway.  This might help - since it reduces the number of instances of that module from 18 to 15, and makes the fitting easier.  (Those modules aren't getting automatically eliminated, since they are locked in.)  I actually did this in the copy I'm working with on my laptop -- we'll see how that compile comes out.  - That one yielded 324.04 MHz; a little faster but not much, and still far from our 500 MHz target.  We'll have to play with it more another day.

Another thing to try:  Turn the fitter placement effort from the default 1.0 up to 4.0.  This helped (and seemed to be necessary) in my DE3 GPS app.  It's worth trying in the FEDM code as well.  Of course, this will make the fitter run even slower than it does now, but hey...  You gotta do what you gotta do.

Currently, placement and routing effort are set at 2.0.  I found in the other project that it actually did worse at 2.0 than it did at 1.0; but then it got better again at 4.0.  The same might be true here.


Monday, March 12, 2012

Mon., Mar. 12th

Things to maybe work on today/tomorrow:
  • [ ] Get timesheet signed (turn in tomorrow). - Forgot to give it to Ray today before he left but can do it tomorrow.
  • [.] More testing of GPS initialization code. - Some incremental improvements.
  • [.] Maybe start developing the (very important) Timekeeper module. - Wrote file header.
  • [ ] Maybe work some more on the automated startup-sequence code.  As things stand right now, we get through GPS initialization; but then we also need to wait for the FEDM to start up, and then start generating timing-sync pulses and sending them to it.
  • [ ] Write up some notes for the students on things they can work on later in the week while I am away.
  • [ ] Consider maybe starting a time data collection run before I leave town, so it can be running while I'm away.  One issue:  Will it keep running even if I'm logged out?
Juan came in and I showed him how to use Design Partitions.  He's going to try that for the FEDM design.
Aarmondas is here.

Tomorrow David & Darryl should be here, and we can all talk about the paper, maybe go over my latest round of markups which I sent to them over the weekend.

Remark:  Sometimes the GPS module eliminates all (or all-but-one) of the satellites from the solution.  This seems silly, since what then is it basing the "solution" on?  In such cases, it might be good to do a hot restart.  Not sure yet; need more experience with this case.

Currently, if you invoke a restart manually, the server does not automatically turn on TRAIM and POSHOLD modes again, because it thinks they are already turned on.  Just fixed this in the case of TRAIM; haven't yet done it for POSHOLD - first need to subscribe to POSHOLD messages.

I wrote a header with a description for timekeeper.py, but haven't written any of the code for it yet.  It will take some time to develop, b/c I have to research topics like how to do database interfaces in Python, how to draw graphs in TkInter, etc.

Before we start trying to archive real data in databases, it probably makes sense to work a little more on the startup sequence so that we can start runs in a more automated fashion, and also because the timekeeper will need to be informed when the start of the run actually happens - getting the startup sequence automated will help with that.

Let's get set up to do a full test run, so we can see where we are in terms of the startup sequence.  The GPS is already on and (last we checked) acquiring satellites.  So now, just:

  1. Plug in PMTs.
  2. Put cooling block on FEDM.
  3. Power on CTU Wi-Fi.
  4. Power on DE3.
  5. Power on FEDM + fan.
At this point, the FEDM goes thru its startup (HOST_STARTING, DAC_LEVELS, HOST_READY) and then starts spitting out various operating messages:  NC_PULSES, FIFO_FULL, and real PULSE messages.  (There also are a couple of ACK and ERR messages scattered in there, in response to the WIFI_READY message from the EZURiO module.)  Initially, all the pulse messages start with 0,0, meaning that no timing-sync data has been received yet; this continues until one sends the "HOST START" command to the CTU.

Now, really, as soon as the server gets HOST_READY from the FEDM, it is OK to go ahead and send the "HOST START" command to the CTU.  Now, where to implement that behavior?
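One possible home for that behavior is a small sequencer object on the server side that watches FEDM messages and fires the CTU command exactly once.  A minimal sketch, assuming a hypothetical send_to_ctu callable and line-oriented FEDM messages (none of these names come from the actual server code):

```python
class StartupSequencer:
    """Sends "HOST START" to the CTU once the FEDM reports HOST_READY."""

    def __init__(self, send_to_ctu):
        # send_to_ctu is a hypothetical callable taking the command string.
        self.send_to_ctu = send_to_ctu
        self.started = False

    def on_fedm_message(self, line):
        """Call this for every message line received from the FEDM."""
        msg_type = line.strip().split(",")[0]
        if msg_type == "HOST_READY" and not self.started:
            self.send_to_ctu("HOST START")
            self.started = True


# Usage: collect the commands in a list instead of sending them for real.
sent = []
seq = StartupSequencer(sent.append)
for line in ["HOST_STARTING,FEDM,1.0", "DAC_LEVELS,1,2,3", "HOST_READY", "HOST_READY"]:
    seq.on_fedm_message(line)
print(sent)
```

The started flag keeps a repeated HOST_READY from re-sending the command; the message arguments in the usage lines are made up for the demo.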

By the way, should NC_PULSES also include the most recent timing-sync data?  Probably so.  OK, did that (loaded new FEDM FW via JTAG; not yet compiled into Quartus programming files in Q:\).

Should we also be putting timestamps on our FIFO_FULL messages?  It might be a good idea.  OK, that's done too.

I'm renaming the PULSE message to CON_PULSE (given that coincidence filtering is turned on), since this makes this message the same length as FIFO_FULL and NC_PULSES, which makes for a prettier display.

OK, this is looking pretty good now.  Here's some FEDM output right after starting the CTU:


NC_PULSES,1618,3352941653,3352948689,115,68,203
FIFO_FULL,2392,3407048110,3,1
CON_PULSE,3186,3462552667,3,1,3462608072,2,(0,(2,2),5)
CON_PULSE,3186,3462552667,2,3,3462608072,2,(0,(1,4),2)
NC_PULSES,3286,3469543165,3469557613,93,70,200
FIFO_FULL,4738,3571045202,3,1
NC_PULSES,4848,3578734750,3578768009,94,58,206
CON_PULSE,6044,3662341109,3,2,3662381026,2,(0,(0,8),3)
CON_PULSE,6044,3662341109,2,4,3662381026,2,(0,(1,5),2)
NC_PULSES,6559,3698342174,3698393175,85,65,200
CON_PULSE,7476,3762445043,2,5,3762466512,1,(0,4)
CON_PULSE,7476,3762445043,1,3,3762466515,1,(0,2)
NC_PULSES,8491,3833398601,3833429056,112,78,200
NC_PULSES,10654,3984603078,3984613854,118,71,201
NC_PULSES,12294,4099247249,4099252435,105,65,204

Let's sanity-check the relative speeds at which the sync-pulse counter and PLL clock cycle counter are increasing.  

Currently, with the 375 MHz PLL clock in the CTU, its dual-edged counter is incrementing at 750 Mcps.  The low 18 bits will take 2^18 = 262,144 counts to roll over, which is 349.5253 us (i.e., we now have a 2,861.023 Hz frequency for the timing-sync pulses).
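The rollover arithmetic can be checked with a few lines of Python.  This is just a sketch; the function name and parameters are mine, not anything from the firmware:

```python
def sync_pulse_period(pll_hz, rollover_bits=18, dual_edge=True):
    """Return (count rate in counts/s, timing-sync period in seconds)."""
    count_rate = pll_hz * (2 if dual_edge else 1)   # dual-edge counts twice per cycle
    period_s = (1 << rollover_bits) / count_rate    # time for the low bits to roll over
    return count_rate, period_s


count_rate, period_s = sync_pulse_period(375e6)
print(f"{count_rate / 1e6:.0f} Mcps, period {period_s * 1e6:.4f} us, "
      f"frequency {1 / period_s:.3f} Hz")
```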

Meanwhile, my version of the FEDM is still clocked using a 200 MHz version of the PLL (students still are working on the speedup task), so the PLL counter increments every 5 ns.

OK, so let's look at the time interval between the first two NC_PULSES messages above (each pair of NC_PULSES should have about the same time interval between them, if there's a steady rate of background pulses):

NC_PULSES,1618,3352941653,3352948689,115,68,203
NC_PULSES,3286,3469543165,3469557613,93,70,200

So, in terms of the timing-sync pulses, we have 3,286 - 1,618 = 1,668 of those.  Multiplying by 349.5253 us gives 583.0082 ms between the last pulses reported in each of these two reports.  (This implies that the approximate pulse rates on the three input channels were 160 Hz, 120 Hz, and 343 Hz respectively.)

In terms of the PLL cycles, we have 3,469,543,165 - 3,352,941,653 = 116,601,512 of those.  Multiplying this by 5 ns gives 583.0076 ms.

In other words, the two figures differ by only roughly 1.03 ppm, which is in line with the specified frequency calibration of the FEDM board clock, which IIRC was about 1 ppm.  The additional excess could come from other sources, such as the OCXO clock, whose frequency calibration is usually off by about 1.3 ppm.  (Actually, this result suggests that the FEDM board clock and the OCXO clock have their frequencies off in the same direction, so that the relative discrepancy is less than that for the OCXO by itself.)

Anyway, this result is a good validation test, and is evidence that the FEDM is still not missing any timing-sync pulses (if it missed even one, then the two times would be off by hundreds of ppm).
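Here is the same cross-check as a small Python sketch.  The field interpretation is the one used in the discussion above (first number = timing-sync count, second = PLL cycle count, last three = per-channel pulse counts); the meaning of the remaining field isn't needed here.  Carrying unrounded intermediate values, the discrepancy comes out nearer 1.2 ppm than the figure obtained from the rounded millisecond values, which doesn't change the conclusion.

```python
SYNC_PERIOD_S = 2**18 / 750e6   # CTU: 18-bit rollover at 750 Mcps
FEDM_TICK_S = 1 / 200e6         # FEDM PLL counter: 5 ns per count

first = "NC_PULSES,1618,3352941653,3352948689,115,68,203"
second = "NC_PULSES,3286,3469543165,3469557613,93,70,200"


def fields(line):
    """Integer fields of a message line, with the message name dropped."""
    return [int(x) for x in line.split(",")[1:]]


a, b = fields(first), fields(second)
t_sync = (b[0] - a[0]) * SYNC_PERIOD_S       # interval per timing-sync counter
t_pll = (b[1] - a[1]) * FEDM_TICK_S          # interval per FEDM PLL counter
ppm = (t_sync - t_pll) / t_pll * 1e6         # relative discrepancy
rates_hz = [n / t_sync for n in b[3:6]]      # pulse rate on each channel

print(f"sync: {t_sync * 1e3:.4f} ms, PLL: {t_pll * 1e3:.4f} ms, diff: {ppm:.2f} ppm")
print("rates (Hz):", [round(r) for r in rates_hz])
```

A single missed timing-sync pulse would shift t_sync by one part in 1,668, i.e. about 600 ppm, so this check is quite sensitive.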

It would probably be a good idea at some point to do a diagnostic graphical display showing the rates of non-coincidence pulses on the three detectors as a function of time.

Perhaps I'll start an overnight data-collection run as a test?  Actually, let me wait and not do it right now - first I want to actually burn the new FEDM code onto the board.  Starting a Quartus compile in Q:\ to prepare the new programming files.

Of course, at this point, all of the FEDM messages are still unrecognized by the server, so before we can do anything with the FEDM data (like graphing it) we have to remedy that situation.  More server programming...
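As a starting point for that server work, here is a rough sketch of a line parser for the three message types seen in the output above.  It only splits out the fields; what each field means is deliberately left uninterpreted, and the function name is hypothetical rather than anything in the existing server code.

```python
def parse_fedm_line(line):
    """Split one FEDM message line into (message type, payload)."""
    msg_type, _, rest = line.strip().partition(",")
    if msg_type in ("NC_PULSES", "FIFO_FULL"):
        # Flat list of integers.
        return msg_type, [int(x) for x in rest.split(",")]
    if msg_type == "CON_PULSE":
        # Integer fields followed by one parenthesized group, kept as raw text.
        head, _, group = rest.partition(",(")
        return msg_type, [int(x) for x in head.split(",")] + ["(" + group]
    return msg_type, rest   # anything else is passed through unparsed


print(parse_fedm_line("FIFO_FULL,2392,3407048110,3,1"))
print(parse_fedm_line("CON_PULSE,3186,3462552667,3,1,3462608072,2,(0,(2,2),5)"))
```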

Juan & Aarmondas have been working on trying to get the design partitions working in the FEDM.  They ran into some fitting issues which Aarmondas is working on.

Saturday, March 10, 2012

Sat., Mar. 10th

Today I wrote markups in red ink on a printout of the journal article draft, scanned them into a PDF, and emailed them to Ray, Darryl, and David.  Hopefully we can go over the changes on Tuesday and then start in on the next round of improvements.

Friday, March 9, 2012

Fri., Mar. 9th

To do today:  Debug new rising-edge time-capture datapath in CTU.  Yesterday it was not triggering interrupts.  The datapath could be stalling.  The raw counter bits appear to be working (3 of them, bits 24-26, are sampled on blue LEDs).

Added current-state bits of the rise-cap module as an extra debug output from the high-speed logic module.  Still need to wire them to an output port...  Also wiring up the handshake signals from pulse-capture to cs-combine.

Weird; no state change, no handshake.  Added taps for reset and enable signals.  Enable never goes high!  But why?  Tapping out 5 bits of the rise_sum register (bits 27-31), these bits should change every second.

After adding all these diagnostics my Fmax crept back below 350 MHz at some point.  Changing PLL speed temporarily to 300 for debugging.

OK, now it's working (although still not very reliably, and I still don't see anything on the scope) - seems like I have to try a more or less random sequence of STOP/GO/START/RESET/RESTART commands to get it to actually capture & count PPS pulses.  And even then, the datapath still seems to get hosed after a while to where it no longer responds.

Took the reset/enable out of the synchronizer chain for the PPS input; it's possible that was causing some problems due to the possibility of coming out of reset in the middle of a (half-second-long) PPS high period and perceiving that as a rising edge.

I think it might be fixed now; not sure though; needs more testing.

Now trying to get Fmax back above 350 MHz, with the PLL compiled at 350.  Did the thing of declaring other modules empty temporarily, then adding them back in.  Cool; now it's saying 376-408 MHz.  (Weird that it's faster.)  Wonder if it's worth trying 375 MHz as the PLL speed?  That would correspond to 750 Msps or a 1.33 ns resolution.  Perhaps we shouldn't get greedy, though.  Make sure this works first.

Duh, figured out why the scope wasn't reading anything - the digital read-in cable just wasn't plugged in all the way.

The thing seems to be working fine at 350 MHz.  This would be a good time to back up the project.  OK, it's copied to local file C:\LOCAL\Quartus_projects\q9v1sp2\GPS_FPGA_app.

With PLL @ 375 MHz we're now getting Fmax = 371-400 MHz.  We didn't quite make it at the hot corner, but I'll try it - it may work anyway since the junctions are unlikely to be as hot as 85 C.

Tested & works @ 375 MHz!  So we are now at 750 Msps sampling rate (or counts per second), a.k.a. 1.333 ns time resolution (a.k.a. +/- 0.67 ns maximum time measurement error).
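For reference, the relationship between PLL speed, sample rate, and time resolution is simple enough to put in a helper.  A sketch (function name is mine):

```python
def time_resolution(pll_mhz, dual_edge=True):
    """Return (sample rate in Msps, resolution in ns, max error in +/- ns)."""
    msps = pll_mhz * (2 if dual_edge else 1)   # dual-edge samples on both clock edges
    resolution_ns = 1e3 / msps                 # one count, in nanoseconds
    return msps, resolution_ns, resolution_ns / 2


for pll_mhz in (350, 375):
    msps, res_ns, err_ns = time_resolution(pll_mhz)
    print(f"{pll_mhz} MHz PLL: {msps} Msps, {res_ns:.3f} ns resolution, +/- {err_ns:.3f} ns")
```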


Thursday, March 8, 2012

Thu., Mar. 8th

Modified cs_combine_tsedge_56.vhd to a new module cs_combine_56_re.vhd, with substantial rewriting.  Using it in new CTU gelware COSMICi_DE3_GPSapp_top_pde_0v1.bdf.

Also doing top-level wiring of Nios core.  I had to modify the PIOs a bit, because I am now using a separate producer-consumer handshaking signal pair to communicate between cs_combine_56_re and the CPU.  (Previously the CPU got interrupted whenever the low word of the PPS rise-time register changed.)  Also, the data width is now 56 bits instead of 48.  Making corresponding firmware changes as well.  The new time-capture performance should be 700 Mcps (million counts per second) as opposed to the 10 Mcps we were getting previously, for a 70x improvement.

Oops, the speed of my logic-locked components wasn't preserved!  Perhaps because I forgot to turn on "Compilation Process Settings --> Incremental Compilation --> Compatible Placement and Routing".  Blah.

OK, let's try it again.  Remove the high-speed thingy from Logic Lock, delete all the low-speed stuff, recompile, put it back in logic-lock, recompile, turn on compatible placement/routing, add low-speed stuff back in, recompile, done (hopefully).  If that doesn't work, then we might have to look into creating design partitions.

Maybe I'll go ahead and try the partitions thing.  You just do Alt-D to bring up the Design Partitions window.  Then you drag the module into <<new>> and it creates a new partition for it under the top partition.  Under "Compilation," we set Netlist type to Post-Fit and set the Fitter preservation level to "Placement, routing, and high-speed tiles" for maximum preservation.

I'm not yet sure whether it's also necessary to move all the other components into a different partition.  Let me do that anyway, just in case.

Aha, I think I found a key.  Set the other partitions' Netlist type to "Empty", which puts placeholders in place of them.  This beats the hell out of all the manual crap we were doing before.

Getting closer!  338-367 MHz.  Let's see if we can get the slow corner up.  Put the IOV module into its own empty partition, and try 4.0 placement effort multiplier.

YES!!!  362.58 MHz (hot) to 393.39 MHz (cold).

Next, we'll put the high-speed module into the root Logic Lock region (is this even necessary?), and recompile with just that change - based on past experience, this should give the same result.  It did.

Then, we'll tell the Partitions thingy to please preserve the post-fitting netlist with all placement and routes for the high-speed module and change the other modules to non-empty.

Success!  Fmax was exactly preserved after adding all the other components back in.  The timing analyzer is complaining in red that setup time constraints are violated for signal paths crossing between timing domains, but that's not a "real" error, since this is accounted for in the design.  There's probably a way to tell it to ignore those paths, but no biggie.

Next up, I guess, is testing.  I doubt everything will work perfectly (I made a lot of changes with no testing yet), but I suppose it's worth a quick try.

Ah, the counter LEDs flash after "HOST START," but there is some bug with the way I am fiddling with control bits.  I think I've fixed that now.  Let's try the new firmware within the IDE.

There's still a problem somewhere - I suspect with the interrupt setup.  Lights flash and main loop responds, but no PPSCNTR messages after the initial one.

Hm, checked over the interrupt-related code and it all looks good.  So maybe the problem's not there after all.  Too bad.

Oh well, there might be a bug somewhere in one of my new gelware modules.  We'll do some lower-level debugging another day.

Wednesday, March 7, 2012

Wed., Mar. 7th

Continue fiddling with CTU time resolution.  

I didn't yet try 56 bits WITH the enable fanout pipeline.  364-390 MHz.  Finally, a good bit better than originally.

Let's try setting Fitter -> Optimize Multi-Corner Timing.  347-379 MHz.  This actually did worse!  Possibly it is doing better at the fast corner.  Anyway, we probably want to optimize the worst case instead (especially since the timing analyzer isn't even calculating Fmax for the fast corner).

Let's try Analysis & Synthesis -> Perform WYSIWYG Primitive Resynthesis.  Still 364-390.  No improvement; so we'll turn that option back off.

Turn on Fitter -> More Settings -> Optimize Timing for ECOs.  Also in more fitter settings, turn placement & routing effort multiplier up from 1.0 -> 2.0.  Also set Router Timing Optimization Level to Maximum.  

249-272!!  Quite a lot worse!  What did I do wrong?

Let's back off the routing stuff (routing effort multiplier back to 1.0, Router Timing Optimization Level back to Normal).  340-371 MHz.  Weird.  Still not quite back to where we were.  

Let's turn the placement effort multiplier back to 1.0.  Back to 364-390.  Even though "Optimize Timing for ECOs" didn't help, I'll leave it on since it sounds good.

What if we turn the placement effort multiplier up to 4.0?  373-406!  Finally, better!

Now, let's try the router effort multiplier at 4.0.  360-392.  Worse!  What about 8.0?  Tried 2.0 again, 363-395.  Back to 1.0.

Router Timing Optimization up to Maximum - 377-402.  The hot speed is a little faster but the cold speed is a little slower.  Still, 375 MHz --> 750 Msps, which is a 1+1/3 ns time resolution, or +/- 0.667 ns.  Not too shabby.

Let's now try a single-edge-triggered version.  Borrowing those files from the FEDM project.  

Wow, looks like Aarmondas didn't ever actually implement the sync_error output from the timing-sync datapath!  Commenting it out, and cs_bits.  Calling the new module se_rise_cap_56.vhd.

OK, the speed here is:
  • Hot (85C) - 876 MHz!! but restricted to 617 MHz due to high minimum pulse width
  • Cold (0C) - 951 MHz!! but restricted to 609 MHz due to high minimum pulse width
Given the restriction, neither is as fast in terms of samples per second as the dual-edge triggered version.  But to be fair, that version didn't have the reset/enable function.  Let's add it.  Actually, I take that back; Darryl never finished adding the reset functionality to the pulse-cap module!  Really, I should add it to both the single- and dual- edge-triggered versions of rise_cap_56.vhd.  

OK, now the top-level file for the single-edge-triggered version is:
  • COSMICi_DE3_GPSapp_top_se.bdf
    • Hot:  799.36 MHz restricted to 612.75 MHz
    • Cold: 860.59 MHz restricted to 606.8 MHz
And for the pseudo-dual-edge-triggered version is:
  • COSMICi_DE3_GPSapp_top_pde.bdf
    • Hot:   343.64 MHz (687.28 Msps)
    • Cold: 376.22 MHz (752.44 Msps)
OK, we went down quite a bit compared to our earlier 377-402 result, probably due to adding the resets.  But supposing we could run at 350 MHz, that's 700 Msps, still better than the single-edge-triggered version.

Trying changing Router Timing Optimization back to Normal.  Worse, 337-366.  Turn it back to Max.

Tried Placement Effort Multiplier at 8.0.  Much worse.

Trying Placement Effort Multiplier at 1.0.  Better!  (358-389 MHz).  At least it's a solid 700 Msps, or 1.43 ns time resolution, or about +/- 0.7 ns time error.  Possibly that's the best we're going to get with the reset & enable fully in there.

Let's modify the PLL clock speed target to 350 MHz (35x multiplier from the 10 MHz OCXO clock), since we definitely seem to be able to achieve that at least.  Next we'll need to do some testing.

One more thing to try first:  Take the enable-fanout pipeline out.  (You never know what permutations might speed things up.)  349-381 MHz.  No, that's worse; let's put it back in.

Just for fun, tried a placement effort of 2.0 again but it did worse.  Going back to 1.0.

OK, now changing the PLL clock speed target to 350 MHz.  We should be able to hit this target.

Now we've got 356-388 MHz.  Slightly worse than when we were shooting for 500 MHz, but still OK. At least we can meet the 350 target.

OK, now putting high_speed_logic_pde.bdf into the root LogicLock region.  Soon, we'll see if its speed keeps stable as we add more logic.

Compiling again with it in LogicLock just to make sure that didn't change anything.  Nope.

Now, what if I turn on "Compilation Process Settings -> Smart compilation" and "Comp. Proc. Settings -> Incremental Compilation -> Compatible placement and routing"?  Seems that might help since the Logic Locked stuff shouldn't be moved around anyway.

Shoot, I forgot to include the reset/enable logic in the dual-edge carry-save counter; need to do that now.  Will have to redo compile from scratch.

Also forgot the stuff to generate the timing-sync output pulse.  Added that back in too.

Currently we're AND'ing bits 6-17 of the "sum" outputs of the dual-edge-triggered carry-save counter; at 700 Mcps (million counts per second) this should be high for ~91.43 ns every ~374.5 us.  This will work even though the carry bits haven't yet been added in.  (As soon as the carries for the rollover start propagating in, the output of the AND will fall.)
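The pulse width and period can be sanity-checked by brute force.  This sketch idealizes the counter as an ordinary 18-bit binary count (it ignores the carry-save representation entirely) and just counts how many of the 2^18 states have bits 6-17 all set:

```python
COUNT_RATE_HZ = 700e6                            # 350 MHz PLL, dual-edge
MASK = sum(1 << bit for bit in range(6, 18))     # bits 6..17 inclusive

# Number of counter states per rollover in which the AND output is high.
high_counts = sum(1 for n in range(1 << 18) if (n & MASK) == MASK)

width_ns = high_counts / COUNT_RATE_HZ * 1e9     # how long the pulse stays high
period_us = (1 << 18) / COUNT_RATE_HZ * 1e6      # time between pulses

print(f"high for {high_counts} counts = {width_ns:.2f} ns, every {period_us:.2f} us")
```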

Let's see what we get for speed now.  Met the target.  (355-382).

Oops, I added the Nios core in - need to take it back out & recompile.  Put it in my cut buffer temporarily.

Worse?  (327-351)  I swear, Quartus makes no frickin' sense sometimes...

OK, backed out from that change so we meet the target again.

Now putting the new version of high_speed_logic_pde back into the root LogicLock region.  No change in speed.  (355.49 MHz - 381.53 MHz).  Good.

I think that's a good stopping point for today.  Tomorrow I can finish wiring the Nios system back up to the high-speed module and test.  Oh, we also still need to add some version of cs_combine_56 (or whatever it was called) to add the sum and carry bits together; this is probably preferable to creating two more PLLs and doing it in software.
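To make the sum/carry combining step concrete, here is a toy software model of a carry-save counter.  It is a generic half-adder-per-bit construction, assumed for illustration only; it is not a transcription of the actual VHDL modules, whose internals aren't shown here.  The point is that the true count is always sum + carry, which is the addition the combine module would have to perform.

```python
class CarrySaveCounter:
    """Toy model: each tick adds 1 using only per-bit half adders."""

    def __init__(self, width=56):
        self.mask = (1 << width) - 1
        self.sum = 0     # "sum" register bits
        self.carry = 0   # saved carries, already shifted to their bit weight

    def tick(self):
        carry_in = self.carry | 1                  # saved carries plus the +1 at bit 0
        new_carry = ((self.sum & carry_in) << 1) & self.mask
        self.sum = (self.sum ^ carry_in) & self.mask
        self.carry = new_carry

    def combined(self):
        """True count: add the sum and carry vectors (the combine step)."""
        return (self.sum + self.carry) & self.mask


counter = CarrySaveCounter(width=8)
for _ in range(5):
    counter.tick()
print(counter.sum, counter.carry, counter.combined())
```

No carry ever has to ripple more than one bit position per tick, which is what lets this style of counter run fast; the price is the extra addition when a value is read out.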

Tuesday, March 6, 2012

Tue., Mar. 6th

I am thinking today that it might be a good idea to work on speeding up the edge-capture datapath in the GPS app gelware.  Currently, I am just driving it directly from the 10 MHz OCXO clock.  But I should be able to use a 10x PLL to run it at 100 MHz, and maybe even a 50x PLL to run it at 500 MHz, and if I can get a dual-edge-triggered version of the high speed logic to run at that clock rate, that would give us true 1-ns resolution for the arrival times of the PPS edges from the GPS.  That takes away one source of uncertainty (the low 100-ns resolution of our previous measurements) when characterizing the precise relationship between the OCXO and GPS timing references - it will make that particular journal paper a lot better (if we ever finish writing that one).

This can also be a good point of comparison for the LogicLock stuff the students are doing in the FEDM design.

I think I want to start by creating a new top-level file:  A schematic version of the current top-level file in Verilog, DE3_GPSapp.v.  This will allow us to drag-and-drop components into a LogicLock region.  I also just prefer a visual design for my top-level files.

Let's start by recreating all of the I/O ports, namely:

output                     CLK_OUT;
input                     EXT_CLK;
input                     OSC1_50;
input                     OSC2_50;
input                     OSC_BA;
input                     OSC_BB;
input                     OSC_BC;
input                     OSC_BD;

////////// LED //////////
output     [7:0]           LEDB;
output     [7:0]           LEDG;
output     [7:0]           LEDR;

////////// SEG7 //////////
output     [6:0]           HEX0;
output                     HEX0_DP;
output     [6:0]           HEX1;
output                     HEX1_DP;

////////// BUTTON //////////
input     [3:0]           Button;

////////// SW (SLIDE SWITCH) //////////
input     [3:0]           SW;

////////// DIP_SW (DIP SWITCH) //////////
input     [7:0]           DIP_SW;

////////// MAX1619 (TEMPERATURE SENSOR) //////////
output                     TEMP_CLK;
inout                     TEMP_DATA;
input                     TEMP_INTn;

////////// GPIO0 (J13, GPIO 0) //////////
inout                     GPIO0_CLKINn0;
inout                     GPIO0_CLKINp0;
inout                     GPIO0_CLKOUTn0;
inout                     GPIO0_CLKOUTp0;
inout     [31:0]           GPIO0_D;

////////// GPIO1 (J14, GPIO 1) //////////
inout                     GPIO1_CLKINn1;
inout                     GPIO1_CLKINp1;
inout                     GPIO1_CLKOUTn1;
inout                     GPIO1_CLKOUTp1;
inout     [31:0]           GPIO1_D;
//inout [31:23] GPIO1_D;
//input [22] GPIO1_D;
//inout [21:0] GPIO1_D;


////////// REGULATOR //////////
output                     JVC_CLK;
output                     JVC_CS;
input                     JVC_DATAIN;
output                     JVC_DATAOUT;

Now adding the structural contents.  OK, I finished putting in that schematic.  Fixed one typo and it compiles.  Here is a bird's eye view of that schematic (hard to see w/o zooming in):

Initial version (v0.0) of the schematic version of the CTU gelware.
I really should test it, but we can always put that off until we're actually trying to diagnose problems in later versions of the design.  For now, I will just archive this version of the file as _top_0v0 and proceed with my modifications.

The next thing we want to do is create the PLL slaved to the OCXO clock.  Let's jump in and try a 50x multiplier.  OK, created that PLL (wizard says feasible) and having it drive CLK_OUT as a test.

P.S. - All this work today is happening in my local working directory C:\f\DE3\S3\SB+SOPC\GPS_FPGA_app\Quartus_II_Project\DE3_GPSapp\ on COSMICi.  I should make a backup on Dropbox soon.

OK, it compiles.  Need to get the timing analysis settled though.  Opened TimeQuest.  Created a new SDC file, COSMICi_DE3_GPSapp_RevC.sdc.  Looks like it did the right thing, based on the PLL parameters.   Added a "derive_clock_uncertainty" command, so we don't get warnings about the clock uncertainty not being set.  Let's redo the compilation & analysis with the new SDC file.  Oops, looks like the SDC file generator assumed that the period of the board clock was 100 ns instead of the actual 20 ns - fixed that.  Oops, that happened b/c I was feeding the wrong clock to the PLL!  OK, after some fiddling around with the SDC file, both input clocks are correctly entered now.

For some reason, we are failing the timing analysis with a -0.877 ns hold time violation on OSC2_50 (the 50 MHz board clock).  Maybe I should take out all my logic and try again.  :)

OK, with all the core logic removed, we meet the timing constraints.  When I go through adding the logic back in, I will have to be careful about that.

The Fmax at 85C is 820.34 MHz for the 50 MHz clock.  How do I get it to report it for the PLL clock?
Ah, it's not b/c no registers are being generated for the PLL clock at the moment.

OK, next I am going to create a module called "high_speed_logic.bdf" that will contain all the high-speed parts of the design, to facilitate logic-locking.

OK, now that module has just the PLL and a 64-bit dual-edged carry-save counter.  The output of that module feeds to a couple of 64-input AND gates for dummy output purposes.

OK, now Fmax for the PLL is 503.52 MHz at 85C and 545.85 MHz at 0C.

What happens if we turn on aggressive optimizations?

* Fitter - Go from Auto Fit to Standard Fit.
* Compilation Process/Physical Synthesis Optimizations - Turn on Perform Physical Synthesis for Combinational Logic, Perform Register Retiming, Extra Effort level, Perform Register Duplication.
* Analysis & Synthesis - Optimization for Speed, Timing-Driven Synthesis.

OK, after all that, the 85C speed is 548.25 MHz and the 0C speed is 590.32 MHz (would have been 603.14 MHz without the minimum pulse-width constraint).

In other words, this is reaching almost 1.2 GHz update rate for the counter.

However, we still need to add the input-capture logic.  Let's get the original version of pulse_cap.vhd from Q:\ and trim out the stuff to capture the falling edge.

pulse_cap needs pde_dff2, pde_dff_en, pde_reg_en, and pde_shift_reg.  Copy those too.

We might also need the stuff to pipeline the fanout of the enable signal, for this refer to Q:\se_pulse_cap_tsedge_56.vhd.

OK, now I've added pulse_cap to the high-speed logic module.  Haven't pipelined the enable signal yet; also haven't taken out the falling-edge capture logic.  Let's try it though.

Oops, now we're down to 356-390 MHz.  Let's try pipelining that enable signal.

While I'm at it, I also removed the extra state & logic to capture the fall time, and renamed the module to pde_rise_cap_64.vhd.  (Pseudo-dual-edge-triggered rising-edge time-capture module, 64-bits wide.)

326-357 MHz.  We got worse, not better!  Ouch!  However, it's worth noting that the update rate for 350 MHz is 700 Msps, since this design is dual-edge triggered.  Anyway...  I guess I should back out from the pipelined enable, huh...

What the hell, let's stick this puppy in a logic-lock region, just to see if it makes any diff.  Nope, it didn't make any difference at all - guess I already had enough optimization settings turned on.

OK, taking out the pipelined enable now.  Some more things to try:  (1) Go back to single-edge triggered.  (2) Cut down from 64 bits width to 56 bits width.  Not sure why the latter would help tho.

Now we're up to 338-371 MHz, a bit better, but still less than before I removed the falling-edge capture logic?  This makes no sense at all.

Let's first try going to 56 bits, since that's a pretty easy change.

Now we got 357-393.  Only very slightly better than before I started "simplifying" the logic!

I think I'm going to have to revert to a single-edge-triggered design...  On the other hand, 700 Msps isn't too shabby...  It's a heckuva lot better than the 20 Msps (50 ns) we are getting currently.

Actually no, we're only getting 10 Msps (100 ns resolution) because we're not even using the pseudo-dual-edge registers in the current CTU design.  So it would be an improvement by a factor of 70.


Monday, March 5, 2012

Mon., Mar. 5th

Plan for today:  Test & debug my new GPS initialization code.

Some major tasks I could work on this week:
  • [ ] Plan/specify interfaces to some new code modules for the students to work on.
  • [ ] Start design of new OCXO board.
OK, let's set up the test.
  1. Archive log files.
  2. Start server.
  3. Power up CTU's WiFi/DE3 (in that order, quickly; GPS already powered on).
DE3's UART output (as wirelessly relayed to server) is:

HOST_STARTING,CTU_GPS,1.9
HOST_READY
$ACK,WIFI_READY*60
$GPRMC,184801.999,A,3025.668,N,08417.098,W,2.4,353.1,050312,4.0,W*61
$GPGGA,184801.999,3025.66805,N,08417.09799,W,1,05,1.4,080.70,M,-29.7,M,,*50
$PDMETRAIM,2,0,0.000000000,0,0,0,0,0,0,0,0,0,0,0,0,0*43
$PDMEPOSHOLD,0,0000.000,N,00000.000,E,000.00*4A
...

So the GPS already has satellites (I left it turned on and plugged into the USB over the weekend), so there's no real initialization to be done right now.  However, I should probably still do a test a little later on to exercise the initialization code for real.  I can try power-cycling the GPS on startup; I can try manually sending commands for warm and cold restarts before startup so it starts with less info.
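Incidentally, the two hex digits after the "*" in these messages are the standard NMEA 0183 checksum: the XOR of every character between the "$" and the "*".  A short sketch for verifying it on the server side (function names are mine, not from the existing code):

```python
from functools import reduce


def nmea_checksum(body):
    """XOR of all characters in the sentence body, as two uppercase hex digits."""
    return "%02X" % reduce(lambda acc, ch: acc ^ ord(ch), body, 0)


def checksum_ok(sentence):
    """True if the checksum after '*' matches the body between '$' and '*'."""
    body, _, given = sentence.strip().lstrip("$").partition("*")
    return given.upper() == nmea_checksum(body)


print(nmea_checksum("ACK,WIFI_READY"), checksum_ok("$ACK,WIFI_READY*60"))
```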

On console we see:

|------------------------------------------------------------|
|  Node 0 log started.                                       |
|VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV|
Node 0 reports its MAC address is 00:1E:3D:33:ED:CF.
Node 0 turned on at Mon Mar 05 13:47:55 2012 + 552 ms.
Starting AUXIO server for node 0 on port 52737...
Starting UART server for node 0 on port 63766...
Node 0 reports that its bridging mode has changed to NONE.
Node 0's bridge mode is now NONE.
Node #0's host (type CTU_GPS, firmware version 1.9) is starting up...
Node #0's host is ready to accept commands.
Node 0 reports that its bridging mode has changed to TREFOIL.
Node 0's bridge mode is now TREFOIL.
 WARNING: GPS_Module._GPRMC_Record.extract_datetime(): Reported time was offset -0.001 from the exact start of a second.  Rounding off and ignoring...
 WARNING: GPS_Module._GPRMC_Record.extract_datetime(): Reported time was offset -0.001 from the exact start of a second.  Rounding off and ignoring...
The GPS module is receiving signals from 5 satellites.
 WARNING: GPS_Module._GPRMC_Record.extract_datetime(): Reported time was offset -0.001 from the exact start of a second.  Rounding off and ignoring...
...

So this is fine given the data.  However, I might want to consider changing the warnings to info messages, since otherwise, they might start to get kind of annoying after a while.
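
For reference, the rounding behavior behind that message can be sketched as follows.  This is not the actual GPS_Module code; it is a standalone illustration that assumes the GPRMC time field is hhmmss.sss and the date field is ddmmyy (with a 20xx year), and it logs the offset at info level as suggested above.

```python
import logging
from datetime import datetime, timedelta

logger = logging.getLogger("gps_sketch")   # hypothetical logger name


def extract_datetime(time_field, date_field):
    """Return (datetime rounded to the nearest whole second, offset in seconds)."""
    hours, minutes = int(time_field[0:2]), int(time_field[2:4])
    seconds = float(time_field[4:])
    day, month = int(date_field[0:2]), int(date_field[2:4])
    year = 2000 + int(date_field[4:6])       # assumption: two-digit years are 20xx

    nearest = round(seconds)                  # nearest whole second (may be 60)
    offset = round(seconds - nearest, 3)      # keep millisecond precision only
    if offset != 0:
        logger.info("Reported time was offset %+.3f from the start of a second; rounding.", offset)

    # Adding a timedelta handles rollover into the next minute cleanly.
    stamp = datetime(year, month, day, hours, minutes) + timedelta(seconds=nearest)
    return stamp, offset


stamp, offset = extract_datetime("184801.999", "050312")
print(stamp, offset)
```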

Next step:  Engage POSHOLD and TRAIM codes - those methods haven't been implemented yet.

OK, wrote that code; let's test it.

Note: We should send "HOST UNMUTE" after getting HOST_READY to make sure that the host will go ahead and start relaying GPS data.  (It might not if it didn't catch the WIFI_READY.)

Would it be possible to have the Wi-Fi module respond to PINGs from the host on its UART port with WIFI_NOT_READY or some such, if such messages are received before the server connection is established?

Realized there's an issue where the PDME commands to enable POSHOLD and TRAIM actually return two replies, one that's just "OK" and another that echoes their arguments.  Really we should check the extended one to make sure the arguments match, but we're not doing that yet.

Next, we really need to do something with the TRAIM alarm codes.  OK, now I'm examining them and displaying various warnings if things are awry.

For some reason, today the CTU's Wi-Fi module seems slow to acquire a connection.  It's almost like the router's acting up.  Should I maybe try a different router?

OK, we're now getting various warnings, info messages, & normal reports from the GPS_Manager._checkTRAIM() fanboy (subscription delivery address callback) method.

Adding some attributes to GPS_Module based on contents of TRAIM line.

I need to figure out whether the VALID bit reported by TRAIM refers to whether the TRAIM algorithm is enabled or whether the solution is valid.

One major outstanding issue:  What exactly is the TRAIM algorithm doing, and how accurate are its timing solutions?  How much are things being screwed up by our antenna being in the window?  Even if that weren't an issue, don't we need to validate these absolute times by comparison with another timing reference?  We should address this someday if we are really going to compare our data with that collected by other sites.

Tomorrow:  Work on code to automatically start up the FEDM once we're confident that we're getting a reasonable quality timing signal from GPS.

Sunday, March 4, 2012

Sun., Mar. 4th

Decided to spend some time today getting the latest server code changes to compile, and exercise them a little bit by mocking up some of the expected CTU behavior through UwTerminal.  For reference, here is the transcript from the start of the most recent run:


----------------------------------------------------------------------
At Fri Mar 02 16:25:49 2012 + 432 ms opened node0.uart.trnscr transcript...


Fri Mar 02 16:25:54 2012 + 321 ms: < 
Fri Mar 02 16:25:54 2012 + 323 ms: < HOST_STARTING,CTU_GPS,1.9
Fri Mar 02 16:25:54 2012 + 332 ms: < HOST_READY
Fri Mar 02 16:25:56 2012 +  62 ms: < $ACK,WIFI_READY*60
Fri Mar 02 16:25:56 2012 + 369 ms: < $GPRMC,212556.000,A,3025.675,N,08417.096,W,0.2,0.0,020312,4.1,W*6F
Fri Mar 02 16:25:56 2012 + 370 ms: < $GPGGA,212556.000,3025.67532,N,08417.09555,W,1,04,3.3,083.30,M,-29.7,M,,*53
Fri Mar 02 16:25:56 2012 + 372 ms: < $PDMETRAIM,2,0,0.000000000,0,0,0,0,0,0,0,0,0,0,0,0,0*43
Fri Mar 02 16:25:56 2012 + 374 ms: < $PDMEPOSHOLD,0,0000.000,N,00000.000,E,000.00*4A
Fri Mar 02 16:25:56 2012 + 664 ms: < $ACK,GPS $PDME,1*24
Fri Mar 02 16:25:58 2012 + 111 ms: < $PDMEHEADER1: DeLORME GPS2058_HW_1.0.1
Fri Mar 02 16:25:58 2012 + 115 ms: < $PDMEHEADER2: DeLORME GPS2058_FW_2.0.1
Fri Mar 02 16:25:58 2012 + 155 ms: < $GPTXT,COSMICi Custom_Config_0.0.3
Fri Mar 02 16:25:59 2012 +  28 ms: < $GPGGA,212558.749,3025.67523,N,08417.09543,W,0,00,99.0,083.67,M,-29.7,M,,*67
Fri Mar 02 16:25:59 2012 +  29 ms: < $PDMETRAIM,2,0,0.000000000,0,0,0,0,0,0,0,0,0,0,0,0,0*43
Fri Mar 02 16:25:59 2012 +  30 ms: < $PDMEPOSHOLD,0,0000.000,N,00000.000,E,000.00*4A

OK, code compiles (after fixing one minor syntax error).  Now exercising with the following mock_CTU_output.txt file, going thru fixing runtime errors...

HOST_STARTING,CTU_GPS,1.9
HOST_READY
$ACK,WIFI_STARTING,v0.19*67
$ACK,WIFI_READY*60
$GPRMC,212556.000,A,3025.675,N,08417.096,W,0.2,0.0,020312,4.1,W*6F
$GPGGA,212556.000,3025.67532,N,08417.09555,W,1,04,3.3,083.30,M,-29.7,M,,*53
$PDMETRAIM,2,0,0.000000000,0,0,0,0,0,0,0,0,0,0,0,0,0*43
$PDMEPOSHOLD,0,0000.000,N,00000.000,E,000.00*4A

OK, now I get the following transcript:

----------------------------------------------------------------------
At Sun Mar 04 16:12:00 2012 + 437 ms opened node0.uart.trnscr transcript...

Sun Mar 04 16:12:27 2012 + 406 ms: < HOST_STARTING,CTU_GPS,1.9
Sun Mar 04 16:12:27 2012 + 421 ms: < HOST_READY
Sun Mar 04 16:12:27 2012 + 453 ms: < $ACK,WIFI_STARTING,v0.19*67
Sun Mar 04 16:12:27 2012 + 453 ms: < $ACK,WIFI_READY*60
Sun Mar 04 16:12:27 2012 + 468 ms: < $GPRMC,212556.000,A,3025.675,N,08417.096,W,0.2,0.0,020312,4.1,W*6F
Sun Mar 04 16:12:27 2012 + 484 ms: > HOST GPS $PDME,9,30.428236,-84.285,40,2012,3,4,21,12,27.468
Sun Mar 04 16:12:27 2012 + 484 ms: < $GPGGA,212556.000,3025.67532,N,08417.09555,W,1,04,3.3,083.30,M,-29.7,M,,*53
Sun Mar 04 16:12:27 2012 + 500 ms: < $PDMETRAIM,2,0,0.000000000,0,0,0,0,0,0,0,0,0,0,0,0,0*43
Sun Mar 04 16:12:27 2012 + 500 ms: < $PDMEPOSHOLD,0,0000.000,N,00000.000,E,000.00*4A

and the following in the console:

|------------------------------------------------------------|
|  Node 0 log started.                                      |
|VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV|
Node 0 reports its MAC address is 00:1E:3D:33:FE:0D.
Node 0 turned on at Sun Mar 04 16:12:00 2012 + 359 ms.
Starting AUXIO server for node 0 on port 52737...
Starting UART server for node 0 on port 63766...
Node 0 reports that its bridging mode has changed to NONE.
Node 0's bridge mode is now NONE.
Node 0 reports that its bridging mode has changed to TREFOIL.
Node 0's bridge mode is now TREFOIL.
Node #0's host (type CTU_GPS, firmware version 1.9) is starting up...
Node #0's host is ready to accept commands.
 WARNING: GPS_Manager._checkTime(): GPS time is more than 10 seconds behind system time.

This is correct as far as it goes, since the time in the file is indeed out of date, so it is appropriate that the server responds with a $PDME,9 command to attempt to correct the time.
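
The check-and-correct step can be sketched like this.  It is an illustration, not the server's real code: the function names are invented, and the meaning of the $PDME,9 arguments (latitude, longitude, an altitude-like value, then UTC year, month, day, hour, minute, seconds) is only inferred from the command that appears in the transcript above.

```python
from datetime import datetime, timedelta

MAX_LAG = timedelta(seconds=10)   # threshold named in the console warning


def gps_time_is_stale(gps_utc, system_utc, max_lag=MAX_LAG):
    """True if the GPS-reported time trails the system clock by more than max_lag."""
    return system_utc - gps_utc > max_lag


def build_time_fix_command(lat_deg, lon_deg, alt, utc):
    """Format a 'HOST GPS $PDME,9,...' line in the shape seen in the transcript."""
    millis = utc.microsecond // 1000          # truncate to whole milliseconds
    return (
        f"HOST GPS $PDME,9,{lat_deg},{lon_deg},{alt},"
        f"{utc.year},{utc.month},{utc.day},{utc.hour},{utc.minute},"
        f"{utc.second}.{millis:03d}"
    )


gps_time = datetime(2012, 3, 2, 21, 25, 56)             # from the mock GPRMC line
system_time = datetime(2012, 3, 4, 21, 12, 27, 468000)  # UTC time of the test run
if gps_time_is_stale(gps_time, system_time):
    print(build_time_fix_command(30.428236, -84.285, 40, system_time))
```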

OK, now the test input file includes a fake $PDME,9,OK acknowledgement:

HOST_STARTING,CTU_GPS,1.9
HOST_READY
$ACK,WIFI_STARTING,v0.19*67
$ACK,WIFI_READY*60
$GPRMC,212556.000,A,3025.675,N,08417.096,W,0.2,0.0,020312,4.1,W*6F
$GPGGA,212556.000,3025.67532,N,08417.09555,W,1,04,3.3,083.30,M,-29.7,M,,*53
$PDMETRAIM,2,0,0.000000000,0,0,0,0,0,0,0,0,0,0,0,0,0*43
$PDMEPOSHOLD,0,0000.000,N,00000.000,E,000.00*4A
$PDME,9,OK

and produces this UART transcript:

----------------------------------------------------------------------
At Sun Mar 04 16:32:53 2012 + 890 ms opened node0.uart.trnscr transcript...

Sun Mar 04 16:33:07 2012 + 156 ms: < HOST_STARTING,CTU_GPS,1.9
Sun Mar 04 16:33:07 2012 + 171 ms: < HOST_READY
Sun Mar 04 16:33:07 2012 + 187 ms: < $ACK,WIFI_STARTING,v0.19*67
Sun Mar 04 16:33:07 2012 + 203 ms: < $ACK,WIFI_READY*60
Sun Mar 04 16:33:07 2012 + 203 ms: < $GPRMC,212556.000,A,3025.675,N,08417.096,W,0.2,0.0,020312,4.1,W*6F
Sun Mar 04 16:33:07 2012 + 203 ms: > HOST GPS $PDME,9,30.428236,-84.285,40,2012,3,4,21,33,7.203
Sun Mar 04 16:33:07 2012 + 218 ms: < $GPGGA,212556.000,3025.67532,N,08417.09555,W,1,04,3.3,083.30,M,-29.7...
Sun Mar 04 16:33:07 2012 + 218 ms: < $PDMETRAIM,2,0,0.000000000,0,0,0,0,0,0,0,0,0,0,0,0,0*43
Sun Mar 04 16:33:07 2012 + 218 ms: < $PDMEPOSHOLD,0,0000.000,N,00000.000,E,000.00*4A
Sun Mar 04 16:33:07 2012 + 218 ms: < $PDME,9,OK

and this console output:

|------------------------------------------------------------|
|  Node 0 log started.                                       |
|VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV|
Node 0 reports its MAC address is 00:1E:3D:33:FE:0D.
Node 0 turned on at Sun Mar 04 16:32:53 2012 + 796 ms.
Starting AUXIO server for node 0 on port 52737...
Starting UART server for node 0 on port 63766...
Node 0 reports that its bridging mode has changed to NONE.
Node 0's bridge mode is now NONE.
Node 0 reports that its bridging mode has changed to TREFOIL.
Node 0's bridge mode is now TREFOIL.
Node #0's host (type CTU_GPS, firmware version 1.9) is starting up...
Node #0's host is ready to accept commands.
 WARNING: GPS_Manager._checkTime(): GPS time is more than 10 seconds behind system time.
The GPS module is receiving signals from 4 satellites.
Heartbeat #1 received from node 0 at Sun Mar 04 16:34:07 2012 + 296 ms.

So now it is also correctly parsing the number of satellites out of the GPGGA message.
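
As a standalone illustration of that parsing step (not the server's actual implementation), the sketch below checks the standard NMEA checksum, which is the XOR of every character between "$" and "*", and then reads the satellites-in-use count from field 7 of a GPGGA sentence.  The function names are hypothetical.

```python
from functools import reduce


def nmea_checksum(payload):
    """XOR of all character codes in the text between '$' and '*'."""
    return reduce(lambda acc, ch: acc ^ ord(ch), payload, 0)


def split_sentence(line):
    """Strip '$', verify the '*XX' checksum if present, and return the comma-separated fields."""
    body, star, given = line.strip().lstrip("$").partition("*")
    if star and int(given, 16) != nmea_checksum(body):
        raise ValueError(f"bad NMEA checksum in: {line!r}")
    return body.split(",")


def satellites_in_use(line):
    """Number of satellites used in the fix, from a GPGGA sentence."""
    fields = split_sentence(line)
    if fields[0] != "GPGGA":
        raise ValueError("not a GPGGA sentence")
    return int(fields[7])


gga = "$GPGGA,212556.000,3025.67532,N,08417.09555,W,1,04,3.3,083.30,M,-29.7,M,,*53"
print(satellites_in_use(gga))
```

The same split_sentence() helper accepts lines without a checksum (such as HOST_READY), in which case it skips verification.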

Further testing will be difficult without the real GPS module in place, since the initialization algorithm's behavior depends on the timing of responses from the GPS module, which is difficult to emulate by manually streaming text to the Wi-Fi board using UwTerminal.  So we'll wait to do further testing until we're in the lab tomorrow, when we can test with the real GPS module.
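
One possible way to approximate the module's timing in a future mock would be to replay a recorded transcript using its own timestamps.  The sketch below only covers the first half of that idea: it parses transcript lines in the format shown above and computes the delay before each incoming ("<") line.  The function names are invented, nothing here talks to the hardware, and strptime's day and month names assume an English/C locale.

```python
import re
from datetime import datetime, timedelta

# Matches e.g. "Fri Mar 02 16:25:56 2012 +  62 ms: < $ACK,WIFI_READY*60"
LINE_RE = re.compile(
    r"^(\w{3} \w{3} \d{2} \d{2}:\d{2}:\d{2} \d{4}) \+\s*(\d+) ms: ([<>]) ?(.*)$"
)


def parse_line(line):
    """Return (timestamp, direction, text) for a transcript line, or None if it doesn't match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    stamp = datetime.strptime(match.group(1), "%a %b %d %H:%M:%S %Y")
    stamp += timedelta(milliseconds=int(match.group(2)))
    return stamp, match.group(3), match.group(4)


def replay_schedule(lines):
    """List of (delay_in_seconds_since_previous_line, text) for lines received from the node."""
    schedule = []
    previous = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None or parsed[1] != "<":
            continue                      # skip headers, blanks, and outgoing commands
        stamp, _, text = parsed
        delay = 0.0 if previous is None else (stamp - previous).total_seconds()
        schedule.append((delay, text))
        previous = stamp
    return schedule
```

A replay tool could then sleep for each delay before writing the corresponding text to the Wi-Fi board's UART.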