Thursday, February 9, 2012

Thu., Feb. 9th

Stopped by lab briefly just to get oriented.

Planning to meet Ray and David (and Darryl, if he can make it) tomorrow at my office to have a bag-lunch meeting to go over the paper.

I might have time right now to fix the UNK_CMD errors that the CTU firmware currently gives in response to the WIFI_STARTING and WIFI_READY messages.

A slightly longer-term goal is to fix both the CTU and the FEDM firmware so that they both wait for WIFI_READY before they initiate their normal operation sequence.  In addition, the CTU firmware should wait for notification that the FEDM is ready before it begins to send timing-sync pulses.  See my blog post of Jan. 30th for more discussion of the new startup sequence.

OK, added code to the CTU firmware to process WIFI_STARTING and WIFI_READY.  Also, the initial delay on power-up is gone; instead, we start with the GPS pass-thru muted, and we'll unmute it only in response to WIFI_READY.  Also, we no longer start the OCXO counter on power-up; we'll wait to do that until we receive a "START" command.  Of course, the server code to send that command (and the Wi-Fi script code to pass it thru to us, and the firmware code to handle it) still needs to be written.
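To keep the new startup order straight, here is a minimal sketch of it as a state machine in C. The state names, the `ctu_t` struct, `cmd_is()`, and `ctu_handle_command()` are all hypothetical, not the actual CTU firmware. It assumes command lines arrive with the `$`/checksum framing already removed, and it treats a START that shows up before WIFI_READY as a harmless no-op.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical startup states; the names are illustrative only. */
typedef enum { WAIT_WIFI_READY, WAIT_START, RUNNING } startup_state_t;

typedef struct {
    startup_state_t state;
    bool gps_muted;      /* GPS pass-thru is muted from power-up */
    bool ocxo_counting;  /* OCXO counter is NOT started at power-up */
} ctu_t;

/* True if cmd is exactly `name`, or `name` followed by comma-separated args
 * (e.g. "WIFI_STARTING,v0.18"). */
static bool cmd_is(const char *cmd, const char *name) {
    size_t n = strlen(name);
    return strncmp(cmd, name, n) == 0 && (cmd[n] == '\0' || cmd[n] == ',');
}

static void ctu_init(ctu_t *ctu) {
    ctu->state = WAIT_WIFI_READY;
    ctu->gps_muted = true;
    ctu->ocxo_counting = false;
}

/* Handle one command line.  Returns true if the command is recognized
 * (the caller would reply $ACK), false if it should get $ERR,UNK_CMD. */
static bool ctu_handle_command(ctu_t *ctu, const char *cmd) {
    if (cmd_is(cmd, "WIFI_STARTING")) {
        return true;                     /* just acknowledge; Wi-Fi still booting */
    }
    if (cmd_is(cmd, "WIFI_READY")) {
        ctu->gps_muted = false;          /* unmute the GPS pass-thru only now */
        if (ctu->state == WAIT_WIFI_READY)
            ctu->state = WAIT_START;
        return true;
    }
    if (cmd_is(cmd, "START")) {
        if (ctu->state == WAIT_START) {  /* ignore START before WIFI_READY */
            ctu->ocxo_counting = true;   /* start the OCXO counter */
            ctu->state = RUNNING;
        }
        return true;
    }
    return false;                        /* unknown command */
}
```

Acknowledging an early START without acting on it (rather than returning an error) is just one design choice; it keeps a premature START from producing yet another UNK_CMD-style error in the transcript.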

OK, now the UART bridge data received by the server after CTU start-up is:


----------------------------------------------------------------------
At Thu Feb 09 18:25:00 2012 + 847 ms opened node0.uart.trnscr transcript...


Thu Feb 09 18:25:06 2012 + 470 ms: < $ACK,*65
Thu Feb 09 18:25:06 2012 + 471 ms: < $ERR,UNK_CMD,*00
Thu Feb 09 18:25:07 2012 + 930 ms: < $ACK,
Thu Feb 09 18:25:07 2012 + 931 ms: < *68
Thu Feb 09 18:25:07 2012 + 933 ms: < $ERR,UNK_CMD,*00
Thu Feb 09 18:25:07 2012 + 934 ms: < $ACK,WIFI_READY
Thu Feb 09 18:25:07 2012 + 935 ms: < *6D

What's going on here, I think, is this:
  • The NULL character emitted by the Wi-Fi board on power-up is getting interpreted as an empty command line, which is then acknowledged by the firmware, which then generates an error because an empty command line is an unknown command.  Solution:  Change the firmware to strip all nulls out of the input character stream and ignore them.
  • The Wi-Fi board emits a NEWLINE character (actually CR+NL) just before it sends WIFI_READY, to ensure that WIFI_READY begins a new line.  This CR/NL is getting picked up by the firmware and acknowledged and treated as an unknown command.
  • Finally, the WIFI_READY line is itself terminated by another CR/NL, and the firmware isn't stripping off the CR (I think) before returning the ACK.
So just this small bit of output illustrates three bugs.  These are easily fixed, however.  Do that tomorrow (or whenever I'm next in the lab).
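All three fixes amount to filtering bytes before they reach the command parser. Here is a minimal sketch in C, assuming the firmware assembles command lines one received byte at a time; the `line_asm_t` type, `line_asm_feed()`, and the 128-byte limit are hypothetical. NULs and CRs are dropped, and a newline with nothing before it is ignored rather than treated as an empty command. It assumes every line ends in NL (possibly preceded by CR), which matches what the Wi-Fi board sends.

```c
#include <stdbool.h>
#include <stddef.h>

#define LINE_MAX_LEN 128  /* hypothetical maximum command-line length */

typedef struct {
    char buf[LINE_MAX_LEN + 1];  /* +1 for the terminating NUL */
    size_t len;
} line_asm_t;

static void line_asm_init(line_asm_t *la) {
    la->len = 0;
    la->buf[0] = '\0';
}

/* Feed one received byte.  Returns true when a complete, non-empty command
 * line is ready in la->buf (NUL-terminated, no CR or NL).  The line stays
 * valid only until the next call, which starts overwriting the buffer. */
static bool line_asm_feed(line_asm_t *la, char c) {
    if (c == '\0' || c == '\r')
        return false;            /* drop power-up NULs and the CR of CR+NL */
    if (c == '\n') {
        if (la->len == 0)
            return false;        /* a bare newline is not a command: no ACK/ERR */
        la->buf[la->len] = '\0';
        la->len = 0;             /* next byte starts a new line */
        return true;
    }
    if (la->len < LINE_MAX_LEN)
        la->buf[la->len++] = c;  /* overlong lines are silently truncated */
    return false;
}
```

Fed the power-up byte sequence NUL, CR, NL, "WIFI_READY", CR, NL, this yields exactly one command line, "WIFI_READY", with no trailing CR.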

Another observation:  We don't echo the WIFI_STARTING line; I'm not sure why.  Perhaps it is sent too early, before the serial input is even opened?  Hm, not sure that makes sense.  Tested again; this time I got:

----------------------------------------------------------------------
At Thu Feb 09 18:35:23 2012 + 149 ms opened node0.uart.trnscr transcript...

Thu Feb 09 18:35:28 2012 + 827 ms: < 
Thu Feb 09 18:35:28 2012 + 828 ms: < HOST_STARTING,CTU_GPS,1.9
Thu Feb 09 18:35:28 2012 + 830 ms: < HOST_READY
Thu Feb 09 18:35:28 2012 + 830 ms: < $PDME,21,OK*1B
Thu Feb 09 18:35:28 2012 + 831 ms: < $PDME,22,OK*18
Thu Feb 09 18:35:28 2012 + 831 ms: < $ACK,WIFI_STARTING,v0.18*66
Thu Feb 09 18:35:28 2012 + 832 ms: < $ERR,BAD_CHK,[$GPRMC,212057.015,V,3025.676,N,08417.112,W,0.0,0.0,161211,4.1,W
Thu Feb 09 18:35:28 2012 + 832 ms: < $GPRMC,212058.020,V,3025.676,N,08417.112,W,0.0,0.0,161211,4.1,W*7B]*40
Thu Feb 09 18:35:28 2012 + 833 ms: < $ERR,BAD_CHK,[$GPGGA,212058.020,3025.67587,N,08417.11218,W,0,00,99.0,051.73,M,M,,*6E]*1B
Thu Feb 09 18:35:30 2012 + 289 ms: < $ACK,
Thu Feb 09 18:35:30 2012 + 290 ms: < *68
Thu Feb 09 18:35:30 2012 + 290 ms: < $ERR,UNK_CMD,*00
Thu Feb 09 18:35:30 2012 + 291 ms: < $ACK,WIFI_READY
Thu Feb 09 18:35:30 2012 + 292 ms: < *6D

We can see some of the same errors, plus a couple of others.  The CTU reports it got a couple of lines with bad checksums from the GPS module.  It looks like a GPRMC message was interrupted by another one!  Then, I'm not sure what's wrong in the GPGGA message.  However, at least this time we do see the HOST_STARTING & HOST_READY messages, and acknowledgement of the WIFI_STARTING message.  Let's do one more run for good measure:
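For reference, the BAD_CHK test is presumably the standard NMEA 0183 checksum: the XOR of every character between the leading `$` and the `*`, written as two hex digits. A minimal sketch in C follows; `hex_digit()`, `nmea_xor()`, and `nmea_valid()` are hypothetical names, not the CTU firmware's. The CTU's own replies in these transcripts appear to use the same scheme; for example, `$PDME,21,OK*1B` and `$ACK,*65` both check out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_digit(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* NMEA checksum: XOR of the `len` characters of `body`. */
static unsigned nmea_xor(const char *body, size_t len) {
    unsigned cs = 0;
    for (size_t i = 0; i < len; i++)
        cs ^= (unsigned char)body[i];
    return cs;
}

/* True if `line` looks like $<body>*HH and HH matches the XOR of <body>. */
static bool nmea_valid(const char *line) {
    if (line[0] != '$')
        return false;
    const char *star = strchr(line, '*');
    if (star == NULL)
        return false;                             /* no checksum field at all */
    int hi = hex_digit(star[1]);
    int lo = (hi < 0) ? -1 : hex_digit(star[2]);  /* don't read past a '\0' */
    if (lo < 0)
        return false;
    unsigned want = (unsigned)(hi * 16 + lo);
    return nmea_xor(line + 1, (size_t)(star - line - 1)) == want;
}
```

Incidentally, running the logged `$ACK,WIFI_READY` reply through this check, its `6D` checksum only matches if a trailing CR is included in the body; likewise, the `*68` on the bare `$ACK,` line is the checksum of `ACK,` plus a CR. Both are consistent with the guess that the firmware isn't stripping the CR before sending the ACK.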

The discovery protocol seems to be having a lot of problems.  Perhaps I should increase the frequency with which the server sends the broadcast messages?  Say from 1/sec. for 10 seconds to 10/sec. for 10 seconds?  And the frequency with which the autorun script re-sends the query?  Say from once every 2 seconds to twice a second?  Also, the autorun script COULD send a diagnostic message to the host when it's having trouble connecting, and the host COULD display a corresponding error code on the 7-segment display, which might help us diagnose what's going on when we have problems of this sort.  Not sure that's worth the trouble, though...  Anyway, let's worry about that tomorrow.

Finally got another run to go through, and the results were basically the same as the last one:

----------------------------------------------------------------------
At Thu Feb 09 18:49:30 2012 + 147 ms opened node0.uart.trnscr transcript...

Thu Feb 09 18:49:35 2012 + 900 ms: < 
Thu Feb 09 18:49:35 2012 + 901 ms: < HOST_STARTING,CTU_GPS,1.9
Thu Feb 09 18:49:35 2012 + 902 ms: < HOST_READY
Thu Feb 09 18:49:35 2012 + 903 ms: < $PDME,21,OK*1B
Thu Feb 09 18:49:35 2012 + 903 ms: < $PDME,22,OK*18
Thu Feb 09 18:49:35 2012 + 904 ms: < $ACK,WIFI_STARTING,v0.18*66
Thu Feb 09 18:49:35 2012 + 905 ms: < $ERR,BAD_CHK,[$GPRMC,212104.014,V,3025.676,N,08417.112,W,0.0,0.0,161211,4.1,W
Thu Feb 09 18:49:35 2012 + 905 ms: < $GPRMC,212105.020,V,3025.676,N,08417.112,W,0.0,0.0,161211,4.1,W*72]*3F
Thu Feb 09 18:49:35 2012 + 906 ms: < $ERR,BAD_CHK,[$GPGGA,212105.020,3025.67587,N,08417.11218,W,0,00,99.0,051.73,M,-29*67]*0B
Thu Feb 09 18:49:37 2012 + 345 ms: < $ACK,
Thu Feb 09 18:49:37 2012 + 346 ms: < *68
Thu Feb 09 18:49:37 2012 + 347 ms: < $ERR,UNK_CMD,*00
Thu Feb 09 18:49:37 2012 + 348 ms: < $ACK,WIFI_READY
Thu Feb 09 18:49:37 2012 + 349 ms: < *6D

We'll worry tomorrow about dealing with the various problems illustrated here.

At home this evening, I:

  • Sped up the broadcast protocol roughly along the lines described above.  Now the server broadcasts every 100 ms for 5 s after receiving a query.  The Wi-Fi script checks for replies every 50 ms, 20 times (i.e., over 1 s), before it resends the query.  It resends the query 20 times; if it doesn't receive the server IP within 20 seconds, it gives up...  (It might still work if the correct server IP were hard-coded.)
  • Cleaned up the output to the host.  There are now no carriage returns, and there is a line feed both before and after the WIFI_STARTING, WIFI_READY, and WIFI_EXIT command lines.  I am toying with a new WIFI_ERROR message to inform the host of certain serious error conditions (although I'm not sure what the host is supposed to do about it).
Try these changes in lab tomorrow.  Remember to switch the site back to lab in the script.
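A quick check of the new client-side retry timing, written in C purely to confirm the arithmetic; the actual Wi-Fi script isn't shown here, and `discovery_time_ms()` and its constants are a hypothetical simulation. It assumes a server reply stays readable once it arrives, so the script sees it at the next 50 ms poll. On the server side, broadcasting every 100 ms for 5 s works out to 50 broadcasts per query received.

```c
/* Hypothetical constants mirroring the new Wi-Fi script timing. */
#define QUERY_ATTEMPTS   20   /* (re)send the query 20 times */
#define POLLS_PER_QUERY  20   /* check for a reply 20 times per query... */
#define POLL_INTERVAL_MS 50   /* ...every 50 ms, i.e. 1 s per query */

/* Simulate the discovery loop.  `reply_at_ms` is when the server's reply
 * first becomes readable (ms after the first query), or negative if it
 * never arrives.  Returns the time at which the script would see the
 * reply, or -1 if it gives up. */
static long discovery_time_ms(long reply_at_ms) {
    long now = 0;
    for (int q = 0; q < QUERY_ATTEMPTS; q++) {
        /* (the real script would send the query here) */
        for (int p = 0; p < POLLS_PER_QUERY; p++) {
            now += POLL_INTERVAL_MS;
            if (reply_at_ms >= 0 && now >= reply_at_ms)
                return now;
        }
    }
    return -1;  /* 20 queries x 1 s each with no server IP: give up */
}
```

With 20 queries × 20 polls × 50 ms, the script gives up after 20,000 ms, matching the 20 seconds above; a reply that becomes readable at 120 ms would be noticed at the 150 ms poll.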
