Juan came in this morning, & we think we got USB 2.0 support installed on VirtualBox, but we weren't sure how to test. Emailed Sachin & Vaibhav for guidance. To help with testing, Mike installed wires on the screw terminals of the board for a more reliable power connection (pic below).
In the meantime, while awaiting instructions, we just went ahead and ran a couple of the .vi files. They seemed like they might be working, but it is hard to tell since we don't know what they are supposed to do exactly. Here is one of them. We may try supplying a pulse from a waveform generator to one of the PMT inputs, and see if we get anything out.
Juan is also working on installing XP on a partition of the Acer, as a secondary option. However, he is being slowed now by persistent problems with the on/off power button on the machine. At the moment, we are unable to turn the power on. Based on gradual degradation of the button's reliability over time, I think it's a mechanical problem. We opened the case and looked at the power switch mechanism but it doesn't seem to be user-serviceable. We are probably going to have to return it to the manufacturer for a warranty repair/replacement. Hopefully they can preserve the hard drive. Or maybe we should just send it back without the drive.
Actually, we just now managed to "hot-wire" it using the connector to the front panel (hanging loose in middle of image below) - blue wire and white wire 2nd from end closest to blue wire; i.e., pins 2 and 4 counting from that end. Only a momentary contact is required.
The XP install on the Acer is almost finished - we are just waiting now for Juan to email the license key.
Meanwhile, Ray wanted me to check on XMath status from LeCroy, but I can't log into iRattler - I think they must have expired my FAMMAIL password. Need to fix. OK, did that, but the budget check on XMath is still not working. Asked Ray to try it from his account.
Meanwhile, my data-collection run that I started last Friday is still going. Going to try to let it continue for a few more days. Still seeing some weird behavior of the LED color codes though, not always matching what's expected given the GPS data. Reviewed my code & it looks correct. But I don't want to interrupt my run to debug this, so I guess I'll just worry about it later.
Besides all the above, my major task for this week: Start working on graphs in Scilab.
Monday, February 28, 2011
Friday, February 25, 2011
Plans looking forward
Most of this week was shot, due to Mike's eye surgery, other medical issues, and then catching up on various things that had fallen behind due to all that.
Gordon did, however, generate result data from runs (13) and (14). Also, he reported that there were no serial errors detected in these runs. Yay!
My plan looking forward is:
- Start another data-collection run this afternoon to go over this weekend (& for as long as possible afterwards, maybe get a whole week this time).
- Do graphs in SciLab based on data runs we have so far.
- Now that we have the FEDM board, maybe fiddle around with Sachin's code and see if I can add a Nios to it, for purposes of doing data transfer via the serial port (as we had originally intended).
- Also look at LabView's capabilities in more detail, and consider what's involved in doing the data analysis and results plotting in it instead of in Python (as I was originally planning).
Wednesday, February 23, 2011
Some updates...
Things have been kinda crazy the last week.
On Thursday & Friday, Sachin was visiting from BNL. He gave his talk on Thursday, and on Friday we worked with him in the lab to get his front-end board connected up with our setup. The board seemed to be drawing more power than Sachin expected (over an amp) and we were not quite sure why. Ray found an old ByteBlaster cable and we managed to load the firmware through Quartus on Mike's PC (which had a parallel port). But the board's USB driver wasn't compatible with Vista, so to communicate we used XP under VirtualBox on one of the Macs instead. We had some trouble getting the USB device to be recognized there at first too, but Juan came in & helped to resolve it. We got to the point where we were able to remotely access the board's control registers through the LabView interface, but it turned out that the USB port under VirtualBox wasn't fast enough to handle the desired rate of data streaming. So we will probably have to buy a dedicated XP box for purposes of interfacing to the board.
This week, Mike was out Mon.-Wed. due to an eye surgery. On Wednesday, Gordon emailed that he would probably be unable to do the graphs in Scilab on time, and that Mike should probably just do them instead. Mike replied asking if Gordon could reformat the latest raw data files at least.
Last week I checked the paper deadline for the SPIE Orlando symposium; it is March 28th, so we still have a few weeks and will probably be OK.
Wednesday, February 16, 2011
Things are Going Smoothly Today (Knock on Wood)
I picked up the two NI softwarez ("LabView Full Development System") from my mailbox yesterday, and brought them into the lab with me today. Perhaps I will go ahead and install one of them on the Acer PC (don't want to install on COSMICi because I'm in the middle of data collection, and the install may choke it off or require a reboot). Did receiving on the NI stuffs. Did activation of one of the units, & installing it now on the Acer. Gave the other unit to Ray for him to give to Eliot for physics dept. use.
Also tried to do budget check on XMath, but it isn't working for some reason - try again next week.
The GPS time data collection run I started on Monday is still going strong. It seems I definitely solved the crashing problem, alright! Although I have been making some firmware changes that I want to test, I think instead I will let this run keep going through the rest of this week and this weekend. That way we will have a nice, full week-long run (our first) - a good candidate for inclusion in the paper.
Gordon is working on the graphs from home this week based on my instructions. I emailed him with some additional info on status of recent runs.
Another thing I can work on today is adding capability to the server to work with the data from the GPS time app. Basically, the generic sensor node model needs to be subclassed to create the CTU_GPSapp node type. The UART BridgeServer needs to have a message handler added to it to parse input lines, and look for the $NODE_TYPE message. When it sees this message, if the type is CTU_GPSAPP, it should turn the generic node instance into a CTU_GPSapp subclass instance, and pass subsequent lines to this object. This object will maintain internal data structures needed to store the needed time data & various derived quantities in a useful form, and will support queries (from other server modules) to look up the best estimate of the absolute real time (together with an uncertainty estimate) for a given OCXO-based time (specified with precision down to a hundredth of a cycle, or 1 ns) relative to the start of the run. It will also pop up and maintain a TkInter window for displaying its state, including, say, warning lights for various GPS conditions, real-time graphs of cumulative relative phase wander & frequency over the run (maybe zoomable and scrollable!), uncertainty estimates from TRAIM (& also combined with OCXO phase uncertainty using the quadrature rules), # of satellites / good satellites, etc. This window could also have buttons for remotely commanding the CTU, although this might be dangerous - we might not want to make it *too* easy to interrupt a run in case a casual user tinkers with the controls. Probably best to have a "lock-down mode" for all sensitive controls.
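The node-type handoff described above could look roughly like the sketch below. It is only an illustration: the class names SensorNode and CTU_GPSappNode, the init_gps_state() hook, and the dispatch_line() function are hypothetical stand-ins, not the actual server code. The only parts taken from these notes are the $NODE_TYPE message and the CTU_GPSAPP type name. The promotion is done by reassigning the instance's class in place, which is one of several ways to do it (building a fresh subclass instance and swapping it in would also work).

```python
class SensorNode:
    """Generic node model (hypothetical): just collects raw input lines."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.lines = []

    def handle_line(self, line):
        self.lines.append(line)


class CTU_GPSappNode(SensorNode):
    """Hypothetical subclass for nodes running the GPS time app."""

    def init_gps_state(self):
        # Extra state that only the GPS-app node type needs.
        self.gps_records = []

    def handle_line(self, line):
        # Store each record as a list of comma-separated fields.
        self.gps_records.append(line.split(","))


# Map from the type name in the $NODE_TYPE message to the node class.
NODE_CLASSES = {"CTU_GPSAPP": CTU_GPSappNode}


def dispatch_line(node, line):
    """Route one input line; promote the node if it announces its type."""
    line = line.strip()
    if line.startswith("$NODE_TYPE,"):
        type_name = line.split(",", 1)[1]
        cls = NODE_CLASSES.get(type_name)
        if cls is not None and not isinstance(node, cls):
            node.__class__ = cls   # promote the existing instance in place
            node.init_gps_state()
        return
    node.handle_line(line)
```

After the promotion, every subsequent line goes to the subclass's handle_line(), while lines already collected by the generic node stay attached to the same object.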
Monday, February 14, 2011
Monday Blues
Arrived at work on Monday hoping to find a nice 3-day log file collected over the weekend. However, apparently for some reason the server rebooted itself early Sunday morning (about 3 am, possibly due to automatic OS updates), causing all log file collection to halt (and in turn causing the WiFi board & Nios app to hang). So we only ended up with a 2-day data file. Still, it's better than nothing, and is probably still worth running through processing, at least Gordon's initial translation, to see if the frequency of serial glitches has gone down. The data file is "node0.uart - Copy (13)cut.trnscr" in the Server Code folder. (I had to manually cut out some short sections left over from earlier runs from last week's debugging.)
Yep, I just checked, and the system was set up to automatically install updates every Sunday at 3 am, which involves automatic reboot. To prevent this from happening again, I turned off automatic installation of updates; the system should now just automatically download updates, and notify me when installation is needed.
Some other things to do today (or soon):
- Check out that glitch with the PPSCNTR value skipping a beat that I found during data analysis.
- I examined the raw data file, and, sure enough, there was an extra spurious PPSCNTR event, approximately 1/3rd of way between two valid PPSCNTR events. Possibly due to noise on the PPSCNTR input, or some kind of very rare bug or glitch in the input-capture circuitry, or the Nios interrupt mechanism. We'll have to be careful to watch for these events.
- Talk to Gordon about what analysis he needs to do in Scilab. He will be here between 1:30 and 2:30.
- Thinking about reconfiguring a few LEDs on the DE3 board to show the instantaneous status of the various data fields from the GPS NMEA messages that will tell us right away when we're not getting a good time signal. It's easier than staring at data records.
- Maybe also soon start doing the server-side processing of the incoming GPSAPP data. After the last round of analysis I have a clearer idea of how to handle this. For example, below is the relative OCXO frequency variation (at two different window sizes) during the 3.5-hour GPS time outage in the Copy(11) run. We can see that at the start of the outage, the frequency quickly wanders outside the range of both the usual +/- 0.1 Hz variations (seen for the 60s window size) and +/- 0.01 Hz variations (seen for the 600s window size). When this is seen to happen, or when the phase wanders unexpectedly far from where we'd expect it to be based on earlier data, we can "lock in" the present OCXO frequency based on (say) the average of the last 10 minutes (or last hour) before the frequency started to wander, and ignore the GPS times at least until TRAIM comes back online. Anyway, there are lots of options for detecting and patching over outages.
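One way the "lock in" idea in the last item might be sketched is shown below. This is an illustration under stated assumptions, not the server code: the function name hold_over() and its parameters are made up, the input is assumed to be a list of per-window frequency offsets in Hz (one per 600 s window), and the TRAIM status and phase checks are left out entirely. A window is trusted only if it lies within the tolerance of the average of the most recent trusted windows; otherwise that average is held.

```python
from statistics import mean


def hold_over(freq_offsets_hz, tolerance_hz=0.01, history=6, min_history=3):
    """Return a list of (frequency_to_use, in_outage) pairs, one per window.

    tolerance_hz: allowed deviation from the recent average (0.01 Hz is the
                  usual spread quoted for the 600 s window size).
    history:      number of trusted windows averaged for the locked-in value
                  (6 windows of 600 s is one hour).
    min_history:  trusted windows required before any window can be rejected.
    """
    good = []      # offsets from windows judged trustworthy so far
    result = []
    for f in freq_offsets_hz:
        if len(good) >= min_history:
            reference = mean(good[-history:])
            if abs(f - reference) > tolerance_hz:
                # Frequency has wandered: ignore this window, hold the average.
                result.append((reference, True))
                continue
        good.append(f)
        result.append((f, False))
    return result
```

Rejected windows are never added to the history, so a long outage cannot drag the locked-in value away from its pre-outage average.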
Sunday, February 13, 2011
Data Analysis
With the serial crashing problem fixed (apparently) by the end of the day Friday - big sigh of relief from Mike - I spent a few hours this weekend analyzing (in Excel) the Copy(11)First60K data file that Gordon prepared last week. I obtained some interesting results. Some notes:
- After PPS counter value 3183, there was a glitch where the PPS counter value rolled straight over to 3185 (skipping 3184) even though only 1 second had passed according to the OCXO counter as well as the PC clock. I decided there must have been a spurious PPS event, so I just adjusted all the subsequent PPS counter values down by 1. However, we should look back at the raw data file to see if we can piece together what happened in more detail.
- There are still occasional temporary glitches in the counter value. Some of these are correlated with TRAIM alarms, so may represent temporary phase shifts in the GPS unit's internal clock, as opposed to say timing problems in our counter register latching. But not all of the glitches are obviously ascribable to this.
- After initial warm-up of the OCXO, during the 1st 8 hours, there were a number of points at which the relative frequencies shifted slightly. It is not yet certain whether this was due to adjustments of the OCXO frequency (e.g., due to environmental temperature variations), or due to adjustments of the frequency of the clock internal to the GPS unit to track changes in the idea of the current time inferred from the satellite data. However, the magnitude of the frequency variations seen after warm-up (based on average frequencies measured over 10-minute windows) is within only about +/- 0.01 Hz of the long-term mean (see plot below). Given the 10 MHz base OCXO clock frequency, this amounts to 1 ppb variation - even less than the OCXO's specified frequency stability figure. The timescale of these variations is about 8 minutes (500 secs) or so. Or, another way to look at it is this: Between two different 600-second windows, the number of 100-ns OCXO pulses seen can vary by about +/- 5 or 1/2 microseconds' worth. Or, with the 1 ppb variation, we can lose or gain about 1 ns per sec by just using the OCXO clock when the GPS is unavailable. In a minute, we could gain or lose 60 ns. In an hour, we could gain or lose 3.6 microseconds, or 3,600 feet of positional accuracy. Thus, we should probably not rely on the OCXO clock for periods of more than about an hour or so when the GPS is not available. But shorter outages we can interpolate over effectively.
- Between about 45,780 and 58,300 seconds, there was a period where the relative clock frequencies suddenly shifted, the phases drifted apart for a while, then the relative frequencies suddenly went back to normal, but with the phases off by about 450 cycles or so (45 microseconds) then the relative phase suddenly snapped back to about where it would have been if the anomaly had not occurred (see plot below). Looking at the data, the period when the phases were off coincided with a period in which time lock had presumably been lost, because the self-reported TRAIM accuracy was zero. The phase snapped back when the GPS lock was re-acquired. This indicates that this phase drift was due to phase drift of the GPS unit's internal clock, rather than to OCXO frequency drift. In other words, during this approximately 3.5-hour long interval, more accurate times could have been obtained by extrapolating based on the calibrated OCXO frequency, than by using the PPS pulses from the GPS. If the OCXO alone was used during this period, accuracies would have been within a few microseconds, certainly less than 10. Whereas, during this period, the PPS times were off around 500 cycles (50 microseconds).
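The drift budget in the frequency-stability note above can be re-derived in a few lines. The inputs (10 MHz OCXO, +/- 0.01 Hz variation, 600 s windows) are from these notes; the feet-per-nanosecond constant is the standard speed of light, slightly under the 1 ft/ns rule of thumb, so the distances come out a little below the rounded figures quoted above.

```python
F_OCXO_HZ = 10e6        # base OCXO clock frequency
FREQ_ERR_HZ = 0.01      # observed +/- variation over 10-minute windows
WINDOW_S = 600          # averaging window length in seconds
FT_PER_NS = 0.98357     # distance light travels in 1 ns, in feet

# Fractional frequency error: 0.01 Hz out of 10 MHz is 1 part per billion.
frac_err = FREQ_ERR_HZ / F_OCXO_HZ

# Difference in the number of 100 ns OCXO cycles counted per window.
cycles_per_window = FREQ_ERR_HZ * WINDOW_S


def drift_ns(seconds):
    """Worst-case time error (ns) after free-running on the OCXO alone."""
    return frac_err * seconds * 1e9


for label, secs in [("1 second", 1), ("1 minute", 60), ("1 hour", 3600)]:
    ns = drift_ns(secs)
    print(f"{label}: about {ns:.0f} ns, or about {ns * FT_PER_NS:.0f} ft")
```

The per-window figure works out to 6 cycles, the same order as the "about +/- 5" read off the plot.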
Friday, February 11, 2011
Wi-Fi board still crashing!
Well, looks like the Wi-Fi board crashed again during my last run, after only running for less than an hour (about 3,000 seconds).
The strange thing is, when this happens (this last time, at least) the board rebooted itself and opened new server connections... But by then, the DE3 board firmware was totally hung up (probably in the serial library).
Now, I could try to deal with this by adding capabilities to detect the serial peer going down, and try to buffer up data so that we can stream it out when the connection is re-established.... But (a) that's a lot of extra complexity, and (b) the Wi-Fi board shouldn't be rebooting itself in the first place.
What else can I try? I can't put the Wi-Fi board inside a shielded enclosure (in case RF noise is causing the problem), because that would block the Wi-Fi signal, since it uses an embedded antenna.
Perhaps it is cosmic rays causing the crashes? But it will be hard to block those, too...
The mystifying thing, though, is that this spontaneous rebooting never happened before, till just recently, which makes me think that some change to the Wi-Fi script is triggering it... I added a little bit of stuff to the script (to handle the new pass-thru commands to the CTU), but I doubled the main stack size in case it was overflowing, and that didn't help with the problem... And the auxiliary stacks should already be plenty big enough (1,000 entries).
Here are the stack sizes I am using now:
- AT+SET 42="256" - Program counter stack.
- AT+SET 40="1000" - Space for simple variable stack frames.
- AT+SET 41="1000" - Space for complex variable stack frames.
Aha! Some insights gained from watching the RX (data in) and RTS (flow control out) signals carefully on the scope:
- RTS is deasserted (raised) briefly after a fixed time delay after the end of a transmission; this is consistent with the EZURiO docs (this delay is set by the _UARTRCVTMO() function, and defaults to 255*4 = 1020 bit periods).
- It stays high for an amount of time that varies somewhat (possibly because of other threads) but is (normally) at least a certain minimum amount of time. This is also consistent with the EZURiO docs; this delay is set by the _UARTSLEEPCOUNT() function, and defaults to 400 bit periods.
- While RTS remains high (deasserted), usually the Nios UART core does not send any data - indicating that it is indeed paying attention to this signal. It waits until RTS goes low (is asserted) before sending data.
- However, occasionally (I saw this at least once) the Nios UART will have already started a transmission when RTS goes high, but does at least manage to turn it off shortly before the end of the sleep count.
- Finally, one time I happened to see the RTS glitch high for an extremely brief interval (possibly as small as one bit period) while the Nios UART was sending, and by the next second, the module had crashed.
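To compare these delays against what shows up on the scope, it helps to turn bit periods into milliseconds. A minimal sketch follows. The two defaults (1020 and 400 bit periods) are the ones quoted above, but the baud rate is an assumption: the Nios->WiFi rate isn't recorded in these notes, so 115200 is only a placeholder, and I'm also assuming that "bit periods" means bit times at the UART's own baud rate.

```python
def bit_periods_to_ms(bit_periods, baud):
    """Duration of `bit_periods` bit times at `baud` bits per second, in ms."""
    return 1000.0 * bit_periods / baud


RCV_TMO_BITS = 255 * 4   # default receive timeout: 1020 bit periods
SLEEP_BITS = 400         # default sleep count: 400 bit periods

ASSUMED_BAUD = 115200    # placeholder only; substitute the real link rate

rcv_tmo_ms = bit_periods_to_ms(RCV_TMO_BITS, ASSUMED_BAUD)
sleep_ms = bit_periods_to_ms(SLEEP_BITS, ASSUMED_BAUD)
print(f"RTS rises about {rcv_tmo_ms:.2f} ms after the last byte")
print(f"RTS stays high for at least about {sleep_ms:.2f} ms")
```

Both delays scale inversely with the baud rate, so a slower link stretches them in absolute time.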
This all makes sense, because this crashing problem started after I turned down the baud rate of the GPS->Nios connection, which resulted in more (& larger) gaps in the echoed data stream from the Nios->WiFi. The potential problem always existed, but it only began manifesting after these big gaps became present, since they allowed the possibility that the RTS might deassert at about the same time that the next data burst was starting.
This raises several alternative possibilities regarding how to proceed:
- Try turning down the baud rate of the Nios --> WiFi connection to match the rate (57,600) of the other connection - this should reduce the gaps in the data stream during which the RTS may possibly be raised.
- Try turning up the _UartSleepCount(), giving the Nios more time to respond to the raised RTS by halting the data flow. (However, not knowing exactly how the Wi-Fi board's receive buffer is working, I am uncertain whether this would really solve the problem.)
- Try turning down the _UartRcvTmo(), so that the RTS pulses will happen sooner, and hopefully not be as likely to overlap with the start of the next transmission burst. However, this seems like an unreliable method, and it may in fact lead to more RTS pulses (since more of the transmission gaps will be large enough), and more problems.
Emailed EZURiO to report this apparent firmware bug, so hopefully they can fix it in a future version of the firmware.
Wednesday, February 9, 2011
Crashing Woes
Arrived at work hoping to see a nice fresh two-day long data file, only to find that the run I started shortly before leaving on Monday crashed only 10 minutes in.
I'm not sure what happened... Everything went smoothly for about 10 minutes (Mon. 18:11 to 18:22) and then the Wi-Fi board restarted, and the data flow from the FPGA board got hung up. The Wi-Fi board produced regular heartbeats after its restart, but no more data was received from the FPGA board - it was stuck in its 'hung' state. The Wi-Fi board started to respond to a typed command ('help') but then it locked up (but kept producing heartbeats!)
I clearly need to improve the firmware to detect serial data stream lockups and recover from them more gracefully. I also need to try to figure out why the WiFi board spontaneously rebooted, and why it got hung up responding to a command afterwards. One thing that would help would be to turn up the network debug level.
- It's also possible that the firmware hung in part because of diagnostic messages the Wi-Fi board sent to STDOUT on startup which it didn't understand. Turning off PRINT_NOW flag in debugmodes.uwi to suppress that output. Oops, I mean, turning on NO_PRINT. Hm, but deferred output would be nicer... Working on getting that working...
- Another spontaneous reboot! And no clues in the log file... maybe I need to turn on full network debugging...
- Interesting thought: The Nios firmware could be programmed to initialize the Wi-Fi board appropriately... Except that it might have trouble executing cold-boots (power cycles) that are occasionally needed.
- Ended up spending most of the day fiddling around with the Wi-Fi script, with the goal of getting network debugging turned on, and startup diagnostics to STDOUT turned off. Finally got there at the end of the day, about 7 pm. But for some reason, all 4 GPS satellites currently in range are generating TRAIM alarms. I had to reset the GPS to get the PPS going again. I wonder if it will just take the TRAIM algorithm a while to lock in. Anyway, I guess I will leave it running & check out its progress when I get in on Friday.
Monday, February 7, 2011
Just another manic Monday...
Cloudy today, makes me feel like napping, despite the "5-hour energy" I chugged in the car on the way here. Oh well, plenty of time for that after I get home tonight.
Over the weekend, I thought of another thing to do on the GPS app / server... Have the app generate a message "$NODE_TYPE,CTU_GPSAPP" or some such on startup, to tell the server what kind of node it is (in case we revert back to having the sensor nodes go through the EZURiO board / Python server interface at a later time).
Gordon says he's coming in later to re-run his data reformatter on the latest (post the recent serial comm tweaks) data file. That will tell us if I managed to cut down on the glitch frequency yet. Hoping to now get well under 1 glitch per MB.
Oh, at some point I wanted to try again to burn my design into Flash/EEPROM on the DE3 board, so that it will load automatically on board power-up. There are instructions for this in the manual but I didn't quite get it working before.
A thought for the future: Extend our comm protocol so that the server requests re-send of any messages that get garbled. Lotsa work though and maybe not worth it for occasional short glitches. & it's like reinventing TCP...
OK, here's what I actually did today, so far:
- Added $NODE_TYPE message output at startup.
- Added MUTE/UNMUTE commands to GPS app (& Wi-Fi pass-thru script) to allow remote control of GPS data stream.
- Ground wire came off the hand-wired serial port on the Wi-Fi board - had to manually reconnect it.
- Added INIT_GPS command to GPS app (& autorun script) to allow remote reinitialization of GPS in case it needed to be reset.
- Having weird problems with Wi-Fi boards...
- Had some problems with the EZURiO module I was using - repeatedly unable to load latest autorun.uwc. Trying another module.
Thursday, February 3, 2011
Remote Command Success
I have now instrumented the MainServer and BridgeServer to accept input text lines from the terminal and send them to the remote node! That turned out to be just a few more lines of code, once the basic input capability had been added to TikiTerm.
Here is a screenshot showing the effects of typing the "trefoil" and "help" commands in the AUXIO BridgeServer window that connects to a remote node via WiFi. I also streamed a file to the serial port via UwTerminal, and the file appears in the UART bridge window as expected.
Oh, and we can also type commands to the remote node in the UART and MAIN server windows as well (not shown in this screen cap).
What next? There are a few possibilities:
- Currently, commands typed in the AUXIO window appear 3 times: (1) Once in yellow to acknowledge that the line has been entered into the terminal window. (2) Once in green to acknowledge that the line has been sent out over the TCP connection. (3) Once in blue to acknowledge that the line has been received and echoed back to AUXOUT by the remote Wi-Fi mote. This is perhaps overkill, perhaps I should get rid of one or more of these. Oh, and if I type the command in the MAIN or UART windows, (1)&(2) appear there and (3) appears in AUXIO.
- Presently, the main server console window also accepts commands, but no useful interactive user commands have been defined yet. Perhaps I should think of some?
- Start working on the server code to do useful processing of real-time data from the GPS app? There is substantial design work still to do on this first.
- There's always the need for improved failure recovery in the WiFi script. Like, if the server goes down and then comes back up, we should automatically reconnect. That might take a major redesign though.
- Also, in the server, I need to work on a clean-exit capability. Like, when you close the console window or type an "exit" command, everything should get shut down cleanly, connections closed, windows closed, etc. This does not happen currently (unless you run with an MS-DOS console window, and close it), and in fact I have problems with zombie threads sticking around & preventing the python process from exiting. Need some way to force them to DIE, the bastards. It's tough when they are busy listening for ports or input data or whatever. I guess I have to program them to wake up periodically and check a die flag, or some such. Too bad I can't just force-kill them all from the main thread. I should probably check some Python forums and see if there is any way to do that yet. Hm, maybe a system call to kill the process.
- One more little To-Do: Replace Python's readline() with a version that handles line-end characters in the way that I want. This will solve the problem where lines entered on UwTerminal to send via the serial input to the Wi-Fi board don't get displayed on the server until the next character is sent. That happens because Python's readline(), when in universal newlines mode, insists on waiting for the next character after the \r so it can figure out whether this is a lone \r or an \r\n sequence, even though there's really no need to do this prior to translating the \r to \n and returning the line to the caller.
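A rough sketch of what such a readline() replacement might look like. The class name EagerLineReader is made up for illustration, and the sketch assumes the underlying stream delivers characters untranslated through read(1). The idea is to return the line as soon as a \r is seen, and to remember that a \n arriving next belongs to the same line ending and should be swallowed.

```python
import io


class EagerLineReader:
    """Hypothetical wrapper: returns a line as soon as '\\r' or '\\n' arrives.

    Assumes `stream.read(1)` returns one untranslated character,
    or '' at end of input.
    """

    def __init__(self, stream):
        self._stream = stream
        self._skip_lf = False  # True if the previous line ended with '\r'

    def readline(self):
        chars = []
        while True:
            ch = self._stream.read(1)
            if ch == "":
                break  # end of input: return whatever we have
            if self._skip_lf:
                # The previous line ended in '\r'; a '\n' right after it
                # is the second half of '\r\n', so drop it.
                self._skip_lf = False
                if ch == "\n":
                    continue
            if ch == "\r":
                # Translate to '\n' and return NOW, without peeking ahead.
                self._skip_lf = True
                chars.append("\n")
                break
            chars.append(ch)
            if ch == "\n":
                break
        return "".join(chars)


# Small demo with mixed line endings (newline="" keeps them untranslated).
demo = EagerLineReader(io.StringIO("abc\r\ndef\rghi\n", newline=""))
demo_lines = [demo.readline(), demo.readline(), demo.readline()]
```

The cost of deferring the \r\n decision to the next call is one extra flag of state; the caller never has to wait on a character that may not come for a while.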
Wednesday, February 2, 2011
Line Input Struggles
Spent a while today trying to add interactive command capability to the CosmicIServer class via a StreamLineConnection tied to the STDIO streams that are redirected to the TikiTerm console. It is coded but not working yet. At the moment, I'm stuck on an error where Python is complaining that a "h.handle(msg)" method call doesn't match the "def handle(inst, msg)" method prototype in the Command_ReqHandler because the number of arguments doesn't match, but it DOES, dammit! It's baffling because those particular lines of code aren't even new, and were working fine before, for commands sent via the MainServer connection.
Aha, I fixed that bug now (I was adding the handler CLASS instead of a handler INSTANCE to the handler list - although possibly I should change the interface so we can just supply a class instead), but now I am having a different problem, having to do with getting the code in one module to access global variables that are initialized in another module that itself uses the first module. I've always had trouble getting this sort of thing to work correctly. Oh wait, I just had an idea about how to fix it. I can dynamically add a copy of the object reference as an attribute of the second module after the object gets initialized in the first module... Hooray, it worked!!
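For the record, here is a small sketch of why the class-versus-instance mix-up produces an argument-count complaint, plus one possible way the interface could accept either. The Command_ReqHandler below is only a stand-in with the same method prototype, and dispatch() is a hypothetical helper, not the real server code.

```python
import inspect


class Command_ReqHandler:
    """Stand-in for the real handler; only the prototype matches."""

    def handle(inst, msg):
        return "handled: " + msg


# Calling handle() on the CLASS means nothing gets bound to `inst`,
# so msg lands in the `inst` slot and Python reports `msg` as missing.
try:
    Command_ReqHandler.handle("PING")
    bug_message = None
except TypeError as err:
    bug_message = str(err)


def dispatch(handlers, msg):
    """Hypothetical dispatcher that tolerates classes or instances."""
    results = []
    for h in handlers:
        if inspect.isclass(h):
            h = h()  # instantiate on the fly if a class was supplied
        results.append(h.handle(msg))
    return results


results = dispatch([Command_ReqHandler, Command_ReqHandler()], "PING")
```

Instantiating on the fly only works if the handler's constructor takes no required arguments, so this is a convenience rather than a general fix.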
Here's a screenshot showing the server response to an input command line. As expected, we get an error message indicating that the command is unknown. I haven't yet defined any server commands that would be particularly useful in interactive mode!
On Tap for Today
What's up today:
- Hoping that Gordon will finish up his Java program today (either here in lab or from home) so that we can process the latest data file and have the lines with errors properly skipped, or at least marked.
- Meanwhile, there are at least a couple of things I can work on:
- Submit requisition for XMath. I did get the email quote from Ray already. CORRECTION: It looks like I don't have the official quote yet. Emailed Ray to ask if he got it. Also texted him. UPDATE: Ray said to go ahead and use the price from the old quote, but with the new model number. Req is submitted now & awaiting approval.
- Finish installing Scilab on my workstation. Start reading docs to figure out how to read in & process the data file & graph interesting results. UPDATE: Scilab is up & running, started looking at docs.
- Utilize TikiTerm's new input capability throughout the rest of Central Server app. This includes using this capability in at least 3 places:
* Use it from COSMICi console window to accept direct user commands to server.
* Use it from each node's main connection window to send user commands to the node.
* Use it from each bridge connection window to send user commands to the node.
Tuesday, February 1, 2011
User Input in TikiTerm
Tonight I added user text-input capability to the TikiTerm widget. Note that after the input lines have been entered, they turn yellow and move above the purple output/input separator, and become interspersed with lines of output. The text entered gets added to an input buffer which asynchronous reader threads can consume lines from. (Adding stuff to the input buffer has been tested, but reading from it hasn't been tested yet.) Also, the application's stdin stream is redirected to the input buffer on startup.
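To make the input-buffer idea concrete, here is a minimal sketch of how such a buffer and a reader thread could fit together. This is not the actual TikiTerm code: LineInputBuffer, reader_loop, and the polling interval are all invented for illustration. The widget side calls write() with whatever text the user entered, and reader threads pull complete lines with a timeout, so they can also check a stop flag periodically instead of blocking forever.

```python
import queue
import threading


class LineInputBuffer:
    """Hypothetical thread-safe buffer of complete input lines."""

    def __init__(self):
        self._lines = queue.Queue()
        self._partial = ""           # text typed since the last newline
        self._lock = threading.Lock()

    def write(self, text):
        """Called by the UI side; queues every completed line."""
        with self._lock:
            pieces = (self._partial + text).split("\n")
            self._partial = pieces.pop()  # trailing incomplete piece
            for line in pieces:
                self._lines.put(line + "\n")

    def readline(self, timeout=None):
        """Return the next line, or None if none arrives within timeout."""
        try:
            return self._lines.get(timeout=timeout)
        except queue.Empty:
            return None


def reader_loop(buf, handle_line, stop_event, poll=0.1):
    """Consume lines until stop_event is set, waking up every `poll` sec."""
    while not stop_event.is_set():
        line = buf.readline(timeout=poll)
        if line is not None:
            handle_line(line)
```

A reader would be started with something like threading.Thread(target=reader_loop, args=(buf, handler, stop_event)) and shut down by setting stop_event and joining the thread.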
Next, I need to create a worker thread that just retrieves lines from stdin (which now should come from this main COSMICi server console TikiTerm window), and sends them to the server's command handler. Hm, right now, the command handler is basically assuming it's getting commands in the form of Message objects from a LineConnection accepted by a LineCommunicator TCP server. That isn't the case for STDIO, so maybe what's called for here is to create a special subclass of LineConnection called something like StreamLineConnection, which does its I/O via an input/output pair of ordinary (quasi-TextIOBase compatible) streams, like what TikiTerm is providing us, rather than via a TCP socket. StreamLineConnection would also take over some of the functionality of LineCommReqHandler, but not all, since this type of connection isn't being created to service a connection request that came in over the network, so the whole concept of a request handler doesn't really apply in this case.
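A bare-bones sketch of how the StreamLineConnection idea could be laid out. The Message and LineConnection classes here are simplified stand-ins (the real ones surely differ), and method names like read_line, send_line, and process_one are assumptions made up for the example. The point is just that the subclass swaps the socket for a pair of file-like streams while the message-dispatch logic stays in the base class.

```python
import io


class Message:
    """Simplified stand-in for the server's Message class."""

    def __init__(self, data):
        self.data = data


class LineConnection:
    """Simplified stand-in base class: subclasses supply raw line I/O."""

    def __init__(self):
        self._handlers = []

    def add_handler(self, handler):
        self._handlers.append(handler)

    def read_line(self):
        raise NotImplementedError

    def send_line(self, line):
        raise NotImplementedError

    def process_one(self):
        """Read one line, wrap it as a Message, pass it to each handler.

        Returns False at end of input, True otherwise.
        """
        line = self.read_line()
        if line == "":
            return False
        msg = Message(line.rstrip("\r\n"))
        for h in self._handlers:
            h.handle(msg)
        return True


class StreamLineConnection(LineConnection):
    """Does its I/O via a pair of file-like streams instead of a socket."""

    def __init__(self, instream, outstream):
        super().__init__()
        self._in = instream
        self._out = outstream

    def read_line(self):
        return self._in.readline()

    def send_line(self, line):
        self._out.write(line + "\n")
        self._out.flush()
```

In the real application the two streams would be the redirected stdin/stdout that TikiTerm provides; for a quick check, a pair of io.StringIO objects does the job.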
Then we need to modify MainServer and related classes (in mainserver.py) to set up a thread to read lines from the TikiTerm input buffer and bundle them up as Message objects & send them back over the main server connection from the node.
Once that's all done, we next need to modify BridgeServer and related classes (in bridge.py) so that they tie the TikiTerm input buffer back to the network bridge connection for those connections as well.
This way, the user will have the maximum number of ways to type commands to each remote node. Any of the node's active server connections (whatever subset of MAIN, AUXIO, and/or UART) will then be able to accept typed user commands. This ability will come in handy in cases where not all of these connections are able to be opened, for whatever reason. The script on my WiFi mote already is programmed to be able to accept & process input commands from any of these sources.