The Cosmic Inquirer: Mon., Mar. 19th

David is here. The new hard drive came in and David installed it. Now we just need to find the Windows 7 boot CD. Ray has one at home - asked him to bring it in.

Juan is going to be here Wednesday instead of today due to a conflict with his RA job.

Aarmondas & Samad came in. Samad had his PCB fabbed by Donte, but there were some fabrication problems so Donte is going to redo it. Aarmondas is experimenting with compiles. David is looking through Altera docs trying to find new things to try.

I heard the guys did some more work last Wednesday, but did not yet get the FEDM input-capture working at 500 MHz. David is checking the group blog. Juan says he tried changing the fitter placement effort but it only helped a little. He said he didn't take out the extra pulse-cap instance yet.

The last test compile in Q:\ got 334.9 MHz. Or no, that was with the high-speed components only? Not sure; need to recompile.

Let's try:

Take out 6th pulse-cap from the pulseform-cap module.
Turn off "optimize multi-corner timing" (for now we only care about the hot, i.e. worst-case, corner).
Change fitter placement effort multiplier from 2.0 to 4.0.
Change fitter routing effort multiplier from 2.0 to 1.0. (These are the settings that worked best for the CTU app.)
Router timing optimization level: Leave at Maximum.
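To keep track of what actually changes between compiles, the plan above can be written down as a small before/after comparison. This is only a bookkeeping sketch: the dictionary keys are descriptive labels I made up, not real Quartus assignment names, and the "on" baseline for multi-corner timing is inferred from the phrase "turn off".

```python
# Baseline values are the "from" values mentioned in the list above.
# Keys are hypothetical labels, NOT actual Quartus QSF assignment names.
baseline = {
    "optimize_multi_corner_timing": True,   # assumed on, since the plan is to turn it off
    "fitter_placement_effort_multiplier": 2.0,
    "fitter_routing_effort_multiplier": 2.0,
    "router_timing_optimization_level": "Maximum",
}

# Planned values for the next compile; the router timing level is left alone.
planned = dict(
    baseline,
    optimize_multi_corner_timing=False,
    fitter_placement_effort_multiplier=4.0,
    fitter_routing_effort_multiplier=1.0,
)


def changed_settings(old, new):
    """Return {name: (old_value, new_value)} for every setting that differs."""
    return {k: (old[k], new[k]) for k in old if old[k] != new[k]}


for name, (before, after) in changed_settings(baseline, planned).items():
    print(f"{name}: {before} -> {after}")
```

Recording the changes this way makes it easier to tell later which compile used which combination.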

OK, first I merged all the high-speed parts, and am compiling them from source with Top set to Empty.

Then, if that meets the speed target, the plan is to unmerge them, logic-lock them, set them to compile from post-fit netlist (strict with placement/routing preservation), and set Top to Source and recompile.

OK, we got 577.03 MHz for the high-speed stuff. However, I'm not sure if this approach will really work - splitting the merged partition looks like it might be destroying the post-fit results. If this doesn't work, I'll try it again and next time I won't split it; will instead add all the instances to LogicLock manually.

Another thing to try is turning down the aggressiveness of the compile for the non-time-critical parts. It's possible that when it's aggressively optimizing those, it encroaches on the time-critical parts.

Interrupting the current compile b/c it seems to be stuck in the middle of analysis & synthesis - it's been at about 50% for 20 minutes.

OK, compiling just the high-speed stuff as a merged partition again. Then I'll apply the LogicLock, manually using wildcards if I have to. Then add in the low-speed stuff with relaxed compile settings.

I'm learning how to use wildcards more cleverly (I think) so that there are fewer separate entries that need to be added in the LogicLock list.
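A rough illustration of the idea, assuming glob-style `*` and `?` wildcards like those in Python's `fnmatch` (Quartus's own matching rules may differ in detail). The hierarchy paths below are invented for the example and are not the real instance names in this design.

```python
from fnmatch import fnmatchcase

# Hypothetical hierarchy paths, made up for illustration only.
instances = [
    "top|pulseform_cap:pfc|pulse_cap:pc0",
    "top|pulseform_cap:pfc|pulse_cap:pc1",
    "top|pulseform_cap:pfc|pulse_cap:pc2",
    "top|pulseform_cap:pfc|pulse_cap:pc3",
    "top|pulseform_cap:pfc|pulse_cap:pc4",
    "top|hs_counter:cnt",
    "top|slow_uart:uart",
]


def covered(patterns, names):
    """Return the names matched by at least one glob-style pattern."""
    return [n for n in names if any(fnmatchcase(n, p) for p in patterns)]


# Two wildcard entries stand in for six individual entries.
patterns = ["*|pulse_cap:pc?", "*|hs_counter:*"]
locked = covered(patterns, instances)

print(f"{len(patterns)} patterns cover {len(locked)} of {len(instances)} instances")
```

The point is just that one well-chosen pattern per family of instances keeps the list short, as long as it doesn't accidentally pull in slow-speed logic.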

For the slow-speed logic, I relaxed the overall fitter setting from "Standard" to "Auto," reduced placement effort multiplier to 1.0, and turned router timing optimization level to Standard.

I could also try turning off physical synthesis optimization for speed in the low-speed compiles.

Bleh, 295.95 MHz. Even worse than before!

Speed is currently 565.29 MHz for just the high-speed stuff. Let's add the slow-speed stuff in without turning down the optimization settings at all. Still got my high-speed stuff in LogicLock. Compiling now...

Argh, 295.95 MHz again!!!

Trying one more thing: creating a reserved rectangular LogicLock region for the high-speed logic. That way the fitter can't touch anything inside the region while it is optimizing the rest of the design, so I'm hoping that might help. However, I'm not sure the design will even still fit with a rectangular region reserved (it was a really tight fit before, so there might not be enough cells left), or that the high-speed part will still be fast enough.

OK, that last worry was unfounded - speed in rectangular region is 567.21 MHz. It reserved a big region on the left side of the chip. Now let's try adding the slow-speed stuff, outside that reserved region.

Nope, we are 160 RAM cells short. I could try taking out the pipeline registers, there are 4x2x56 = 448 bits worth of those, so those might be enough to allow us to fit again. However, I'm not sure whether it will run fast enough without them. Worth a try though.

Another thing to try: do the compile without the merge, move all the instances individually into LogicLock, then turn down optimization for compiling the rest.

For some reason, analysis & synthesis always seems to hang on this computer when all the high-speed instances are in separate partitions. (Something to do with parallelizing the compile on multiple CPUs, perhaps?) So anyway, I'll skip that idea for now. (It might work on another computer though.)

In the meantime, I'll try again taking out the pipeline registers at the front of the datapaths. First, from source with no logic-lock. Got 572.74 MHz. So those front pipeline registers were never really necessary!

Now, from source w. HS elements in root logic-lock region. Same speed.

Now, from source w. HS elements in their own logic-lock region (Region_0). 569.8 MHz. No, wait, that logic-lock region came out the wrong size for some reason. Redo so it's the right size. Now 582.41 MHz! Not sure what changed to get that.

Now, turn on Reserved, turn on post-fit (strict) for HS partition, switch Top partition from Empty to Source, and recompile...

Nope; still 160 RAM cells short! I'm surprised that taking out the pipeline registers didn't seem to help at all.

Let's go back to putting things in the Root logic-lock region, non-reserved, and see what we get (haven't tried that since taking the pipeline registers out). 582.41 MHz still. Adding low-speed parts & recompiling...

Another thing to try: Put all the high-speed logic in a single entity. It's possible that the instances in the logic-lock list are being preserved, but the routing between them isn't. Putting them all into a single entity that is itself logic-locked might help. This requires substantial reorganization of code though. Still, it's a lot easier than revamping our whole data representation. Probably makes sense as the next thing to try, if my current compile doesn't work.

304.23 MHz for the whole thing. Still no dice.
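For perspective, here is a small sketch that puts today's numbers side by side against the 500 MHz target. The Fmax values are the ones logged above; the run labels are my own shorthand, and "margin" here is simply the difference in clock period, not a slack figure from the timing analyzer.

```python
TARGET_MHZ = 500.0


def period_ns(fmax_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / fmax_mhz


def slack_ns(fmax_mhz, target_mhz=TARGET_MHZ):
    """Period margin vs. the target: positive means faster than the target."""
    return period_ns(target_mhz) - period_ns(fmax_mhz)


def meets_target(fmax_mhz, target_mhz=TARGET_MHZ):
    return fmax_mhz >= target_mhz


# (label, Fmax in MHz) as logged today; labels are informal shorthand.
results = [
    ("HS only, merged partition", 577.03),
    ("HS + slow, relaxed slow settings", 295.95),
    ("HS only, LogicLock", 565.29),
    ("HS + slow, full optimization", 295.95),
    ("HS only, reserved rectangle", 567.21),
    ("HS only, no front pipeline regs", 572.74),
    ("HS only, Region_0 resized", 582.41),
    ("Whole design, Root region", 304.23),
]

for label, fmax in results:
    status = "OK  " if meets_target(fmax) else "FAIL"
    print(f"{status} {label:34s} {fmax:7.2f} MHz  margin {slack_ns(fmax):+.3f} ns")
```

Every high-speed-only compile clears the target; every compile that includes the slow-speed logic misses it by more than a nanosecond of period.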

I think the next logical step is to try putting all the high-speed stuff into a single entity under the top-level schematic, which can be in its own partition, and logic-locking just that entity. It will have some humongous number of output ports, but no matter. This is what worked for me when I was doing the other app (CTU GPS app on the DE3).

That's a big job, so this is a good stopping point for today. Let the students work on it...

Monday, March 19, 2012
