Inside SWT

Monday, July 31, 2006

Kids these days

At the last conference I attended, I went to dinner with a professor from a well-known university. Ever known someone who just wants to argue with you and make his point, no matter what you say? This guy was a macro expansion. I said one word and he expanded!

Anyways, this guy began frothing at the mouth when I suggested that learning Lisp, Prolog, Smalltalk and other computer languages was good for the students. Don't get me wrong, I wasn't suggesting that they dump C, HTML and Java (or whatever else will get you a job these days). My point was that these sorts of languages contain different concepts that open up your mind to new ideas and new ways of thinking. If you don't see these things in school, then where else will you see them? You are in danger of never learning what a closure is (Java sure don't have 'em) or really understanding recursion or backtracking. Where better to learn than in school, where you don't need to ship and bugs don't cost people money?

Alas, I failed. Kids these days are missing out. Sad but true.


Wednesday, July 26, 2006

"I just click around all day ..."

Fixing bug 125656 was a real challenge. You can go read the bug report, but the summary is that a certain combination of hover help windows and clicking caused the operating system (MacOS X 10.4) to lose track of top level shells, making them dead to mouse clicks. This problem was particularly hard to find because it was intermittent. Attempts to construct a simple case outside of Eclipse failed.

So how did we fix it? First of all, we investigated the obvious. Was the shell somehow disabled? Were the mouse events going somewhere else? Was there a mouse grab stuck? Was the shell somehow disposed but still drawing? Then we discovered that FindWindow, the MacOS X carbon call that is used to locate a window given a point on the screen, couldn't find our window. Calls to IsValidWindowPtr showed that the window was still alive. What could possibly be the problem?

To find out, we wrote a method that traversed the operating system window list and soon noticed that there were shells in SWT data structures that were not in the list. We called the method after every event was dispatched and traced all the operating system calls to find out when the bad thing happened. We did all this in Java, not C, which is one of the great things about SWT. No custom C code.
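
For the curious, here is a rough sketch of that kind of consistency check. It is not the code we wrote: the class and method names are made up, window handles are modelled as plain long values, and the two sets stand in for the SWT shell table and the operating system window list.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch, not SWT source. Handles are modelled as plain long values.
class ShellListCheck {
    /** Returns the handles the toolkit still tracks but the OS window list no longer contains. */
    static Set<Long> missingFromOs(Set<Long> toolkitShells, Set<Long> osWindowList) {
        Set<Long> missing = new LinkedHashSet<>(toolkitShells);
        missing.removeAll(osWindowList);
        return missing;
    }

    /** Meant to be called after every dispatched event, so the first bad event shows up in the trace. */
    static void checkAfterEvent(String eventName, Set<Long> toolkitShells, Set<Long> osWindowList) {
        Set<Long> missing = missingFromOs(toolkitShells, osWindowList);
        if (!missing.isEmpty()) {
            System.err.println("After " + eventName + ": shells missing from OS list: " + missing);
        }
    }
}
```

Calling the check after every event is slow, but it pins the failure to the one event that caused it.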

I almost forgot. When a problem is intermittent, you can't ever really know for sure that it's fixed. You have to rely on statistics. So Silenio spent two days on and off, narrowing down the mouse clicks until he could get the problem to happen within 5 minutes. During that time, I asked him what he was doing and he said, "I just click around all day ..."
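
Here is a back-of-the-envelope sketch of why a fast repro matters. It assumes each clicking session is independent and hits the bug with some fixed probability while the bug is still present. Neither assumption comes from real measurements, and the names are made up.

```java
// Hypothetical sketch: the hit rate is an assumed number, not measured data.
class IntermittentBugOdds {
    /** Chance that a still-present bug stays hidden for n independent sessions. */
    static double chanceBugHides(double hitRatePerSession, int cleanSessions) {
        if (hitRatePerSession <= 0.0 || hitRatePerSession > 1.0) {
            throw new IllegalArgumentException("hit rate must be in (0, 1]");
        }
        return Math.pow(1.0 - hitRatePerSession, cleanSessions);
    }

    /** Smallest number of clean sessions that pushes that chance down to maxDoubt or lower. */
    static int sessionsNeeded(double hitRatePerSession, double maxDoubt) {
        if (maxDoubt <= 0.0 || maxDoubt >= 1.0) {
            throw new IllegalArgumentException("maxDoubt must be in (0, 1)");
        }
        int n = 0;
        while (chanceBugHides(hitRatePerSession, n) > maxDoubt) {
            n++;
        }
        return n;
    }
}
```

The higher the hit rate per session, the fewer clean sessions you need before you can believe the fix. That is why narrowing down the clicks was worth two days.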


Sunday, July 23, 2006

Closing 37683

We closed 37683. At times, this bug resembled a blog rather than a bug report. It was used by some to spread FUDge about the toolkit and that's bad. Fortunately, I don't have a sweet tooth.

Let's be clear. Some people were seeing problems and we worked hard to isolate them (the problems - not the people). When there was a bug we could recreate, we opened a new bug report and fixed it. We wrote example code that created the same window using GTK calls in C, then GTK calls in Java and finally in pure SWT code to see whether there were fundamental problems in the event loop or elsewhere in the toolkit. There weren't.

Some of the focus was on the speed of popup menus. Menus come up fast in SWT, but application programs like Eclipse do lots of domain specific work to enable or disable items and in some cases, build a new menu each time a menu is requested. If this work involves file i/o and threads, then performance numbers are variable.

Was this bug worth it? I'd have to say "yes". We found and fixed real problems and that makes me happy.


Saturday, July 08, 2006

Code == crap

Back in university, there were two kinds of students. For argument's sake, let's call them the cleans and the scruffs. The cleans were smart guys, interested in algorithms, computability and understanding that work could be done. The scruffs were smart too. They studied and understood the same sorts of things, but in the end, they just got in there and made things go. I'm a scruff.

As a scruff, I've come to the conclusion that code is crap. I hate code. Any code. Mine too. Code is slow. Code has bugs. Given two solutions to a problem, I'll pick the one with the fewest lines.

As a community in the broadest sense of the word (Microsoft included), we design these monstrous layered architectures, then wonder why our systems are slow. Want to know how to make something faster? Run less code. Do you care how a library is implemented? You shouldn't. Want me to write a bunch of code in the name of truth and beauty? I won't.

Ok, I'm a closet clean but don't tell anyone ... I like binary search.


Wednesday, July 05, 2006

How does anything work?

When I think about all the places SWT runs and the things we do to make it work, it blows my mind.

Windows flavours include Windows 98, CE, NT, 2000, ME, XP and soon Vista. The win32 API has two identical calls for everything, one for ANSI and one for Unicode. Windows 98 and ME support only ANSI, while CE supports only Unicode. Windows NT, 2000, XP and Vista support both. This means no single API is available on all Windows platforms. C programs are supposed to compile for one or the other. Instead, we build one set of jars and native libraries for both and call the right API on the fly. It's ugly, but it works.
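
To give a flavour of what "call the right API on the fly" means, here is a minimal sketch. It is not SWT source. The interface and class names are invented, and the methods only return the name of the win32 call they would make (SetWindowTextA and SetWindowTextW are the real ANSI and Unicode pair) instead of calling into native code.

```java
// Hypothetical sketch, not SWT source: all type names here are invented for illustration.
interface TextApi {
    String setWindowText(String text);
}

class AnsiApi implements TextApi {
    // Stands in for a native call to the ANSI entry point.
    public String setWindowText(String text) {
        return "SetWindowTextA(" + text + ")";
    }
}

class UnicodeApi implements TextApi {
    // Stands in for a native call to the Unicode entry point.
    public String setWindowText(String text) {
        return "SetWindowTextW(" + text + ")";
    }
}

class Win32Dispatch {
    /** Picks the variant at run time from what the platform reports, not at compile time. */
    static TextApi select(boolean platformSupportsUnicode) {
        return platformSupportsUnicode ? new UnicodeApi() : new AnsiApi();
    }
}
```

The same jar carries both paths, and the decision is made on the machine where the code runs.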

GTK is version hell. We run all the way back to 2.0.6. Lots of new API has been added since then, so we look the new functions up dynamically because Linux won't load a shared library with unsatisfied references. Somehow, when an operating system API isn't there, we run alternate code or fail gracefully. Whew!
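
The real lookup happens down in the native layer, against the GTK shared libraries. The same "look it up, and if it isn't there, do something else" idea can be sketched in plain Java with reflection. This is only an analogy under that assumption: the class is made up, and String.isBlank (added in Java 11) plays the part of the newer API.

```java
import java.lang.reflect.Method;

// Hypothetical analogy in Java, not how SWT looks up GTK functions.
class DynamicLookup {
    /** Looks up a public no-argument method by name at run time; returns null when it is absent. */
    static Method find(Class<?> type, String name) {
        try {
            return type.getMethod(name);
        } catch (NoSuchMethodException e) {
            return null;
        }
    }

    /** Uses String.isBlank() when the running JDK has it, otherwise runs alternate code. */
    static boolean isBlank(String s) {
        Method newer = find(String.class, "isBlank");
        if (newer != null) {
            try {
                return (Boolean) newer.invoke(s);
            } catch (ReflectiveOperationException e) {
                // Fall through to the alternate code path below.
            }
        }
        // Older API: close enough for ordinary ASCII whitespace.
        return s.trim().isEmpty();
    }
}
```

Nothing in this class refers to the newer method directly, so it loads and runs on a JDK that doesn't have it.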

Mozilla is its own nightmare. Until recently, rather than using the Mozilla that you had installed on your machine, it expected you to ship a copy of Mozilla statically linked to your application. We don't do this. Somehow, we find the installed Mozilla and use it. We work with Mozilla versions as old as 1.2 all the way up to 1.8, which was the latest at the time this article was written. The Linux C++ binary format changed somewhere along the way, breaking us. We detect the right version dynamically and load one of two identical libraries, compiled differently. Wow.

Advanced graphics on Linux uses the cairo graphics library but this doesn't ship with the current Linux distributions, so we compile and ship our own, just in case. On the fly, we detect whether you already have cairo installed and use that one. Otherwise, we use the one we compiled. It's magic.

The Macintosh comes with two complete and separate native toolkits, carbon and cocoa. We use carbon, the C based API. However, new features in the operating system are very often only available in cocoa. Cocoa is Objective C, not C. Safari, the Mac browser, is written in cocoa. So we call it, leading to wild clipping problems and all sorts of amazing interop issues that we work around. Did I mention that Macintosh controls didn't clip against each other until a few years ago so we had to roll our own sibling and parent clipping code?

AWT is a free threaded widget toolkit that comes with its own event loop. On Windows, we hook into the event queue and cross post keystrokes. On Motif and GTK, life is easier: we implement the XEmbed protocol, which is a bunch of low level X windows calls. On the Mac, after being broken for ages, we got help from Apple. The Mac operating system has no concept of multiple event loops. Somehow, Apple found a way to make it work. The icing on the cake is that AWT is implemented in cocoa, not carbon.

On the Java side, we need 1.2 VMs or greater and run on J2SE and J2ME. If you recompile the native libraries, we'll run on 1.1.8.

My gawd.