Thursday, July 16, 2009

Thursday - Group Collaboration

Thursday's events were primarily focused on a team exercise intended to showcase our ability to use the technologies we had learned about during the previous two weeks. The opportunity was given not so much for ourselves as for the staff, so they could see what the individuals had learned from the presentations and perhaps offer feedback for the next cycle of the ISSGC.

Having said that, I'll mention briefly that the students had already been organized into teams for the scavenger hunt, but due to some unintentional "weighting" of my original team, the teams were restructured. Our original team was made up solely of OSG participants, and as a team we thought this was unfair to the spirit of the competition, so we made sure to mention it to the staff. The new teams were therefore given to us almost just-in-time for this competition. Nevertheless, we banded together as teams and set off to work when the assignment was given to us.

So let me set up the "what happened here" so everyone can get a glimpse. The morning of the competition we gathered in the main auditorium and David set the stage for what we were about to be given. We had approximately 24 hours to complete the task, which was to find each of six "pillars", each focused on a given technology. Each pillar could only be found with the technology in question; for instance, the gLite pillar could not be found with the Condor toolkit.

The pillars were arranged in a 2D space on a Cartesian grid running from -10,000 to 10,000 on each axis. Each pillar bore a plaque containing a word, and once found, the word was to be keyed into the scoring system along with the coordinates where it was found. The executable used to find each word was a pre-compiled jar given to us by the project leads and listed on the appropriate technology's page. Some pretense was made that the experiment was in 3D space, but I've done the exercise; it was 2D only.

Initially the students were given a database and asked to write an app to retrieve values from it; those values were clues to help find some of the initial pillars. Most of the teams got those clues pretty quickly and were off to the next part: finding the pillars themselves.
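
For the curious, the clue step boiled down to little more than a simple query. Here's a minimal sketch in Java, with the caveat that the connection URL, credentials, and table/column names are hypothetical stand-ins; I no longer have the real schema.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ClueFetcher {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details -- the real ones were handed out at the school.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://dbhost.example.org/issgc", "student", "secret");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT clue_text FROM clues")) {
            while (rs.next()) {
                System.out.println(rs.getString("clue_text"));   // print each clue
            }
        }
    }
}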

Ok, so now I'm going to drop out of narrative and I'm going to get into "Cole's point of view" so this will be totally subjective and might even upset people. Guess what, it's a blog. If you don't like it, I don't know what to tell you. But consider yourself warned. I was not impressed, so this is likely to be somewhat inflammatory.

The point of the project was to find the pillars by searching a region of "space" and, once a pillar was found, to dig down until you found the plaque. The way it should have worked was like this:

java -jar <pillar jar> x1 y1 x2 y2 scale

This generates a region (think of a piece of graph paper) with a number of cells equal to (x2-x1)*(y2-y1)*(1/scale), so if you chose

-10 -10 10 10 1

you would get a region with 400 cells, and then you get back one of three results:
These are not the droids you are looking for, move along (ok, humor aside: no pillar here)
Found something interesting, can't see it
Hey, writing! (followed by one or more letters)
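
To make a single probe concrete, here's a rough sketch of how one jar run could be wrapped in code. To be clear, the jar name, the exact output phrases, and the string matching below are my paraphrases and assumptions, not the real tool's interface.

import java.io.BufferedReader;
import java.io.InputStreamReader;

public class Probe {
    enum Result { NOTHING, SOMETHING, WRITING }

    // Run one probe of the region (x1,y1)-(x2,y2) at the given scale and
    // classify the answer. "pillar-search.jar" and the matched phrases are
    // stand-ins for whatever the real tool was called and actually printed.
    static Result probe(int x1, int y1, int x2, int y2, double scale) throws Exception {
        Process p = new ProcessBuilder("java", "-jar", "pillar-search.jar",
                String.valueOf(x1), String.valueOf(y1),
                String.valueOf(x2), String.valueOf(y2),
                String.valueOf(scale))
                .redirectErrorStream(true)
                .start();
        StringBuilder out = new StringBuilder();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                out.append(line).append('\n');
            }
        }
        p.waitFor();
        String text = out.toString().toLowerCase();
        if (text.contains("writing")) return Result.WRITING;       // letters found
        if (text.contains("interesting")) return Result.SOMETHING; // pillar here, no text yet
        return Result.NOTHING;                                      // empty space
    }
}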

Behind the scenes, the results were stored in one of five files (one per pillar), each holding the pillar's location (its top-left corner), its height and width, and lastly the plaque text.

To examine a pillar, the jar would read that data (every time, and as I'll get to in a moment, that's a lot of reads, so a lot of disk I/O) and then compute some random noise to fill a grid of the size specified in the inputs. The problem is that it would also sometimes fill the cell in question with noise, so given the same inputs to the same technology it was possible to get different results (trust me, I repeated it, and I got different results for the same inputs). Now, do I expect every program to always give me perfect results? No. But if I'm doing statistical analysis against a rowset returned from a database, or if I'm running a grep for a string, I don't expect to get different results every time, provided the inputs are consistent and there's no room for network errors. I'll accept that those pieces may be flawed and that one of them could have been the problem. But for something this simple (I decompiled the JAR, I know what the code did), I would've expected consistent results, which I didn't get.

In a perfect world, a person could take an example like the one above, pick reasonably sized chunks, and check whether a pillar showed up in that region. If not, expand the region slightly and search again. If so, the difference between the two regions is where a pillar has to be. Repeat and refine until the pillar is pinned down. This is basic geometry.
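
There's more than one way to do the bookkeeping; the quadrant-splitting sketch below is just one reading of that idea, built on the hypothetical Probe wrapper above, and it is emphatically not the official solution.

public class BoundingSearch {
    // Coarse-to-fine search: probe a region at a coarse scale and only
    // subdivide the quadrants that report a hit. One reading of the
    // "bounding" approach, using the hypothetical Probe.probe() from above.
    static void search(int x1, int y1, int x2, int y2) throws Exception {
        if (Probe.probe(x1, y1, x2, y2, coarseScale(x2 - x1, y2 - y1)) == Probe.Result.NOTHING) {
            return;                               // nothing here: prune the whole region
        }
        if (x2 - x1 <= 20 && y2 - y1 <= 20) {
            // Small enough: probe at full resolution here and read the plaque letters.
            System.out.printf("Pillar somewhere in (%d,%d)-(%d,%d)%n", x1, y1, x2, y2);
            return;
        }
        int mx = (x1 + x2) / 2, my = (y1 + y2) / 2;
        search(x1, y1, mx, my);                   // recurse into the four quadrants
        search(mx, y1, x2, my);
        search(x1, my, mx, y2);
        search(mx, my, x2, y2);
    }

    // Pick a scale that keeps each probe to roughly 400 cells under the
    // cell-count formula quoted above (how the real jar interpreted "scale"
    // is itself an assumption here).
    static double coarseScale(int width, int height) {
        return Math.max(1.0, (double) width * height / 400.0);
    }

    public static void main(String[] args) throws Exception {
        search(-10000, -10000, 10000, 10000);     // the full -10,000..10,000 grid
    }
}

The point of this shape is the job count: each pillar needs roughly a few dozen probes this way (a few more if you re-probe to guard against the noise I described above), not hundreds of thousands of batch submissions.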

However, and this is where I got disillusioned, the students would instead form one large region with one really small scale and then send batch jobs with those inputs off to be farmed against the data. And they would do that with 800 batch jobs or more at one time, for each team, on each technology.

Let me repeat that, if I may. Five teams submitted five technologies' worth of jobs at 800+ runs per batch against a server, creating hundreds of thousands of jobs, and each job had a file apiece for stdin, stdout and stderr. They were therefore creating 300,000-odd output files, plus the inputs to run them in the first place, so 400,000-odd very small files landed on systems that don't do well with lots of small files in one directory. So many jobs were submitted individually that the submission system effectively fork-bombed itself with threads, and so many connections were opened to the test server that the system had to be rebooted.

This was not what we were taught to do at the school, and it was not a reasonable way to use the system. No one in the real world would fire 30,000 live job submissions into a queue, creating over 100,000 files in a single directory, just to look for a bit of info in a database. Of course, real databases don't normally return garbage data across multiple runs of the same inputs. They may occasionally, and that's fine, but that's where you test three times, look for two consistent answers, and accept that answer.
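
That retry discipline is trivial to express; here it is in the same hypothetical terms as the earlier sketches (again, Probe.probe() is my stand-in for a single jar run, not the real interface).

import java.util.HashMap;
import java.util.Map;

public class BestOfThree {
    // Run the same probe up to three times and accept any answer that shows
    // up twice: the "test three times, look for two consistent answers" rule.
    static Probe.Result agreedResult(int x1, int y1, int x2, int y2, double scale)
            throws Exception {
        Map<Probe.Result, Integer> votes = new HashMap<>();
        for (int attempt = 0; attempt < 3; attempt++) {
            Probe.Result r = Probe.probe(x1, y1, x2, y2, scale);
            if (votes.merge(r, 1, Integer::sum) >= 2) {
                return r;             // two runs agree: good enough
            }
        }
        return null;                  // three different answers: flag it for a human
    }
}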

So, let's say I'm the one out of line. Consider: the test systems were set up with no limits (30k+ active threads at one time was never anticipated), even though live systems would have had exactly those limits; the students didn't stop to consider what the problem actually was, they went straight for brute force; and then there's the matter of the queues.

The system was set up so that gLite, Globus and Unicore (IIRC) all used a single PBS scheduler. Three toolkits were competing for one hardware resource, with thousands of submissions against each queue. In a real-world environment those queues wouldn't have been swamped with that volume of requests: gatekeeper software would've prevented that kind of rapid-fire submission, and the meta-schedulers would've sent different requests off to different PBS queues in the first place.

So even if it had been a valid real-world exercise, nobody would submit a few hundred thousand jobs against two simple queues on two small clusters; that workload would've been submitted to the grid at large.

But also remember that each job had to read from the same set of data files, and there were only five of them, so either the files got cached in memory by the filesystem (plausible but unlikely) or the disks got thrashed. I'll never know for sure. But triggering a few hundred thousand reads against five files is not a sane activity; triggering a few thousand reads against a few hundred files is. Does anyone see what I mean?

So, while I was quite pleased with the assignment, and while it would've been fun as a group collaboration, most of the participants did not seem to understand what the goal was, nor how to attack it.

I personally did it the way it should've been done and saw that it took just a few submissions to get results, and I watched Ben Clifford (one of the moderators) find his first pillar in about three minutes without any brute-force approach. But when I spoke with many of the students, they didn't see why an elegant bounding approach was the best way to do it. They felt that submitting hundreds of thousands of brute-force attempts was sufficient.

Because of this, because there weren't sufficient technological safeguards put in place, and because the code returned inconsistent results on the same inputs (and no, don't ask me for them now, it's been too many days since I did it, but I did bitch at the time and nobody asked me for those verification inputs), I was disillusioned with the experiment.

Now, having said ALL of that, let me finish my post with this. I learned a lot more that day than you might think. I am very, very grateful to the organizers for setting it up. I had a lot of fun working with the advisers to solve the problems. And I'm glad I got the chance to play with real tech on a reasonable problem.

Ok, where do I need to clarify my points? Feedback people, feedback!
