Feb
28
2008
0

On operating systems

I was going to write a post about GSU’s Kafkaesque search for its “level of comfort”, but realized that I could actually say something positive and useful.

There is a recent discussion at the “Coding Horror” blog about Ruby and OS X vs. Windows, which reminds me of my efforts to get students to be comfortable with something other than Windows (SOTW).  There are two central facts of life for software development (not quite computer science, but certainly computer engineering or software engineering – so there).

  1. Windows, in some variation, is the dominant operating system by number of installations.
  2. There are (essentially) no Windows supercomputers.

Since the research interests of my lab focus on questions of scientific computation and machine learning in a bioinformatics and systems biology context, the choice of operating system is somewhat irrelevant – except  when the operating system gets in the way of actually getting work done.

This turns out to be more of a problem with Windows than with a Linux or Unix variant.  Windows used to be strictly POSIX compliant, which meant that you had a chance – if you followed the standards and didn’t do anything weird – of writing portable software on a Windows machine.  The development versions of my molecular modeling software were written on a Windows 3.1 machine using Borland’s IDE and trivially ported to other SOTW systems (and even to Microsoft’s own compilers).  The key was to avoid the MFC and other extensions that lock the software to a given operating system, and to use object structure to encapsulate the dependence when it was unavoidable (as in a GUI).  (As an aside, I learned this the hard way with FORTRAN on VAX/VMS, where the DEC extensions ensured no one else could use your program.)  Microsoft has made this adherence to standards harder and harder with each release of Visual C++.

This wouldn’t matter if computational performance were not an issue in my work (and in the real world for that matter).  Part of what we do is to explore different models for computation.  This means different hardware, including heterogeneous and grid environments.  Part of what we do is large-scale computation, and this means supercomputers.  Neither of these is supported by the Windows platform.

So it is necessary to be able to move between operating systems and to be conversant with the tools in them.

This is just as critical as being able to program in several computer languages.

It is an important educational goal to make sure that the students can use SOTW.  Not because of any inherent strength of any individual SOTW, or because of “software freedom”, but simply because no respectably educated computer scientist should know just one operating system.

Feb
25
2008
0

Another Trail Map

Still getting ready for Philmont with another hike Sunday afternoon.
red top mountain trail

Written by Rob in: trail map,Uncategorized |
Feb
20
2008
1

Exploring Distributed Programming

I’ve been digressing into management speak and similar platitudes lately. It’s time for some more meat.

One of the big issues in performance computing is the ability to effectively use multiple processors. My group has several projects where this is a critical issue – ranging from building a protein model server, to simulating large sets of coupled ODEs for metabolomics and systems biology, to using heuristic optimization algorithms (genetic, simulated annealing and particle swarm) for finding good approximations to minima and tuning potentials.

Standard approaches like MPI tend to be fragile and hardware dependent. Many implementations of MPI, for example, tend to put constraints into the software design. (It can be difficult to run a client that has a different name/source from the server in a standard client-server architecture). First and second generation message passing architectures were designed with an assumption of hardware and network reliability that leads to failures on real hardware. Pthreads is fine, but it is easy to tie up your program with mutexes.

So we’ve been exploring alternatives (think, if you like, BOINC--).

  1. Reactive Programming. We implement a set of network daemons that wait for input, process it, and pass it on to the swarm. The fundamental assumption is that computer time is essentially free. This is somewhat tricky to program, and can have loops, but other than a daemon crashing, it is very robust and easy to tune to loads. With a bit of work, we can make it catch daemon and communication failures.
  2. Functional Programming. Functional languages like Erlang prevent side effects and are therefore easy to take into parallel. We’ve explored using Map/Reduce in Erlang to implement the solution of sets of ODEs and found that it can be quite efficient (4-fold speedup on 4 processors with essentially no dependence on problem size), especially on symmetric multiprocessors and multicore machines.
  3. Ad hoc approaches. MPICH can be implemented as a set of Python programs that invoke UNIX calls like rsh and rexec. The same thing can be done for general problems.
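
The Map/Reduce idea in item 2 can be sketched in a few lines. This is not our Erlang code; it is a minimal Python illustration under stated assumptions: the ODE system is a made-up linear chain (each variable is fed by the previous one and decays at rate k), the integrator is plain forward Euler, and the names chunk_derivative and euler_step are hypothetical. The point is only that the derivative evaluation has no side effects, so any map (serial or parallel) can be dropped in.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def chunk_derivative(task):
    """Map phase: derivatives for one block of a toy coupled linear chain.

    Assumed model: dy_i/dt = k*y_(i-1) - k*y_i, with no inflow for i = 0.
    The function only reads its arguments, so blocks are independent.
    """
    start, stop, state, k = task
    out = []
    for i in range(start, stop):
        inflow = k * state[i - 1] if i > 0 else 0.0
        out.append((i, inflow - k * state[i]))
    return out


def euler_step(state, dt, k, n_chunks, mapper=map):
    """One forward Euler step; `mapper` may be map or an executor's map."""
    n = len(state)
    bounds = [(c * n // n_chunks, (c + 1) * n // n_chunks) for c in range(n_chunks)]
    tasks = [(a, b, state, k) for a, b in bounds]
    partial = mapper(chunk_derivative, tasks)                   # map phase
    derivs = reduce(lambda acc, part: acc + part, partial, [])  # reduce phase
    new_state = list(state)
    for i, d in derivs:
        new_state[i] += dt * d
    return new_state


if __name__ == "__main__":
    y0 = [1.0, 0.0, 0.0, 0.0]
    with ThreadPoolExecutor(max_workers=2) as pool:
        print(euler_step(y0, dt=0.1, k=1.0, n_chunks=2, mapper=pool.map))
```

Because the map phase is a pure function of its inputs, swapping the thread pool for processes or remote workers changes only the mapper argument, which is the property that makes the functional approach easy to take into parallel.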

The big headache with all these approaches is the effect of message overhead and delays. These tend to make performance worse than the classic Amdahl’s law limit – and it is rather easy to show that in the limit of many processors and small task/processor they cause the program to take more time than the scalar program.
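
A rough way to see this is to add an overhead term to Amdahl’s law. The sketch below assumes, purely for illustration, that communication cost grows linearly with the number of extra processors; real overhead depends on the network and the algorithm, and the function names and numbers are hypothetical.

```python
def parallel_time(t_scalar, serial_fraction, n_proc, overhead_per_proc=0.0):
    """Run time on n_proc processors: Amdahl's law plus a message overhead.

    Assumption: overhead is overhead_per_proc * (n_proc - 1), so a single
    processor pays nothing and the model reduces to the scalar run time.
    """
    compute = t_scalar * (serial_fraction + (1.0 - serial_fraction) / n_proc)
    overhead = overhead_per_proc * (n_proc - 1)
    return compute + overhead


def speedup(t_scalar, serial_fraction, n_proc, overhead_per_proc=0.0):
    """Speedup relative to the scalar program under the model above."""
    return t_scalar / parallel_time(t_scalar, serial_fraction, n_proc, overhead_per_proc)


if __name__ == "__main__":
    # A perfectly parallel job (serial fraction 0) with a small overhead.
    for p in (1, 10, 100, 1000):
        print(p, round(speedup(1.0, 0.0, p, overhead_per_proc=0.01), 3))
```

With zero overhead the model is the usual Amdahl limit. With any positive overhead the compute term shrinks like 1/p while the overhead term grows like p, so the speedup peaks and then drops below 1 once there are enough processors.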

For example, take molecular dynamics. Molecular dynamics is the simulation of atomic and molecular motion under the assumption that a classical mechanics approximation is accurate. It is an example of a Markov chain where the chain transition is defined by the forces and velocities in Newton’s equations of motion. The slow step in molecular dynamics is the O(N^2) calculation of the nonbonded and electrostatic interactions. There are a number of fast algorithms for this (our program AMMP uses the fast multipole with amortization) that can speed it up, and these can be implemented in parallel. However, we found that the simplest approach is to run multiple simulations. We get 100% speedup and, more importantly, because multiple chains have statistical significance (a single Markov chain has a measure of size zero), we can use them to estimate when the simulation is sufficiently accurate to be useful.
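
The multiple-chain idea can be illustrated with a toy example. This is not AMMP or molecular dynamics: the "simulation" below is an assumed stand-in (a simple autoregressive random walk), and the accuracy estimate is just the standard error of the per-chain averages. The names run_chain and combine_chains are hypothetical.

```python
import math
import random
import statistics


def run_chain(seed, n_steps=2000, a=0.9):
    """Toy stand-in for one simulation: x <- a*x + Gaussian noise.

    Returns the time average of x, playing the role of a computed observable.
    Each chain has its own seed, so chains share nothing and can run anywhere.
    """
    rng = random.Random(seed)
    x, total = 0.0, 0.0
    for _ in range(n_steps):
        x = a * x + rng.gauss(0.0, 1.0)
        total += x
    return total / n_steps


def combine_chains(chain_means):
    """Mean over chains and its standard error (needs at least two chains)."""
    mean = statistics.mean(chain_means)
    stderr = statistics.stdev(chain_means) / math.sqrt(len(chain_means))
    return mean, stderr


if __name__ == "__main__":
    means = [run_chain(seed) for seed in range(8)]
    mean, stderr = combine_chains(means)
    print(f"observable = {mean:.3f} +/- {stderr:.3f}")
```

A single chain gives one number with no error bar. Several independent chains give a spread, and the spread shrinking below a chosen tolerance is one simple signal that the runs are long enough.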

Written by Rob in: engineering |
Feb
18
2008
1

Leadership Issues

I was just at an Eagle court of honor for one of my sons’ troops and realized that there really is something to this ‘leadership’ business. So here are a few short thoughts in between grant writing and classes.

  • Vision – it is necessary to have a target, an aiming point or a vision of where you want to go. It is necessary to tell people, like your students or colleagues about it and help them to understand it. You may even need to incorporate their feedback in it. The vision then serves as a tool to help you decide what is correct. If your vision is that “the computer science department should have a first-rate research program and give a good education”, then policies can be judged by whether they further that vision or subtract from it.
  • Courage – you need to have the courage, backbone or intestinal fortitude to stick with the vision and the policies which implement it. You cannot please everyone, but if you are following a rational plan that is based on a defined and agreed-upon vision, then while not everyone will like or agree with it, at least they will understand it.

It’s probably legitimate to ask what my vision is for my group.   It is simple: “Outstanding research in computational biology and bioinformatics targeted towards helping address serious health problems”

Written by Rob in: laboratory practice,pedagogy |
Feb
06
2008
0

Project Ownership

One of the trickiest things to do is to get your students to “take ownership” of their projects.  Typically their lives are so full of distractions – classes, TA’ing, projects, family and social life – that actually getting on with dissertation work seems to take a back seat.  Too often they never latch on to an idea and run with it.  It is possible to bind a bunch of unrelated papers into a dissertation and get through the system, but this misses the major point of the Ph.D. – learning how to formulate and pursue a research project.

Watching this ownership take place is one of the most rewarding experiences in teaching graduate students.  I’m not sure how to make this happen (and to a large extent it depends on the quality of the student), but here are a few pointers that sometimes work:

  • Communication – get the student to report regularly.  Even if they haven’t made progress, the fact that they are talking keeps things moving.  Very often, your ‘high-level’ examination of the problem can help them figure out what is not working.
  • Clear Chain of Command – (sounds military, doesn’t it?)  The students have to know who to talk to first.  They must understand what they are doing, why they are doing it and who will assess their progress.  If a student for whom you are not the adviser comes to you for advice, make sure that you communicate with the student’s adviser as well so that he knows what is going on.
  • Clear Goals – Students very often have the wrong reasons for doing things.  “Your group is friendly”, “SVM’s <insert your hot topic here> are what everyone else is using”, “There is a meeting in Hawaii”, “I want a job (there are easier ways to make a living than science)” … The only valid reasons are those of the form “This is an interesting and important topic” or “I want to learn to work in this field”.  Good students will have the maturity to form clear and sufficient goals, and/or let you help them form clear goals.
  • Mutual Respect – Both the student and their adviser must respect each other.   You need to be able to seriously discuss disagreements, and both parties must be able to trust that the other side has a valid viewpoint.
  • Be willing to be forceful – this seems to run counter to the last point, but it is important to make it clear who is in charge.   Occasionally students ‘digress’ into non-academic pursuits or non-productive activities and they need to know that this isn’t acceptable.
  • Be Jealous of their Efforts – Many graduate classes are project-based and these can evolve into papers.  In its own right this isn’t bad, but the student isn’t learning how to formulate a problem if he/she is doing what the instructor says to do.  I think of these projects as vampires or leeches that subtract from the student’s efforts, and move them away from what should be the central focus of their lives – namely getting a coherent body of work finished and graduating.

Written by Rob in: pedagogy |
Feb
06
2008
1

Another trail map

Just a short post with another trail map, from preparing for a Philmont trip (http://www.scouting.org/philmont). As with everything needed for success in science, practice and preparation are critical.
path on stone mountain

Written by Rob in: trail map,Uncategorized |
Feb
02
2008
0

A neat tool

Plagiarism is an ongoing issue when you teach at a university. Recently a study was published in Nature documenting the use of eTBLAST (http://invention.swmed.edu) to estimate the number of ‘problematic’ papers.

I immediately ran some of my more difficult former students’ papers through this. Fortunately, I might add, my best efforts to ensure that the writing and ideas were original were not in vain.

There are two points from this that will change the way I run my teaching and laboratory practice.

  1. From now on I will require electronic copies of class papers. I used to only want hard copy because of the potential of receiving Word viruses and oddball settings for my word processor (Why do so many students set their Word grammar checker to French? (really)). I’d love to require LaTeX, but that probably wouldn’t work.
  2. The tool can be turned around and used as a library research tool. Just as distant protein homologies share conserved small segments of sequence or sequence motifs (eTBLAST is derived from sequence alignment tools), logically related works will share common phrases and key words. Rather than using a somewhat arbitrary and restricted query based on what you think is important, using a large text segment from a paper or proposal with a dynamic programming tool will find significant matches about what really is important.
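
To make the dynamic programming idea in item 2 concrete, here is a minimal sketch of local alignment over words rather than residues. It is not how eTBLAST itself works; it is a Smith-Waterman style score with made-up match, mismatch and gap values, and the function names are hypothetical.

```python
def tokenize(text):
    """Lowercase and split on whitespace; a deliberately crude tokenizer."""
    return text.lower().split()


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between two word lists.

    Scores never drop below zero, so the best-scoring run of shared
    phrasing is found wherever it occurs in the two texts.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for word_a in a:
        curr = [0]
        for j, word_b in enumerate(b, start=1):
            diag = prev[j - 1] + (match if word_a == word_b else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


if __name__ == "__main__":
    query = tokenize("the quick brown fox")
    candidate = tokenize("a quick brown dog")
    print(local_alignment_score(query, candidate))
```

Ranking a collection of abstracts by this score against a long query passage would surface the ones that share the most phrasing, which is the library-search use suggested above.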

Neat tool.

Written by Rob in: laboratory practice,pedagogy |
