As stated in my previous post, our supercomputer uses GPUs to perform enormous numbers of calculations quickly. Naturally, we want to use as much of that power as possible at once. Parallelism, doing many calculations at the same time, is how we put this power to work. The idea is familiar from everyday work: to finish almost any large, complex job, you subdivide it so that many people can work on it at the same time. However, someone has to act as the manager, handing out tasks to the workers. For our supercomputer, that manager is SGE (Sun Grid Engine). Jobs for the compute nodes are submitted to SGE, and the engine delegates them to the appropriate nodes.
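To make the manager-and-workers idea concrete, here is a minimal Python sketch. It is only an analogy, not how SGE is implemented: a hypothetical `manager` function splits one big job (summing squares over a range, a stand-in for real GPU work) into chunks, hands each chunk to a worker from a thread pool, and combines the partial results. Threads play the role of compute nodes here; in CPython they won't actually speed up pure-Python arithmetic, so the point is the structure rather than the performance.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares(start: int, stop: int) -> int:
    """One worker's share of the job: sum i*i for i in [start, stop).
    Hypothetical task standing in for real GPU work."""
    return sum(i * i for i in range(start, stop))


def split_job(n: int, workers: int) -> list[tuple[int, int]]:
    """Subdivide the range [0, n) into roughly equal chunks, one per worker."""
    size = -(-n // workers)  # ceiling division
    return [(lo, min(lo + size, n)) for lo in range(0, n, size)]


def manager(n: int, workers: int = 4) -> int:
    """Act as the 'manager': hand each chunk to a worker, then combine results."""
    chunks = split_job(n, workers)
    starts = [lo for lo, _ in chunks]
    stops = [hi for _, hi in chunks]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(sum_of_squares, starts, stops)
    return sum(partials)


if __name__ == "__main__":
    print(manager(1_000))
```

The answer is the same no matter how the range is split, which is exactly what lets a manager hand the pieces to different workers.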
Parallel computing also brings challenges, such as scheduling. Some computations depend on the results of others and cannot start until those finish, while others try to access the same data at the same time. SGE helps us manage these problems by scheduling jobs for us.
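The first problem, jobs that depend on other jobs, amounts to ordering a dependency graph. The Python sketch below uses the standard library's `graphlib.TopologicalSorter` to group a set of hypothetical jobs into "waves": every job in a wave can run in parallel because everything it depends on finished in an earlier wave. This illustrates the idea only and is not SGE's internal algorithm; the job names are made up. (In SGE itself, a job can be told to wait for others with the `-hold_jid` option of `qsub`.)

```python
from graphlib import TopologicalSorter

# Hypothetical jobs: each key maps to the set of jobs it must wait for.
JOBS = {
    "preprocess": set(),
    "simulate_a": {"preprocess"},
    "simulate_b": {"preprocess"},
    "analyze": {"simulate_a", "simulate_b"},
}


def schedule_in_waves(jobs: dict[str, set[str]]) -> list[list[str]]:
    """Group jobs into waves; all jobs in one wave can run in parallel
    because their dependencies finished in an earlier wave."""
    sorter = TopologicalSorter(jobs)
    sorter.prepare()  # raises graphlib.CycleError if dependencies form a loop
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)  # pretend the whole wave has finished
    return waves


if __name__ == "__main__":
    for i, wave in enumerate(schedule_in_waves(JOBS), start=1):
        print(f"wave {i}: {', '.join(wave)}")
```

Here `simulate_a` and `simulate_b` land in the same wave, so a scheduler is free to run them on different nodes at the same time, while `analyze` has to wait for both.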
This is the control panel for SGE. As you can see, from this panel we can submit and modify jobs, configure scheduling, and control the queue. The queue is central to delegating work: jobs wait in the queue, like people standing in line, and are executed in roughly the order they arrived as resources become available. SGE manages the queue for us and sends each job to be executed at the appropriate time. Overall, SGE is a huge help to us as users: it takes on the burden of juggling all these jobs and delegates them for us.
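A simplified first-in, first-out queue shows the basic idea. In this Python sketch, jobs wait in a `deque` and are handed, in arrival order, to whichever node still has a free slot; once every slot is taken, the remaining jobs stay queued. The `Node` class, slot counts, and job names are hypothetical, and SGE's real scheduler also weighs priorities and resource requests, so treat this as an illustration of "waiting in line" rather than a model of SGE.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical compute node that can run a fixed number of jobs at once."""
    name: str
    slots: int
    running: list[str] = field(default_factory=list)

    def has_room(self) -> bool:
        return len(self.running) < self.slots


def dispatch(queue: deque[str], nodes: list[Node]) -> list[tuple[str, str]]:
    """Hand queued jobs to nodes in first-in, first-out order.
    Stops as soon as no node has a free slot; the remaining jobs keep waiting."""
    assignments = []
    while queue:
        node = next((n for n in nodes if n.has_room()), None)
        if node is None:
            break  # every slot is busy; leave the rest in the queue
        job = queue.popleft()  # oldest job first
        node.running.append(job)
        assignments.append((job, node.name))
    return assignments


if __name__ == "__main__":
    queue = deque(["job1", "job2", "job3", "job4", "job5"])
    nodes = [Node("node01", slots=2), Node("node02", slots=2)]
    for job, node_name in dispatch(queue, nodes):
        print(f"{job} -> {node_name}")
    print("still waiting:", list(queue))
```

With two nodes of two slots each, the first four jobs are dispatched and `job5` waits until a slot frees up.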
Until next time,