November 21, 2010 10:17 AM
The HPC is not really one computer; it’s three acting as one. Together they form a cluster: multiple computers that work together as a single logical machine. Clustering is the primary method for constructing supercomputers today, but it has its trade-offs. For the computers to work together seamlessly, a lot of information has to be shared among them. Essentially, they each need the same files and user accounts. The HPC is configured with Rocks, a distribution of the Linux operating system specialized for cluster environments. When all goes according to plan, Rocks provides tools that keep this information in sync across every node (computer) in the cluster.
We have done a lot of configuration work on the HPC so far: setting up user accounts, granting appropriate permissions, installing necessary programs, and so on. Something went amiss, and Zachary’s user account found its way to Limbo. We aren’t exactly sure how it happened, but our best guess is that his account was not properly synced throughout the cluster. Little anomalies like this one give an idea of the complexity that comes with using a cluster.
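To make the "not properly synced" idea concrete, here is a minimal sketch of how one could spot an account that exists on some nodes but not others. It is only an illustration, not how Rocks does it: the function name `missing_accounts`, the node names, and the account lists are all made up for the example.

```python
def missing_accounts(accounts_by_node):
    """Map each node to the accounts that exist elsewhere in the cluster but not on it.

    accounts_by_node: dict of node name -> set of account names (hypothetical data).
    """
    # Every account seen on any node in the cluster.
    all_accounts = set().union(*accounts_by_node.values())
    report = {}
    for node, accounts in accounts_by_node.items():
        missing = all_accounts - accounts
        if missing:
            report[node] = sorted(missing)
    return report


# Hypothetical three-node cluster where one node never received an account.
cluster = {
    "frontend": {"root", "zachary"},
    "compute-0-0": {"root", "zachary"},
    "compute-0-1": {"root"},
}
```

Calling `missing_accounts(cluster)` on this made-up data would flag `compute-0-1` as the node missing the `zachary` account, which is the kind of mismatch we suspect happened.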
Even though the HPC doesn’t like Zachary, we still think he’s an OK guy.
Till We Meet Again,
November 10, 2010 3:59 PM
As stated in my previous post, our supercomputer uses GPUs to crunch massive amounts of numbers quickly. Naturally, we want to use as much of this power as possible at once. Parallelism, doing many calculations at the same time, is how we tap into it. The idea is the same one behind any large, complex project: the job is subdivided so that many people can work on it at the same time, and a manager delegates the pieces to the workers. The Sun Grid Engine (SGE) is the manager for our supercomputer. Jobs for the computing nodes are submitted to SGE, and the engine delegates them to the appropriate nodes.
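The manager-and-workers idea can be sketched in a few lines of Python. This is a toy illustration on a single machine, not SGE itself: Python's standard `ThreadPoolExecutor` stands in for the manager, and the function names (`sum_of_squares`, `split`, `run_job`) and the default of three workers are assumptions chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares(chunk):
    """One worker's share of the job."""
    return sum(x * x for x in chunk)


def split(numbers, parts):
    """Divide the job into roughly equal chunks, at most one per worker."""
    size = max(1, (len(numbers) + parts - 1) // parts)
    return [numbers[i:i + size] for i in range(0, len(numbers), size)]


def run_job(numbers, workers=3):
    """Subdivide the job, hand the pieces to workers, and combine the results."""
    chunks = split(numbers, workers)
    # The executor plays the manager: it hands each chunk to a free worker.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum_of_squares, chunks))
    return sum(partial_sums)
```

For example, `run_job(list(range(10)))` splits ten numbers into three chunks, lets each worker sum the squares of its chunk, and adds the three partial results together.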
Parallel computing brings its own challenges, chief among them task scheduling. Some computations cannot start until another has finished, and some try to access the same information at the same time. SGE helps us by scheduling jobs to avoid these problems.
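The first of those problems, one computation waiting on another, amounts to putting jobs in an order that respects their dependencies. Here is a minimal sketch using `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or newer). The job names and the `execution_order` helper are hypothetical, and this is not a description of how SGE schedules internally.

```python
from graphlib import TopologicalSorter


def execution_order(dependencies):
    """Return job names in an order that respects every dependency.

    dependencies: dict of job name -> set of jobs that must finish first.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical pipeline: each job needs the previous one to finish first.
jobs = {
    "load_data": set(),
    "simulate": {"load_data"},
    "analyze": {"simulate"},
    "plot": {"analyze"},
}
```

With this made-up pipeline, `execution_order(jobs)` starts with `load_data` and ends with `plot`, since each job depends on the one before it. If the dependencies ever formed a loop, `TopologicalSorter` would raise a `CycleError` instead of returning an order.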
From the SGE control panel we can submit and change jobs, configure a schedule, and control the queue. The queue is an important part of delegating jobs: jobs are placed in the queue, like a line, and are executed in that order. SGE manages the queue for us and dispatches each job at the appropriate time. Overall, SGE is a huge help to us as users because it takes on the work of tracking and delegating all these jobs.
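The "line" behavior of a queue is easy to sketch: first in, first out. The class below is a hypothetical toy, not SGE's actual queue, and the node names are made up. It only shows jobs waiting in order and being handed to nodes as they become free.

```python
from collections import deque


class JobQueue:
    """A first-in, first-out line of jobs waiting for a free node."""

    def __init__(self):
        self._waiting = deque()

    def submit(self, job_name):
        """Add a job to the back of the line."""
        self._waiting.append(job_name)

    def dispatch(self, free_nodes):
        """Assign waiting jobs to free nodes, oldest job first."""
        assignments = []
        for node in free_nodes:
            if not self._waiting:
                break  # no more jobs waiting
            assignments.append((node, self._waiting.popleft()))
        return assignments

    def pending(self):
        """Jobs still waiting, in the order they will run."""
        return list(self._waiting)
```

If three jobs are submitted and only two nodes are free, `dispatch` hands out the two oldest jobs and the third stays in `pending()` until another node opens up.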
Until next time,