Nimbus (OS)
Linux Labs – Beowulf Distribution: Codename Nimbus
Cluster Management Overview
I. Architecture Overview
* The **Beostat** daemon has been replaced with the **supermon** utilities from LANL. This is a very lightweight __/proc__-based system that uses virtually no system resources, unlike its rather onerous predecessor.
* **Wakinyan** Monitor: A graphical monitor that saves screen space and also reports ambient temperature.
* **2.4.19 Linux kernel** for [...]-edge stability and the latest feature set (e.g. hyperthreading capability for Xeon-based clusters).
* **bproc** has been updated to the advanced LANL version with the following features:
* The unified **__P__**rocess **__ID__**entification (PID) space is more complete: bproc system daemons are fully hidden once a node boots.
- __OLD__: when a process spawns on a slave node, it initializes, then a new PID is issued
- __NEW__: all system processes disappear, and PIDs are global on all nodes
* Access control is now available on a node-by-node basis:
- User / Group / Other permissions (i.e. chmod ugo) are set on the slave nodes themselves.
- Each node's permissions are checked to decide whether a user is eligible to run jobs on it.
- This is useful in a shared cluster where not everyone can use all the nodes.
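The node-by-node access check described above can be sketched as follows. This is a minimal illustration, not bproc's actual implementation: the `Node` class, `may_run_on`, and `eligible_nodes` are hypothetical names, and the sketch assumes that the execute bit of the matching class (user, group, or other) is what grants job eligibility and that root is always allowed.

```python
from dataclasses import dataclass


# Hypothetical model of a slave node's ownership and mode bits.
# These names are illustrative only and are not part of bproc.
@dataclass(frozen=True)
class Node:
    number: int
    owner_uid: int
    group_gid: int
    mode: int  # octal permission bits, e.g. 0o110


def may_run_on(node, uid, gids):
    """Return True if the user may start a job on the node.

    Assumption: the execute bit of the first matching class
    (user, then group, then other) decides eligibility, and
    uid 0 is always allowed.
    """
    if uid == 0:
        return True
    if uid == node.owner_uid:
        return bool(node.mode & 0o100)
    if node.group_gid in gids:
        return bool(node.mode & 0o010)
    return bool(node.mode & 0o001)


def eligible_nodes(nodes, uid, gids):
    """Return the numbers of the nodes the user may run jobs on."""
    return [n.number for n in nodes if may_run_on(n, uid, gids)]
```

In a shared cluster, a scheduler or launcher could call `eligible_nodes` with the submitting user's uid and group set before placing a job, so that nodes reserved for another group are never selected.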
* **rarpcatcher** replaces **beosetup**
* The status info is put into __/etc/beowulf/config__
* This process runs on startup
* It runs continuously as a daemon, so when nodes are added on the fly, it sends a HUP to (restarts) the beowulf system, which then picks up the new nodes.
* **NEW**: node data is now lost only if your filesystem is completely trashed. This is due to the ext3 filesystem, with which we have experienced ZERO corruption in extensive testing. Older versions of the software took a rather cavalier attitude towards node filesystem data.
* **ALSO**: Regarding I. above, this makes booting much cleaner.
* **NOTE**: If one or more of your nodes has important data, issue a sync command before you power cycle
* **REMEMBER**: the only non-persistent data stored on nodes are the libraries and system files that are copied to the node at boot time.
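To illustrate why a __/proc__-based monitor such as the one mentioned in the architecture overview can be so lightweight, here is a minimal sketch of reading one status file. It is not supermon's code or API: `parse_loadavg` and `read_loadavg` are hypothetical helpers, and the sketch assumes the standard five-field layout of `/proc/loadavg`.

```python
def parse_loadavg(text):
    """Parse the contents of /proc/loadavg into a dict.

    Assumed layout: "load1 load5 load15 running/total last_pid".
    """
    fields = text.split()
    running, total = fields[3].split("/")
    return {
        "load1": float(fields[0]),
        "load5": float(fields[1]),
        "load15": float(fields[2]),
        "running": int(running),
        "total": int(total),
        "last_pid": int(fields[4]),
    }


def read_loadavg(path="/proc/loadavg"):
    """Read and parse the load averages from a Linux /proc file."""
    with open(path) as f:
        return parse_loadavg(f.read())
```

Each sample costs one small file read and a string split, with no extra kernel module or heavyweight polling process, which is the general idea behind a monitor that uses virtually no system resources.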