DEEP Testbed HOWTO
Peter A. H. Peterson <pahp@cs.ucla.edu>
Digvijay Singh <digvijay@ucla.edu>
The Basics
Using the DEEP software is described in the DeepQuickstart document. This page describes how to access and use the DEEP testbed.
Logging in to your DEEP node
You should have been emailed your username and password; this is the default password for all your DEEP-related accounts.
To access your DEEP nodes, first ssh to deeplab.cs.ucla.edu. From that host, you can ssh to your DEEP node using the username and password that were emailed to you. All group members have root access via the 'sudo' command. Your group can change the root password using the command sudo passwd root. You should give your TA root access so that they can troubleshoot problems.
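If your local machine has a reasonably recent OpenSSH client (7.3 or later), the two hops described above can probably be combined with the -J (jump host) option. The following is a minimal sketch, not an official part of the testbed setup: the function name deep_login_cmd, the username alice, and the node name are placeholders, and it assumes the node name resolves from the gateway host. It only prints the command so you can check it before running it.

```shell
#!/bin/sh
# Hypothetical helper: print the ssh command that reaches a DEEP node
# through the gateway host named in the text above.
GATEWAY="deeplab.cs.ucla.edu"

deep_login_cmd() {
    user="$1"   # username from your account email
    node="$2"   # hostname of your group's node, as seen from the gateway
    # -J tells the OpenSSH client to connect through the gateway first.
    echo "ssh -J ${user}@${GATEWAY} ${user}@${node}"
}

# Example: print the command for a placeholder user and node.
deep_login_cmd "alice" "leap0"
```

You will still be asked for your password on each hop unless you have set up ssh keys.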
Machines:
##
# shared compute server
##
leaplab.cs.ucla.edu  leaplab  131.179.192.242

# DEEP nodes
leap0.cs.ucla.edu    leap0    131.179.192.100   # CryptoDisk
leap1.cs.ucla.edu    leap1    131.179.192.101   # PowerSecZone
leap2.cs.ucla.edu    leap2    131.179.192.102   # CryptoFlex
leap3.cs.ucla.edu    leap3    131.179.192.103   # ElectricSandbox
leap4.cs.ucla.edu    leap4    131.179.192.104   # OffLoading

##
# peer machines -- which may not always be available
##
lpeer1.cs.ucla.edu   lpeer1   131.179.192.121   # PowerSecZone Peer
## lpeer2.cs.ucla.edu   lpeer2   131.179.192.122   # CryptoFlex Peer
## lpeer3.cs.ucla.edu   lpeer3   131.179.192.123   # ElectricSandbox Peer
## lpeer4.cs.ucla.edu   lpeer4   131.179.192.124   # OffLoading Peer
Backup policy / Exporting results
We do not provide ANY backup services on the DEEPs or any other machines. None.
If something bad happens on your machines, the data there will be lost. This means that you should be careful to back up your data to external machines, using ssh or rsync to copy it to non-deeplab machines (deeplab.cs.ucla.edu does not have a large amount of disk space). In addition to your code and instructions for recreating your environments (e.g., what packages you installed), you should also back up your sampled data -- not just your report output, since the original data is necessary for more advanced analysis. You don't need to keep the data-sync.txt file, as long as you keep the files data.txt, messages, and Energy_Caliper_Control_File for each experiment.
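Before copying an experiment off the node, it may help to confirm that the three files listed above are actually present. The following is a minimal sketch under assumptions: the function name check_experiment_dir, the directory layout (one directory per experiment), and the remote host in the usage comment are all hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: confirm an experiment directory holds the three
# files the text says must be kept, before it is copied off the node.
check_experiment_dir() {
    dir="$1"
    missing=0
    for f in data.txt messages Energy_Caliper_Control_File; do
        if [ ! -e "${dir}/${f}" ]; then
            echo "missing: ${dir}/${f}" >&2
            missing=1
        fi
    done
    # Returns 0 when all three files exist, 1 otherwise.
    return "$missing"
}

# Usage sketch (paths and remote host are placeholders, not real locations):
#   check_experiment_dir "$HOME/experiments/run1" &&
#       rsync -av "$HOME/experiments/run1/" user@backup.example.org:deep-backups/run1/
```

Running the check first means an incomplete experiment is noticed before the copy, while the data can still be regenerated.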
Hardware configuration
The hardware configuration of each LEAP node includes a 1 GHz Intel Atom 330 processor, 1 GB of 533 MHz RAM, a 160 GB hard drive, and integrated sound, network, and video on the 945GC Express northbridge. We have other hardware available for use, which can be installed on request to your TA.
More information can be found at the following links:
http://www.intel.com/p/en_US/support/highlights/server/d945gclf2 -- motherboard and CPU
http://www.intel.com/products/desktop/chipsets/945g/945g-overview.htm -- chipset
Software configuration
The software installed on the nodes is openSUSE 11.2, running Linux kernel 2.6.31.14-0.6-default. The distribution and kernel were chosen for compatibility with the National Instruments DAQ (sampler), so you are advised not to change them. Kernel changes may require rebuilding the modules for the DAQ; more information on this is available in the troubleshooting and FAQ sections.
The hard disk on the LEAP nodes includes an unused segment which can be formatted and used as desired by your group. Find instructions online for how to add partitions, format them, and mount them in Linux.
Basic troubleshooting
A few things go wrong from time to time; here are a few things to try if you're having issues.
If the sampler is already running, it can be killed by executing kill -9 PID where PID is the process ID of the start_sampling process. To find out the process ID, execute ps aux | grep start_sampling. Wait a few seconds before trying again.
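The lookup described above can be wrapped in a small filter so that the PID does not have to be read off by eye. This is only a sketch: the function name sampler_pids is hypothetical, and it assumes the standard ps aux column layout, where the PID is the second column.

```shell
#!/bin/sh
# Hypothetical helper: read `ps aux` output on stdin and print the PID
# (second column) of every line that mentions start_sampling.
# Lines containing "grep" are skipped; this also skips this awk command
# itself, because its program text contains that word.
sampler_pids() {
    awk '/start_sampling/ && !/grep/ { print $2 }'
}

# Usage sketch:
#   ps aux | sampler_pids               # list the PIDs
#   kill -9 $(ps aux | sampler_pids)    # kill them, then wait a few seconds
```

If no sampler is running, the second usage line passes no PID to kill, which then only prints a usage error.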
If the sampler just won't collect data, try unloading and reloading the synchronization probe. This can be done by executing:
$ sudo rmmod probe
$ sudo insmod /usr/atom_LEAP/code/sync/probe.ko
You can check to see if the probe is installed by executing lsmod | grep probe.
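The check can also be scripted. The sketch below reads /proc/modules, which holds the same information lsmod displays, and matches the module name exactly rather than as a substring. The function name module_loaded and its optional second argument (which exists only to make the function easy to try on a sample file) are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a kernel module is listed in the
# given modules file (default: /proc/modules, the data lsmod displays).
module_loaded() {
    name="$1"
    modules_file="${2:-/proc/modules}"
    # Each line of /proc/modules starts with the module name and a space.
    grep -q "^${name} " "$modules_file"
}

# Usage sketch, using the probe path given above:
#   module_loaded probe || sudo insmod /usr/atom_LEAP/code/sync/probe.ko
```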
Sometimes, the modules driving the DAQ become invalid. This will happen if the kernel version is changed, or potentially if it is recompiled. To check to see if the sampler is visible to the system, execute lsdaq. You should see output similar to the following:
$ lsdaq
------------------------------------------
Detecting National Instruments DAQ Devices
Found the following DAQ Devices:
NI USB-6215: "Dev1" (USB0::0x3923::0x7271::01551B0D::RAW)
------------------------------------------
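For use in scripts, the lsdaq output can be checked automatically. This sketch assumes that a visible device always appears on a line shaped like the NI USB-6215 line in the sample output above; the function name daq_visible is hypothetical, and other DAQ models may be reported differently.

```shell
#!/bin/sh
# Hypothetical helper: read lsdaq output on stdin and succeed only if
# a device line shaped like  NI USB-6215: "Dev1"  is present.
daq_visible() {
    grep -q 'NI USB-[0-9]*: "Dev[0-9]*"'
}

# Usage sketch:
#   lsdaq | daq_visible || echo "DAQ not visible to the system" >&2
```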
As a last resort, you can always try rebooting your node. Remember that everything in /tmp will disappear following a reboot, so you should always back it up before rebooting. If this still fails to solve the problem, see the other troubleshooting or FAQ sections.
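One way to save /tmp before a reboot is to pack it into a compressed archive stored somewhere that survives the reboot. The following is a minimal sketch; the function name archive_dir and the destination path in the usage comment are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: pack a directory into a gzip-compressed tar archive.
archive_dir() {
    src="$1"    # directory to save, e.g. /tmp
    dest="$2"   # archive path, which should be outside /tmp
    # -C switches to the parent directory so the archive stores
    # relative paths (tmp/... rather than /tmp/...).
    tar -czf "$dest" -C "$(dirname "$src")" "$(basename "$src")"
}

# Usage sketch:
#   archive_dir /tmp "$HOME/tmp-before-reboot.tar.gz"
```

Since the nodes themselves are not backed up, the archive should then be copied to an external machine as well.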