DEEP Testbed HOWTO

Peter A. H. Peterson <pahp@cs.ucla.edu>
Digvijay Singh <digvijay@ucla.edu>

Contents

The Basics

  1. Logging in to your DEEP node

  2. Backup policy / Exporting results

  3. Hardware configuration

  4. Software configuration

  5. Basic troubleshooting

The Basics

Using the DEEP software is described in the DeepQuickstart document. This page describes how to access and use the DEEP testbed.

Logging in to your DEEP node

You should have been emailed your username and password; this is the default password for all your DEEP-related accounts.

To access your DEEP node, first ssh to deeplab.cs.ucla.edu. From that host, ssh to your DEEP node using the username and password that were emailed to you. All group members have root access via the sudo command. Your group can change the root password with the command sudo passwd root. You should give your TA root access so that they can troubleshoot problems.
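
The two-hop login described above can be wrapped in a small shell function run from your own machine. This is only a sketch: the function name deep_ssh is hypothetical, and it assumes your username is the same on deeplab.cs.ucla.edu and on your node.

```shell
# Hypothetical helper: log in to a DEEP node through the gateway in one step.
# Assumes the same username on the gateway and on the node.
deep_ssh() {
    user="$1"    # the username that was emailed to you
    node="$2"    # e.g. leap0.cs.ucla.edu
    # -t allocates a terminal on the gateway so the second ssh can prompt
    # for your password and give you an interactive shell.
    ssh -t "${user}@deeplab.cs.ucla.edu" ssh "${user}@${node}"
}

# Example (hypothetical username):
#   deep_ssh yourname leap0.cs.ucla.edu
```

You will be prompted for your password twice, once by the gateway and once by the node.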

Machines:

## # shared compute server
## leaplab.cs.ucla.edu  leaplab 131.179.192.242

# DEEP nodes
leap0.cs.ucla.edu       leap0   131.179.192.100 # CryptoDisk 
leap1.cs.ucla.edu       leap1   131.179.192.101 # PowerSecZone
leap2.cs.ucla.edu       leap2   131.179.192.102 # CryptoFlex
leap3.cs.ucla.edu       leap3   131.179.192.103 # ElectricSandbox
leap4.cs.ucla.edu       leap4   131.179.192.104 # OffLoading

## # peer machines -- which may not always be available
## lpeer1.cs.ucla.edu   lpeer1  131.179.192.121 # PowerSecZone Peer
## lpeer2.cs.ucla.edu   lpeer2  131.179.192.122 # CryptoFlex Peer
## lpeer3.cs.ucla.edu   lpeer3  131.179.192.123 # ElectricSandbox Peer
## lpeer4.cs.ucla.edu   lpeer4  131.179.192.124 # OffLoading Peer

Backup policy / Exporting results

We do not provide ANY backup services on the DEEPs or any other machines. None.

If something bad happens to your machines, the data on them will be lost. Back up your data regularly to external machines using ssh or rsync. Do not use deeplab.cs.ucla.edu as the backup target, since it does not have much disk space. In addition to your code and instructions for recreating your environment (e.g., which packages you installed), back up your sampled data, not just your report output, since the original data is necessary for more advanced analysis. You don't need to keep the data-sync.txt file as long as you keep the files data.txt, messages, and Energy_Caliper_Control_File for each experiment.
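
One way to collect the files worth keeping is to bundle them per experiment before copying them off the node. The sketch below assumes each experiment's files sit together in one directory, which this page does not state; the function name, paths, and backup host are hypothetical.

```shell
# Archive the raw files of one experiment, leaving out data-sync.txt.
# Assumption: all of an experiment's files live in a single directory.
archive_experiment() {
    exp_dir="$1"    # directory holding one experiment's files
    out_dir="$2"    # directory that will receive the tarball
    name=$(basename "$exp_dir")
    mkdir -p "$out_dir" || return 1
    # -C switches into the experiment directory so the archive
    # contains bare file names rather than full paths.
    tar -czf "$out_dir/$name.tar.gz" -C "$exp_dir" \
        data.txt messages Energy_Caliper_Control_File
}

# Example (hypothetical paths and host):
#   archive_experiment ~/experiments/run1 ~/backups
#   rsync -av ~/backups/ you@backup.example.org:deep-backups/
```

The archive alone is not a backup until it has been copied to a machine outside the testbed, as in the rsync line of the example.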

Hardware configuration

The hardware configuration of each DEEP node includes a 1 GHz Intel Atom 330 processor, 1 GB of 533 MHz RAM, a 160 GB hard drive, and integrated sound, network, and video on the 945GC Express northbridge. Other hardware is available and can be installed on request; ask your TA.

More information can be found here.

Software configuration

The nodes run openSUSE 11.2 with Linux kernel 2.6.31.14-0.6-default. The distribution and kernel were chosen for compatibility with the National Instruments DAQ (sampler), so you are advised not to change them. Kernel changes may require rebuilding the modules for the DAQ; see the troubleshooting and FAQ sections for more information.

The hard disk on each DEEP node includes an unused segment that your group can format and use as desired. Instructions for adding, formatting, and mounting partitions in Linux are available online.

Basic troubleshooting

A few things go wrong from time to time; here are a few things to try if you're having issues.

If the sampler is already running, it can be killed by executing kill -9 PID, where PID is the process ID of the start_sampling process. To find the process ID, execute ps aux | grep start_sampling. After killing it, wait a few seconds before starting the sampler again.
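
The lookup and kill steps can be combined as sketched below. The function names are hypothetical. Trying a plain kill (SIGTERM) before kill -9 is an addition not described on this page, and it assumes the sampler exits on SIGTERM; kill -9 remains the fallback.

```shell
# Print the PID (second column of `ps aux`) of every start_sampling process.
# Writing the pattern as [s]tart_sampling keeps the awk process itself
# from matching, which a plain `grep start_sampling` does not avoid.
sampler_pids() {
    ps aux | awk '/[s]tart_sampling/ { print $2 }'
}

# Stop each sampler: ask politely first, then force it if still alive.
stop_sampler() {
    for pid in $(sampler_pids); do
        kill "$pid" 2>/dev/null          # SIGTERM (an assumption, see above)
        sleep 2
        if kill -0 "$pid" 2>/dev/null; then
            kill -9 "$pid"               # the kill -9 described in the text
        fi
    done
}
```

Run sampler_pids on its own first to confirm that it lists only the processes you intend to stop.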

If the sampler just won't collect data, try unloading and reloading the synchronization probe. This can be done by executing:

$ sudo rmmod probe
$ sudo insmod /usr/atom_LEAP/code/sync/probe.ko

You can check whether the probe is loaded by executing lsmod | grep probe.
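
The unload, reload, and check steps can be combined as in the sketch below. The function names are hypothetical; the module name and path are the ones given above.

```shell
# Succeeds only if a module named exactly "probe" is listed by lsmod.
# (A plain `grep probe` also matches other modules with "probe" in the name.)
probe_loaded() {
    lsmod | awk '$1 == "probe" { found = 1 } END { exit !found }'
}

# Unload the probe if it is loaded, load it again, then confirm.
reload_probe() {
    if probe_loaded; then
        sudo rmmod probe || return 1
    fi
    sudo insmod /usr/atom_LEAP/code/sync/probe.ko || return 1
    probe_loaded
}
```

reload_probe returns a non-zero status if any step fails, so it can be used as: reload_probe || echo "probe reload failed".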

Sometimes the modules driving the DAQ become invalid. This happens if the kernel version changes, and possibly if the kernel is recompiled. To check whether the sampler is visible to the system, execute lsdaq. You should see output similar to the following:

$ lsdaq
------------------------------------------
Detecting National Instruments DAQ Devices
Found the following DAQ Devices:
NI USB-6215: "Dev1"    (USB0::0x3923::0x7271::01551B0D::RAW)
------------------------------------------
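
For use in scripts, the check can be reduced to a yes/no test. This sketch assumes that a detected device always produces a line containing "NI USB-", as in the sample output above; what lsdaq prints when no device is found is not documented here, and the function name is hypothetical.

```shell
# Succeeds if lsdaq lists at least one device line like the one above.
# Assumption: detected devices appear on a line containing "NI USB-".
daq_visible() {
    lsdaq | grep -q 'NI USB-'
}

# Example:
#   daq_visible || echo "DAQ not visible; the DAQ modules may need rebuilding"
```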

As a last resort, you can always try rebooting your node. Remember that everything in /tmp will disappear following a reboot, so back up anything you need from it before rebooting. If rebooting still fails to solve the problem, see the other troubleshooting or FAQ sections.
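
Before rebooting, a directory under /tmp can be copied somewhere that survives the reboot. The sketch below copies it into a timestamped directory under your home directory; the function name and example path are hypothetical, and it assumes your home directory has enough free space.

```shell
# Copy a directory out of /tmp into a timestamped directory under $HOME
# and print where the copy was placed.
save_tmp_dir() {
    src="$1"    # e.g. /tmp/my_experiment (hypothetical)
    dest="$HOME/tmp-saved-$(date +%Y%m%d-%H%M%S)"
    # cp -a preserves timestamps, permissions, and subdirectories.
    mkdir -p "$dest" && cp -a "$src" "$dest/" && echo "$dest"
}

# Example:
#   save_tmp_dir /tmp/my_experiment
```

This only protects against the reboot. The copy still lives on the node, so it does not replace an off-machine backup.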

DeepWiki: TestbedHOWTO (last edited 2011-12-19 15:55:27 by PeterPeterson)