Computer Restart Procedures

This page collects instructions for restarting the lab computers that periodically need it.

List of all lab computers

BURTgooey

Useful links

General FE instructions

c1sus

c1ioo

c1lsc

c1iscex

c1iscey

c1omc

c1ass

nodus

fb (Includes DAQ)

Acromag Systems

c1iool0

c1pem1

op440m

op340m

megatron

SLOW controls computers

Out of Date Ethernet network connection diagram as of Oct 7, 2008: 40m_network_10-07-08.pdf

Martian_Host_Table

Here you can find a map of the computers around the lab.


Which models run on which machines?


To restart the frame builder daqd process, simply do the following from a control room machine:

The init process running on the fb machine will then automatically restart daqd.

Alternatively, one can ssh into FB1 (192.168.113.201), and run sudo systemctl restart daqd_*

/!\ Generally after restarting the frame builder process, the front ends will not be talking to the fb properly (0x2bad and red lights). The easiest solution is to reboot the front ends.

For dataviewer to get data you need to make sure "daqd" and "nds pipe" are running on the fb machine.

daqd and nds have been added to the /etc/inittab file on the fb machine. These will automatically restart when killed or the machine is restarted.

However, if either process fails to start several times in rapid succession, the init process will stop trying.

The code which is called by the init process lives in /opt/rtcds/caltech/c1/target/fb/.

For testpoints to be available for a given front end, the following need to be running on the correct front end computer:

To confirm the necessary codes are running on a front end, you can:

Cold start order is:

  1. If necessary, stop everything (fb, front ends, mx_streams, etc)
  2. Start (or restart) fb services (open-mx, mx, nds, daqd):

controls@fb1:~ 0$ sudo modprobe -r gpstime
controls@fb1:~ 0$ sudo modprobe gpstime
controls@fb1:~ 0$ sudo systemctl start open-mx
controls@fb1:~ 0$ sudo systemctl start mx
controls@fb1:~ 0$ sudo systemctl start nds
controls@fb1:~ 0$ sudo systemctl start daqd*
  3. Start the front ends. They should automatically start all the necessary processes on boot up.
    1. Be patient; it can take over a minute for all the modules to finish loading. You can type "dmesg" to see how far it has gotten.
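The fb service start-up in step 2 can be sketched as a small script. This is only an illustration: the function names and the DRY_RUN switch are hypothetical, and by default it only prints the commands instead of running them. Unlike the listing above, the daqd glob is quoted here so that systemctl, not the local shell, expands it.

```shell
#!/bin/sh
# Sketch only: replays the fb service start-up commands from step 2, in order.
# DRY_RUN is a hypothetical switch; the default (1) only prints each command.

fb_cold_start_cmds() {
    # The glob is quoted so systemctl, not the local shell, expands it.
    cat <<'EOF'
sudo modprobe -r gpstime
sudo modprobe gpstime
sudo systemctl start open-mx
sudo systemctl start mx
sudo systemctl start nds
sudo systemctl start 'daqd*'
EOF
}

fb_cold_start() {
    fb_cold_start_cmds | while IFS= read -r cmd; do
        echo "+ ${cmd}"
        if [ "${DRY_RUN:-1}" != "1" ]; then
            # Stop at the first failure so later services are not started blindly.
            sh -c "${cmd}" </dev/null || exit 1
        fi
    done
}
```

Calling fb_cold_start on fb1 prints the six commands; setting DRY_RUN=0 first would actually run them.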



This machine runs the c1x03, c1ioo, and c1gpt FE models. It controls mode cleaner wavefront sensors, mode cleaner length, and green locking.

On reboot, these models should automatically start up. See also the fb/DAQ and the General FE Instructions sections.

It is connected via GE Fanuc VMIC 5565 reflected memory card to the c1sus, c1scx, and c1scy computers.

c1ioo is a Sun X4600 machine. For a complete shutdown (not normally necessary, but occasionally needed), do the following:

  1. Shut down the computer normally (power button or "shutdown -h now").
  2. Go out to the rack and unplug all 4 power supply cables on the back of the machine.
  3. Wait for the machine to stop completely (30 seconds or so).
  4. Plug all the cables back in, and press the power button.


This machine runs the c1x02, c1sus, c1mcs, c1rfm, and c1pem FE models. It controls the BS, ITMX, ITMY, PRM, SRM, MC1, MC2, and MC3 optics.

On reboot, these models should automatically start up. See also the fb/DAQ and the General FE Instructions sections.

It is connected by a GE Fanuc VMIC 5565 RFM card to the c1scx, c1scy, and c1ioo machines.

It is connected by a Dolphin PCIE reflected memory card to the c1lsc machine.


This machine runs the c1x04 and c1lsc FE models.

On reboot, these models should automatically start up. See also the fb/DAQ and the General FE Instructions sections.

It is connected by a Dolphin PCIE reflected memory card to the c1sus machine.


This machine runs the c1x01 and c1scx FE models.

On reboot, these models should automatically start up. See also the fb/DAQ and the General FE Instructions sections.

It is connected by a GE Fanuc VMIC 5565 reflected memory card to the c1sus, c1ioo, and c1scy computers.


This machine runs the c1x05 and c1scy FE models.

On reboot, these models should automatically start up. See also the fb/DAQ and the General FE Instructions sections.

It is connected by a GE Fanuc VMIC 5565 reflected memory card to the c1sus, c1ioo, and c1scx computers.


Newer (circa 2018) slow controls systems consist of a Debian rackmount server connected to an Acromag chassis. As of June 2021, all of the slow controls systems have been replaced, except for c1auxey. The new Acromag systems currently include c1vac, c1psl, c1susaux, and c1auxex.

If the EPICS channels are unresponsive, first try connecting via ssh from another computer and remotely rebooting:

and then burt restore. If you are unable to connect to the server via ssh, walk out to the rack and manually power cycle the computer.

If some channels remain unresponsive or show zero values after rebooting, power cycle the Acromag chassis and reboot the server again.
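The decision flow above (try ssh, reboot remotely, otherwise go to the rack) could be sketched as follows. This is a hedged illustration rather than the lab's actual procedure: the function name, the SSH_CMD override, and the `controls` login on the Acromag servers are all assumptions.

```shell
#!/bin/sh
# Sketch only. SSH_CMD is a hypothetical override (useful for dry testing);
# the "controls" account on the Acromag servers is an assumption.
SSH_CMD="${SSH_CMD:-ssh}"

reboot_slow_host() {
    host="$1"
    # First confirm that ssh works at all.
    if ! $SSH_CMD -o ConnectTimeout=5 "controls@${host}" true; then
        echo "cannot ssh to ${host}: power cycle it by hand at the rack"
        return 1
    fi
    # The connection usually drops as the host goes down, so ignore the exit status.
    $SSH_CMD -t "controls@${host}" sudo reboot || true
    echo "reboot requested on ${host}; burt restore once it is back"
}
```

For example, `reboot_slow_host c1psl` would attempt the remote reboot of c1psl and print a reminder to burt restore afterwards.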


Sometimes you can restart this machine by doing:

then burt restore it.

If it still doesn't come back, then use this link.


At the command prompt, type:

Try CTRL+x

It should reboot c1iool0

This computer automatically executes startup.cmd, so there is no need to run it manually.

If for some reason it does not load the startup script automatically, try this:

At the telnet prompt, type

Then, after the main loop is started, type CTRL-], followed by


-1) Make sure the c1omc is powered on--it doesn't power up automatically following a power outage. First find the OMC, then press its power button.

0) Make sure the previous incarnation of the code is no longer running. See Appendix A for details.

1) While logged in as controls, run the script startupC1 in the c1omcepics target directory.

2) Log in as root. Start the real-time code by running the omcfe.rtl script in the c1omc

2.5) Now the process will wait for a BURT restore. Find the appropriate autoburt snapshot file, and restore it.

3) Also, as root, run the command /opt/gds/awgtpman -2 in the background.

Note that c1omc has two ethernet ports. Use the bottom one.

If nothing works, check the mount tables and make sure that linux1:/home/cds is mounted as /cvs/cds. If it's not, sudo mount -a.
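The mount check can be scripted roughly like this. It is a sketch that assumes the Linux-style `mount` output format ("SOURCE on TARGET type ..."); the function names are made up for illustration.

```shell
#!/bin/sh
# Sketch only. Reads `mount` output on stdin and succeeds if
# linux1:/home/cds is mounted on /cvs/cds.
# Assumes the Linux-style "SOURCE on TARGET type ..." line format.
cds_is_mounted() {
    grep -q '^linux1:/home/cds on /cvs/cds '
}

# Remount everything in the mount table only when the check fails.
ensure_cds_mounted() {
    if mount | cds_is_mounted; then
        echo "/cvs/cds is mounted"
    else
        sudo mount -a
    fi
}
```

Running `ensure_cds_mounted` on c1omc either confirms the mount or falls back to `sudo mount -a`, as described above.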

A) To stop the front end code, first press the red FE RESET button on the C1OMC_GDS screen. Then,



Reboot as usual. If it's acting weird or slow, just hit the moon button and pick the shutdown option. After a few minutes it will turn off. Then hit the on button on the front of the machine. Wait for the login prompt, then log in as controls.


Reboot as usual. It's headless, so you'll need to ssh in and type 'reboot'.

Restart the following scripts:


Nodus is a Solaris box in the rack in the office. Here are some of the things that it runs that you will want to restart:


Megatron is an Ubuntu box that runs our MC autolocker and FSS Slow control scripts. These scripts are enabled to start at boot and to restart on failure, so ideally you should not need to start them by hand. If they are not running, check their status with:

 sudo systemctl status MCautolocker -l
 sudo systemctl status FSSslowPy -l

Make sure the nfs mount is still working by checking:

 mount -l | grep nfs

If the NFS mount isn't working, check whether nameserver resolution is working. 192.168.113.104 (chiara) should be listed as a nameserver in /etc/resolv.conf. If it is not, follow the steps in elog 40m/16479. Once you have nameserver resolution, rebooting the computer again should fix everything.
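The megatron checks above can be bundled into one script, sketched below. The function names are hypothetical, and `systemctl is-active` is used here in place of the `status` calls shown earlier so that the result can be tested in a script.

```shell
#!/bin/sh
# Sketch only: bundles the megatron health checks described above.
CHIARA_IP="192.168.113.104"

# Succeeds if the resolv.conf-style file $1 lists $2 as a nameserver.
has_nameserver() {
    awk -v ip="$2" '$1 == "nameserver" && $2 == ip { found = 1 } END { exit !found }' "$1"
}

# Prints one line per problem found; prints nothing when all checks pass.
megatron_check() {
    for svc in MCautolocker FSSslowPy; do
        systemctl is-active --quiet "$svc" || echo "${svc} is not running"
    done
    mount -l | grep -q nfs || echo "no NFS mount found"
    has_nameserver /etc/resolv.conf "$CHIARA_IP" \
        || echo "chiara (${CHIARA_IP}) is not listed as a nameserver"
}
```

Running `megatron_check` on megatron after a reboot gives a quick summary of which of the three conditions still needs attention.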


After rebooting the FE machines, it is necessary to use BURTgooey to restore the EPICS settings on the SLOW controls machines (restore from the snapshot files in /opt/rtcds/caltech/c1/burt/autoburt/today/).

Since these are really OLD machines, we cannot ssh into them. Ping the targets listed in /cvs/cds/caltech/target/. You can connect to the machines by 'telnet <computer name>'.
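Pinging every target can be automated along these lines. This is a sketch under two assumptions: each subdirectory of the target directory is named after a host, and `ping` is the Linux iputils version (`-c` count, `-W` timeout in seconds). The function names are hypothetical.

```shell
#!/bin/sh
# Sketch only. Assumes each subdirectory of the target directory is named
# after a host, and that ping is the Linux iputils version.

# Print one target name per subdirectory of $1 (default: the path given above).
list_targets() {
    dir="${1:-/cvs/cds/caltech/target}"
    for d in "$dir"/*/; do
        [ -d "$d" ] || continue
        basename "$d"
    done
}

# Ping each target once and report whether it replied.
ping_targets() {
    list_targets "$1" | while IFS= read -r host; do
        if ping -c 1 -W 2 "$host" >/dev/null 2>&1; then
            echo "${host} alive"
        else
            echo "${host} no-reply"
        fi
    done
}
```

Running `ping_targets` from a control room machine prints one status line per target, which makes it easy to see which machines need a visit to the crate.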

To check the status of the computer, type 'i' at the command prompt. This will output a table of the processes running on the computer. To exit the machine, type 'Ctrl+Shift+]', which will take you back to telnet. Then type 'quit' to safely exit.

If the computers do not respond (that is, you cannot telnet into them), it is OK to hard reboot them. This can be done using the key on the crate. If there is no key, press the reset button, or hold the ON/OFF button down until the machine powers off, then press ON/OFF again.

List of SLOW computers that are alive (as of Jul 18 2013):

SLOW computers not present or dead (as of Jul 18 2013):

Computer_Restart_Procedures (last edited 2021-11-23 01:58:50 by AnchalguptaATligoDOTorg)