Computer_Restart_Procedures

Here is where we should keep information on how to restart the computers that periodically need restarting.

[#List List of all lab computers]

[#coldstart Coldstart procedures]

[#c1lsc c1lsc]	[#c1iscex c1iscex]	[#c1iscey c1iscey]	[#c1sosvme c1sosvme]	[#c1susvme1 c1susvme1]	[#c1susvme2 c1susvme2]	[#nodus nodus]
[#c1psl c1psl]	[#c1iool0 c1iool0]	[#c0dcu1 c0dcu1]	[#c1asc c1asc]	[#c0daqawg c0daqawg]	[#c0daqctrl c0daqctrl]	[#c1iscepics c1iscepics]
[#c1omc c1omc]	[#c1ass c1ass]	[#fb40m fb40m]	[#EPICS EPICS]	[#op440m op440m]	[#op340m op340m]	[#c1pem1 c1pem1]	[#c1iovme c1iovme]

[#HardwareReboot Hardware Reboot Procedure for FE computers]

Ethernet network connection diagram as of Apr 25, 2008: attachment:40m_network_042508.pdf

Martian Host Table

c1lsc

Turn OFF all the SUS buttons on the right hand side of the LSC screen (C1LSC.adl).

Push both of the RESET buttons on the little RESET screen on the LSC screen (FIRST GREEN, then RED).

From a control room terminal type:

ssh c1lsc

cd /cvs/cds/caltech/target/c1lsc/

./startup.cmd

c1iscex

Shut off the watchdogs for ETMX via epics.

Push the corresponding RESET button on the "C0DAQ_RFMNETWORK.adl" MEDM screen.

From a control room terminal type:

telnet c1iscex

Copy and paste this into the command line (including the "<"):

< /cvs/cds/caltech/target/c1iscex/startup.cmd

When the line "starting main loop (printing from sus_start)." appears, hit CTRL-] and type:

quit

Turn the watchdogs back on, once the computer is up again.

c1iscey

Shut off the watchdogs for ETMY via epics.

Push the corresponding RESET button on the "C0DAQ_RFMNETWORK.adl" MEDM screen.

From a control room terminal type:

telnet c1iscey

Copy and paste this into the command line (including the "<"):

< /cvs/cds/caltech/target/c1iscey/startup.cmd

Turn the watchdogs back on, once the computer is up again.

Type ctrl-] to break. Then "quit".

c1sosvme

Shut off the watchdogs for all optics via epics.

Push the corresponding RESET button on the "C0DAQ_RFMNETWORK.adl" MEDM screen.

From a control room terminal type:

telnet c1sosvme

Copy and paste this into the command line (including the "<"):

< /cvs/cds/caltech/target/c1sosvme/startup.cmd

You will probably need to restart c1susvme1 and c1susvme2 now.

Type ctrl-] to break. Then "quit".

c1susvme1

Shut off the watchdogs for ITMX, ITMY, BS, PRM via epics.

Push the corresponding RESET button on the "C0DAQ_RFMNETWORK.adl" MEDM screen.

From a control room terminal type:

ssh c1susvme1

Become superuser by running the su command.

Go to the /cvs/cds/caltech/target/c1susvme1 directory.

./startup.cmd

Turn the watchdogs back on, once the computer is up again.

c1susvme2

Shut off the watchdogs for SRM, MC1, MC2, MC3 optics via epics.

Disable the MC autolocker (by clicking the DISABLE button on the C1IOO_LockMC.adl screen).

Push the corresponding RESET button on the "C0DAQ_RFMNETWORK.adl" MEDM screen.

From a control room terminal type:

ssh c1susvme2

Become superuser by running the su command.

Go to the /cvs/cds/caltech/target/c1susvme2 directory.

./startup.cmd

Turn the watchdogs back on, once the computer is up again.

Re-Enable the MC autolocker.

c1psl

Sometimes you can just do this guy by doing:

telnet c1psl

reboot

then burt restore this guy.

But often, this just makes it upset and the screens go white but it never comes back. When that happens go out to the rack (the one next to the one with the MC servo) and turn off the crate (on the bottom) which has the c1psl processor. After ~3.14 seconds, turn it back on. c1psl ought to come back now.

If it still doesn't come back then sing [http://www.amazon.com/gp/music/clipserve/B000002W9Q001005/1/ref=mu_sam_ra001_005/002-7727484-0862420 this] link.

c1pem1

Sometimes you can just do this guy by doing:

telnet c1pem1

reboot

then burt restore this guy.

If it still doesn't come back then sing [http://www.amazon.com/gp/music/clipserve/B000002W9Q001005/1/ref=mu_sam_ra001_005/002-7727484-0862420 this] link.

c1iool0

At the command prompt, type:

telnet c1iool0

At the telnet prompt, type

< /cvs/cds/caltech/target/c1iool0/startup.cmd

Then, after the main loop is started, type CTRL-], followed by

quit

c1iovme

At the command prompt, type:

telnet c1iovme

At the telnet prompt, type

< /cvs/cds/caltech/target/c1iovme/startup.cmd

Type CTRL-], followed by quit

c0dcu1

Restart procedure needed.

(Temporary)

Turn key on DAQ CTRL crate to turn off ALL framebuilders; turn back after ~10 seconds

Follow restart procedures for all other computers

OR, first try pressing the "reset" button on c0dcu1 and waiting ~3 minutes.

c1asc

Turn key on ASC crate to shut off power; turn back after ~10 seconds

Type

telnet c1asc

Then type

< /cvs/cds/caltech/target/c1asc/startup.cmd

After signal light on DAQ_DETAIL screen turns green, type CTRL-], followed by

quit

c0daqawg 1) First try this:
> telnet c0daqawg
- if you get a prompt try:
> vmeBusReset
- The AWG light on the RFM screen ought to go red. IF it does, wait ~5 minutes for it to come back.

2) If the above doesn't work, try:

Turn key on DAQ AWG crate to shut off power; turn back after ~10 seconds

3) Can also try: pushing the recessed RESET button on the c0daqawg computer

That's it

c0daqctrl 1) No clue. Power cycle? SSH? Stab it with a spoon???

Try pushing the recessed reset button. This may be the last thing that needs to be done.

c1omc

-1) Make sure the c1omc is powered on--it doesn't power up automatically following a power outage. First find the OMC, then press its power button.

0) Make sure the previous incarnation of the code is no longer running. See Appendix A for details.

1) while logged in as controls, run the script startupC1 in the c1omcepics target directory.

2) Log in as root. Start the real-time code by running the omcfe.rtl script in the c1omc

target directory.

2.5) Now the process will wait for a BURT restore. Find the appropriate autoburt snapshot file, and restore it.

3) Also, as root, run the command /opt/gds/awgtpman -2 in the background.

Note that c1omc has two ethernet ports. Use the bottom one.

If nothing works, check the mount tables and make sure that linux1:/home/cds is mounted as /cvs/cds. If it's not, sudo mount -a.

A) To stop the front end code, first press the red FE RESET button on the C1OMC_GDS screen. Then,

i) log in to c1omc. become root.
ii) kill epics with a 'pkill omcepics'
iii) kill the test-point manager with a 'pkill awgtpman'
iv) remove the front end kernel module with '/sbin/rmmod omcfe'
v) check that the [omcfe] kernel module is gone with a '/sbin/lsmod'

c1ass
Currently the procedure for restarting C1ASS seems to be the same as for C1OMC above except that the ass test point manager doesn't need the "-2" flag.

fb40m

This is not really a reboot procedure, as I don't know it.

To restart the testpoint manager: First check if it is running (i.e. does it need to be started) 'ps -ef | grep tpman'. If it needs to be started, 'ssh fb40m', 'su' to root, run '/usr/controls/tpman &'.

Then restart the 'daqd' process:'telnet fb40m 8087', type "shutdown" at the prompt. The framebuilder will restart itself in ~20s.

If everything gets hosed, and the RAID is angrily flashing red lights, power off the framebuilder (by logging in as SU and then typing "poweroff"), power-cycle the RAID, then turn the framebuilder on. If there is disk corruption, you can use "fsck -y" to automatically answer "yes" to all of "fsck"'s questions, so it can run unattended.

Restart the nightly BACKUP of /cvs/cds and our trend-frames by following the instructions in Restarting the backup script

If the frame builder light is red on the reflected memory MEDM screens but the machine seems otherwise to be working, try pressing the reset button on the reflected memory bypass switch.
If the frame builder still isn't working, make sure that c0daqctrl is running properly.

EPICS

c1dcuepics runs the processes labeled "dcuepics40m" and "losepics" and "iscepics40m". These should start automatically.BR

c1dcuepics DOES NOT EXIST IN 2007, 2008 ? (old entry told us: runs the process "iscepics40m" which can be started by ssh'ing into c1dcuepics, cd'ing into /cvs/cds/caltech/target/c1iscepics and running ./startupC1 as user 'controls')

C1:IOO-MC_F channels may not come back unless the IOO rack is keyed; follow C1IOVME procedure after that

op440m

Reboot as usual. If its acting weird or slow just hit the moon button. Pick the shutdown option. After a few minutes it will turn off. The hit the on button on the front of the machine. Wait for the login prompt. Then log in as controls.

op340m

Reboot as usual. It's headless, so you'll need to ssh in and type 'reboot'.

Restart the following scripts:

conlog

nodus

Nodus is a Solaris box in the rack in the office. Here are some of the things that it runs that you will want to restart:

List of all lab controls computers

In control room: linux1 - in network rack - NFS server for /cvs/cds (keeps two copies, raid1) BR rana - in network rack - gateway machine (also called gate40m and rana113) BR linux2 - controls console, running Linux kernel 2.6.9-1.667smp BR linux3 - controls console, running Linux kernel 2.6.9-1.667 BR op140m - controls console, running Solaris 9 BR op340m - headless script machine, running Solaris 9 BR op440m - controls console, running Solaris 9 BR op540m - controls console, running Solaris 9 BR BR Non computers: BR 131.215.113.2 - NAT router in network rack BR 131.215.113.10 c0rga - iServer for the RGA BR BR From /cvs/cds/caltech/target : BR c0daqawg - front-end VME cpu running linux (?) in 1Y6 BR c0daqctrl - front-end VME cpu running linux in 1Y7 BR c0dcu1 - front-end VME cpu running VxWorks (?) in 1Y7 BR c1asc - front-end VME cpu running linux in 1X5 BR c1aux - EPICS VME cpu running VxWorks in 1X1 BR c1auxex - EPICS VME cpu running VxWorks in 1X4 BR c1auxey - EPICS VME cpu running VxWorks in 1Y7 BR c1dcuepics - EPICS PC cpu running linux in 1Y6 BR c1iool0 - EPICS VME cpu running VxWorks in 1Y2 BR c1iovme - front-end VME cpu running linux in 1Y2 BR c1iscaux - EPICS VME cpu running VxWorks(?) in 1X3 BR c1iscaux2 - EPICS VME cpu running VxWorks(?) in 1X3 BR c1iscepics - EPICS PC cpu running linux in 1X6 DOES NOT EXIST as of Oct 2007 BR c1iscex - front-end VME cpu running linux in 1X9 BR c1iscey - front-end VME cpu running linux in 1Y7 BR c1losepics - EPICS PC cpu running linux in 1Y6 BR c1lsc - front-end VME cpu running linux in 1X5 BR c1pem1 - EPICS VME cpu running VxWorks(?) in 1Y? BR c1psl - EPICS VME cpu running VxWorks(?) in 1Y1 BR c1sosvme - front-end VME cpu running linux in 1Y4 BR c1susaux - EPICS VME cpu running VxWorks(?) in 1Y5 BR c1susvme1 - front-end VME cpu running linux in 1Y4 BR c1susvme2 - front-end VME cpu running linux in 1Y4 BR c1vac1 - EPICS VME cpu running VxWorks in 1Y9 BR c1vac2 - EPICS VME cpu running VxWorks in 1Y9 BR

coldstart procedures

check the vacuum controls in rack 1Y9 (on UPS) BR check the laser chiller, laser power supply, ion pump HV BR make sure linux1 is up and serving /cvs/cds (on UPS) BR make sure rana113 (gate40m) is up and has /cvs/cds mounted (on UPS) BR reset the Marconi RF signal generators according to the stickers on the front BR The controls computers should be up (on UPS) BR Bring up the embedded computers, starting with EPICS: BR c1vac1 and c1vac2 (on UPS), c1psl, c1iool0, c1iscaux, c1iscaux2, c1iscepics, c1dcuepics, c1susaux, c1aux, c1auxex, c1auxey BR

NB: you will probably have to actually power-on scipe27 (c1iscepics) and scipe25 (c1losepics/c1dcuepics). BR NB-2: As of October 2007, computer 'c1isepics' does NOT exist, so do not do anything related to that computer. BR

then the DAQ: c0daqctrl, c0daqawg, c0dcu1 BR check the RFM switch--if it's not green, reset it BR make sure the framebuilder (fb40m) is building frames--i.e., all the MEDM lights are greenBR then the front-end servos: c1iovme, c1sosvme, c1susvme1, c1susvme2, c1iscex, c1iscey, c1lsc, c1asc, c1omc, c1ass

NB: above, re-boot c1susvme1, c1susvme2, c1lsc after powering-on scipe27 so they can get a fresh copy of linux from scipe27. BR

do BURT restores of c1iscepics.snap, c1losepics.snap, c1omcepics.snap, c1assepics.snap--everything else should do saverestore (automatic) BR check for stuck EPICS buttons/sliders (just give everything a quick twiddle) BR restart the testpoint manager BR reset the Uniblitz mechanical shutters after power outagesBR press the "closed loop" buttons for the input-steering piezojena controllers BR Restart c1omc. BR Restart the nightly BACKUP of /cvs/cds and our trend-frames by following the instructions in the ILOG. BR Restart the conlogger on op340m BR Restart Mafalda (Linux machine) by pressing its power button ;BR BR As the "/Frames" partition from the framebuilder is not mounted to the control computers via /etc/fstab, it is recommnded to mount "by hand", otherwise programs from mDV directory might not work. In order to do so, please type the following commands in the terminal command line: BR >> su BR use the password for the "Roots" BR >>sudo mount -ro nosuid fb40m:/frames /frames BR BR

The day after a power outage, check that the RGA logger is still logging, and that the RGA is still RGA'ing BR

Hardware Reboot Procedure for FE computers

Starting from Y-end, look for crates with RFM network connection (orange cables). Turn off the power by turning the key and wait for 10sec. Then turn on the power again. You should hear a beep after a while. Go thorough every crate which has orange cables connected. Then initialize each FE computer following the procedures described above. Check the status of the computers by C0DAQ_RFMNETWORK.adl.