This page collects information on how to restart the lab computers that periodically need restarting.
[#List List of all lab computers] |
[#coldstart Coldstart procedures] |
[#hosttable Martian Host Table] |
[#c1lsc c1lsc] |
[#c1iscex c1iscex] |
[#c1iscey c1iscey] |
[#c1sosvme c1sosvme] |
[#c1susvme1 c1susvme1] |
[#c1susvme2 c1susvme2] |
[#c1psl c1psl] |
[#c1iool0 c1iool0] |
[#c0dcu1 c0dcu1] |
[#c1asc c1asc] |
[#c0daqawg c0daqawg] |
[#c0daqctrl c0daqctrl] |
[#c1omc c1omc] |
[#fb40m fb40m] |
[#EPICS EPICS] |
[#op440m op440m] |
[#op340m op340m] |
c1lsc
Turn OFF all the SUS buttons on the right hand side of the LSC screen (C1LSC.adl).
Push both of the RESET buttons on the little RESET screen on the LSC screen.
From a control room terminal type:
ssh c1lsc
Log in as controls, then run 'su' to become root. Then:
cd /cvs/cds/caltech/target/c1lsc/
./startup.cmd
c1iscex
Shut off the watchdogs for ETMX via epics.
Push the corresponding RESET button on the "C0DAQ_RFMNETWORK.adl" MEDM screen.
From a control room terminal type:
telnet c1iscex
Copy and paste this into the command line (including the "<"):
< /cvs/cds/caltech/target/c1iscex/startup.cmd
When the line "starting main loop (printing from sus_start)." appears, hit CTRL-] and type:
quit
Turn the watchdogs back on, once the computer is up again.
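The telnet-and-redirect pattern above is the same for several of the front ends (c1iscex, c1iscey, c1sosvme, c1asc, c1iool0); only the host name changes. As a reference, here is a sketch of that shared pattern. The helper name `restart_cmds` is made up for illustration, not an existing lab script, and it only prints the commands; the watchdog and RESET steps still have to be done by hand first.

```shell
#!/bin/sh
# Hypothetical helper (not an existing lab script): print the telnet
# restart sequence for one of the "< startup.cmd" style front ends.
# It only prints the commands for reference; it does not run them.
restart_cmds() {
    host=$1
    echo "telnet $host"
    echo "< /cvs/cds/caltech/target/$host/startup.cmd"
    echo "(wait for the main loop, then CTRL-])"
    echo "quit"
}

restart_cmds c1iscex
```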
c1iscey
Shut off the watchdogs for ETMY via epics.
Push the corresponding RESET button on the "C0DAQ_RFMNETWORK.adl" MEDM screen.
From a control room terminal type:
telnet c1iscey
Copy and paste this into the command line (including the "<"):
< /cvs/cds/caltech/target/c1iscey/startup.cmd
Type CTRL-] to break. Then type "quit".
Turn the watchdogs back on, once the computer is up again.
c1sosvme
Shut off the watchdogs for all optics via epics.
Push the corresponding RESET button on the "C0DAQ_RFMNETWORK.adl" MEDM screen.
From a control room terminal type:
telnet c1sosvme
Copy and paste this into the command line (including the "<"):
< /cvs/cds/caltech/target/c1sosvme/startup.cmd
Type CTRL-] to break. Then type "quit".
You will probably need to restart c1susvme1 and c1susvme2 now.
c1susvme1
Shut off the watchdogs for ITMX, ITMY, BS, PRM via epics.
Push the corresponding RESET button on the "C0DAQ_RFMNETWORK.adl" MEDM screen.
From a control room terminal type:
ssh c1susvme1
log in with the controls password.
Become superuser by running the su command.
Go to the /cvs/cds/caltech/target/c1susvme1 directory.
./startup.cmd
Turn the watchdogs back on, once the computer is up again.
c1susvme2
Shut off the watchdogs for SRM, MC1, MC2, MC3 optics via epics.
Push the corresponding RESET button on the "C0DAQ_RFMNETWORK.adl" MEDM screen.
From a control room terminal type:
telnet c1susvme2
Login as controls
At the prompt, type:
su
Type in the superuser password.
cd /cvs/cds/caltech/target/c1susvme2
./startup.cmd
Type CTRL-] to break. Then type "quit".
Turn the watchdogs back on, once the computer is up again.
Reenable the Mode Cleaner autolocker.
c1psl
Sometimes you can just do this guy by doing:
telnet c1psl
reboot
then burt restore this guy.
But often, this just makes it upset and the screens go white but it never comes back. When that happens go out to the rack (the one next to the one with the MC servo) and turn off the crate (on the bottom) which has the c1psl processor. After ~3.14 seconds, turn it back on. c1psl ought to come back now.
If it still doesn't come back then sing [http://www.amazon.com/gp/music/clipserve/B000002W9Q001005/1/ref=mu_sam_ra001_005/002-7727484-0862420 this] link.
c1iool0
At the command prompt, type:
telnet c1iool0
At the telnet prompt, type
< /cvs/cds/caltech/target/c1iool0/startup.cmd
Then, after the main loop is started, type CTRL-], followed by
quit
c0dcu1
A real restart procedure is still needed; in the meantime, try this:
Turn key on DAQ CTRL crate to turn off ALL framebuilders; turn back after ~10 seconds
Follow restart procedures for all other computers
OR, first try pressing the "reset" button on c0dcu1 and waiting ~3 minutes.
c1asc
Turn key on ASC crate to shut off power; turn back after ~10 seconds
Type
telnet c1asc
Then type
< /cvs/cds/caltech/target/c1asc/startup.cmd
After the signal light on the DAQ_DETAIL screen turns green, type CTRL-], followed by
quit
c0daqawg
1) First try this:
> telnet c0daqawg
- if you get a prompt try:
> vmeBusReset
- The AWG light on the RFM screen ought to go red. If it does, wait ~5 minutes for it to come back.
2) If the above doesn't work, try:
- Turn key on DAQ AWG crate to shut off power; turn back on after ~10 seconds. That's it.
c0daqctrl
1) No clue. Power cycle? SSH? Stab it with a spoon???
c1omc
0) Make sure the previous incarnation of the code is no longer running. See Appendix A for details.
1) while logged in as controls, run the script startupC1 in the c1omcepics target directory.
2) Log in as root. Start the real-time code by running the omcfe.rtl script in the c1omc target directory.
2.5) Now the process will wait for a BURT restore. Find the appropriate autoburt snapshot file, and restore it.
3) Also, as root, run the command /opt/gds/awgtpman -2 in the background.
Note that c1omc has two ethernet ports. Use the bottom one.
If nothing works, check the mount tables and make sure that linux1:/home/cds is mounted as /cvs/cds. If it's not, sudo mount -a.
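The mount check above can be scripted; this is a sketch only, and the `check_cds_mount` helper name is made up here (it is not a lab script). It scans a mounts listing for the expected NFS entry:

```shell
#!/bin/sh
# Hypothetical helper: scan a mounts listing (e.g. the contents of
# /proc/mounts on c1omc) for the linux1:/home/cds -> /cvs/cds entry.
check_cds_mount() {
    if printf '%s\n' "$1" | grep -q 'linux1:/home/cds /cvs/cds '; then
        echo "/cvs/cds mounted OK"
    else
        echo "/cvs/cds missing -- try: sudo mount -a"
    fi
}

# On c1omc itself you would call:  check_cds_mount "$(cat /proc/mounts)"
check_cds_mount "linux1:/home/cds /cvs/cds nfs rw 0 0"
```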
A) To stop the front end code, first press the red FE RESET button on the C1OMC_GDS screen. Then,
- i) log in to c1omc. become root.
- ii) kill epics with a 'pkill omcepics'
- iii) kill the test-point manager with a 'pkill awgtpman'
- iv) remove the front end kernel module with '/sbin/rmmod omcfe'
- v) check that the [omcfe] kernel module is gone with a '/sbin/lsmod'
fb40m
This is not really a reboot procedure, as I don't know it. But, to restart the testpoint manager, log in as root and run '/usr/controls/tpman' in the background. Then restart the 'daqd' process by doing a "telnet fb40m 8087" and typing "shutdown" at the prompt.
If everything gets hosed, and the RAID is angrily flashing red lights, power off the framebuilder (by logging in as SU and then typing "poweroff"), power-cycle the RAID, then turn the framebuilder on. If there is disk corruption, you can use "fsck -y" to automatically answer "yes" to all of "fsck"'s questions, so it can run unattended.
EPICS
c1dcuepics runs the processes labeled "dcuepics40m" and "losepics". These should start automatically. c1iscepics runs the process "iscepics40m" which can be started by running ./startupC1 as user 'controls'
C1:IOO-MC_F channels may not come back unless the IOO rack is keyed; follow C1IOVME procedure after that
op440m
Reboot as usual. If it's acting weird or slow, just hit the moon button. Pick the shutdown option. After a few minutes it will turn off. Then hit the on button on the front of the machine. Wait for the login prompt. Then log in as controls.
After logging in you must restart the following scripts: autolockMCmain, PSLwatch, and FSSSlowServo. Run those scripts in the background. Then type "restart_conlogger" at the prompt.
op340m
Reboot as usual. It's headless, so you'll need to ssh in and type 'reboot'.
List of all lab controls computers
In control room:
linux1 - in network rack - NFS server for /cvs/cds (keeps two copies, raid1)
linux2 - controls console, running Linux kernel 2.6.9-1.667smp
linux3 - controls console, running Linux kernel 2.6.9-1.667
op140m - controls console, running Solaris 9
op440m - controls console, running Solaris 9
op540m - controls console, running Solaris 9

From /cvs/cds/caltech/target :
c0daqawg - front-end VME cpu running linux (?) in 1Y6
c0daqctrl - front-end VME cpu running linux in 1Y7
c0dcu1 - front-end VME cpu running VxWorks (?) in 1Y7
c1asc - front-end VME cpu running linux in 1X5
c1aux - EPICS VME cpu running VxWorks in 1X1
c1auxex - EPICS VME cpu running VxWorks in 1X9
c1auxey - EPICS VME cpu running VxWorks in 1Y7
c1dcuepics - EPICS PC cpu running linux in 1Y6
c1iool0 - EPICS VME cpu running VxWorks in 1Y2
c1iovme - front-end VME cpu running linux in 1Y2
c1iscaux - EPICS VME cpu running VxWorks(?) in 1X5
c1iscaux2 - EPICS VME cpu running VxWorks(?) in 1X5
c1iscepics - EPICS PC cpu running linux in 1X6
c1iscex - front-end VME cpu running linux in 1X9
c1iscey - front-end VME cpu running linux in 1Y7
c1losepics - EPICS PC cpu running linux in 1Y6
c1lsc - front-end VME cpu running linux in 1X5
c1pem1 - EPICS VME cpu running VxWorks(?) in 1Y?
c1psl - EPICS VME cpu running VxWorks(?) in 1Y1
c1sosvme - front-end VME cpu running linux in 1Y4
c1susaux - EPICS VME cpu running VxWorks(?) in 1Y5
c1susvme1 - front-end VME cpu running linux in 1Y4
c1susvme2 - front-end VME cpu running linux in 1Y4
c1vac1 - EPICS VME cpu running VxWorks in 1Y9
c1vac2 - EPICS VME cpu running VxWorks in 1Y9
Martian Host Table
This is a list of the Martian network host table on op140m, as of April 23rd, 2007. The NAT router is at 131.215.113.2.
131.215.113.20 linux1
131.215.113.21 linux2
131.215.113.22 linux3
131.215.113.10 c0rga
131.215.113.211 op140m op140m.ligo.caltech.edu loghost
131.215.113.201 rana113 rana
131.215.113.202 fb40m fb0
131.215.113.203 br40m
131.215.113.204 dmt140m
131.215.113.205 dmt240m
131.215.113.206
131.215.113.207
131.215.113.208
131.215.113.209
131.215.113.210 hpmartian
131.215.113.211 op140m
131.215.113.212 op240m
131.215.113.213 op340m
131.215.113.214 op440m
131.215.113.215 op540m
131.215.113.221 40mars-221
131.215.113.222 40mars-222
131.215.113.223 40mars-223
131.215.113.224 40mars-224
131.215.113.225 40mars-225
131.215.113.226 40mars-226
131.215.113.227 40mars-227
131.215.113.228 40mars-228
131.215.113.229 40mars-229
131.215.113.230 40mars-230
131.215.113.231 40mars-231
131.215.113.232 40mars-232
131.215.113.233 40mars-233
131.215.113.234 40mars-234
131.215.113.235 40mars-235
131.215.113.236 40mars-236
131.215.113.237 40mars-237
131.215.113.238 40mars-238
131.215.113.239 40mars-239
131.215.113.240 40mars-240
131.215.113.7 cdssol6
131.215.113.51 scipe1 c1pem1
131.215.113.52 scipe2 c1vac1
131.215.113.53 scipe3 c1psl
131.215.113.54 scipe4 c1vac2
131.215.113.55 scipe5 c1susaux
131.215.113.56 scipe6 c1omc
131.215.113.57 scipe7 c1iool0
131.215.113.58 scipe8 c1ass
131.215.113.59 scipe9 c1auxex
131.215.113.60 scipe10 c1auxey
131.215.113.61 scipe11 c1aux
131.215.113.62 scipe12 c1lsc
131.215.113.63 scipe13 c1susvme2
131.215.113.64 scipe14 c1susvme1
131.215.113.65 scipe15
131.215.113.66 scipe16
131.215.113.67 scipe17 c1iovme
131.215.113.68 scipe18 c1sosvme
131.215.113.69 scipe19 c1losepics
131.215.113.70 scipe20 c1asc
131.215.113.71 scipe21 c0daqctrl
131.215.113.73 scipe23
131.215.113.74 scipe24 c0dcu1
131.215.113.75 scipe25 c1dcuepics
131.215.113.77 scipe27 c1lscbootserver c1iscepics
131.215.113.78 scipe28 c0daqawg
131.215.113.79 scipe29 c1iscey
131.215.113.80 scipe30 c1iscex
131.215.113.81 scipe31 c1iscaux
131.215.113.82 scipe32 c1iscaux2
131.215.113.90 rfm-bypass vmiacc-5595
131.215.113.101 linux101
131.215.113.102 linux102
Coldstart procedures
check the vacuum controls in rack 1Y9 (on UPS)
check the laser chiller, laser power supply, ion pump HV
make sure linux1 is up and serving /cvs/cds (on UPS)
make sure rana113 (gate40m) is up and has /cvs/cds mounted (on UPS)
reset the Marconi RF signal generators
the controls computers should be up (on UPS)
Bring up the embedded computers, starting with EPICS:
c1vac1 and c1vac2 (on UPS), c1psl, c1iool0, c1iscaux, c1iscaux2, c1iscepics, c1dcuepics, c1susaux, c1aux, c1auxex, c1auxey
NB: you will probably have to actually power on scipe27 (c1iscepics) and scipe25 (c1losepics/c1dcuepics).
then the DAQ: c0daqctrl, c0daqawg, c0dcu1
check the RFM switch--if it's not green, reset it
make sure the framebuilder (fb40m) is building frames--i.e., all the MEDM lights are green
then the front-end servos: c1iovme, c1sosvme, c1susvme1, c1susvme2, c1iscex, c1iscey, c1lsc, c1asc, c1omc, c1ass
NB: above, reboot c1susvme1, c1susvme2, c1lsc so they can get a fresh copy of linux from scipe27.
do BURT restores of c1iscepics.snap, c1losepics.snap, c1omcepics.snap, c1assepics.snap--everything else should do saverestore (automatic)
check for stuck EPICS buttons/sliders (just give everything a quick twiddle)
restart the testpoint manager
reset the mechanical shutters after power outages
press the "closed loop" buttons for the input-steering piezojena controllers
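The ordering in the checklist above matters: each group has to be fully up before the next one is started. As a reference, here is a sketch (host lists copied from the checklist; the script is illustrative only and does not touch any machine) that prints the boot groups in order:

```shell
#!/bin/sh
# Sketch of the coldstart boot order: each group should be fully up
# before the next is started. Host lists are copied from the text above.
epics="c1vac1 c1vac2 c1psl c1iool0 c1iscaux c1iscaux2 c1iscepics c1dcuepics c1susaux c1aux c1auxex c1auxey"
daq="c0daqctrl c0daqawg c0dcu1"
frontends="c1iovme c1sosvme c1susvme1 c1susvme2 c1iscex c1iscey c1lsc c1asc c1omc c1ass"

n=1
for group in "EPICS: $epics" "DAQ: $daq" "front ends: $frontends"; do
    echo "$n) $group"
    n=$((n + 1))
done
```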
