| = Computer Restart Procedures = |
|
| Line 5: | Line 7: |
| ||[#coldstart Coldstart procedures] || ||[#c1lsc c1lsc]||[#c1iscex c1iscex]||[#c1iscey c1iscey]||[#c1sosvme c1sosvme]||[#c1susvme1 c1susvme1]||[#c1susvme2 c1susvme2]||[#c1psl c1psl]||[#c1ioo c1ioo]||[#c0dcu1 c0dcu1]||[#c1asc c1asc]||[#c0daqawg c0daqawg]||[#c0daqctrl c0daqctrl]|| |
||[#coldstart Coldstart procedures] ||[[/BURTgooey]] || [#nuclear Nuclear Option] || ||[#c1lsc c1lsc] ||[#c1iscex c1iscex] ||[#c1iscey c1iscey] ||[#c1sosvme c1sosvme] ||[#c1susvme1 c1susvme1] ||[#c1susvme2 c1susvme2] || [#c1asc c1asc] ||[#c1iovme c1iovme] || [#c1omc c1omc] ||[#c1ass c1ass] || || [#nodus nodus] ||[#fb40m fb40m] (Includes DAQ and testpoint manager) ||[#c0daqawg c0daqawg] ||[#c0daqctrl c0daqctrl] || ||[#EPICS EPICS] || [#c1psl c1psl] ||[#c1iool0 c1iool0] ||[#c0dcu1 c0dcu1] ||[#c1iscepics c1iscepics] ||[#c1pem1 c1pem1] || ||[#op440m op440m] ||[#op340m op340m] || || [#Megatron Megatron] || ||[#HardwareReboot Hardware Reboot Procedure for FE computers] || Ethernet network connection diagram as of Oct 7, 2008: attachment:40m_network_10-07-08.pdf [[Martian Host Table]] [#Electronics Here] you can find a map of the computers around the lab. |
| Line 10: | Line 36: |
| . <<Anchor(c1lsc)>> '''c1lsc''' | . <<Anchor(Megatron)>> '''Megatron''' Shutdown the computer normally. (Power button or "shutdown -h now"). Go out to the rack and unplug all 4 power supply cables on the back of the machine. Wait for a bit for the machine to completely stop (30 seconds or so). Plug all the cables back in, and press the power button. ---------- . <<Anchor(c1lsc)>> '''c1lsc''' |
| Line 14: | Line 51: |
| Push both of the RESET buttons on the little RESET screen on the LSC screen. | Push both of the RESET buttons on the little RESET screen on the LSC screen (FIRST GREEN, then RED). |
| Line 18: | Line 55: |
| {{{ssh c1lsc}}} | . {{{ ssh c1lsc}}} |
| Line 22: | Line 60: |
| {{{cd /cvs/cds/caltech/target/c1lsc/}}} {{{./startup.cmd}}} |
. {{{ cd /cvs/cds/caltech/target/c1lsc/ ./startup.cmd}}} |
| Line 27: | Line 65: |
| . <<Anchor(c1iscex)>>'''c1iscex''' | . <<Anchor(c1iscex)>>'''c1iscex''' |
| Line 35: | Line 73: |
| {{{telnet c1iscex}}} | . {{{ telnet c1iscex}}} |
| Line 39: | Line 78: |
| {{{< /cvs/cds/caltech/target/c1iscex/startup.cmd}}} | . {{{ < /cvs/cds/caltech/target/c1iscex/startup.cmd}}} |
| Line 43: | Line 83: |
| {{{quit}}} | . {{{ quit}}} |
| Line 48: | Line 89: |
| . <<Anchor(c1iscey)>> '''c1iscey''' | . <<Anchor(c1iscey)>> '''c1iscey''' |
| Line 56: | Line 97: |
| {{{telnet c1iscey}}} | . {{{ telnet c1iscey}}} |
| Line 60: | Line 102: |
| {{{< /cvs/cds/caltech/target/c1iscey/startup.cmd}}} | . {{{ < /cvs/cds/caltech/target/c1iscey/startup.cmd}}} |
| Line 64: | Line 107: |
| Type ctrl-] to break. Then "quit". |
|
| Line 65: | Line 110: |
| . <<Anchor(c1sosvme)>> '''c1sosvme''' | . <<Anchor(c1sosvme)>> '''c1sosvme''' |
| Line 73: | Line 118: |
| {{{telnet c1sosvme}}} | . {{{ telnet c1sosvme}}} |
| Line 77: | Line 123: |
| {{{< /cvs/cds/caltech/target/c1sosvme/startup.cmd}}} | . {{{ < /cvs/cds/caltech/target/c1sosvme/startup.cmd}}} |
| Line 81: | Line 128: |
| Type ctrl-] to break. Then "quit". |
|
| Line 82: | Line 131: |
| . <<Anchor(c1susvme1)>> '''c1susvme1''' | . <<Anchor(c1susvme1)>> '''c1susvme1''' |
| Line 86: | Line 135: |
| You have two options for turning it off/on before restarting the code: 1) Push the corresponding RESET button on the "C0DAQ_RFMNETWORK.adl" MEDM screen. OR 2) You probably have to connect a keyboard and a monitor to the machine. Then power cycle the crate or push the reset button on the machine with a pointy thing (like the tip of a pen). If the display is still blank after a while, press Ctrl+x. This should start the computer. OR 3) We found that connecting certain types of monitors (such as the Sun Systems CRT) prevents boot up, use a LCD style screen. Don't give up easily. Some combination of keyboards and/or monitor should work. When you get a PXE bad media error, test cable message, check that the Ethernet cable is well connected and that the status lights for the ethernet are lit. If not, try pushing it in better, or switching to the other port. After doing (1) or (2) or (3) From a control room terminal type: . {{{ ssh c1susvme1}}} log in with the ''controls'' password. Become superuser by running the ''su'' command. Go to the ''/cvs/cds/caltech/target/c1susvme1'' directory. . {{{ ./startup.cmd}}} Turn the watchdogs back on, once the computer is up again. --------- . <<Anchor(c1susvme2)>> '''c1susvme2''' Shut off the watchdogs for SRM, MC1, MC2, MC3 optics via epics. Disable the MC autolocker (by clicking the DISABLE button on the C1IOO_LockMC.adl screen). |
|
| Line 90: | Line 176: |
| {{{ssh c1susvme1}}} | . {{{ ssh c1susvme2}}} |
| Line 96: | Line 183: |
| Go to the ''/cvs/cds/caltech/target/c1susvme1'' directory. {{{startup.cmd}}} |
Go to the ''/cvs/cds/caltech/target/c1susvme2'' directory. . {{{ ./startup.cmd}}} |
| Line 102: | Line 190: |
| --------- . <<Anchor(c1susvme2)>> '''c1susvme2''' Shut off the watchdogs for SRM, MC1, MC2, MC3 optics via epics. Push the corresponding RESET button on the "C0DAQ_RFMNETWORK.adl" MEDM screen. From a control room terminal type: {{{telnet c1susvme2}}} Login as controls At the prompt, type: {{{su}}} Type in the superuser password. {{{cd c1susvme2}}} {{{./startup.cmd}}} Turn the watchdogs back on, once the computer is up again. Reenable the Mode Cleaner autolocker. |
Re-Enable the MC autolocker. |
| Line 128: | Line 193: |
| . <<Anchor(c1psl)>>'''c1psl''' | . <<Anchor(c1psl)>>'''c1psl''' |
| Line 132: | Line 197: |
| {{{telnet c1psl}}} {{{reboot}}} |
. {{{ telnet c1psl reboot}}} then burt restore this guy. |
| Line 141: | Line 208: |
| . <<Anchor(c1ioo)>>'''c1ioo''' | . <<Anchor(c1pem1)>>'''c1pem1''' Sometimes you can just do this guy by doing: . {{{ telnet c1pem1 reboot}}} then burt restore this guy. If it still doesn't come back then sing [http://www.amazon.com/gp/music/clipserve/B000002W9Q001005/1/ref=mu_sam_ra001_005/002-7727484-0862420 this] link. ------- . <<Anchor(c1iool0)>>'''c1iool0''' |
| Line 145: | Line 225: |
| {{{telnet c1iovme}}} | . {{{ telnet c1iool0}}} Try {{{CTRL+x}}} It should reboot c1iool0 This computer automatically executes startup.cmd. So there is no need to run it manually. If for some reason it does not load the startup script automatically, try this: |
| Line 149: | Line 238: |
| {{{< /cvs/cds/caltech/target/c1iovme/startup.cmd}}} | . {{{ < /cvs/cds/caltech/target/c1iool0/startup.cmd}}} |
| Line 153: | Line 243: |
| {{{quit}}} ------- . <<Anchor(c0dcu1)>>'''c0dcu1''' Restart procedure needed. (Temporary) |
. {{{ quit}}} ------- . <<Anchor(c1iovme)>>'''c1iovme''' At the command prompt, type: . {{{ telnet c1iovme}}} At the telnet prompt, type . {{{ < /cvs/cds/caltech/target/c1iovme/startup.cmd}}} Type CTRL-], followed by . {{{ quit}}} ------- . <<Anchor(c0dcu1)>>'''c0dcu1''' At first you should try . {{{ telnet c0dcu1}}} If you can login to c0dcu1, it means it doesn't run correctly. After the login, try to startup it by typing . {{{ /cvs/cds/caltech/target/c1duc1/startup.cmd}}} It is going to give you some error messages and this might be a hint to solve problems. To recycle the power you can run . {{{ vmeBusReset}}} Another solutions are: |
| Line 166: | Line 298: |
| ------- . <<Anchor(c1asc)>>'''c1asc''' |
OR, first try pressing the "reset" button on c0dcu1 and waiting ~3 minutes. ------- . <<Anchor(c1asc)>>'''c1asc''' First make sure that c1LSC is running and its light in the C0DAQ_DETAIL MEDM screen is green. |
| Line 173: | Line 309: |
| {{{telnet c1asc}}} | . {{{ telnet c1asc}}} |
| Line 177: | Line 314: |
| {{{< /cvs/cds/caltech/target/c1asc/startup.cmd}}} | . {{{ < /cvs/cds/caltech/target/c1asc/startup.cmd}}} |
| Line 181: | Line 319: |
| {{{quit}}} ------- <<Anchor(c0daqawg)>>'''c0daqawg''' 1) First try this: > telnet c0daqawg if you get a prompt try: > vmeBusReset The AWG light on the RFM screen ought to go red. IF it does, wait ~5 minutes for it to come back. 2) If the above doesn't work, try: Turn key on DAQ AWG crate to shut off power; turn back after ~10 seconds |
.{{{ quit}}} ------- . <<Anchor(c0daqawg)>>'''c0daqawg''' 1) Push the recessed RESET button on the c0daqawg computer 2) If the above doesn't work, try: . Turn key on DAQ AWG crate to shut off power; turn back after ~10 seconds 3) An alternative way: . {{{ telnet c0daqawg }}} . if you get a prompt try: . {{{ vmeBusReset }}} . The AWG light on the RFM screen ought to go red. IF it does, wait ~5 minutes for it to come back. . That's it ------- . <<Anchor(c0daqctrl)>>'''c0daqctrl''' 1) No clue. Power cycle? SSH? Stab it with a spoon??? {{{ Try pushing the recessed reset button. This may be the last thing that needs to be done. }}} ------- . <<Anchor(c1omc)>>'''c1omc''' -1) Make sure the c1omc is powered on--it doesn't power up automatically following a power outage. First find the OMC, then press its power button. 0) Make sure the previous incarnation of the code is no longer running. See Appendix A for details. 1) while logged in as controls, run the script '''startupC1''' in the '''c1omcepics''' target directory. 2) Log in as root. Start the real-time code by running the '''omcfe.rtl''' script in the '''c1omc''' . target directory. 2.5) Now the process will wait for a BURT restore. Find the appropriate autoburt snapshot file, and restore it. 3) Also, as root, run the command '''/opt/gds/awgtpman -2''' in the background. Note that c1omc has two ethernet ports. Use the bottom one. If nothing works, check the mount tables and make sure that linux1:/home/cds is mounted as /cvs/cds. If it's not, sudo mount -a. A) To stop the front end code, first press the red FE RESET button on the C1OMC_GDS screen. Then, . i) log in to c1omc. become root. . ii) kill epics with a 'pkill omcepics' . iii) kill the test-point manager with a 'pkill awgtpman' . iv) remove the front end kernel module with '/sbin/rmmod omcfe' . v) check that the [omcfe] kernel module is gone with a '/sbin/lsmod' ------- . 
<<Anchor(c1ass)>>'''c1ass''' * Currently the procedure for restarting C1ASS seems to be the same as for C1OMC above except that the ass test point manager doesn't need the "-2" flag. ------- . <<Anchor(fb40m)>>'''fb40m''' This is not really a reboot procedure, as I don't know it. * To restart the testpoint manager: just ssh into fb40m and type "pkill tpman" . {{{ ssh fb40m > pkill tpman}}} * Then restart the 'daqd' process:'telnet fb40m 8087', type "shutdown" at the prompt. The framebuilder will restart itself in ~20s. . {{{ telnet fb40m 8087 > shutdown}}} * If everything gets hosed, and the RAID is angrily flashing red lights, power off the framebuilder (by logging in as SU and then typing "poweroff"), power-cycle the RAID, then turn the framebuilder on. If there is disk corruption, you can use "fsck -y" to automatically answer "yes" to all of "fsck"'s questions, so it can run unattended. * ''If you restart the framebuilder, you need to restart the backup script.'' Restart the nightly BACKUP of /cvs/cds and our trend-frames by following the instructions in [[Restarting the backup script]] Summary of how to restart the backup script: The steps are as follows (copy everything after each of the numbered steps verbatim): |
| Line 199: | Line 410: |
| That's it ------- <<Anchor(c0daqctrl)>>'''c0daqctrl''' 1) No clue. Power cycle? SSH? Stab it with a spoon??? ------- C1:IOO-MC_F channels may not come back unless the IOO rack is keyed; follow C1IOVME procedure after that ------- <<Anchor(List)>> '''List of all lab controls computers''' In control room: linux1 - in network rack - NFS server for /cvs/cds (keeps two copies, raid1) [[BR]] NAT router - in back of network rack - provides gateway to campus internet [[BR]] linux2 - controls console [[BR]] linux3 - controls console [[BR]] op140m - controls console [[BR]] op440m - controls console [[BR]] op540m - controls console[[BR]] [[BR]] From /cvs/cds/caltech/target : [[BR]] c0daqawg - front-end VME cpu running linux (?) in 1Y6 [[BR]] c0daqctrl - front-end VME cpu running linux in 1Y7 [[BR]] c0dcu1 - front-end VME cpu running VxWorks (?) in 1Y7 [[BR]] c1asc - front-end VME cpu running linux in 1X5 [[BR]] c1aux - EPICS VME cpu running VxWorks in 1X1 [[BR]] c1auxex - EPICS VME cpu running VxWorks in 1X9 [[BR]] c1auxey - EPICS VME cpu running VxWorks in 1Y7 [[BR]] c1dcuepics - EPICS PC cpu running linux in 1Y6 [[BR]] c1iool0 - EPICS VME cpu running VxWorks in 1Y2 [[BR]] c1iovme - front-end VME cpu running linux in 1Y2 [[BR]] c1iscaux - EPICS VME cpu running VxWorks(?) in 1X5 [[BR]] c1iscaux2 - EPICS VME cpu running VxWorks(?) in 1X5 [[BR]] c1iscepics - EPICS PC cpu running linux in 1X6 [[BR]] c1iscex - front-end VME cpu running linux in 1X9 [[BR]] c1iscey - front-end VME cpu running linux in 1Y7 [[BR]] c1losepics - EPICS PC cpu running linux in 1Y6 [[BR]] c1lsc - front-end VME cpu running linux in 1X5 [[BR]] c1pem1 - EPICS VME cpu running VxWorks(?) in 1Y? [[BR]] c1psl - EPICS VME cpu running VxWorks(?) in 1Y1 [[BR]] c1sosvme - front-end VME cpu running linux in 1Y4 [[BR]] c1susaux - EPICS VME cpu running VxWorks(?) 
in 1Y5 [[BR]] c1susvme1 - front-end VME cpu running linux in 1Y4 [[BR]] c1susvme2 - front-end VME cpu running linux in 1Y4 [[BR]] c1vac1 - EPICS VME cpu running VxWorks in 1Y9 [[BR]] c1vac2 - EPICS VME cpu running VxWorks in 1Y9 [[BR]] ------- <<Anchor(coldstart)>> '''coldstart procedures''' check the vacuum controls in rack 1Y9 (on UPS) [[BR]] check the laser chiller, laser power supply, ion pump HV [[BR]] make sure linux1 is up and serving /cvs/cds (on UPS) [[BR]] make sure rana113 (gate40m) is up and has /cvs/cds mounted (on UPS) [[BR]] make sure the NAT router is up and the control consoles can see the web [[BR]] reset the Marconi RF signal generators [BR]] The controls computers should be up (on UPS) [[BR]] Bring up the embedded computers, starting with EPICS: [[BR]] c1vac1 and c1vac2 (on UPS), c1psl, c1iool0, c1iscaux, c1iscaux2, c1iscepics, c1dcuepics, c1susaux, c1aux, c1auxex, c1auxey [[BR]] then the DAQ: c0daqctrl, c0daqawg, c0dcu1 [[BR]] then the front-end servos: c1iovme, c1sosvme, c1susvme1, c1susvme2, c1iscex, c1iscey, c1lsc, c1asc [[BR]] |
1. ssh fb40m 2. cd /cvs/cds/caltech/scripts/backup 3. ssh-agent > .agent 4. awk '/setenv/' .agent > .agent.edit 5. mv .agent.edit .agent 6. source .agent 7. ssh-add ~/.ssh/id_rsa (This one will not ask for a passphrase) 8. ssh-add ~/.ssh/backup2PB ( This one requires a passphrase. Read the README: ..../scripts/backup/000README.txt ) 9. ssh-add -l (This verifies that both the id_rsa and backup2PB are there. If it also picks up the wrong one (id_dsa), remove it by typing "ssh-add -d" ) 10. ssh 40m@ldas-cit.ligo.caltech.edu /bin/ls /archive/frames/trend/minute-trend/40m (This should do a test ssh, and list the archived frame folders. You can open the last one, and then look at the gps time of the last .gwf file, and it should be sometime in the middle of the previous night.) * If the frame builder light is red on the reflected memory MEDM screens but the machine seems otherwise to be working, try pressing the reset button on the reflected memory bypass switch. * If the frame builder still isn't working, make sure that '''c0daqctrl''' is running properly. ------- . <<Anchor(EPICS)>>'''EPICS''' '''c1dcuepics''' runs the processes labeled "dcuepics40m" and "losepics" and "iscepics40m". These should start automatically.[[BR]] Note that to burt restore c1dcuepics, one needs to fix the autoburt .snap file. Because of the newline at the end of the C0:TIM-PACIFIC_STRING channel, it moves the quote to the next line, breaking the autoburt restore. For the moment, you need to manually move the lone " to the end of the previous line. <<Anchor(c1iscepics)>>Need to login and restart ntpd (remove this entry after setting up the cron tab correctly): {{{linux2:/etc>ssh scipe25}}}{{{controls@scipe25's password:}}}{{{[controls@c1dcuepics ~]$ sudo /usr/sbin/ntpd -c /etc/ntp.conf}}}''' ''' After you restart c1dcuepics, you must burt restore c1iscepics and c1losepics! (On nodus, in the command line, type "burt" to open the burt restore gui) ------- . 
C1:IOO-MC_F channels may not come back unless the IOO rack is keyed; follow C1IOVME procedure after that ------- . <<Anchor(op440m)>>'''op440m''' Reboot as usual. If its acting weird or slow just hit the moon button. Pick the shutdown option. After a few minutes it will turn off. The hit the on button on the front of the machine. Wait for the login prompt. Then log in as controls. ------- . <<Anchor(op340m)>>'''op340m''' Reboot as usual. It's headless, so you'll need to ssh in and type 'reboot'. Restart the following scripts: * [[conlog]] ------- . <<Anchor(nodus)>>'''nodus''' Nodus is a Solaris box in the rack in the office. Here are some of the things that it runs that you will want to restart: * [[EPICS gateway]] * [[ndsproxy]] * [[ApacheOnNodus|Apache (Required for SVN remote access)]] * [[elog]] ---------- . <<Anchor(HardwareReboot)>> '''Hardware Reboot Procedure for FE computers''' Starting from Y-end, look for crates with RFM network connection (orange cables). Turn off the power by turning the key and wait for 10sec. Then turn on the power again. You should hear a beep after a while. Go thorough every crate which has orange cables connected. Then initialize each FE computer following the procedures described above. Check the status of the computers by C0DAQ_RFMNETWORK.adl. Run the slider twiddle script to unstick any stuck sliders: /cvs/cds/caltech/scripts/Admin/slider_twiddle ---------- . <<Anchor(nuclear)>> '''Nuclear Option''' This is the most involved option, and if it doesn't work, you're in trouble. When everything on the RFM_NETWORK screen goes red, this option usually brings it back. . Log into fb40m and power it off (sudo poweroff). . Log into c1dcuepics and power it off (sudo poweroff). . Key off the crates in 1Y6 and 1Y7, and push the RESET buttons on BOTH RFM BYPASS SWITCHES. . Power on the framebuilder. Wait ~10 minutes for it to boot (solaris). . Power on the crate in 1Y7 (C0DAQCTRL). Wait one minute. . Power on the crate in 1Y6 (C0daqawg). . 
Power on c1dcuepics. Wait 3 minutes for it to boot. . BURT restore c1dcuepics processes -- these are currently c1losepics.snap and c1iscepics.snap Now you can follow the procedure in [#HardwareReboot Hardware Reboot] to bring up the front-end controls machines. You can also try the software (login from the control room) method if you feeling lazy and lucky. NB: The last time this procedure was exercised, the last two steps (involving c1dcuepics) were done FIRST, and it worked. So that's an option too. |
Computer Restart Procedures
Here is where we should keep information on how to restart the computers that periodically need restarting.
[#List List of all lab computers] |
[#coldstart Coldstart procedures] |
[#nuclear Nuclear Option] |
[#c1lsc c1lsc] |
[#c1iscex c1iscex] |
[#c1iscey c1iscey] |
[#c1sosvme c1sosvme] |
[#c1susvme1 c1susvme1] |
[#c1susvme2 c1susvme2] |
[#c1asc c1asc] |
[#c1iovme c1iovme] |
[#c1omc c1omc] |
[#c1ass c1ass] |
[#nodus nodus] |
[#fb40m fb40m] (Includes DAQ and testpoint manager) |
[#c0daqawg c0daqawg] |
[#c0daqctrl c0daqctrl] |
[#EPICS EPICS] |
[#c1psl c1psl] |
[#c1iool0 c1iool0] |
[#c0dcu1 c0dcu1] |
[#c1iscepics c1iscepics] |
[#c1pem1 c1pem1] |
[#op440m op440m] |
[#op340m op340m] |
[#Megatron Megatron] |
[#HardwareReboot Hardware Reboot Procedure for FE computers] |
Ethernet network connection diagram as of Oct 7, 2008: attachment:40m_network_10-07-08.pdf
[#Electronics Here] you can find a map of the computers around the lab.
Megatron
Shut down the computer normally (power button or "shutdown -h now").
Go out to the rack and unplug all 4 power supply cables on the back of the machine.
Wait for a bit for the machine to completely stop (30 seconds or so).
Plug all the cables back in, and press the power button.
c1lsc
Turn OFF all the SUS buttons on the right hand side of the LSC screen (C1LSC.adl).
Push both of the RESET buttons on the little RESET screen on the LSC screen (FIRST GREEN, then RED).
From a control room terminal type:
ssh c1lsc
Log in as controls. Then do 'su' to become root. Then
cd /cvs/cds/caltech/target/c1lsc/
./startup.cmd
c1iscex
Shut off the watchdogs for ETMX via epics.
Push the corresponding RESET button on the "C0DAQ_RFMNETWORK.adl" MEDM screen.
From a control room terminal type:
telnet c1iscex
Copy and paste this into the command line (including the "<"):
< /cvs/cds/caltech/target/c1iscex/startup.cmd
When the line "starting main loop (printing from sus_start)." appears, hit CTRL-] and type:
quit
Turn the watchdogs back on, once the computer is up again.
c1iscey
Shut off the watchdogs for ETMY via epics.
Push the corresponding RESET button on the "C0DAQ_RFMNETWORK.adl" MEDM screen.
From a control room terminal type:
telnet c1iscey
Copy and paste this into the command line (including the "<"):
< /cvs/cds/caltech/target/c1iscey/startup.cmd
Type CTRL-] to break, then "quit".
Turn the watchdogs back on, once the computer is up again.
c1sosvme
Shut off the watchdogs for all optics via epics.
Push the corresponding RESET button on the "C0DAQ_RFMNETWORK.adl" MEDM screen.
From a control room terminal type:
telnet c1sosvme
Copy and paste this into the command line (including the "<"):
< /cvs/cds/caltech/target/c1sosvme/startup.cmd
Type CTRL-] to break, then "quit".
You will probably need to restart c1susvme1 and c1susvme2 now.
c1susvme1
Shut off the watchdogs for ITMX, ITMY, BS, PRM via epics.
You have three options for turning it off/on before restarting the code:
1) Push the corresponding RESET button on the "C0DAQ_RFMNETWORK.adl" MEDM screen.
OR
2) You probably have to connect a keyboard and a monitor to the machine. Then power cycle the crate or push the reset button on the machine with a pointy thing (like the tip of a pen). If the display is still blank after a while, press Ctrl+x. This should start the computer.
OR
3) We found that connecting certain types of monitors (such as the Sun Systems CRT) prevents boot-up; use an LCD-style screen. Don't give up easily. Some combination of keyboard and/or monitor should work. If you get a PXE "bad media error, test cable" message, check that the Ethernet cable is well connected and that the Ethernet status lights are lit. If not, try pushing it in better, or switching to the other port.
After doing (1), (2), or (3), from a control room terminal type:
ssh c1susvme1
Log in with the controls password.
Become superuser by running the su command.
Go to the /cvs/cds/caltech/target/c1susvme1 directory.
./startup.cmd
Turn the watchdogs back on, once the computer is up again.
c1susvme2
Shut off the watchdogs for SRM, MC1, MC2, MC3 optics via epics.
Disable the MC autolocker (by clicking the DISABLE button on the C1IOO_LockMC.adl screen).
Push the corresponding RESET button on the "C0DAQ_RFMNETWORK.adl" MEDM screen.
From a control room terminal type:
ssh c1susvme2
Log in with the controls password.
Become superuser by running the su command.
Go to the /cvs/cds/caltech/target/c1susvme2 directory.
./startup.cmd
Turn the watchdogs back on, once the computer is up again.
Re-Enable the MC autolocker.
c1psl
Sometimes you can just do this guy by doing:
telnet c1psl
reboot
then burt restore this guy.
But often, this just makes it upset and the screens go white but it never comes back. When that happens go out to the rack (the one next to the one with the MC servo) and turn off the crate (on the bottom) which has the c1psl processor. After ~3.14 seconds, turn it back on. c1psl ought to come back now.
If it still doesn't come back then sing [http://www.amazon.com/gp/music/clipserve/B000002W9Q001005/1/ref=mu_sam_ra001_005/002-7727484-0862420 this] link.
c1pem1
Sometimes you can just do this guy by doing:
telnet c1pem1
reboot
then burt restore this guy.
If it still doesn't come back then sing [http://www.amazon.com/gp/music/clipserve/B000002W9Q001005/1/ref=mu_sam_ra001_005/002-7727484-0862420 this] link.
c1iool0
At the command prompt, type:
telnet c1iool0
Try CTRL+x
It should reboot c1iool0
This computer automatically executes startup.cmd. So there is no need to run it manually.
If for some reason it does not load the startup script automatically, try this:
At the telnet prompt, type
< /cvs/cds/caltech/target/c1iool0/startup.cmd
Then, after the main loop is started, type CTRL-], followed by
quit
c1iovme
At the command prompt, type:
telnet c1iovme
At the telnet prompt, type
< /cvs/cds/caltech/target/c1iovme/startup.cmd
Type CTRL-], followed by
quit
c0dcu1
At first you should try
telnet c0dcu1
If you can log in to c0dcu1, it is not running correctly. After logging in, try to start it up by typing
/cvs/cds/caltech/target/c0dcu1/startup.cmd
It will print some error messages, which may hint at what is wrong.
To power-cycle it you can run
vmeBusReset
Other options are:
Turn key on DAQ CTRL crate to turn off ALL framebuilders; turn back after ~10 seconds
Follow restart procedures for all other computers
OR, first try pressing the "reset" button on c0dcu1 and waiting ~3 minutes.
c1asc
First make sure that c1lsc is running and its light on the C0DAQ_DETAIL MEDM screen is green.
Turn key on ASC crate to shut off power; turn back after ~10 seconds
Type
telnet c1asc
Then type
< /cvs/cds/caltech/target/c1asc/startup.cmd
After signal light on DAQ_DETAIL screen turns green, type CTRL-], followed by
quit
c0daqawg
1) Push the recessed RESET button on the c0daqawg computer.
2) If the above doesn't work, try:
- Turn key on DAQ AWG crate to shut off power; turn back after ~10 seconds
3) An alternative way:
telnet c0daqawg
- if you get a prompt try:
vmeBusReset
- The AWG light on the RFM screen ought to go red. IF it does, wait ~5 minutes for it to come back.
- That's it
c0daqctrl
1) No clue. Power cycle? SSH? Stab it with a spoon???
Try pushing the recessed reset button. This may be the last thing that needs to be done.
c1omc
-1) Make sure the c1omc is powered on--it doesn't power up automatically following a power outage. First find the OMC, then press its power button.
0) Make sure the previous incarnation of the code is no longer running. See step A) below for details.
1) While logged in as controls, run the script startupC1 in the c1omcepics target directory.
2) Log in as root. Start the real-time code by running the omcfe.rtl script in the c1omc target directory.
2.5) Now the process will wait for a BURT restore. Find the appropriate autoburt snapshot file, and restore it.
3) Also, as root, run the command /opt/gds/awgtpman -2 in the background.
Note that c1omc has two ethernet ports. Use the bottom one.
If nothing works, check the mount tables and make sure that linux1:/home/cds is mounted as /cvs/cds. If it's not, run "sudo mount -a".
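The mount check above can be scripted. Here is a minimal sketch, assuming the machine prints mount table lines in the usual Linux "SOURCE on MOUNTPOINT type ..." form; the function name is_mounted_as is made up for this example and is not an existing lab script.

```shell
# is_mounted_as SOURCE MOUNTPOINT
# Reads `mount`-style lines ("SOURCE on MOUNTPOINT type ...") on stdin and
# succeeds only if SOURCE is mounted at MOUNTPOINT.
# (Hypothetical helper for illustration, not part of the lab scripts.)
is_mounted_as() {
    awk -v src="$1" -v mp="$2" '
        $1 == src && $2 == "on" && $3 == mp { found = 1 }
        END { exit found ? 0 : 1 }
    '
}

# Usage on c1omc (not run here):
#   mount | is_mounted_as linux1:/home/cds /cvs/cds || sudo mount -a
```

Reading the table from stdin keeps the check separate from the fix, so you can look at the result before deciding to remount.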
A) To stop the front end code, first press the red FE RESET button on the C1OMC_GDS screen. Then,
- i) log in to c1omc. become root.
- ii) kill epics with a 'pkill omcepics'
- iii) kill the test-point manager with a 'pkill awgtpman'
- iv) remove the front end kernel module with '/sbin/rmmod omcfe'
- v) check that the [omcfe] kernel module is gone with a '/sbin/lsmod'
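Step v) can be turned into a yes/no check instead of reading the lsmod listing by eye. This is only a sketch: module_loaded is a made-up helper name, and it assumes lsmod prints the module name in the first column, as it does on standard Linux.

```shell
# module_loaded NAME
# Reads `lsmod`-style output on stdin and succeeds if NAME appears in the
# first column. (Hypothetical helper for illustration.)
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit found ? 0 : 1 }'
}

# Usage on c1omc, as root (not run here):
#   if /sbin/lsmod | module_loaded omcfe; then
#       echo "omcfe is still loaded; repeat step iv)"
#   fi
```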
c1ass
- Currently the procedure for restarting C1ASS seems to be the same as for C1OMC above except that the ass test point manager doesn't need the "-2" flag.
fb40m
This is not really a reboot procedure, as I don't know it.
- To restart the testpoint manager: just ssh into fb40m and type "pkill tpman"
ssh fb40m
> pkill tpman
- Then restart the 'daqd' process: 'telnet fb40m 8087', then type "shutdown" at the prompt. The framebuilder will restart itself in ~20s.
telnet fb40m 8087
> shutdown
- If everything gets hosed, and the RAID is angrily flashing red lights, power off the framebuilder (by logging in as SU and then typing "poweroff"), power-cycle the RAID, then turn the framebuilder on. If there is disk corruption, you can use "fsck -y" to automatically answer "yes" to all of "fsck"'s questions, so it can run unattended.
If you restart the framebuilder, you need to restart the backup script.
Restart the nightly BACKUP of /cvs/cds and our trend-frames by following the instructions in Restarting the backup script.
Summary of how to restart the backup script (copy everything after each step number verbatim):
1. ssh fb40m
2. cd /cvs/cds/caltech/scripts/backup
3. ssh-agent > .agent
4. awk '/setenv/' .agent > .agent.edit
5. mv .agent.edit .agent
6. source .agent
7. ssh-add ~/.ssh/id_rsa (This one will not ask for a passphrase.)
8. ssh-add ~/.ssh/backup2PB (This one requires a passphrase. Read the README: ..../scripts/backup/000README.txt)
9. ssh-add -l (This verifies that both id_rsa and backup2PB are there. If it also picks up the wrong one (id_dsa), remove it by typing "ssh-add -d".)
10. ssh 40m@ldas-cit.ligo.caltech.edu /bin/ls /archive/frames/trend/minute-trend/40m (This should do a test ssh and list the archived frame folders. You can open the last one and look at the GPS time of the last .gwf file; it should be sometime in the middle of the previous night.)
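The GPS-time check in the last backup step is easier if you convert the number to a readable date. The sketch below makes two assumptions: frame files follow the usual LIGO naming <site>-<description>-<GPS start>-<duration>.gwf, and you supply the GPS-UTC leap-second offset yourself (15 s from 2009 through mid-2012). The function names and the example file name are made up for illustration. It is written for sh/bash, not the csh that the backup steps use.

```shell
# gps_from_frame FILE
# Print the GPS start time encoded in a frame file name, assuming the
# <site>-<description>-<GPS start>-<duration>.gwf naming convention.
gps_from_frame() {
    name=${1##*/}        # strip any directory part
    name=${name%.gwf}    # strip the extension
    name=${name%-*}      # strip the trailing -<duration>
    printf '%s\n' "${name##*-}"   # the last remaining field is the GPS start
}

# gps_to_unix GPS LEAP
# Convert GPS seconds to Unix seconds. 315964800 is the Unix time of the
# GPS epoch (1980-01-06 00:00:00 UTC); LEAP is the GPS-UTC offset in seconds.
gps_to_unix() {
    echo $(( $1 + 315964800 - $2 ))
}

# Usage with a hypothetical file name (not run here; `date -d @N` needs GNU date):
#   date -u -d "@$(gps_to_unix "$(gps_from_frame C-M-930000000-3600.gwf)" 15)"
```

If the printed date is not from the middle of the previous night, the backup probably did not run.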
- If the frame builder light is red on the reflected memory MEDM screens but the machine seems otherwise to be working, try pressing the reset button on the reflected memory bypass switch.
If the frame builder still isn't working, make sure that c0daqctrl is running properly.
EPICS
c1dcuepics runs the processes labeled "dcuepics40m" and "losepics" and "iscepics40m". These should start automatically.
Note that to burt restore c1dcuepics, one needs to fix the autoburt .snap file. Because of the newline at the end of the C0:TIM-PACIFIC_STRING channel, it moves the quote to the next line, breaking the autoburt restore. For the moment, you need to manually move the lone " to the end of the previous line.
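The manual fix above (moving the lone " back to the end of the previous line) can be done with a small filter. This is a sketch only: join_lone_quote is a made-up name, and it assumes the stray quote sits alone on its own line, as described. Run it on a copy of the .snap file and compare before restoring from it.

```shell
# join_lone_quote
# Read a .snap file on stdin; whenever a line contains nothing but a double
# quote, append that quote to the end of the previous line instead.
# (Hypothetical helper for illustration.)
join_lone_quote() {
    awk '
        NF == 1 && $1 == "\"" && have { prev = prev "\""; next }
        have { print prev }
        { prev = $0; have = 1 }
        END { if (have) print prev }
    '
}

# Usage (not run here; file names are placeholders):
#   join_lone_quote < original.snap > fixed.snap
```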
Need to log in and restart ntpd (remove this entry after setting up the cron tab correctly):
linux2:/etc>ssh scipe25
controls@scipe25's password:
[controls@c1dcuepics ~]$ sudo /usr/sbin/ntpd -c /etc/ntp.conf
After you restart c1dcuepics, you must burt restore c1iscepics and c1losepics! (On nodus, in the command line, type "burt" to open the burt restore gui)
- C1:IOO-MC_F channels may not come back unless the IOO rack is keyed; follow C1IOVME procedure after that
op440m
Reboot as usual. If it's acting weird or slow just hit the moon button. Pick the shutdown option. After a few minutes it will turn off. Then hit the on button on the front of the machine. Wait for the login prompt. Then log in as controls.
op340m
Reboot as usual. It's headless, so you'll need to ssh in and type 'reboot'.
Restart the following scripts:
- conlog
nodus
Nodus is a Solaris box in the rack in the office. Here are some of the things that it runs that you will want to restart:
- EPICS gateway
- ndsproxy
- Apache (Required for SVN remote access)
- elog
Hardware Reboot Procedure for FE computers
Starting from the Y-end, look for crates with an RFM network connection (orange cables). Turn off the power by turning the key and wait 10 seconds. Then turn the power on again. You should hear a beep after a while. Go through every crate which has orange cables connected. Then initialize each FE computer following the procedures described above. Check the status of the computers on the C0DAQ_RFMNETWORK.adl screen.
Run the slider twiddle script to unstick any stuck sliders: /cvs/cds/caltech/scripts/Admin/slider_twiddle
Nuclear Option
This is the most involved option, and if it doesn't work, you're in trouble. When everything on the RFM_NETWORK screen goes red, this option usually brings it back.
- Log into fb40m and power it off (sudo poweroff).
- Log into c1dcuepics and power it off (sudo poweroff).
- Key off the crates in 1Y6 and 1Y7, and push the RESET buttons on BOTH RFM BYPASS SWITCHES.
- Power on the framebuilder. Wait ~10 minutes for it to boot (solaris).
- Power on the crate in 1Y7 (C0DAQCTRL). Wait one minute.
- Power on the crate in 1Y6 (C0daqawg).
- Power on c1dcuepics. Wait 3 minutes for it to boot.
- BURT restore c1dcuepics processes -- these are currently c1losepics.snap and c1iscepics.snap
Now you can follow the procedure in [#HardwareReboot Hardware Reboot] to bring up the front-end controls machines. You can also try the software (login from the control room) method if you are feeling lazy and lucky.
NB: The last time this procedure was exercised, the last two steps (involving c1dcuepics) were done FIRST, and it worked. So that's an option too.
