Computer Restart Procedures
Here is where we should keep information on how to restart the computers that periodically need restarting.
[#List List of all lab computers] |
[#links Useful links] |
[#FE General FE instructions] |
[#c1sus c1sus] |
[#c1ioo c1ioo] |
[#c1lsc c1lsc] |
[#c1iscex c1iscex] |
[#c1iscey c1iscey] |
[#c1omc c1omc] |
[#c1ass c1ass] |
[#nodus nodus] |
[#fb fb] (Includes DAQ) |
[#c1psl c1psl] |
[#c1iool0 c1iool0] |
[#c1pem1 c1pem1] |
[#op440m op440m] |
[#op340m op340m] |
Out of Date Ethernet network connection diagram as of Oct 7, 2008: attachment:40m_network_10-07-08.pdf
[#Electronics Here] you can find a map of the computers around the lab.
Useful links
Which models run on which machines?
fb and DAQ issues
To restart the frame builder process, simply do the following from a control room machine:
- 1. telnet fb 8087
- 2. shutdown
The init process running on the fb machine will then automatically restart daqd.
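The two telnet steps above can be sketched as a single non-interactive command. This is only a sketch: it assumes nc (netcat) is installed on the control room machine and that daqd accepts the command when it is piped in rather than typed. The function name restart_daqd is made up for illustration.

```shell
# Send the "shutdown" command to the daqd control port.
# Assumptions: nc is available, and daqd reads the command from a
# non-interactive connection the same way it does from telnet.
restart_daqd() {
    local host="${1:-fb}"    # frame builder host name
    local port="${2:-8087}"  # daqd control port, as used in the telnet step
    printf 'shutdown\n' | nc "$host" "$port"
}
```

Run it as "restart_daqd" from a control room machine. If it appears to do nothing, fall back to the interactive telnet steps.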
Generally after restarting the frame builder process, the front ends will not be talking to the fb properly (0x2bad and red lights). The easiest solution is to reboot the front ends.
For dataviewer to get data you need to make sure "daqd" and "nds pipe" are running on the fb machine.
daqd and nds have been added to the /etc/inittab file on the fb machine. These will automatically restart when killed or the machine is restarted.
However, if either process fails to start several times in rapid succession, the init process will stop trying. If that happens:
- 1. Fix the underlying problem. Try looking at /opt/rtcds/caltech/c1/target/fb/logs/daqd.log.XXXX for error messages. Make sure the master and daqdrc files are correct in the fb directory.
- 2. ssh fb
- 3. "sudo /sbin/init q" to restart the init process or restart the fb machine with "sudo shutdown -r now"
The code which is called by the init process lives in /opt/rtcds/caltech/c1/target/fb/.
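To find the log file mentioned in step 1, a small helper along these lines may save some typing. It is a sketch: the function name latest_daqd_log is hypothetical, and it assumes the most recently modified daqd.log.XXXX file is the one of interest.

```shell
# Print the most recently modified daqd log file in the given directory.
# The default directory is the log path quoted in step 1 above.
latest_daqd_log() {
    local dir="${1:-/opt/rtcds/caltech/c1/target/fb/logs}"
    # ls -t sorts newest first; head keeps only the newest file.
    ls -t "$dir"/daqd.log.* 2>/dev/null | head -n 1
}
```

For example, on fb: tail -n 50 "$(latest_daqd_log)" shows the end of the newest log.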
For testpoints to be available for a given front end, the following need to be running on the correct front end computer:
- The IOP needs to be running. It is named something like c1x00, c1x01, etc.
- The front end module needs to be loaded and running.
- The awgtpman process needs to be running, since it handles the arbitrary waveform generator and test points for the front end.
- The IOC needs to be running, since it handles all the EPICS communication.
- mx_streams needs to be running (use "sudo /etc/restart_streams"). This should start an mx_stream for each front end system, which is needed to talk to the fb.
To confirm the necessary codes are running on a front end, you can:
- To check if the front end modules are loaded, use "lsmod" on the front end machine and look for c1SYSNAMEfe entries.
- To check if the IOCs are running, use "ps -ef | grep epicsC1.cmd"; there should be 1 per model.
- To check if mx_stream is running, use "ps -ef | grep mx_stream"; there should be only 1, with all the models as command line inputs.
- To check if awgtpman is running, use "ps -ef | grep awgtpman"; there should be 1 per model.
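The checks above can be wrapped in two small helpers. This is a sketch, not an existing lab script: the function names are made up, and both read the command output on stdin so they can be tried on saved output as well.

```shell
# Count lines of `ps -ef` output (on stdin) that mention $1,
# ignoring the line for the grep command itself.
count_matching() {
    grep -F -- "$1" | grep -v -c -w 'grep' || true
}

# Succeed if `lsmod` output (on stdin) lists the module c1SYSfe,
# where SYS is passed as $1 (for example: sus, ioo, x02).
fe_module_loaded() {
    awk -v mod="c1${1}fe" '$1 == mod { found = 1 } END { exit !found }'
}
```

On a front end, "ps -ef | count_matching awgtpman" prints the number of awgtpman processes, and "lsmod | fe_module_loaded sus && echo loaded" reports whether c1susfe is loaded.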
Cold start order is:
- If necessary, stop everything (fb, front ends, mx_streams, etc)
- Start (or restart) fb codes (daqd, nds, dhcpd)
- Start the front ends. They should automatically start all the necessary processes on boot up.
- Be patient, it can take over a minute for all the modules to finally load. You can type "dmesg" to see how far it's gotten.
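Instead of re-running commands by hand while the modules load, you can poll lsmod until a given module shows up. The sketch below assumes you are logged in on the front end. The function name, the default of 12 tries, and the 10 second delay are arbitrary choices, not lab conventions.

```shell
# Poll lsmod until the named kernel module appears, or give up.
# Usage: wait_for_module MODULE [TRIES] [DELAY_SECONDS]
wait_for_module() {
    local mod="$1"
    local tries="${2:-12}"
    local delay="${3:-10}"
    local i=0
    while [ "$i" -lt "$tries" ]; do
        # Match the module name exactly against the first lsmod column.
        if lsmod | awk -v m="$mod" '$1 == m { f = 1 } END { exit !f }'; then
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    return 1
}
```

For example, "wait_for_module c1susfe && echo loaded" waits up to about two minutes for c1susfe.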
General FE Instructions
Troubleshooting
- The first thing to try is a reboot. All the front ends should automatically start their FE processes on a reboot.
- Also see the [#fb fb/DAQ] section.
- The following steps assume you are running a terminal on the FE computer having problems, e.g. via ssh -X computer_name
- Try "dmesg | grep c1SYS", where c1SYS is the front end model name, such as c1lsc, c1x02, c1sus, c1rfm, c1scy, and so on.
- Look for things like ADC timeouts, not finding ADC/DAC or BO cards or other error messages.
- If the EPICS process is running (you see non-white channels in medm screens, but nothing is responding or changing), try restarting the FE itself
- Be aware that BURT Restore needs to be set to 1 on the corresponding GDS_TP screen. These can be accessed from the sitemap with the GDS FE and GDS IOP buttons.
- The IOP (usually named c1xYY where YY is a number like 01 or 02) needs to be running for any other FE processes to run properly.
- Check to see if the FE module is loaded. Type "lsmod" and look for c1SYSfe, where SYS is the 3 letter name of the model.
- If it's running but not responding, kill it with "sudo rmmod c1SYSfe", again replacing SYS with the 3 letter name of the model.
- To restart, first go to the directory where the module lives. Type "target", then "cd c1SYS/bin/". Then type "sudo insmod c1SYSfe.ko", with the correct replacement for SYS.
- To restart just one FE including its IOC and awgtpman processes, go to the scripts directory (type "scripts") then run the startc1SYS script, where SYS is the FE model you want to start.
- Do not use sudo to run the script. If you get an error indicating a log file cannot be opened due to permissions, go to the log file and run "sudo chown controls:controls LOGFILE", where LOGFILE is the file name.
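To avoid typos when substituting SYS, the unload and reload commands above can be generated for a given model and then pasted into the terminal. This is a sketch: the function name is hypothetical, and the default target directory is an assumption (it is guessed from the fb path /opt/rtcds/caltech/c1/target/fb/, since "target" is a shell shortcut whose destination is not spelled out here). Pass the real directory as the second argument if it differs.

```shell
# Print (do not run) the commands to unload and reload one FE module.
# Usage: fe_reload_cmds SYS [TARGET_DIR]
fe_reload_cmds() {
    local sys="$1"
    # Assumption: this is where the "target" shortcut goes.
    local target="${2:-/opt/rtcds/caltech/c1/target}"
    printf 'sudo rmmod c1%sfe\n' "$sys"
    printf 'cd %s/c1%s/bin/\n' "$target" "$sys"
    printf 'sudo insmod c1%sfe.ko\n' "$sys"
}
```

For example, "fe_reload_cmds sus" prints the three commands for the c1sus model. Nothing is executed until you run the printed lines yourself.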
c1ioo
This machine runs the c1x03, c1ioo, and c1gpt FE models. It controls mode cleaner wavefront sensors, mode cleaner length, and green locking.
On reboot, these models should automatically start up. See also the [#fb fb/DAQ] and the [#FE General FE Instructions] sections.
It is connected via GE Fanuc VMIC 5565 reflected memory card to the c1sus, c1scx, and c1scy computers.
c1ioo is a Sun X4600 machine. As such, for a complete shutdown (not normally necessary, but sometimes required), do the following:
Shut down the computer normally. (Power button or "shutdown -h now").
Go out to the rack and unplug all 4 power supply cables on the back of the machine.
Wait for a bit for the machine to completely stop (30 seconds or so).
Plug all the cables back in, and press the power button.
c1sus
This machine runs the c1x02, c1sus, c1mcs, c1rfm, and c1pem FE models. It controls the BS, ITMX, ITMY, PRM, SRM, MC1, MC2, and MC3 optics.
On reboot, these models should automatically start up. See also the [#fb fb/DAQ] and the [#FE General FE Instructions] sections.
It is connected by a GE Fanuc VMIC 5565 RFM card to the c1scx, c1scy, and c1ioo machines.
It is connected by a Dolphin PCIE reflected memory card to the c1lsc machine.
c1lsc
This machine runs the c1x04 and c1lsc FE models.
On reboot, these models should automatically start up. See also the [#fb fb/DAQ] and the [#FE General FE Instructions] sections.
It is connected by a Dolphin PCIE reflected memory card to the c1sus machine.
c1iscex
This machine runs the c1x01 and c1scx FE models.
On reboot, these models should automatically start up. See also the [#fb fb/DAQ] and the [#FE General FE Instructions] sections.
It is connected by a GE Fanuc VMIC 5565 reflected memory card to the c1sus, c1ioo, and c1scy computers.
c1iscey
This machine runs the c1x05 and c1scy FE models.
On reboot, these models should automatically start up. See also the [#fb fb/DAQ] and the [#FE General FE Instructions] sections.
It is connected by a GE Fanuc VMIC 5565 reflected memory card to the c1sus, c1ioo, and c1scx computers.
c1psl
Sometimes you can reboot this machine just by doing:
telnet c1psl reboot
Then do a BURT restore for it.
But often, this just makes it upset and the screens go white but it never comes back. When that happens go out to the rack (the one next to the one with the MC servo) and turn off the crate (on the bottom) which has the c1psl processor. After ~3.14 seconds, turn it back on. c1psl ought to come back now.
If it still doesn't come back then sing [http://www.amazon.com/gp/music/clipserve/B000002W9Q001005/1/ref=mu_sam_ra001_005/002-7727484-0862420 this] link.
c1pem1
Sometimes you can reboot this machine just by doing:
telnet c1pem1 reboot
Then do a BURT restore for it.
If it still doesn't come back then sing [http://www.amazon.com/gp/music/clipserve/B000002W9Q001005/1/ref=mu_sam_ra001_005/002-7727484-0862420 this] link.
c1iool0
At the command prompt, type:
telnet c1iool0
Try CTRL+x
It should reboot c1iool0
This computer automatically executes startup.cmd. So there is no need to run it manually.
If for some reason it does not load the startup script automatically, try this:
At the telnet prompt, type
< /cvs/cds/caltech/target/c1iool0/startup.cmd
Then, after the main loop is started, type CTRL-], followed by
quit
c1omc
-1) Make sure the c1omc is powered on--it doesn't power up automatically following a power outage. First find the OMC, then press its power button.
0) Make sure the previous incarnation of the code is no longer running. See Appendix A for details.
1) while logged in as controls, run the script startupC1 in the c1omcepics target directory.
2) Log in as root. Start the real-time code by running the omcfe.rtl script in the c1omc target directory.
2.5) Now the process will wait for a BURT restore. Find the appropriate autoburt snapshot file, and restore it.
3) Also, as root, run the command /opt/gds/awgtpman -2 in the background.
Note that c1omc has two ethernet ports. Use the bottom one.
If nothing works, check the mount tables and make sure that linux1:/home/cds is mounted as /cvs/cds. If it's not, sudo mount -a.
A) To stop the front end code, first press the red FE RESET button on the C1OMC_GDS screen. Then,
- i) log in to c1omc and become root.
- ii) kill epics with a 'pkill omcepics'
- iii) kill the test-point manager with a 'pkill awgtpman'
- iv) remove the front end kernel module with '/sbin/rmmod omcfe'
- v) check that the [omcfe] kernel module is gone with a '/sbin/lsmod'
c1ass
- Currently the procedure for restarting C1ASS seems to be the same as for C1OMC above except that the ass test point manager doesn't need the "-2" flag.
op440m
Reboot as usual. If it's acting weird or slow, just hit the moon button. Pick the shutdown option. After a few minutes it will turn off. Then hit the on button on the front of the machine. Wait for the login prompt. Then log in as controls.
op340m
Reboot as usual. It's headless, so you'll need to ssh in and type 'reboot'.
Restart the following scripts:
nodus
Nodus is a Solaris box in the rack in the office. Here are some of the things that it runs that you will want to restart:
