Summer 2022 CDS system upgrade
The plan is to upgrade the 40m CDS system to the latest RTS release. As of this writing, that would be advLigoRTS 4.2.8.
The basic plan:
- Set up a new rack midway down the Y arm (1Y3b).
- Move all 6 current front end machines, and FB, to the new rack.
- Install a new front end machine with support for AVX512 instructions in the new rack.
- Run OneStop fiber between all the front ends and their IO chassis (which all stay in their current locations).
- Install Dolphin IX interconnect for all front ends in the new rack (drop all old RFM).
- IX card in all front ends
- IX switch in new rack
- Upgrade all machines (front ends (via diskless boot) and FB) to Debian 11 (or 10, whichever is currently in production at the sites).
- Install all needed software components via CDSSoft Debian 11 (or 10) archive.
Before the upgrade we will test the new software configuration on the 40m test stand.
We assume the new system will support up to 8 front end machines (7 will be present after the upgrade).
What do we need that we don't currently have:
What we need that we think we currently have (need to actually find and store these to avoid double counting):
- A Rack
- Some other FE machines for all of our running models? How many? Can we re-use some existing computers? NEED to TEST that new RTS works on these or we are in danger of hanging up the whole 40m during this upgrade.
- A framebuilder. I know it **SHOULD** work, but that's what we say about everything right before it doesn't work. Need to test this also and possibly buy a new FB if our old one has issues.
Items | Quantity | Description | Status | Action |
KVM 8-port Switch | 1 | keyboard and display control switch | unknown | See table below for details |
OneStop Card | 17 | PCIe extension from FE to IO chassis | received | |
OneStop Fibre (100 m) | 7 | PCIe extension fibre | received | |
Dolphin IX Card | 8 | FE RFM adapter card | received | |
Dolphin IX Cables | 8 | FE to dolphin switch | received | |
Dolphin 8-port switch | 1 | dolphin switch to connect all FE | received | |
Rack | 1 | Rack to mount all FE machines, dolphin switch, KVM switch | received | |

Items | Quantity | Description | Link | Price |
KVM PS2 8-port Switch | 1 | Keyboard and display control switch | $429.95 | |
PS/2 KVM Cable 6 Ft | 2 | Converts the VGA to VGA + PS2 for the older computers | $19.95 | |
OneStop Fiber Optic Cable | 6 | Connects IO chassis to computers. (THESE ARE DIRECTIONAL) | Available at Request | |
OneStop Copper Cable | 1 | Connects IO chassis to computers. (THESE ARE DIRECTIONAL) | $372.00 | |
The FE machines initially spec'd for the teststand are c1sus2 and c1bhd. c1sus2 has since been moved to the main system, and c1bhd may be moved over in the near future, so I believe we need at least 2 FE machines to keep the teststand operational long term. In the meantime, we need a new front end to test dolphin communication with c1bhd.
Items | Quantity | Description | Status | Action |
Front-end (FE) machines | 2+ | Needed to test dolphin communication | received | |
OSS-PCIE-HIB25-X4 Gen 2 Host Cable Adaptor | 2+ | OneStop companion card on FE | unknown | |
recipe
fb1
- debootstrap /diskless/root
- FIXME: tftpd-hpa and pxe setup
- in /diskless/root
- apt install locales openssh-server man emacs-nox nfs-client
- FIXME: update initramfs:
- /etc/initramfs-tools/initramfs.conf
- /etc/initramfs-tools/modules
- update-initramfs -u
- FIXME: make script to setup/mount /diskless/root on fb1
- mount --bind /dev /diskless/root/dev
- mount -t proc /proc /diskless/root/proc
- chroot /diskless/root
- apt install advligorts-rcg advligorts-fe
- FIXME: need to install rtcds kernel for deb 10 for now, until rcg 5.0 dkms fixed for deb 10
- fix permissions on /opt/rtcds:
- advligorts on NFS host for target
- recursive group advligorts
- setgid for all directories
- umask for group write permission
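The permission fix listed above could be sketched roughly as follows. This is a minimal sketch, not the procedure actually used: the function name `fix_rtcds_perms` and its two parameters are hypothetical, and the defaults simply reuse the `/opt/rtcds` path and `advligorts` group named in the list.

```shell
#!/bin/bash
# Hypothetical helper: apply the group/setgid/group-write fixes to a tree.
fix_rtcds_perms() {
    # $1: directory tree to fix, $2: group that should own it (both assumed defaults).
    local target="${1:-/opt/rtcds}"
    local group="${2:-advligorts}"
    # Recursive group ownership.
    chgrp -R "$group" "$target"
    # setgid on every directory so new files inherit the group.
    find "$target" -type d -exec chmod g+s {} +
    # Group read/write everywhere; X adds search permission on directories
    # and execute only on files that are already executable.
    chmod -R g+rwX "$target"
}
```

Run as root on the NFS host, e.g. `fix_rtcds_perms /opt/rtcds advligorts`. The umask item is separate: users who write into the tree would also need something like `umask 002` in their shell profile so that newly created files stay group-writable.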
bug finding generate_KisselButton.py:
` Unable to find the following file in CDS_MEDM_PATH: SUS_SINGLE.adl ERROR: Could not find file: generate_KisselButton.py Searched path: /opt/rtcds/caltech/c1/post_build Exiting make: *** [Makefile:166: install-c1sus] Error 1 `
---
chroot installation issues
We ran into some problems installing dolphin via chroot, which were fixed by mounting /dev and /proc into the diskless root before entering the chroot:
- sudo mount --bind /dev /diskless/root/dev
- sudo mount -t proc /proc /diskless/root/proc
- sudo chroot /diskless/root
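The recipe has a FIXME asking for a script that sets up and mounts /diskless/root on fb1. One possible shape for it is sketched below. The names `DISKLESS_ROOT`, `DRY_RUN`, `run`, and `enter_diskless_root` are made up for illustration; the mount and chroot commands are the ones listed above.

```shell
#!/bin/bash
# Sketch: bind /dev and /proc into the diskless root, then chroot into it.
# DISKLESS_ROOT and DRY_RUN are illustrative variables, not part of any package.
DISKLESS_ROOT="${DISKLESS_ROOT:-/diskless/root}"

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

enter_diskless_root() {
    # Skip mounts that are already in place so the function can be re-run safely.
    mountpoint -q "$DISKLESS_ROOT/dev"  || run mount --bind /dev "$DISKLESS_ROOT/dev"
    mountpoint -q "$DISKLESS_ROOT/proc" || run mount -t proc /proc "$DISKLESS_ROOT/proc"
    # Any extra arguments are passed to chroot as the command to run inside.
    run chroot "$DISKLESS_ROOT" "$@"
}
```

Calling `enter_diskless_root` as root opens a shell in the chroot. Setting `DRY_RUN=1` first only prints the commands, which is a cheap way to check the paths before touching the boot server.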
uname -r dolphin driver install issue
We also noticed that the differing Linux kernel versions on the boot server and the front ends were causing issues: inside the chroot, 'uname -r' returns the boot server's kernel instead of the FE kernel. We fixed that by installing the same FE rtcds kernel on the boot server as well.
- sudo apt install linux-image-4.19.0-6-rtcds-amd64-unsigned
- sudo apt remove linux-image-4.19.0-21-amd64
- sudo reboot
- sudo apt install ligo-dolphin-ix-node
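A quick check for this mismatch, before installing any driver package inside the chroot, might look like the sketch below. The function name `kernel_matches_root` is hypothetical. It relies on the standard Debian layout in which each installed kernel has a directory under `lib/modules`.

```shell
#!/bin/bash
# Sketch: verify the running kernel is also installed in the diskless root.
kernel_matches_root() {
    local root="${1:-/diskless/root}"
    local running
    running="$(uname -r)"
    if [ -d "$root/lib/modules/$running" ]; then
        echo "ok: $running is installed in $root"
    else
        # List what the diskless root does have, to make the mismatch obvious.
        echo "mismatch: running $running, $root has: $(ls "$root/lib/modules" 2>/dev/null | tr '\n' ' ')"
        return 1
    fi
}
```

If this reports a mismatch on the boot server, a package that calls `uname -r` during installation in the chroot will likely target the wrong kernel.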
boot server
These are the commands we ran to set up dolphin on the boot server.
- sudo apt install ligo-dolphin-networkmanager
- sudo cp /diskless/root/etc/apt/sources.list.d/restricted.list /etc/apt/sources.list.d/
- sudo apt update
- sudo apt install ligo-dolphin-networkmanager
- sudo /opt/DIS/sbin/dis_mkconf -fabrics 1 -sclw 8 -stt 1 -nodes c1sus c1bhd -nosessions
---
- #cd /diskless/root/etc
- #rm -rf dis
- #ln -s /var/log dis
We decided the symlink idea was not a good one, so we edited /diskless/root/etc/fstab to mount a writeable /etc/dis instead.
Model edits for dolphin test
- Edited IOP models and user models for c1bhd and c1sus. We now have them both in the c1bhd folder.
- Included dolphin_time_xmit=1 in the c1x06 CDS block parameters on the sender to send timing over dolphin, and dolphin_time_rcvr=1 on the receiving IOP models, i.e. c1x02, etc.
- This prevented the IOP model from starting because it needed the advligorts-dolphin-daemon package, which depends on ligo-dolphin-srcdis, so we installed:
- apt install ligo-dolphin-srcdis
- apt install advligorts-dolphin-daemon
- apt install advligorts-dolphin-proxy-km-dkms
- With no I/O chassis attached, the ADC cards in the c1x02 model would cause a build/start error, so we set virtualIOP=2 in c1x02 (the c1sus IOP) to allow an error-free build.
Additional configuration for Gen2 dolphin
- Edit /opt/DIS/lib/modules/dis_ix.conf to increase the broadcast group size to 16MB
- i.e. ntb_mcast_group_size=24;
- [IS THIS NECESSARY?] Change 'pciRfm=1' to 'pciRfm=2'. Instead of doing this for all IOP models, create a USE_DOLPHIN_GEN2 file as shown below
- chroot /diskless/root
- cd /usr/share/advligorts/src/src/include
- touch USE_DOLPHIN_GEN2
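On the ntb_mcast_group_size=24 setting above: the value matches the stated 16MB if the parameter is read as the base-2 logarithm of the group size in bytes. That reading is an assumption inferred from the two numbers, not taken from the Dolphin documentation. The arithmetic, with a hypothetical helper name:

```shell
#!/bin/bash
# Assumes the parameter is the base-2 logarithm of the group size in bytes.
mcast_group_bytes() {
    echo $(( 1 << $1 ))
}

mcast_group_bytes 24   # prints 16777216, i.e. 16 * 1024 * 1024 bytes
```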
(possible bug in packaging) System services needed
To get dolphin communication working, we had to do the following:
- sudo modprobe dolphin-proxy-km
- sudo systemctl start rts-dolphin_daemon
FIX
- sudo chroot /diskless/root
add the text dolphin-proxy-km to the file /etc/modules [Doesn't seem to work. rts-dolphin_daemon depends on this module, so it does not show a green light until the module is loaded. Not sure why loading at boot is a problem.]
edit /lib/systemd/system/rts-dolphin_daemon.service by adding:
[Install]
WantedBy=multi-user.target
systemctl enable rts-dolphin_daemon
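Since loading the module at boot did not seem to work reliably, a small check run on a front end after boot could show whether the manual modprobe is still needed. The sketch below uses a hypothetical function name, `module_loaded`. It reads /proc/modules, where the kernel lists module names with underscores even when they are loaded under a hyphenated name.

```shell
#!/bin/bash
# Sketch: test whether a kernel module is currently loaded.
module_loaded() {
    # /proc/modules (and lsmod) report names with underscores,
    # so normalize hyphens before comparing.
    local name="${1//-/_}"
    # Second argument is only there to allow testing against a sample file.
    local modules_file="${2:-/proc/modules}"
    grep -q "^${name} " "$modules_file"
}
```

A possible use, repeating the manual steps above only when needed: `module_loaded dolphin-proxy-km || sudo modprobe dolphin-proxy-km`, followed by `systemctl is-active rts-dolphin_daemon` to see whether the daemon came up.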
DAQ setup
On FB1
- apt-get install isc-dhcp-server
- edit /etc/network/interfaces to assign a static IP address
- sudo ifup enp2s0 to bring up the interface
- edit /etc/dhcp/dhcpd.conf
sudo apt install advligorts-common advligorts-edc advligorts-rcg -t buster-unstable
sudo apt install advligorts-gpstime-dkms advligorts-mbuf-dkms -t buster-unstable
sudo apt install advligorts-local-dc advligorts-transport-common advligorts-transport-pubsub -t buster-unstable
sudo apt install ldas-tools-framecpp advligorts-daqd -t buster-unstable
sudo systemctl enable rts-transport@cps_recv
sudo systemctl start rts-transport@cps_recv
sudo systemctl enable rts-daqd
sudo systemctl start rts-daqd
sudo ln -s /opt/rtcds/caltech/c1/target/gds/param/testpoint.par /etc/advligorts/testpoint.par
sudo ln -s /opt/rtcds/caltech/c1/target/daqd/master /etc/advligorts/master
(or copy the contents for the models of interest, i.e. c1bhd and c1sus, into /etc/advligorts/testpoint.par)
LOOK AT /etc/advligorts/daqdrc for how to enable frame writing
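Because the testpoint.par and master files are symlinks into /opt/rtcds, it may be worth confirming that they resolve before starting rts-daqd, for example if the NFS mount is missing. A minimal sketch, with a hypothetical function name `check_link`:

```shell
#!/bin/bash
# Sketch: report whether a path is a symlink whose target exists.
check_link() {
    local link="$1"
    if [ -L "$link" ] && [ -e "$link" ]; then
        echo "ok: $link -> $(readlink -f "$link")"
    else
        echo "broken or missing: $link"
        return 1
    fi
}
```

For the two links created above this would be `check_link /etc/advligorts/testpoint.par` and `check_link /etc/advligorts/master`.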
On FE
add environment file /etc/advligorts/systemd_env_${hostname}, e.g.
cat /etc/advligorts/systemd_env_c1bhd
local_dc_args='-w 0 -s "c1x06 c1bhd" -b local_dc -m 100 -d /opt/rtcds/caltech/c1/target/gds/param'
cps_xmit_args="-b local_dc -m 100 -p 'tcp://10.0.113.2:9000' -D 1"
apt-get install advligorts-pubsub
systemctl enable rts-transport@cps_xmit
systemctl enable rts-local_dc
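With up to 8 front ends, the per-host environment file could be generated rather than written by hand. The sketch below is one way to do that. The function name `make_fe_env` and its argument order are hypothetical. The option strings are copied from the c1bhd example above, with the DAQ endpoint written as a plain address.

```shell
#!/bin/bash
# Sketch: print the environment file contents for one front end.
# $1: address of the DAQ receiver; remaining arguments: IOP and user model names.
make_fe_env() {
    local daq_host="$1"; shift
    local models="$*"
    printf "local_dc_args='-w 0 -s \"%s\" -b local_dc -m 100 -d /opt/rtcds/caltech/c1/target/gds/param'\n" "$models"
    printf "cps_xmit_args=\"-b local_dc -m 100 -p 'tcp://%s:9000' -D 1\"\n" "$daq_host"
}
```

For c1bhd this would be used as `make_fe_env 10.0.113.2 c1x06 c1bhd | sudo tee /etc/advligorts/systemd_env_c1bhd`.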
