UFO Server
System
- Host name: ufosrv1.ka.fzk.de
- Interfaces: eth0 (10 GBit), eth1 (upper-right socket)
- eth0: dhcp
- eth1: 141.52.111.135/22
- Gateway: 141.52.111.208 (via dhcp)
- Name server: 141.52.111.248 (via dhcp)
- Running services
- SSH on ports 22 and 24
- NX server over SSH
- VirtualGL (OpenGL forwarding)
Monitoring
- Current Status
- Sensors & Performance, Historical Archive
- IPMI
- User name: ADMIN, Password: Ask Suren
- Video output should be routed to the graphics card integrated into the motherboard
- Temperature, Voltage, and Fan sensors monitoring
- Remote power management: power-off, power-on, reboot
- Java-based remote console
- SOL-based remote console
- To connect run: ipmiconsole -h 141.52.111.203 -u ADMIN -p <password>
- The ipmiconsole application is provided by the freeipmi package
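The monitoring and power-management tasks listed above can also be driven from the command line with the other freeipmi tools. The following is a minimal sketch, not the procedure used on this server: the helper name bmc_cmd is hypothetical, and the option names (-P to prompt for the password, --stat, --on, --off, --reset) are given as recalled from the freeipmi man pages, so check them with --help before relying on them. The helper only prints the command line and does not contact the BMC.

```shell
#!/bin/sh
# Sketch: print the freeipmi command line for a given BMC task.
# BMC address and user name are taken from the text above; -P asks the tool
# to prompt for the password so it does not end up in the shell history.
BMC_HOST=${BMC_HOST:-141.52.111.203}
BMC_USER=${BMC_USER:-ADMIN}

bmc_cmd() {
    common="-h $BMC_HOST -u $BMC_USER -P"
    case "$1" in
        console) echo "ipmiconsole $common" ;;
        sensors) echo "ipmi-sensors $common" ;;
        status)  echo "ipmipower $common --stat" ;;
        on)      echo "ipmipower $common --on" ;;
        off)     echo "ipmipower $common --off" ;;
        reboot)  echo "ipmipower $common --reset" ;;
        *)       echo "usage: bmc_cmd console|sensors|status|on|off|reboot" >&2
                 return 1 ;;
    esac
}
```

For example, bmc_cmd status prints the ipmipower call that queries the power state. Review the printed command before running it.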
Hardware
- Display connected to integrated video card (Matrox G200)
- 2 x Xeon X5650 / Intel X58 / 96 GB DDR3
- System drives: 2 x Hitachi 2TB SATA2
- Areca ARC-1880 Raid Controller (x16 slot)
- 16 x 2 TB Hitachi HUA722020ALA330 in external Areca Enclosure
- 4 x 256 GB Crucial RealSSD C300
- External PCIe 2.0 x16 (x16 slot)
- External GPU box from One Stop Systems
- 4 x NVIDIA GeForce GTX580
- 2 x NVIDIA GeForce GTX580 (x16 slots)
- Intel 82598EB 10GBit Ethernet (x4 slot)
- Silicon Software CameraLink frame grabber microEnable IV VD4-CL Full (PCIe 1.0 x4 slot)
- Free slots
- PCI express: x4 and x1
- Storage: 2 x SSD in the main server case
UFO Camera
- Do not leave the PCIe extender cable connected to the server if the camera is removed or switched off
- After disconnecting the camera you may need to turn the computer off (removing the power plugs!) and turn it on again
Areca Raid Configuration
- A single Areca ARC-1880 controller handles both the external storage box with SATA hard drives and the internal SSD cache. Only the pair of system hard drives is connected to the SATA controller integrated into the motherboard.
- 16 x Hitachi 2TB SATA hard drives in the external enclosure are organized as Raid-6
- 4 x Crucial SSD C300 in the server case are organized as Raid-0
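The usable capacities that follow from this layout can be checked with simple arithmetic. This is a small sketch with hypothetical function names; it assumes the C300 drives are 256 GB each, which is consistent with the 1 TB size given for the SSD cache.

```shell
#!/bin/sh
# Usable capacity of the two arrays described above (integer arithmetic).
# $1: number of drives, $2: size of one drive.
raid6_usable() { echo $(( ($1 - 2) * $2 )); }   # RAID-6 gives up two drives to parity
raid0_usable() { echo $(( $1 * $2 )); }         # RAID-0 has no redundancy

echo "RAID-6: $(raid6_usable 16 2) TB"     # 16 x 2 TB drives  -> 28 TB
echo "RAID-0: $(raid0_usable 4 256) GB"    # 4 x 256 GB SSDs   -> 1024 GB
```

The 28 TB result matches the size reported for /dev/sdc. The RAID-0 set offers no protection against a single SSD failure, so it should hold only data that can be regenerated.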
Partitioning
- The two system hard drives are connected to the internal SATA controller and mirrored as Raid-1 using Linux Software Raid.
- Devices: /dev/sda, /dev/sdb
- Partitions: /boot (2GB ext2), / (256GB ext4), /home (ext4)
- The Raid-6 array is split into two partitions (GPT partition table): fast and standard.
- Device: /dev/sdc
- The fast partition is used to stream data from the camera and should sustain a throughput of 850 MB/s. The data should be moved off as soon as possible. Only a single application is allowed to write to the disk.
- Size: first 6TB of disk array
- File system: non-journaled ext4
- Mount point: /mnt/fast
- Standard partition is for short term data storage (before offloading to LSDF)
- Size: 22 TB
- File system: ext4
- Mount point: /mnt/raid
- The SSD cache
- Device: /dev/sdd
- Size: 1 TB
- File system: ext4
- Mount point: /mnt/ssd
- Partition table (/dev/sda & /dev/sdb)
Disk /dev/sda: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0003874d

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *        2048     4192255     2095104   fd  Linux raid autodetect
/dev/sda2         4192256   541069311   268438528   fd  Linux raid autodetect
/dev/sda3       541069312  3907028991  1682979840   fd  Linux raid autodetect
- Partition table (/dev/sdc)
Disk /dev/sdc: 28.0TB
Sector size (logical/physical): 512B/512B
Partition Table: gpt_sync_mbr

Number  Start   End     Size    File system  Name     Flags
 1      1049kB  6597GB  6597GB               primary
 2      6597GB  28.0TB  21.4TB  xfs          primary
- Partition table (/dev/sdd)
Disk /dev/sdd: 1000.0 GB, 999998619648 bytes
255 heads, 63 sectors/track, 121576 cylinders, total 1953122304 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000b6bc3

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1            2048  1953122303   976560128   83  Linux
- Raid table:
Personalities : [raid1] [raid0] [raid10] [raid6] [raid5] [raid4]
md2 : active raid1 sdb3[1] sda3[0]
      1682979704 blocks super 1.0 [2/2] [UU]
      bitmap: 3/13 pages [12KB], 65536KB chunk

md0 : active raid1 sdb1[1] sda1[0]
      2095092 blocks super 1.0 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md1 : active raid1 sda2[0] sdb2[1]
      268438392 blocks super 1.0 [2/2] [UU]
      bitmap: 0/3 pages [0KB], 65536KB chunk
- mdadm.conf
DEVICE containers partitions
ARRAY /dev/md0 UUID=7c032686:e8861a19:9ccb43c3:8f25011e
ARRAY /dev/md1 UUID=4a18bb5c:4b4b4490:929fdc08:99b65f2f
ARRAY /dev/md2 UUID=8e7c863e:3a75af81:321862ae:d679602e
- fstab
/dev/disk/by-id/md-uuid-4a18bb5c:4b4b4490:929fdc08:99b65f2f  /                   ext4     acl,user_xattr      1 1
/dev/disk/by-id/md-uuid-7c032686:e8861a19:9ccb43c3:8f25011e  /boot               ext2     acl,user_xattr      1 2
/dev/disk/by-id/md-uuid-8e7c863e:3a75af81:321862ae:d679602e  /home               ext4     acl,user_xattr      1 2
/dev/disk/by-id/scsi-2001b4d2003077811-part2                 /mnt/raid           xfs      defaults            1 2
/dev/disk/by-id/scsi-2001b4d2064473251-part1                 /mnt/ssd            ext4     acl,user_xattr      1 2
proc                                                         /proc               proc     defaults            0 0
sysfs                                                        /sys                sysfs    noauto              0 0
debugfs                                                      /sys/kernel/debug   debugfs  noauto              0 0
usbfs                                                        /proc/bus/usb       usbfs    noauto              0 0
devpts                                                       /dev/pts            devpts   mode=0620,gid=5     0 0
anka-tomo2.ka.fzk.de:/mnt/tomoraid3                          /mnt/tomoraid3      nfs      defaults            0 0
lsmb01.lsdf.kit.edu:/gpfs/lsdf/anka                          /mnt/tomoraid-LSDF  nfs      defaults            0 0
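Before starting an acquisition it can be useful to confirm that the data mount points described above are actually mounted. The sketch below uses a hypothetical helper, missing_mounts, which reads a file in /proc/mounts format and prints every expected mount point that does not appear in it.

```shell
#!/bin/sh
# Sketch: report which of the expected mount points are not mounted.
# $1: file in /proc/mounts format; remaining arguments: mount points to look for.
# The second field of each /proc/mounts line is the mount point.
missing_mounts() (
    mounts_file=$1; shift
    for mp in "$@"; do
        awk -v mp="$mp" '$2 == mp { found = 1 } END { exit !found }' "$mounts_file" \
            || echo "$mp"
    done
)

# Typical call on the server:
#   missing_mounts /proc/mounts /mnt/fast /mnt/raid /mnt/ssd
```

Empty output means all listed mount points are present. Note that the fstab shown above has no entry for /mnt/fast, so that partition is reported as missing unless it is mounted by other means.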
Software
Base System
- openSUSE 12.1
- Due to Bug #731230, systemd should be updated
- Desktop: Gnome
- Development: Kernel, GNOME, Python
System Configuration
- Additional kernel parameters to add to /boot/grub/menu.lst: vga=3 console=ttyS1,115200 earlyprintk=serial,ttyS1,115200 pcie_aspm=off
- vga - configures the standard text-mode console
- console and earlyprintk - enable the remote SOL console
- pcie_aspm - prevents errors on the PCIe bus
- The NVIDIA driver should be instructed to use MSI interrupts via the NVreg_EnableMSI=1 parameter in /etc/modprobe.d/50-nvidia.conf:
options nvidia NVreg_DeviceFileUID=0 NVreg_DeviceFileGID=33 NVreg_DeviceFileMode=0660 NVreg_EnableMSI=1
- In /etc/init.d/boot.local the nvidia module should be loaded and switched to persistence mode:
modprobe nvidia
nvidia-smi -pm 1
- A terminal on the SOL console should be enabled in /etc/inittab by adding:
T0:2345:respawn:/usr/sbin/mgetty -s 115200 /dev/ttyS1 vt100
- Logins on /dev/ttyS1 should be allowed in /etc/securetty
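After a reboot, the kernel command line can be checked to confirm that the boot parameters from menu.lst were picked up. This is a minimal sketch with a hypothetical helper, missing_kernel_params, which prints each required parameter that does not appear as a whole word in the given command line.

```shell
#!/bin/sh
# Sketch: print every required kernel parameter that is absent from a command line.
# $1: kernel command line; remaining arguments: parameters that must be present.
missing_kernel_params() (
    cmdline=$1; shift
    for want in "$@"; do
        case " $cmdline " in
            *" $want "*) ;;            # present as a whole word
            *) echo "$want" ;;         # not found
        esac
    done
)

# Typical call on the running system:
#   missing_kernel_params "$(cat /proc/cmdline)" vga=3 console=ttyS1,115200 pcie_aspm=off
```

Empty output means every listed parameter is active. Padding with spaces keeps a parameter such as console=ttyS1 from matching console=ttyS1,115200 by accident.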
Additional Packages
- Repositories
zypper ar http://download.opensuse.org/repositories/science/openSUSE_12.1/science.repo
zypper ar http://download.opensuse.org/repositories/X11:/RemoteDesktop/openSUSE_12.1/X11:RemoteDesktop.repo
- Packages
zypper install mgetty
zypper install bzr cmake
zypper install sshfs
zypper install freeglut-devel openmpi-devel fftw3-devel python-imaging python-numpy-devel
zypper install gcc gcc-c++ glib2-devel json-glib-devel
zypper install gobject-introspection-devel python-gobject2
zypper install gtk-doc python-Sphinx
zypper install libtiff-devel
zypper install nano
zypper install imagej
zypper install FreeNX
zypper install python-qt4-devel
zypper install libmysqlclient-devel
zypper install libmysqld-devel
zypper install python-scipy python-matplotlib python-matplotlib-tk
zypper install tiff
Camera Drivers
- Get Silicon Software driver and SDK
- Install the menable driver for the Silicon Software microEnable CameraLink frame grabber
- Extract source
- For post-3.2 kernels, you may need to apply the patch menable-ds.patch
- Compile with the default compiler (the system will crash if the kernel and its modules are built with different compiler versions). Be careful here: if you have already installed CUDA and set the default compiler to gcc-4.3, you need to revert temporarily to gcc-4.6 to build the kernel module!
- Install and load the module
tar xjf menable_linuxdrv_src_3.9.14_4.0.3.tar.bz2
cd menable_linuxdrv_src_3.9.14_4.0.3
cat menable-ds.patch | patch -p 1
make
make install
depmod -a
modprobe menable
- Install SDK RPMs
rpm -i siso-rt5*.rpm
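Because building the module with the wrong compiler can crash the system, it may be worth checking the compiler before running make. The sketch below is one possible guard and uses hypothetical helper names. It assumes /proc/version contains the text "gcc version X.Y.Z", as kernels of this generation do; newer kernels format this line differently.

```shell
#!/bin/sh
# Sketch: compare the gcc series that built the running kernel with the default gcc.

# Extract "major.minor" from a /proc/version style string passed as $1.
kernel_gcc_series() {
    printf '%s\n' "$1" | sed -n 's/.*gcc version \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# $1: contents of /proc/version, $2: output of "gcc -dumpversion".
# Succeeds only if both report the same major.minor series.
compiler_matches_kernel() {
    series=$(kernel_gcc_series "$1")
    [ -n "$series" ] && [ "$series" = "$(printf '%s\n' "$2" | cut -d. -f1,2)" ]
}

# Typical use before building the menable module:
#   compiler_matches_kernel "$(cat /proc/version)" "$(gcc -dumpversion)" && make
```

If the check fails, switch the default compiler back to the one that built the kernel before running make.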
FreeNX
- Configure FreeNX server to allow remote desktop
nxsetup
- There are some problems with the current snapshot (30.01.2013) of the openSUSE 12.2 repository. As a workaround you may:
- install FreeNX from
http://download.opensuse.org/repositories/home:/please_try_again/openSUSE_12.2/home:please_try_again.repo
- create a symlink from authorized_keys to authorized_keys2 in /var/lib/nxserver/home/.ssh
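The symlink step of the workaround could be scripted as below. This is a sketch under one assumption about the direction of the link: the link is named authorized_keys and points at the existing authorized_keys2 file. The helper name link_nx_keys is hypothetical.

```shell
#!/bin/sh
# Sketch: make authorized_keys a symlink to authorized_keys2 in the NX home.
# Assumption: the link is named authorized_keys and targets authorized_keys2.
# $1: .ssh directory (defaults to the path given in the text above).
link_nx_keys() (
    dir=${1:-/var/lib/nxserver/home/.ssh}
    cd "$dir" || exit 1
    [ -e authorized_keys2 ] || { echo "no authorized_keys2 in $dir" >&2; exit 1; }
    # Leave an existing file or link untouched; create the link otherwise.
    [ -e authorized_keys ] || [ -L authorized_keys ] || ln -s authorized_keys2 authorized_keys
)
```

Run link_nx_keys as root without arguments to act on the default directory. An existing authorized_keys is never overwritten.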
VirtualGL
- Install VirtualGL to provide OpenGL forwarding (installation steps still to be added)
CUDA
- Downgrade to gcc 4.3 (gcc-4.6 is not compatible with the latest CUDA toolkit)
zypper ar http://download.opensuse.org/repositories/devel:/gcc/openSUSE_12.1/devel:gcc.repo
zypper install gcc43 gcc43-c++ gcc43-locale
for name in `rpm -ql gcc43 gcc43-c++ | grep "/usr/bin"`; do ln -sf $name /usr/bin/`basename $name -4.3`; done
- Currently installed
- Driver: 304.33
- Toolkit: 4.2 (installed into /opt/cuda)
- SDK: 4.2 (installed into /opt/cuda/sdk)
- SDK must be compiled
cd /opt/cuda/sdk
make
- Allow resetting failed GPUs by adding the following line to /etc/sudoers:
ALL ALL=(ALL) NOPASSWD: /opt/cuda/sdk/C/bin/linux/release/deviceQuery
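The symlink loop in the gcc downgrade step changes the system-wide default compiler, so it can help to preview what it will do. The sketch below is a dry-run variant with a hypothetical helper name, plan_compiler_links. It reads file paths on standard input and only prints the ln commands; nothing is modified.

```shell
#!/bin/sh
# Sketch: print the ln commands that would make the gcc-<version> binaries the default.
# $1: version suffix, e.g. 4.3. File paths are read from stdin,
# for example from "rpm -ql gcc43 gcc43-c++".
plan_compiler_links() (
    suffix="-$1"
    grep '^/usr/bin/' | while read -r path; do
        # basename drops the directory and, given a second argument, the suffix
        echo "ln -sf $path /usr/bin/$(basename "$path" "$suffix")"
    done
)

# Preview the downgrade performed above:
#   rpm -ql gcc43 gcc43-c++ | plan_compiler_links 4.3
```

The same helper can preview the temporary switch back to gcc 4.6 that is needed for building kernel modules, given the file list of the package that provides the 4.6 binaries (its name is not stated in this page).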
UFO Framework
- Install libpco/master and libuca/master to add support for PCO cameras
- Install ufo-core/master and ufo-filters/master according to the manual.
PyHST
- Currently installed in
/opt/pyhst
- To build from pyhst/pyhst
cd /opt
bzr branch http://ufo.kit.edu/sources/csa/pyhst/
cd pyhst
cmake .
make
- Start script
/opt/PyHST
#!/bin/bash
PACKAGE_HOME=/opt/pyhst
PACKAGE_SOURCE=${PACKAGE_HOME}
CUDA_DIR=/opt/cuda.41

# Note: INSTALLATION_HOME is not set anywhere in this script;
# it expands to an empty string unless it comes from the environment.
export PATH=${INSTALLATION_HOME}/bin:$PATH
export LD_LIBRARY_PATH=${PACKAGE_SOURCE}:${INSTALLATION_HOME}/lib:$LD_LIBRARY_PATH
export LDFLAGS="-L ${INSTALLATION_HOME}/lib"
export CPPFLAGS="-I ${INSTALLATION_HOME}/include"
export LD_LIBRARY_PATH=$CUDA_DIR/lib64/:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=${PACKAGE_SOURCE}/hst_cuda:$LD_LIBRARY_PATH
export PYTHONPATH=${PACKAGE_SOURCE}:$PYTHONPATH

PYTHON=python
# "$@" passes every argument through unchanged, including paths with spaces
${PYTHON} ${PACKAGE_SOURCE}/PyHST.py "$@"
- Start PyHST with
/opt/PyHST <parameter_file.par>
TANGO 7.2.6a
Installed prerequisites:
- OmniORB 4.1.5
- OmniNotify 2.0
Additional installed packages:
- PyTango 7.2.2
Maintenance
- Reset failed GPU devices
sudo /opt/cuda/sdk/C/bin/linux/release/deviceQuery
- Usability tests:
- Check if CUDA and OpenCL are usable
/opt/scripts/nagios_opencl.sh
- Check PyHST is usable
/opt/scripts/nagios_pyhst.sh
- Check UFO Framework is usable
... to be added ...
- Check if cameras are usable
... to be added ...
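The existing check scripts can be run together with a small wrapper. The sketch below assumes, based only on their names, that the scripts follow the Nagios plugin exit-code convention (0 OK, 1 WARNING, 2 CRITICAL, anything else UNKNOWN). The helper name run_check is hypothetical.

```shell
#!/bin/sh
# Sketch: run one check command and translate its exit status into a Nagios-style label.
# Assumption: the check follows the Nagios plugin exit-code convention.
run_check() {
    status=0
    "$@" >/dev/null 2>&1 || status=$?
    case $status in
        0) echo "OK $1" ;;
        1) echo "WARNING $1" ;;
        2) echo "CRITICAL $1" ;;
        *) echo "UNKNOWN $1" ;;
    esac
}

# Run all checks that exist so far:
#   for s in /opt/scripts/nagios_opencl.sh /opt/scripts/nagios_pyhst.sh; do run_check "$s"; done
```

A missing or non-executable script is reported as UNKNOWN, which makes a broken check visible instead of silently skipping it.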
Usage
- Run PyHST
/opt/PyHST <parameter_file.par>
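When several parameter files have to be processed, the start script can be called in a loop. This is a minimal sketch with a hypothetical helper, run_pyhst_batch. It stops at the first failure so that a broken parameter file is noticed immediately. The PYHST variable is an addition for dry runs and is not part of the original setup.

```shell
#!/bin/sh
# Sketch: run the start script once per parameter file, stopping at the first failure.
# PYHST can be overridden for a dry run, e.g. PYHST=echo.
PYHST=${PYHST:-/opt/PyHST}

run_pyhst_batch() {
    for par in "$@"; do
        echo "== $par" >&2          # progress goes to stderr
        "$PYHST" "$par" || { echo "PyHST failed on $par" >&2; return 1; }
    done
}

# Typical call:
#   run_pyhst_batch scan1.par scan2.par
```

The file names in the example call are placeholders.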
Last modified on Jan 30, 2013, 4:55:51 PM
Attachments (1)
- menable-ds.patch (3.0 KB)