UFO Server

System

  • Host name: ufosrv1.ka.fzk.de
  • Interfaces: eth0 (10 GBit), eth1 (upper-right socket)
    • eth0: dhcp
    • eth1: 141.52.111.135/22
    • Gateway: 141.52.111.208 (via dhcp)
    • Name server: 141.52.111.248 (via dhcp)
  • Running services
    • SSH on ports 22 and 24
    • NX server over SSH
    • VirtualGL (OpenGL forwarding)

Monitoring

  • Current Status
  • Sensors & Performance, Historical Archive
  • IPMI
    • User name: ADMIN, Password: Ask Suren
    • Video output should be set to the graphics card integrated into the motherboard
    • Temperature, Voltage, and Fan sensors monitoring
    • Remote power management: power-off, power-on, reboot
    • Java-based remote console
    • SOL-based remote console
      • To connect, run: ipmiconsole -h 141.52.111.203 -u ADMIN -p <password>
      • The ipmiconsole application is provided by the freeipmi package
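
The sensor monitoring and power management features listed above can also be reached from the command line with other FreeIPMI tools. The sketch below assumes that the BMC answers on the address used in the ipmiconsole example and that ipmi-sensors and ipmipower from the freeipmi package are installed. The helper name bmc and the DRY_RUN switch are illustrative and not part of the original setup. The password is requested interactively with -P so that it does not end up in the shell history.

```shell
#!/bin/sh
# Hypothetical helper: run a FreeIPMI tool against the server's BMC.
BMC_HOST="${BMC_HOST:-141.52.111.203}"   # address from the ipmiconsole example
BMC_USER="${BMC_USER:-ADMIN}"

bmc() {
    tool="$1"; shift
    # Prepend the connection options shared by all FreeIPMI tools
    set -- "$tool" -h "$BMC_HOST" -u "$BMC_USER" -P "$@"
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"        # only show what would be executed
    else
        "$@"
    fi
}

# Examples (not run automatically):
#   bmc ipmi-sensors          # temperature, voltage and fan readings
#   bmc ipmipower --stat      # query the power state
#   bmc ipmipower --reset     # hard reset
#   bmc ipmiconsole           # SOL console
```

Setting DRY_RUN=1 prints the resulting command line instead of contacting the BMC, which is useful for checking the options first.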

Hardware

  • Display connected to integrated video card (Matrox G200)
  • 2 x Xeon X5650 / Intel X58 / 96 GB DDR3
  • System drives: 2 x Hitachi 2TB SATA2
  • Areca ARC-1880 Raid Controller (x16 slot)
    • 16 x 2 TB Hitachi HUA722020ALA330 in external Areca Enclosure
    • 4 x 256 GB Crucial RealSSD C300
  • External PCIe 2.0 x16 (x16 slot)
    • External GPU box from One Stop Systems
    • 4 x NVIDIA GeForce GTX580
  • 2 x NVIDIA GeForce GTX580 (x16 slots)
  • Intel 82598EB 10GBit Ethernet (x4 slot)
  • Silicon Software CameraLink FrameGrabber MicroEnable IV VD4-CL Full (PCIe 1.0 x4 slot)
  • Free slots
    • PCI express: x4 and x1
    • Storage: 2 x SSD in the main server case

UFO Camera

  • Do not leave the PCIe extender cable connected to the server if the camera is removed or switched off
  • After disconnecting the camera, you may need to turn the computer off completely (unplug the power cables!) and turn it on again

Areca Raid Configuration

  • A single Areca-1880 controller handles both the external storage box with SATA hard drives and the internal SSD cache. Only the pair of system hard drives is connected to the SATA controller integrated in the motherboard.
  • 16 x Hitachi 2TB SATA hard drives in the external enclosure are organized as Raid-6
  • 4 x Crucial SSD C300 in the server case are organized as Raid-0
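
As a plausibility check for the array sizes, usable capacity follows directly from the RAID level: RAID-6 gives up two drives' worth of space for parity, while RAID-0 uses every drive for data. A minimal sketch in shell arithmetic (the function names are illustrative):

```shell
#!/bin/sh
# Usable capacity, in the same unit as the per-drive size.
# $1 = number of drives, $2 = size of one drive
raid6_usable() { echo $(( ($1 - 2) * $2 )); }   # two drives' worth of parity
raid0_usable() { echo $(( $1 * $2 )); }         # pure striping, no redundancy

raid6_usable 16 2      # 16 x 2 TB hard drives  -> 28 (TB)
raid0_usable 4 256     # 4 x 256 GB SSDs        -> 1024 (GB)
```

The 28 TB result matches the size parted reports for /dev/sdc. The SSD array is reported slightly smaller (1000 GB) than the nominal 1024 GB.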

Partitioning

  • The two system hard drives are connected to the internal SATA controller and mirrored as Raid-1 using Linux Software Raid.
    • Devices: /dev/sda, /dev/sdb
    • Partitions: /boot (2GB ext2), / (256GB ext4), /home (ext4)
  • The Raid-6 array is split into two partitions (GPT partition table): a fast one and a standard one.
    • Device: /dev/sdc
    • The fast partition is used to stream data from the camera and should sustain a throughput of 850 MB/s. The data should be moved out as soon as possible. Only a single application is allowed to write to the disk at a time.
      • Size: first 6TB of disk array
      • File system: non-journaled ext4
      • Mount point: /mnt/fast
    • The standard partition is for short-term data storage (before offloading to LSDF)
      • Size: about 21.4 TB (remainder of the disk array)
      • File system: xfs
      • Mount point: /mnt/raid
  • The SSD cache
    • Device: /dev/sdd
    • Size: 1 TB
    • File system: ext4
    • Mount point: /mnt/ssd
  • Partition table (/dev/sda & /dev/sdb)
    Disk /dev/sda: 2000.4 GB, 2000398934016 bytes
    255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
    Units = sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disk identifier: 0x0003874d
    
       Device Boot      Start         End      Blocks   Id  System
    /dev/sda1   *        2048     4192255     2095104   fd  Linux raid autodetect
    /dev/sda2         4192256   541069311   268438528   fd  Linux raid autodetect
    /dev/sda3       541069312  3907028991  1682979840   fd  Linux raid autodetect
    
  • Partition table (/dev/sdc)
    Disk /dev/sdc: 28.0TB
    Sector size (logical/physical): 512B/512B
    Partition Table: gpt_sync_mbr
    
    Number  Start   End     Size    File system  Name     Flags
     1      1049kB  6597GB  6597GB               primary
     2      6597GB  28.0TB  21.4TB  xfs          primary
    
  • Partition table (/dev/sdd)
    Disk /dev/sdd: 1000.0 GB, 999998619648 bytes
    255 heads, 63 sectors/track, 121576 cylinders, total 1953122304 sectors
    Units = sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disk identifier: 0x000b6bc3
    
       Device Boot      Start         End      Blocks   Id  System
    /dev/sdd1            2048  1953122303   976560128   83  Linux
    
  • Raid table:
    Personalities : [raid1] [raid0] [raid10] [raid6] [raid5] [raid4] 
    md2 : active raid1 sdb3[1] sda3[0]
          1682979704 blocks super 1.0 [2/2] [UU]
          bitmap: 3/13 pages [12KB], 65536KB chunk
    
    md0 : active raid1 sdb1[1] sda1[0]
          2095092 blocks super 1.0 [2/2] [UU]
          bitmap: 0/1 pages [0KB], 65536KB chunk
    
    md1 : active raid1 sda2[0] sdb2[1]
          268438392 blocks super 1.0 [2/2] [UU]
          bitmap: 0/3 pages [0KB], 65536KB chunk
    
  • mdadm.conf
    DEVICE containers partitions
    ARRAY /dev/md0 UUID=7c032686:e8861a19:9ccb43c3:8f25011e
    ARRAY /dev/md1 UUID=4a18bb5c:4b4b4490:929fdc08:99b65f2f
    ARRAY /dev/md2 UUID=8e7c863e:3a75af81:321862ae:d679602e
    
  • fstab
    /dev/disk/by-id/md-uuid-4a18bb5c:4b4b4490:929fdc08:99b65f2f /                    ext4       acl,user_xattr        1 1
    /dev/disk/by-id/md-uuid-7c032686:e8861a19:9ccb43c3:8f25011e /boot                ext2       acl,user_xattr        1 2
    /dev/disk/by-id/md-uuid-8e7c863e:3a75af81:321862ae:d679602e /home                ext4       acl,user_xattr        1 2
    /dev/disk/by-id/scsi-2001b4d2003077811-part2 /mnt/raid            xfs        defaults              1 2
    /dev/disk/by-id/scsi-2001b4d2064473251-part1 /mnt/ssd             ext4       acl,user_xattr        1 2
    proc                 /proc                proc       defaults              0 0
    sysfs                /sys                 sysfs      noauto                0 0
    debugfs              /sys/kernel/debug    debugfs    noauto                0 0
    usbfs                /proc/bus/usb        usbfs      noauto                0 0
    devpts               /dev/pts             devpts     mode=0620,gid=5       0 0
    
    anka-tomo2.ka.fzk.de:/mnt/tomoraid3 /mnt/tomoraid3 nfs defaults 0 0
    lsmb01.lsdf.kit.edu:/gpfs/lsdf/anka /mnt/tomoraid-LSDF nfs defaults 0 0
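
The fast partition has no entry in the fstab above. The following sketch shows one way it could be created, mounted, and tested as non-journaled ext4. The device name /dev/sdc1 is inferred from the partition table, and the noatime option and the dd test size are assumptions rather than the configuration actually used on the server. Commands are only printed unless RUN=1 is set, because mkfs destroys existing data.

```shell
#!/bin/sh
FAST_DEV="${FAST_DEV:-/dev/sdc1}"    # assumed: first partition of the array
FAST_MNT="${FAST_MNT:-/mnt/fast}"

run() {
    # Print the command; execute it only when RUN=1.
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

make_fast_partition() {
    # "^has_journal" switches the ext4 journal off at creation time
    run mkfs.ext4 -O '^has_journal' "$FAST_DEV"
    run mkdir -p "$FAST_MNT"
    # noatime avoids an inode update on every read (assumed option)
    run mount -o noatime "$FAST_DEV" "$FAST_MNT"
}

write_test() {
    # Sequential 10 GiB write bypassing the page cache; compare the rate
    # dd reports with the 850 MB/s target
    run dd if=/dev/zero of="$FAST_MNT/ddtest" bs=1M count=10240 oflag=direct
}

# 6 TiB expressed in decimal GB, the unit parted uses for the partition end
fast_size_gb() { echo $(( 6 * 1024 * 1024 * 1024 * 1024 / 1000000000 )); }

make_fast_partition
write_test
fast_size_gb
```

The last function shows why the "first 6TB" partition ends at 6597GB in the parted listing: 6 TiB is 6597 decimal gigabytes.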
    

Software

Base System

  • openSUSE 12.1
  • Desktop: Gnome
  • Development: Kernel, GNOME, Python

System Configuration

  • Additional kernel parameters to be added to /boot/grub/menu.lst: vga=3 console=ttyS1,115200 earlyprintk=serial,ttyS1,115200 pcie_aspm=off
    • vga - configure the standard text mode console
    • console and earlyprintk - enable the remote SOL console
    • pcie_aspm - prevent errors on the PCIe bus
  • The NVIDIA driver should be instructed to use MSI interrupts via the NVreg_EnableMSI=1 parameter in /etc/modprobe.d/50-nvidia.conf:
    options nvidia NVreg_DeviceFileUID=0 NVreg_DeviceFileGID=33 NVreg_DeviceFileMode=0660 NVreg_EnableMSI=1
    
  • In /etc/init.d/boot.local the nvidia module should be loaded and put into persistence mode:
    modprobe nvidia
    nvidia-smi -pm 1
    
  • A terminal on the SOL console should be enabled in /etc/inittab by adding:
    T0:2345:respawn:/usr/sbin/mgetty -s 115200 /dev/ttyS1 vt100
    
  • Logins on /dev/ttyS1 should be allowed in /etc/securetty (add a line containing ttyS1)
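
After a reboot it is worth confirming that the kernel actually received the serial console and PCIe parameters. A minimal sketch (the function name is illustrative); it reads /proc/cmdline by default but also accepts a command line as an argument, which makes it easy to try out:

```shell
#!/bin/sh
# Parameters to look for; earlyprintk is the kernel's name for the
# early serial console option.
REQUIRED="console=ttyS1,115200 earlyprintk=serial,ttyS1,115200 pcie_aspm=off"

# Print every required parameter that is missing from the command line
# and return non-zero if at least one is missing.
missing_params() {
    cmdline="${1:-$(cat /proc/cmdline)}"
    rc=0
    for p in $REQUIRED; do
        case " $cmdline " in
            *" $p "*) ;;                          # present as a whole word
            *) echo "missing: $p"; rc=1 ;;
        esac
    done
    return $rc
}

# Typical use:
#   missing_params && echo "kernel command line OK"
```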

Additional Packages

  • Repositories
    zypper ar http://download.opensuse.org/repositories/science/openSUSE_12.1/science.repo
    zypper ar http://download.opensuse.org/repositories/X11:/RemoteDesktop/openSUSE_12.1/X11:RemoteDesktop.repo
    
  • Packages
    zypper install mgetty
    zypper install bzr cmake
    zypper install sshfs
    zypper install freeglut-devel openmpi-devel fftw3-devel python-imaging python-numpy-devel
    zypper install gcc gcc-c++ glib2-devel json-glib-devel
    zypper install gobject-introspection-devel python-gobject2
    zypper install gtk-doc python-Sphinx
    zypper install libtiff-devel
    zypper install nano
    zypper install imagej
    zypper install FreeNX
    zypper install python-qt4-devel
    zypper install libmysqlclient-devel
    zypper install libmysqld-devel
    zypper install python-scipy python-matplotlib python-matplotlib-tk
    zypper install tiff
    

Camera Drivers

  • Get the Silicon Software driver and SDK
  • Install the menable driver for the Silicon Software microEnable CameraLink frame grabber
    • Extract the source
    • For post-3.2 kernels, you may need to apply the patch menable-ds.patch
    • Compile with the default compiler (the system will crash if the kernel and its modules are built with different compiler versions). Be careful here: if you have already installed CUDA and set the default compiler to gcc-4.3, you need to temporarily revert to gcc-4.6 to build the kernel module!
    • Install and load the module
      tar xjf menable_linuxdrv_src_3.9.14_4.0.3.tar.bz2
      cd menable_linuxdrv_src_3.9.14_4.0.3
      patch -p1 < menable-ds.patch
      make
      make install
      depmod -a
      modprobe menable
      
    • Install SDK RPMs
      rpm -i siso-rt5*.rpm
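
The compiler mismatch described above can be detected before building the module. The sketch below compares the gcc version recorded in /proc/version with the version of the default gcc. It assumes the "gcc version X.Y.Z" wording that kernels of this generation print in /proc/version, and the function names are illustrative.

```shell
#!/bin/sh
# Extract "major.minor" of the gcc that built the running kernel.
# $1: contents of /proc/version (read from the file if omitted)
kernel_gcc() {
    printf '%s\n' "${1:-$(cat /proc/version)}" |
        sed -n 's/.*gcc version \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Reduce a version string such as 4.6.2 to major.minor.
major_minor() { echo "$1" | cut -d. -f1,2; }

# Succeeds only if both versions agree on major.minor.
same_compiler() {
    [ "$(major_minor "$1")" = "$(major_minor "$2")" ]
}

# Typical use before running "make":
#   same_compiler "$(kernel_gcc)" "$(gcc -dumpversion)" ||
#       echo "WARNING: default gcc differs from the kernel's compiler" >&2
```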
      

FreeNX

  • Configure FreeNX server to allow remote desktop
    nxsetup
    
  • There are some problems with the current snapshot (30.01.2013) of the openSUSE 12.2 repository. As a workaround you may
    • install FreeNX from
      http://download.opensuse.org/repositories/home:/please_try_again/openSUSE_12.2/home:please_try_again.repo
      
    • create a symlink from authorized_keys to authorized_keys2 in /var/lib/nxserver/home/.ssh

VirtualGL

Install VirtualGL to provide OpenGL forwarding (HOW?)

CUDA

  • Downgrade to gcc 4.3 (gcc-4.6 is not compatible with the latest CUDA toolkit)
    zypper ar http://download.opensuse.org/repositories/devel:/gcc/openSUSE_12.1/devel:gcc.repo
    zypper install gcc43 gcc43-c++ gcc43-locale
    for name in $(rpm -ql gcc43 gcc43-c++ | grep "/usr/bin"); do ln -sf "$name" "/usr/bin/$(basename "$name" -4.3)"; done
    
  • Currently installed
    • Driver: 304.33
    • Toolkit: 4.2 (installed into /opt/cuda)
    • SDK: 4.2 (installed into /opt/cuda/sdk)
    • SDK must be compiled
      cd /opt/cuda/sdk
      make
      
  • Allow resetting failed GPUs. Add the following line to /etc/sudoers:
    ALL ALL=(ALL) NOPASSWD: /opt/cuda/sdk/C/bin/linux/release/deviceQuery
    

UFO Framework

PyHST

  • Currently installed in /opt/pyhst
  • To build from pyhst/pyhst
    cd /opt
    bzr branch http://ufo.kit.edu/sources/csa/pyhst/
    cd pyhst
    cmake .
    make
    
  • Start script /opt/PyHST
    #!/bin/bash
    
    PACKAGE_HOME=/opt/pyhst
    PACKAGE_SOURCE=${PACKAGE_HOME}
    # Assumption: binaries, libraries and headers live under the package directory
    INSTALLATION_HOME=${PACKAGE_HOME}
    CUDA_DIR=/opt/cuda.41
    
    export PATH=${INSTALLATION_HOME}/bin:$PATH
    export LD_LIBRARY_PATH=${PACKAGE_SOURCE}:${INSTALLATION_HOME}/lib:$LD_LIBRARY_PATH
    export LDFLAGS="-L ${INSTALLATION_HOME}/lib"
    export CPPFLAGS="-I ${INSTALLATION_HOME}/include"
    
    # CUDA runtime and the PyHST CUDA backend
    export LD_LIBRARY_PATH=$CUDA_DIR/lib64/:$LD_LIBRARY_PATH
    export LD_LIBRARY_PATH=${PACKAGE_SOURCE}/hst_cuda:$LD_LIBRARY_PATH
    
    export PYTHONPATH=${PACKAGE_SOURCE}:$PYTHONPATH
    PYTHON=python
    
    # Pass all arguments through unchanged, preserving spaces in file names
    ${PYTHON} ${PACKAGE_SOURCE}/PyHST.py "$@"
    
  • Start PyHST with
    /opt/PyHST <parameter_file.par>
    

TANGO 7.2.6a

Installed prerequisites:

Additional installed packages:

Maintenance

  • Reset failed GPU devices
    sudo /opt/cuda/sdk/C/bin/linux/release/deviceQuery
    
  • Usability tests:
    • Check if CUDA and OpenCL are usable
       /opt/scripts/nagios_opencl.sh
      
    • Check PyHST is usable
       /opt/scripts/nagios_pyhst.sh
      
    • Check UFO Framework is usable
       ... to be added ...
      
  • Check if cameras are usable
     ... to be added ...
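
The state of the mirrored system drives is not covered by the tests above. A small check in the same spirit could parse /proc/mdstat (same format as the Raid table under Partitioning) and flag arrays with a missing member. The function name and the Nagios-style return codes (0 for OK, 2 for CRITICAL) are assumptions; this is a sketch and not one of the scripts in /opt/scripts.

```shell
#!/bin/sh
# Hypothetical check: report md arrays whose status field, e.g. [UU],
# contains "_" (a failed or missing member).
# $1: file to read, defaults to /proc/mdstat
check_mdstat() {
    degraded=$(awk '
        /^md[0-9]+ :/ { name = $1 }            # remember the current array
        /blocks/ && /\[[U_]+\]/ {              # status line of that array
            if ($NF ~ /_/) { printf "%s%s", sep, name; sep = " " }
        }' "${1:-/proc/mdstat}")
    if [ -n "$degraded" ]; then
        echo "CRITICAL: degraded arrays: $degraded"
        return 2
    fi
    echo "OK: all md arrays complete"
    return 0
}

# Typical use:
#   check_mdstat
```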
    

Usage

  • Run PyHST
    /opt/PyHST <parameter_file.par>
    
Last modified on Jan 30, 2013, 4:55:51 PM
