wiki:UfoCluster


Active Nodes

  • 192.168.11.1 - ipepdvcompute1.ka.fzk.de (compute: Fermi, storage)
  • 192.168.11.2 - ipecamera.ka.fzk.de (camera)
  • 192.168.11.3 - ipekatrinadei.ka.fzk.de (storage)
  • 192.168.11.4 - ipeusctcompute1.ka.fzk.de (compute: Fermi)
  • 192.168.11.5 - ipepdvcompute2.ka.fzk.de (master, compute: Kepler, Xeon Phi)
  • 192.168.11.6 - ipepdvkepler.ka.fzk.de (camera): moved to ANKA
  • 192.168.11.7 - ipepdvsrv2.ka.fzk.de (virtualization)
  • 192.168.11.8 - ipepdvcompute3.ka.fzk.de (compute: AMD)

Remote Nodes

  • 192.168.11.6x - detached student cluster nodes
  • 192.168.11.180 - ipepdvdev1.ka.fzk.de
  • 192.168.11.117 - ipechilinga2.ka.fzk.de (csa)

Installation

Diagnostic

  • Port information: ibstat
  • Hardware graph: iblinkinfo
  • Network diagnostic: ibdiagnet -ls 10 -lw 4x
  • Ping: run ibping -S on one node, then ibping <base_lid_of_server_reported_by_ibstat> on another (see the sketch below)
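
A minimal connectivity check between two nodes, as a sketch (the utilities above are typically provided by the infiniband-diags package; the LID used below is a placeholder):

    # on the "server" node: note the "Base lid" reported by ibstat, then start the ping responder
    ibstat
    ibping -S

    # on another node: ping the server by its base LID (replace 2 with the LID reported above)
    ibping 2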

Storage

  • Fast storage for camera streaming
    • The first 8 TB reside on the storage boxes attached to ipekatrinadei and ipepdvcompute1
    • Exposed over the iSER protocol using the tgt server
      • No access sharing, single user only
    • Both devices are attached to the camera computer using open-iscsi
    • A software RAID 0 array is assembled from the two devices
    • Formatted with XFS and mounted under /mnt/server (see the command sketch after this list)
  • Big and slow storage
    • glusterfs is used currently, but fhgfs is faster and may be a better option if it goes open source (there are some plans)
      • The openSUSE packages do not include RDMA support; the source RPM has to be recompiled on a system with the InfiniBand stack installed (see the rebuild sketch after this list)
    • The storage is mounted under /pdv
      • /pdv/home/ - clustered home folders
      • /pdv/data/ - sample data sets
    • External computers may mount the storage over NFS, e.g. with the following /etc/fstab entry:
      ipepdvcompute1:/storage /pdv nfs defaults,_netdev,mountproto=tcp 0 0
      
  • Configuration
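
A minimal sketch of the fast-storage chain described above. Only the iSER export via tgt, the open-iscsi attach, the software RAID 0, XFS, and the /mnt/server mount point come from the description; the target name, backing devices, and LUN numbers are placeholders:

    # on a storage node (e.g. ipekatrinadei): export a local block device over iSER using tgt
    tgtadm --lld iser --op new --mode target --tid 1 --targetname iqn.2013-01.de.fzk.ka.ipekatrinadei:fast
    tgtadm --lld iser --op new --mode logicalunit --tid 1 --lun 1 --backing-store /dev/sdb
    tgtadm --lld iser --op bind --mode target --tid 1 --initiator-address ALL

    # on the camera computer: discover the targets on both storage nodes,
    # switch the recorded nodes to the iSER transport and log in
    iscsiadm -m discovery -t sendtargets -p 192.168.11.3
    iscsiadm -m discovery -t sendtargets -p 192.168.11.1
    iscsiadm -m node --op update -n iface.transport_name -v iser
    iscsiadm -m node --login

    # assemble a software RAID 0 from the two attached devices, format with XFS and mount it
    mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdX /dev/sdY
    mkfs.xfs /dev/md0
    mount /dev/md0 /mnt/server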
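
Likewise, a rough sketch of rebuilding the glusterfs source RPM with RDMA support; the package file names and the rpmbuild output directory are assumptions and depend on the glusterfs version and rpmbuild configuration:

    # rebuild the openSUSE source package on a node where the InfiniBand stack is already installed
    rpmbuild --rebuild glusterfs-<version>.src.rpm
    # install the resulting binary packages (the path depends on the rpmbuild topdir)
    zypper install /usr/src/packages/RPMS/x86_64/glusterfs-*.rpm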

Management

  • ipepdvcompute1 is a master node
  • If you have an account on ipepdvcompute1, you may convert it to a cluster account by creating an empty ~/.pdvcluster folder. An hourly cron job will then:
    • Create accounts on all cluster nodes and synchronize UIDs across them. If you want to mount the storage over NFS on your desktop, you still need to match your desktop UID to the UID you are using on ipepdvcompute1.
    • Create a cluster home in /pdv/home/.
    • Replicate ~/.ssh from ipepdvcompute1 to all cluster nodes to allow public-key authentication.
  • The ~/.ssh/ folder from ipepdvcompute1 is re-replicated every hour, so you can add or change keys on ipepdvcompute1 and they will be propagated.
  • /pdv/cluster/cluster_run.sh runs a command on all cluster nodes
    • You need to generate (or place) an SSH private key in ~/.ssh on ipepdvcompute1 and append the corresponding public key to authorized_keys (a consolidated sketch follows this list):
      • Go to the .ssh folder: cd ~/.ssh
      • Generate a key (press Enter when asked for a passphrase): ssh-keygen -t dsa
      • Append the public key to authorized_keys: cat id_dsa.pub >> authorized_keys
      • Wait up to an hour until the keys are propagated
    • Example: /pdv/cluster/cluster_run.sh head -n 1 /etc/issue
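
A consolidated sketch of the account and key setup above, to be run on ipepdvcompute1 (the key type and file names match the steps above; account creation and key distribution are handled by the hourly cron job):

    # opt the account into the cluster synchronization
    mkdir -p ~/.pdvcluster

    # generate a password-less key pair and allow it for login on all cluster nodes
    cd ~/.ssh
    ssh-keygen -t dsa -N "" -f id_dsa
    cat id_dsa.pub >> authorized_keys

    # after the next hourly synchronization, run a command on every cluster node
    /pdv/cluster/cluster_run.sh head -n 1 /etc/issue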