Opened 10 years ago

Closed 9 years ago

#238 closed defect (fixed)

Kernel crashes on repeated DMA streaming operation with IPECamera

Reported by: Suren A. Chilingaryan
Owned by: Suren A. Chilingaryan
Priority: major
Milestone:
Component: pcilib
Version:
Keywords:
Cc: Michele Caselle, Uros Stevanovic, Matthias Vogelgesang

Description

Script:

#! /bin/bash
duration=1 #in s
rd_flag=1
function stream {
  echo "do stream"
  rd_flag=1
  pci -w 0x9040 0x80004a01
  sleep $duration
  pci -w 0x9040 80000201
  rd_flag=0
}
# rd_flag=1
# pci -r dma0 --multipacket -o stream.out -t 1000000.
read -t 0.1 in1
# while [ $rd_flag == "1" ]; do
while [ -z "$in1" ]; do
  date
  size_f=`du -hsb save/ | awk '{print $1}'`.
  echo "save folder size: $size_f bytes"
  tst=$((100*1024*1024*1024))
  decide=`x=$size_f; y=$((100*1024*1024*1024)); echo "$x $y" | awk '{if ($1 < $2) print "run"; else print "stop"}'`
  # decide=`x=$size_f; y=$((100*40)); echo "$x $y" | awk '{if ($1 > $2) print "run"; else print "stop"}'`
  if [ "$decide" == "stop" ];then
    echo "too many data"
    exit
  fi
  rm stream.out
  echo "input anything to stop"
  read -t 0.9 in1
  echo "Stream start....................."
  stream &
  # time rm stream.out
  # while [ "$rd_flag" == "1" ];do
  #   echo $rd_flag
  #   pci -r dma0 --multipacket -o stream.out -t 100000000.
  # done
  pci -r dma0 --multipacket -o stream.out -t 100000000.
  sleep .001
  status=`pci -r 9068 | awk '{print $2}'`
  if [ "$status" != "00000000" ]; then
    echo "############  ERROR!! #####################"
    pci -r 9050 -s 12
    # xmessage -nearmouse "error has happened"
    mv stream.out save/stream.out.$status
    reset.sh --logic
    sleep .1
  fi
  pci -r 9050 -s 12
  echo "again input anything to stop"
  read -t 0.9 in1
done

Crash:

[ 5743.222614] ------------[ cut here ]------------
[ 5743.227240] kernel BUG at /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/mm/memory.c:2319!
[ 5743.236805] invalid opcode: 0000 [#1] PREEMPT SMP 
[ 5743.241642] Modules linked in: ib_iser libiscsi scsi_transport_iscsi ib_ipoib rdma_ucm rdma_cm ib_cm iw_cm ib_addr ib_uverbs ib_umad mlx4_en mlx4_ib ib_sa ib_mad ib_core af_packet iTCO_wdt gpio_ich iTCO_vendor_support x86_pkg_temp_thermal coretemp kvm_intel kvm crc32_pclmul crc32c_intel igb ghash_clmulni_intel ptp lpc_ich cryptd pcspkr i2c_i801 mfd_core pps_core pl2303 usbserial joydev pciDriver(O) mlx4_core ioatdma dca shpchp wmi acpi_power_meter acpi_pad mperf button sg dm_mod autofs4 ttm drm_kms_helper drm i2c_algo_bit sysimgblt sysfillrect syscopyarea xhci_hcd processor thermal_sys scsi_dh_alua scsi_dh_emc scsi_dh_rdac scsi_dh_hp_sw scsi_dh raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx raid10 raid1 raid0
[ 5743.308715] CPU: 3 PID: 13625 Comm: pci Tainted: G          IO 3.11.10-25-desktop #1
[ 5743.316454] Hardware name: Supermicro X10SRi-F/X10SRi-F, BIOS 1.0a 08/27/2014
[ 5743.323584] task: ffff881fcf812600 ti: ffff881fdef36000 task.ti: ffff881fdef36000
[ 5743.331061] RIP: 0010:[<ffffffff8114ab57>]  [<ffffffff8114ab57>] remap_pfn_range+0x397/0x470
[ 5743.339518] RSP: 0018:ffff881fdef37d50  EFLAGS: 00010282
[ 5743.344825] RAX: 80000000fb400037 RBX: ffff881fde9ea3c8 RCX: 80000000fb400237
[ 5743.351956] RDX: ffff881fde9ea3c0 RSI: 00007fb9a0a78000 RDI: ffff881fdde220c0
[ 5743.359087] RBP: 00000000000fb401 R08: 000000000000000b R09: 00000000fb4ff000
[ 5743.366216] R10: 00007fb9a0a7c000 R11: 00000000000000fb R12: 00007fb9a0b78000
[ 5743.373346] R13: 0000000000000001 R14: 00007fb9a0a79000 R15: 8000000000000037
[ 5743.380478] FS:  00007fb9a0a4b700(0000) GS:ffff88207fd80000(0000) knlGS:0000000000000000
[ 5743.388563] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5743.394305] CR2: 000000000103a768 CR3: 0000001fdef33000 CR4: 00000000001407e0
[ 5743.401436] Stack:
[ 5743.403449]  ffff881fdde220c0 ffff881fcfbac828 00007fb9a0b78000 ffffea006f8b2b60
[ 5743.410908]  00007fb9a0b77fff fffffff80475a988 00007fb9a0b78000 ffff881f45e46730
[ 5743.418366]  ffff881fdef337f8 00007fb9a0b78000 0000000000100000 00007fb9a0b77fff
[ 5743.425828] Call Trace:
[ 5743.428284]  [<ffffffffa01c17df>] pcidriver_mmap_pci+0x8f/0x140 [pciDriver]
[ 5743.435248]  [<ffffffff8114f850>] mmap_region+0x400/0x610
[ 5743.440648]  [<ffffffff8114fd58>] do_mmap_pgoff+0x2f8/0x3b0
[ 5743.446223]  [<ffffffff8113bd2e>] vm_mmap_pgoff+0x8e/0xc0
[ 5743.451623]  [<ffffffff8114e497>] SyS_mmap_pgoff+0x1b7/0x240
[ 5743.457286]  [<ffffffff815af4ad>] system_call_fastpath+0x1a/0x1f
[ 5743.463293]  [<00007fb99fd5252a>] 0x7fb99fd52529
[ 5743.467906] Code: 08 08 4c 39 64 24 10 74 08 4d 89 e6 e9 b3 fe ff ff 48 83 44 24 38 08 48 8b 44 24 10 48 39 44 24 30 74 32 49 89 c6 e9 04 fe ff ff <0f> 0b 48 8b 54 24 50 48 8b 74 24 28 48 8b 7c 24 60 e8 63 b7 ef 
[ 5743.487876] RIP  [<ffffffff8114ab57>] remap_pfn_range+0x397/0x470
[ 5743.493981]  RSP <ffff881fdef37d50>
[ 5743.497493] ---[ end trace 3cb52f16fb084d98 ]---
[ 5743.617137] note: pci[13625] exited with preempt_count 1
[ 5743.617139] BUG: scheduling while atomic: pci/13625/0x00000002
[ 5743.617163] Modules linked in: ib_iser libiscsi scsi_transport_iscsi ib_ipoib rdma_ucm rdma_cm ib_cm iw_cm ib_addr ib_uverbs ib_umad mlx4_en mlx4_ib ib_sa ib_mad ib_core af_packet iTCO_wdt gpio_ich iTCO_vendor_support x86_pkg_temp_thermal coretemp kvm_intel kvm crc32_pclmul crc32c_intel igb ghash_clmulni_intel ptp lpc_ich cryptd pcspkr i2c_i801 mfd_core pps_core pl2303 usbserial joydev pciDriver(O) mlx4_core ioatdma dca shpchp wmi acpi_power_meter acpi_pad mperf button sg dm_mod autofs4 ttm drm_kms_helper drm i2c_algo_bit sysimgblt sysfillrect syscopyarea xhci_hcd processor thermal_sys scsi_dh_alua scsi_dh_emc scsi_dh_rdac scsi_dh_hp_sw scsi_dh raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx raid10 raid1 raid0
[ 5743.617164] CPU: 3 PID: 13625 Comm: pci Tainted: G      D   IO 3.11.10-25-desktop #1
[ 5743.617165] Hardware name: Supermicro X10SRi-F/X10SRi-F, BIOS 1.0a 08/27/2014
[ 5743.617166]  ffff881fdef37a48 ffffffff815a1822 ffff881fcf812600 ffffffff8159dc9b
[ 5743.617167]  ffff881fdef37aa8 ffffffff815a69f6 0000000000012900 ffff881fdef37fd8
[ 5743.617168]  ffff881fdef37fd8 0000000000012900 ffff881fcf812600 ffff881fcf812600
[ 5743.617168] Call Trace:
[ 5743.617176]  [<ffffffff81004ad8>] dump_trace+0x88/0x310
[ 5743.617178]  [<ffffffff81004e30>] show_stack_log_lvl+0xd0/0x1d0
[ 5743.617179]  [<ffffffff8100626c>] show_stack+0x1c/0x50
[ 5743.617183]  [<ffffffff815a1822>] dump_stack+0x50/0x89
[ 5743.617186]  [<ffffffff8159dc9b>] __schedule_bug+0x48/0x56
[ 5743.617188]  [<ffffffff815a69f6>] thread_return+0x4b4/0x4be
[ 5743.617190]  [<ffffffff815a7925>] rwsem_down_read_failed+0xc5/0x120
[ 5743.617193]  [<ffffffff812dd4d4>] call_rwsem_down_read_failed+0x14/0x30
[ 5743.617196]  [<ffffffff815a545e>] down_read+0xe/0x10
[ 5743.617199]  [<ffffffff810bd71e>] acct_collect+0x3e/0x190
[ 5743.617203]  [<ffffffff810547b6>] do_exit+0x846/0xa90
[ 5743.617205]  [<ffffffff815a968a>] oops_end+0x9a/0xe0
[ 5743.617207]  [<ffffffff81003331>] do_invalid_op+0x81/0xa0
[ 5743.617209]  [<ffffffff815b0d5e>] invalid_op+0x1e/0x30
[ 5743.617213]  [<ffffffff8114ab57>] remap_pfn_range+0x397/0x470
[ 5743.617217]  [<ffffffffa01c17df>] pcidriver_mmap_pci+0x8f/0x140 [pciDriver]
[ 5743.617222]  [<ffffffff8114f850>] mmap_region+0x400/0x610
[ 5743.617225]  [<ffffffff8114fd58>] do_mmap_pgoff+0x2f8/0x3b0
[ 5743.617228]  [<ffffffff8113bd2e>] vm_mmap_pgoff+0x8e/0xc0
[ 5743.617231]  [<ffffffff8114e497>] SyS_mmap_pgoff+0x1b7/0x240
[ 5743.617233]  [<ffffffff815af4ad>] system_call_fastpath+0x1a/0x1f
[ 5743.617236]  [<00007fb99fd5252a>] 0x7fb99fd52529

Attachments (0)

Change History (8)

comment:1 Changed 10 years ago by Suren A. Chilingaryan

Both DMA and PIO access seem to work after the crash.

comment:2 Changed 10 years ago by Suren A. Chilingaryan

The crash occurs in the kernel, exactly at BUG_ON(!pte_none(*pte)); in remap_pte_range:

static int remap_pte_range(struct mm_struct *mm, pmd_t *pmd,unsigned long addr, unsigned long end, unsigned long pfn, pgprot_t prot)
{
        pte_t *pte;
        spinlock_t *ptl;

        pte = pte_alloc_map_lock(mm, pmd, addr, &ptl);
        if (!pte)
                return -ENOMEM;
        arch_enter_lazy_mmu_mode();
        do {
                BUG_ON(!pte_none(*pte));
                set_pte_at(mm, addr, pte, pte_mkspecial(pfn_pte(pfn, prot)));
                pfn++;
        } while (pte++, addr += PAGE_SIZE, addr != end);
        arch_leave_lazy_mmu_mode();
        pte_unmap_unlock(pte - 1, ptl);
        return 0;
}
#define pte_offset_map_lock(mm, pmd, address, ptlp)     \
({                                                      \
        spinlock_t *__ptl = pte_lockptr(mm, pmd);       \
        pte_t *__pte = pte_offset_map(pmd, address);    \
        *(ptlp) = __ptl;                                \
        spin_lock(__ptl);                               \
        __pte;                                          \
})
static inline int pte_none(pte_t pte)
{
        return (pte_val(pte) & _PAGE_INVALID) && !(pte_val(pte) & _PAGE_SWT);
}

comment:3 Changed 10 years ago by Suren A. Chilingaryan

This apparently happens when multiple processes try to mmap PCIe BAR space into user space simultaneously. The question remains: why? So far I have not found any documentation forbidding this.

comment:5 Changed 10 years ago by Suren A. Chilingaryan

Until this is properly resolved, the workaround is to ensure that two pci commands are never started or stopped simultaneously. The best approach is to start streaming, sleep for a few tens of milliseconds, and then enable AUTO triggering. Also ensure that the streaming runs long enough that it terminates only after AUTO triggering has been disabled.

comment:6 Changed 10 years ago by Suren A. Chilingaryan

The actual problem is that the mmap call accepts no parameters specifying which BAR or KMEM to map. The mapping is therefore set up using multiple calls into kernel space.

  • First, the mode (BAR or KMEM) is selected: ioctl( ctx->handle, PCIDRIVER_IOC_MMAP_MODE, PCIDRIVER_MMAP_PCI )
  • Then, the BAR is specified: ioctl( ctx->handle, PCIDRIVER_IOC_MMAP_AREA, PCIDRIVER_BAR0 + bar )
  • Finally, mmap is called: mmap( 0, board_info->bar_length[bar], PROT_WRITE | PROT_READ, MAP_SHARED, ctx->handle, 0 )

If another instance of pci runs in between these calls, the settings in effect when mmap is finally called will be inconsistent.
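
The interleaving described above can be modelled in user space. This is only a toy sketch: it assumes the driver keeps a single copy of the mmap mode and area per device (which is what this comment implies), and every `toy_*` name is invented for illustration and is not part of pciDriver or pcilib.

```c
#include <assert.h>

/* Toy stand-ins for the driver's mmap settings; all names are illustrative. */
enum toy_mode { TOY_MODE_PCI, TOY_MODE_KMEM };

struct toy_device {
    enum toy_mode mmap_mode;  /* assumed to be shared by every opener of the device */
    int mmap_area;            /* which BAR (or buffer) the next mmap() would target */
};

static void toy_set_mode(struct toy_device *dev, enum toy_mode mode)
{
    dev->mmap_mode = mode;
}

static void toy_set_area(struct toy_device *dev, int area)
{
    assert(area >= 0);
    dev->mmap_area = area;
}

/* Returns 1 if the settings seen at "mmap" time are the ones the caller set. */
static int toy_mmap_matches(const struct toy_device *dev,
                            enum toy_mode mode, int area)
{
    return dev->mmap_mode == mode && dev->mmap_area == area;
}

/* Process A runs all three steps without interruption. */
static int toy_sequential_run(void)
{
    struct toy_device dev = { TOY_MODE_KMEM, 3 };
    toy_set_mode(&dev, TOY_MODE_PCI);                /* A: step 1 */
    toy_set_area(&dev, 0);                           /* A: step 2 */
    return toy_mmap_matches(&dev, TOY_MODE_PCI, 0);  /* A: step 3 */
}

/* Process B slips in between A's second and third step. */
static int toy_interleaved_run(void)
{
    struct toy_device dev = { TOY_MODE_PCI, 0 };
    toy_set_mode(&dev, TOY_MODE_PCI);                /* A: step 1 */
    toy_set_area(&dev, 0);                           /* A: step 2 */
    toy_set_mode(&dev, TOY_MODE_KMEM);               /* B: overwrites A's mode */
    toy_set_area(&dev, 3);                           /* B: overwrites A's area */
    return toy_mmap_matches(&dev, TOY_MODE_PCI, 0);  /* A: step 3 sees B's values */
}
```

In this model `toy_sequential_run()` finds its own settings at the final step, while `toy_interleaved_run()` does not, because the second process has replaced them in the meantime.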

We can solve this either by locking at the process level within pci or by trying to implement per-process configuration queues.
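
A minimal sketch of the first option (process-level locking), assuming an advisory `flock()` lock on a separate lock file is acceptable. The helper `run_locked`, the callback type, and the lock-file path in the usage note are hypothetical; this is not necessarily how the issue was eventually fixed.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Callback type for the critical section; hypothetical, not a pcilib type. */
typedef int (*locked_fn)(void *arg);

/*
 * Run fn(arg) while holding an exclusive advisory lock on lock_path.
 * Returns fn's result, or -1 if the lock file cannot be opened or locked.
 */
static int run_locked(const char *lock_path, locked_fn fn, void *arg)
{
    assert(lock_path != NULL && fn != NULL);

    int lock_fd = open(lock_path, O_CREAT | O_RDWR, 0666);
    if (lock_fd < 0)
        return -1;

    /* Blocks until every other process holding the lock has released it. */
    if (flock(lock_fd, LOCK_EX) != 0) {
        close(lock_fd);
        return -1;
    }

    /* In pci this would be: ioctl(MMAP_MODE), ioctl(MMAP_AREA), mmap(). */
    int result = fn(arg);

    flock(lock_fd, LOCK_UN);
    close(lock_fd);
    return result;
}

/* Placeholder critical section: counts how often it was entered. */
static int count_call(void *arg)
{
    ++*(int *)arg;
    return 0;
}
```

A caller would wrap the three-step sequence in one callback and invoke something like `run_locked("/tmp/pci-mmap.lock", map_bar, &request)`, where both the path and `map_bar` are placeholders. Because the lock is advisory, it only helps if every process that maps the device goes through the same helper.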


comment:7 Changed 10 years ago by Suren A. Chilingaryan

Component: Application → pcilib

comment:8 Changed 9 years ago by Suren A. Chilingaryan

Resolution: fixed
Status: new → closed

Fixed by r285.
