Opened 10 years ago
Closed 9 years ago
#238 closed defect (fixed)
Kernel crashes on repeated DMA streaming operation with IPECamera
Reported by: | Suren A. Chilingaryan | Owned by: | Suren A. Chilingaryan |
---|---|---|---|
Priority: | major | Milestone: | |
Component: | pcilib | Version: | |
Keywords: | Cc: | Michele Caselle, Uros Stevanovic, Matthias Vogelgesang |
Description
Script:
#! /bin/bash
duration=1 #in s
rd_flag=1
function stream {
    echo "do stream"
    rd_flag=1
    pci -w 0x9040 0x80004a01
    sleep $duration
    pci -w 0x9040 80000201
    rd_flag=0
}
# rd_flag=1
# pci -r dma0 --multipacket -o stream.out -t 1000000.
read -t 0.1 in1
# while [ $rd_flag == "1" ]; do
while [ -z "$in1" ]; do
    date
    size_f=`du -hsb save/ | awk '{print $1}'`.
    echo "save folder size: $size_f bytes"
    tst=$((100*1024*1024*1024))
    decide=`x=$size_f; y=$((100*1024*1024*1024)); echo "$x $y" | awk '{if ($1 < $2) print "run"; else print "stop"}'`
    # decide=`x=$size_f; y=$((100*40)); echo "$x $y" | awk '{if ($1 > $2) print "run"; else print "stop"}'`
    if [ "$decide" == "stop" ];then
        echo "too many data"
        exit
    fi
    rm stream.out
    echo "input anything to stop"
    read -t 0.9 in1
    echo "Stream start....................."
    stream &
    # time rm stream.out
    # while [ "$rd_flag" == "1" ];do
    #     echo $rd_flag
    #     pci -r dma0 --multipacket -o stream.out -t 100000000.
    # done
    pci -r dma0 --multipacket -o stream.out -t 100000000.
    sleep .001
    status=`pci -r 9068 | awk '{print $2}'`
    if [ "$status" != "00000000" ]; then
        echo "############ ERROR!! #####################"
        pci -r 9050 -s 12
        # xmessage -nearmouse "error has happened"
        mv stream.out save/stream.out.$status
        reset.sh --logic
        sleep .1
    fi
    pci -r 9050 -s 12
    echo "again input anything to stop"
    read -t 0.9 in1
done
Crash:
[ 5743.222614] ------------[ cut here ]------------
[ 5743.227240] kernel BUG at /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/mm/memory.c:2319!
[ 5743.236805] invalid opcode: 0000 [#1] PREEMPT SMP
[ 5743.241642] Modules linked in: ib_iser libiscsi scsi_transport_iscsi ib_ipoib rdma_ucm rdma_cm ib_cm iw_cm ib_addr ib_uverbs ib_umad mlx4_en mlx4_ib ib_sa ib_mad ib_core af_packet iTCO_wdt gpio_ich iTCO_vendor_support x86_pkg_temp_thermal coretemp kvm_intel kvm crc32_pclmul crc32c_intel igb ghash_clmulni_intel ptp lpc_ich cryptd pcspkr i2c_i801 mfd_core pps_core pl2303 usbserial joydev pciDriver(O) mlx4_core ioatdma dca shpchp wmi acpi_power_meter acpi_pad mperf button sg dm_mod autofs4 ttm drm_kms_helper drm i2c_algo_bit sysimgblt sysfillrect syscopyarea xhci_hcd processor thermal_sys scsi_dh_alua scsi_dh_emc scsi_dh_rdac scsi_dh_hp_sw scsi_dh raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx raid10 raid1 raid0
[ 5743.308715] CPU: 3 PID: 13625 Comm: pci Tainted: G IO 3.11.10-25-desktop #1
[ 5743.316454] Hardware name: Supermicro X10SRi-F/X10SRi-F, BIOS 1.0a 08/27/2014
[ 5743.323584] task: ffff881fcf812600 ti: ffff881fdef36000 task.ti: ffff881fdef36000
[ 5743.331061] RIP: 0010:[<ffffffff8114ab57>] [<ffffffff8114ab57>] remap_pfn_range+0x397/0x470
[ 5743.339518] RSP: 0018:ffff881fdef37d50 EFLAGS: 00010282
[ 5743.344825] RAX: 80000000fb400037 RBX: ffff881fde9ea3c8 RCX: 80000000fb400237
[ 5743.351956] RDX: ffff881fde9ea3c0 RSI: 00007fb9a0a78000 RDI: ffff881fdde220c0
[ 5743.359087] RBP: 00000000000fb401 R08: 000000000000000b R09: 00000000fb4ff000
[ 5743.366216] R10: 00007fb9a0a7c000 R11: 00000000000000fb R12: 00007fb9a0b78000
[ 5743.373346] R13: 0000000000000001 R14: 00007fb9a0a79000 R15: 8000000000000037
[ 5743.380478] FS: 00007fb9a0a4b700(0000) GS:ffff88207fd80000(0000) knlGS:0000000000000000
[ 5743.388563] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5743.394305] CR2: 000000000103a768 CR3: 0000001fdef33000 CR4: 00000000001407e0
[ 5743.401436] Stack:
[ 5743.403449] ffff881fdde220c0 ffff881fcfbac828 00007fb9a0b78000 ffffea006f8b2b60
[ 5743.410908] 00007fb9a0b77fff fffffff80475a988 00007fb9a0b78000 ffff881f45e46730
[ 5743.418366] ffff881fdef337f8 00007fb9a0b78000 0000000000100000 00007fb9a0b77fff
[ 5743.425828] Call Trace:
[ 5743.428284] [<ffffffffa01c17df>] pcidriver_mmap_pci+0x8f/0x140 [pciDriver]
[ 5743.435248] [<ffffffff8114f850>] mmap_region+0x400/0x610
[ 5743.440648] [<ffffffff8114fd58>] do_mmap_pgoff+0x2f8/0x3b0
[ 5743.446223] [<ffffffff8113bd2e>] vm_mmap_pgoff+0x8e/0xc0
[ 5743.451623] [<ffffffff8114e497>] SyS_mmap_pgoff+0x1b7/0x240
[ 5743.457286] [<ffffffff815af4ad>] system_call_fastpath+0x1a/0x1f
[ 5743.463293] [<00007fb99fd5252a>] 0x7fb99fd52529
[ 5743.467906] Code: 08 08 4c 39 64 24 10 74 08 4d 89 e6 e9 b3 fe ff ff 48 83 44 24 38 08 48 8b 44 24 10 48 39 44 24 30 74 32 49 89 c6 e9 04 fe ff ff <0f> 0b 48 8b 54 24 50 48 8b 74 24 28 48 8b 7c 24 60 e8 63 b7 ef
[ 5743.487876] RIP [<ffffffff8114ab57>] remap_pfn_range+0x397/0x470
[ 5743.493981] RSP <ffff881fdef37d50>
[ 5743.497493] ---[ end trace 3cb52f16fb084d98 ]---
[ 5743.617137] note: pci[13625] exited with preempt_count 1
[ 5743.617139] BUG: scheduling while atomic: pci/13625/0x00000002
[ 5743.617163] Modules linked in: ib_iser libiscsi scsi_transport_iscsi ib_ipoib rdma_ucm rdma_cm ib_cm iw_cm ib_addr ib_uverbs ib_umad mlx4_en mlx4_ib ib_sa ib_mad ib_core af_packet iTCO_wdt gpio_ich iTCO_vendor_support x86_pkg_temp_thermal coretemp kvm_intel kvm crc32_pclmul crc32c_intel igb ghash_clmulni_intel ptp lpc_ich cryptd pcspkr i2c_i801 mfd_core pps_core pl2303 usbserial joydev pciDriver(O) mlx4_core ioatdma dca shpchp wmi acpi_power_meter acpi_pad mperf button sg dm_mod autofs4 ttm drm_kms_helper drm i2c_algo_bit sysimgblt sysfillrect syscopyarea xhci_hcd processor thermal_sys scsi_dh_alua scsi_dh_emc scsi_dh_rdac scsi_dh_hp_sw scsi_dh raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx raid10 raid1 raid0
[ 5743.617164] CPU: 3 PID: 13625 Comm: pci Tainted: G D IO 3.11.10-25-desktop #1
[ 5743.617165] Hardware name: Supermicro X10SRi-F/X10SRi-F, BIOS 1.0a 08/27/2014
[ 5743.617166] ffff881fdef37a48 ffffffff815a1822 ffff881fcf812600 ffffffff8159dc9b
[ 5743.617167] ffff881fdef37aa8 ffffffff815a69f6 0000000000012900 ffff881fdef37fd8
[ 5743.617168] ffff881fdef37fd8 0000000000012900 ffff881fcf812600 ffff881fcf812600
[ 5743.617168] Call Trace:
[ 5743.617176] [<ffffffff81004ad8>] dump_trace+0x88/0x310
[ 5743.617178] [<ffffffff81004e30>] show_stack_log_lvl+0xd0/0x1d0
[ 5743.617179] [<ffffffff8100626c>] show_stack+0x1c/0x50
[ 5743.617183] [<ffffffff815a1822>] dump_stack+0x50/0x89
[ 5743.617186] [<ffffffff8159dc9b>] __schedule_bug+0x48/0x56
[ 5743.617188] [<ffffffff815a69f6>] thread_return+0x4b4/0x4be
[ 5743.617190] [<ffffffff815a7925>] rwsem_down_read_failed+0xc5/0x120
[ 5743.617193] [<ffffffff812dd4d4>] call_rwsem_down_read_failed+0x14/0x30
[ 5743.617196] [<ffffffff815a545e>] down_read+0xe/0x10
[ 5743.617199] [<ffffffff810bd71e>] acct_collect+0x3e/0x190
[ 5743.617203] [<ffffffff810547b6>] do_exit+0x846/0xa90
[ 5743.617205] [<ffffffff815a968a>] oops_end+0x9a/0xe0
[ 5743.617207] [<ffffffff81003331>] do_invalid_op+0x81/0xa0
[ 5743.617209] [<ffffffff815b0d5e>] invalid_op+0x1e/0x30
[ 5743.617213] [<ffffffff8114ab57>] remap_pfn_range+0x397/0x470
[ 5743.617217] [<ffffffffa01c17df>] pcidriver_mmap_pci+0x8f/0x140 [pciDriver]
[ 5743.617222] [<ffffffff8114f850>] mmap_region+0x400/0x610
[ 5743.617225] [<ffffffff8114fd58>] do_mmap_pgoff+0x2f8/0x3b0
[ 5743.617228] [<ffffffff8113bd2e>] vm_mmap_pgoff+0x8e/0xc0
[ 5743.617231] [<ffffffff8114e497>] SyS_mmap_pgoff+0x1b7/0x240
[ 5743.617233] [<ffffffff815af4ad>] system_call_fastpath+0x1a/0x1f
[ 5743.617236] [<00007fb99fd5252a>] 0x7fb99fd52529
Change History (8)
comment:1 Changed 10 years ago by
comment:2 Changed 10 years ago by
The crash occurs in the kernel, exactly at BUG_ON(!pte_none(*pte)):
static int remap_pte_range(struct mm_struct *mm, pmd_t *pmd,
                           unsigned long addr, unsigned long end,
                           unsigned long pfn, pgprot_t prot)
{
    pte_t *pte;
    spinlock_t *ptl;

    pte = pte_alloc_map_lock(mm, pmd, addr, &ptl);
    if (!pte)
        return -ENOMEM;
    arch_enter_lazy_mmu_mode();
    do {
        BUG_ON(!pte_none(*pte));
        set_pte_at(mm, addr, pte, pte_mkspecial(pfn_pte(pfn, prot)));
        pfn++;
    } while (pte++, addr += PAGE_SIZE, addr != end);
    arch_leave_lazy_mmu_mode();
    pte_unmap_unlock(pte - 1, ptl);
    return 0;
}

#define pte_offset_map_lock(mm, pmd, address, ptlp) \
({ \
    spinlock_t *__ptl = pte_lockptr(mm, pmd); \
    pte_t *__pte = pte_offset_map(pmd, address); \
    *(ptlp) = __ptl; \
    spin_lock(__ptl); \
    __pte; \
})

static inline int pte_none(pte_t pte)
{
    return (pte_val(pte) & _PAGE_INVALID) && !(pte_val(pte) & _PAGE_SWT);
}
comment:3 Changed 10 years ago by
This apparently happens if multiple processes try to mmap the PCIe BAR space into user space simultaneously. The question remains: why? I have not found any documentation forbidding this so far.
comment:5 Changed 10 years ago by
The workaround, until this is properly resolved, is to ensure that two pci commands are never started or stopped simultaneously. The best way is to start streaming, sleep for a few tens of milliseconds, and only then enable AUTO triggering. In addition, ensure that the streaming runs long enough that it terminates only after you disable AUTO triggering.
comment:6 Changed 10 years ago by
The actual problem is that the mmap call does not accept any parameters specifying which BAR or KMEM to map. The selection is therefore made through several separate calls into kernel space.
- First, we select the mode, BAR or KMEM:
ioctl( ctx->handle, PCIDRIVER_IOC_MMAP_MODE, PCIDRIVER_MMAP_PCI )
- Then, the BAR is specified:
ioctl( ctx->handle, PCIDRIVER_IOC_MMAP_AREA, PCIDRIVER_BAR0 + bar )
- And, finally, mmap is called
mmap( 0, board_info->bar_length[bar], PROT_WRITE | PROT_READ, MAP_SHARED, ctx->handle, 0 )
If another instance of pci is started in between these calls, the settings in effect when mmap is called will be inconsistent.
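The interleaving can be illustrated with a small user-space simulation. This is only a sketch under assumptions: the struct, the helper names and the mode values are hypothetical stand-ins for state the driver is presumed to share between callers, not the actual pciDriver code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for mmap settings shared by all callers. */
struct mmap_settings {
    int mode;  /* last value set via the MMAP_MODE ioctl */
    int area;  /* last value set via the MMAP_AREA ioctl */
};

enum { MODE_PCI = 0, MODE_KMEM = 1 };  /* illustrative values only */

static void select_mode(struct mmap_settings *s, int mode) { s->mode = mode; }
static void select_area(struct mmap_settings *s, int area) { s->area = area; }

/* Simulated mmap: it takes no BAR argument, so it maps whatever was set last. */
static int mapped_area(const struct mmap_settings *s)
{
    assert(s != NULL);
    return s->area;
}

/*
 * Process A wants BAR 0, process B wants BAR 2. B's two ioctls land
 * between A's ioctls and A's mmap. Returns the BAR that A ends up mapping.
 */
static int bar_seen_by_a_when_interleaved(void)
{
    struct mmap_settings shared = { MODE_PCI, 0 };

    select_mode(&shared, MODE_PCI);  /* A: step 1 */
    select_area(&shared, 0);         /* A: step 2 */
    select_mode(&shared, MODE_PCI);  /* B: step 1 */
    select_area(&shared, 2);         /* B: step 2 */
    return mapped_area(&shared);     /* A: step 3 */
}
```

In this simulation process A maps BAR 2 while still passing the length of BAR 0 to mmap. Whether exactly this mismatch is what trips the BUG_ON is not established here.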
We can solve it either by locking at the process level within pci or by trying to make the configuration queues per-process.
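The first option, process-level locking, might look roughly like the sketch below. It assumes an advisory flock() lock on a lock file that every pci instance agrees on; run_locked, the lock path and the count_call callback are hypothetical names used for illustration and are not part of pcilib.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Callback that performs the multi-step sequence (mode ioctl, area ioctl, mmap). */
typedef int (*critical_fn)(void *arg);

/*
 * Run fn(arg) while holding an exclusive advisory lock on lock_path.
 * Returns fn's result, or -1 if the lock file could not be opened or locked.
 */
static int run_locked(const char *lock_path, critical_fn fn, void *arg)
{
    assert(fn != NULL);

    int lock_fd = open(lock_path, O_CREAT | O_RDWR, 0644);
    if (lock_fd < 0)
        return -1;
    if (flock(lock_fd, LOCK_EX) != 0) {  /* blocks while another holder exists */
        close(lock_fd);
        return -1;
    }
    int result = fn(arg);                /* critical section */
    flock(lock_fd, LOCK_UN);
    close(lock_fd);
    return result;
}

/* Stand-in for the real sequence: counts how often it was entered. */
static int count_call(void *arg)
{
    int *counter = arg;
    return ++*counter;
}
```

In real use, the callback would wrap the two ioctl calls and the mmap call so that no other cooperating instance can change the settings in between. Because the lock is advisory, it only protects against processes that take the same lock.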
comment:7 Changed 10 years ago by
Component: | Application → pcilib |
---|
Both DMA and PIO access seem to work after the crash.