#90 closed defect (postpone)

Low streaming performance

Reported by:	Matthias Vogelgesang	Owned by:	Suren A. Chilingaryan
Priority:	major	Milestone:
Component:	pcilib	Version:
Keywords:		Cc:	Suren A. Chilingaryan

Description

The results from the benchmark tool are somewhat disappointing and are probably caused by something we don't yet have fully under control. Here are my observations:

Offline decoding of a frame takes about 3.5 to 4.5 ms, which is fast enough to decode more than 200 frames per second.
The overhead of libuca is negligible: running the benchmark with the mock camera (streaming 640x480 images at 8 bit) yields more than 60000 frames/s, or ~17 GB/s of bandwidth.
Running the benchmark with the ufo camera, we achieve a miserable 34 frames/s at 10 bit and 24 frames/s at 12 bit, each at an exposure time of 0.00001 s. However, this is with the "synchronous" calls.
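The first two observations can be sanity-checked with a little arithmetic. This sketch assumes 1 byte per pixel for the 8-bit mock camera and reports bandwidth in binary units (GiB/s); that is the reading under which 60000 frames/s matches the quoted ~17 GB/s. The function names are illustrative only.

```python
# Back-of-the-envelope check of the benchmark numbers quoted above.

def max_fps(decode_ms: float) -> float:
    """Upper bound on frames/s if decoding were the only cost."""
    return 1000.0 / decode_ms

def bandwidth_gib_s(width: int, height: int, bytes_per_pixel: int, fps: float) -> float:
    """Raw image bandwidth in GiB/s (1 GiB = 2**30 bytes)."""
    return width * height * bytes_per_pixel * fps / 2**30

# Offline decoding: 3.5 to 4.5 ms per frame
print(f"decode-limited rate: {max_fps(4.5):.0f}-{max_fps(3.5):.0f} frames/s")
# Mock camera: 640x480 at 8 bit (assumed 1 byte/pixel), 60000 frames/s
print(f"mock camera: {bandwidth_gib_s(640, 480, 1, 60000):.1f} GiB/s")
```

This prints a decode-limited rate of 222-286 frames/s and about 17.2 GiB/s for the mock camera.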

Unfortunately, I cannot single out a specific reason for this performance.

Attachments (1)

dma.png (402.8 KB) - added by Uros Stevanovic 13 years ago: graph depicting PC-DAQ data taking


Change History (4)

Changed 13 years ago by Uros Stevanovic

Attachment:	dma.png added

graph depicting PC-DAQ data taking

comment:1 Changed 13 years ago by Uros Stevanovic

It seems that PC-DAQ takes an unusually long time to read the data. Frames are acquired using stimuli. The DMA engine is immediately enabled and instructed to fetch the data. The 32 frames are stored in ~100 ms, but it takes an additional ~80 ms to read all the data (see attachment). The data are neither stored nor decoded. The frames were acquired using the UFO4 firmware, but the same behavior is observed with the UFO5 firmware. This may influence the results reported above.

Last edited 13 years ago by Uros Stevanovic

comment:2 follow-up: 3 Changed 13 years ago by Suren A. Chilingaryan

Resolution:	→ postpone
Status:	new → closed

I.e., according to Uros's numbers, we are reading 32 frames in 180 ms. The frame size is currently about 18 MB (due to the 4x increase in 12-bit mode). This gives us approximately 3200 MB/s, which is the maximum of the DMA engine if I remember correctly. Of course, this should still result in at least 150 fps, but...
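The DMA throughput and frame rate follow directly from Uros's measurement. A minimal check, assuming 18 MB means decimal megabytes (the comment does not say which unit is meant):

```python
FRAMES = 32
FRAME_MB = 18.0      # approximate frame size in 12-bit mode
DMA_MS = 100 + 80    # ~100 ms to store 32 frames + ~80 ms extra to read them

seconds = DMA_MS / 1000
dma_mb_s = FRAMES * FRAME_MB / seconds   # data moved per second
dma_fps = FRAMES / seconds               # frames delivered per second
print(f"DMA throughput: {dma_mb_s:.0f} MB/s, {dma_fps:.0f} frames/s")
```

This prints about 3200 MB/s and 178 frames/s, consistent with the "at least 150 fps" estimate.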

OK. Now back to Matthias's numbers. Using iss-suren1, we got about 30 frames per second in 12-bit mode, which gives us 540 MB/s. The measured memcpy performance on this PC is 4.61 GB/s. The default processing path of pcitool makes two memory copies: the first to free the DMA buffer as soon as possible, the second during decoding. LibUCA, as far as I can see, makes another copy. With an 18 MB frame size, the frame obviously no longer fits in the L2 cache. Now let's compute how much time we need for 32 frames:
DMA: 180ms
Memcopy: 3 x 124ms
Decoding: 32 x 5ms
=======
Overall: 712ms
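The time budget for 32 frames can be written out in code, which makes it easy to plug in other numbers. This sketch assumes each memory copy moves the full 32 x 18 MB at the measured 4.61 GB/s, with decimal units throughout; the constant and function names are illustrative only.

```python
FRAMES = 32
FRAME_MB = 18.0
DMA_MS = 180.0
MEMCPY_GB_S = 4.61   # measured on iss-suren1
COPIES = 3           # 2 in pcitool's default path + 1 in libuca
DECODE_MS = 5.0      # per frame

def memcpy_ms(frames: int, frame_mb: float, gb_s: float) -> float:
    """Time for one copy of all frames; MB / (GB/s) gives ms."""
    return frames * frame_mb / gb_s

copy_ms = memcpy_ms(FRAMES, FRAME_MB, MEMCPY_GB_S)
total_ms = DMA_MS + COPIES * copy_ms + FRAMES * DECODE_MS
print(f"one copy: {copy_ms:.0f} ms, total: {total_ms:.0f} ms")
```

Unrounded, one copy takes about 125 ms and the total is about 715 ms, which agrees with the 124 ms / 712 ms estimate up to rounding.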

OK. I can't directly tell where the remaining 250 ms (25% of the time) goes, but the numbers are pretty reasonable. Now, here is what we should do:

We should use a PC with high-speed memory. memcpy on ipecamera runs at about 7 GB/s, and even faster memory is available.
The fast path of pcitool (rawcallback) should be used. In this mode, pcitool does not make any memory copies itself but passes the data to the specified callback as it arrives. This eliminates two unnecessary memory copies.
ufodecode should implement a streaming interface, so that we can use the L2 cache even for large frames.
I think the DMA engine can be tuned as well.
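To get a feel for what the first two proposals could buy, the 32-frame budget from comment:2 can be re-evaluated under hypothetical settings: a single copy (assuming the rawcallback fast path removes pcitool's two copies while libuca's copy remains) and the ~7 GB/s memcpy of ipecamera. This is a rough projection under those assumptions, not a measurement. It ignores any gain from a streaming ufodecode or DMA tuning, as well as the unexplained overhead mentioned above.

```python
def budget_ms(frames: int = 32, frame_mb: float = 18.0, dma_ms: float = 180.0,
              memcpy_gb_s: float = 4.61, copies: int = 3,
              decode_ms: float = 5.0) -> float:
    """Estimated time (ms) to transfer, copy and decode `frames` frames."""
    copy_ms = frames * frame_mb / memcpy_gb_s  # 1 GB/s == 1 MB/ms
    return dma_ms + copies * copy_ms + frames * decode_ms

current = budget_ms()                        # default path on iss-suren1
fast = budget_ms(memcpy_gb_s=7.0, copies=1)  # hypothetical improved setup
for name, ms in (("current", current), ("projected", fast)):
    print(f"{name}: {ms:.0f} ms for 32 frames -> {32000 / ms:.0f} frames/s")
```

Under these assumptions the budget drops from about 715 ms to about 422 ms for 32 frames; DMA and decoding then dominate, which is where the streaming decoder and DMA tuning would come in.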

Now, I don't want to do this work twice. Therefore, I don't want to start tuning these things until we get the full-speed test system that Michele has promised for October-November.

For this reason, I'm closing this ticket with the postpone resolution. A significant architecture change in multiple components is required to achieve higher speed. We need a full-speed test bed to start the work.

comment:3 in reply to: 2 Changed 13 years ago by Matthias Vogelgesang

Replying to csa:

For this reason, I'm closing this ticket with the postpone resolution. A significant architecture change in multiple components is required to achieve higher speed. We need a full-speed test bed to start the work.

Could you please not close the tickets but rather create a new milestone and assign the ticket to it?
