wiki:upiv_profiling

Benchmarks

We use the first and second images (transposed and multiplied by 32) from http://katrin.kit.edu/data/3dtracker/benchmark/synthetic/60_Luft_0_10_20_70.tif as benchmark images. The results obtained with Thorsten's program (as of Jul. 2014) are documented below.
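The preprocessing mentioned above (transpose, then multiply by 32) could be reproduced roughly as follows. This is a minimal sketch, not the script that was actually used. It assumes the frames are 2-D 16-bit unsigned arrays and that values saturate on overflow (the original overflow handling is not documented). The name `prepare_frame` is hypothetical. Reading the pages of the TIFF file is left out; any reader that returns NumPy arrays would do.

```python
import numpy as np

def prepare_frame(frame: np.ndarray, gain: int = 32) -> np.ndarray:
    """Transpose a 2-D frame and multiply its intensities by `gain`."""
    # Widen first so the multiplication cannot wrap around in uint16.
    scaled = frame.T.astype(np.uint32) * gain
    # Assumption: saturate at the uint16 maximum instead of wrapping.
    return np.clip(scaled, 0, np.iinfo(np.uint16).max).astype(np.uint16)

# Tiny stand-in for a benchmark frame (2 rows x 3 columns).
demo = np.array([[1, 2, 3], [4, 5, 4000]], dtype=np.uint16)
prepared = prepare_frame(demo)
print(prepared.shape)
print(prepared)
```

If the source data never exceeds 2047 (11 bit), the clipping step has no effect and the result is a plain multiplication.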

original image

  • See attachments for image (1,2)

processed images

  • See attachments for result (1,2) from Thorsten's program (as of Aug. 2014)

circles found in images

Result Image 1         |      Result Image 2
(70,1199)  = 1  px     |     (73,1197)  = 6  px
(74,481)   = 2  px     |     (80,913)   = 10 px
(80,913)   = 10 px     |     (112,423)  = 5  px
(112,423)  = 5  px     |     (154,344)  = 3  px
(157,743)  = 4  px     |     (158,744)  = 3  px
(160,337)  = 11 px     |     (162,579)  = 5  px
(163,578)  = 5  px     |     (165,333)  = 5  px
(173,330)  = 11 px     |     (194,309)  = 10 px
(191,301)  = 3  px     |     (221,196)  = 7  px
(221,196)  = 8  px     |     (259,533)  = 10 px
(259,534)  = 10 px     |     (265,997)  = 4  px
(264,997)  = 5  px     |     (299,1144) = 10 px
(292,896)  = 2  px     |     (321,413)  = 11 px
(301,1141) = 8  px     |     (322,1042) = 2  px
(322,1042) = 2  px     |     (394,1011) = 3  px
(322,409)  = 7  px     |     (410,450)  = 3  px
(410,451)  = 3  px     |     (443,903)  = 6  px
(451,432)  = 7  px     |     (445,747)  = 3  px
(451,663)  = 7  px     |     (450,432)  = 8  px
(483,981)  = 10 px     |     (452,663)  = 7  px
(488,722)  = 3  px     |     (486,981)  = 10 px
(523,867)  = 6  px     |     (522,871)  = 10 px
(526,1171) = 10 px     |     (525,1171) = 10 px
(535,1107) = 10 px     |     (534,1107) = 10 px
(565,1012) = 6  px     |     (564,1015) = 6  px
(628,199)  = 4  px     |     (626,201)  = 3  px
(707,598)  = 9  px     |     (707,597)  = 8  px
(713,568)  = 4  px     |     (712,568)  = 4  px
(722,966)  = 3  px     |     (741,774)  = 4  px
(741,775)  = 4  px     |     (76,482)   = 10 px
(774,346)  = 3  px     |     (797,1111) = 0  px
(798,1117) = 4  px     |     (878,344)  = 3  px
(900,483)  = 3  px     |     (918,674)  = 6  px
(917,677)  = 4  px     |


Result Image 1.2       |      Result Image 2.2
(75,483) = 10 px       |     (70,1201) = 2 px
(79,912) = 10 px       |     (76,482) = 10 px
(155,742) = 4 px       |     (79,912) = 10 px
(163,582) = 1 px       |     (157,734) = 0 px
(165,330) = 4 px       |     (163,582) = 1 px
(221,202) = 2 px       |     (165,334) = 6 px
(263,1001) = 2 px      |     (191,303) = 3 px
(300,1144) = 10 px     |     (220,195) = 6 px
(321,413) = 13 px      |     (265,995) = 1 px
(409,446) = 1 px       |     (299,1144) = 10 px
(447,437) = 11 px      |     (321,412) = 12 px
(449,657) = 10 px      |     (325,1045) = 4 px
(484,981) = 10 px      |     (410,451) = 0 px
(521,870) = 10 px      |     (450,430) = 6 px
(535,1107) = 10 px     |     (451,664) = 6 px
(564,1016) = 3 px      |     (484,981) = 10 px
(627,206) = 6 px       |     (525,1171) = 10 px
(704,601) = 11 px      |     (534,1107) = 10 px
(738,774) = 2 px       |     (564,1016) = 3 px
(799,1117) = 4 px      |     (626,201) = 0 px
                       |     (707,598) = 9 px
                       |     (714,571) = 1 px
                       |     (737,774) = 3 px
                       |     (797,1109) = 1 px
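To compare two of the circle lists above (for example Result Image 1 against Result Image 1.2), a small script along the following lines may help. It is only a sketch: the function names, the 3-pixel tolerance, and the greedy nearest-neighbour pairing are our assumptions and not part of Thorsten's program. The sample data are the first rows of the Result Image 1 and Result Image 1.2 columns.

```python
import math
import re

# Matches entries of the form "(a,b) = n px" as used in the tables.
ROW = re.compile(r"\((\d+),(\d+)\)\s*=\s*(\d+)\s*px")

def parse_circles(text):
    """Return (a, b, size) tuples for every '(a,b) = n px' entry in text."""
    return [tuple(int(g) for g in m.groups()) for m in ROW.finditer(text)]

def match_circles(ref, other, tol=3.0):
    """Greedily pair each reference circle with the nearest unused circle
    in `other` whose centre lies within `tol` pixels."""
    unused = list(other)
    pairs, missing = [], []
    for c in ref:
        best = min(unused, key=lambda o: math.dist(c[:2], o[:2]), default=None)
        if best is not None and math.dist(c[:2], best[:2]) <= tol:
            pairs.append((c, best))
            unused.remove(best)
        else:
            missing.append(c)
    return pairs, missing, unused

run_a = parse_circles("(70,1199) = 1 px\n(74,481) = 2 px\n(80,913) = 10 px\n(112,423) = 5 px")
run_b = parse_circles("(75,483) = 10 px\n(79,912) = 10 px\n(155,742) = 4 px")

pairs, missing, extra = match_circles(run_a, run_b)
print("matched:", pairs)
print("only in first list:", missing)
print("only in second list:", extra)
```

Greedy pairing is order dependent and can mispair circles that lie closer together than the tolerance; for dense images an optimal assignment would be the safer choice.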

profile of GPU kernels

Profiling application: /usr/local/MATLAB/R2013b/bin/glnxa64/MATLAB -r auto3d_track_tif_average4_gpu(1, 1) -nodesktop
Profiling result:

Time(%)      Time     Calls       Avg       Min       Max  Name
 79.21%  583.413s       480  1.21544s  867.25ms  1.81715s  MSortFloat(float const *, int, int2, float*, int2, int2, float, float)
 13.61%  100.225s       984  101.86ms  19.782ms  195.21ms  LoadElementsF(float const *, int2, int2, int2, unsigned char const *, float*)
  7.14%  52.5967s       504  104.36ms  18.956ms  143.72ms  BitonicSortFloatWD(float const *, float*, int2, int, unsigned int, unsigned int, int2, int2, float, float)
  0.02%  175.51ms        24  7.3130ms  7.2417ms  7.5373ms  SortBitonicMedian(unsigned short const *, unsigned short*, int2, int, unsigned int, unsigned int, int2, int2, float)
  0.01%  69.727ms        84  830.08us  785.32us  1.9273ms  [CUDA memcpy DtoH]
  0.01%  57.600ms       166  346.98us     800ns  786.79us  [CUDA memcpy HtoD]
  0.01%  37.227ms        24  1.5511ms  1.4893ms  1.6385ms  LoadElements144(unsigned short const *, int2, int2, int2, unsigned short*)
  0.00%  647.71us         2  323.86us  321.73us  325.98us  FromBack(unsigned short const *, unsigned short const *, int2, double*)
nvprof matlab -nodesktop -r 'auto3d_track_tif_average4_gpu(1, 1)'  750.38s user 179.94s system 98% cpu 15:48.89 total
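The percentages and per-call averages in the nvprof summary can be cross-checked with a short script. This is a sketch under a few assumptions: the rows are pasted in by hand with the kernel argument lists omitted for brevity, and the helper names `to_seconds` and `parse_row` are hypothetical.

```python
UNITS = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "s": 1.0}

def to_seconds(value: str) -> float:
    """Convert an nvprof duration such as '175.51ms' to seconds."""
    for suffix in ("ns", "us", "ms", "s"):  # plain 's' must be checked last
        if value.endswith(suffix):
            return float(value[: -len(suffix)]) * UNITS[suffix]
    raise ValueError(f"unknown unit in {value!r}")

def parse_row(line: str):
    """Split one summary row into (name, total_seconds, calls)."""
    pct, total, calls, avg, mn, mx, name = line.split(None, 6)
    return name.split("(")[0].strip(), to_seconds(total), int(calls)

# Rows copied from the summary above, argument lists omitted.
PROFILE = """\
 79.21%  583.413s       480  1.21544s  867.25ms  1.81715s  MSortFloat
 13.61%  100.225s       984  101.86ms  19.782ms  195.21ms  LoadElementsF
  7.14%  52.5967s       504  104.36ms  18.956ms  143.72ms  BitonicSortFloatWD
  0.02%  175.51ms        24  7.3130ms  7.2417ms  7.5373ms  SortBitonicMedian
  0.01%  69.727ms        84  830.08us  785.32us  1.9273ms  [CUDA memcpy DtoH]
  0.01%  57.600ms       166  346.98us     800ns  786.79us  [CUDA memcpy HtoD]
  0.01%  37.227ms        24  1.5511ms  1.4893ms  1.6385ms  LoadElements144
  0.00%  647.71us         2  323.86us  321.73us  325.98us  FromBack
"""

rows = [parse_row(line) for line in PROFILE.splitlines() if line.strip()]
total = sum(t for _, t, _ in rows)
for name, t, calls in rows:
    print(f"{name:22s} {t:10.3f} s  {100 * t / total:6.2f} %  {t / calls * 1e3:10.3f} ms/call")
print(f"total GPU time: {total:.3f} s")
```

Summing the rows gives roughly 736.6 s of kernel and memcpy time, of which the two float sorting kernels (MSortFloat and BitonicSortFloatWD) account for about 86 %.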

profile of MATLAB code

Function Name            Calls   Total Time    Self Time*
Ordfilt (MEX-file)         656   6423.917 s    6423.917 s
selectDevice               658    940.299 s     940.299 s
multiSearch                 16    280.003 s       0.010 s
centerSearch               293    279.993 s       6.642 s
createProfile_advanced   35160    238.218 s     238.218 s
polyfit                  35160     35.113 s       5.410 s

Current results using UFO

The attachments contain results obtained with the UFO framework, together with the time spent computing each result image. The data can be extracted with the following command:

$ tar xfvz bench.tar.gz
Last modified on Oct 23, 2014, 11:20:11 AM

Attachments (9)