Opened 13 years ago
Closed 13 years ago
#141 closed defect (fixed)
Slow performance on ufosrv1
| Reported by: | Matthias Vogelgesang | Owned by: | Suren A. Chilingaryan |
|---|---|---|---|
| Priority: | major | Milestone: | ufo-core-0.2 |
| Component: | Infrastructure | Version: | |
| Keywords: | | Cc: | Suren A. Chilingaryan, Tomas Farago, David Haas, Tomy Rolo, Matthias Vogelgesang |
Description
David and Tomas investigated the performance of the framework on the UFO server, and unfortunately it is pretty disappointing. A rather small data set takes about 580 seconds using a single GTX 580, which is more than an order of magnitude slower than the 12.5 seconds it takes on my desktop machine with a similar GTX 580.
I profiled a bit here and there but could not yet find the real source of the problem. It is unclear to me whether it's a problem with the server itself or with the software.
Attachments (0)
Change History (9)
comment:1 Changed 13 years ago by
| Component: | ufo-core → Infrastructure |
|---|---|
| Owner: | changed from Matthias Vogelgesang to Suren A. Chilingaryan |
| Status: | new → assigned |
comment:2 Changed 13 years ago by
Can you share the results of your profiling? Where is the time actually spent?
comment:3 Changed 13 years ago by
| Cc: | Tomy Rolo added |
|---|
comment:4 Changed 13 years ago by
| Cc: | Matthias Vogelgesang added |
|---|
Matthias, can you provide profiling information? I'd like to know which of the filters consumes this time (i.e. is it I/O or computation?). An oprofile log would be helpful.
Alternatively, provide the data set and script causing the problems, and I'll do the testing myself.
comment:5 Changed 13 years ago by
oprofile is not very helpful; it shows that most of the time is spent in the kernel, followed by calls to libOpenCL.so and the pthreads library. Anyway, we have to postpone the investigation for some days, because we need the server untouched at the TopoTomo beamline.
comment:6 Changed 13 years ago by
| Resolution: | → fixed |
|---|---|
| Status: | assigned → closed |
Suren's change that reverted the NVIDIA CUDA version back to 4.2 fixed the problem. The reconstruction now takes about 20 seconds.
comment:7 Changed 13 years ago by
| Resolution: | fixed |
|---|---|
| Status: | closed → reopened |
I have to resurrect this ticket once again; it should stay open until all (performance) problems are fixed. It is clear that startup with multiple GPUs is bad. However, even when restricting the number of GPUs to one, ufosrv1 is extremely slow. I did some measurements of context setup, program compilation, kernel creation, buffer creation and cleanup. On each machine I restricted the number of GPUs to one (except for the AMD machine) and enabled persistence mode:
| Machine | Setup | Compilation | Kernel | Buffer | Cleanup |
|---|---|---|---|---|---|
| my desktop | 0.06s | 0.000210s | 0.000007s | 0.000007s | 0.031s |
| ufosrv1 | 3.8s | 0.000166s | 0.000007s | 0.000005s | 3.6s |
| compute1 | 0.75s | 0.000162s | 0.00007s | 0.000006s | 0.3s |
| compute2 | 0.02s | 0.09s | 0.000015s | 0.000035s | 0.001371s |
| kepler | 0.36s | 0.000091s | 0.000004s | 0.000004s | 0.00916s |
With all GPUs enabled the twins behave like this:
| Machine | Setup | Compilation | Kernel | Buffer | Cleanup |
|---|---|---|---|---|---|
| ufosrv1 | 20.32s | 0.000683s | 0.000008s | 0.000005s | 19.06s |
| compute1 | 7.87s | 0.000937s | 0.000009s | 0.000006s | 1.75s |
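One way per-phase numbers like those above can be gathered is by wrapping each phase with a monotonic timer. A minimal sketch of that methodology in Python (the phase bodies below are hypothetical stand-ins; the real measurement would wrap the corresponding OpenCL calls such as clCreateContext, clBuildProgram, clCreateKernel, clCreateBuffer and clReleaseContext):

```python
import time

# Hypothetical stand-ins for the real OpenCL phases being measured.
def setup():         pass  # would create the platform/context
def compile_prog():  pass  # would build the OpenCL program
def create_kernel(): pass  # would create a kernel object
def create_buffer(): pass  # would allocate a device buffer
def cleanup():       pass  # would release the context

phases = [("setup", setup), ("compilation", compile_prog),
          ("kernel", create_kernel), ("buffer", create_buffer),
          ("cleanup", cleanup)]

for name, fn in phases:
    t0 = time.monotonic()
    fn()
    # Print elapsed wall-clock time per phase, as in the table above.
    print(f"{name}: {time.monotonic() - t0:.6f}s")
```

Using a monotonic clock avoids skew from NTP adjustments during the measurement.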
comment:8 Changed 13 years ago by
- Updated the driver to 313.18. Seems to be a bit faster.
- Fancy. I traced the OpenCL application with ltrace; there are a lot of consecutive calls to the random number generator. I guess the NVIDIA driver accounts for most of these 20 seconds.
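To see which library calls dominate such a trace, the ltrace output can be tallied with a few lines of Python. This is a sketch only: the log excerpt below is a fabricated example of the typical `call(args) = result` line format, not the actual trace from ufosrv1.

```python
from itertools import groupby

# Hypothetical excerpt of ltrace output; a real log would come from
# something like `ltrace -o trace.log ./app`.
log = """\
rand() = 1804289383
rand() = 846930886
rand() = 1681692777
malloc(64) = 0x1a2b3c
rand() = 1714636915
"""

# The call name is everything before the opening parenthesis.
calls = [line.split("(")[0] for line in log.splitlines()]
# Count how often each library call appears to see what dominates the trace.
counts = {name: sum(1 for _ in grp) for name, grp in groupby(sorted(calls))}
print(counts)  # → {'malloc': 1, 'rand': 4}
```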
comment:9 Changed 13 years ago by
| Resolution: | → fixed |
|---|---|
| Status: | reopened → closed |
It's considerably better now in terms of startup time. Run-time performance could still be somewhat better (especially compared to my desktop), but I will close this ticket again.
I just checked the same data set on ipepdvcompute1 and it rushed through in 14.9 seconds. I have to assume that there's something wrong with the server hardware rather than the software.