Wikipedia:Reference desk/Archives/Computing/2019 January 20

= January 20 =

Causes of poor performance of particular applications
This question is motivated by a specific issue I was having; however, I'm curious more generally also.


 * 1) I have a Lenovo ThinkPad X1 5th Generation with 8 GiB RAM, a 256 GiB NVMe drive, an i5 Kaby Lake CPU, and an on-die GPU. I've noticed that editing PowerPoint files can be slow, especially when there are many equations and/or animations. My question: which component(s) are most likely to be responsible for this slowness (alternative view: changing which component(s) would result in the greatest speed-up) and why?
 * How, in general, can one know or even guess what the bottleneck (I think that's the term) for a particular application might be? Is the performance of e.g. the symbolic algebra facility of Mathematica most dependent on having a particular type of CPU? What about the performance of Google Chrome with many tabs?--Leon (talk) 16:31, 20 January 2019 (UTC)

In general, the way to find the bottleneck is to first instrument the application. You can do this with a stopwatch and a particularly slow PowerPoint file, or you can get more sophisticated with some sort of scripting. Check whether it speeds up the second or third time you try it, so that a faster repeat run doesn't fool you later.

Now try reducing/increasing resources. Take out half of the RAM. Same speed? RAM isn't the bottleneck. Overclock or underclock the CPU by 5%. Did your test run 4% or 5% slower or faster? Then the CPU is the bottleneck. A small, cheap external USB 2.0 hard drive is a great tool to see if disk IO is your bottleneck.

Be aware that if you get rid of a bottleneck something else becomes the new bottleneck, and you have to retest everything. --Guy Macon (talk) 17:21, 20 January 2019 (UTC)
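The timing procedure described above could be scripted roughly as follows. This is only a minimal sketch, not anything posted in the thread: the names `time_task` and `sensitivity` are made up for illustration, and it assumes the slow operation can be wrapped in a Python callable, which is not true of every desktop application.

```python
import statistics
import time


def time_task(task, repeats=5, warmups=1):
    """Return the median wall-clock seconds of task() over `repeats` runs.

    The first `warmups` runs are executed but not recorded, so that a
    faster second or third run does not distort the comparison.
    """
    for _ in range(warmups):
        task()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        task()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def sensitivity(baseline_seconds, changed_seconds, resource_change):
    """Magnitude of (relative run-time change) / (relative resource change).

    resource_change is a fraction, e.g. -0.05 for a 5% underclock.
    A value near 1 suggests the resource is the bottleneck; a value
    near 0 suggests it is not.
    """
    runtime_change = (changed_seconds - baseline_seconds) / baseline_seconds
    return abs(runtime_change / resource_change)


if __name__ == "__main__":
    # Hypothetical numbers: 10.0 s at stock clock, 10.5 s after a 5% underclock.
    print(sensitivity(10.0, 10.5, -0.05))
```

With the hypothetical figures in the example, the run time changes by about as much as the clock speed did, which is the pattern that points at the CPU. After removing one bottleneck, the same measurements would have to be repeated for the other resources.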


 * But is there any heuristic I can apply when e.g. buying a PC for a certain set of tasks? I cannot benchmark every application with different hardware, but I know what applications I use regularly.--Leon (talk) 09:14, 21 January 2019 (UTC)


 * No. There is no heuristic. And I am having trouble understanding why you responded with "I cannot benchmark every application with different hardware, but I know what applications I use regularly" after I specifically advised you to benchmark with the applications that you feel are too slow on your current hardware.


 * Because I'd invalidate the warranty on my laptop by opening it!--Leon (talk) 18:42, 22 January 2019 (UTC)


 * The reason that there is no heuristic is that there is no magical way to look at an app and determine what is bottlenecking it without actually benchmarking it. How would the heuristic know that playing Deus Ex: Mankind Divided needs a fast video card, that playing Stockfish uses up CPU power, or that playing Batman Arkham Knight requires a lot of RAM? (And how would it know that even when badly bottlenecked Stockfish will still kick your ass?)


 * There is no magic shortcut. Here are your options:
 * Do your own testing and experimenting.
 * Find an expert who already knows and has published what he knows.
 * Become that expert yourself.
 * Throw money at the problem by buying a new system with more storage, more RAM, and faster everything.
 * Accept the low performance you already have.
 * --Guy Macon (talk) 14:34, 21 January 2019 (UTC)


 * Bring up Task Manager (ctrl-alt-del) and select the "processes" tab. Then run your program and see what percentage of the CPU, percentage of the memory, and percentage of the disk access, etc, it is using. See if one of them is close to 100%.  Bubba73 You talkin' to me? 23:50, 21 January 2019 (UTC)
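A scripted relative of the Task Manager check is to compare CPU time with wall-clock time for one task. The sketch below is an illustration only: it uses the Python standard library, measures the Python process it runs in (not an external program such as PowerPoint), and the helper names `cpu_to_wall_ratio`, `busy` and `waiting` are hypothetical.

```python
import time


def cpu_to_wall_ratio(task):
    """Run task() once and return CPU seconds used / wall-clock seconds."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    task()
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return cpu_used / wall_used


def busy():
    # Pure computation: should keep one core occupied for the whole run.
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total


def waiting():
    # Sleeping stands in for time spent waiting on disk or network.
    time.sleep(0.2)


if __name__ == "__main__":
    print("busy:   ", round(cpu_to_wall_ratio(busy), 2))
    print("waiting:", round(cpu_to_wall_ratio(waiting), 2))
```

A ratio close to 1 suggests the task is limited by a single CPU core, while a ratio close to 0 suggests it spends most of its time waiting on something else.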


 * That was easy, and actually somewhat interesting.


 * Based on resource utilisation figures, for certain sluggish tasks e.g. Geogebra animations, I found that the CPU was utilised heavily and the GPU only slightly. Whilst I did not see figures approaching 100% for CPU utilisation overall, I am aware that many (most?) applications are not very parallel, so I wouldn't expect to. This was surprising given that the CPU is relatively strong compared to the (dreadful) GPU, and I presumed that the animation would be taxing on both. Additionally, disabling hyperthreading in the firmware settings appeared to speed up Geogebra, at least without other applications running in both cases.--Leon (talk) 18:42, 22 January 2019 (UTC)


 * Yes, that i5 Kaby Lake should be good enough. But you said that you disabled hyperthreading - the i5 doesn't have hyperthreading.  But I forgot something in the CPU usage.  It will approach 100% only if all cores are being used fully.  Since the i5 has four cores, not hyperthreaded, if the CPU use is about 25%, it is probably single-threaded and maxing out one core. Bubba73 You talkin' to me? 00:28, 24 January 2019 (UTC)
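The arithmetic behind the "about 25%" remark can be written out as a small sketch. The function names are hypothetical, and the 10% tolerance is an arbitrary assumption; the overall percentage is taken to be an average over all logical processors, as Task Manager reports it.

```python
def busy_logical_cpus(overall_percent, logical_cpus):
    """Convert an overall CPU percentage into 'logical CPUs kept busy'."""
    return overall_percent / 100 * logical_cpus


def looks_single_threaded(overall_percent, logical_cpus, tolerance=0.1):
    """True if the load is roughly one fully used logical CPU.

    tolerance is an arbitrary margin, in units of logical CPUs.
    """
    busy = busy_logical_cpus(overall_percent, logical_cpus)
    return abs(busy - 1.0) <= tolerance


if __name__ == "__main__":
    # 25% overall on four logical processors is one processor's worth of work.
    print(busy_logical_cpus(25, 4))
    print(looks_single_threaded(25, 4))
```

The same reasoning applies whether the four logical processors are four physical cores or two hyperthreaded ones.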


 * Not the ultra low voltage version in a Thinkpad: it has two hyperthreaded cores, as does the i7.--Leon (talk) 16:39, 24 January 2019 (UTC)


 * OK, I was familiar with Sandy Bridge and Ivy Bridge, which had no hyperthreading.  But I don't see why turning it off would improve performance.  Bubba73 You talkin' to me? 04:48, 25 January 2019 (UTC)
 * Because hyperthreading comes at a price in register and ALU availability, as well as in cache hit rate. So performance for single-threaded tasks can be lower if hyperthreading is enabled. --Stephan Schulz (talk) 00:54, 27 January 2019 (UTC)
 * I didn't realize that, but it makes sense. Most of my stuff is single-threaded, but it looks like the only way to turn off hyperthreading is in the BIOS (i.e. no easy way to turn it on and off). Bubba73 You talkin' to me? 06:39, 27 January 2019 (UTC)
 * Hyperthreading started with the observation that a single thread did not usually use all of the ALU (because at any one time it didn't execute the perfect mix of operations supported). So the designers doubled the control logic and the (back then very few) registers on x86 and executed two threads, to better utilize the expensive component. A good idea in principle, but if there is only one floating-point adder, only one of the threads can use it at any given time. Since then, things have changed, too - logic has become cheaper (so you can build in more adders), and memory bandwidth (and hence cache performance) has become more of a bottleneck. But on the other hand, modern OSes have also improved, and most of them should use a scheduler that understands virtual cores, real cores, and different processors, and that will smartly schedule threads to minimize cache contention and to maximise locality. If you are in a situation with only a single compute-expensive thread, the impact should be minimal (the OS can simply schedule all the other stuff onto a different physical core). But if your machine is heavily loaded, performance per thread can suffer significantly, even if the number of active threads is lower than the number of virtual cores. --Stephan Schulz (talk) 09:41, 27 January 2019 (UTC)