A bit of Plasma profiling.
dimanche 19 octobre 2008 à 20:47 :: #30 :: rss
I tend to have lot of windows: between 50 and 100 windows spreads over 9 virtual desktops. My taskbar only show the windows from the current virtual desktop (I don't understand why this is not the default yet).
Switching from desktop to desktop is very fast, but in KDE 4.1, the taskbar take several seconds to update the windows. This is not acceptable for my use. This blog will explain how I did to make it fast and use this as an example/tutorial on how to profile with valgrind/callgrind so KDE developers can keep their applications fast...
The first rule of profiling is to work in release mode. We don't want be perturbed by the debug slowness. [1]
I will profile using the callgrind tool from valgrind. And kcachegrind[2] to display the results.
So let's run plasma with callgrind:
valgrind --tool=callgrind --instr-atstart=no plasma --nofork
--tool=callgrind indicate we want to start the the callgrind, --instr-atstart=no specify that I don't want to start the instrumentation of callgrind while starting because I'm only interested in the profiling while switching desktop. --nofork is a argument to plasma to specify that KUniqueApplication should not fork.
Ok, now plasma is running and is a bit slower than usually. Only a bit because instrumentation is currently off, when instrumentation will be on, it will become much slower.
In order to start instrumentation I use callgrind_control. In another konsole tab :
sleep 3 && callgrind_control -i on && sleep 20 && callgrind_control -i off
This wait 3 seconds, then start the instrumentation, then I will have 20 seconds to do some desktop switches before instrumentation will be turned off again automatically.
Once the instrumentation finished we can analyze them. Go back to the previous konsole tab, and press Ctrl+C to quit plasma[3].
There is now a file called callgrind.out.<pid> which contains all the profiling information. Let's open it with kcachegrind
kcachegrind callgrind.out.31452
On the left you should have a "flat profile" dock where you see all functions sorted by their total cost. Let's double click on the most expensive one (QEventLoop::exec). This will make it the current function. On the bottom right there is a tab called "call graph". This is the most important thing. We will go down the graph to find what are the most time consuming function.
One can quickly identify two main consuming task: QGraphicsScene::drawItems which paint the applet, and the WindowTaskItem::updateTask.

Let's start with the later issue: updateTask is called 103 times. This is a lot, but is normal because while changing desktop most of the window change their state. The problem is that we ask for the icon in few different size. So that mean lots of calls to Task::Icon. But out of those 289 icons, only 28 are using the KIconLoader. Loading icon is an expensive task, which is why we have KIconCache which cache most of the icon. However, all the time is spent for only 13 icons which are not in the cache. Why aren't those icons in the cache? After some investigations, this happen for applications without icons[4]. So the solution is to add the icon in the cache even if they are not found
The other slow thing is the drawning. Plasma::PanelSvg::mask is a very expensive operation, and is called for each paint. The solution here is to cache the result
(And in Qt 4.5, the QPixmap::createMaskFromColor will be faster.)
After those modifications we can run plasma again and notice a big change. But let's continue. By profiling again, we can see that the KColorScheme constructor is very expensive. It does lot of colors computation. And this constructor is called too often, specially in the Pager paint event. Let's cache the KColorSheme instance. [5]
Another thing I noticed is kDebugDevNull which is called very often. Because all the accessors of KWindowInfo are protected by a kWarning(condition, ...)[6]. While warning output is disabled this would do nothing, but even in release mode warnings might still be enabled. When the condition is false, a kDebugDevNull instance is created, which is cheap but not free because it constructs a QTextStream. And because those functions are called very often it becomes expensive. Let's replace it by if (condition) { kWarning() << ... } instead.

When switching desktop the taskbar updates almost immediately now, which is fine. But after running plasma for long, I noticed it becomes slow again. This happened because the same signal was re-connected to the same slot each time a task is removed from the taskbar. So when the signal was emitted (in that case it was taskChanged) the slot was called a lot and take lot of time. It is easy to fix by making sure the signal is connected only once .
There is still room for improvements, but now desktop switching is fast enough and the taskbar changes immediately.
Notes
[1] But it's still good to have the debug information available so it is possible for exemple to browse the source code and have decent function name
[2] included in the kdesdk package
[3] Alternatively one could use callgrind_control -d
[4] I run some of them, including xterm
[5] Another solution would be to have some level of cache from within KColorScheme itself.
[6] Which display a warning only if the condition is met







Commentaires
1. Commentaire de Andre
Le dimanche 19 octobre 2008 à 21:26
2. Commentaire de xavier
Le dimanche 19 octobre 2008 à 21:41
3. Commentaire de Dominic
Le dimanche 19 octobre 2008 à 22:33
4. Commentaire de strohel
Le dimanche 19 octobre 2008 à 22:33
5. Commentaire de Aurélien Gâteau
Le dimanche 19 octobre 2008 à 23:30
6. Commentaire de SSJ
Le lundi 20 octobre 2008 à 00:19
7. Commentaire de TheBlackCat
Le lundi 20 octobre 2008 à 00:21
8. Commentaire de Aaron Seigo
Le lundi 20 octobre 2008 à 03:17
9. Commentaire de gttt
Le lundi 20 octobre 2008 à 09:13
10. Commentaire de vp
Le lundi 20 octobre 2008 à 21:28
11. Commentaire de asdf
Le dimanche 26 octobre 2008 à 23:29
Ajouter un commentaire
Les commentaires pour ce billet sont fermés.