This is the sixth blog in a series exploring how asynchronous and multithreaded techniques can affect the performance of loading and displaying moderately large volumes of data in WPF applications. In this final entry, I’ll describe what I think are the most important lessons to learn from all of this.
The elephant in the room throughout this discussion has been the fact that I’ve been attempting to show 180,000 items in a ListBox. This is terrible user interface design. (Anything more than a few hundred items causes usability problems.) The scrollbar thumb becomes a very blunt instrument at this scale, and finding particular items would be a real pain. ListBox is the wrong idiom here, and it’s amazing that WPF coped as well as it did with this frankly ridiculous scenario.
The original example that motivated this series didn’t do this. The processing was more complex, reducing a large volume of data down to a much smaller set of summarized results. I didn’t want the details of the processing work to become a distraction, so I simplified that aspect of the work in the example I showed. This had the side effect of multiplying some of the numbers involved, making certain effects easier to see. (The performance impact of a change becomes more obvious when all your timings are in seconds rather than tenths of a second.) It has also had a distorting effect, but the principles are valid—the conclusions that can be drawn from this artificial example were useful in the real scenario.
So what were those conclusions?
The first and perhaps most obvious conclusion of this series is that the asynchronous language features added in C# 5 (which shipped alongside .NET 4.5) are not a silver bullet. Although I was able to improve responsiveness with a very small change in part 1, making simple use of the async and await keywords, this also incurred a considerable cost, slowing this example down by roughly a factor of three.
Of course, not everything will slow down by a factor of three: this particular example performs a large number of fairly small operations, which exacerbates the problem. That’s not a result of trying to put 180,000 items in a list, by the way. The original problem didn’t generate that much output, but it still performed a large number of small asynchronous operations.
The asynchronous language features are brilliant, but as with all tools, you need to understand how they work. They can have a profound impact on the way your code executes.
Some of the largest performance improvements came from being careful about when exactly to make information visible to WPF data binding. This is not strictly anything to do with either multithreading or asynchrony: even the original synchronous, single-threaded version of the code saw roughly a 2.5x speedup by simply presenting WPF with all of the data in one lump instead of one item at a time.
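To make the difference between "one item at a time" and "one lump" concrete, here is a minimal sketch. It is not the code from this series: ResultsViewModel and BatchingSketch are hypothetical names, and rather than measuring WPF it simply counts how many change notifications a bound UI would have to react to in each case.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.ComponentModel;

// Hypothetical view model: a list control would bind to Items.
public sealed class ResultsViewModel : INotifyPropertyChanged
{
    private IReadOnlyList<string> _items = new List<string>();

    public event PropertyChangedEventHandler PropertyChanged;

    public IReadOnlyList<string> Items
    {
        get { return _items; }
        set
        {
            _items = value;
            PropertyChanged?.Invoke(this, new PropertyChangedEventArgs(nameof(Items)));
        }
    }
}

public static class BatchingSketch
{
    // One item at a time: every Add on an observed collection raises
    // CollectionChanged, so a bound UI is told about each item separately.
    public static int NotificationsOneAtATime(IEnumerable<string> source)
    {
        int notifications = 0;
        var live = new ObservableCollection<string>();
        live.CollectionChanged += (sender, e) => notifications++;
        foreach (string item in source)
        {
            live.Add(item);
        }
        return notifications;
    }

    // One lump: build the list privately, then publish it with a single
    // property change, so a bound UI is told exactly once.
    public static int NotificationsInOneLump(IEnumerable<string> source)
    {
        int notifications = 0;
        var viewModel = new ResultsViewModel();
        viewModel.PropertyChanged += (sender, e) => notifications++;
        var complete = new List<string>(source); // nobody is observing this yet
        viewModel.Items = complete;
        return notifications;
    }
}
```

For a source of 180,000 items, the first method raises 180,000 notifications and the second raises one. How much time each notification costs depends on the UI behind it, which this sketch does not attempt to model.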
This issue isn’t unique to data binding. In all the various XAML frameworks, you can benefit from a ‘bottom-up’ approach to constructing your UI content. If you want to add a complicated bit of content to your UI, it’s best to build it completely before you attach it to the containing UI. With the alternative, top-down approach, you add each new element as the child of an element that’s already on screen, and the problem with that is that WPF (or Silverlight or WinRT, or whatever) ends up doing certain work across the whole UI to accommodate that new element. It’s smart enough to defer some work, but there are things it can’t avoid.
(If you’re wondering what top-down vs. bottom-up looks like in practice, suppose you’re adding 10 new elements, e.g., a StackPanel containing 9 children. Top-down, you’d start by adding the StackPanel to the main UI, and then add each of the children to that panel. That makes the framework do certain whole-UI work 10 times. But if you add the 9 child elements to the StackPanel before adding that panel to the live UI, which is the bottom-up approach, the whole-UI work only needs to be done once.)
The broad principle here is to give the UI framework updates in relatively large chunks, because this enables it to perform certain jobs once per chunk, instead of once per element.
The single most important performance factor in these examples was the cost of context switches. Since this was a WPF application, that meant regaining control of the dispatcher thread when an asynchronous operation completes, so that the method executing an await can continue. (Attempting to finesse this with ConfigureAwait made things worse. The context switch had to happen somewhere, and taking it out of the await expression just pushed it into the data binding system where we were unable to control it.) The nature of the context switch will depend on the environment: if you use async and await inside an ASP.NET application, for example, the way it recovers the context is quite different (and can be cheaper than a WPF-dispatcher-based callback). But the basic idea remains: any time an await is applied to an operation that cannot complete immediately, you’re going to have to pay a cost when the operation finally completes. ConfigureAwait may reduce that cost, but it won’t eliminate it, and if you truly need to do certain work back on the original context, you’re going to pay the price somewhere.
The flip side of this is that when await does not in fact need to wait, it’s very efficient. (Pre-C# 5, the usual explicit continuation style for using the TPL doesn’t offer this benefit. If you want code that handles the no-wait case as efficiently as await without actually using await, you have a lot of work to do.) So there are often significant performance benefits to be had by increasing the likelihood of await expressions being able to complete immediately. In this series, simply increasing the StreamReader buffer size had that effect, producing dramatic performance enhancements (much better than those ConfigureAwait could have offered even if it had been viable).
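As a rough illustration of the buffer-size point, the sketch below reads lines asynchronously and counts how many ReadLineAsync calls had already completed by the time they returned, which is the case where await carries on without a context switch. It is not the code from this series: BufferSizeSketch and CountLinesAsync are hypothetical names, and the premise that a larger buffer lets more calls be served from data already in memory is the assumption being illustrated, not something the sketch proves.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;

public static class BufferSizeSketch
{
    // Reads every line from the stream using the given StreamReader buffer size.
    // Returns the number of lines, plus the number of ReadLineAsync calls
    // (including the final one that returns null) whose task was already
    // complete when the call returned.
    public static async Task<(int Lines, int AlreadyCompleted)> CountLinesAsync(
        Stream stream, int bufferSize)
    {
        int lines = 0;
        int alreadyCompleted = 0;

        using (var reader = new StreamReader(stream, Encoding.UTF8, true, bufferSize))
        {
            while (true)
            {
                var pending = reader.ReadLineAsync();
                if (pending.IsCompleted)
                {
                    // Awaiting a completed task continues synchronously:
                    // no callback needs to be posted to the original context.
                    alreadyCompleted++;
                }

                var line = await pending;
                if (line == null)
                {
                    break; // end of stream
                }
                lines++;
            }
        }

        return (lines, alreadyCompleted);
    }
}
```

You might call this as CountLinesAsync(File.OpenRead(path), 64 * 1024) and compare the counts against a run with a much smaller buffer. The 64 KB figure is arbitrary and is not the value used in this series, and the underlying stream may do buffering of its own that affects the result.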
Multithreading can sometimes produce particularly egregious amounts of context switching when applied in a user interface. (This is not a new problem. With Win32, the difference between sending a message to an HWND from its owning thread and sending the same message from a worker thread could easily wipe out any potential performance gains that multithreading might have offered. Once you’re on a different thread from your UI elements, you need to start being careful about the granularity of your updates. That’s as true with modern UI frameworks as it has been for years with Win32.)
That’s not to say multithreading is bad. In the end, the very best performance came from a multithreaded solution. The key is to minimize your context switches, chunking reasonably large amounts of work into each one to amortize the costs. Sometimes it pays to be explicit about the chunking, which brings me to my final conclusion.
The Reactive Extensions for .NET are great. Since Rx became available, I think I’ve used it in every project I’ve worked on. This is why I dedicated a whole chapter to it in my recent(ish) total rewrite of O’Reilly’s Programming C#.
In this particular case, we only used a couple of its features. First, Rx provided a convenient way to chunk the data. Its Buffer operator provides a declarative way to say: I’d like items batched into groups, but I never want a delay longer than 100ms before they get displayed, and if items are being produced really quickly, split them into batches of no more than 5,000. That just takes a single method call. Second, Rx also provided a way to control exactly where the context switch happened in the processing pipeline.
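Those two uses of Rx might look something like the following sketch. It assumes the System.Reactive package is referenced; RxChunkingSketch, Chunk, ShowInBatches and addToUi are hypothetical names, and this is not the pipeline from this series, just one plausible shape for it. The 100ms and 5,000 figures are the ones quoted above, and using ObserveOn with a SynchronizationContext is only one of several ways Rx lets you pick the point at which work moves to the UI thread.

```csharp
using System;
using System.Collections.Generic;
using System.Reactive.Linq;
using System.Threading;

public static class RxChunkingSketch
{
    // Groups items into batches: a batch is emitted after 100ms or once it
    // holds 5,000 items, whichever comes first.
    public static IObservable<IList<T>> Chunk<T>(IObservable<T> items)
    {
        return items
            .Buffer(TimeSpan.FromMilliseconds(100), 5000)
            // Quiet periods can yield empty batches; there is nothing to show for those.
            .Where(batch => batch.Count > 0);
    }

    // Chunks on whatever thread produces the items, then switches to the UI
    // context once per batch rather than once per item.
    public static IDisposable ShowInBatches<T>(
        IObservable<T> items,
        SynchronizationContext uiContext,
        Action<IList<T>> addToUi)
    {
        return Chunk(items)
            .ObserveOn(uiContext)   // the one place where the context switch happens
            .Subscribe(addToUi);
    }
}
```

In a WPF application, uiContext would typically be SynchronizationContext.Current captured on the dispatcher thread, and addToUi would append each batch to whatever the list control is bound to.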
But this is just a tiny fragment of what you can do. If you are not familiar with Rx, learn it. You owe it to yourself.