This is the third blog in a series exploring how asynchronous and multithreaded techniques can affect performance when loading and displaying moderately large volumes of data in WPF applications. The first part showed how simply adding the async and await keywords does not necessarily deliver the performance you might hope for. It also showed that the exact moment at which you choose to make information visible to WPF data binding can have a large impact on performance. The second part showed how minor modifications to IO settings can speed things up significantly by reducing the number of times an asynchronous method has to relinquish its thread and later regain control. In this third part, I’ll show some multithreaded approaches to the same problem.
This uses the same examples as before—I’m using the LogReader class introduced in part 1, with the IO tweaks applied in part 2. By the way, all the timings I’m reporting here are for the first run after starting the program, because that’s more relevant to perceived UI performance than the more usual benchmarking approach of averaging hundreds of runs and discarding the outliers. The outliers are the ones your end users will notice, so for UI work, the first and worst cases are typically much more important than the average (and the first case is often also the worst case). In fact, I think that the common approaches to measuring performance can do great harm to responsiveness in practice, but that’s a topic for another post.
This next bit of code uses the TPL to run the synchronous log reading code on a thread pool thread. It uses the C# 5 asynchronous language support to wait for that work to complete, so that it can put the reader into the data binding context when the work completes.
    private async void OnLoadClick(object sender, RoutedEventArgs e)
    {
        await Task.Run(() => _reader.ReadLogs(FolderPath));
        DataContext = _reader;
    }
From one point of view, this gets the best performance yet: the UI remains responsive, and it only takes 0.8 seconds to read the data.
On the other hand, we have to wait until all the processing work is done before we see any results in the UI. Instead of asking “How long did that take?” I could argue that the more important question is “How long are we making the user wait?” By that reckoning, the original naive asynchronous implementation comes out on top. It may take 3.2 seconds (once we’ve tweaked the buffer size) but it produces usable results virtually immediately. (It seems to take about 0.01 seconds to show something, at which point the limiting factor is how long it takes your video hardware to get around to refreshing the screen.)
But perhaps we can get the best of both worlds. Could we show some results immediately, and still be finished in around a second? It might occur to you to modify our last attempt by binding before running, i.e., going back to putting the reader in the data context before setting it running. But that turns out not to be a great plan.
If you try swapping the two lines in the last method, so that the Task.Run executes after we put the reader in the data context, you’ll get the classic WPF cross-thread collection binding error: a NotSupportedException with an error message of “This type of CollectionView does not support changes to its SourceCollection from a thread different from the Dispatcher thread.” Although WPF is perfectly happy to support individual property changes from arbitrary threads, it does not extend the same tolerance to collection changes.
However, with .NET 4.5, Microsoft made it possible to avoid this problem. You must protect all access to the collection with some sort of locking mechanism, which you must make available to WPF. There are various ways to do that. I’ll add a property to hold an object used for locking, and I’ll make my code acquire that when it updates the collection:
    public class LogReader
    {
        public LogReader()
        {
            LogUris = new ObservableCollection<string>();
            CollectionLock = new object();
        }

        public object CollectionLock { get; private set; }
        public ObservableCollection<string> LogUris { get; set; }

        public void ReadLogs(string folder)
        {
            ...
            lock (CollectionLock)
            {
                LogUris.Add(uri);
            }
            ...
I also have to tell WPF that this is how the collection is protected. I’ve done that with the following in my code-behind’s constructor:
    BindingOperations.EnableCollectionSynchronization(
        _reader.LogUris, _reader.CollectionLock);
With this in place, we can safely update the log data collection after having put the reader into the data context. The reading occurs on a worker thread, enabling the UI to remain responsive, and thanks to enabling the cross-thread collection change handling, we get our immediate UI updates.
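Putting those pieces together, the code-behind might look roughly like the following. This is only a sketch: the class name MainWindow and the FolderPath property are assumptions made for illustration, InitializeComponent comes from the window's XAML (not shown), and LogReader is the class shown earlier.

```csharp
using System.Threading.Tasks;
using System.Windows;
using System.Windows.Data;

// Sketch only. MainWindow and FolderPath are assumed names; the matching
// XAML file, which generates InitializeComponent, is not shown.
public partial class MainWindow : Window
{
    private readonly LogReader _reader = new LogReader();

    public MainWindow()
    {
        InitializeComponent();

        // Tell WPF which lock object guards the collection. This runs on
        // the UI thread, before any worker thread touches the collection.
        BindingOperations.EnableCollectionSynchronization(
            _reader.LogUris, _reader.CollectionLock);
    }

    // Assumed: the folder to scan. Where this value comes from is not
    // part of the sketch.
    public string FolderPath { get; set; }

    private async void OnLoadClick(object sender, RoutedEventArgs e)
    {
        // Bind first, so that items can appear as they are added...
        DataContext = _reader;

        // ...then do the reading on a thread pool thread. ReadLogs takes
        // CollectionLock around each LogUris.Add call.
        await Task.Run(() => _reader.ReadLogs(FolderPath));
    }
}
```

The important detail is the ordering: the synchronization is registered before the collection is bound or modified from another thread, and both the worker and WPF then go through the same lock object.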
However, this turns out to be slower than even the naive asynchronous version. Measuring the time here has turned out to be difficult, because WPF seems to queue up all of the work required for collection updates, and I’ve not found a reliable way to discover the exact moment at which it finishes draining that queue. Using a manual stopwatch, I’ve found that it takes over 5 seconds to finish populating the list. (It seems to be about the same with or without the 64K buffer IO tweak, by the way. The overheads of cross-thread collection change handling are so dominant that the previously effective modification is now lost in the noise.) That’s a lot slower than our naive asynchronous implementation was after the buffer size tweak, which took 3.2 seconds to load all the data (and showed something immediately too).
The problem with this multithreaded approach is that we’ve lost any chance for batching. The asynchronous approaches shown in earlier parts of this series were all single-threaded. Because everything happened on the UI thread, we were forcing data binding to process our updates in chunks because we were only relinquishing the thread after having produced a chunk of items. But by moving the log reading to a separate thread, we’ve lost that influence over the UI thread.
Mind you, getting data binding to process things in chunks purely by monopolizing the dispatcher thread is a somewhat questionable technique. Perhaps if we were a bit more deliberate about things we could do better. So I’ll show a more explicit approach to chunking in the next part of this series.
I was going to end this post here, but then I received some feedback.
Petr Onderka sent me an email in response to the second part of this series, in which he suggested an alternative way to avoid the overhead of funneling too much work through the WPF dispatcher. He pointed out that the Task-based Asynchronous Pattern provides a way to disable that, with the following simple modification to the code that reads a line from a log file:
string line = await reader.ReadLineAsync().ConfigureAwait(false);
That call to ConfigureAwait declares that we don’t care about which context the method continues on. The upshot is that when a read that cannot complete immediately does eventually finish, the deferred execution of the rest of the method will happen on a thread pool thread. This means our await no longer incurs any WPF dispatcher overhead. But of course, it also means that all our list updates will happen on a worker thread, so we’ll need to use the same tricks as before to avoid problems: either we’ll need to wait until we’re done before making the list visible to data binding, or we’ll have to enable cross-thread change notification handling.
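To make the shape of that change concrete, here is a minimal, self-contained sketch of a line-reading loop that uses ConfigureAwait(false) together with a 64K StreamReader buffer. The LineReader class and its ReadLinesAsync method are hypothetical names invented for this illustration; this is not the LogReader code from this series, and it collects lines into a plain list rather than a bound collection.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Threading.Tasks;

// Hypothetical helper for illustration only.
public static class LineReader
{
    public static async Task<List<string>> ReadLinesAsync(string path)
    {
        var lines = new List<string>();

        // 64K buffer, as in part 2, so that fewer reads have to go
        // back to the file.
        using (var reader = new StreamReader(path, Encoding.UTF8, true, 65536))
        {
            string line;

            // ConfigureAwait(false): if a read cannot complete immediately,
            // the rest of this method does not have to resume on the
            // context (e.g. the WPF dispatcher) that it started on.
            while ((line = await reader.ReadLineAsync().ConfigureAwait(false)) != null)
            {
                lines.Add(line);
            }
        }

        return lines;
    }
}
```

A caller on the UI thread that awaits ReadLinesAsync without its own ConfigureAwait call still resumes on the UI thread afterwards, so it can safely hand the finished results to data binding at that point.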
If we use the first approach (setting the DataContext after the work is complete), the ConfigureAwait solution in conjunction with the 64K StreamReader buffer takes 1.1 seconds. This is not a massive improvement: the comparable asynchronous version without ConfigureAwait took just 1.2 seconds. But remember, we had already neutralized most of the dispatcher costs: using a 64K buffer reduced the number of thread switches from about 40,000 to about 600. If I choose not to use the 64K buffer, then without ConfigureAwait we saw last time that it took 2.8 seconds (as long as we set the DataContext after loading the data). In this case, adding ConfigureAwait makes a bigger difference, bringing it down to 1.8 seconds.
So to summarize, ConfigureAwait on its own makes a useful difference: it gets us from 2.8 seconds down to 1.8. But using 64K buffers alone was significantly more effective, getting us down to 1.2 seconds. Combining the techniques improves matters slightly further, but only marginally, bringing us down to 1.1 seconds. This is not as good as synchronous code running on a worker thread, which took only 0.8 seconds. The asynchronous version still has to pay a price for continuing after an await expression. Taking the WPF dispatcher out of the picture helps, but it doesn’t bring the cost down to zero, so in any case, the most important thing is to reduce the number of times that potentially asynchronous operations are unable to complete immediately.
What about the alternative scenario, in which we set the DataContext first? (Remember, we need to do that if we want to start seeing initial results immediately, instead of having to wait until the work is complete.) If we’re using ConfigureAwait to avoid continuing via the dispatcher thread, our collection updates will happen on a worker thread, so we’ll need to enable cross-thread change notification, just like in the task-based multi-threaded solution. Unsurprisingly, we hit exactly the same problem: we’ve lost the ability to force WPF to handle the changes in reasonably large chunks, and it seems to process each change individually. So the irony is that by adding a ConfigureAwait call to prevent switching to the dispatcher thread in our own code, we actually cause a lot more work to be delivered through the dispatcher. It just happens in a different place (inside data binding’s cross-thread collection change handling).
As with the multi-threaded version shown earlier, I can’t time this scenario accurately, because I’ve not found a way to discover in code precisely when WPF finishes handling the updates. But using a manual stopwatch, this seems to take about 6 seconds. So that’s slightly slower than using synchronous code on a separate thread (a little over 5 seconds), and much slower than our simple asynchronous implementation with buffer size tweaks applied (3.2 seconds).
The only comparison that makes ConfigureAwait look good is if we go back to the original naive asynchronous implementation without 64K buffers in which we set the data context before starting work. That took 8.5 seconds, and 6 seconds is obviously an improvement on that. But tweaking buffer sizes was a more fruitful approach because that got the asynchronous data-context-first approach down to 3.2 seconds. (As with the multi-threaded version shown earlier, the ConfigureAwait option seems not to get a measurable benefit from buffer size changes in this scenario, taking 6 seconds in either case.) So if we want immediate results, then once we’ve applied our most effective modification, changing the buffer size, we are better off keeping things on the UI thread. ConfigureAwait is not a good choice in this scenario.
Next time, I’ll show how we can use Rx to take explicit control of how we group our changes into chunks, and manage the number of context switches. This will enable us to get the benefit of immediate results, but with overall performance significantly closer to that of synchronous code.