(1 item) |
|
(1 item) |
|
(5 items) |
|
(1 item) |
|
(1 item) |
|
(2 items) |
|
(2 items) |
|
(4 items) |
|
(1 item) |
|
(6 items) |
|
(2 items) |
|
(4 items) |
|
(1 item) |
|
(4 items) |
|
(2 items) |
|
(1 item) |
|
(1 item) |
|
(1 item) |
|
(1 item) |
|
(1 item) |
|
(1 item) |
|
(1 item) |
|
(1 item) |
|
(2 items) |
|
(2 items) |
|
(5 items) |
|
(3 items) |
|
(1 item) |
|
(1 item) |
|
(1 item) |
|
(3 items) |
|
(1 item) |
|
(1 item) |
|
(2 items) |
|
(8 items) |
|
(2 items) |
|
(7 items) |
|
(2 items) |
|
(2 items) |
|
(1 item) |
|
(2 items) |
|
(1 item) |
|
(2 items) |
|
(4 items) |
|
(1 item) |
|
(5 items) |
|
(1 item) |
|
(3 items) |
|
(2 items) |
|
(2 items) |
|
(8 items) |
|
(7 items) |
|
(3 items) |
|
(7 items) |
|
(6 items) |
|
(1 item) |
|
(2 items) |
|
(5 items) |
|
(5 items) |
|
(7 items) |
|
(3 items) |
|
(7 items) |
|
(16 items) |
|
(10 items) |
|
(27 items) |
|
(15 items) |
|
(15 items) |
|
(13 items) |
|
(16 items) |
|
(15 items) |
This is the fourth blog in a series exploring how asynchronous and multithreaded techniques can affect the performance of loading and displaying moderately large volumes of data in WPF applications. The first part showed how the async
and await
keywords alone don’t offer a panacea, and that with WPF data binding, timing can have a large impact on performance. The second part showed how minor modifications to IO settings can significantly improve performance by reducing the number of times an asynchronous method has to relinquish its thread and later regain control. The third part demonstrated some slightly unsatisfactory multithreaded approaches to the same problem. In this fourth part, I’ll show a better multithreaded solution.
Recall that last time, I showed code that produced data items on a thread pool thread. This did not perform as well as some of the asynchronous solutions, and this seems to be down to the way WPF arranges to process the items back on the dispatcher thread—it seems to end up handling them one at a time. The biggest performance improvements for the asynchronous versions came from enabling WPF to work with larger chunks of items. We need to do the same thing for our multithreaded version if it’s going to have a chance of performing well.
In short, we want our worker thread to group the items it produces into chunks, and for the dispatcher thread to receive those chunks instead of individual items. This sounds like a job for the Reactive Extensions for .NET (or Rx for short). Here’s an Rx version of our log reader:
public static IObservable<string> CreateLogReader(string folder) { return Observable.Create<string>(obs => { try { foreach (string logfile in Directory.GetFiles(folder, "*.log")) { using (var reader = new StreamReader(logfile, Encoding.UTF8, true, 65536)) { int column = -1; while (!reader.EndOfStream) { string line = reader.ReadLine(); if (line == null) { break; } string[] fields = line.Split(' '); if (line.StartsWith("#Fields")) { column = Array.IndexOf(fields, "cs-uri-stem") - 1; } else if (line[0] == '#' || fields.Length < (column + 1)) { continue; } if (column >= 0) { string uri = fields[column]; obs.OnNext(uri); } } } } obs.OnCompleted(); } catch (Exception x) { obs.OnError(x); } return Disposable.Empty; }); }
This is the same logic as before, but it now provides the values it receives to anyone who subscribes to it—it no longer decides what to do with the values itself. We can now use this with some logic that makes our requirement for chunking explicit:
private async void OnLoadClick(object sender, RoutedEventArgs e) { var logObv = LogReader.CreateLogReader(FolderPath).Publish(); var chunked = logObv.Buffer(TimeSpan.FromMilliseconds(100), 5000); var dispObs = chunked.ObserveOnDispatcher(DispatcherPriority.Background); dispObs.Subscribe(logUris => { Trace.WriteLine(logUris.Count); foreach (string uri in logUris) { _reader.LogUris.Add(uri); } }); await Task.Run(() => logObv.Connect()); await dispObs.ToTask(); }
This probably requires a little explanation. The call to Publish
returns a wrapper around our observable that lets us chain together a bunch of subscribers, while providing a Connect
method that we can call when we’re ready to kick it all off. The Buffer
method is the key here: this tells Rx that we want items from the source grouped into bunches (or ‘buffers’) that are at most 100ms apart, but which contain no more than 5,000 items. (This means that if the underlying source is producing any items at all, we won’t have to wait more than 100ms, but if it produces more than 5,000 items within 100ms, it’ll break it into multiple batches for us, with each chunk containing no more than 5,000 items. I added the 5,000 item limit because I found that giving data binding much more than this in any single chunk caused a visible degradation in responsiveness.)
Next, we subscribe to this chunked version of the data by calling ObserveOnDispatcher
. I’ve indicated that I want to process data at the Background
priority level, which ensures that my handler will only run after WPF has done more important things like handling user input and updating data bindings. (If this handler ran at a higher priority than data binding, we might not see any updates until the data has finished loading.)
Finally, I call Connect
, but I do it via Task.Run
, to ensure that the code that does the actual work of reading data out of the log files runs on a thread pool thread.
So I’ve got the potential speed benefits of a separate thread. (As we saw in part 3, in this particular application synchronous execution on a separate thread turned out to be the fastest way to read and process the data. It only fell down when we wanted to start showing results immediately.) I’ve also explicitly arranged to batch the retrieved items into reasonably large chunks if they start coming in really fast, but if the items are not coming in their thousands, we won’t sit on data for longer than 100ms, which ensures that items will become visible reasonably promptly.
The upshot is that this starts showing me data almost immediately, and finishes loading and displaying all the data in 2.3 seconds. That’s faster than my original synchronous implementation! It’s not quite as good as the later, improved synchronous version which was able to load all the data in 0.8 seconds, but that one caused the UI to freeze while it loaded the data. The multithreaded version of that also took 0.8 seconds, and avoided freezing the UI, but it waiting until it had finished loading all the data before showing anything, so it made the user wait for longer in practice.
All of the approaches that went any faster than 2.3 seconds involved preventing data binding from seeing our data until we had fully populated the collection, and you just can’t do that if you want to show some data immediately. If you want to show something as fast as possible, and the rest later, you’re committed to adding most of the rows after you’ve bound the collection to the list. So I started to wonder how much scope for improvement remained —given that we need to bind the collection before loading most of its contents, what’s the very fastest we can go?
Out of interest, I decided to try generating 180,000 lines of fake data on the UI thread after setting the DataContext
. The idea here was to find out how quickly you can load that much data into a data source that is already connected to data binding. It took 1.5 seconds (and the UI was unresponsive for that time). Since the code that loads real data took 2.3 seconds, you might think that there is still room for improvement. However, I then wondered what the minimum unavoidable overhead due to remaining responsive is.
So I tried another experiment: generating 180,000 lines of fake data on the UI thread, but returning control back out to the dispatcher’s message loop every so often. (That’s a prerequisite for remaining responsive.) This turns out to ramp up the cost surprisingly quickly. If I yield the thread just once every 50,000 items (which ends up yielding only 3 times here) it takes 1.8 seconds. Yielding every 10,000 items, which is not quite enough to keep the application feeling smoothly responsive, brings us up to 2.1 seconds. So yielding just 18 times adds some 0.6 seconds to the execution time! Yielding once every 5,000 items—comparable to my real code (although that also adds time-based chunking, so it’s not identical)—it takes 2.2 seconds.
I’m not quite sure why yielding costs so much, but it presumably has something to do with exactly how WPF defers list binding work. But the bottom line is that simply generating 180,000 lines of fake data and yielding the UI thread often enough to remain responsive takes 2.2 seconds; our Rx-based multithreaded solution takes just 2.3 seconds. (And remember, this is doing real work which takes at least 0.8 seconds to complete.) So it seems that we’re within a whisker of the best possible performance, and definitely at the point of diminishing returns.
In the final part of this series, I’ll offer some conclusions.