(1 item) |
|
(1 item) |
|
(5 items) |
|
(1 item) |
|
(1 item) |
|
(2 items) |
|
(2 items) |
|
(4 items) |
|
(1 item) |
|
(6 items) |
|
(2 items) |
|
(4 items) |
|
(1 item) |
|
(4 items) |
|
(2 items) |
|
(1 item) |
|
(1 item) |
|
(1 item) |
|
(1 item) |
|
(1 item) |
|
(1 item) |
|
(1 item) |
|
(1 item) |
|
(2 items) |
|
(2 items) |
|
(5 items) |
|
(3 items) |
|
(1 item) |
|
(1 item) |
|
(1 item) |
|
(3 items) |
|
(1 item) |
|
(1 item) |
|
(2 items) |
|
(8 items) |
|
(2 items) |
|
(7 items) |
|
(2 items) |
|
(2 items) |
|
(1 item) |
|
(2 items) |
|
(1 item) |
|
(2 items) |
|
(4 items) |
|
(1 item) |
|
(5 items) |
|
(1 item) |
|
(3 items) |
|
(2 items) |
|
(2 items) |
|
(8 items) |
|
(7 items) |
|
(3 items) |
|
(7 items) |
|
(6 items) |
|
(1 item) |
|
(2 items) |
|
(5 items) |
|
(5 items) |
|
(7 items) |
|
(3 items) |
|
(7 items) |
|
(16 items) |
|
(10 items) |
|
(27 items) |
|
(15 items) |
|
(15 items) |
|
(13 items) |
|
(16 items) |
|
(15 items) |
.NET 4.5 added asynchronous language features for C# and VB. For the most part, this has made it much easier to improve a user interface’s responsiveness—you can use asynchronous APIs to perform potentially slow work in a way that will not cause your user interface to freeze, and yet you can use simple programming techniques that look very similar to those used in single-threaded code. However, this is not a panacea. There are some kinds of slow, IO-oriented work (i.e., the kind of work that often benefits most from asynchrony) where a simple application of these techniques won’t help as much as you might hope.
For example, you can run into trouble if you’re doing something that’s slow, but not quite slow enough. If you want to display a large amount of data from a file, a naive asynchronous (or, for that matter, multithreaded) approach can run into problems. The async
and await
keywords deal easily with long waits, but if you’re doing something slightly more busy you may need to apply these techniques with a bit more subtlety. This is the first in a series of blog posts exploring these issues.
The other parts can be found here:
The following code reads web server .log
files. It reads all the files in a folder, and picks out the cs-uri-stem
column, putting the result in an ObservableCollection<string>
. We can bind that to a ListBox
to show all of the URLs that have been fetched. (It’s a little crude—it ignores query strings for example, but for the log data I’m looking at, that happens not to matter, and the processing details aren’t the main point here.)
using System; using System.IO; using System.Collections.ObjectModel; public class LogReader { public LogReader() { LogUris = new ObservableCollection<string>(); } public ObservableCollection<string> LogUris { get; set; } public void ReadLogs(string folder) { foreach (string logfile in Directory.GetFiles(folder, "*.log")) { using (StreamReader reader = File.OpenText(logfile)) { int column = -1; while (!reader.EndOfStream) { string line = reader.ReadLine(); if (line == null) { break; } string[] fields = line.Split(' '); if (line.StartsWith("#Fields")) { column = Array.IndexOf(fields, "cs-uri-stem") - 1; } else if (line[0] == '#' || fields.Length < (column + 1)) { continue; } if (column >= 0) { string uri = fields[column]; LogUris.Add(uri); } } } } } }
I have a folder with about 40MB of log files, containing about 180,000 log entries. If I process this with my LogReader
, and use it as the data source for a WPF ListBox
, my aging desktop system takes about 2.6 seconds to load the information in that folder.
That’s clearly slow enough to be annoying, so you might think this would be an obvious candidate for going asynchronous. We’re loading a moderately large quantity of data from disk, so at least some of the work will be IO bound, and that’s where async usually shines.
At this point, an overenthusiastic developer might think “Ooh—a
sync
and await
!” and make the obvious modifications. With .NET 4.5 and C#5 this is pretty simple. First, we need to change the signature of the method, adding the async
keyword to enable the language features, and returning a Task
to enable asynchronous completion, error reporting, and composition with other tasks:
public async Task ReadLogsAsync(string folder)
We also need to change the line of code that reads from the file in the middle of the loop to use the asynchronous alternative to ReadLine
:
string line = await reader.ReadLineAsync();
With these changes in place, the UI remains responsive while the data loads, and we start seeing the log entries in the UI immediately. Those are two useful improvements on the synchronous version, but we’ve paid a significant price: the code now takes about 8.5 seconds to run.
Comparing this with the synchronous code’s 2.6 seconds is a little misleading. For one thing, the synchronous version takes more like 4 seconds the very first time it runs. (My system drive is a pair of SSDs, but these logs live on my larger data drive, which is of the spinning rust variety.) It only gets up to full speed once the files are in the OS’s cache, and under those circumstances, asynchronous code has much less to offer. The whole point is to free up threads while slow IO is in progress, but if everything you need is already in memory, the IO won’t be so slow. So we’re being slightly unfair on the asynchronous version by removing its main opportunity to show an advantage. However, since the IO only slows things down by about 1.4 seconds when things aren’t in the cache, it’s hard to see how the asynchronous code might make up the deficit even in more favourable conditions.
Now this might scare you off the C# asynchronous language features, because it makes them seem slow, but this is not a fair test. The reason this takes so much longer is that we’ve given the program much more work to do. When the simple, synchronous version runs on the UI thread, WPF does very little immediate work for each item we add to the LogUris
collection. Data binding will detect the change—we’ve bound a ListBox
to that collection, so it’ll be looking for change notification events—but WPF won’t fully process those changes until our code has finished with the UI thread. Data binding defers its work until the dispatcher thread has no higher priority work to do.
However, when we run the asynchronous version, the ReadLineAsync
method will keep relinquishing the thread—it’ll do that every time the StreamReader
class’s ReadLineAsync
method does not complete immediately. (Even though the files are in the cache, the OS still has to copy the data into our process, so although file reads are very quick, they will not all complete synchronously.) And each time our method relinquishes the thread, WPF’s data binding takes that opportunity to process all the work that it had deferred.
In the non-async case, WPF processes all of the additions to LogUris
as a single bulk operation, but by going async we’ve made it process lots of smaller batches. This does have two benefits: log entries start appearing in the list almost immediately, and the application remains responsive to user input while the data loads. But we’ve paid a heavy price for it: it now takes over three times as long for the operation to finish.
We can verify that this is behind the slow down by modifying the UI code that does the data binding. Instead of putting the LogReader
in the DataContext
when the app starts up, we can wait until after the log data has been read:
private async void OnLoadClick(object sender, RoutedEventArgs e) { await _reader.ReadLogsAsync(FolderPath); DataContext = _reader; }
This means that data binding doesn’t get to see the data source until we’ve finished loading the data. With that modification, it only takes about 2.8 seconds to run. This implies that there’s a small overhead for using asynchronous code—it takes 0.2 seconds longer than the synchronous version, a barely perceptible increase when you’ve already been waiting 2.6 seconds. And in exchange for that, the UI remains responsive while we’re waiting for the work to complete. (However, as we’ll see shortly, this result is slightly misleading.)
Although this is much faster than the 8.5 second case, we’ve lost something: that slower example produced useful results in the UI much faster. In fact, a user might prefer the slower version in practice, because if useful data appears immediately, you might not even notice that it takes three times longer to finish populating the list—it was probably going to take a lot longer than 8.5 seconds to scroll down through the whole list anyway. So by one important measure, the naive asynchronous method is better: it provides useful information to the user sooner.
I said earlier that comparing this 2.8 second asynchronous code with our original 2.6 second synchronous version wasn’t entirely fair. We can try the same trick of holding back the data source until we’re ready with the synchronous case:
private void OnLoadClick(object sender, RoutedEventArgs e) { _reader.ReadLogs(FolderPath); DataContext = _reader; }
This drops the time to 1 second! So it turns out that although the original synchronous code was forcing WPF to update the ListBox
in a single batch operation, there were still significant data binding overheads. Binding to the LogUris
collection caused WPF to attach a change notification handler. And because we did this before populating the list, this handler ran for each of the 180,000 entries added to that list. Although that change handler ultimately deferred the work until our code relinquished the UI thread, it turns out that arranging to defer work is not completely free. Of the 2.6 seconds the original version took, only 1 second of that actually went into reading and processing the data—the rest was all spent in data binding’s event handlers!
So the asynchronous version isn’t looking so good now. It’s a lot faster than our first 8.5 second effort, but the 2.8 second version now looks pretty slow in comparison with the equivalent synchronous code’s 1 second.
In fact, this is still not a fair comparison. For now, I’ll leave you to ponder why. I shall explain next time.