The pairs where the CreateFile operation has a detail of “Desired Access: Generic Read”, seem to be where the contents of the file are read (the first of these pairs is highlighted in the screenshot).Some of these pairs, where the detail for the CreateFile operation is “Desired Access: Read Attributes”, seem to be for reading file metadata and are very fast.There are obvious pairs of CreateFile/CloseFile operations.For the Power Query engine it seems as though the Process Name is.
I had the “Enable data preview to download in the background” option turned off just to make sure I only saw activity relating to the refresh.Some things to note about the output shown above: …to watch what happened when the unoptimised query refreshed:
There are a lot of resources online describing Process Monitor (I also took a look at this book, which is very helpful, as is the help file) and it’s also fairly intuitive, so I’m not going to attempt to explain the basics of how to use it, but with a bit of trial-and-error I set up a simple filter consisting of: The rest of this post contains the results of my experiments and the conclusions I have drawn from them, which may or may not be correct! In that post (which I suggest you read before you carry on) I mentioned that it looked as though the Power Query engine was reading data from one of the source files multiple times and Process Monitor confirms that this indeed the case.ĭisclaimer: I am not going to pretend to be an expert on Process Monitor, the Windows file system or the internals of Power Query. Take, for example, the scenario I described in my recent post on improving the performance of merge operations. Process Monitor, a free tool from Microsoft, allows you to monitor file system activity in real-time and even having spent a limited amount of time using it I can already tell that it can provide a lot of information to help identify performance issues with file-based data sources. With relational data source like SQL Server you can use tools like SQL Server Profiler to see the queries that are being run by Power Query, and I blogged recently about using Fiddler to troubleshoot OData performance issues but what about file-based data sources, which often present the most challenges regarding performance? Troubleshooting Power Query performance issues in Power BI and Excel can be difficult because it’s a bit of a black box: there’s nothing in the UI to tell you what’s going on inside the Power Query engine and the diagnostic logs are very difficult to interpret.