Active Directory/Documentation/WDS/Performance/2009 Sullivan Lab Test
The first formal performance test of centralized WDS took place on 7/10/2009. The following information is taken from the analysis and observations email summary authored by Alan Gerber.
Testing Materials
For our testing, we used two images: the first was a basic Windows XP image, 2881MB in size; the second was a more typical Windows XP lab image at 27565MB. We deployed to 31 Dell OptiPlex GX620 computers with Hyper-Threaded Pentium 4 processors and 1GB of RAM, each on a 100Mbps switched network connection. All tests were performed using unicast transfers.
Procedure and Methodology
We performed two tests, which are outlined below:
Semi-serially Initiated Imaging
For this test, we booted each machine into the PXE-all WDS mode, then completed the wizard to the point where each machine would begin imaging at the next step. We used the smaller image for this test. Once this procedure was performed on all machines, we started the imaging process on the first computer, waited a few seconds, began imaging on the next computer, and repeated this until all machines were imaging. Timing for this test was measured from the initiation of imaging on the first computer until the completion of image transfer on the last computer (i.e., until the wizard completed the "Expanding image" step and moved on to the next step). This measures the amount of time it takes to transfer the image contents over the network to each computer, not the amount of time it takes to completely reimage and install the computer. This test was specifically designed to render the filesystem cache on the WDS server largely ineffective from the start of the test; otherwise, in theory, the leading WDS client could request data, the server would cache that data and serve it to each of the other 30 clients, and the resulting imaging performance data could be skewed. Since the image was larger than the system cache (roughly 1400MB), we believed this test could accomplish that goal.
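As a rough illustration of that reasoning, the short Python sketch below compares the image size against the approximate system cache size. The assumption that reads are roughly sequential and that older cached data is evicted first is mine, added for illustration, not something measured during the test.

    # Sketch only: why a 2881MB image should defeat a ~1400MB filesystem cache.
    # Assumes roughly sequential reads and oldest-first eviction; both are
    # assumptions made for illustration, not measurements from this test.
    IMAGE_MB = 2881   # smaller test image
    CACHE_MB = 1400   # approximate WDS server system cache size

    # By the time the leading client has read the whole image, at most the most
    # recently read CACHE_MB of it can still be resident, so a client trailing
    # near the start of the image forces the server back to disk.
    must_reread_mb = max(0, IMAGE_MB - CACHE_MB)
    print(f"At least {must_reread_mb}MB of the image cannot be served from cache "
          "for a client that trails by a full image length.")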
Simultaneously-initiated Imaging
For this test, we booted each machine into the PXE-all WDS mode, then completed the wizard to the point where each machine would begin imaging at the next step. We used the larger image for this test. Once this procedure was performed on all machines, we started the imaging process on all computers simultaneously, or as close to simultaneously as humanly possible. Timing for this test was measured from the initiation of imaging until the completion of image transfer on the last computer (i.e., until the wizard completed the "Expanding image" step and moved on to the next step). As before, this measures the amount of time it takes to transfer the image contents over the network to each computer, not the amount of time it takes to completely reimage and install the computer. For this test, we expected the server's filesystem cache to play a noticeable role in improving overall performance.
Results
Semi-serially Initiated Imaging
This test completed in approximately 15 minutes, 30 seconds. The approximation is due to an oversight in starting the stopwatch; the variance on this number does not exceed 30 seconds. Assuming 15:30 is correct, this yields a transfer rate of 2881MB / 930sec = 3.098MB/sec per client, or 3.098MB/sec * 31 = 96.038MB/sec average aggregate throughput. Measured in minutes, this equates to a 185.88MB/min average per client.
Simultaneously-initiated Imaging
This test completed in 2 hours, 57 minutes, 26.8 seconds. To simplify the calculations, and to reflect a potential lag in starting the stopwatch, we will assume 2:57:27 was the actual transfer time. This yields a transfer rate of 27565MB / 10647sec = 2.589MB/sec per client, or 2.589MB/sec * 31 = 80.259MB/sec average aggregate throughput. Measured in minutes, this equates to a 155.34MB/min average per client.
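For anyone who wants to check the arithmetic, the following Python sketch reproduces both sets of throughput figures from the image sizes, client count, and elapsed times reported above; the helper function name is mine.

    # Reproduces the per-client and aggregate throughput figures reported above.
    def throughput(image_mb, clients, elapsed_sec):
        per_client_mb_s = image_mb / elapsed_sec
        aggregate_mb_s = per_client_mb_s * clients
        per_client_mb_min = per_client_mb_s * 60
        return per_client_mb_s, aggregate_mb_s, per_client_mb_min

    # Semi-serially initiated test: 2881MB image, 31 clients, 15:30 elapsed.
    print(throughput(2881, 31, 15 * 60 + 30))              # ~(3.098, 96.038, 185.88)

    # Simultaneously-initiated test: 27565MB image, 31 clients, 2:57:27 elapsed.
    print(throughput(27565, 31, 2 * 3600 + 57 * 60 + 27))  # ~(2.589, 80.259, 155.34)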
Additional Observations
CPU Usage
Both tests pegged the server's CPU usage at 100% for the duration of the test, and most of that CPU usage was in-kernel processing. This indicates to me that the performance bottleneck is most likely 1) the network interface (the load probably generating more interrupts than the CPU can comfortably handle), 2) the disk I/O subsystem (possibly requiring too much CPU overhead), or 3) some combination of the two. There are other possibilities, but those are the most likely three. At peak, the server was handling almost 10,000 interrupts per second.
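As a rough order-of-magnitude check on the interrupt-load hypothesis, the sketch below estimates the frame rate implied by the observed network throughput. The 1500-byte frame size is an assumed standard Ethernet MTU, since the actual frame size mix was not recorded during the test.

    # Order-of-magnitude check on the NIC-interrupt hypothesis above.
    # The 1500-byte frame size is an assumption (standard Ethernet MTU); the
    # throughput and interrupt figures are the values observed during the test.
    THROUGHPUT_MBPS = 400
    FRAME_BYTES = 1500
    OBSERVED_INTERRUPTS_PER_SEC = 10_000

    frames_per_sec = THROUGHPUT_MBPS * 1_000_000 / 8 / FRAME_BYTES
    print(f"~{frames_per_sec:,.0f} frames/sec at {THROUGHPUT_MBPS}Mbps")
    print(f"~{frames_per_sec / OBSERVED_INTERRUPTS_PER_SEC:.1f} frames per interrupt, "
          "assuming every interrupt were NIC-related (an upper bound)")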
Network Throughput
We averaged around 400Mbps of throughput from the server during the tests, with the highest observed throughput being about 430Mbps and the lowest around 330Mbps. All centralized WDS servers are connected via gigabit Ethernet links (1000Mbps), so we clearly haven't hit a ceiling there. There's also the obvious question of how much improvement multicast imaging would provide - we're still waiting to hear from ComTech regarding their support for this campus-wide.
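To put those numbers in context, the small sketch below relates the observed average throughput to the server's gigabit uplink and to the theoretical aggregate demand of 31 clients on 100Mbps ports; it uses only the nominal link speeds already mentioned in this report.

    # Relates observed throughput to the nominal link speeds in this report.
    SERVER_UPLINK_MBPS = 1000   # gigabit link on the centralized WDS server
    CLIENT_LINK_MBPS = 100      # per-client switched connection
    CLIENTS = 31
    OBSERVED_AVG_MBPS = 400     # average observed during these tests

    print(f"Server uplink utilization: {OBSERVED_AVG_MBPS / SERVER_UPLINK_MBPS:.0%}")
    print(f"Theoretical aggregate client demand: {CLIENT_LINK_MBPS * CLIENTS}Mbps, "
          f"versus {OBSERVED_AVG_MBPS}Mbps actually served on average")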
Simultaneous Imaging/Disk Throughput
The disk subsystem reached 900 reads per second, with an average of 10.88MB per read I/O request. The maximum disk queue length measured was 12 operations. Empirical observation suggests that the server can sustain around six clients at any given time; this was determined by observing network activity on the clients during the imaging process.
System Cache
The server's memory cache faults/second counter - the number of times a request was made for a disk object not cached in memory - peaked at just over 11,000.
Clients
The 31 computers used as subjects for this test aren't the most powerful machines - in fact, they would have been cycled out of service by OIT this year had funds been available. Lab testing on more current hardware is needed to determine whether, and how much, the clients themselves impact imaging performance.
Images & Compression
There has been some debate as to whether the compression used in an image has any impact on imaging performance, and where and when exactly the work to decompress the image data occurs. It is possible that using a lower image compression level (or no compression at all) would improve image transfer performance. Further research and testing is needed to determine whether this is the case.
Data
Raw data is available for a limited time - until September 1, 2009. If you need access to this data after this date, please let the WDS service group know.
- Raw .blg file for import into Perfmon (46MB)
- CSV data (5.5MB)
Conclusions
I've certainly seen worse performance measurements for imaging on campus, so I'm very happy to be able to use this data to establish a baseline expectation for WDS performance characteristics. I'm sure these numbers also go a long way toward allaying the fears of many on campus who were worried that they'd have to image their labs in an overnight or multi-day process. We can definitely allow campus users to deploy images to labs with our current infrastructure, but we will have to plan around it - we can't have everyone deploying a lab simultaneously, or it really might become a multi-day process. However, the need to plan around lab deployments isn't unreasonable at this stage of the project, and considering that the project is currently running on donated hardware that is reasonably well-aged, I'd say that we're in a good place performance-wise.
However, it is clear that there is much we can do to further increase imaging performance. There are still server-side configuration changes we can make that will probably allow us to eke out some additional performance gains. We can also look at load-balancing WDS across multiple back-end servers, and there can be no doubt that multicast imaging would greatly improve lab imaging scenarios as well. With more and more campus units requesting custom image hosting, the need for newer, more powerful hardware with greater storage capacity shouldn't be ignored, either. Despite knowing that there is more to be done to improve performance, I feel confident that we're moving in a positive direction.
Acknowledgements
Thanks to the entire campus community for bearing with us through the inconvenience of this testing, to ECE and ITECS for donating the hardware that allowed us to get this project started, and especially to Tom Farwig and Rob Blanke in OIT Technology Support Services Learning Space Support for making Sullivan Lab available, for their assistance during these tests, and for their ongoing support of this centralized WDS project.