
Saturday, August 30, 2008

Managing Multi-Core Projects, Part 3: Multi-Core Development in the Enterprise

Part 3 of our series on managing multi-core development focuses on finding parallelism in the service-oriented systems that are succeeding the client-server era. Multi-core means an opportunity for a better customer experience, if you take the right perspective.
There are many different ways to look at the advantages of multi-core processors, so as you watch the rollout of quad-core processors this year, make sure you're looking at them from the right perspective. While it's true that quad-core will mean a more efficient data center, more flops per watt, and more transactions per second per square foot, what that means isn't the same for everybody.

For an application or service development manager, those particular metrics are examples of thinking from the wrong perspective. That's because for a computer center manager, it's all about efficiency, and that means maximizing throughput. But for a development manager, it has to be all about customer experience, and that means minimizing latency. Multi-core processors can provide the processing power to keep up with customer demands, but only to the extent that you can apply parallel programming to build faster-responding services and applications.

SOA and grids represent a kind of "macro concurrency," concurrency expressed in business services or compute nodes. Multi-core represents an opportunity for "micro concurrency," which brings parallelism down to the level of a single server or a single node. You'll need to take advantage of this low-level parallelism to get the most performance out of each transaction and out of the overall system.

In the brave new world of SOA, Web 2.0, and SaaS, how do you make use of multi-core servers to deliver the best customer experience? For each of these architectures, performance depends on efficient communication among a set of loosely-coupled services. It's in managing that communication overhead, in minimizing I/O latency as well as maximizing compute performance, that the new multi-core servers will have the greatest effect.

Two Kinds of Latency
Take the trendiest of trends, a Web 2.0 mashup, as an example. Say you build a Web application that combines data from several sources and provides an integrated dashboard, with some visualization tools. In addition to providing your customers with new capabilities, you've shifted the burden of page assembly from the clients to your new service.

Depending on the kinds of data manipulation required, combining data in this way can be compute intensive, I/O intensive, or both. The mashup can't respond any faster than the services it draws from, so the slowest of them determines its minimum latency. What you need to do is minimize both compute latency and the I/O latency of concurrent communication with other services. You can address both kinds of latency with threading, using data decomposition to structure compute threads and functional decomposition to structure I/O threads.

If the service is calculation intensive, look for parallel segments that can be split across threads using data decomposition and thread pools. Use asynchronous messaging to minimize the impact of I/O on overall latency. At a high level, proper I/O threading for interconnected services means making the process as asynchronous as possible and designing the service so that the main processing thread continues to run with minimal blocking on I/O.
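As an illustration, here's a minimal sketch of data decomposition in Java: a fixed-size thread pool splits an array computation into one chunk per core and combines the partial results. The array contents and chunking scheme are illustrative, not from any particular framework.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSum {
    public static void main(String[] args) throws Exception {
        double[] data = new double[1_000_000];
        Arrays.fill(data, 1.0);

        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);

        // Data decomposition: one chunk of the array per core.
        int chunk = (data.length + cores - 1) / cores;
        List<Callable<Double>> tasks = new ArrayList<>();
        for (int i = 0; i < data.length; i += chunk) {
            final int from = i;
            final int to = Math.min(i + chunk, data.length);
            tasks.add(() -> {
                double sum = 0.0;
                for (int j = from; j < to; j++) {
                    sum += data[j];
                }
                return sum;
            });
        }

        // Run the chunks in parallel and combine the partial sums.
        double total = 0.0;
        for (Future<Double> f : pool.invokeAll(tasks)) {
            total += f.get();
        }
        pool.shutdown();

        System.out.println("total = " + total);
    }
}

Because each task is independent, verifying the result is as simple as comparing the parallel total against a serial run of the same loop.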

At a low level, and where I/O performance is critical—in communication between nodes of a grid, for example—threading can reduce I/O latency by increasing the efficiency of messaging. Messaging is an abstraction that makes development of parallel programs easier, but a poor messaging library implementation can doom performance with excessive consumption of memory and memory bandwidth. It takes careful threading and a good API to minimize latency in getting the message off the network and into memory that the handling thread can access.
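One common structure for this, sketched below in Java, is a dedicated receiver thread that drains messages off the network into an in-memory queue, so the processing thread blocks only on the queue handoff rather than on the network itself. The Message type and readFromNetwork stub are placeholders for whatever your transport actually delivers.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class MessagePump {
    // Hypothetical message type; stands in for whatever the transport delivers.
    static class Message {
        final byte[] payload;
        Message(byte[] payload) { this.payload = payload; }
    }

    private final BlockingQueue<Message> inbox = new LinkedBlockingQueue<>(1024);

    // I/O thread: pulls bytes off the network as fast as possible and hands
    // them to the queue, so transport buffers never back up.
    void startReceiver() {
        Thread receiver = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    byte[] bytes = readFromNetwork(); // blocking read (stub below)
                    inbox.put(new Message(bytes));    // hand off; never process here
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();   // exit cleanly on shutdown
            }
        }, "io-receiver");
        receiver.setDaemon(true);
        receiver.start();
    }

    // Processing thread: blocks only on the in-memory queue, not the network.
    void processLoop() throws InterruptedException {
        while (true) {
            handle(inbox.take());
        }
    }

    private byte[] readFromNetwork() { return new byte[0]; } // illustrative stub
    private void handle(Message m) { /* application logic goes here */ }
}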

When working with a third-party messaging library, it’s hard to gauge efficiency unless you can run benchmarks. As a starting point, look for asynchronous messaging when comparing libraries.
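A rough starting point, assuming you can wrap each candidate library behind a small adapter, is a round-trip latency harness like the sketch below. The MessagingClient interface here is hypothetical, not any library's actual API; the point is simply to measure each candidate under identical conditions.

import java.util.Arrays;

public class PingBench {
    // Hypothetical adapter over whatever library you are evaluating.
    interface MessagingClient {
        void send(byte[] msg);
        byte[] receive(); // blocks until a reply arrives
    }

    static long[] measure(MessagingClient client, int rounds) {
        byte[] ping = new byte[64];
        long[] latenciesNanos = new long[rounds];
        for (int i = 0; i < rounds; i++) {
            long start = System.nanoTime();
            client.send(ping);
            client.receive(); // complete the round trip
            latenciesNanos[i] = System.nanoTime() - start;
        }
        Arrays.sort(latenciesNanos); // compare medians, not means
        return latenciesNanos;
    }
}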

When you thread for data parallelism, you can evaluate the result of the parallel computation and develop confidence in the correctness of your implementation. With functional decomposition for I/O concurrency, correctness is just as important but more difficult to evaluate. You'll want to schedule significant test time early in the development cycle, with simulated services, so you can work out bugs in I/O threads before introducing dependencies on real online services.
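One way to build such a simulated service, sketched below, is a stub that returns a canned response after a randomized delay, so timing-dependent bugs in your I/O threads (races, deadlocks, missed timeouts) surface before any real service is in the loop. The class and constructor here are illustrative, not from a specific test framework.

import java.util.Random;
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

// A stand-in for a real online service, for testing I/O threading
// before wiring in live dependencies.
public class SimulatedService implements Callable<String> {
    private final String cannedResponse;
    private final long minDelayMs, maxDelayMs;
    private final Random random = new Random();

    public SimulatedService(String cannedResponse, long minDelayMs, long maxDelayMs) {
        this.cannedResponse = cannedResponse;
        this.minDelayMs = minDelayMs;
        this.maxDelayMs = maxDelayMs;
    }

    @Override
    public String call() throws InterruptedException {
        // Simulate variable network latency so different thread
        // interleavings get exercised during testing.
        long delay = minDelayMs + (long) (random.nextDouble() * (maxDelayMs - minDelayMs));
        TimeUnit.MILLISECONDS.sleep(delay);
        return cannedResponse;
    }
}

In use, you'd submit several of these to your I/O thread pool in place of real service calls, widening the delay range to stress different interleavings.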

Thread pools are a familiar part of any server system. They reduce thread-management overhead, balancing the time cost of expensive thread creation against the memory cost of maintaining idle threads. Thread pools should be considered in any multithreaded design, including those for multi-core systems.

Where possible, take advantage of thread pool mechanisms provided by the platform (by the .NET CLR or by OpenMP, for example, or the Java 5 ThreadPoolExecutor). On .NET, using the CLR thread pool can give managed code a significant performance advantage over unmanaged code using Win32 threads. The advantage can be enough that, contrary to what you might expect, a threaded C# application can be faster than an equivalent C++ implementation.
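On the Java side, the explicit ThreadPoolExecutor constructor makes the creation-cost/idle-memory trade-off described above visible. The sizing numbers below are illustrative, not recommendations.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolSetup {
    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();

        // Core threads stay resident (idle-memory cost); threads above the
        // core size are created only when the queue fills, and are reclaimed
        // after 60 seconds idle (amortizing creation cost).
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                cores,                          // core pool size, kept alive
                cores * 2,                      // hard cap under bursts
                60L, TimeUnit.SECONDS,          // idle timeout for extra threads
                new ArrayBlockingQueue<>(100)); // bounded queue of pending tasks

        pool.execute(() -> System.out.println(
                "task ran on " + Thread.currentThread().getName()));
        pool.shutdown();
    }
}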

Getting There
The next round of high-end servers is going to be based on multi-core processors, so it's important to start thinking about how multi-core affects your development process now. Whether you're updating single-threaded legacy J2EE code or starting a new service development project from scratch, work out your multi-core strategy at the very start, before you even get into planning the details of the project itself.

Choosing the right approach starts with recognizing where you are, in terms of both the code you're starting with and the threading experience and expertise of your team. If your team has little experience, reduce the scope of the project by paring features to an absolute minimum. Forget the fine details and focus on major functionality. Find the shortest path through development and get the project to market, or into the hands of an early adopter, as quickly as you can. In this way, you'll get the product out the door and begin to develop some threading expertise throughout your team. Because it's a small project, you'll also minimize the impact of the almost inevitable schedule slippage that comes with learning new development practices, and have an opportunity to improve your own scheduling skills for future multithreaded projects.

The minimal-project approach can be applied to performance improvements of single-threaded legacy code as well as to new development efforts. With legacy code, you'll also need to plan time up front to explore how threading for data or functional decomposition might improve performance. Tools such as Intel's VTune can help you find serial sections that might benefit from parallelism. The trend toward more cores and less clock makes threading so much more critical to performance that you might revisit threaded approaches you'd considered not worth the development effort on single-core, single-processor servers.

With a code base that's already threaded, focus on incremental improvements. Again, multi-core increases the value of threading when measured against other performance strategies. Look for other areas that could be threaded, ways to reduce messaging overhead, or more highly parallel data-processing algorithms. Tools can help here, too, in particular profilers like Intel Thread Profiler.

Regardless of where you're starting from, you'll get the greatest advantage from multi-core if you can create an emphasis on threading and parallelism throughout the project lifecycle. That means not just planning for multi-core, but developing an integrated approach where the parallelism model that's adopted early in the process is improved through feedback from testing and tuning.

In our next installment, we'll push further for an integrated approach to parallelism. Then we'll break down the development cycle for enterprise software projects and discuss the steps you can take in each phase to ensure the best multi-core performance. (Internet.com)
