As an embedded developer with an application running on a single processor, you may want to improve performance, or performance per watt, by moving to multicore. If so, you’d likely want an answer to this question: Is there a magic bullet? In other words, can you just move your application onto a multicore platform and have it automatically run faster?
Looking for the magic bullet
If you’re using a personal computer or server with multiple applications, it is possible to automatically get performance improvements because the different applications are independent.
Yes, they share the file system on the hard disk and a few other things managed by the OS, but they don’t share data or need to synchronize with each other to perform their services. As long as they are independent and don’t interfere with each other, you will likely get performance improvements. However, with an increasing number of cores, you see diminishing returns in this type of system.
In contrast, if you are working with an embedded system, your application would have to be distributed across multiple cores automatically to achieve a performance improvement. All applications have some code that is inherently sequential, and they generally have areas that can run concurrently but where synchronization is required.
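The limit that inherently sequential code places on multicore gains is commonly expressed with Amdahl’s law. The following is a minimal sketch of that formula; the function name `amdahl_speedup` and its parameters are illustrative and not part of any library.

```c
#include <assert.h>

/* Amdahl's law: upper bound on speedup when only part of the work
 * can be spread across cores. parallel_fraction is the share of the
 * single-core run time that can execute concurrently (0.0 to 1.0). */
double amdahl_speedup(double parallel_fraction, int cores)
{
    assert(cores >= 1);
    assert(parallel_fraction >= 0.0 && parallel_fraction <= 1.0);

    double serial_fraction = 1.0 - parallel_fraction;

    /* The serial part takes the same time on any number of cores;
     * only the parallel part is divided among them. */
    return 1.0 / (serial_fraction + parallel_fraction / cores);
}
```

For example, if 90 percent of the run time can be parallelized, four cores give a speedup of roughly 3.1, and no number of cores can push it past 10. Synchronization overhead, which this formula ignores, lowers the real figure further.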
Different parts of an application may also share global variables and may use pointers to reference those global variables. When distributing an application across multiple cores, you gain true concurrency.
This means that global variables that could safely be accessed in a single-processor system have to be safeguarded so that multiple cores cannot access them at the same time and corrupt the data.
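One common way to safeguard a shared global is a mutex. The sketch below assumes a platform that provides POSIX threads, which the discussion above does not specify; the names `shared_counter`, `worker`, and `run_counter_demo` are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_THREADS 8

/* Global variable shared by every thread (and so by every core). */
static long shared_counter;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

struct worker_args {
    long iterations;
};

/* Each worker increments the shared counter under the lock, so the
 * read-modify-write sequence cannot interleave with another core's. */
static void *worker(void *arg)
{
    const struct worker_args *args = arg;

    for (long i = 0; i < args->iterations; i++) {
        pthread_mutex_lock(&counter_lock);
        shared_counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Starts num_threads workers and returns the final counter value. */
long run_counter_demo(int num_threads, long iterations)
{
    assert(num_threads >= 1 && num_threads <= MAX_THREADS);

    pthread_t threads[MAX_THREADS];
    struct worker_args args = { iterations };
    int started = 0;

    shared_counter = 0;
    for (int i = 0; i < num_threads; i++) {
        if (pthread_create(&threads[i], NULL, worker, &args) != 0)
            break;                      /* stop if a thread cannot start */
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    return shared_counter;
}
```

With the lock in place, `run_counter_demo(4, 10000)` returns 40000 when all four threads start. Without the lock, increments from different cores can overwrite each other and the total may come up short.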
You may now be wondering how you can combine speed, planning, continuous assessment and adjustment in a multicore approach to achieve desired results in the shortest period of time.
Start by selecting an appropriate programming model for the current project. Chosen well, it can also offer long-term cost savings in future projects.
Four common models familiar to embedded system programmers are OpenMP (Open Multi-Processing), OpenCL (Open Computing Language), and two message-passing protocols: MPI (Message Passing Interface) and MCAPI (Multicore Communications API).
OpenMP is commonly used to distribute loop processing across multiple cores in SMP environments. It extends the language through compiler directives (pragmas in C and C++) and is used in systems with general-purpose SMP operating systems such as Linux, as well as in systems with shared memory and homogeneous cores.
While embedded systems may include general purpose computing elements, they often have some requirements better served with asymmetrical approaches, which go beyond the focus area of OpenMP. The OpenMP Architecture Review Board (ARB), a technology consortium, manages OpenMP.
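The loop-level parallelism that OpenMP targets can be sketched as follows. The functions `scale_array` and `sum_array` are hypothetical examples; only the `#pragma omp` directives come from OpenMP. The sketch assumes a compiler with OpenMP support enabled (for example, GCC with `-fopenmp`). Without that support the pragmas are ignored and the loops simply run sequentially.

```c
#include <assert.h>
#include <stddef.h>

/* Scale every element in place. The iterations are independent,
 * so OpenMP can split the index range across the available cores. */
void scale_array(float *data, size_t n, float factor)
{
    assert(data != NULL || n == 0);

    #pragma omp parallel for
    for (long i = 0; i < (long)n; i++)
        data[i] *= factor;
}

/* Sum the elements. The reduction clause gives each thread a private
 * partial sum and combines them at the end, which avoids a data race
 * on the shared variable total. */
double sum_array(const float *data, size_t n)
{
    assert(data != NULL || n == 0);

    double total = 0.0;

    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < (long)n; i++)
        total += data[i];

    return total;
}
```

The same source builds with or without OpenMP, which is part of the appeal of a directive-based model: the sequential version remains readable and testable on a single core.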
OpenCL is primarily used for GPU programming. It is a framework for writing programs that span CPUs, GPUs (graphics processing units) and potentially other processors. OpenCL provides a path to execute applications that go beyond graphics processing, such as vector and math processing, covering some areas of embedded computing. The OpenCL open standard is maintained by the Khronos Group.