OpenUH: an optimizing, portable OpenMP compiler

OpenUH: An Optimizing, Portable OpenMP Compiler

Chunhua Liao, Oscar Hernandez, Barbara Chapman, Wenguang Chen* and Weimin Zheng*
Department of Computer Science, University of Houston, Houston, TX 77204, USA
http://www.cs.uh.edu

Technical Report Number UH-CS-05-27, November 28, 2005

Keywords: Hybrid OpenMP compiler, OpenMP translation, OpenMP runtime library

Abstract

OpenMP has gained wide popularity as an API for parallel programming on shared memory and distributed shared memory platforms. Despite its broad availability, there remains a need for a portable, robust, open source, optimizing OpenMP compiler for C/C++/Fortran 90, especially for teaching and research, e.g. into its use on new target architectures, such as SMPs with chip multithreading, as well as learning how to translate for clusters of SMPs. In this paper, we present our efforts to design and implement such an OpenMP compiler on top of Open64, an open source compiler framework, by extending its existing analysis and optimization and adopting a source-to-source translator approach where a native back end is not available. The compilation strategy we have adopted and the corresponding runtime support are described. The OpenMP validation suite is used to determine the correctness of the translation. The compiler's behavior is evaluated using benchmark tests from the EPCC microbenchmarks and the NAS parallel benchmarks.

* Wenguang Chen and Weimin Zheng are with the Computer Science Department, Tsinghua University, China.

Index Terms: Hybrid OpenMP compiler, OpenMP translation, OpenMP runtime library

I. INTRODUCTION

OpenMP [1], a set of compiler directives and runtime library routines, is the de facto standard for parallel programming in C/C++ and Fortran on shared memory and distributed shared memory systems. Its popularity stems from its ease of use, incremental parallelism, performance portability and wide availability. Recent research at the language and compiler levels, including our own, has considered how to expand the set of target architectures to include recent system configurations, such as SMPs based on Chip Multithreading processors [2], as well as clusters of SMPs [3]. However, in order to carry out such work, a suitable compiler infrastructure must be available. In order for application developers to be able to explore OpenMP on the system of their choice, a freely available, portable implementation would be desirable.

Many compilers support OpenMP today, including such proprietary products as the Intel Linux compiler suite, Sun One Studio, and the SGI MIPSpro compilers. However, their source code is mostly inaccessible to researchers, and they cannot be used to gain an understanding of OpenMP compiler technology or to explore possible improvements to it.
Several open source research compilers (the Omni OpenMP compiler [4], OdinMP/CCp [5], and PCOMP [6]) are available. But none of them translate all of the source languages that OpenMP supports, and one of them is a partial implementation only. Therefore, there remains a need for a portable, robust, open source and optimizing OpenMP compiler for C/C++/Fortran 90, especially for teaching and research into the API.

In this paper, we describe the design, implementation and evaluation of OpenUH, a portable OpenMP compiler based on the Open64 compiler infrastructure with a unique hybrid design that combines a state-of-the-art optimizing infrastructure with a source-to-source approach. OpenUH is open source, supports C/C++/Fortran 90, includes numerous analysis and optimization components, and is a complete implementation of OpenMP 2.5. We hope this compiler (which is available at [7]) will complement the existing OpenMP compilers and offer a further attractive choice to OpenMP developers, researchers and users.

The remainder of this paper is organized as follows. Section 2 describes the design of our compiler. Section 3 presents details of the OpenMP implementation, the runtime support and the IR-to-source translation. The evaluation of the compiler is discussed in Section 4. Section 5 reviews related work, and concluding remarks are given in Section 6 along with future work.

II. THE DESIGN OF OPENUH

Building a basic compiler for OpenMP is not very difficult, since the fundamental transformation from OpenMP to multithreaded code is straightforward and there are already some open source implementations that may serve as references. However, it is quite a challenge to build a complete, robust implementation which can handle real applications.
But such a compiler is indispensable for real-world experiments with OpenMP, such as considering how a new language feature or an alternative translation approach will affect the execution behavior of a variety of important codes. Given the exceptionally high cost of designing this kind of compiler from scratch, we searched for an existing open-source compiler framework that met our requirements.

We chose to base our efforts on the Open64 [8] compiler suite, which we judged to be more suitable than, in particular, the GNU Compiler Collection [9]. Open64 was open sourced by Silicon Graphics Inc. from its SGI Pro64 compiler targeting MIPS and Itanium processors. It is now mostly maintained by Intel under the name Open Research Compiler (ORC) [10], which targets Itanium platforms. Several other branches of Open64, including our own, have been created to translate language extensions or perform research into one or more compilation phases. For instance, the Berkeley UPC compiler [11] extends Open64 to implement UPC [12]. Open64 is a well-written, modularized, robust, state-of-the-art compiler with support for C/C++ and Fortran 77/90. The major modules of Open64 are the multiple language frontends, the interprocedural analyzer (IPA) and the middle end/back end, which is further subdivided into the loop nest optimizer (LNO), the global optimizer (WOPT), and the code generator (CG).

Five levels of a tree-based intermediate representation (IR) called WHIRL exist in Open64 to facilitate the implementation of different analysis and optimization phases. They are classified as being Very High, High, Mid, Low, and Very Low levels, respectively. Most compiler optimizations are implemented on a specific level of WHIRL. For example, IPA and LNO are applied to High level WHIRL, while WOPT operates on Mid level WHIRL.
Two internal WHIRL tools were embedded in Open64 to support the compiler developer: one was whirlb2a, used to convert WHIRL binary dump files into ASCII format, and the other was whirl2c/whirl2f, to translate Very High and High level WHIRL IR back to C or Fortran source code. However, the resulting output code was not compilable.

The original Open64 included an incomplete implementation of the OpenMP 1.0 specification, inherited from SGI's Pro64 compiler. Its legacy OpenMP code was able to handle Fortran 77/90 code with some OpenMP features until the linking phase. The C/C++ frontend of Open64 was taken from GCC 2.96 and thus could not parse OpenMP directives. Meanwhile, there was no corresponding OpenMP runtime library released with Open64. A separate problem of Open64 was its lack of code generators for machines other than Itanium. One of the branches of Open64, the ORC-OpenMP [13] compiler from Tsinghua University, worked on by two of the authors of this paper, tackled some of these problems by extending Open64's C frontend to parse OpenMP constructs and by providing a tentative runtime library. Another branch working on this problem was the Open64.UH compiler effort at the University of Houston, worked on by the remaining authors of this paper. It focused on the pre-translation and OpenMP translation phases. A merge of these two efforts has resulted in the OpenUH compiler and the associated Tsinghua runtime library. More recently, a commercial product based on Open64 and targeting the AMD x86_64, the Pathscale EKO compiler suite [14], was released with support for OpenMP 2.0.

The Open64.UH compiler effort designed a hybrid compiler with object code generation on Itanium and source-to-source OpenMP translation on other platforms. The OpenUH compiler described in this paper uses this design, exploits improvements to Open64 from several sources, and relies on an enhanced version of the Tsinghua runtime library to support the translation process.
It aims to preserve most optimizations on all platforms by recreating compilable source code right before the code generation phase.

Fig. 1 depicts an overview of the design of OpenUH. It consists of the frontends, optimization modules, the OpenMP transformation module, a portable OpenMP runtime library, a code generator and IR-to-source tools. Most of these modules are derived from the corresponding original Open64 modules. It is a complete compiler for Itanium platforms, for which object code is produced, and may be used as a source-to-source compiler for non-Itanium machines using the IR-to-source tools.

[Fig. 1. OpenUH: an optimizing and portable OpenMP compiler based on Open64. The figure shows the frontends (C/C++ & Fortran 77/90), IPA (interprocedural analyzer), LNO (loop nest optimizer), LOWER_MP (transformation of OpenMP), WOPT (global scalar optimizer), the CG (code generator for Itanium), WHIRL2C & WHIRL2F (IR-to-source for non-Itanium targets), native compilers, the portable OpenMP runtime library, and the linking step producing executables.]

The translation of a submitted OpenMP program works as follows: first, the source code is parsed by the appropriate extended language frontend and translated into WHIRL IR with OpenMP pragmas. The next phase, the interprocedural analyzer (IPA), is enabled if desired to carry out interprocedural alias analysis, array section analysis, inlining, dead function and variable elimination, interprocedural constant propagation and more. After that, the loop nest optimizer (LNO) performs many standard loop analyses and optimizations, such as dependence analysis, register/cache blocking (tiling), loop fission and fusion, unrolling, automatic prefetching, and array padding. The transformation of OpenMP, which lowers WHIRL with OpenMP pragmas into WHIRL representing multithreaded code with OpenMP runtime library calls, is performed after LNO. The global scalar optimizer (WOPT) is subsequently invoked.
It transforms WHIRL into an SSA form for more efficient analysis and optimization, and converts the SSA form back to WHIRL after the work has been done. Many standard compiler passes are carried out in WOPT, including control flow analysis (computing dominance, detecting loops in the flowgraph), data flow analysis, alias classification and pointer analysis, dead code elimination, copy propagation, partial redundancy elimination and strength reduction.

The remainder of the process depends on the target machine: for Itanium platforms, the code generator in Open64 can be directly used to generate object files. For a non-Itanium platform, the whirl2c or whirl2f translator is invoked instead; in this case, code represented by Mid WHIRL is translated back to compilable, multithreaded C or Fortran code with OpenMP runtime calls. A native C or Fortran compiler must then be invoked on the target platform to complete the translation by compiling the output from OpenUH into object files. The last step is the linking of object files with the portable OpenMP runtime library and the final generation of executables for the target machine.

III. THE IMPLEMENTATION OF OPENMP

Based on our design and the initial status of Open64, we needed to focus our attention on developing or enhancing four major components in order to implement OpenMP: frontend extensions to parse OpenMP constructs and convert them into WHIRL IR with OpenMP pragmas, the internal translation of WHIRL IR with OpenMP directives into multithreaded code, a portable OpenMP runtime library supporting the execution of multithreaded code, and the IR-to-source translators, which needed work to enable them to generate compilable and portable source code.

To improve the stability of our frontends and to complement existing functionality, we integrated features from the Pathscale EKO 2.1 compiler.
Its Fortran frontend contains many enhancements, and the C/C++ frontend extends the more recent GCC 3.3 frontend with OpenMP parsing capability. The GCC parse tree is extended to represent OpenMP pragmas and is translated to WHIRL IR to enable later phases to handle it. The following subsections describe our OpenMP translation, runtime library and IR-to-source translators.

A. OpenMP Translation

An OpenMP implementation transforms code with OpenMP directives into corresponding multithreaded code with runtime library calls. A key component is the strategy for translating parallel regions. One popular method for doing so is outlining, which is used in most open source compilers, including Omni [4] and OdinMP/CCp [5]. Outlining denotes a strategy whereby an independent, separate function is generated by the compiler to encapsulate the work contained in a parallel region. In other words, a procedure is created that contains the code that will be executed by each participating thread at run time. This makes it easy to pass the appropriate work to the individual threads. In order to accomplish this, variables that are to be shared among worker threads have to be passed as arguments to the outlined function. Unfortunately, this introduces some overheads. Moreover, some compiler analyses and optimizations may no longer be applicable to the outlined function, either as a direct result of the separation into parent and outlined function, or because the translation may introduce pointer references in place of direct references to shared variables.

The translation used in OpenUH is different from the standard outlining approach. In it, the compiler generates a microtask to encapsulate the code lexically contained within a parallel region, and the microtask is nested (we also refer to it as inlined, although this is not the standard meaning of the term) into the original function containing that parallel region.
The advantage of this approach is that all local variables in the original function are visible to the threads executing the nested microtask, and thus they are shared by default. Also, optimizing compilers can analyze and optimize both the original function and the microtask, thus providing a larger scope for intraprocedural optimizations than the outlining method. A similar approach, named the Multi-Entry Threading (MET) technique [15], is used in Intel's OpenMP compiler.

    /* Original OpenMP code */
    int main(void)
    {
      int a, b, c;
    #pragma omp parallel private(c)
      do_sth(a, b, c);
      return 0;
    }

    /* Inlined (nested) translation */
    _INT32 main()
    {
      int a, b, c;
      /* inlined (nested) microtask */
      void __ompregion_main1()
      {
        _INT32 __mplocal_c;
        /* shared variables are kept intact; only accesses to the
           private variable are substituted */
        do_sth(a, b, __mplocal_c);
      }
      ...
      /* OpenMP runtime call */
      __ompc_fork(&__ompregion_main1);
      ...
    }

    /* Outlined translation: outlined function with an extra
       argument for passing addresses */
    static void __ompc_func_0(void **__ompc_args)
    {
      int *_pp_b, *_pp_a, _p_c;
      /* dereference addresses to get shared variables */
      _pp_b = (int *)(*__ompc_args);
      _pp_a = (int *)(*(__ompc_args + 1));
      /* substitute accesses for all variables */
      do_sth(*_pp_a, *_pp_b, _p_c);
    }

    int _ompc_main(void)
    {
      int a, b, c;
      void *__ompc_argv[2];
      /* wrap addresses of shared variables */
      *(__ompc_argv) = (void *)(&b);
      *(__ompc_argv + 1) = (void *)(&a);
      ...
      /* OpenMP runtime call has to pass the addresses of shared variables */
      _ompc_do_parallel(__ompc_func_0, __ompc_argv);
      ...
    }

    Fig. 2. OpenMP translation: outlined vs. inlined

Fig. 2 illustrates each of these strategies for a fragment of C code with a single parallel region, and shows in detail how the outlining method used in Omni differs from the inlining translation in OpenUH. In both cases, the compiler generates an extra function (the microtask __ompregion_main1() or the outlined function __ompc_func_0()) as part of the work of translating the parallel region enclosing do_sth(a, b, c).
In each case, this function represents the work to be carried out by multiple threads. Each translation also adds a runtime library call (__ompc_fork() or _ompc_do_parallel(), respectively) into the main function, which takes the address of the compiler-generated function as an argument and executes it on several threads. The only extra work needed in the translation to the nested microtask is to create a thread-local variable to realize the private variable c and to substitute this for c in the call to the enclosed procedure, which now becomes do_sth(a, b, __mplocal_c). The translation that outlines the parallel region has more to take care of, since it must wrap the addresses of the shared variables a and b in the main function and pass them to the runtime library call. Within the outlined procedure, they are referenced via pointers.