DISTRIBUTED ALGORITHMS AND THEIR COMMUNICATION COSTS
3
Author(s):
ARVIND SINGH , RAKESH KUMAR KATARE , R.K TIWARI
Vol - 5, Issue- 12 ,
Page(s) : 6 - 20
(2014 )
DOI : https://doi.org/10.32804/CASIRJ
Abstract
Abstract
We have analyzed the communication-optimal distributed algorithms for dense linear algebra computations in terms of communication costs and performance improvements. We consider all of the fundamental computations-BLAS, Cholesky and symmetric-indefinite factorizations, LU and QR decompositions, eigenvalue and singular value decompositions and compare the best distributed algorithms with the lower bounds.We considered communication between a fast memory of size M and a slow memory of unbounded size. and track both the number of words and messages that an algorithm moves.
- M. Frigo, C. E. Leiserson, H. Prokop, and S. Ramachandran. “Cache-Oblivious Algorithms". In: Proceedings of the 40th Annual Symposium on Foundations of Computer Science. FOCS '99. Washington, DC, USA: IEEE Computer Society, 1999, p. 285.
- G. Ballard, J. Demmel, O. Holtz, and O. Schwartz. “Communication-optimal Parallel and Distributed Cholesky Decomposition". In: SIAM Journal on Scientific Computing 32.6 (2010), pp. 3495-3523.
- G. Ballard, D. Becker, J. Demmel, J. Dongarra, A. Druinsky, I. Peled, O. Schwartz, S. Toledo, and I. Yamazaki. Communication-Avoiding Symmetric-Indefinite Factorization. Tech. rep. UCB/EECS-2013-127. EECS Department, University of California, Berkeley, July 2013.
- G. Ballard, J. Demmel, B. Lipshitz, O. Schwartz, and S. Toledo. “Communication efficient Gaussian elimination with partial pivoting using a shape morphing data layout". In: Proceedings of the 25th ACM Symposium on Parallelism in Algorithms and Architectures. SPAA '13. Montreal, Quebec, Canada: ACM, 2013, pp. 232-240.
- G. Ballard, J. Demmel, and N. Knight. Avoiding Communication in Successive Band Reduction. Tech. rep. UCB/EECS-2013-131. EECS Department, University of California, Berkeley, July 2013.
- G. Ballard, J. Demmel, and I. Dumitriu. Communication-optimal parallel and sequential eigenvalue and singular value algorithms. EECS Technical Report EECS-2011-14. UC Berkeley, Feb. 2011.
- G. Ballard, J. Demmel, O. Holtz, and O. Schwartz. Distributed Communication Bounds for Fast Linear Algebra. Tech. rep. EECS-2012-36. UC Berkeley, Mar. 2012.
- E. Elmroth and F. Gustavson. “New serial and parallel recursive QR factorization algorithms for SMP systems". In: Applied Parallel Computing. Large Scale Scientific and Industrial Problems. Ed. by B. K_agstrom et al. Vol. 1541. Lecture Notes in Computer Science. Springer, 1998, pp. 120-128.
- F. G. Gustavson. “Recursion leads to automatic variable blocking for dense linear-algebra algorithms". In: IBM Journal of Research and Development 41.6 (1997), pp. 737-756.
- S. Toledo. “Locality of Reference in LU Decomposition with Partial Pivoting". In: SIAM Journal on Matrix Analysis and Applications 18.4 (1997), pp. 1065-1081.
- R. Agarwal, F. Gustavson, and M. Zubair. “A high-performance matrix-multiplication algorithm on a distributed-memory parallel computer, using overlapped communication". In: IBM Journal of Research and Development 38.6 (1994), pp. 673-681.
- A. Aggarwal and J.S. Vitter. “The input/output complexity of sorting and related problems". In: Communications of the ACM 31.9 (1988), pp. 1116-1127.
- E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, S. Ostrouchov, and D. Sorensen. LAPACK Users' Guide. Also available from http://www.netlib.org/lapack/. Philadelphia, PA, USA: SIAM, 1992.
- N. Ahmed and K. Pingali. “Automatic Generation of Block-Recursive Codes". In: Proceedings from the 6th International Euro-Par Conference on Parallel Processing. Euro-Par '00. London, UK: Springer-Verlag, 2000, pp. 368-378.
- Miroslav Rozloznik, Gil Shklarski, and Sivan Toledo. “Partitioned triangular tridiagonalization". In: ACM Transactions on Mathematical Software 37 (4 2011), 38:1-38:16.
- L. Grigori, J. Demmel, and H. Xiang. “CALU: A Communication Optimal LU Factorization Algorithm". In: SIAM Journal on Matrix Analysis and Applications 32.4 (2011), pp. 1317-1350.
- L. Trefethen and R. Schreiber. \”Average-Case Stability of Gaussian Elimination". In: SIAM Journal on Matrix Analysis and Applications 11 (1990), pp. 335-360.
- J. Demmel. Applied Numerical Linear Algebra. SIAM, 1997.
- C. Bischof and C. Van Loan. “The WY representation for products of Householder matrices". In: SIAM Journal on Scientific and Statistical Computing 8.1 (1987).
- R. Schreiber and C. Van Loan. “A storage-efficient WY representation for products of Householder transformations". In: SIAM Journal on Scientific and Statistical Computing 10.1 (1989), pp. 53-57.
- E. Elmroth and F. Gustavson. “New serial and parallel recursive QR factorization algorithms for SMP systems". In: Applied Parallel Computing. Large Scale Scientific and Industrial Problems. Ed. by B. K_agstrom et al. Vol. 1541. Lecture Notes in Computer Science. Springer, 1998, pp. 120-128.
- J. D. Frens and D. S. Wise. “QR factorization with Mortonordered quadtree matrices for memory reuse and parallelism". In: ACM SIGPLAN Notices 38.10 (2003), pp. 144-154.
- J. Demmel, L. Grigori, M. Hoemmen, and J. Langou. “Communication-optimal Parallel and Distributed QR and LU Factorizations". In: SIAM Journal on Scientific Computing 34.1 (2012), A206-A239.
- G. H. Golub, R. J. Plemmons, and A. Sameh. “Parallel block schemes for large-scale least-squares computations". In: High-speed computing: scientific applications and algorithm design. Champaign, IL, USA: University of Illinois Press, 1988, pp. 171- 179.
- A. Buttari, J. Langou, J. Kurzak, and J. J. Dongarra. A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures. Tech. rep. 191. LAPACK Working Note, Sept. 2007.
- B. C. Gunter and R. A. Van De Geijn. “Parallel out-of-core computation and updating of the QR factorization". In: ACM Transactions on Mathematical Software 31.1 (2005), pp. 60-78.
- G. Ballard, A. Buluc_, J. Demmel, L. Grigori, B. Lipshitz, O. Schwartz, and S. Toledo. “Communication optimal parallel multiplication of sparse random matrices". In: Pro-ceedings of the 25th ACM Symposium on Parallelism in Algorithms and Architectures. SPAA '13. Montreal, Quebec, Canada: ACM, 2013, pp. 222-231.
- J. Demmel, O. Marques, B. Parlett, and C. Vomel. “Performance and accuracy of LAPACK's symmetric tridiagonal eigensolvers". In: SIAM Journal on Scientific Computing 30.3 (Mar. 2008), pp. 1508-1526.
- C. Bischof, B. Lang, and X. Sun. “A Framework for Symmetric Band Reduction". In: ACM Transactions on Mathematical Software 26.4 (Dec. 2000), pp. 581-601.
- C. H. Bischof, B. Lang, and X. Sun. “Algorithm 807: The SBR Toolbox-software for successive band reduction". In: ACM Transactions on Mathematical Software 26.4 (Dec. 2000), pp. 602-616.
- Y. Nakatsukasa and N. Higham. Stable and efficient spectral divide and conquer algorithms for the symmetric eigenvalue decomposition and the SVD. Tech. rep. MIMS EPrint 2012.52. The University of Manchester, 2012.
- L. Karlsson and B. K_agstrom. “Parallel two-stage reduction to Hessenberg form using dynamic scheduling on shared-memory architectures". In: Parallel Computing 37.12 (2011). 6th International Workshop on Parallel Matrix Algorithms and Applications (PMAA'10), pp. 771 -782.
- S. Mohanty and S. Gopalan. “I/O efficient QR and QZ algorithms". In: High Performance Computing (HiPC), 2012 19th International Conference on. 2012, pp. 1- 9.
- Z. Bai, J. Demmel, and M. Gu. “An inverse free parallel spectral divide and conquer algorithm for nonsymmetric eigenproblems". In: Numerische Mathematik 76.3 (1997), pp. 279-308.
- J. Demmel, I. Dumitriu, and O. Holtz. “Fast linear algebra is stable". In: Numerische Mathematik 108.1 (2007). 10.1007/s00211-007-0114-x, pp. 59-91.
|