Peer-reviewed Conference Papers

February 13th, 2014

Symbols: Abstract, Publication, Presentation, BibTeX Citation, DOI Link

  1. Thomas Naughton, Christian Engelmann, Geoffroy Vallée, and Swen Böhm. Supporting the Development of Resilient Message Passing Applications using Simulation. In Proceedings of the 22nd Euromicro International Conference on Parallel, Distributed, and network-based Processing (PDP) 2014, pages 271-278, Turin, Italy, February 12-14, 2014. IEEE Computer Society, Los Alamitos, CA, USA. ISSN 1066-6192. Abstract Publication Presentation BibTeX Citation DOI Link
  2. Geoffroy Vallée, Thomas Naughton, Swen Böhm, and Christian Engelmann. A Runtime Environment for Supporting Research in Resilient HPC System Software & Tools. In Proceedings of the 1st International Symposium on Computing and Networking – Across Practical Development and Theoretical Research – (CANDAR) 2013, Matsuyama, Japan, December 4-6, 2013. IEEE Computer Society, Los Alamitos, CA, USA. Acceptance rate 36% (28/78). To appear. Abstract Publication BibTeX Citation
  3. Christian Engelmann. Investigating Operating System Noise in Extreme-Scale High-Performance Computing Systems using Simulation. In Proceedings of the 11th IASTED International Conference on Parallel and Distributed Computing and Networks (PDCN) 2013, Innsbruck, Austria, February 11-13, 2013. ACTA Press, Calgary, AB, Canada. ISBN 978-0-88986-943-1. Abstract Publication Presentation BibTeX Citation
  4. David Fiala, Frank Mueller, Christian Engelmann, Kurt Ferreira, Ron Brightwell, and Rolf Riesen. Detection and Correction of Silent Data Corruption for Large-Scale High-Performance Computing. In Proceedings of the 25th IEEE/ACM International Conference on High Performance Computing, Networking, Storage and Analysis (SC) 2012, pages 78:1-78:12, Salt Lake City, UT, USA, November 10-16, 2012. ACM Press, New York, NY, USA. ISBN 978-1-4673-0804-5. Acceptance rate 21.2% (100/472). Abstract Publication Presentation BibTeX Citation
  5. James Elliott, Kishor Kharbas, David Fiala, Frank Mueller, Kurt Ferreira, and Christian Engelmann. Combining Partial Redundancy and Checkpointing for HPC. In Proceedings of the 32nd International Conference on Distributed Computing Systems (ICDCS) 2012, pages 615-626, Macau, SAR, China, June 18-21, 2012. IEEE Computer Society, Los Alamitos, CA, USA. ISBN 978-0-7695-4685-8. ISSN 1063-6927. Acceptance rate 13% (71/515). Abstract Publication Presentation BibTeX Citation DOI Link
  6. Chao Wang, Sudharshan S. Vazhkudai, Xiaosong Ma, Fei Meng, Youngjae Kim, and Christian Engelmann. NVMalloc: Exposing an Aggregate SSD Store as a Memory Partition in Extreme-Scale Machines. In Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium (IPDPS) 2012, pages 957-968, Shanghai, China, May 21-25, 2012. IEEE Computer Society, Los Alamitos, CA, USA. ISBN 978-0-7695-4675-9. Acceptance rate 21% (118/569). Abstract Publication Presentation BibTeX Citation DOI Link
  7. Swen Böhm and Christian Engelmann. File I/O for MPI Applications in Redundant Execution Scenarios. In Proceedings of the 20th Euromicro International Conference on Parallel, Distributed, and network-based Processing (PDP) 2012, pages 112-119, Garching, Germany, February 15-17, 2012. IEEE Computer Society, Los Alamitos, CA, USA. ISBN 978-0-7695-4633-9. ISSN 1066-6192. Abstract Publication Presentation BibTeX Citation DOI Link
  8. Swen Böhm and Christian Engelmann. xSim: The Extreme-Scale Simulator. In Proceedings of the International Conference on High Performance Computing and Simulation (HPCS) 2011, pages 280-286, Istanbul, Turkey, July 4-8, 2011. IEEE Computer Society, Los Alamitos, CA, USA. ISBN 978-1-61284-383-4. Acceptance rate 28.1% (48/171). Abstract Publication Presentation BibTeX Citation DOI Link
  9. Christian Engelmann and Swen Böhm. Redundant Execution of HPC Applications with MR-MPI. In Proceedings of the 10th IASTED International Conference on Parallel and Distributed Computing and Networks (PDCN) 2011, pages 31-38, Innsbruck, Austria, February 15-17, 2011. ACTA Press, Calgary, AB, Canada. ISBN 978-0-88986-864-9. Abstract Publication Presentation BibTeX Citation DOI Link
  10. Chao Wang, Frank Mueller, Christian Engelmann, and Stephen L. Scott. Hybrid Checkpointing for MPI Jobs in HPC Environments. In Proceedings of the 16th IEEE International Conference on Parallel and Distributed Systems (ICPADS) 2010, pages 524-533, Shanghai, China, December 8-10, 2010. IEEE Computer Society, Los Alamitos, CA, USA. ISBN 978-0-7695-4307-9. Acceptance rate 29.6% (77/188). Abstract Publication Presentation BibTeX Citation DOI Link
  11. Min Li, Sudharshan S. Vazhkudai, Ali R. Butt, Fei Meng, Xiaosong Ma, Youngjae Kim, Christian Engelmann, and Galen Shipman. Functional Partitioning to Optimize End-to-End Performance on Many-Core Architectures. In Proceedings of the 23rd IEEE/ACM International Conference on High Performance Computing, Networking, Storage and Analysis (SC) 2010, pages 1-12, New Orleans, LA, USA, November 13-19, 2010. ACM Press, New York, NY, USA. ISBN 978-1-4244-7559-9. Acceptance rate 19.8% (50/253). Abstract Publication Presentation BibTeX Citation DOI Link
  12. Swen Böhm, Christian Engelmann, and Stephen L. Scott. Aggregation of Real-Time System Monitoring Data for Analyzing Large-Scale Parallel and Distributed Computing Environments. In Proceedings of the 12th IEEE International Conference on High Performance Computing and Communications (HPCC) 2010, pages 72-78, Melbourne, Australia, September 1-3, 2010. IEEE Computer Society, Los Alamitos, CA, USA. ISBN 978-0-7695-4214-0. Acceptance rate 19.1% (58/304). Abstract Publication Presentation BibTeX Citation DOI Link
  13. Antonina Litvinova, Christian Engelmann, and Stephen L. Scott. A Proactive Fault Tolerance Framework for High-Performance Computing. In Proceedings of the 9th IASTED International Conference on Parallel and Distributed Computing and Networks (PDCN) 2010, Innsbruck, Austria, February 16-18, 2010. ACTA Press, Calgary, AB, Canada. ISBN 978-0-88986-783-3. Abstract Publication Presentation BibTeX Citation DOI Link
  14. Narate Taerat, Nichamon Naksinehaboon, Clayton Chandler, James Elliott, Chokchai (Box) Leangsuksun, George Ostrouchov, Stephen L. Scott, and Christian Engelmann. Blue Gene/L Log Analysis and Time to Interrupt Estimation. In Proceedings of the 4th International Conference on Availability, Reliability and Security (ARES) 2009, pages 173-180, Fukuoka, Japan, March 16-19, 2009. IEEE Computer Society, Los Alamitos, CA, USA. ISBN 978-1-4244-3572-2. Acceptance rate 25% (40/160). Abstract Publication BibTeX Citation DOI Link
  15. Christian Engelmann, Hong H. Ong, and Stephen L. Scott. Evaluating the Shared Root File System Approach for Diskless High-Performance Computing Systems. In Proceedings of the 10th LCI International Conference on High-Performance Clustered Computing (LCI) 2009, Boulder, CO, USA, March 9-12, 2009. Abstract Publication Presentation BibTeX Citation
  16. Christian Engelmann, Geoffroy R. Vallée, Thomas Naughton, and Stephen L. Scott. Proactive Fault Tolerance Using Preemptive Migration. In Proceedings of the 17th Euromicro International Conference on Parallel, Distributed, and network-based Processing (PDP) 2009, pages 252-257, Weimar, Germany, February 18-20, 2009. IEEE Computer Society, Los Alamitos, CA, USA. ISBN 978-0-7695-3544-9. ISSN 1066-6192. Acceptance rate 42%. Abstract Publication Presentation BibTeX Citation DOI Link
  17. Alessandro Valentini, Christian Di Biagio, Fabrizio Batino, Guido Pennella, Fabrizio Palma, and Christian Engelmann. High Performance Computing with Harness over InfiniBand. In Proceedings of the 17th Euromicro International Conference on Parallel, Distributed, and network-based Processing (PDP) 2009, pages 151-154, Weimar, Germany, February 18-20, 2009. IEEE Computer Society, Los Alamitos, CA, USA. ISBN 978-0-7695-3544-9. ISSN 1066-6192. Acceptance rate 42%. Abstract Publication BibTeX Citation DOI Link
  18. Christian Engelmann, Hong H. Ong, and Stephen L. Scott. The Case for Modular Redundancy in Large-Scale High Performance Computing Systems. In Proceedings of the 8th IASTED International Conference on Parallel and Distributed Computing and Networks (PDCN) 2009, pages 189-194, Innsbruck, Austria, February 16-18, 2009. ACTA Press, Calgary, AB, Canada. ISBN 978-0-88986-784-0. Abstract Publication Presentation BibTeX Citation DOI Link
  19. Chao Wang, Frank Mueller, Christian Engelmann, and Stephen L. Scott. Proactive Process-Level Live Migration in HPC Environments. In Proceedings of the 21st IEEE/ACM International Conference on High Performance Computing, Networking, Storage and Analysis (SC) 2008, pages 1-12, Austin, TX, USA, November 15-21, 2008. ACM Press, New York, NY, USA. ISBN 978-1-4244-2835-9. Acceptance rate 21.3% (59/277). Abstract Publication Presentation BibTeX Citation DOI Link
  20. Christian Engelmann, Stephen L. Scott, Chokchai (Box) Leangsuksun, and Xubin (Ben) He. Symmetric Active/Active Replication for Dependent Services. In Proceedings of the 3rd International Conference on Availability, Reliability and Security (ARES) 2008, pages 260-267, Barcelona, Spain, March 4-7, 2008. IEEE Computer Society, Los Alamitos, CA, USA. ISBN 978-0-7695-3102-1. Acceptance rate 21.1% (40/190). Abstract Publication Presentation BibTeX Citation DOI Link
  21. Geoffroy R. Vallée, Kulathep Charoenpornwattana, Christian Engelmann, Anand Tikotekar, Chokchai (Box) Leangsuksun, Thomas Naughton, and Stephen L. Scott. A Framework For Proactive Fault Tolerance. In Proceedings of the 3rd International Conference on Availability, Reliability and Security (ARES) 2008, pages 659-664, Barcelona, Spain, March 4-7, 2008. IEEE Computer Society, Los Alamitos, CA, USA. ISBN 978-0-7695-3102-1. Acceptance rate 21.1% (40/190). Abstract Publication Presentation BibTeX Citation DOI Link
  22. Björn Könning, Christian Engelmann, Stephen L. Scott, and George A. (Al) Geist. Virtualized Environments for the Harness High Performance Computing Workbench. In Proceedings of the 16th Euromicro International Conference on Parallel, Distributed, and network-based Processing (PDP) 2008, pages 133-140, Toulouse, France, February 13-15, 2008. IEEE Computer Society, Los Alamitos, CA, USA. ISBN 978-0-7695-3089-5. Acceptance rate 40%. Abstract Publication Presentation BibTeX Citation DOI Link
  23. Geoffroy R. Vallée, Thomas Naughton, Christian Engelmann, Hong H. Ong, and Stephen L. Scott. System-level Virtualization for High Performance Computing. In Proceedings of the 16th Euromicro International Conference on Parallel, Distributed, and network-based Processing (PDP) 2008, pages 636-643, Toulouse, France, February 13-15, 2008. IEEE Computer Society, Los Alamitos, CA, USA. ISBN 978-0-7695-3089-5. Acceptance rate 40%. Abstract Publication Presentation BibTeX Citation DOI Link
  24. Li Ou, Christian Engelmann, Xubin (Ben) He, Xin Chen, and Stephen L. Scott. Symmetric Active/Active Metadata Service for Highly Available Cluster Storage Systems. In Proceedings of the 19th IASTED International Conference on Parallel and Distributed Computing and Systems (PDCS) 2007, Cambridge, MA, USA, November 19-21, 2007. ACTA Press, Calgary, AB, Canada. ISBN 978-0-88986-703-1. Acceptance rate 49%. Abstract Publication Presentation BibTeX Citation DOI Link
  25. Emanuele Di Saverio, Marco Cesati, Christian Di Biagio, Guido Pennella, and Christian Engelmann. Distributed Real-Time Computing with Harness. In Lecture Notes in Computer Science: Proceedings of the 14th European PVM/MPI Users' Group Meeting (EuroPVM/MPI) 2007, pages 281-288, Paris, France, September 30 – October 3, 2007. Springer Verlag, Berlin, Germany. ISBN 978-3-540-75415-2. ISSN 0302-9743. Abstract Publication Presentation BibTeX Citation DOI Link
  26. Li Ou, Xubin (Ben) He, Christian Engelmann, and Stephen L. Scott. A Fast Delivery Protocol for Total Order Broadcasting. In Proceedings of the 16th IEEE International Conference on Computer Communications and Networks (ICCCN) 2007, pages 730-734, Honolulu, HI, USA, August 13-16, 2007. IEEE Computer Society, Los Alamitos, CA, USA. ISBN 978-1-42441-251-8. ISSN 1095-2055. Acceptance rate 29.1% (160/550). Abstract Publication Presentation BibTeX Citation DOI Link
  27. Arun B. Nagarajan, Frank Mueller, Christian Engelmann, and Stephen L. Scott. Proactive Fault Tolerance for HPC with Xen Virtualization. In Proceedings of the 21st ACM International Conference on Supercomputing (ICS) 2007, pages 23-32, Seattle, WA, USA, June 16-20, 2007. ACM Press, New York, NY, USA. ISBN 978-1-59593-768-1. Acceptance rate 23.6% (29/123). Most cited paper with 178 citations. Abstract Publication Presentation BibTeX Citation DOI Link
  28. Christian Engelmann, Stephen L. Scott, Chokchai (Box) Leangsuksun, and Xubin (Ben) He. On Programming Models for Service-Level High Availability. In Proceedings of the 2nd International Conference on Availability, Reliability and Security (ARES) 2007, pages 999-1006, Vienna, Austria, April 10-13, 2007. IEEE Computer Society, Los Alamitos, CA, USA. ISBN 0-7695-2775-2. Acceptance rate 28.3% (60/212). Abstract Publication Presentation BibTeX Citation DOI Link
  29. Chao Wang, Frank Mueller, Christian Engelmann, and Stephen L. Scott. A Job Pause Service under LAM/MPI+BLCR for Transparent Fault Tolerance. In Proceedings of the 21st IEEE International Parallel and Distributed Processing Symposium (IPDPS) 2007, pages 1-10, Long Beach, CA, USA, March 26-30, 2007. ACM Press, New York, NY, USA. ISBN 978-1-59593-768-1. Acceptance rate 26% (109/419). Abstract Publication Presentation BibTeX Citation DOI Link
  30. Kai Uhlemann, Christian Engelmann, and Stephen L. Scott. JOSHUA: Symmetric Active/Active Replication for Highly Available HPC Job and Resource Management. In Proceedings of the 8th IEEE International Conference on Cluster Computing (Cluster) 2006, pages 1-10, Barcelona, Spain, September 25-28, 2006. IEEE Computer Society, Los Alamitos, CA, USA. ISBN 1-4244-0328-6. ISSN 1552-5244. Acceptance rate 33.1% (42/127). Abstract Publication Presentation BibTeX Citation DOI Link
  31. Ronald Baumann, Christian Engelmann, and George A. (Al) Geist. A Parallel Plug-in Programming Paradigm. In Lecture Notes in Computer Science: Proceedings of the 7th International Conference on High Performance Computing and Communications (HPCC) 2006, pages 823-832, Munich, Germany, September 13-15, 2006. Springer Verlag, Berlin, Germany. ISBN 978-3-540-39368-9. ISSN 0302-9743. Abstract Publication Presentation BibTeX Citation DOI Link
  32. Jyothish Varma, Chao Wang, Frank Mueller, Christian Engelmann, and Stephen L. Scott. Scalable, Fault-Tolerant Membership for MPI Tasks on HPC Systems. In Proceedings of the 20th ACM International Conference on Supercomputing (ICS) 2006, pages 219-228, Cairns, Australia, June 28-30, 2006. ACM Press, New York, NY, USA. ISBN 1-59593-282-8. Acceptance rate 26.2% (37/141). Abstract Publication Presentation BibTeX Citation DOI Link
  33. Daniel I. Okunbor, Christian Engelmann, and Stephen L. Scott. Exploring Process Groups for Reliability, Availability and Serviceability of Terascale Computing Systems. In Proceedings of the 2nd International Conference on Computer Science and Information Systems 2006, Athens, Greece, June 19-21, 2006. Abstract Publication BibTeX Citation
  34. Kshitij Limaye, Chokchai (Box) Leangsuksun, Zeno Greenwood, Stephen L. Scott, Christian Engelmann, Richard M. Libby, and Kasidit Chanchio. Job-Site Level Fault Tolerance for Cluster and Grid Environments. In Proceedings of the 7th IEEE International Conference on Cluster Computing (Cluster) 2005, pages 1-9, Boston, MA, USA, September 26-30, 2005. IEEE Computer Society, Los Alamitos, CA, USA. ISBN 0-7803-9486-0. ISSN 1552-5244. Acceptance rate 39.6% (45/138). Abstract Publication BibTeX Citation DOI Link
  35. Hertong Song, Chokchai (Box) Leangsuksun, Raja Nassar, Yudan Liu, Christian Engelmann, and Stephen L. Scott. UML-based Beowulf Cluster Availability Modeling. In International Conference on Software Engineering Research and Practice (SERP) 2005, pages 161-167, Las Vegas, NV, USA, June 27-30, 2005. CSREA Press. ISBN 1-932415-49-1. BibTeX Citation
  36. Christian Engelmann and George A. (Al) Geist. Super-Scalable Algorithms for Computing on 100,000 Processors. In Lecture Notes in Computer Science: Proceedings of the 5th International Conference on Computational Science (ICCS) 2005, Part I, pages 313-320, Atlanta, GA, USA, May 22-25, 2005. Springer Verlag, Berlin, Germany. ISBN 978-3-540-26032-5. ISSN 0302-9743. Acceptance rate 35%. Abstract Publication Presentation BibTeX Citation DOI Link
