Computing on High Performance Clusters with R: Packages BatchJobs and BatchExperiments

dc.contributor.author: Bischl, Bernd
dc.contributor.author: Lang, Michel
dc.contributor.author: Mersmann, Olaf
dc.contributor.author: Rahnenführer, Jörg
dc.contributor.author: Weihs, Claus
dc.date.accessioned: 2018-10-12T09:16:55Z
dc.date.available: 2018-10-12T09:16:55Z
dc.date.issued: 2012-01
dc.description.abstract: Empirical analysis of statistical algorithms often demands time-consuming experiments which are best performed on high performance computing clusters. We present two R packages which greatly simplify working in batch computing environments. The package BatchJobs implements the basic objects and procedures to control a batch cluster within R. It is structured around cluster versions of the well-known higher order functions Map, Reduce and Filter from functional programming. An important feature is that the state of computation is persistently available in a database. The user can query the status of jobs and then continue working with a desired subset. The second package, BatchExperiments, is tailored for the still very general scenario of analyzing arbitrary algorithms on problem instances. It extends BatchJobs by letting the user define an array of jobs of the kind “apply algorithm A to problem instance P and store results”. It is possible to associate statistical designs with parameters of algorithms and problems and therefore to systematically study their influence on the results. In general our main contributions are: (a) Portability: Both packages use a clear and well-defined interface to the batch system which makes them applicable in most high-performance computing environments. (b) Reproducibility: Every computational part has an associated seed that the user can control to ensure reproducibility even when the underlying batch system changes. (c) Efficiency: Efficiently use batch computing clusters completely within R. (d) Abstraction and good software design: The code layers for algorithms, experiment definitions and execution are cleanly separated and enable the writing of readable and maintainable code. [en]
dc.identifier.uri: http://hdl.handle.net/2003/37185
dc.identifier.uri: http://dx.doi.org/10.17877/DE290R-19181
dc.language.iso: en [de]
dc.relation.ispartofseries: Technical report / Sonderforschungsbereich Verfügbarkeit von Information durch Analyse unter Ressourcenbeschränkung;1/2012
dc.subject.ddc: 004
dc.title: Computing on High Performance Clusters with R: Packages BatchJobs and BatchExperiments [en]
dc.type: Text [de]
dc.type.publicationtype: report [de]
dcterms.accessRights: open access
eldorado.secondarypublication: false [de]
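
The workflow described in the abstract (map a function over inputs as jobs, keep job state in a database, filter jobs by status, reduce the finished results, and give each job its own seed) can be illustrated with a small sketch. This is not the BatchJobs or BatchExperiments API: it is written in Python rather than R, it runs jobs sequentially in one process instead of on a cluster, and every name in it (make_registry, batch_map, run_jobs, find_jobs, reduce_results, noisy_sqrt) is a hypothetical one chosen for this illustration.

```python
import math
import random
import sqlite3


def make_registry(path=":memory:"):
    """Create the job table. Hypothetical stand-in for a persistent job registry."""
    con = sqlite3.connect(path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "id INTEGER PRIMARY KEY, arg REAL, seed INTEGER, "
        "status TEXT DEFAULT 'defined', result REAL)"
    )
    return con


def batch_map(con, args, base_seed=1):
    """Map step: register one job per argument, each with its own seed."""
    for offset, arg in enumerate(args):
        con.execute(
            "INSERT INTO jobs (arg, seed) VALUES (?, ?)", (arg, base_seed + offset)
        )
    con.commit()


def find_jobs(con, status):
    """Filter step: return the ids of all jobs with the given status."""
    rows = con.execute("SELECT id FROM jobs WHERE status = ? ORDER BY id", (status,))
    return [row[0] for row in rows]


def run_jobs(con, fun, ids=None):
    """Run the selected jobs (default: all not yet run) and store status and result."""
    if ids is None:
        ids = find_jobs(con, "defined")
    for job_id in ids:
        arg, seed = con.execute(
            "SELECT arg, seed FROM jobs WHERE id = ?", (job_id,)
        ).fetchone()
        rng = random.Random(seed)  # per-job generator, so a rerun gives the same result
        try:
            result = fun(arg, rng)
            con.execute(
                "UPDATE jobs SET status = 'done', result = ? WHERE id = ?",
                (result, job_id),
            )
        except Exception:
            con.execute(
                "UPDATE jobs SET status = 'error', result = NULL WHERE id = ?",
                (job_id,),
            )
    con.commit()


def reduce_results(con, fun, init):
    """Reduce step: fold the results of all finished jobs in id order."""
    acc = init
    rows = con.execute("SELECT result FROM jobs WHERE status = 'done' ORDER BY id")
    for (result,) in rows:
        acc = fun(acc, result)
    return acc


def noisy_sqrt(x, rng):
    """Example job: a square root with seeded noise; fails for negative input."""
    return math.sqrt(x) + rng.gauss(0, 0.01)


registry = make_registry()
batch_map(registry, [1.0, 4.0, -1.0, 9.0], base_seed=42)
run_jobs(registry, noisy_sqrt)
failed = find_jobs(registry, "error")      # ids of jobs that raised an error
total = reduce_results(registry, lambda acc, r: acc + r, 0.0)
```

Because status and results live in the database rather than in the session, a later session could reopen the same file, ask which jobs failed, and pass only those ids to run_jobs. Storing the seed alongside each job is one simple way to make a rerun independent of where or in which order the jobs execute.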

Files

Original bundle
Name: bischl_etal_2012a.pdf
Size: 484.63 KB
Format: Adobe Portable Document Format
Description: DNB
License bundle
Name: license.txt
Size: 4.85 KB
Format: Item-specific license agreed upon to submission