Building R-2.9.0 with gcc4.3 and ACML4.3.0
(or R-2.12.0 with gcc4.4 and ACML4.4.0 on RedHat EL 5.5 -- with some NA inconsistency errors)
The following is an example of building R-2.9.0 with gcc4.3 and ACML4.3 on CentOS 5.3 (x86_64). ACML4.3 requires gcc4.3 and gfortran4.3.
# ----- config.site -----
CC=gcc43
F77=gfortran43
FC=gfortran43
CXX="g++43"
SHLIB_CXXLD="g++43"
SHLIB_CXXLDFLAGS="-shared"
#------------------------
Configuration script
#----------------------------
# for a two-core SMP machine
OMP_NUM_THREADS=2;export OMP_NUM_THREADS
LANG=C; export LANG
LC_ALL=C; export LC_ALL
LD_LIBRARY_PATH=/opt/acml4.3.0/gfortran64_mp/lib
export LD_LIBRARY_PATH
./configure --with-blas="-L/opt/acml4.3.0/gfortran64_mp/lib -lacml_mp" --enable-mbcs
make
make check
#----------------------------
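Before running ./configure, it may be worth confirming that the multithreaded ACML library is actually in the directory passed to --with-blas. The following is a minimal sketch for a POSIX shell; check_acml_dir is a hypothetical helper name, not part of R or ACML.

```shell
# Hypothetical pre-flight check (not part of R or ACML):
# succeed only if the given directory contains libacml_mp.so.
check_acml_dir() {
    dir="$1"
    if [ -f "$dir/libacml_mp.so" ]; then
        echo "found: $dir/libacml_mp.so"
        return 0
    fi
    echo "missing: $dir/libacml_mp.so" >&2
    return 1
}

# Example call with the path used above (commented out so that the
# sketch has no side effects when sourced):
# check_acml_dir /opt/acml4.3.0/gfortran64_mp/lib || exit 1
```

A non-zero return status can be used to stop a build script before configure falls back to another BLAS.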
Performance
Tested on an old dual Opteron server (Opteron 248 x2)
(R-2.9.0 with BLAS in ACML4.3.0 )
$ OMP_NUM_THREADS=1; export OMP_NUM_THREADS; R
...
R version 2.9.0 (2009-04-17)
Copyright (C) 2009 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
...
> a <- matrix(rnorm(4000000),2000)
> system.time(ainv <-solve(a))
user system elapsed
6.399 0.092 6.523
> system.time(ainv <-solve(a))
user system elapsed
6.326 0.076 6.439
> system.time(a.svd <- svd(a))
user system elapsed
34.470 0.264 34.966
> system.time(a.svd <- svd(a))
user system elapsed
34.153 0.170 34.347
> ata <- a+ t(a)
> system.time(ata.eigen <- eigen(ata))
user system elapsed
16.385 0.043 16.431
> system.time(ata.eigen <- eigen(ata))
user system elapsed
16.409 0.085 16.500
>
> q()
Save workspace image? [y/n/c]: y
$ OMP_NUM_THREADS=2; export OMP_NUM_THREADS; R
R version 2.9.0 (2009-04-17)
Copyright (C) 2009 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
....
> system.time(ainv <-solve(a))
user system elapsed
6.802 0.171 3.995
> system.time(ainv <-solve(a))
user system elapsed
6.614 0.112 3.720
> system.time(a.svd <- svd(a))
user system elapsed
48.994 0.282 26.267
> system.time(a.svd <- svd(a))
user system elapsed
48.439 0.250 25.948
> system.time(ata.eigen <- eigen(ata))
user system elapsed
20.397 0.231 11.406
> system.time(ata.eigen <- eigen(ata))
user system elapsed
20.264 0.158 11.273
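To put the two sessions side by side, the elapsed times can be turned into speedup ratios (one-thread elapsed time divided by two-thread elapsed time). This is only a sketch; speedup is a hypothetical helper name, and the numbers are the first-run elapsed times copied from the transcripts above.

```shell
# Hypothetical helper: ratio of single-thread to multi-thread elapsed time.
speedup() {
    awk -v t1="$1" -v tn="$2" 'BEGIN { printf "%.2f\n", t1 / tn }'
}

# First-run elapsed times from the sessions above (1 thread, then 2 threads).
echo "solve: $(speedup 6.523 3.995)"
echo "svd:   $(speedup 34.966 26.267)"
echo "eigen: $(speedup 16.431 11.406)"
```

With these numbers the ratios come out between roughly 1.3 and 1.6, well short of 2, and the user times show that total CPU time goes up with the second thread.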
Errors and Performance of R-2.12.0 with ACML4.4.0 (compiled by gcc44 and gfortran44)
Tested on a RedHat EL 5.5 x64 server (HP ProLiant DL 165 G6) with AMD Opteron 2435 x2 (2.6GHz, 12 cores).
(with ACML4.4.0-BLAS)
Note for ACML4.4.0 with SELinux
ACML4.4.0 seems to need its shared libraries relabeled for SELinux, as follows:
chcon -t textrel_shlib_t '/opt/acml4.4.0/gfortran64/lib/libacml.so'
chcon -t textrel_shlib_t '/opt/acml4.4.0/gfortran64/lib/libacml_mv.so'
chcon -t textrel_shlib_t '/opt/acml4.4.0/gfortran64_mp/lib/libacml_mp.so'
chcon -t textrel_shlib_t '/opt/acml4.4.0/gfortran64_mp/lib/libacml_mv.so'
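If ACML is installed under a different prefix, the same four commands can be generated rather than retyped. The sketch below only prints the commands and does not run chcon; print_chcon_cmds is a hypothetical helper name, and the library list is simply the one given above.

```shell
# Hypothetical dry-run helper: print (do not execute) the chcon commands
# for an ACML install prefix, using the library names listed above.
print_chcon_cmds() {
    prefix="$1"
    for lib in gfortran64/lib/libacml.so \
               gfortran64/lib/libacml_mv.so \
               gfortran64_mp/lib/libacml_mp.so \
               gfortran64_mp/lib/libacml_mv.so; do
        echo "chcon -t textrel_shlib_t '$prefix/$lib'"
    done
}

print_chcon_cmds /opt/acml4.4.0
```

After checking the printed lines, they can be run as root by hand or piped to sh.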
config.site
----------------------------
CC=gcc44
CFLAGS="-g -O2 -std=c99"
F77=gfortran44
FC=gfortran44
CXX="g++44"
SHLIB_CXXLD="g++44"
SHLIB_CXXLDFLAGS="-shared"
---------------------------
configuration script
------------------------------
# for a twelve-core SMP machine
OMP_NUM_THREADS=12;export OMP_NUM_THREADS
# This locale causes an error in reg-plot-latin1.R.
LANG=C; export LANG
LC_ALL=C; export LC_ALL
LD_LIBRARY_PATH=/opt/acml4.4.0/gfortran64_mp/lib
export LD_LIBRARY_PATH
# ACML4.4.0 requires chcon for SELinux
cd R-2.12.0
./configure --with-blas="-L/opt/acml4.4.0/gfortran64_mp/lib -lacml_mp" --enable-R-shlib --enable-BLAS-shlib
make
----------------------------
Errors from "make check"
in reg-test-1b.R
--------------------------
x <- matrix(c(1, 0, NA, 1), 2, 2)
y <- matrix(c(1, 0, 0, 2, 1, 0), 3, 2)
(z <- tcrossprod(x, y))
stopifnot(identical(z, x %*% t(y)))
stopifnot(is.nan(log(0) %*% 0))
## depended on the BLAS in use: some (including the reference BLAS)
## had z[1,3] == 0 and log(0) %*% 0 as as.matrix(0).
------------------------
R-2.12.0 with ACML4.4.0
------------------------
> x <- matrix(c(1, 0, NA, 1), 2, 2)
> y <- matrix(c(1, 0, 0, 2, 1, 0), 3, 2)
> (z <- tcrossprod(x, y))
     [,1] [,2] [,3]
[1,]   NA   NA    0
[2,]    2    1    0
>
> x %*% t(y)
     [,1] [,2] [,3]
[1,]   NA   NA   NA
[2,]    2    1    0
> x
     [,1] [,2]
[1,]    1   NA
[2,]    0    1
> y
     [,1] [,2]
[1,]    1    2
[2,]    0    1
[3,]    0    0
>
> log(0) %*% 0
     [,1]
[1,]    0
------------------------
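The two checks from the regression test can be rerun outside "make check" against whichever R is on the PATH. The following is a rough sketch; check_na_propagation is a hypothetical helper name, and the result depends on the BLAS that the R in question was linked against.

```shell
# Hypothetical helper: run the two checks from the regression test in
# whatever R is on PATH and report whether the results agree.
check_na_propagation() {
    if ! command -v Rscript >/dev/null 2>&1; then
        echo "Rscript not found; skipping"
        return 0
    fi
    Rscript -e '
        x <- matrix(c(1, 0, NA, 1), 2, 2)
        y <- matrix(c(1, 0, 0, 2, 1, 0), 3, 2)
        ok <- identical(tcrossprod(x, y), x %*% t(y)) &&
              all(is.nan(log(0) %*% 0))
        cat(if (ok) "consistent" else "inconsistent", "\n", sep = "")
    '
}

check_na_propagation
```

A build that shows the behaviour in the transcript above would be expected to print "inconsistent", since z[1,3] is 0 there while x %*% t(y) gives NA.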
cf. R-2.11.1 with MKL10.2 works well (compiled by icc 11.1 and ifort 11.1 on CentOS 5.5 x64 with Xeon X5472 WS).
R-2.9.1 with ACML4.3.0 fails (compiled by gcc44 and gfortran44 on RedHat EL5.5 x64 with Xeon X3363).
Possibly an ACML-BLAS NaN propagation problem; see a post in AMD Developer Central (11/09/2010).
BLAS Performance:
> set.seed(23456)
> ksz <- 4096
> a <- matrix(rnorm(ksz*ksz),ksz)
> system.time(a.svd <- svd(a))
user system elapsed
757.135 6.056 86.788
> system.time(a.svd <- svd(a))
user system elapsed
740.630 5.532 84.331
> ata <- a+t(a)
> system.time(ata.eigen <- eigen(ata) )
user system elapsed
299.026 1.952 51.965
> system.time(ata.eigen <- eigen(ata) )
user system elapsed
305.595 1.753 51.840
> system.time(a.eigen <- eigen(a))
user system elapsed
1697.918 31.065 199.518
> system.time(a.eigen <- eigen(a))
user system elapsed
1627.778 31.258 189.673
> system.time(a.solve <- solve(a))
user system elapsed
44.287 2.257 6.236
> system.time(a.solve <- solve(a))
user system elapsed
43.527 2.444 6.273
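A rough way to read these timings is to divide CPU time (user + system) by elapsed time, which gives the average number of cores kept busy during each call. This is only a sketch; cores_busy is a hypothetical helper name, the inputs are the first-run timings above, and busy-waiting threads also count as CPU time, so the figure is an upper bound on useful parallelism.

```shell
# Hypothetical helper: (user + system) / elapsed, i.e. the average number
# of cores kept busy during a system.time() call.
cores_busy() {
    awk -v u="$1" -v s="$2" -v e="$3" 'BEGIN { printf "%.2f\n", (u + s) / e }'
}

# First-run timings from the 12-core session above.
echo "svd:        $(cores_busy 757.135 6.056 86.788)"
echo "eigen(ata): $(cores_busy 299.026 1.952 51.965)"
echo "eigen(a):   $(cores_busy 1697.918 31.065 199.518)"
echo "solve:      $(cores_busy 44.287 2.257 6.236)"
```

For these runs the ratio lies between roughly 5.8 and 8.8, so none of the four operations keeps all 12 cores busy on average.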