Building R-2.9.0 with gcc4.3 and ACML4.3.0
(or R-2.12.0 with gcc4.4 and ACML4.4.0 on RedHat EL 5.5 -- with some NA inconsistency errors)

The following is an example of building R-2.9.0 with gcc4.3 and ACML4.3 on CentOS 5.3 (x86_64). ACML4.3 requires gcc4.3 and gfortran4.3.
# ----- config.site -----
CC=gcc43
F77=gfortran43
FC=gfortran43
CXX="g++43"
SHLIB_CXXLD="g++43"
SHLIB_CXXLDFLAGS="-shared"
#------------------------
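
Before running configure, it may help to confirm that the compiler names used in config.site are actually on the PATH. The following is a minimal sketch: the helper name missing_tools is hypothetical, and the suffixed names (gcc43, gfortran43, g++43) are taken from the config.site above and may differ on other distributions.

```shell
# missing_tools NAME...: print each NAME that is not found on the PATH.
# (Hypothetical helper, not part of R or ACML.)
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# Compiler names assumed from config.site.
missing=$(missing_tools gcc43 gfortran43 g++43)
if [ -n "$missing" ]; then
    echo "not found on PATH:" $missing
fi
```

If anything is reported, install the matching compiler packages or adjust config.site before running configure.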


Configuration script

#----------------------------
# for a two-core SMP machine
OMP_NUM_THREADS=2;export OMP_NUM_THREADS

LANG=C;   export LANG
LC_ALL=C; export LC_ALL

LD_LIBRARY_PATH=/opt/acml4.3.0/gfortran64_mp/lib
export LD_LIBRARY_PATH

 ./configure  --with-blas="-L/opt/acml4.3.0/gfortran64_mp/lib -lacml_mp" --enable-mbcs

make 

make check
#----------------------------

Performance

Tested on an old dual Opteron server (Opteron 248 x2) running R-2.9.0 with the ACML4.3.0 BLAS.
$ OMP_NUM_THREADS=1; export OMP_NUM_THREADS; R
...
R version 2.9.0 (2009-04-17)
Copyright (C) 2009 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
...

> a <- matrix(rnorm(4000000),2000)

> system.time(ainv <-solve(a))
   user  system elapsed 
  6.399   0.092   6.523 
> system.time(ainv <-solve(a))
   user  system elapsed 
  6.326   0.076   6.439 

> system.time(a.svd <- svd(a))
   user  system elapsed 
 34.470   0.264  34.966 
> system.time(a.svd <- svd(a))
   user  system elapsed 
 34.153   0.170  34.347 

> ata <- a+ t(a)
> system.time(ata.eigen <- eigen(ata))
   user  system elapsed 
 16.385   0.043  16.431 
> system.time(ata.eigen <- eigen(ata))
   user  system elapsed 
 16.409   0.085  16.500 
> 
> q()
Save workspace image? [y/n/c]: y

$ OMP_NUM_THREADS=2; export OMP_NUM_THREADS; R

R version 2.9.0 (2009-04-17)
Copyright (C) 2009 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
....

> system.time(ainv <-solve(a))
   user  system elapsed 
  6.802   0.171   3.995 
> system.time(ainv <-solve(a))
   user  system elapsed 
  6.614   0.112   3.720 
> system.time(a.svd <- svd(a))
   user  system elapsed 
 48.994   0.282  26.267 
> system.time(a.svd <- svd(a))
   user  system elapsed 
 48.439   0.250  25.948 
> system.time(ata.eigen <- eigen(ata))
   user  system elapsed 
 20.397   0.231  11.406 
> system.time(ata.eigen <- eigen(ata))
   user  system elapsed 
 20.264   0.158  11.273 
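
The effect of the second thread can be summarized as the ratio of elapsed times. The sketch below uses the second run of each call from the two sessions above; the helper name speedup is hypothetical.

```shell
# speedup T1 T2: ratio of elapsed times (1 thread / 2 threads), two decimals.
speedup() {
    awk -v t1="$1" -v t2="$2" 'BEGIN { printf "%.2f\n", t1 / t2 }'
}

echo "solve: $(speedup 6.439 3.720)"    # 1.73
echo "svd:   $(speedup 34.347 25.948)"  # 1.32
echo "eigen: $(speedup 16.500 11.273)"  # 1.46
```

On this machine solve() gains the most from the second core and svd() the least.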


Errors and Performance of R-2.12.0 with ACML4.4.0 (compiled by gcc44 and gfortran44)

Tested on a RedHat EL 5.5 x64 server (HP ProLiant DL 165 G6) with AMD Opteron 2435 x2 (2.6GHz, 12 cores),
using the ACML4.4.0 BLAS.

Note for ACML4.4.0 with SELinux

Under SELinux, the ACML4.4.0 shared libraries seem to need the textrel_shlib_t context before they can be loaded:
chcon -t textrel_shlib_t '/opt/acml4.4.0/gfortran64/lib/libacml.so'
chcon -t textrel_shlib_t '/opt/acml4.4.0/gfortran64/lib/libacml_mv.so'
chcon -t textrel_shlib_t '/opt/acml4.4.0/gfortran64_mp/lib/libacml_mp.so'
chcon -t textrel_shlib_t '/opt/acml4.4.0/gfortran64_mp/lib/libacml_mv.so'
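
The four chcon calls can also be written as a loop. This is only a sketch: the function name relabel_acml and its dry-run argument are hypothetical, and the paths assume the /opt/acml4.4.0 install location used above. Passing echo prints the commands instead of running them; running chcon for real normally requires root.

```shell
ACML_ROOT=/opt/acml4.4.0

# relabel_acml [PREFIX]: apply textrel_shlib_t to the ACML shared libraries.
# PREFIX is prepended to each command; pass "echo" for a dry run.
relabel_acml() {
    for lib in \
        gfortran64/lib/libacml.so \
        gfortran64/lib/libacml_mv.so \
        gfortran64_mp/lib/libacml_mp.so \
        gfortran64_mp/lib/libacml_mv.so
    do
        ${1:-} chcon -t textrel_shlib_t "$ACML_ROOT/$lib"
    done
}

relabel_acml echo    # dry run: prints the four chcon commands
```

Calling relabel_acml with no argument would execute the commands.
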
config.site
----------------------------
CC=gcc44
CFLAGS="-g -O2 -std=c99"
F77=gfortran44
FC=gfortran44
CXX="g++44"
SHLIB_CXXLD="g++44"
SHLIB_CXXLDFLAGS="-shared"
---------------------------
configuration script
------------------------------
# for a twelve-core SMP machine
OMP_NUM_THREADS=12;export OMP_NUM_THREADS

# Note: this locale causes an error in reg-plot-latin1.R.
LANG=C;   export LANG
LC_ALL=C; export LC_ALL

LD_LIBRARY_PATH=/opt/acml4.4.0/gfortran64_mp/lib
export LD_LIBRARY_PATH

# ACML4.4.0 requires chcon for SELinux
cd R-2.12.0
 ./configure  --with-blas="-L/opt/acml4.4.0/gfortran64_mp/lib -lacml_mp"  --enable-R-shlib --enable-BLAS-shlib

make 
----------------------------
Errors from "make check"
in reg-test-1b.R 
--------------------------
x <- matrix(c(1, 0, NA, 1), 2, 2)
y <- matrix(c(1, 0, 0, 2, 1, 0), 3, 2)
(z <- tcrossprod(x, y))
stopifnot(identical(z, x %*% t(y)))
stopifnot(is.nan(log(0) %*% 0))
## depended on the BLAS in use: some (including the reference BLAS)
## had z[1,3] == 0 and log(0) %*% 0 as as.matrix(0).
------------------------

R-2.12.0 with ACML4.4.0
------------------------
> x <- matrix(c(1, 0, NA, 1), 2, 2)
> y <- matrix(c(1, 0, 0, 2, 1, 0), 3, 2)
> (z <- tcrossprod(x, y))
     [,1] [,2] [,3]
[1,]   NA   NA    0
[2,]    2    1    0
> 
> x %*% t(y)
     [,1] [,2] [,3]
[1,]   NA   NA   NA
[2,]    2    1    0
> x
     [,1] [,2]
[1,]    1   NA
[2,]    0    1
> y
     [,1] [,2]
[1,]    1    2
[2,]    0    1
[3,]    0    0
> 
> log(0) %*% 0
     [,1]
[1,]    0
------------------------
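In this build tcrossprod(x, y) gives z[1,3] == 0 while x %*% t(y) gives NA there, and log(0) %*% 0 gives 0 instead of NaN, so both stopifnot() checks fail.
For comparison, R-2.11.1 with MKL10.2 passes this test (compiled by icc 11.1 and ifort 11.1 on CentOS 5.5 x64 with Xeon X5472 WS).
R-2.9.1 with ACML4.3.0 fails it (compiled by gcc44 and gfortran44 on RedHat EL5.5 x64 on Xeon X3363).
This may be a NaN propagation problem in the ACML BLAS. See a post in AMD Developer Central (11/09/2010).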

BLAS Performance:

> set.seed(23456)
> ksz <- 4096
> a <- matrix(rnorm(ksz*ksz),ksz)
> system.time(a.svd <- svd(a))
   user  system elapsed
757.135   6.056  86.788
> system.time(a.svd <- svd(a))
   user  system elapsed
740.630   5.532  84.331
> ata <- a+t(a)
> system.time(ata.eigen <- eigen(ata) )
   user  system elapsed
299.026   1.952  51.965
> system.time(ata.eigen <- eigen(ata) )
   user  system elapsed
305.595   1.753  51.840
> system.time(a.eigen <- eigen(a))
    user   system  elapsed
1697.918   31.065  199.518
> system.time(a.eigen <- eigen(a))
    user   system  elapsed
1627.778   31.258  189.673
> system.time(a.solve <- solve(a))
   user  system elapsed
 44.287   2.257   6.236
> system.time(a.solve <- solve(a))
   user  system elapsed
 43.527   2.444   6.273
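
Because user time is summed over all threads, (user + system) / elapsed gives a rough estimate of how many of the 12 cores were kept busy on average. A minimal sketch using the second run of each call above; the helper name busy_cores is hypothetical.

```shell
# busy_cores USER SYSTEM ELAPSED: average number of busy cores, two decimals.
busy_cores() {
    awk -v u="$1" -v s="$2" -v e="$3" 'BEGIN { printf "%.2f\n", (u + s) / e }'
}

echo "svd(a):     $(busy_cores 740.630 5.532 84.331)"     # 8.85
echo "eigen(ata): $(busy_cores 305.595 1.753 51.840)"     # 5.93
echo "eigen(a):   $(busy_cores 1627.778 31.258 189.673)"  # 8.75
echo "solve(a):   $(busy_cores 43.527 2.444 6.273)"       # 7.33
```

By this rough measure none of the four calls keeps all 12 cores busy; the symmetric eigen(ata) case uses the fewest.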