Compiling and Testing Spark on Arm64
If you repost this article, please credit the original source and author.
陈锐 RuiChen @kiwik
2019/06/04 17:02:19
Background
Today I started building and testing Spark, another core project in the big data space, on aarch64. Unlike the previous post, "Compiling Hadoop on ARM64", Spark's primary programming language is Scala, so the problems encountered are somewhat different.
Build Environment
- OS: CentOS 7.6
- Arch: aarch64
- Host: Huawei Cloud ARM public-beta instance
- Spark: git commit 8b18ef5c7b670b154494b365a77b9bb015c1a10f (master as of 2019-06)
- Java: 1.8.0
- Maven: 3.6.1
- Scala: 2.12.8
- Zinc: 0.3.15
Once again I started from Spark's official build documentation and chose to build with Maven. The Spark repository ships a bash wrapper around mvn that downloads and configures the specific versions of Maven, Zinc and Scala the project requires, so the build is run through ./build/mvn. This is where the first problem appeared:
[root@arm-chenrui spark]# ./build/mvn --version
/root/gopath/src/github.com/apache/spark/build/zinc-0.3.15/bin/nailgun: line 50: /root/gopath/src/github.com/apache/spark/build/zinc-0.3.15/bin/ng/linux32/ng: cannot execute binary file
/root/gopath/src/github.com/apache/spark/build/zinc-0.3.15/bin/nailgun: line 50: /root/gopath/src/github.com/apache/spark/build/zinc-0.3.15/bin/ng/linux32/ng: cannot execute binary file
Using `mvn` from path: /usr/local/src/apache-maven/bin/mvn
Apache Maven 3.6.1 (d66c9c0b3152b2e69ee9bac180bb8fcc8e6af555; 2019-04-05T03:00:29+08:00)
Maven home: /usr/local/src/apache-maven
Java version: 1.8.0_212, vendor: Oracle Corporation, runtime: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.212.b04-0.el7_6.aarch64/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "4.14.0-49.el7a.aarch64", arch: "aarch64", family: "unix"
/root/gopath/src/github.com/apache/spark/build/zinc-0.3.15/bin/nailgun: line 50: /root/gopath/src/github.com/apache/spark/build/zinc-0.3.15/bin/ng/linux32/ng: cannot execute binary file
Note the first two lines and the last line. Spark uses Zinc for incremental compilation of the Scala and Java code to speed up the build. The ./build/mvn script starts a Zinc server and, in doing so, calls nailgun's binary client (nailgun's client is written in C, its server in Java; see the Zinc and Nailgun projects for details). Below is the shell script bundled with Zinc that invokes nailgun. It uses the uname command to detect the OS and architecture; on my CentOS Linux aarch64 host it falls through to the catch-all *linux* branch, so the platform is identified as linux32 and the script tries to execute an x86 binary on ARM, which fails. The uname output on my host and the relevant script are shown below.
[root@arm-chenrui spark]# uname -a | tr "[:upper:]" "[:lower:]"
linux arm-chenrui 4.14.0-49.el7a.aarch64 #1 smp tue apr 10 17:22:26 utc 2018 aarch64 aarch64 aarch64 gnu/linux
# zinc-0.3.15/bin/nailgun
if ! ng_exists ; then
  # choose bundled binary based on platform
  declare -r una=$( uname -a | tr "[:upper:]" "[:lower:]" )
  declare platform=""
  case "$una" in
    *cygwin*)        platform="win32" ;;
    *mingw32*)       platform="win32" ;;
    *darwin*x86_64*) platform="darwin64" ;;
    *darwin*)        platform="darwin32" ;;
    *linux*x86_64*)  platform="linux64" ;;
    *linux*ppc64le*) platform="linuxppc64le" ;;
    *linux*)         platform="linux32" ;;
    *)               platform="unknown" ;;
  esac
  cmd="$script_dir/ng/$platform/ng"
  [[ -x "$cmd" ]] && ng_cmd="$cmd"
fi

# run ng client command
"$ng_cmd" "$@"
[root@arm-chenrui bin]# pwd
/root/gopath/src/github.com/apache/spark/build/zinc-0.3.15/bin
[root@arm-chenrui bin]# tree
.
├── nailgun
├── ng
│ ├── darwin32
│ │ └── ng
│ ├── darwin64
│ │ └── ng
│ ├── linux32
│ │ └── ng
│ ├── linux64
│ │ └── ng
│ ├── linuxppc64le
│ │ └── ng
│ └── win32
│ └── ng.exe
└── zinc
7 directories, 8 files
[root@arm-chenrui bin]# file ng/linux32/ng
ng/linux32/ng: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, BuildID[sha1]=6cbc2480f56773bb2df2689506473c806d6b387f, stripped
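For reference, the root-cause fix would be to teach this platform detection about aarch64. Below is a minimal sketch, assuming a nailgun ng client has been compiled locally for aarch64; the linuxarm64 directory name is hypothetical and Zinc 0.3.15 does not ship such a binary, so this route was not taken here.

# Hypothetical sketch only: add an aarch64 branch ahead of the catch-all
# *linux* case in zinc-0.3.15/bin/nailgun. The linuxarm64/ng client it points
# at would have to be built locally; Zinc 0.3.15 does not bundle one.
una=$( uname -a | tr "[:upper:]" "[:lower:]" )
case "$una" in
  *linux*aarch64*) platform="linuxarm64" ;;   # new: ARM64 hosts
  *linux*x86_64*)  platform="linux64" ;;
  *linux*ppc64le*) platform="linuxppc64le" ;;
  *linux*)         platform="linux32" ;;
  *)               platform="unknown" ;;
esac
echo "detected platform: $platform"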
A quick look shows that Zinc has not been maintained for years; most people have probably moved to sbt, and Spark also supports building packages with sbt. I set that aside for now and looked for a way to work around the problem with Maven.

After reading Spark's build/mvn script carefully, together with the scala-maven-plugin configuration in Spark's pom.xml, I found that the Zinc server mode is an optional feature and does not affect a normal build. Moreover, the scala-maven-plugin automatically resolves the required Scala dependencies and matches the correct version, so I decided to build directly with the system mvn command. As the scala-maven-plugin documentation puts it:
If there is no Zinc server currently running then the plugin falls back to regular incremental compilation.
Note: if you have already run build/mvn once, it will have started a Zinc Java process in the background; kill that process before switching to the system mvn.
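A minimal sketch of that cleanup (the "zinc" pattern is an assumption about how the server shows up in ps output on this host):

# Look for the background Zinc server (a Java process) started by build/mvn
# and stop it before switching to the plain system mvn.
ps -ef | grep -i '[z]inc'                 # the [z] trick keeps grep itself out of the list
pkill -f zinc || echo "no Zinc process running"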
Starting the Build
Build with Maven, using a domestic mirror for the repositories to speed up dependency downloads; see my previous post, "Compiling Hadoop on ARM64", for the configuration (a minimal sketch follows).
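A minimal sketch of such a mirror configuration, assuming the Aliyun public repository (any domestic mirror works the same way):

# Point Maven at a domestic mirror by writing a minimal ~/.m2/settings.xml.
# The Aliyun repository URL below is an assumption; substitute your preferred mirror.
cat > ~/.m2/settings.xml <<'EOF'
<settings>
  <mirrors>
    <mirror>
      <id>cn-mirror</id>
      <mirrorOf>central</mirrorOf>
      <url>https://maven.aliyun.com/repository/public</url>
    </mirror>
  </mirrors>
</settings>
EOF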
The build command is mvn -DskipTests clean package; with the default options it builds successfully in roughly a quarter of an hour (16:41 in the run below).
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT:
[INFO]
[INFO] Spark Project Parent POM ........................... SUCCESS [ 29.829 s]
[INFO] Spark Project Tags ................................. SUCCESS [ 23.509 s]
[INFO] Spark Project Sketch ............................... SUCCESS [ 4.940 s]
[INFO] Spark Project Local DB ............................. SUCCESS [ 5.628 s]
[INFO] Spark Project Networking ........................... SUCCESS [ 9.637 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [ 5.218 s]
[INFO] Spark Project Unsafe ............................... SUCCESS [ 9.465 s]
[INFO] Spark Project Launcher ............................. SUCCESS [ 12.940 s]
[INFO] Spark Project Core ................................. SUCCESS [03:17 min]
[INFO] Spark Project ML Local Library ..................... SUCCESS [ 10.288 s]
[INFO] Spark Project GraphX ............................... SUCCESS [ 13.799 s]
[INFO] Spark Project Streaming ............................ SUCCESS [ 38.022 s]
[INFO] Spark Project Catalyst ............................. SUCCESS [02:13 min]
[INFO] Spark Project SQL .................................. SUCCESS [03:59 min]
[INFO] Spark Project ML Library ........................... SUCCESS [02:17 min]
[INFO] Spark Project Tools ................................ SUCCESS [ 1.539 s]
[INFO] Spark Project Hive ................................. SUCCESS [ 56.147 s]
[INFO] Spark Project REPL ................................. SUCCESS [ 5.218 s]
[INFO] Spark Project Assembly ............................. SUCCESS [ 4.633 s]
[INFO] Kafka 0.10+ Token Provider for Streaming ........... SUCCESS [ 4.052 s]
[INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [ 11.616 s]
[INFO] Kafka 0.10+ Source for Structured Streaming ........ SUCCESS [ 13.700 s]
[INFO] Spark Project Examples ............................. SUCCESS [ 19.606 s]
[INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [ 6.458 s]
[INFO] Spark Avro ......................................... SUCCESS [ 7.448 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 16:41 min
[INFO] Finished at: 2019-05-28T11:54:16+08:00
[INFO] ------------------------------------------------------------------------
Adding more profiles: mvn -Phadoop-3.2 -Pyarn -Phive -Pmesos -Pkubernetes -DskipTests clean package
Building a single sub-module: mvn -Phive -Phive-thriftserver -pl :spark-hive-thriftserver_2.12 -DskipTests clean package
Both cases build successfully, so compiling Spark on aarch64 poses no problem. Judging from GitHub, Spark's main programming languages are Scala, Java and Python, with no platform-dependent languages such as C, C++ or Go involved.
Running the Tests
Since OpenLab plans to provide ARM CI environments to various open-source communities, we also did a preliminary check of the test workflows of mainstream software on ARM.
PySpark Test
First I tried running the PySpark tests. They require the numpy package; CentOS already provides an ARM build in its repositories, installed with yum install python2-numpy. Following the official Spark documentation, the tests are run as:
mvn -DskipTests clean package -Phive
./python/run-tests
Running them directly hits the error below, related to the snappy library. Spark uses snappy-java, which does ship native aarch64 support, but its bundled library requires a sufficiently new C++ runtime on the host. The error suggests that the libstdc++.so shipped with CentOS 7.6 is too old; checking with strings /lib64/libstdc++.so.6 | grep GLIBCXX_3.4 confirms that CentOS 7.6 ships gcc 4.8.5 with GLIBCXX_3.4.19 at most, while snappy-java requires GLIBCXX_3.4.21.
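For reference, a sketch of that check (standard CentOS 7 paths):

# List the newest GLIBCXX symbol versions exported by the system libstdc++;
# snappy-java's bundled libsnappyjava.so needs GLIBCXX_3.4.21.
strings /lib64/libstdc++.so.6 | grep '^GLIBCXX_3\.4' | sort -V | tail -n 3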
Reportedly CentOS never bumps the libstdc++.so version within a release's lifecycle, and I did not want to compile gcc on CentOS or replace libstdc++.so by hand, which could cause other problems, so this issue cannot easily be worked around here. In my view this is a CentOS problem, not an ARM problem; I will try testing on Ubuntu 16.04 later.
Caused by: java.lang.UnsatisfiedLinkError: /root/gopath/src/github.com/apache/spark/python/target/9b3ceca5-eed5-4a07-920f-ea1ca2799335/snappy-1.1.7-6c9d7355-f005-4535-9c3f-1065c35c8348-libsnappyjava.so: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by /root/gopath/src/github.com/apache/spark/python/target/9b3ceca5-eed5-4a07-920f-ea1ca2799335/snappy-1.1.7-6c9d7355-f005-4535-9c3f-1065c35c8348-libsnappyjava.so)
at java.lang.ClassLoader$NativeLibrary.load(Native Method)
at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1941)
at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1824)
at java.lang.Runtime.load0(Runtime.java:809)
at java.lang.System.load(System.java:1086)
at org.xerial.snappy.SnappyLoader.loadNativeLibrary(SnappyLoader.java:179)
at org.xerial.snappy.SnappyLoader.loadSnappyApi(SnappyLoader.java:154)
at org.xerial.snappy.Snappy.<clinit>(Snappy.java:47)
at org.apache.parquet.hadoop.codec.SnappyCompressor.compress(SnappyCompressor.java:67)
at org.apache.hadoop.io.compress.CompressorStream.compress(CompressorStream.java:81)
at org.apache.hadoop.io.compress.CompressorStream.finish(CompressorStream.java:92)
at org.apache.parquet.hadoop.CodecFactory$HeapBytesCompressor.compress(CodecFactory.java:165)
at org.apache.parquet.hadoop.ColumnChunkPageWriteStore$ColumnChunkPageWriter.writePage(ColumnChunkPageWriteStore.java:95)
at org.apache.parquet.column.impl.ColumnWriterV1.writePage(ColumnWriterV1.java:147)
at org.apache.parquet.column.impl.ColumnWriterV1.flush(ColumnWriterV1.java:235)
at org.apache.parquet.column.impl.ColumnWriteStoreV1.flush(ColumnWriteStoreV1.java:122)
at org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:172)
at org.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:114)
at org.apache.parquet.hadoop.ParquetRecordWriter.close(ParquetRecordWriter.java:165)
at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.close(ParquetOutputWriter.scala:42)
at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.releaseResources(FileFormatDataWriter.scala:58)
at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.commit(FileFormatDataWriter.scala:75)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:275)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1384)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:270)
... 9 more
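As a side check, the snappy-java jar can be inspected to confirm that an aarch64 native library really is bundled; the local repository path and exact artifact version below are assumptions.

# List the native libraries bundled in the snappy-java jar; an aarch64 entry
# should show up. Adjust the path/version to your local Maven repository.
unzip -l ~/.m2/repository/org/xerial/snappy/snappy-java/1.1.7/snappy-java-1.1.7.jar \
  | grep -i 'native.*aarch64'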
Java Test
Running mvn test directly in the Spark source directory hits the error below, again a problem of JNI failing to load a native library.
[INFO] Running org.apache.spark.util.kvstore.LevelDBIteratorSuite
[ERROR] Tests run: 38, Failures: 0, Errors: 38, Skipped: 0, Time elapsed: 0.261 s <<< FAILURE! - in org.apache.spark.util.kvstore.LevelDBIteratorSuite
[ERROR] copyIndexDescendingWithStart(org.apache.spark.util.kvstore.LevelDBIteratorSuite) Time elapsed: 0.229 s <<< ERROR!
java.lang.UnsatisfiedLinkError: Could not load library. Reasons: [no leveldbjni64-1.8 in java.library.path, no leveldbjni-1.8 in java.library.path, no leveldbjni in java.library.path, /root/gopath/src/github.com/apache/spark/common/kvstore/target/tmp/libleveldbjni-64-1-8993977504386930128.8: /root/gopath/src/github.com/apache/spark/common/kvstore/target/tmp/libleveldbjni-64-1-8993977504386930128.8: cannot open shared object file: No such file or directory (Possible cause: can't load AMD 64-bit .so on a AARCH64-bit platform)]
at org.apache.spark.util.kvstore.LevelDBIteratorSuite.createStore(LevelDBIteratorSuite.java:44)
Unpacking leveldbjni-all-1.8.jar shows that the jar does not contain an aarch64 .so at all. Looking at the leveldbjni source repository, the latest master branch already supports aarch64, but no 1.9 release has ever been cut, so the newest version in Maven Central is 1.8, which has no aarch64 support. Unfortunately the project has been unmaintained for more than two years.
[root@arm-chenrui test]# tree META-INF/native/
META-INF/native/
├── linux32
│ └── libleveldbjni.so
├── linux64
│ └── libleveldbjni.so
├── osx
│ └── libleveldbjni.jnilib
├── windows32
│ └── leveldbjni.dll
└── windows64
└── leveldbjni.dll
5 directories, 5 files
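The same check can be done without extracting the jar (the local Maven repository path is an assumption):

# List the bundled native libraries in leveldbjni-all-1.8.jar; note that
# no aarch64 entry exists. Adjust the path to your local Maven repository.
unzip -l ~/.m2/repository/org/fusesource/leveldbjni/leveldbjni-all/1.8/leveldbjni-all-1.8.jar \
  | grep 'META-INF/native'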
Issues
Unlike the build problems described in the previous post, "Compiling Hadoop on ARM64", what Spark hits here are typical runtime problems: Spark compiles fine on aarch64, but loading dependent native libraries fails when running the Python and Java tests. Problems like these are hard to spot without comprehensive automated CI. So when OpenLab later provides ARM CI to open-source projects, it needs to cover builds, unit tests, integration tests and more. Only then can we effectively grow the ARM server-side software ecosystem, let customers' workloads run more smoothly on ARM servers, and build confidence in ARM.

OpenLab ARM CI still has a lot of work ahead.