Compiling Hadoop on Arm64
If you republish this article, please credit the original source and author.
Rui Chen @kiwik
2019/05/20 14:22:08
Background
Many cloud providers now offer ARM instances. With the infrastructure in place, the next step is to verify that the server-side software ecosystem on ARM is complete enough to support users' workloads. What we have learned from many open source projects is that almost all maintainers are positive about supporting ARM, but most projects have no ARM CI environment, so maintainers cannot claim ARM support without verification. We plan to close this gap by providing ARM CI to open source projects through OpenLab.
To evaluate how well mainstream open source software supports ARM, we build, deploy, and verify these projects on ARM systems. Today I am compiling several mainstream Hadoop releases in an aarch64 environment, including the native modules, which involves compiling C code and using native libraries.
System environment
The ARM server-side software ecosystem is more mature than expected: many basic components have ARM builds, and they ship in the corresponding OS package repositories.
- OS: CentOS 7
- Arch: aarch64
- Instance: Huawei Cloud ARM public beta instance
- CentOS/Maven repo: Alibaba public mirrors
[root@arm-chenrui hadoop]# lsb_release -a
LSB Version: :core-4.1-aarch64:core-4.1-noarch:cxx-4.1-aarch64:cxx-4.1-noarch:desktop-4.1-aarch64:desktop-4.1-noarch:languages-4.1-aarch64:languages-4.1-noarch:printing-4.1-aarch64:printing-4.1-noarch
Distributor ID: CentOS
Description: CentOS Linux release 7.6.1810 (AltArch)
Release: 7.6.1810
Codename: AltArch
[root@arm-chenrui hadoop]# uname -a
Linux arm-chenrui 4.14.0-49.el7a.aarch64 #1 SMP Tue Apr 10 17:22:26 UTC 2018 aarch64 aarch64 aarch64 GNU/Linux
Steps
The build steps follow the official documentation in the Hadoop GitHub repository. That documentation targets Ubuntu, while we use CentOS here, so some dependency packages are adjusted. The CentOS repositories provide aarch64 packages for basic software such as OpenJDK, Git, and Maven, so they can be installed directly. For CentOS and Maven mirrors, Alibaba, NetEase, and Huawei all host mirrors in China, and plenty of articles already cover configuring an ARM environment, so I will not repeat that here; a brief installation sketch is shown below.
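For completeness, a minimal installation sketch. The exact package names and the Maven download URL are my assumptions, reconstructed from the mvn --version output below rather than taken from the original build logs:

# OpenJDK and Git have aarch64 builds in the stock CentOS 7 repositories
yum install -y java-1.8.0-openjdk java-1.8.0-openjdk-devel git
# Maven 3.6.1 installed manually under /usr/local/src (matches "Maven home" below)
curl -LO https://archive.apache.org/dist/maven/maven-3/3.6.1/binaries/apache-maven-3.6.1-bin.tar.gz
tar xzf apache-maven-3.6.1-bin.tar.gz -C /usr/local/src
ln -s /usr/local/src/apache-maven-3.6.1 /usr/local/src/apache-maven
export PATH=/usr/local/src/apache-maven/bin:$PATH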
Compiler and repository configuration
The Java compiler and Maven versions are as follows:
[root@arm-chenrui hadoop]# mvn --version
Apache Maven 3.6.1 (d66c9c0b3152b2e69ee9bac180bb8fcc8e6af555; 2019-04-05T03:00:29+08:00)
Maven home: /usr/local/src/apache-maven
Java version: 1.8.0_212, vendor: Oracle Corporation, runtime: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.212.b04-0.el7_6.aarch64/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "4.14.0-49.el7a.aarch64", arch: "aarch64", family: "unix"
Hadoop's pom.xml defines project-level Maven repositories that point at the Apache snapshot repository, so mirroring every repository in Maven's global settings breaks the build. Instead, mirror only the central repository and add a profile that makes central take priority:
<settings xmlns="http://maven.apache.org/SETTINGS/1.1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.1.0 http://maven.apache.org/xsd/settings-1.1.0.xsd">
  <localRepository>/root/.m2/repository</localRepository>
  <mirrors>
    <mirror>
      <mirrorOf>central</mirrorOf>
      <name>Ali maven central mirror</name>
      <url>http://maven.aliyun.com/repository/central</url>
      <id>alimaven</id>
    </mirror>
  </mirrors>
  <profiles>
    <profile>
      <repositories>
        <repository>
          <snapshots>
            <enabled>false</enabled>
          </snapshots>
          <id>central</id>
          <name>Maven Repository</name>
          <url>https://repo.maven.apache.org/maven2</url>
        </repository>
      </repositories>
      <id>central-first</id>
    </profile>
  </profiles>
  <activeProfiles>
    <activeProfile>central-first</activeProfile>
  </activeProfiles>
  <pluginGroups>
    <pluginGroup>org.apache.maven.plugins</pluginGroup>
    <pluginGroup>org.codehaus.mojo</pluginGroup>
  </pluginGroups>
</settings>
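To confirm the mirror and the central-first profile are actually picked up, the Maven help plugin can print the merged settings; this is just a sanity check, not a required step:

mvn help:effective-settings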
Native dependency libraries
Some of Hadoop's native modules depend on platform-specific native tools and libraries at build time, for example protobuf and cmake. The list below is the full set of dependencies after translating the Ubuntu package names to CentOS:
yum install autoconf automake libtool cmake3
yum groupinstall "Development Tools"
yum install zlib-devel pkgconfig openssl-devel cyrus-sasl-devel
yum install protobuf-compiler protobuf-devel
yum install snappy-devel bzip2-devel bzip2 fuse fuse-devel zstd
Two packages deserve special attention: protobuf-devel and cmake3.
- cmake3: The official documentation installs cmake, but the cmake shipped by default in CentOS 7 is version 2.8, which does not meet Hadoop's build requirements. Install cmake3 explicitly and make it the default cmake command with the alternatives setup below (a verification check follows the commands):
$ sudo alternatives --install /usr/local/bin/cmake cmake /usr/bin/cmake 10 \
--slave /usr/local/bin/ctest ctest /usr/bin/ctest \
--slave /usr/local/bin/cpack cpack /usr/bin/cpack \
--slave /usr/local/bin/ccmake ccmake /usr/bin/ccmake \
--family cmake
$ sudo alternatives --install /usr/local/bin/cmake cmake /usr/bin/cmake3 20 \
--slave /usr/local/bin/ctest ctest /usr/bin/ctest3 \
--slave /usr/local/bin/cpack cpack /usr/bin/cpack3 \
--slave /usr/local/bin/ccmake ccmake /usr/bin/ccmake3 \
--family cmake
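To verify the switch took effect, query the alternatives entry and the resolved version; the cmake --version output quoted later in this article is what you should see:

alternatives --display cmake
cmake --version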
- protobuf-devel: The official documentation does not mention this package, but without it the build fails with the following error:
[INFO] --- hadoop-maven-plugins:3.2.0:cmake-compile (cmake-compile) @ hadoop-hdfs-native-client ---
...
[WARNING] Located all JNI components successfully.
[WARNING] CUSTOM_OPENSSL_PREFIX =
[WARNING] -- Performing Test THREAD_LOCAL_SUPPORTED
[WARNING] -- Performing Test THREAD_LOCAL_SUPPORTED - Success
[WARNING] CMake Error at /usr/share/cmake3/Modules/FindProtobuf.cmake:465 (file):
[WARNING] file STRINGS file "/usr/include/google/protobuf/stubs/common.h" cannot be
[WARNING] read.
[WARNING] Call Stack (most recent call first):
[WARNING] main/native/libhdfspp/CMakeLists.txt:45 (find_package)
[WARNING]
[WARNING]
[WARNING] CMake Error at /usr/share/cmake3/Modules/FindProtobuf.cmake:471 (math):
[WARNING] math cannot parse the expression: " / 1000000": syntax error, unexpected
[WARNING] exp_DIVIDE, expecting exp_PLUS or exp_MINUS or exp_OPENPARENT or exp_NUMBER
[WARNING] (2).
[WARNING] Call Stack (most recent call first):
[WARNING] main/native/libhdfspp/CMakeLists.txt:45 (find_package)
[WARNING]
[WARNING]
[WARNING] CMake Error at /usr/share/cmake3/Modules/FindProtobuf.cmake:472 (math):
[WARNING] math cannot parse the expression: " / 1000 % 1000": syntax error,
[WARNING] unexpected exp_DIVIDE, expecting exp_PLUS or exp_MINUS or exp_OPENPARENT or
[WARNING] exp_NUMBER (2).
[WARNING] Call Stack (most recent call first):
[WARNING] main/native/libhdfspp/CMakeLists.txt:45 (find_package)
[WARNING]
[WARNING]
[WARNING] CMake Error at /usr/share/cmake3/Modules/FindProtobuf.cmake:473 (math):
[WARNING] math cannot parse the expression: " % 1000": syntax error, unexpected
[WARNING] exp_MOD, expecting exp_PLUS or exp_MINUS or exp_OPENPARENT or exp_NUMBER
[WARNING] (2).
[WARNING] Call Stack (most recent call first):
[WARNING] main/native/libhdfspp/CMakeLists.txt:45 (find_package)
[WARNING]
[WARNING]
[WARNING] CMake Warning at /usr/share/cmake3/Modules/FindProtobuf.cmake:495 (message):
[WARNING] Protobuf compiler version 2.5.0 doesn't match library version
[WARNING] ERROR.ERROR.ERROR
[WARNING] Call Stack (most recent call first):
[WARNING] main/native/libhdfspp/CMakeLists.txt:45 (find_package)
[WARNING]
[WARNING]
[WARNING] -- Could NOT find GSASL (missing: GSASL_LIBRARIES GSASL_INCLUDE_DIR)
[WARNING] -- Performing Test THREAD_LOCAL_SUPPORTED
[WARNING] -- Performing Test THREAD_LOCAL_SUPPORTED - Success
[WARNING] -- Performing Test PROTOC_IS_COMPATIBLE
[WARNING] -- Performing Test PROTOC_IS_COMPATIBLE - Failed
[WARNING] CMake Warning at main/native/libhdfspp/CMakeLists.txt:86 (message):
[WARNING] WARNING: the Protocol Buffers Library and the Libhdfs++ Library must both
[WARNING] be compiled with the same (or compatible) compiler. Normally only the same
[WARNING] major versions of the same compiler are compatible with each other.
[WARNING]
[WARNING]
[WARNING] -- valgrind location: MEMORYCHECK_COMMAND-NOTFOUND
[WARNING] -- Using Cyrus SASL; link with /usr/lib64/libsasl2.so
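The root cause is CMake's FindProtobuf module failing to read the protobuf headers, which only protobuf-devel provides; installing it clears the error. The header path below is the one named in the error message:

yum install -y protobuf-devel
ls /usr/include/google/protobuf/stubs/common.h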
- The final cmake and protobuf versions are:
[root@arm-chenrui hadoop]# cmake --version
cmake3 version 3.13.4
CMake suite maintained and supported by Kitware (kitware.com/cmake).
[root@arm-chenrui hadoop]# protoc --version
libprotoc 2.5.0
Hadoop versions
I tried building Hadoop 2.8.5, 2.9.2, 3.2.0, and trunk (commit 0d1d7c86ec34fabc62c0e3844aca3733024bc172) in the ARM environment. All of them except trunk build and package successfully, using the following Maven command:
mvn clean package -Pdist,native -DskipTests -Dtar -Dmaven.javadoc.skip=true
The trunk build fails because the new hadoop-yarn-csi module requires the protoc-gen-grpc-java binary to compile *.proto files, and grpc does not publish an aarch64 build of version 1.15.1 to the Maven repository, so the build fails with a missing dependency. This was reported to the grpc team on GitHub back in 2016 but has never been resolved.
Below are the exact error message and a screen recording of the build process:
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 03:13 min
[INFO] Finished at: 2019-05-17T09:28:55+08:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.xolstice.maven.plugins:protobuf-maven-plugin:0.5.1:compile-custom (default) on project hadoop-yarn-csi: Missing:
[ERROR] ----------
[ERROR] 1) io.grpc:protoc-gen-grpc-java:exe:linux-aarch_64:1.15.1
[ERROR]
[ERROR] Try downloading the file manually from the project website.
[ERROR]
[ERROR] Then, install it using the command:
[ERROR] mvn install:install-file -DgroupId=io.grpc -DartifactId=protoc-gen-grpc-java -Dversion=1.15.1 -Dclassifier=linux-aarch_64 -Dpackaging=exe -Dfile=/path/to/file
[ERROR]
[ERROR] Alternatively, if you host your own repository you can deploy the file there:
[ERROR] mvn deploy:deploy-file -DgroupId=io.grpc -DartifactId=protoc-gen-grpc-java -Dversion=1.15.1 -Dclassifier=linux-aarch_64 -Dpackaging=exe -Dfile=/path/to/file -Durl=[url] -DrepositoryId=[id]
[ERROR]
[ERROR] Path to dependency:
[ERROR] 1) org.apache.hadoop:hadoop-yarn-csi:jar:3.3.0-SNAPSHOT
[ERROR] 2) io.grpc:protoc-gen-grpc-java:exe:linux-aarch_64:1.15.1
[ERROR]
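The error message itself suggests a workaround: obtain an aarch64 protoc-gen-grpc-java binary and install it into the local repository by hand. A sketch, assuming you build grpc-java v1.15.1 from source following its COMPILING.md; the Gradle output path below is the usual location and may differ:

git clone -b v1.15.1 https://github.com/grpc/grpc-java.git
# ... build the codegen plugin under grpc-java/compiler (see COMPILING.md) ...
mvn install:install-file -DgroupId=io.grpc -DartifactId=protoc-gen-grpc-java \
    -Dversion=1.15.1 -Dclassifier=linux-aarch_64 -Dpackaging=exe \
    -Dfile=grpc-java/compiler/build/exe/java_plugin/protoc-gen-grpc-java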
Maven support for multiple processor architectures
In the pom.xml of the hadoop-yarn-csi module on trunk, we found how a Maven repository can support multiple processor architectures. The snippet below uses the repository's built-in classifier mechanism to host one dependency in several OS and platform variants. Both protoc and protoc-gen-grpc-java, the executables referenced by the protobuf-maven-plugin, are published to the Maven repository per operating system and architecture; ${os.detected.classifier} then detects the build platform and downloads the matching executable automatically. This neatly solves the multi-platform problem at build time.
<plugin>
  <groupId>org.xolstice.maven.plugins</groupId>
  <artifactId>protobuf-maven-plugin</artifactId>
  <version>${protobuf-maven-plugin.version}</version>
  <configuration>
    <protocArtifact>com.google.protobuf:protoc:3.6.1:exe:${os.detected.classifier}</protocArtifact>
    <pluginId>grpc-java</pluginId>
    <pluginArtifact>io.grpc:protoc-gen-grpc-java:1.15.1:exe:${os.detected.classifier}</pluginArtifact>
  </configuration>
  <executions>
    <execution>
      <goals>
        <goal>compile</goal>
        <goal>compile-custom</goal>
      </goals>
    </execution>
  </executions>
</plugin>
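Note that ${os.detected.classifier} is not a built-in Maven property; it is supplied by the os-maven-plugin build extension (kr.motd.maven:os-maven-plugin), which the module's pom has to declare for the property to exist. On this machine it should resolve to linux-aarch_64, which can be checked from within the module using the Maven help plugin (maven-help-plugin 3.1.0+ for -DforceStdout):

mvn -q help:evaluate -Dexpression=os.detected.classifier -DforceStdout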
Issues
This investigation surfaced several problems, the biggest being that the Hadoop project has no ARM CI, so changes that break the ARM build get merged into trunk and may even reach official releases. The Apache bigtop project does run integration tests for Hadoop across multiple OSes and architectures, but it validates the historical release 2.8.1 and therefore cannot catch regressions introduced on trunk. We will discuss solutions with the Apache community and the Hadoop maintainers. One feasible approach is for OpenLab, as a third-party CI system, to provide an ARM verification environment to the Hadoop project; containerd in the CNCF community is already hooked up to OpenLab's ARM CI. Ultimately, of course, it depends on whether the Hadoop project and the Apache community accept this.
Update, 2020/01/14
Our team has since fixed all aarch64 compatibility issues upstream and submitted all of the code to the Hadoop community and the upstream communities of its dependencies, including the protobuf upgrade, ARM support and a version bump for protoc-gen-grpc-java, and ARM support for leveldbjni and rocksdbjni. We also fixed every aarch64-related test-case failure in Hadoop. You can now simply git clone Hadoop trunk and build and deploy it on an ARM64 server. Below are my build results on a Huawei Cloud Kunpeng compute instance:
zuul@ubuntu:~/workspace/github.com/apache/hadoop$ uname -a
Linux ubuntu 4.15.0-72-generic #81~16.04.1-Ubuntu SMP Tue Nov 26 16:31:09 UTC 2019 aarch64 aarch64 aarch64 GNU/Linux
zuul@ubuntu:~/workspace/github.com/apache/hadoop$ cat /etc/issue
Ubuntu 16.04.6 LTS \n \l
zuul@ubuntu:~/workspace/github.com/apache/hadoop$ mvn clean install -Paarch64 -Pnative -Drequire.snappy -Drequire.zstd -Drequire.openssl -Drequire.isal -Drequire.bzip2 -DskipTests
...
[INFO] Apache Hadoop Client Packaging Integration Tests ... SUCCESS [ 0.140 s]
[INFO] Apache Hadoop Distribution ......................... SUCCESS [ 0.274 s]
[INFO] Apache Hadoop Client Modules ....................... SUCCESS [ 0.032 s]
[INFO] Apache Hadoop Cloud Storage ........................ SUCCESS [ 0.428 s]
[INFO] Apache Hadoop Tencent COS Support .................. SUCCESS [ 0.551 s]
[INFO] Apache Hadoop Cloud Storage Project ................ SUCCESS [ 0.031 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 14:25 min
[INFO] Finished at: 2020-01-14T09:16:17Z
[INFO] ------------------------------------------------------------------------
zuul@ubuntu:~/workspace/github.com/apache/hadoop$ mvn package -Paarch64 -Pnative -Drequire.snappy -Drequire.zstd -Drequire.openssl -Drequire.isal -Drequire.bzip2 -Pdist -Dtar -Dmaven.javadoc.skip -DskipTests
...
[INFO] Apache Hadoop Client Packaging Invariants for Test . SUCCESS [ 0.164 s]
[INFO] Apache Hadoop Client Packaging Integration Tests ... SUCCESS [ 0.064 s]
[INFO] Apache Hadoop Distribution ......................... SUCCESS [ 36.762 s]
[INFO] Apache Hadoop Client Modules ....................... SUCCESS [ 0.027 s]
[INFO] Apache Hadoop Cloud Storage ........................ SUCCESS [ 0.409 s]
[INFO] Apache Hadoop Tencent COS Support .................. SUCCESS [ 0.405 s]
[INFO] Apache Hadoop Cloud Storage Project ................ SUCCESS [ 0.026 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 10:49 min
[INFO] Finished at: 2020-01-14T09:29:18Z
[INFO] ------------------------------------------------------------------------
Note: I enabled the aarch64 and native profiles plus the build flags for the third-party native libraries, and skipped all tests to produce a package quickly. In my experience, the full Hadoop test suite takes roughly 8 hours on an ARM64 server (4 cores, 16 GB RAM); a sketch of the test invocation follows.
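If you do want to run the test suite, the same profiles and native flags apply; this invocation is a sketch based on the package command above, not a command taken from the original logs:

mvn test -Paarch64 -Pnative -Drequire.snappy -Drequire.zstd -Drequire.openssl \
    -Drequire.isal -Drequire.bzip2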
Going forward, our team will continue driving ARM64 server support in Hadoop, Spark, Flink, Hive, HBase, and other projects, and not only in big data: foundational libraries, databases, web, and cloud platforms are all in scope. If you run into any aarch64-related problems, join the Slack workspace armserverecosystem.slack.com and let's discuss.