If you repost this article, please credit the original source and the author.

陈锐 RuiChen @kiwik

2019/05/20 14:22:08


Background

Many cloud providers have now launched ARM instances. With the infrastructure in place, the next step is to verify whether the server-side software ecosystem on ARM is complete enough to support users' workloads. What we have learned from many open source projects is that almost all maintainers take a positive attitude toward supporting ARM, but many projects have no ARM CI environment, so without verification the maintainers cannot claim ARM support. We plan to address this in OpenLab by providing ARM CI to open source projects, closing the gap left by the missing ARM verification environment.

To evaluate how well mainstream open source software supports ARM, we compile, deploy, and verify it on ARM systems. Today I will build several mainstream Hadoop releases in an aarch64 environment, including the native modules, which involves compiling some C code and using native libraries.

System Environment

The ARM server-side software ecosystem is more mature than expected: many basic components have corresponding ARM builds, all integrated into the operating system's package repositories.

  • OS: CentOS 7
  • Arch: aarch64
  • Instance: Huawei Cloud ARM public beta instance
  • CentOS/Maven repo: Alibaba public mirrors
[root@arm-chenrui hadoop]# lsb_release -a
LSB Version:    :core-4.1-aarch64:core-4.1-noarch:cxx-4.1-aarch64:cxx-4.1-noarch:desktop-4.1-aarch64:desktop-4.1-noarch:languages-4.1-aarch64:languages-4.1-noarch:printing-4.1-aarch64:printing-4.1-noarch
Distributor ID: CentOS
Description:    CentOS Linux release 7.6.1810 (AltArch) 
Release:        7.6.1810
Codename:       AltArch
[root@arm-chenrui hadoop]# uname -a
Linux arm-chenrui 4.14.0-49.el7a.aarch64 #1 SMP Tue Apr 10 17:22:26 UTC 2018 aarch64 aarch64 aarch64 GNU/Linux

Steps

The build steps follow the official documentation in the Hadoop GitHub repository. That documentation targets Ubuntu, while we use CentOS here, so some dependency packages differ. The CentOS repositories ship aarch64 packages for the basic software, e.g. OpenJDK, Git, and Maven, which can be installed directly, as sketched below. For CentOS and Maven mirrors, Alibaba, NetEase, and Huawei all run mirrors in China, and configuring them in an ARM environment is well covered elsewhere, so I won't repeat it here.
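One possible setup, as a minimal sketch (package names assumed from CentOS 7 conventions; I install Maven from the Apache binary tarball, since the Maven home shown in the mvn --version output below is /usr/local/src/apache-maven):

# OpenJDK and Git have aarch64 builds in the CentOS 7 repositories
yum install -y java-1.8.0-openjdk-devel git
# Maven 3.6.1: unpack the Apache binary tarball
curl -LO https://archive.apache.org/dist/maven/maven-3/3.6.1/binaries/apache-maven-3.6.1-bin.tar.gz
tar xzf apache-maven-3.6.1-bin.tar.gz -C /usr/local/src
ln -s /usr/local/src/apache-maven-3.6.1 /usr/local/src/apache-maven
export PATH=/usr/local/src/apache-maven/bin:$PATH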

Compiler and Repository Configuration

The Java compiler and Maven versions are as follows:

[root@arm-chenrui hadoop]# mvn --version
Apache Maven 3.6.1 (d66c9c0b3152b2e69ee9bac180bb8fcc8e6af555; 2019-04-05T03:00:29+08:00)
Maven home: /usr/local/src/apache-maven
Java version: 1.8.0_212, vendor: Oracle Corporation, runtime: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.212.b04-0.el7_6.aarch64/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "4.14.0-49.el7a.aarch64", arch: "aarch64", family: "unix"

Hadoop's pom.xml defines a project-level Maven repository that points at the Apache snapshot repository, so blindly mirroring every repo in Maven's global settings breaks the build. Instead, mirror only the central repo and add a profile that gives central priority:

<settings xmlns="http://maven.apache.org/SETTINGS/1.1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.1.0 http://maven.apache.org/xsd/settings-1.1.0.xsd">
  <localRepository>/root/.m2/repository</localRepository>
  <mirrors>
    <mirror>
      <mirrorOf>central</mirrorOf>
      <name>Ali maven central mirror</name>
      <url>http://maven.aliyun.com/repository/central</url>
      <id>alimaven</id>
    </mirror>
  </mirrors>
  <profiles>
    <profile>
      <repositories>
        <repository>
          <snapshots>
            <enabled>false</enabled>
          </snapshots>
          <id>central</id>
          <name>Maven Repository</name>
          <url>https://repo.maven.apache.org/maven2</url>
        </repository>
      </repositories>
      <id>central-first</id>
    </profile>
  </profiles>
  <activeProfiles>
    <activeProfile>central-first</activeProfile>
  </activeProfiles>
  <pluginGroups>
    <pluginGroup>org.apache.maven.plugins</pluginGroup>
    <pluginGroup>org.codehaus.mojo</pluginGroup>
  </pluginGroups>
</settings>
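To confirm that the mirror and the central-first profile are actually in effect, printing the merged settings is a quick sanity check (help:effective-settings is a standard maven-help-plugin goal):

mvn help:effective-settings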

Native Dependencies

When compiling Hadoop's native modules, several platform-specific native libraries and tools are needed, e.g. protobuf and cmake. The list below contains all the dependencies with the Ubuntu package names replaced by their CentOS equivalents.

yum install autoconf automake libtool cmake3
yum groupinstall "Development Tools"
yum install zlib-devel pkgconfig openssl-devel cyrus-sasl-devel
yum install protobuf-compiler protobuf-devel
yum install snappy-devel bzip2-devel bzip2 fuse fuse-devel zstd

Two packages deserve special mention: protobuf-devel and cmake3.

  • cmake3: The official documentation installs the cmake package, but the default cmake in CentOS 7 is version 2.8, which does not meet Hadoop's build requirement. Install cmake3 explicitly and configure it as the default cmake command via alternatives, as follows:
$ sudo alternatives --install /usr/local/bin/cmake cmake /usr/bin/cmake 10 \
--slave /usr/local/bin/ctest ctest /usr/bin/ctest \
--slave /usr/local/bin/cpack cpack /usr/bin/cpack \
--slave /usr/local/bin/ccmake ccmake /usr/bin/ccmake \
--family cmake

$ sudo alternatives --install /usr/local/bin/cmake cmake /usr/bin/cmake3 20 \
--slave /usr/local/bin/ctest ctest /usr/bin/ctest3 \
--slave /usr/local/bin/cpack cpack /usr/bin/cpack3 \
--slave /usr/local/bin/ccmake ccmake /usr/bin/ccmake3 \
--family cmake
  • protobuf-devel: The official documentation does not mention this package, but without it the build fails with the error below:
[INFO] --- hadoop-maven-plugins:3.2.0:cmake-compile (cmake-compile) @ hadoop-hdfs-native-client ---
...
[WARNING] Located all JNI components successfully.
[WARNING] CUSTOM_OPENSSL_PREFIX = 
[WARNING] -- Performing Test THREAD_LOCAL_SUPPORTED
[WARNING] -- Performing Test THREAD_LOCAL_SUPPORTED - Success
[WARNING] CMake Error at /usr/share/cmake3/Modules/FindProtobuf.cmake:465 (file):
[WARNING]   file STRINGS file "/usr/include/google/protobuf/stubs/common.h" cannot be
[WARNING]   read.
[WARNING] Call Stack (most recent call first):
[WARNING]   main/native/libhdfspp/CMakeLists.txt:45 (find_package)
[WARNING] 
[WARNING] 
[WARNING] CMake Error at /usr/share/cmake3/Modules/FindProtobuf.cmake:471 (math):
[WARNING]   math cannot parse the expression: " / 1000000": syntax error, unexpected
[WARNING]   exp_DIVIDE, expecting exp_PLUS or exp_MINUS or exp_OPENPARENT or exp_NUMBER
[WARNING]   (2).
[WARNING] Call Stack (most recent call first):
[WARNING]   main/native/libhdfspp/CMakeLists.txt:45 (find_package)
[WARNING] 
[WARNING] 
[WARNING] CMake Error at /usr/share/cmake3/Modules/FindProtobuf.cmake:472 (math):
[WARNING]   math cannot parse the expression: " / 1000 % 1000": syntax error,
[WARNING]   unexpected exp_DIVIDE, expecting exp_PLUS or exp_MINUS or exp_OPENPARENT or
[WARNING]   exp_NUMBER (2).
[WARNING] Call Stack (most recent call first):
[WARNING]   main/native/libhdfspp/CMakeLists.txt:45 (find_package)
[WARNING] 
[WARNING] 
[WARNING] CMake Error at /usr/share/cmake3/Modules/FindProtobuf.cmake:473 (math):
[WARNING]   math cannot parse the expression: " % 1000": syntax error, unexpected
[WARNING]   exp_MOD, expecting exp_PLUS or exp_MINUS or exp_OPENPARENT or exp_NUMBER
[WARNING]   (2).
[WARNING] Call Stack (most recent call first):
[WARNING]   main/native/libhdfspp/CMakeLists.txt:45 (find_package)
[WARNING] 
[WARNING] 
[WARNING] CMake Warning at /usr/share/cmake3/Modules/FindProtobuf.cmake:495 (message):
[WARNING]   Protobuf compiler version 2.5.0 doesn't match library version
[WARNING]   ERROR.ERROR.ERROR
[WARNING] Call Stack (most recent call first):
[WARNING]   main/native/libhdfspp/CMakeLists.txt:45 (find_package)
[WARNING] 
[WARNING] 
[WARNING] -- Could NOT find GSASL (missing: GSASL_LIBRARIES GSASL_INCLUDE_DIR) 
[WARNING] -- Performing Test THREAD_LOCAL_SUPPORTED
[WARNING] -- Performing Test THREAD_LOCAL_SUPPORTED - Success
[WARNING] -- Performing Test PROTOC_IS_COMPATIBLE
[WARNING] -- Performing Test PROTOC_IS_COMPATIBLE - Failed
[WARNING] CMake Warning at main/native/libhdfspp/CMakeLists.txt:86 (message):
[WARNING]   WARNING: the Protocol Buffers Library and the Libhdfs++ Library must both
[WARNING]   be compiled with the same (or compatible) compiler.  Normally only the same
[WARNING]   major versions of the same compiler are compatible with each other.
[WARNING] 
[WARNING] 
[WARNING] -- valgrind location: MEMORYCHECK_COMMAND-NOTFOUND
[WARNING] -- Using Cyrus SASL; link with /usr/lib64/libsasl2.so
  • The final cmake and protobuf versions:
[root@arm-chenrui hadoop]# cmake --version
cmake3 version 3.13.4
CMake suite maintained and supported by Kitware (kitware.com/cmake).
[root@arm-chenrui hadoop]# protoc --version
libprotoc 2.5.0

Hadoop Versions

I tried building Hadoop 2.8.5, 2.9.2, 3.2.0, and trunk (commit 0d1d7c86ec34fabc62c0e3844aca3733024bc172) in the ARM environment. All except trunk built distribution packages successfully, using the following Maven command:

mvn clean package -Pdist,native -DskipTests -Dtar -Dmaven.javadoc.skip=true

Trunk fails because of the newly introduced hadoop-yarn-csi module, which compiles its *.proto files with the protoc-gen-grpc-java executable, and grpc publishes no 1.15.1 aarch64 binary for it in the Maven repository, so the build aborts on the missing dependency. This was reported to the grpc team on GitHub as early as 2016 but has never been resolved.

The detailed error message and a screen recording of the build follow:

[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  03:13 min
[INFO] Finished at: 2019-05-17T09:28:55+08:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.xolstice.maven.plugins:protobuf-maven-plugin:0.5.1:compile-custom (default) on project hadoop-yarn-csi: Missing:
[ERROR] ----------
[ERROR] 1) io.grpc:protoc-gen-grpc-java:exe:linux-aarch_64:1.15.1
[ERROR]
[ERROR]   Try downloading the file manually from the project website.
[ERROR]
[ERROR]   Then, install it using the command:
[ERROR]       mvn install:install-file -DgroupId=io.grpc -DartifactId=protoc-gen-grpc-java -Dversion=1.15.1 -Dclassifier=linux-aarch_64 -Dpackaging=exe -Dfile=/path/to/file
[ERROR]
[ERROR]   Alternatively, if you host your own repository you can deploy the file there:
[ERROR]       mvn deploy:deploy-file -DgroupId=io.grpc -DartifactId=protoc-gen-grpc-java -Dversion=1.15.1 -Dclassifier=linux-aarch_64 -Dpackaging=exe -Dfile=/path/to/file -Durl=[url] -DrepositoryId=[id]
[ERROR]
[ERROR]   Path to dependency:
[ERROR]         1) org.apache.hadoop:hadoop-yarn-csi:jar:3.3.0-SNAPSHOT
[ERROR]         2) io.grpc:protoc-gen-grpc-java:exe:linux-aarch_64:1.15.1
[ERROR]

[asciicast: screen recording of the failed trunk build]
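The error message itself suggests a workaround: build the plugin binary locally on the aarch64 host and install it into the local repository under the expected coordinates. A sketch, assuming the build layout described in grpc-java's COMPILING.md (the gradle task name and output path may vary between releases):

# build the codegen plugin from source on the aarch64 host
git clone -b v1.15.1 https://github.com/grpc/grpc-java.git
cd grpc-java/compiler
../gradlew java_pluginExecutable
# install the binary under the coordinates the Hadoop build expects
mvn install:install-file -DgroupId=io.grpc -DartifactId=protoc-gen-grpc-java \
    -Dversion=1.15.1 -Dclassifier=linux-aarch_64 -Dpackaging=exe \
    -Dfile=build/exes/java_plugin/protoc-gen-grpc-java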

Maven Support for Multiple Processor Architectures

From the pom.xml of the hadoop-yarn-csi module on Hadoop trunk, we learned one way Maven repositories can support multiple processor architectures. The snippet below uses Maven's built-in classifier mechanism to carry different operating-system and platform variants of a single dependency. Looking at the protoc and protoc-gen-grpc-java artifacts, executables for each OS/architecture combination are published to the Maven repository, and the ${os.detected.classifier} property identifies the build platform automatically so that the matching executable is downloaded. This is a convenient way to solve multi-platform build problems.

<plugin>
    <groupId>org.xolstice.maven.plugins</groupId>
    <artifactId>protobuf-maven-plugin</artifactId>
    <version>${protobuf-maven-plugin.version}</version>
    <configuration>
        <protocArtifact>com.google.protobuf:protoc:3.6.1:exe:${os.detected.classifier}</protocArtifact>
        <pluginId>grpc-java</pluginId>
        <pluginArtifact>io.grpc:protoc-gen-grpc-java:1.15.1:exe:${os.detected.classifier}</pluginArtifact>
    </configuration>
    <executions>
        <execution>
            <goals>
                <goal>compile</goal>
                <goal>compile-custom</goal>
            </goals>
        </execution>
    </executions>
</plugin>
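Note that ${os.detected.classifier} is not a built-in Maven property; it is injected by a build extension, normally kr.motd.maven:os-maven-plugin. A sketch of the usual wiring (the version number is an assumption, and Hadoop's actual pom may declare it differently):

<build>
    <extensions>
        <extension>
            <!-- sets os.detected.name, os.detected.arch and os.detected.classifier,
                 e.g. linux-aarch_64 on this machine (version assumed for illustration) -->
            <groupId>kr.motd.maven</groupId>
            <artifactId>os-maven-plugin</artifactId>
            <version>1.6.2</version>
        </extension>
    </extensions>
</build>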

Problems

This investigation surfaced several problems, the biggest being that the Hadoop project has no ARM CI, so changes that break the ARM build get merged into trunk and may even reach official releases. The Apache Bigtop project does run integration verification of Hadoop across multiple operating systems and architectures, but it verifies the historical release 2.8.1 and therefore cannot catch regressions introduced on trunk. We will discuss solutions with the Apache community and the Hadoop maintainers. One feasible approach is for OpenLab, as a third-party CI system, to provide an ARM verification environment to the Hadoop project; the CNCF project containerd has already onboarded OpenLab's ARM CI. In the end, of course, it depends on whether the Hadoop project and the Apache community accept it.

Update 2020/01/14

Our team has now fixed all aarch64 compatibility issues upstream and submitted all the code to the Hadoop community and the upstream communities of its dependencies, including: the protobuf upgrade, ARM support and version bumps for protoc-gen-grpc-java, and ARM support for leveldbjni and rocksdbjni. We also fixed every aarch64-related test-case failure in Hadoop, so you can now git clone Hadoop trunk and build and deploy it directly on an ARM64 server. Below is my build result on a Huawei Cloud Kunpeng compute instance:

zuul@ubuntu:~/workspace/github.com/apache/hadoop$ uname -a
Linux ubuntu 4.15.0-72-generic #81~16.04.1-Ubuntu SMP Tue Nov 26 16:31:09 UTC 2019 aarch64 aarch64 aarch64 GNU/Linux
zuul@ubuntu:~/workspace/github.com/apache/hadoop$ cat /etc/issue
Ubuntu 16.04.6 LTS \n \l
zuul@ubuntu:~/workspace/github.com/apache/hadoop$ mvn clean install -Paarch64 -Pnative -Drequire.snappy -Drequire.zstd -Drequire.openssl -Drequire.isal -Drequire.bzip2 -DskipTests
...
[INFO] Apache Hadoop Client Packaging Integration Tests ... SUCCESS [  0.140 s]
[INFO] Apache Hadoop Distribution ......................... SUCCESS [  0.274 s]
[INFO] Apache Hadoop Client Modules ....................... SUCCESS [  0.032 s]
[INFO] Apache Hadoop Cloud Storage ........................ SUCCESS [  0.428 s]
[INFO] Apache Hadoop Tencent COS Support .................. SUCCESS [  0.551 s]
[INFO] Apache Hadoop Cloud Storage Project ................ SUCCESS [  0.031 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  14:25 min
[INFO] Finished at: 2020-01-14T09:16:17Z
[INFO] ------------------------------------------------------------------------
zuul@ubuntu:~/workspace/github.com/apache/hadoop$ mvn package -Paarch64 -Pnative -Drequire.snappy -Drequire.zstd -Drequire.openssl -Drequire.isal -Drequire.bzip2 -Pdist -Dtar -Dmaven.javadoc.skip -DskipTests
...
[INFO] Apache Hadoop Client Packaging Invariants for Test . SUCCESS [  0.164 s]
[INFO] Apache Hadoop Client Packaging Integration Tests ... SUCCESS [  0.064 s]
[INFO] Apache Hadoop Distribution ......................... SUCCESS [ 36.762 s]
[INFO] Apache Hadoop Client Modules ....................... SUCCESS [  0.027 s]
[INFO] Apache Hadoop Cloud Storage ........................ SUCCESS [  0.409 s]
[INFO] Apache Hadoop Tencent COS Support .................. SUCCESS [  0.405 s]
[INFO] Apache Hadoop Cloud Storage Project ................ SUCCESS [  0.026 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  10:49 min
[INFO] Finished at: 2020-01-14T09:29:18Z
[INFO] ------------------------------------------------------------------------

Note: I enabled the aarch64 and native profiles together with the build flags for the third-party native libraries, and skipped all tests to produce packages quickly; in my experience the full Hadoop test suite takes roughly 8 hours on an ARM64 server (4 cores, 16 GB RAM).

Going forward, our team will keep pushing ARM64 server support in Hadoop, Spark, Flink, Hive, HBase, and other projects, and not only in big data: base libraries, databases, web, and cloud platforms are all in scope. If you run into any aarch64-related problem, join the Slack workspace armserverecosystem.slack.com to discuss.


