If you republish this post, please credit the original source and the author.

陈锐 RuiChen @kiwik

2019/06/27 16:39:02


Background

Today I started building Hive on aarch64. About 80% of Hive's code is Java and 11% is HiveQL. Hadoop, Spark, and HBase all ran into problems on the ARM platform, and sure enough, Hive is no exception.

Build environment

For how to set up the build environment, see my other post, "Building Hadoop on ARM64" (《在ARM64上编译Hadoop》). Only the basic details are listed here:

  • OS: CentOS 7.6
  • Arch: aarch64
  • Host: Huawei Cloud ARM public beta instance
  • Hive: git commit 967a1cc98beede8e6568ce750ebeb6e0d048b8ea (master branch, 2019-06)
  • Java: 1.8.0
  • Maven: 3.6.1
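
The details above can be collected on any machine with a few commands. The sketch below is only an illustration: `protoc_arch` is a hypothetical helper name, and it maps just the two architectures relevant to this post onto the suffix used in protoc classifiers such as linux-aarch_64.

```shell
#!/bin/sh
# Map a machine name (as printed by `uname -m`) to the architecture suffix
# used in protoc artifact classifiers such as linux-aarch_64.
# Only two architectures are handled; anything else is passed through
# unchanged and may not correspond to a real classifier.
protoc_arch() {
    case "$1" in
        aarch64|arm64) echo "aarch_64" ;;
        x86_64|amd64)  echo "x86_64" ;;
        *)             echo "$1" ;;
    esac
}

# Print the basics for the current machine (the "linux-" prefix assumes Linux).
echo "arch:       $(uname -m)"
echo "classifier: linux-$(protoc_arch "$(uname -m)")"

# Report tool versions only when the tools are installed.
if command -v java >/dev/null 2>&1; then
    java -version 2>&1 | head -n 1
fi
if command -v mvn >/dev/null 2>&1; then
    mvn -version 2>&1 | head -n 1
fi
```

On the build host described here, the classifier line should read linux-aarch_64, which is the platform string that shows up in the Maven log.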

Running the build

Hive has official developer documentation. It also builds with Maven, so I ran the Maven command directly:

mvn clean package -Pdist -DskipTests
[INFO] --- protoc-jar-maven-plugin:3.5.1.1:run (default) @ hive-standalone-metastore-common ---
[INFO] Resolving artifact: com.google.protobuf:protoc:2.5.0, platform: linux-aarch_64
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] Hive Storage API 2.7.0-SNAPSHOT .................... SUCCESS [  7.637 s]
[INFO] Hive 4.0.0-SNAPSHOT ................................ SUCCESS [  0.843 s]
[INFO] Hive Classifications 4.0.0-SNAPSHOT ................ SUCCESS [  0.964 s]
[INFO] Hive Shims Common 4.0.0-SNAPSHOT ................... SUCCESS [  5.064 s]
[INFO] Hive Shims 0.23 4.0.0-SNAPSHOT ..................... SUCCESS [  7.993 s]
[INFO] Hive Shims Scheduler 4.0.0-SNAPSHOT ................ SUCCESS [  3.849 s]
[INFO] Hive Shims 4.0.0-SNAPSHOT .......................... SUCCESS [  3.413 s]
[INFO] Hive Standalone Metastore 4.0.0-SNAPSHOT ........... SUCCESS [  2.148 s]
[INFO] Hive Standalone Metastore Common Code 4.0.0-SNAPSHOT FAILURE [  1.063 s]
[INFO] Hive Common 4.0.0-SNAPSHOT ......................... SKIPPED
[INFO] Hive Service RPC 4.0.0-SNAPSHOT .................... SKIPPED
[INFO] Hive Serde 4.0.0-SNAPSHOT .......................... SKIPPED
[INFO] Hive Metastore 4.0.0-SNAPSHOT ...................... SKIPPED
[INFO] Hive Vector-Code-Gen Utilities 4.0.0-SNAPSHOT ...... SKIPPED
[INFO] Hive Llap Common 4.0.0-SNAPSHOT .................... SKIPPED
[INFO] Hive Llap Client 4.0.0-SNAPSHOT .................... SKIPPED
[INFO] Hive Llap Tez 4.0.0-SNAPSHOT ....................... SKIPPED
[INFO] Hive Spark Remote Client 4.0.0-SNAPSHOT ............ SKIPPED
[INFO] Hive Metastore Server 4.0.0-SNAPSHOT ............... SKIPPED
[INFO] Hive Query Language 4.0.0-SNAPSHOT ................. SKIPPED
[INFO] Hive Llap Server 4.0.0-SNAPSHOT .................... SKIPPED
[INFO] Hive Service 4.0.0-SNAPSHOT ........................ SKIPPED
[INFO] Hive Accumulo Handler 4.0.0-SNAPSHOT ............... SKIPPED
[INFO] Hive JDBC 4.0.0-SNAPSHOT ........................... SKIPPED
[INFO] Hive Beeline 4.0.0-SNAPSHOT ........................ SKIPPED
[INFO] Hive CLI 4.0.0-SNAPSHOT ............................ SKIPPED
[INFO] Hive Contrib 4.0.0-SNAPSHOT ........................ SKIPPED
[INFO] Hive Druid Handler 4.0.0-SNAPSHOT .................. SKIPPED
[INFO] Hive HBase Handler 4.0.0-SNAPSHOT .................. SKIPPED
[INFO] Hive JDBC Handler 4.0.0-SNAPSHOT ................... SKIPPED
[INFO] Hive HCatalog 4.0.0-SNAPSHOT ....................... SKIPPED
[INFO] Hive HCatalog Core 4.0.0-SNAPSHOT .................. SKIPPED
[INFO] Hive HCatalog Pig Adapter 4.0.0-SNAPSHOT ........... SKIPPED
[INFO] Hive HCatalog Server Extensions 4.0.0-SNAPSHOT ..... SKIPPED
[INFO] Hive HCatalog Webhcat Java Client 4.0.0-SNAPSHOT ... SKIPPED
[INFO] Hive HCatalog Webhcat 4.0.0-SNAPSHOT ............... SKIPPED
[INFO] Hive HCatalog Streaming 4.0.0-SNAPSHOT ............. SKIPPED
[INFO] Hive HPL/SQL 4.0.0-SNAPSHOT ........................ SKIPPED
[INFO] Hive Streaming 4.0.0-SNAPSHOT ...................... SKIPPED
[INFO] Hive Llap External Client 4.0.0-SNAPSHOT ........... SKIPPED
[INFO] Hive Shims Aggregator 4.0.0-SNAPSHOT ............... SKIPPED
[INFO] Hive Kryo Registrator 4.0.0-SNAPSHOT ............... SKIPPED
[INFO] Hive TestUtils 4.0.0-SNAPSHOT ...................... SKIPPED
[INFO] Hive Kafka Storage Handler 4.0.0-SNAPSHOT .......... SKIPPED
[INFO] Hive Packaging 4.0.0-SNAPSHOT ...................... SKIPPED
[INFO] Hive Metastore Tools 4.0.0-SNAPSHOT ................ SKIPPED
[INFO] Hive Metastore Tools common libraries 4.0.0-SNAPSHOT SKIPPED
[INFO] Hive metastore benchmarks 4.0.0-SNAPSHOT ........... SKIPPED
[INFO] Hive Upgrade Acid 4.0.0-SNAPSHOT ................... SKIPPED
[INFO] Hive Pre Upgrade Acid 4.0.0-SNAPSHOT ............... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  34.174 s
[INFO] Finished at: 2019-06-26T19:45:47+08:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal com.github.os72:protoc-jar-maven-plugin:3.5.1.1:run (default) on project hive-standalone-metastore-common: Error resolving artifact: com.google.protobuf:protoc:2.5.0: Failure to find com.google.protobuf:protoc:exe:linux-aarch_64:2.5.0 in http://maven.aliyun.com/repository/central was cached in the local repository, resolution will not be reattempted until the update interval of alimaven has elapsed or updates are forced
[ERROR] 
[ERROR] Try downloading the file manually from the project website.
[ERROR] 
[ERROR] Then, install it using the command: 
[ERROR]     mvn install:install-file -DgroupId=com.google.protobuf -DartifactId=protoc -Dversion=2.5.0 -Dclassifier=linux-aarch_64 -Dpackaging=exe -Dfile=/path/to/file
[ERROR] 
[ERROR] Alternatively, if you host your own repository you can deploy the file there: 
[ERROR]     mvn deploy:deploy-file -DgroupId=com.google.protobuf -DartifactId=protoc -Dversion=2.5.0 -Dclassifier=linux-aarch_64 -Dpackaging=exe -Dfile=/path/to/file -Durl=[url] -DrepositoryId=[id]
[ERROR] 
[ERROR] 
[ERROR]   com.google.protobuf:protoc:exe:2.5.0
[ERROR] 
[ERROR] from the specified remote repositories:
[ERROR]   alimaven (http://maven.aliyun.com/repository/central, releases=true, snapshots=false),
[ERROR]   apache.snapshots (https://repository.apache.org/snapshots, releases=false, snapshots=true)
[ERROR] -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :hive-standalone-metastore-common
[root@arm-chenrui hive]# git log -1

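Sure enough, it is protoc again, the same problem I ran into with HBase. Hive uses protoc 2.5.0, which is too old: Google's official releases only started shipping protoc for the aarch64 platform in version 3.5.0.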
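
To see which file Maven was actually looking for, the coordinates in the error message can be turned into a repository path by hand. This is a minimal sketch based on the standard Maven repository layout; `artifact_path` is a hypothetical helper name, and the commented-out `curl` check assumes network access to Maven Central.

```shell
#!/bin/sh
# Build the repository-relative path of a Maven artifact from its coordinates,
# following the standard Maven repository layout:
#   group/with/slashes/artifact/version/artifact-version-classifier.extension
artifact_path() {
    group=$1; artifact=$2; version=$3; classifier=$4; ext=$5
    group_dir=$(printf '%s' "$group" | tr '.' '/')
    printf '%s/%s/%s/%s-%s-%s.%s\n' \
        "$group_dir" "$artifact" "$version" \
        "$artifact" "$version" "$classifier" "$ext"
}

# The artifact the failed build asked for.
artifact_path com.google.protobuf protoc 2.5.0 linux-aarch_64 exe

# Optional check against Maven Central (needs network access, so it is
# left commented out):
# path=$(artifact_path com.google.protobuf protoc 2.5.0 linux-aarch_64 exe)
# curl -sfI "https://repo1.maven.org/maven2/$path" >/dev/null \
#     && echo "published" || echo "not published"
```

Changing the version argument to 3.5.0 gives the path of the first release the paragraph above describes as supporting aarch64.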

Let's look at the pom.xml that triggers the failure, in the hive-standalone-metastore-common module:

<plugin>
    <groupId>com.github.os72</groupId>
    <artifactId>protoc-jar-maven-plugin</artifactId>
    <version>3.5.1.1</version>
    <executions>
      <execution>
        <phase>generate-sources</phase>
        <goals>
          <goal>run</goal>
        </goals>
        <configuration>
          <protocArtifact>com.google.protobuf:protoc:2.5.0</protocArtifact>
          <addSources>none</addSources>
          <inputDirectories>
            <include>${basedir}/src/main/protobuf/org/apache/hadoop/hive/metastore</include>
          </inputDirectories>
        </configuration>
      </execution>
    </executions>
</plugin>

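Hive differs from the other projects here. Instead of the usual Maven Protobuf plugin, org.xolstice.maven.plugins:protobuf-maven-plugin, it uses a plugin from com.github.os72. I looked up that group in the Maven repository and found that it also publishes its own set of Protobuf artifacts, and, surprisingly, its protoc 2.6.1-build3 supports aarch64. That gave me an idea: could the com.github.os72 protoc stand in for the com.google.protobuf one? I decided to try.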

- <protocArtifact>com.google.protobuf:protoc:2.5.0</protocArtifact>
+ <protocArtifact>com.github.os72:protoc:2.6.1-build3</protocArtifact>
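
The same one-line change can be scripted. The following is only a sketch: `swap_protoc_artifact` is a hypothetical helper that filters a pom.xml from stdin to stdout, so the original file stays untouched until the result has been reviewed.

```shell
#!/bin/sh
# Rewrite the protocArtifact coordinate in pom.xml content read from stdin.
# '#' is the sed delimiter, and the dots in the old coordinate are escaped
# so that they match literal dots only.
swap_protoc_artifact() {
    sed 's#com\.google\.protobuf:protoc:2\.5\.0#com.github.os72:protoc:2.6.1-build3#'
}

# Demonstration on a single line.
echo '<protocArtifact>com.google.protobuf:protoc:2.5.0</protocArtifact>' | swap_protoc_artifact
```

For the real module, something like `swap_protoc_artifact < pom.xml > pom.xml.new` followed by a `diff` keeps the edit easy to inspect before rebuilding.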

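After swapping the protoc artifact as shown above, I restarted the build, and it failed again. This time the cause is that protoc 2.6.1-build3 depends on GLIBCXX_3.4.21 and CXXABI_1.3.9, whereas CentOS 7.6 by default only supports up to GLIBCXX_3.4.19 and CXXABI_1.3.7. It is the same library dependency problem I hit when running the Spark tests; for details see my other post, "Building and Testing Spark on Arm64" (《在 Arm64 上编译测试 Spark》). In my view this is a CentOS problem, not an ARM problem. I will try building on Ubuntu 16.04 next.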

[INFO] Protoc command: /tmp/protoc5365700188245409184.exe
[INFO] Input directories:
[INFO]     /root/gopath/src/github.com/apache/hive/standalone-metastore/metastore-common/src/main/protobuf/org/apache/hadoop/hive/metastore
[INFO] Output targets:
[INFO]     java: /root/gopath/src/github.com/apache/hive/standalone-metastore/metastore-common/target/generated-sources (add: none, clean: false, plugin: null, outputOptions: null)
[INFO] /root/gopath/src/github.com/apache/hive/standalone-metastore/metastore-common/target/generated-sources does not exist. Creating...
[INFO]     Processing (java): metastore.proto
protoc-jar: executing: [/tmp/protoc5365700188245409184.exe, -I/root/gopath/src/github.com/apache/hive/standalone-metastore/metastore-common/src/main/protobuf/org/apache/hadoop/hive/metastore, --java_out=/root/gopath/src/github.com/apache/hive/standalone-metastore/metastore-common/target/generated-sources, /root/gopath/src/github.com/apache/hive/standalone-metastore/metastore-common/src/main/protobuf/org/apache/hadoop/hive/metastore/metastore.proto]
/tmp/protoc5365700188245409184.exe: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by /tmp/protoc5365700188245409184.exe)
/tmp/protoc5365700188245409184.exe: /lib64/libstdc++.so.6: version `CXXABI_1.3.8' not found (required by /tmp/protoc5365700188245409184.exe)
/tmp/protoc5365700188245409184.exe: /lib64/libstdc++.so.6: version `CXXABI_1.3.9' not found (required by /tmp/protoc5365700188245409184.exe)
/tmp/protoc5365700188245409184.exe: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by /tmp/protoc5365700188245409184.exe)
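
Whether a system can run such a binary can be checked ahead of time by comparing the version tags it needs with the ones the installed libstdc++ provides. This is a rough sketch: the two function names are hypothetical, the library path is the one from the error output, and it assumes `strings` (from binutils) and a `grep` that supports `-o`.

```shell
#!/bin/sh
# List the GLIBCXX_* and CXXABI_* version tags found in text on stdin.
list_cxx_versions() {
    grep -oE '(GLIBCXX|CXXABI)_[0-9]+(\.[0-9]+)*' | sort -u
}

# Print every required tag (given as arguments) that is absent from the
# available tags (read from stdin).
missing_cxx_versions() {
    available=$(list_cxx_versions)
    for tag in "$@"; do
        printf '%s\n' "$available" | grep -qxF "$tag" || echo "$tag"
    done
}

# Check the system library against the four tags named in the errors.
lib=/lib64/libstdc++.so.6
if [ -r "$lib" ] && command -v strings >/dev/null 2>&1; then
    strings "$lib" | missing_cxx_versions \
        GLIBCXX_3.4.20 GLIBCXX_3.4.21 CXXABI_1.3.8 CXXABI_1.3.9
fi
```

An empty result means all four tags are present; on the CentOS 7.6 host described here, all four would be expected to be listed as missing, matching the errors.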

Problems

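Overall, Hive runs into much the same problems as Hadoop, Spark, and HBase: they involve native libraries such as protoc. Complex software systems are also often written in a mix of programming languages, which makes the problems harder. From our initial conversations with the Apache community, even a project's PMC cannot say for certain whether the project runs on ARM. That is understandable when you think about it: ask a few similar questions and probably no one could answer them either.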

1. Does Hadoop run correctly on Java 12?
2. Does Hadoop run correctly on Windows Server 2019?

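Integrating a complex software system from the bottom up, from hardware to software, is a systems engineering effort. It involves the architecture, the operating system, base language libraries, third-party dependencies, and a large matrix of version compatibility. All of that is too complex to track by hand, so let's leave it to automated ARM CI.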



Published

27 June 2019

Category

ARM
