在 Arm64 上编译 Hive regexisart
如需转载,请标明原文出处以及作者
陈锐 RuiChen @kiwik
2019/06/27 16:39:02
背景
今天开始在aarch64
上编译 Hive,Hive 80%的代码由
Java 组成,11%是 HiveQL 。Hadoop、Spark、HBase 在 ARM 平台上都遇到了问题,果然 Hive
也不能幸免。
编译环境
编译环境的配置请参考我的另一篇博客《在ARM64上编译Hadoop》, 在这里仅列出基本的编译环境信息:
- OS: CentOS 7.6
- Arch: aarch64
- Host: 华为云 ARM 公测实例
- Hive: git commit
967a1cc98beede8e6568ce750ebeb6e0d048b8ea
(2019-06主干) - Java: 1.8.0
- Maven: 3.6.1
执行编译
Hive 的官方开发者文档在这里, 编译也使用 Maven,直接执行 Maven 命令如下:
mvn clean package -Pdist -DskipTests
[INFO] --- protoc-jar-maven-plugin:3.5.1.1:run (default) @ hive-standalone-metastore-common ---
[INFO] Resolving artifact: com.google.protobuf:protoc:2.5.0, platform: linux-aarch_64
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Hive Storage API 2.7.0-SNAPSHOT .................... SUCCESS [ 7.637 s]
[INFO] Hive 4.0.0-SNAPSHOT ................................ SUCCESS [ 0.843 s]
[INFO] Hive Classifications 4.0.0-SNAPSHOT ................ SUCCESS [ 0.964 s]
[INFO] Hive Shims Common 4.0.0-SNAPSHOT ................... SUCCESS [ 5.064 s]
[INFO] Hive Shims 0.23 4.0.0-SNAPSHOT ..................... SUCCESS [ 7.993 s]
[INFO] Hive Shims Scheduler 4.0.0-SNAPSHOT ................ SUCCESS [ 3.849 s]
[INFO] Hive Shims 4.0.0-SNAPSHOT .......................... SUCCESS [ 3.413 s]
[INFO] Hive Standalone Metastore 4.0.0-SNAPSHOT ........... SUCCESS [ 2.148 s]
[INFO] Hive Standalone Metastore Common Code 4.0.0-SNAPSHOT FAILURE [ 1.063 s]
[INFO] Hive Common 4.0.0-SNAPSHOT ......................... SKIPPED
[INFO] Hive Service RPC 4.0.0-SNAPSHOT .................... SKIPPED
[INFO] Hive Serde 4.0.0-SNAPSHOT .......................... SKIPPED
[INFO] Hive Metastore 4.0.0-SNAPSHOT ...................... SKIPPED
[INFO] Hive Vector-Code-Gen Utilities 4.0.0-SNAPSHOT ...... SKIPPED
[INFO] Hive Llap Common 4.0.0-SNAPSHOT .................... SKIPPED
[INFO] Hive Llap Client 4.0.0-SNAPSHOT .................... SKIPPED
[INFO] Hive Llap Tez 4.0.0-SNAPSHOT ....................... SKIPPED
[INFO] Hive Spark Remote Client 4.0.0-SNAPSHOT ............ SKIPPED
[INFO] Hive Metastore Server 4.0.0-SNAPSHOT ............... SKIPPED
[INFO] Hive Query Language 4.0.0-SNAPSHOT ................. SKIPPED
[INFO] Hive Llap Server 4.0.0-SNAPSHOT .................... SKIPPED
[INFO] Hive Service 4.0.0-SNAPSHOT ........................ SKIPPED
[INFO] Hive Accumulo Handler 4.0.0-SNAPSHOT ............... SKIPPED
[INFO] Hive JDBC 4.0.0-SNAPSHOT ........................... SKIPPED
[INFO] Hive Beeline 4.0.0-SNAPSHOT ........................ SKIPPED
[INFO] Hive CLI 4.0.0-SNAPSHOT ............................ SKIPPED
[INFO] Hive Contrib 4.0.0-SNAPSHOT ........................ SKIPPED
[INFO] Hive Druid Handler 4.0.0-SNAPSHOT .................. SKIPPED
[INFO] Hive HBase Handler 4.0.0-SNAPSHOT .................. SKIPPED
[INFO] Hive JDBC Handler 4.0.0-SNAPSHOT ................... SKIPPED
[INFO] Hive HCatalog 4.0.0-SNAPSHOT ....................... SKIPPED
[INFO] Hive HCatalog Core 4.0.0-SNAPSHOT .................. SKIPPED
[INFO] Hive HCatalog Pig Adapter 4.0.0-SNAPSHOT ........... SKIPPED
[INFO] Hive HCatalog Server Extensions 4.0.0-SNAPSHOT ..... SKIPPED
[INFO] Hive HCatalog Webhcat Java Client 4.0.0-SNAPSHOT ... SKIPPED
[INFO] Hive HCatalog Webhcat 4.0.0-SNAPSHOT ............... SKIPPED
[INFO] Hive HCatalog Streaming 4.0.0-SNAPSHOT ............. SKIPPED
[INFO] Hive HPL/SQL 4.0.0-SNAPSHOT ........................ SKIPPED
[INFO] Hive Streaming 4.0.0-SNAPSHOT ...................... SKIPPED
[INFO] Hive Llap External Client 4.0.0-SNAPSHOT ........... SKIPPED
[INFO] Hive Shims Aggregator 4.0.0-SNAPSHOT ............... SKIPPED
[INFO] Hive Kryo Registrator 4.0.0-SNAPSHOT ............... SKIPPED
[INFO] Hive TestUtils 4.0.0-SNAPSHOT ...................... SKIPPED
[INFO] Hive Kafka Storage Handler 4.0.0-SNAPSHOT .......... SKIPPED
[INFO] Hive Packaging 4.0.0-SNAPSHOT ...................... SKIPPED
[INFO] Hive Metastore Tools 4.0.0-SNAPSHOT ................ SKIPPED
[INFO] Hive Metastore Tools common libraries 4.0.0-SNAPSHOT SKIPPED
[INFO] Hive metastore benchmarks 4.0.0-SNAPSHOT ........... SKIPPED
[INFO] Hive Upgrade Acid 4.0.0-SNAPSHOT ................... SKIPPED
[INFO] Hive Pre Upgrade Acid 4.0.0-SNAPSHOT ............... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 34.174 s
[INFO] Finished at: 2019-06-26T19:45:47+08:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal com.github.os72:protoc-jar-maven-plugin:3.5.1.1:run (default) on project hive-standalone-metastore-common: Error resolving artifact: com.google.protobuf:protoc:2.5.0: Failure to find com.google.protobuf:protoc:exe:linux-aarch_64:2.5.0 in http://maven.aliyun.com/repository/central was cached in the local repository, resolution will not be reattempted until the update interval of alimaven has elapsed or updates are forced
[ERROR]
[ERROR] Try downloading the file manually from the project website.
[ERROR]
[ERROR] Then, install it using the command:
[ERROR] mvn install:install-file -DgroupId=com.google.protobuf -DartifactId=protoc -Dversion=2.5.0 -Dclassifier=linux-aarch_64 -Dpackaging=exe -Dfile=/path/to/file
[ERROR]
[ERROR] Alternatively, if you host your own repository you can deploy the file there:
[ERROR] mvn deploy:deploy-file -DgroupId=com.google.protobuf -DartifactId=protoc -Dversion=2.5.0 -Dclassifier=linux-aarch_64 -Dpackaging=exe -Dfile=/path/to/file -Durl=[url] -DrepositoryId=[id]
[ERROR]
[ERROR]
[ERROR] com.google.protobuf:protoc:exe:2.5.0
[ERROR]
[ERROR] from the specified remote repositories:
[ERROR] alimaven (http://maven.aliyun.com/repository/central, releases=true, snapshots=false),
[ERROR] apache.snapshots (https://repository.apache.org/snapshots, releases=false, snapshots=true)
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn <goals> -rf :hive-standalone-metastore-common
[root@arm-chenrui hive]# git log -1
果然又是protoc
的问题,与我在 HBase 中遇到的问题
一样,Hive使用的2.5.0
版本过低,Google官方在3.5.0
版本才开始支持aarch64
平台的protoc
。
我们来看一下出问题的pom.xml
文件,在hive-standalone-metastore-common
模块里:
<plugin>
<groupId>com.github.os72</groupId>
<artifactId>protoc-jar-maven-plugin</artifactId>
<version>3.5.1.1</version>
<executions>
<execution>
<phase>generate-sources</phase>
<goals>
<goal>run</goal>
</goals>
<configuration>
<protocArtifact>com.google.protobuf:protoc:2.5.0</protocArtifact>
<addSources>none</addSources>
<inputDirectories>
<include>${basedir}/src/main/protobuf/org/apache/hadoop/hive/metastore</include>
</inputDirectories>
</configuration>
</execution>
</executions>
</plugin>
Hive 这里跟其他项目有些不同,没有使用官方的 Maven Protobuf 插件org.xolstice.maven.plugins:protobuf-maven-plugin
,
而是用了com.github.os72
下的插件,在 Maven 仓库中查了一下这个 group,
发现它也提供了一套 Protobuf 的插件,而且竟然在 protoc 的 2.6.1-build3
版本中支持了aarch64
,这给我一些启发,是否可以用com.github.os72
代替com.google.protobuf
?
我决定试一下。
<protocArtifact>com.github.os72:protoc:2.6.1-build3</protocArtifact>
<protocArtifact>com.google.protobuf:protoc:2.5.0</protocArtifact>
按照上述方式替换了protoc
库之后,重启编译,还是失败了,具体原因是 protoc 的 2.6.1-build3
依赖于 GLIBCXX_3.4.21
和 CXXABI_1.3.9
,但是在我的 CentOS 7.6 上系统默认只支持到 GLIBCXX_3.4.19
和 CXXABI_1.3.7
,又是这个依赖库的问题,我在执行 Spark 测试时也遇到了,详情见我的另一篇
博客《在 Arm64 上编译测试 Spark》。
个人认为这是 CentOS 的问题,不是 ARM 的问题,后续将尝试在Ubuntu 16.04
上编译。
[INFO] Protoc command: /tmp/protoc5365700188245409184.exe
[INFO] Input directories:
[INFO] /root/gopath/src/github.com/apache/hive/standalone-metastore/metastore-common/src/main/protobuf/org/apache/hadoop/hive/metastore
[INFO] Output targets:
[INFO] java: /root/gopath/src/github.com/apache/hive/standalone-metastore/metastore-common/target/generated-sources (add: none, clean: false, plugin: null, outputOptions: null)
[INFO] /root/gopath/src/github.com/apache/hive/standalone-metastore/metastore-common/target/generated-sources does not exist. Creating...
[INFO] Processing (java): metastore.proto
protoc-jar: executing: [/tmp/protoc5365700188245409184.exe, -I/root/gopath/src/github.com/apache/hive/standalone-metastore/metastore-common/src/main/protobuf/org/apache/hadoop/hive/metastore, --java_out=/root/gopath/src/github.com/apache/hive/standalone-metastore/metastore-common/target/generated-sources, /root/gopath/src/github.com/apache/hive/standalone-metastore/metastore-common/src/main/protobuf/org/apache/hadoop/hive/metastore/metastore.proto]
/tmp/protoc5365700188245409184.exe: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by /tmp/protoc5365700188245409184.exe)
/tmp/protoc5365700188245409184.exe: /lib64/libstdc++.so.6: version `CXXABI_1.3.8' not found (required by /tmp/protoc5365700188245409184.exe)
/tmp/protoc5365700188245409184.exe: /lib64/libstdc++.so.6: version `CXXABI_1.3.9' not found (required by /tmp/protoc5365700188245409184.exe)
/tmp/protoc5365700188245409184.exe: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by /tmp/protoc5365700188245409184.exe)
问题
整体看来 Hive 与 Hadoop、Spark 和 HBase 遇到的问题类似,涉及到protoc
等一些本地库,
而且复杂的软件系统经常是由多种编程语言混合而成,这更增加了问题的复杂性。从我们与 Apache
社区的初步沟通发现,即使是社区的 PMC 也不能确定他们的项目是否能在 ARM 环境上运行,
想想这也正常,如果我们换几个类似的问题,估计也是没有人能够回答。
1. Hadoop 能否在 Java 12 上正常运行?
2. Hadoop 能否在 Windows Server 2019 上正常运行?
把复杂的软件系统从下到上,从硬件到软件的集成起来是一个系统工程,涉及到体系架构、操作系统、
基础语言库、第三方依赖库,以及众多的版本配套关系,这一切太复杂了,我们还是留给自动化的
ARM CI
来解决吧。