(1) Building Hadoop 2.7.1 from source | http://aperise.iteye.com/blog/2246856 |
(2) Preparing to install Hadoop 2.7.1 | http://aperise.iteye.com/blog/2253544 |
(3) Cluster installation that works for both 1.x and 2.x | http://aperise.iteye.com/blog/2245547 |
(4) Preparing to install HBase | http://aperise.iteye.com/blog/2254451 |
(5) Installing HBase | http://aperise.iteye.com/blog/2254460 |
(6) Installing Snappy | http://aperise.iteye.com/blog/2254487 |
(7) HBase performance tuning | http://aperise.iteye.com/blog/2282670 |
(8) Benchmarking HBase performance with Yahoo YCSB | http://aperise.iteye.com/blog/2248863 |
(9) spring-hadoop in practice | http://aperise.iteye.com/blog/2254491 |
(10) Installing a ZooKeeper-based Hadoop HA cluster | http://aperise.iteye.com/blog/2305809 |
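LZO, Snappy, and gzip are three compression formats that Hadoop supports. Snappy is the one most often recommended online at the moment, so this post explains how to install it.
1. Install the Snappy compression library on Linux
#1. Download and unpack the Snappy source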
cd /root
wget https://github.com/google/snappy/releases/download/1.1.3/snappy-1.1.3.tar.gz
tar -zxvf snappy-1.1.3.tar.gz
#2. Build and install on Linux
cd /root/snappy-1.1.3
./configure
make
make install
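#3. By default the library is installed under /usr/local/lib; after a successful install the libsnappy files appear there.
The installation method recommended in Hadoop's BUILDING.txt on GitHub (https://github.com/apache/hadoop/blob/trunk/BUILDING.txt) is: sudo apt-get install snappy libsnappy-dev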
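Before moving on to the Hadoop build, it can help to confirm that the library files really landed in the install directory. The sketch below is only an illustration: `has_snappy_lib` is a hypothetical helper name, and it assumes the library files are named `libsnappy.so*` or `libsnappy.a`, which is the usual naming for an autotools build.

```shell
# Hypothetical helper (not part of Snappy or Hadoop): succeeds if the given
# directory contains at least one libsnappy shared or static library file.
# $1 = directory to inspect; defaults to /usr/local/lib as used in this post.
has_snappy_lib() {
  local dir="${1:-/usr/local/lib}"
  local f
  for f in "$dir"/libsnappy.so* "$dir"/libsnappy.a; do
    # An unmatched glob stays literal, so -e simply fails for it.
    [ -e "$f" ] && return 0
  done
  return 1
}
```

For example, `has_snappy_lib /usr/local/lib && echo "libsnappy found"` prints the message only when a matching file exists.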
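2. Build the hadoop-common module from the Hadoop 2.7.1 source
Recent Hadoop releases already include the codec classes for these compression libraries in the hadoop-common module, so there is no need to download and package a codec from elsewhere.
If you have already built Hadoop from source, you can skip this step.
The binary package from the official site does not support Snappy compression, so you have to rebuild Hadoop from source yourself. The build requires the Snappy library to be installed on the Linux machine first, which step 1 took care of.
For how to build the Hadoop source, see http://aperise.iteye.com/blog/2246856
1. Download the Hadoop source archive hadoop-2.7.1-src.tar.gz into /root and unpack it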
cd /root
#Download the source archive from the Apache archive site
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.1/hadoop-2.7.1-src.tar.gz
tar -zxvf hadoop-2.7.1-src.tar.gz
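2. Prepare the environment required to build Hadoop; see http://aperise.iteye.com/blog/2246856
3. Build and package hadoop-common (rather than the whole project) to get Snappy compression support (to build the whole project, see http://aperise.iteye.com/blog/2246856)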
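cd /root/hadoop-2.7.1-src
export MAVEN_OPTS="-Xms256m -Xmx512m"
#-rf :hadoop-common resumes the Maven reactor at hadoop-common; -X enables Maven debug output
mvn package -Pdist,native -DskipTests -Dtar -rf :hadoop-common -Drequire.snappy -X
#To build the whole source tree and specify the Snappy library location explicitly, use:
#mvn package -Pdist,native,docs,src -DskipTests -Drequire.snappy -Dsnappy.lib=/usr/local/lib
4. When the build finishes, the native library files are under /root/hadoop-2.7.1-src/hadoop-dist/target/hadoop-2.7.1/lib/native.
5. When the build finishes, the rebuilt jars, including hadoop-common-2.7.1.jar, are under /root/hadoop-2.7.1-src/hadoop-dist/target/hadoop-2.7.1/share/hadoop/common.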
3. Add Snappy support to Hadoop
#1. Copy the native Snappy support files built in step 2 into Hadoop
#My Hadoop installation is at /home/hadoop/hadoop-2.7.1
cp -r /root/hadoop-2.7.1-src/hadoop-dist/target/hadoop-2.7.1/lib/native/* /home/hadoop/hadoop-2.7.1/lib/native/
cp /usr/local/lib/* /home/hadoop/hadoop-2.7.1/lib/native/
#2. Copy the hadoop-common-2.7.1.jar built in step 2 into Hadoop
cp -r /root/hadoop-2.7.1-src/hadoop-dist/target/hadoop-2.7.1/share/hadoop/common/* /home/hadoop/hadoop-2.7.1/share/hadoop/common/
#3. Edit Hadoop's configuration file /home/hadoop/hadoop-2.7.1/etc/hadoop/core-site.xml and add the following:
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.Lz4Codec,org.apache.hadoop.io.compress.SnappyCodec</value>
  <description>A comma-separated list of the compression codec classes that can be used for compression/decompression. In addition to any classes specified with this property (which take precedence), codec classes on the classpath are discovered using a Java ServiceLoader.</description>
</property>
#4. Edit /home/hadoop/hadoop-2.7.1/etc/hadoop/mapred-site.xml and add the following:
<property>
  <name>mapreduce.output.fileoutputformat.compress</name>
  <value>true</value>
  <description>Should the job outputs be compressed?</description>
</property>
<property>
  <name>mapreduce.output.fileoutputformat.compress.type</name>
  <value>RECORD</value>
  <description>If the job outputs are to be compressed as SequenceFiles, how should they be compressed? Should be one of NONE, RECORD or BLOCK.</description>
</property>
<property>
  <name>mapreduce.output.fileoutputformat.compress.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
  <description>If the job outputs are compressed, how should they be compressed?</description>
</property>
<property>
  <name>mapreduce.map.output.compress</name>
  <value>true</value>
  <description>Should the outputs of the maps be compressed before being sent across the network. Uses SequenceFile compression.</description>
</property>
<property>
  <name>mapreduce.map.output.compress.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
  <description>If the map outputs are compressed, how should they be compressed?</description>
</property>
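After editing the configuration, a quick text check can catch the case where the codec entry was pasted into the wrong file or dropped. This is a rough sketch, not a validator: `lists_snappy_codec` is a hypothetical helper that only looks for the class name as plain text, so it does not verify that the XML is well-formed or that the name sits inside the `io.compression.codecs` property.

```shell
# Hypothetical helper: succeeds if the given configuration file mentions the
# SnappyCodec class name anywhere in its text.
# $1 = path to a Hadoop configuration file such as core-site.xml.
lists_snappy_codec() {
  # -F treats the class name as a fixed string, so the dots are literal.
  grep -Fq 'org.apache.hadoop.io.compress.SnappyCodec' "$1"
}
```

With the paths used in this post, `lists_snappy_codec /home/hadoop/hadoop-2.7.1/etc/hadoop/core-site.xml || echo "SnappyCodec not configured"` would flag a missing entry.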
At this point Hadoop supports the Snappy compression algorithm. HBase still needs to be configured; read on.
4. Add Snappy support to HBase
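One way to confirm that Hadoop can actually load the native Snappy library is the `hadoop checknative -a` command that ships with Hadoop 2.x. The helper below is a small sketch for scripting that check; `snappy_native_ok` is a hypothetical name, and the assumption that the report contains a line of the form `snappy:  true <path>` should be verified against the output on your own cluster.

```shell
# Hypothetical helper: reads `hadoop checknative` output on stdin and succeeds
# only if the snappy line reports "true".
# Assumption: the report contains a line of the form "snappy:  true <path>".
snappy_native_ok() {
  grep -Eq '^snappy:[[:space:]]+true'
}

# Usage (on a node where Hadoop is installed; the path is the one used in this post):
#   /home/hadoop/hadoop-2.7.1/bin/hadoop checknative -a 2>&1 | snappy_native_ok \
#     && echo "native snappy is loadable"
```

Reading from stdin keeps the helper independent of where Hadoop is installed, so the same function works against saved output as well.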
#1. Create the native library directory in HBase
mkdir -p /home/hadoop/hbase-1.2.1/lib/native/Linux-amd64-64
#2. Copy all of Hadoop's native libraries into HBase
cp -r /home/hadoop/hadoop-2.7.1/lib/native/* /home/hadoop/hbase-1.2.1/lib/native/Linux-amd64-64/
HBase now has Snappy support.
5. Test Snappy in HBase
cd /home/hadoop/hbase-1.2.1/bin/
./hbase org.apache.hadoop.hbase.util.CompressionTest hdfs://hadoop-ha-cluster/hbase/data/default/signal/3a194dcd996fd03c0c26bf3d175caaec/info/0c7f62f10a4c4e548c5ff1c583b0bdfa snappy
The path hdfs://hadoop-ha-cluster/hbase/data/default/signal/3a194dcd996fd03c0c26bf3d175caaec/info/0c7f62f10a4c4e548c5ff1c583b0bdfa above is a file that exists in HDFS on my Hadoop cluster.
6. How to view a Snappy-compressed file in HDFS
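Once the compression test passes, Snappy can be enabled per column family when a table is created. The sketch below only generates the HBase shell statement; `snappy_create_stmt`, the table name, and the column family name are all illustrative and not taken from the original setup.

```shell
# Hypothetical helper: prints an HBase shell statement that creates a table
# whose single column family is stored with Snappy compression.
# $1 = table name, $2 = column family name (both are illustrative).
snappy_create_stmt() {
  printf "create '%s', {NAME => '%s', COMPRESSION => 'SNAPPY'}\n" "$1" "$2"
}

# Usage (pipes the statement into the HBase shell installed in this post):
#   snappy_create_stmt t_snappy info | /home/hadoop/hbase-1.2.1/bin/hbase shell
```

Generating the statement separately makes it easy to review what will be run before it reaches the cluster.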
hdfs dfs -text /aaa.snappy