Hadoop Configuration

Posted by SiyuanWang on May 20, 2018

Official Hadoop website

Hadoop standalone / pseudo-distributed configuration

Hadoop distributed configuration

Edit /etc/hosts

    Give the node a name:
    127.0.0.1 HadoopMaster
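
To confirm which address a name resolves to after editing the hosts file, a small lookup helper can be handy. The sketch below is only an illustration: `host_addr` is a hypothetical function name, and it assumes the usual hosts layout of one address followed by one or more names per line. Note that a `127.0.0.1` mapping like the one above is only reachable from the machine itself, so a multi-node cluster would presumably need the node's real network address here.

```shell
# Print the first address that hosts-format file $1 maps to hostname $2.
# host_addr is a hypothetical helper, not part of Hadoop.
host_addr() {
    awk -v name="$2" '
        /^[ \t]*#/ { next }   # skip comment lines
        { for (i = 2; i <= NF; i++) if ($i == name) { print $1; exit } }
    ' "$1"
}
```

For example, `host_addr /etc/hosts HadoopMaster` should print `127.0.0.1` with the entry above.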

Install ssh

    $ sudo apt-get install ssh
    $ sudo apt-get install pdsh

Install the JDK, then add to ~/.bashrc:

    export JAVA_HOME=/usr/java/jdk_xxxx

etc/hadoop/core-site.xml:

Add:

    <configuration>  
    <property>  
            <name>fs.defaultFS</name>  
            <value>hdfs://HadoopMaster:9000</value>  
    </property>  
    <property>  
            <name>hadoop.tmp.dir</name>  
            <value>/home/hadoop/hadoopdata</value>  
            <description>A base for other temporary directories.</description>
    </property>  
    </configuration> 
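
A typo in a property name or value is easy to miss in these XML files, so it can help to read a value back after editing. The following is a minimal sketch, not a real XML parser: `get_prop` is a hypothetical helper, and it assumes each `<name>` and `<value>` element sits on its own line, as in the file above.

```shell
# Print the <value> that follows <name>$2</name> in Hadoop-style XML file $1.
# get_prop is a hypothetical helper; it relies on one element per line.
get_prop() {
    awk -v key="$2" '
        index($0, "<name>" key "</name>") { found = 1; next }
        found && /<value>/ {
            sub(/.*<value>/, "")      # drop everything up to the opening tag
            sub(/<\/value>.*/, "")    # drop the closing tag and what follows
            print
            exit
        }
    ' "$1"
}
```

Run from the Hadoop directory, `get_prop etc/hadoop/core-site.xml fs.defaultFS` should print `hdfs://HadoopMaster:9000` for the configuration above.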

etc/hadoop/hdfs-site.xml:

Add:

    <configuration>  
            <property>  
                    <name>dfs.replication</name>  
                    <value>1</value>  
            </property>  
    </configuration>  

Passwordless ssh login

    $ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
    $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys  
    $ chmod 0600 ~/.ssh/authorized_keys  
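
Running the `cat ... >>` step more than once appends the same key again each time. A minimal sketch of an idempotent variant is shown below; `authorize_key` is a hypothetical helper name, and it assumes the public key file holds a single line, which is what `ssh-keygen` normally writes.

```shell
# Append public key file $1 to authorized_keys file $2 unless it is already
# listed, then restrict the file to its owner (mode 0600, as above).
# authorize_key is a hypothetical helper, not an OpenSSH command.
authorize_key() {
    pub=$1
    auth=$2
    touch "$auth"
    # -x: match the whole line, -F: treat the key as a fixed string
    grep -qxF -- "$(cat "$pub")" "$auth" || cat "$pub" >> "$auth"
    chmod 0600 "$auth"
}
```

Usage would look like `authorize_key ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys`, after which `ssh localhost` should log in without a password prompt.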

etc/hadoop/workers
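
In Hadoop 3 this file lists the worker hostnames, one per line. The post does not show its contents, so the sketch below is only an illustration: `write_workers` is a hypothetical helper, and which hostnames to list is an assumption (for the single-node setup here, `HadoopMaster` alone would be the natural choice).

```shell
# Write one worker hostname per line into the file named by $1.
# write_workers is a hypothetical helper; the hostnames are up to you.
write_workers() {
    out=$1
    shift
    printf '%s\n' "$@" > "$out"
}
```

For example, `write_workers etc/hadoop/workers HadoopMaster` leaves a one-line file naming this node as its own worker.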

HDFS operations

    # Format the NameNode:
    $ bin/hdfs namenode -format
    # Start the DFS daemons:
    $ sbin/start-dfs.sh
    # Create directories in HDFS:
    $ bin/hdfs dfs -mkdir /user
    $ bin/hdfs dfs -mkdir /user/<username>
    $ bin/hdfs dfs -mkdir input
    # Copy files into HDFS:
    $ bin/hdfs dfs -put etc/hadoop/*.xml input
    # Run the example job:
    $ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.0.jar grep input output 'dfs[a-z.]+'
    # Copy the output out of HDFS:
    $ bin/hdfs dfs -get output output
    # Stop the DFS daemons:
    $ sbin/stop-dfs.sh
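
Formatting is meant to happen once; formatting again discards the existing NameNode metadata. A guard like the sketch below can avoid doing it by accident. It assumes `dfs.namenode.name.dir` has not been overridden, so the metadata lives under `dfs/name` inside `hadoop.tmp.dir`, and that a formatted NameNode leaves a `current/VERSION` file there. `needs_format` is a hypothetical helper name.

```shell
# Succeed (exit 0) when no formatted NameNode metadata exists under $1,
# where $1 is the hadoop.tmp.dir value from core-site.xml.
# Assumes the default layout: <hadoop.tmp.dir>/dfs/name/current/VERSION.
needs_format() {
    [ ! -f "$1/dfs/name/current/VERSION" ]
}
```

With the configuration above, this could be used as `needs_format /home/hadoop/hadoopdata && bin/hdfs namenode -format`.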

etc/hadoop/mapred-site.xml:

Add:

    <configuration>  
    <property>  
            <name>mapreduce.framework.name</name>  
            <value>yarn</value>  
    </property>  
    <property>  
            <name>mapreduce.application.classpath</name>  
            <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>  
    </property>  
    </configuration>  

etc/hadoop/yarn-site.xml:

Add:

    <configuration>  
            <property>  
                    <name>yarn.nodemanager.aux-services</name>  
                    <value>mapreduce_shuffle</value>  
            </property>  
            <property>  
                    <name>yarn.nodemanager.env-whitelist</name>  
                    <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>  
            </property>  
    </configuration> 

Start YARN

    $ sbin/start-yarn.sh
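
After the start scripts return, the JDK's `jps` tool lists the running Java processes, which is a quick way to see whether every daemon came up. The sketch below reads `jps` output on standard input and reports which expected daemons are absent; `missing_daemons` is a hypothetical helper, and the daemon names in the usage line are the ones a single-node HDFS plus YARN setup normally runs.

```shell
# Read `jps` output ("<pid> <name>" per line) on stdin and print each
# expected daemon, given as arguments, that does not appear in it.
# missing_daemons is a hypothetical helper, not a Hadoop command.
missing_daemons() {
    seen=$(awk '{ print $2 }')
    for d in "$@"; do
        # -x matches whole lines, so NameNode does not match SecondaryNameNode
        printf '%s\n' "$seen" | grep -qx -- "$d" || printf '%s\n' "$d"
    done
}
```

A possible check is `jps | missing_daemons NameNode DataNode SecondaryNameNode ResourceManager NodeManager`; empty output means all five were found, and any name printed points at the log file to read first.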

Stop YARN

    $ sbin/stop-yarn.sh

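Troubleshooting

  • When something goes wrong, check the log files first
  • Search the error message on www.bing.com
  • The JDK path in the Hadoop configuration files must not contain spaces
  • Simply turn off the firewall
  • Create a dedicated hadoop user
  • Use chown and chmod freely; many failures are permission problems
  • Make sure the hostnames in the hosts file contain no special characters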
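
Two of the points above, the JDK path and the hostname, can be checked mechanically before starting anything. The sketch below is one possible way to do it. Both function names are hypothetical, and treating "special characters" as anything other than letters, digits, hyphens, and dots is an assumption; that rule also rejects underscores, which are a common cause of hostname trouble.

```shell
# Succeed when $1 contains no whitespace (for example a JAVA_HOME path).
no_spaces() {
    case $1 in
        *[[:space:]]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Succeed when $1 is non-empty and uses only letters, digits, hyphens, dots.
plain_hostname() {
    case $1 in
        ''|*[!A-Za-z0-9.-]*) return 1 ;;
        *) return 0 ;;
    esac
}
```

For instance, `no_spaces "$JAVA_HOME" || echo "JAVA_HOME contains a space"` and `plain_hostname HadoopMaster || echo "rename the host"` give an early warning before the daemons are started.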