Carlos Aguni

Highly motivated self-taught IT analyst. Always learning and ready to explore new skills. An eternal apprentice.


HDFS to S3

20 May 2022

Install HDFS

yum -y install java-1.8.0-openjdk java-1.8.0-openjdk-devel
curl -O https://dlcdn.apache.org/hadoop/common/hadoop-3.3.2/hadoop-3.3.2.tar.gz
tar xzvf hadoop-3.3.2.tar.gz 
export HADOOP_CLASSPATH=/root/hadoop-3.3.2/share/hadoop/tools/lib/*
export HADOOP_HOME="/root/hadoop-3.3.2"
export PATH="${HADOOP_HOME}/bin:$PATH"
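
A quick sanity check that the exports took effect; hadoop version should print the 3.3.2 build info:

java -version
hadoop version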

https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html

The hdfs CLI lives at <hadoop_home>/bin/hdfs; the daemon start/stop scripts are under <hadoop_home>/sbin/.

<hadoop_home>/etc/hadoop/core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

<hadoop_home>/etc/hadoop/hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
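
The start scripts use ssh to launch the daemons, so passwordless ssh to localhost has to work first. This setup is taken from the SingleCluster guide linked above:

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys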

  1. Format the filesystem:

hdfs namenode -format

  2. Start the NameNode and DataNode daemons: start-dfs.sh

    If you run the daemons as root, start-dfs.sh aborts because the HDFS daemon users are not defined (see the thread below). Export them, or add them to <hadoop_home>/etc/hadoop/hadoop-env.sh:

    https://stackoverflow.com/questions/48129029/hdfs-namenode-user-hdfs-datanode-user-hdfs-secondarynamenode-user-not-defined

     export HDFS_NAMENODE_USER="root"
     export HDFS_DATANODE_USER="root"
     export HDFS_SECONDARYNAMENODE_USER="root"
     export YARN_RESOURCEMANAGER_USER="root"
     export YARN_NODEMANAGER_USER="root"
    
  3. Browse the web interface for the NameNode; by default it is available at http://localhost:9870

  4. Make the HDFS directories required to execute MapReduce jobs:

hdfs dfs -mkdir /user
hdfs dfs -mkdir /user/<username>

  5. Copy a file into HDFS:

hdfs dfs -put <file> /user/<username>/
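
Once the daemons are up, jps (bundled with the JDK) should list NameNode, DataNode and SecondaryNameNode, and the new directory should be visible:

jps
hdfs dfs -ls /user/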

Copy from HDFS to S3 without writing temp files

https://stackoverflow.com/questions/67673048/is-it-possible-to-write-directly-to-final-file-with-distcp

hadoop distcp -direct hdfs://nn1:8020/datasets/set1 s3a://bucket/datasets/set1
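
distcp also needs S3 credentials; the hadoop-aws and AWS SDK jars are already picked up through the HADOOP_CLASSPATH export from share/hadoop/tools/lib. A minimal sketch passing credentials inline via the standard fs.s3a properties, assuming the usual AWS environment variables are set (the namenode host and bucket name are placeholders):

hadoop distcp \
  -Dfs.s3a.access.key="$AWS_ACCESS_KEY_ID" \
  -Dfs.s3a.secret.key="$AWS_SECRET_ACCESS_KEY" \
  -direct \
  hdfs://nn1:8020/datasets/set1 s3a://bucket/datasets/set1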