Install HDFS
yum -y install java-1.8.0-openjdk-devel   # CentOS/RHEL package name for OpenJDK 8 (the -devel package also ships jps)
curl -O https://dlcdn.apache.org/hadoop/common/hadoop-3.3.2/hadoop-3.3.2.tar.gz
tar xzvf hadoop-3.3.2.tar.gz
export HADOOP_CLASSPATH=/root/hadoop-3.3.2/share/hadoop/tools/lib/*
export HADOOP_HOME="/root/hadoop-3.3.2"
export PATH="${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:$PATH"
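Hadoop also needs JAVA_HOME set, either in the shell or in <hadoop_home>/etc/hadoop/hadoop-env.sh. A minimal sketch; the exact path below is an assumption for the CentOS OpenJDK 8 package and should be checked on your box:
export JAVA_HOME=/usr/lib/jvm/jre-1.8.0-openjdk   # verify the real location with: readlink -f $(which java)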
https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html
The hdfs command lives in <hadoop_home>/bin/; the daemon scripts (start-dfs.sh, stop-dfs.sh) live in <hadoop_home>/sbin/.
<hadoop_home>/etc/hadoop/core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
<hadoop_home>/etc/hadoop/hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
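start-dfs.sh below uses ssh to launch the daemons, so the single-node guide also assumes passphraseless ssh to localhost; a minimal sketch:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
ssh localhost   # should log in without prompting for a password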
-
Format the filesystem
hdfs namenode -format
-
Start the NameNode and DataNode daemons:
start-dfs.sh
If start-dfs.sh aborts complaining that HDFS_NAMENODE_USER (and friends) is not defined, which is common when running as root, define the daemon users first:
https://stackoverflow.com/questions/48129029/hdfs-namenode-user-hdfs-datanode-user-hdfs-secondarynamenode-user-not-defined
export HDFS_NAMENODE_USER="root"
export HDFS_DATANODE_USER="root"
export HDFS_SECONDARYNAMENODE_USER="root"
export YARN_RESOURCEMANAGER_USER="root"
export YARN_NODEMANAGER_USER="root"
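A quick way to confirm the daemons came up is jps from the JDK; roughly:
jps
# expect NameNode, DataNode and SecondaryNameNode processes in the output (plus Jps itself)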
-
Browse the web interface for the NameNode; by default it is available at http://localhost:9870
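On a headless box the same check can be done from the shell; a sketch:
curl -s http://localhost:9870/ > /dev/null && echo "NameNode web UI is up"
hdfs dfsadmin -report   # lists live DataNodes and capacity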
-
Make the HDFS directories required to execute MapReduce jobs:
hdfs dfs -mkdir /user
hdfs dfs -mkdir /user/<username>
- Copy a file into HDFS
hdfs dfs -put <file> /user/
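To confirm the upload, list the directory and read the file back:
hdfs dfs -ls /user/
hdfs dfs -cat /user/<file>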
Copy from HDFS to S3 without writing temp files
https://stackoverflow.com/questions/67673048/is-it-possible-to-write-directly-to-final-file-with-distcp
hadoop distcp -direct hdfs://nn1:8020/datasets/set1 s3a://bucket/datasets/set1
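distcp to an s3a:// target also needs the hadoop-aws connector on the classpath (the HADOOP_CLASSPATH export above covers it) plus S3 credentials; a minimal sketch assuming static access keys (bucket, keys and NameNode address are placeholders):
hadoop distcp \
  -Dfs.s3a.access.key=<ACCESS_KEY> \
  -Dfs.s3a.secret.key=<SECRET_KEY> \
  -direct \
  hdfs://nn1:8020/datasets/set1 s3a://bucket/datasets/set1
The same fs.s3a.* properties can instead go into core-site.xml so the keys do not end up in the shell history.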