HDFS – Hadoop Distributed File System

HDFS definition

HDFS (Hadoop Distributed File System) is Hadoop’s primary file storage system.

It works well with large volumes of data, reduces I/O, scales well, and provides high availability and fault tolerance through data replication.
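To make the cost of replication concrete, here is a minimal sketch of the arithmetic. It assumes a block size of 128 MiB and a replication factor of 3, which are common HDFS defaults but may differ on your cluster; the function name hdfs_footprint is made up for this example.

```shell
#!/usr/bin/env bash
# Assumed values: 128 MiB blocks and replication factor 3 (common defaults,
# check dfs.blocksize and dfs.replication on your own cluster).
BLOCK_SIZE_MB=128
REPLICATION=3

# Print how many blocks a file of the given size (in MiB) is split into,
# how many block replicas exist in total, and the raw storage they consume.
hdfs_footprint() {
  local file_mb=$1
  # Ceiling division: a partially filled last block still counts as a block.
  local blocks=$(( (file_mb + BLOCK_SIZE_MB - 1) / BLOCK_SIZE_MB ))
  # The last block only occupies the bytes it holds, so raw usage is size x replication.
  local raw_mb=$(( file_mb * REPLICATION ))
  echo "blocks=${blocks} replicas=$(( blocks * REPLICATION )) raw_mb=${raw_mb}"
}

hdfs_footprint 1000   # prints: blocks=8 replicas=24 raw_mb=3000
```

Under these assumptions, losing a single node still leaves two copies of every block it held, which is where the fault tolerance comes from.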

HDFS typically serves as the storage layer for HBase, a column-oriented database management system.

Main components

  • NameNode: There is only one in the cluster. It is responsible for:
    • Regulating client access to the files.
    • Keeping the file system metadata in memory.
    • Tracking which file blocks each DataNode holds.
  • DataNode: DataNodes serve the read and write requests of the clients and replicate the blocks across the different nodes.
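The division of work between the two components can be observed from the command line. The following is a rough sketch that wraps two standard HDFS administration commands, “hdfs fsck” and “hdfs dfsadmin -report”. The function names and the HDFS_CMD variable are hypothetical helpers introduced only for this example.

```shell
#!/usr/bin/env bash
# HDFS_CMD is a hypothetical override for this sketch; it defaults to the real client.
HDFS_CMD="${HDFS_CMD:-hdfs}"

# Ask the NameNode which blocks make up a path and which DataNodes hold each replica.
show_block_locations() {
  "$HDFS_CMD" fsck "$1" -files -blocks -locations
}

# Ask the NameNode for the capacity and status of the DataNodes it knows about.
show_datanodes() {
  "$HDFS_CMD" dfsadmin -report
}
```

On a running cluster, show_block_locations followed by an HDFS path should list each block of that path together with the nodes storing it. Depending on the cluster's permission settings, both commands may require HDFS superuser rights.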


Commands for manipulating HDFS files

There are two ways to query and manipulate HDFS files from the command line: “hadoop fs” and “hdfs dfs”.

The difference is that “fs” refers to a generic file system client that can point to any supported file system, such as the local FS, HFTP FS, S3 FS, and others, including HDFS. In contrast, “hdfs dfs” is specific to the HDFS file system.
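With the generic client, the file system is selected by the URI scheme of the path; a path without a scheme goes to the file system configured in fs.defaultFS. The sketch below illustrates that idea. The helper names fs_scheme and list_path, the HADOOP_CMD variable, and the host name in the comments are assumptions made up for this example.

```shell
#!/usr/bin/env bash
# HADOOP_CMD is a hypothetical override for this sketch; it defaults to the real client.
HADOOP_CMD="${HADOOP_CMD:-hadoop}"

# Report which file system a path addresses, judging only by its URI scheme.
fs_scheme() {
  case "$1" in
    *://*) echo "${1%%://*}" ;;              # e.g. file, hdfs, s3a
    *)     echo "default (fs.defaultFS)" ;;  # no scheme: the configured default
  esac
}

# List a path through the generic client, whatever file system it lives on.
list_path() {
  "$HADOOP_CMD" fs -ls "$1"
}

# Examples (the host name is a placeholder):
#   list_path file:///tmp              lists the local /tmp directory
#   list_path hdfs://namenode:8020/    lists the root of that HDFS cluster
#   list_path /                        lists the root of the default file system
```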

Commands to manipulate files with “hadoop fs”

These commands are executed from the command line, and before you can use them you need to start the Hadoop services:

$ hadoop/sbin/start-dfs.sh 
$ hadoop/sbin/start-yarn.sh

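Format the NameNode to reset the file system structure and delete past references. This erases all existing HDFS metadata, so it is normally done only once, on a new cluster, before the services are started for the first time (newer Hadoop releases prefer the equivalent “hdfs namenode -format”):

$ hadoop namenode -format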

Copy a local file to HDFS:

$ hadoop fs -put /ruta-local/ficheroLocal.txt /ruta-hdfs/ficheroHDFS.txt
$ hadoop fs -put /home/datos/consumos.csv /user/hadoop/consumos/consumos.csv
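In a script, it is usually worth checking the local file and creating the target directory before uploading. The following is one possible sketch, not the only way to do it. The upload function and the HADOOP_CMD variable are hypothetical names; the flags used (-mkdir -p and -put -f) are standard options of the file system shell.

```shell
#!/usr/bin/env bash
# HADOOP_CMD is a hypothetical override for this sketch; it defaults to the real client.
HADOOP_CMD="${HADOOP_CMD:-hadoop}"

# Upload one local file to HDFS, creating the target directory first.
upload() {
  local src=$1 dest=$2
  if [ ! -f "$src" ]; then
    echo "local file not found: $src" >&2
    return 1
  fi
  # -mkdir -p creates missing parent directories and does not fail if they exist.
  "$HADOOP_CMD" fs -mkdir -p "$(dirname "$dest")" || return 1
  # -put -f overwrites the destination if it already exists.
  "$HADOOP_CMD" fs -put -f "$src" "$dest"
}
```

It would be called with a local path and an HDFS path, for example: upload /ruta-local/ficheroLocal.txt /ruta-hdfs/ficheroHDFS.txt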

Copy a file from HDFS to the local file system:

$ hadoop fs -get /ruta-hdfs/ficheroHDFS.txt /ruta-local/ficheroLocal.txt

List the contents of the root directory:

$ hadoop fs -ls /

Display the contents of a file stored in HDFS:

$ hadoop fs -cat /ruta-hdfs/ficheroHDFS.txt

Create a directory:

$ hadoop fs -mkdir miDirectorio

Create a directory together with any missing parent directories:

$ hadoop fs -mkdir -p miDirectorio/subdirectorio

Delete a directory and all its contents:

$ hadoop fs -rm -r miDirectorio


Commands to manipulate files with “hdfs dfs”

List the root directory:

$ hdfs dfs -ls /

List the subdirectory “prueba”:

$ hdfs dfs -ls /prueba

Copy local files to HDFS:

$ hdfs dfs -copyFromLocal /directorio_local/ /directorio_hdfs/

Copy files from HDFS to the local file system:

$ hdfs dfs -get /directorio_hdfs/ /directorio_local/

Other commands: appendToFile, cat, chgrp, chmod, chown, copyFromLocal, copyToLocal, count, cp, du, dus, expunge, get, getfacl, getmerge, ls, lsr, mkdir, moveFromLocal, moveToLocal, mv, put, rm, rmr, setfacl, setrep, stat, tail, test, text, touchz.
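Three of the commands in this list (lsr, rmr, and dus) are deprecated in current Hadoop releases in favor of flags on ls, rm, and du. As a small reference sketch, the mapping can be written down like this; the function name modern_equivalent is made up for the example.

```shell
#!/usr/bin/env bash
# Map a deprecated file system shell command to its current equivalent.
# Commands that are not deprecated are returned unchanged.
modern_equivalent() {
  case "$1" in
    lsr) echo "ls -R" ;;   # recursive listing
    rmr) echo "rm -r" ;;   # recursive delete
    dus) echo "du -s" ;;   # summarized disk usage
    *)   echo "$1" ;;
  esac
}

modern_equivalent lsr   # prints: ls -R
```

So instead of “hadoop fs -rmr miDirectorio”, current releases expect “hadoop fs -rm -r miDirectorio”, which is the form used earlier in this article.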

Source: Official web documentation of the commands


Author: Diego Calvo