HDFSFileTransfer
| operating_system = Linux
| license = MIT License, Boost Software License
| website = http://sourceforge.net/projects/hdfsfiletransfer/
}}
HDFSFileTransfer is a tool written as a 'bash' script for quickly transferring files into HDFS. It allows users to copy files within the same physical machine as well as between two machines, i.e. from the linux file system of a machine with an HDFS cluster into another HDFS cluster. For example, one can have two single-node Hadoop clusters installed on two different linux machines; the script can copy files from one linux machine with Hadoop installed to the other one. HDFSFileTransfer is dual-licensed under the MIT License and the Boost Software License, and its source code is freely available.
Features
* Local Copy
The script runs on the same machine on which HDFS is installed. As sketched below, the script:
- copies files from the local file system
- puts the files into HDFS
- archives the copied files on the local file system under the archive folder
Local_Copy.jpg|Local Copy
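A minimal sketch of this local copy step, using hypothetical paths (the real script reads its own configuration):
#!/bin/bash
# Hypothetical paths for illustration only.
SOURCE_DIR=/home/hduser/outgoing
HDFS_TARGET=/user/hduser/incoming
ARCHIVE_DIR=/home/hduser/archive
mkdir -p "$ARCHIVE_DIR"
for file in "$SOURCE_DIR"/*; do
    [ -f "$file" ] || continue
    # Put the file into HDFS, then archive the local copy.
    if hdfs dfs -put "$file" "$HDFS_TARGET"; then
        mv "$file" "$ARCHIVE_DIR"/
    fi
done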
* Copy From One Machine To Another With A Different HDFS Cluster
The script allows you to copy files from the linux file system of a machine with an HDFS cluster installed into another HDFS cluster; it does not, however, copy from one HDFS cluster to another. For example, you can have two single-node Hadoop clusters installed on two different linux machines (in my particular situation I did so using VMs). The script can copy files from one linux machine with Hadoop installed to the other one, as sketched below.
Copy_From_One_to_Another.jpg|Copy from one to another
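A minimal sketch of this idea, assuming a hypothetical remote NameNode address (the real script uses its own configuration):
#!/bin/bash
# Hypothetical remote NameNode URI; replace with the target cluster's own.
REMOTE_HDFS=hdfs://namenode2.example.com:8020
SOURCE_DIR=/home/hduser/outgoing
for file in "$SOURCE_DIR"/*; do
    [ -f "$file" ] || continue
    # A full HDFS URI lets the local client write straight into the other cluster.
    hdfs dfs -put "$file" "$REMOTE_HDFS/user/hduser/incoming/"
done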
* Email errors encountered during the process
The process consists of a few validations. Should a validation fail, the process is given an error status. The error handling, sketched below:
- terminates the transfer process
- sends an email containing the error message to a user/group of people
Email_-_Error_Message.jpg|Email - Error Message
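A minimal sketch of this error handling, assuming the standard mail command is available and using a hypothetical recipient address:
#!/bin/bash
# Hypothetical recipient and path; the real script reads these from its configuration.
MAIL_TO=hadoop-admins@example.com
SOURCE_DIR=/home/hduser/outgoing
fail() {
    # Email the error message, then terminate the transfer process.
    echo "$1" | mail -s "HDFSFileTransfer error" "$MAIL_TO"
    exit 1
}
# Example validation: the source directory must exist.
[ -d "$SOURCE_DIR" ] || fail "Source directory $SOURCE_DIR does not exist"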
Diagram Process
The process is broken down into two parts:
- initial validations - there are three checks done at the very beginning, each time the script starts up
- a 'while' loop - the copy and archive steps; although they run in a loop (so the script can be run as a daemon), there is also an option to run the process only when you wish
A sketch of this structure is given after the diagram.
Diagram_Process.jpg|Diagram Process
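A minimal sketch of this two-part structure, with hypothetical checks and a RUN_ONCE flag standing in for the script's own options:
#!/bin/bash
RUN_ONCE=false        # set to true to run the process once instead of as a daemon
SLEEP_SECONDS=60
SOURCE_DIR=/home/hduser/outgoing
# Part 1: initial validations, performed each time the script starts up.
command -v hdfs >/dev/null     || { echo "hdfs command not found" >&2; exit 1; }
[ -d "$SOURCE_DIR" ]           || { echo "source directory missing" >&2; exit 1; }
hdfs dfs -test -d /user/hduser || { echo "HDFS target directory missing" >&2; exit 1; }
# Part 2: the copy and archive loop.
while true; do
    # ... copy and archive steps go here ...
    "$RUN_ONCE" && break
    sleep "$SLEEP_SECONDS"
done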
Validations
The list of all validations carried out within the copying process:
Limitations
* Copy files with blank space(s) - Hadoop 2.4.0
Problem:
There is a problem when trying to copy files whose names contain spaces.
When doing the following:
hdfs dfs -put /home/testUser/New filename.txt /user/hduser/temp
This returns the following error:
put: unexpected URISyntaxException
Solution:
See the documentation of this project, available on the SourceForge website.
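As one simple workaround (not necessarily the project's own fix), the file can be renamed so its name contains no spaces before the transfer:
# Illustrative workaround only: replace spaces in the local name before the put.
src="/home/testUser/New filename.txt"
safe="${src// /_}"
mv "$src" "$safe"
hdfs dfs -put "$safe" /user/hduser/temp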