Configure Hadoop and start Cluster Service Using Ansible Playbook
Hello Learners! 👨🏻💻
Here is one more article. We all know Ansible is a highly intelligent automation tool, so in this article we configure a Hadoop cluster using an Ansible playbook, without logging in to the DataNode and NameNode manually.
So let's start the article.
First, we discuss: what is Hadoop? What is a NameNode, and what is a DataNode?
🔰 Hadoop
Apache Hadoop is an open-source framework that is used to efficiently store and process large datasets ranging in size from gigabytes to petabytes of data. Instead of using one large computer to store and process the data, Hadoop allows clustering multiple computers to analyze massive datasets in parallel more quickly.
🔰 DataNode
DataNodes are the slave nodes in HDFS. Unlike the NameNode, a DataNode runs on commodity hardware, that is, inexpensive machines that are not high-end or highly available.
🔰 NameNode
NameNode is the centerpiece of an HDFS file system. It keeps the directory tree of all files in the file system, and tracks where across the cluster the file data is kept. It does not store the data of these files itself.
So in this module, we launch the Hadoop cluster using an Ansible playbook on AWS. It is not so difficult to perform.
Let us understand, step by step, how we can do task 11.1:
For this, we can use an Amazon Linux AMI, which is available on AWS.
So for this task, we create three instances: one controller node and two managed nodes for configuring the Hadoop cluster.
🔰 11.1 Configure Hadoop and start cluster services using Ansible Playbook
First of all, we install Ansible on our controller node. For that, we switch to the root user on the Amazon Linux instance and install Ansible using the command below:
# sudo amazon-linux-extras install ansible2
Now, we check whether Ansible installed successfully using the command below:
# ansible --version
Creating Inventory in the controller node to manage or configure other nodes
An Ansible inventory file defines the hosts and groups of hosts upon which commands, modules, and tasks in a playbook operate.
For this, first write the details in the ansible.cfg file:
# vim /etc/ansible/ansible.cfg
In this file you will find the [defaults] section; add the inventory file path and connection settings under it.
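As a sketch, a minimal [defaults] section might look like the following. The inventory path /root/ip.txt is an assumption based on the ip.txt file mentioned later in this article; adjust it to wherever you keep your inventory:

```ini
[defaults]
# Path to the inventory file listing the managed nodes (assumed location)
inventory = /root/ip.txt
# Skip SSH host-key prompting for freshly launched EC2 instances
host_key_checking = False
```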
Now create an inventory file and write the details for each node: IP address, username, and connection type.
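As an illustration, an inventory file (here called ip.txt, matching the file mentioned later) grouping the NameNode and the DataNode might look like this; the IP addresses below are placeholders, and the user name is an assumption:

```ini
[namenode]
192.168.1.10  ansible_user=root  ansible_connection=ssh

[datanode]
192.168.1.11  ansible_user=root  ansible_connection=ssh
```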
On AWS, connectivity works a bit differently: we have not provided any security key so far, so this step of setting up key-based SSH is quite different compared to local VMs.
For that, on the controller node, go to the .ssh directory and generate a key pair:
# cd .ssh
# ssh-keygen
Then open the generated public key, copy its content, and paste it into the authorized_keys file on each target node.
Now come back to the controller node and check the connectivity using the command below:
# ansible all -m ping
Now, we write the Playbook to configure the Hadoop cluster.
For that, we need to perform the following steps:
- Install JDK.
- Install Hadoop.
- Configure the core-site.xml and hdfs-site.xml files.
- Create Directory.
- Format the NameNode Only.
- Start the Hadoop service on both nodes.
- Check the Report.
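My complete playbook is on GitHub (link at the end); as an illustration, a trimmed sketch of the NameNode play might look like the following. The RPM file names and versions (jdk-8u171, hadoop-1.2.1), the /nn directory, and the NameNode IP are assumptions; adjust them to the software and addresses you actually use. The DataNode play is similar, with hdfs-site.xml pointing at a DataNode directory instead:

```yaml
# Sketch of a play for the NameNode (assumed versions and paths)
- hosts: namenode
  tasks:
    # Install JDK and Hadoop from local RPM files (paths are assumptions)
    - name: Install JDK
      command: rpm -ivh /root/jdk-8u171-linux-x64.rpm
      args:
        creates: /usr/java

    - name: Install Hadoop
      command: rpm -ivh /root/hadoop-1.2.1-1.x86_64.rpm --force
      args:
        creates: /usr/bin/hadoop

    - name: Create the NameNode directory
      file:
        path: /nn
        state: directory

    # Point HDFS clients at the NameNode (placeholder IP)
    - name: Configure core-site.xml
      blockinfile:
        path: /etc/hadoop/core-site.xml
        insertafter: "<configuration>"
        marker: "<!-- {mark} ANSIBLE MANAGED BLOCK -->"
        block: |
          <property>
            <name>fs.default.name</name>
            <value>hdfs://192.168.1.10:9001</value>
          </property>

    - name: Configure hdfs-site.xml
      blockinfile:
        path: /etc/hadoop/hdfs-site.xml
        insertafter: "<configuration>"
        marker: "<!-- {mark} ANSIBLE MANAGED BLOCK -->"
        block: |
          <property>
            <name>dfs.name.dir</name>
            <value>/nn</value>
          </property>

    # Format only the NameNode, and only once
    - name: Format the NameNode
      command: hadoop namenode -format -force
      args:
        creates: /nn/current

    - name: Start the NameNode daemon
      command: hadoop-daemon.sh start namenode
```

The `creates:` arguments make the `command` tasks idempotent, so rerunning the playbook does not reinstall the RPMs or reformat the NameNode.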
I wrote a playbook for all the steps and ran it on both target nodes. Of these two target nodes, one is the slave node and the other is the master node; we list both, with their different IPs, in the ip.txt inventory file.
Here is the output of configuring both nodes:
So here, task 11.1 is completed successfully.
Thank you for reading this article.
The complete YAML code is available on my GitHub profile:
https://github.com/Simi16/Configure_Hadoop_using_Playbook
Keep Learning👨🏻💻