Elastic Storage Hadoop Cluster using LVM

Yashwanth Medisetti
Mar 9, 2021


For any enterprise, the most crucial part of the business is the data it possesses. Managing that data is considered one of the most demanding tasks in the industry. Hadoop is one of the tools that helps manage big data, and Hadoop clusters are common in the IT infrastructure of well-established enterprises. At the same time, it is not surprising that these clusters run out of storage, and we then need to scale the storage up so that data keeps flowing within the company and to its clients. Manually attaching and configuring storage devices every time the cluster needs more space is impractical.

This is where the Logical Volume Manager (LVM) plays a key role. With LVM, storage from many devices can be pooled into a volume group, and the share contributed to the cluster can be scaled up or down dynamically, i.e., we can dynamically allot storage to the datanodes of the Hadoop cluster.

Automating LVM on Hadoop Clusters

What are PVs, VGs and LVs?

Physical Volume (PV): the raw storage backed by a device such as a hard disk or an EBS volume (AWS). PVs are created first, directly from the hardware disks.

Volume Group (VG): the storage brought in by the PVs is pooled here to form a single, locally centralized pool within the OS. Typically, storage from multiple hardware devices is combined and treated as one new device whose capacity is the sum of the individual hard disks.

Logical Volume (LV): a portion of a VG that is requested in order to allocate storage to the path where files need to be stored persistently.
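To make the relationship between the three concrete, here is a minimal sketch of the bookkeeping in Python. It is only a toy model and does not call LVM at all. The class name, the volume names and the sizes are illustrative assumptions, and real LVM reserves a little space per PV for metadata and allocates in fixed-size extents, so actual figures come out slightly smaller.

```python
from dataclasses import dataclass, field


@dataclass
class VolumeGroup:
    """Toy model of a VG: a pool of space from which LVs are carved."""
    name: str
    pv_sizes_gb: list                        # sizes of the member PVs, in GB
    lvs: dict = field(default_factory=dict)  # LV name -> size in GB

    @property
    def total_gb(self):
        # A VG's capacity is (roughly) the sum of its PVs.
        return sum(self.pv_sizes_gb)

    @property
    def free_gb(self):
        # Space not yet handed out to any LV.
        return self.total_gb - sum(self.lvs.values())

    def create_lv(self, lv_name, size_gb):
        if size_gb > self.free_gb:
            raise ValueError("not enough free space in the volume group")
        self.lvs[lv_name] = size_gb

    def extend_lv(self, lv_name, extra_gb):
        if extra_gb > self.free_gb:
            raise ValueError("not enough free space in the volume group")
        self.lvs[lv_name] += extra_gb


# Illustrative names and sizes: two disks of 10 GB and 5 GB in one pool.
vg = VolumeGroup("demo-vg", [10, 5])
vg.create_lv("demo-lv", 6)
print(vg.total_gb, vg.free_gb)          # 15 9
vg.extend_lv("demo-lv", 4)
print(vg.lvs["demo-lv"], vg.free_gb)    # 10 5
```

The point of the model is that an LV can keep growing as long as the VG has free space, and the VG itself can grow by adding more PVs.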

Heading to the practical demo

In this demo I have a Hadoop cluster up and running with a single namenode and a single datanode. The datanode contributes some storage to the cluster; initially the cluster looks as follows:

Initial Hadoop Cluster
PV-VG-LV

The datanode initially contributes 8GB of storage to the cluster. We will try to change the storage contributed using the concept of LVM.

For this we have two hard disks of 10GB and 5GB respectively.

We create Physical Volumes from these hard-disks and then club them to form a single pool of storage known as Volume Group and name it as “lvm-test”.

pvcreate /dev/xvdf /dev/xvdg

Creating the volume group:

vgcreate lvm-test /dev/xvdf /dev/xvdg

With the VG created, we request some storage from the VG as an LV (say 6GB), format it, and finally mount it on the directory that the datanode contributes to the cluster, so that this storage is shared with the cluster.

lvcreate -n lvm-part-1 --size 6G lvm-test

The volume is formatted with the “ext4” filesystem using the command

mkfs.ext4 /dev/lvm-test/lvm-part-1

After mounting the created logical volume on the desired path, we can see the change in the storage contributed to the cluster.
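The whole sequence up to the mount can be sketched in Python as below. This is not one of the scripts from the repository, just a minimal sketch under some assumptions. The function names are my own, and the mount point `/dn1` is a hypothetical placeholder for the datanode's data directory, which depends on your Hadoop configuration. By default it only prints the commands (a dry run). Actually running them needs root and will wipe the disks.

```python
import subprocess


def lv_setup_commands(disks, vg, lv, size, mount_point):
    """Return the LVM setup steps as argv lists, in execution order."""
    device = f"/dev/{vg}/{lv}"
    return [
        ["pvcreate", *disks],                        # disks -> PVs
        ["vgcreate", vg, *disks],                    # PVs -> one VG
        ["lvcreate", "-n", lv, "--size", size, vg],  # carve an LV out of the VG
        ["mkfs.ext4", device],                       # format the LV
        ["mount", device, mount_point],              # mount on the datanode dir
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # requires root privileges


commands = lv_setup_commands(
    ["/dev/xvdf", "/dev/xvdg"], "lvm-test", "lvm-part-1", "6G",
    "/dn1",  # hypothetical mount point, not taken from the article
)
run_all(commands)  # dry run: prints the five commands without executing them
```

Once the volume is mounted, a cluster report (for example `hdfs dfsadmin -report` on recent Hadoop versions) should show the new configured capacity for the datanode.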

Datanode contribution to cluster
Updated storage contribution to cluster

We can also automate this dynamic addition and reduction of the storage contributed to the cluster using automation scripts.

Here I’ve created some Python scripts that are capable of:

  1. Creating volume groups from the physical volumes
  2. Creating Logical volumes from the Volume Groups
  3. Extending the attached LV’s storage

The 1st script is capable of creating Physical Volumes from the hard-disk devices that are attached to the OS. The 2nd script can be used to create an LV from a VG, format it as ext4 and finally mount it on the required path. The 3rd script is used to increase the contribution of an already attached LV by increasing its storage.

Since I’ve already created the PVs, VG and LV, I’ve used the 3rd script here to automatically increase the datanode’s storage contribution to the cluster. Running the script with Python gives the following:

Executing python script to automate

I’ve increased the storage contributed by the datanode to 10GB using the automation scripts.
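For reference, extending a mounted ext4 LV usually comes down to two commands: `lvextend` to grow the LV and `resize2fs` to grow the filesystem into the new space. The sketch below shows that idea in Python. It is an assumption about what such a script might do, not the actual script from the repository, and the `+4G` step is inferred from going from the 6GB LV to 10GB. As a precaution it defaults to a dry run.

```python
import subprocess


def extend_commands(vg, lv, extra):
    """Return the steps to grow an ext4 LV by `extra` (e.g. "4G")."""
    device = f"/dev/{vg}/{lv}"
    return [
        ["lvextend", "--size", f"+{extra}", device],  # grow the LV itself
        ["resize2fs", device],                        # grow ext4 to fill the LV
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # requires root privileges


# Grow the 6G volume by 4G to reach 10G (inferred from the sizes above).
steps = extend_commands("lvm-test", "lvm-part-1", "4G")
run_all(steps)  # dry run: prints the two commands without executing them
```

Because ext4 can be grown while mounted, the datanode does not need to be stopped and the existing data stays in place.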

Finally updated cluster

Get all 3 automation scripts from the link below:

yashwanth312/lvm-automation-scripts (github.com)

Thanks for going through the article :)
