
Vultr ubuntu filezilla

Apache Hadoop is an open-source software framework used to store, manage, and process large datasets for various big data computing applications running under clustered systems. It is Java-based and uses the Hadoop Distributed File System (HDFS) to store its data and MapReduce to process it. In this article, you will learn how to install and configure Apache Hadoop on Ubuntu 20.04.

Prerequisites

  • Deploy a fully updated Vultr Ubuntu 20.04 Server.
  • Create a non-root user with sudo access.

Install the default JDK and JRE packages.

$ sudo apt install default-jdk default-jre -y

Create Hadoop User and Configure Password-less SSH

Add a new user hadoop.

$ sudo adduser hadoop

Add the hadoop user to the sudo group.
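
The guide's own command for this step does not appear; a common way to do it on Ubuntu (assumed here) is usermod:

$ sudo usermod -aG sudo hadoop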

Install the OpenSSH server and client.

$ sudo apt install openssh-server openssh-client -y

When you get a prompt, respond with: keep the local version currently installed

Generate public and private key pairs.

$ ssh-keygen -t rsa

Add the generated public key from id_rsa.pub to authorized_keys.

$ sudo cat ~/.ssh/id_rsa.pub > ~/.ssh/authorized_keys

Change the permissions of the authorized_keys file.
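
A typical permission setting for this file (640, assumed here) is:

$ sudo chmod 640 ~/.ssh/authorized_keys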

Verify if the password-less SSH is functional.
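
For example, by opening an SSH session to localhost (assumed as the check here):

$ ssh localhost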

Switch to the hadoop user.

$ sudo su - hadoop

Download the latest stable version of Hadoop. To get the latest version, go to the Apache Hadoop official download page.
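
Assuming Hadoop 3.3.1 (the version referenced by the move command below) and the Apache release archive, the download and extraction steps would look like this:

$ wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz
$ tar -xvzf hadoop-3.3.1.tar.gz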

Move the extracted directory to the /usr/local/ directory.

$ sudo mv hadoop-3.3.1 /usr/local/hadoop

Create a directory to store system logs.

$ sudo mkdir /usr/local/hadoop/logs

Change the ownership of the hadoop directory.

$ sudo chown -R hadoop:hadoop /usr/local/hadoop

Edit the ~/.bashrc file to configure the Hadoop environment variables.

$ sudo nano ~/.bashrc

Add the following lines to the file.

export HADOOP_HOME=/usr/local/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"

Activate the environment variables.
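
This is typically done by sourcing the file:

$ source ~/.bashrc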

Hadoop has a lot of components that enable it to perform its core functions. To configure components such as YARN, HDFS, MapReduce, and Hadoop-related project settings, you need to define Java environment variables in the hadoop-env.sh configuration file.

Find the OpenJDK directory.

$ readlink -f /usr/bin/javac

Edit the hadoop-env.sh file.

$ sudo nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh

Add the following lines to the file.

export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export HADOOP_CLASSPATH+=" $HADOOP_HOME/lib/*.jar"

Browse to the hadoop lib directory.

$ cd /usr/local/hadoop/lib

Download the Javax activation file.
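
Assuming version 1.2.0 and the copy of the jar published on Maven Central, the download would look like this:

$ sudo wget https://repo1.maven.org/maven2/javax/activation/javax.activation-api/1.2.0/javax.activation-api-1.2.0.jar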

Verify the Hadoop version.

$ hadoop version

Edit the core-site.xml configuration file to specify the URL for your NameNode.

$ sudo nano $HADOOP_HOME/etc/hadoop/core-site.xml

Add the following lines.
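
A minimal sketch of this configuration, assuming the NameNode listens on port 9000 on all interfaces:

<configuration>
   <property>
      <!-- assumed NameNode URL; adjust the host and port to your setup -->
      <name>fs.defaultFS</name>
      <value>hdfs://0.0.0.0:9000</value>
   </property>
</configuration>
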
Create a directory for storing node metadata and change the ownership to hadoop.
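
A sketch of the directory creation, with separate namenode and datanode subdirectories assumed to match the hdfs-site.xml example below:

$ sudo mkdir -p /home/hadoop/hdfs/{namenode,datanode}
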
$ sudo chown -R hadoop:hadoop /home/hadoop/hdfs

Edit the hdfs-site.xml configuration file to define the location for storing node metadata and the fs-image file.

$ sudo nano $HADOOP_HOME/etc/hadoop/hdfs-site.xml

Add the following lines.
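
A minimal sketch, assuming a single-node cluster (replication factor 1) and the namenode and datanode directories created above:

<configuration>
   <property>
      <name>dfs.replication</name>
      <value>1</value>
   </property>
   <property>
      <!-- assumed path, matching the directory created above -->
      <name>dfs.namenode.name.dir</name>
      <value>/home/hadoop/hdfs/namenode</value>
   </property>
   <property>
      <!-- assumed path, matching the directory created above -->
      <name>dfs.datanode.data.dir</name>
      <value>/home/hadoop/hdfs/datanode</value>
   </property>
</configuration>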










