Enabling Kerberos Authentication in IBM Spectrum Scale HDFS Transparency without Ambari


By Linda Cham posted Fri April 17, 2020 10:48 AM


The IBM Spectrum Scale HDFS Transparency implementation integrates both the NameNode and DataNode services and responds to requests as if it were HDFS, running on top of the IBM Spectrum Scale file system (GPFS).

In HDFS, the DataNodes contain the local disks for the file system, so scaling the storage requires adding more DataNodes to the cluster. HDFS Transparency DataNodes can be viewed as the HDFS gateway to the IBM Spectrum Scale storage layer. The storage layer can be scaled separately from the DataNodes when using Elastic Storage Server (ESS).

In this PoC, the customer's requirement is to reduce the number of nodes used by HDFS, especially the DataNodes. One issue they face is the high failure rate of the local disks in their Hadoop cluster, which they need to manage. For this reason, IBM proposed ESS with HDFS Transparency as the solution.

In the customer environment, a new Hortonworks Data Platform (HDP®) with Ambari cluster was instantiated and a separate HDFS Transparency cluster was created that connects to the ESS.

To ensure a successful deployment at the customer site, an internal PoC environment was created to test setting up Kerberos with open source Apache Hadoop. This article contains those instructions, ported to and merged with the steps executed at the customer site, which has HDP and Ambari installed.

Internal PoC Environment

Software configuration

OS: Red Hat Enterprise Linux 7.6

Spectrum Scale: 5.0.3.3

Open Source Apache Hadoop: 3.1.2

HDFS Transparency: 3.1.0.3

Kerberos Server: krb5-server-1.15.1-34.el7.x86_64

Hardware configuration

ESS GL1S / v5.3.5.1

1 NameNode and 2 DataNodes for HDFS Transparency with IBM Spectrum Scale.

4th deployment model

Hadoop Scale Storage Architecture

Prerequisites before enabling Kerberos

The Hadoop cluster is up and running. This cluster will be called “native Hadoop system” or “HDFS” or “native HDFS” in this blog.

The KDC server is up and running and provides Kerberos authentication for all of the Hadoop components including HDFS.

The ESS is up and running.

The IBM Spectrum Scale clients required for the HDFS Transparency cluster are up and running and are part of the ESS Scale cluster.

The native Hadoop system, ESS cluster and the KDC server are able to communicate through the network.

Set up HDFS Transparency

Installation and configuration of HDFS Transparency

Switch HDFS to HDFS Transparency

Configuration files

core-site.xml

hdfs-site.xml

workers

log4j.properties

hadoop-env.sh

gpfs-site.xml -- This file is for HDFS Transparency only

Change the fs.defaultFS parameter in core-site.xml under the HDP® and HDFS Transparency configuration directories.

Sync HDFS Transparency configurations

As root, on one of the HDFS Transparency nodes, start HDFS Transparency using the mmhadoopctl command and verify that the NameNode and DataNodes are up and running.
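The start-and-verify step might look like the sketch below. The `connector start` subcommand appears later in this article; the `connector getstate` subcommand and the assumption that mmhadoopctl is on the PATH (it is commonly installed under /usr/lpp/mmfs/bin) should be checked against your HDFS Transparency release. By default the script only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch: start HDFS Transparency and query the service state.
# ASSUMPTION: mmhadoopctl is on PATH and supports "connector getstate";
# confirm both against your installed HDFS Transparency release.
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

start_and_check() {
    run mmhadoopctl connector start
    run mmhadoopctl connector getstate
}

start_and_check
```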

Access IBM Spectrum Scale via HDFS Transparency

Enable Kerberos authentication with HDFS Transparency

Prerequisites

Key Distribution Center (KDC) server is up and running

Native Hadoop system is configured to work with Kerberos authentication on the KDC server.

Follow these steps to enable Kerberos on HDFS Transparency

Create the HDFS Transparency Principals and KeyTabs on the KDC server.

Host principals for the HDFS Transparency nodes

Service principals for the NameNode and DataNodes specific to the hosts that are running the services

User principals for the client accessing HDFS Transparency

hostname

Setting up the Kerberos clients on the HDFS Transparency nodes

The service principal for the NameNode service and the host principal are exported to KeyTab "nn.service.keytab"

The service principal for the NameNode HTTP service and the host principal are exported to KeyTab "spnego.service.keytab"
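As an illustration of the principal and KeyTab creation for the NameNode, the sketch below prints kadmin.local commands to run on the KDC server. Only the two KeyTab file names come from this article. The realm EXAMPLE.COM, the host nn01.example.com, the export directory /root/keytabs and the nn/ and HTTP/ principal prefixes are placeholders that follow a common Hadoop convention, so substitute the values used by your native Hadoop system.

```shell
#!/bin/sh
# Sketch: print kadmin.local commands that create NameNode principals and keytabs.
# ASSUMPTIONS: realm, host name, export directory and the "nn"/"HTTP" prefixes
# are placeholders; only the keytab file names are taken from the article.
REALM="${REALM:-EXAMPLE.COM}"
KEYTAB_DIR="${KEYTAB_DIR:-/root/keytabs}"

namenode_kadmin_cmds() {
    host="$1"
    for p in "host/$host" "nn/$host" "HTTP/$host"; do
        printf 'kadmin.local -q "addprinc -randkey %s@%s"\n' "$p" "$REALM"
    done
    # -norandkey (kadmin.local only) keeps the existing keys, so the host
    # principal can be exported into both keytabs without invalidating the first.
    printf 'kadmin.local -q "xst -norandkey -k %s/nn.service.keytab nn/%s@%s host/%s@%s"\n' \
        "$KEYTAB_DIR" "$host" "$REALM" "$host" "$REALM"
    printf 'kadmin.local -q "xst -norandkey -k %s/spnego.service.keytab HTTP/%s@%s host/%s@%s"\n' \
        "$KEYTAB_DIR" "$host" "$REALM" "$host" "$REALM"
}

namenode_kadmin_cmds nn01.example.com
```

The DataNodes would need an equivalent set of commands with their own service principal and host names.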

Modify the Kerberos related configuration files on all the HDFS Transparency nodes (NameNode and DataNodes)

Setting up the Kerberos clients on the HDFS Transparency nodes

Add Kerberos stanza information into HDFS Transparency configuration files.

core-site.xml

Add additional Kerberos stanza information into /var/mmfs/hadoop/etc/hadoop/hdfs-site.xml on the HDFS Transparency NameNode.

hdfs-site.xml

Configure HDFS Transparency to point to the correct IBM Spectrum Scale filesystem.

gpfs-site.xml

Configure HDFS Transparency nodes

gpfs.storage.type=shared

After the configurations are modified, synchronize the configuration using mmhadoopctl from the HDFS Transparency NameNode.

Now the HDFS Transparency cluster is ready to get the Kerberos ticket.
Execute the "kinit" command on each of the HDFS Transparency nodes to get the Ticket Granting Ticket (TGT).
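One way to obtain the TGT non-interactively is to authenticate from a KeyTab, as in the sketch below, which prints the commands for one node. The principal nn/nn01.example.com@EXAMPLE.COM is a placeholder; the KeyTab path follows the /etc/security/keytabs/ location mentioned in the summary.

```shell
#!/bin/sh
# Sketch: print the commands that obtain and then display a TGT from a keytab.
# ASSUMPTION: the principal name passed below is a placeholder.
kinit_cmds() {
    keytab="$1"
    principal="$2"
    printf 'kinit -kt %s %s\n' "$keytab" "$principal"
    # klist shows the ticket cache so the new TGT can be confirmed.
    printf 'klist\n'
}

kinit_cmds /etc/security/keytabs/nn.service.keytab nn/nn01.example.com@EXAMPLE.COM
```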

Start HDFS Transparency

Switch HDFS to HDFS Transparency

Access IBM Spectrum Scale via HDFS Transparency

su - hdfstr

Summary

Update /etc/krb5.conf based on the information of the native Hadoop system for all HDFS Transparency nodes
- default_realm name
- [realms] definitions
- [domain_realm] definitions
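For orientation, the three krb5.conf items listed above have roughly the shape generated by this sketch. EXAMPLE.COM, example.com and kdc.example.com are placeholders; the real values should be copied from /etc/krb5.conf on the native Hadoop system.

```shell
#!/bin/sh
# Sketch: emit a minimal krb5.conf containing the three sections listed above.
# ASSUMPTION: realm, DNS domain and KDC host name are placeholders.
krb5_conf() {
    realm="$1"
    domain="$2"
    kdc="$3"
    cat <<EOF
[libdefaults]
  default_realm = $realm

[realms]
  $realm = {
    kdc = $kdc
    admin_server = $kdc
  }

[domain_realm]
  .$domain = $realm
  $domain = $realm
EOF
}

krb5_conf EXAMPLE.COM example.com kdc.example.com
```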

Define the users and groups that require access. The users and groups need to be the same on the native Hadoop system and the HDFS Transparency cluster.
For example, hdfstr:hadoop
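A quick way to compare accounts across nodes is to print the numeric IDs on each one, as sketched below. Comparing numeric IDs as well as names is a suggestion rather than a step from this article; it is useful because ownership on a shared file system is stored by ID.

```shell
#!/bin/sh
# Sketch: report a user's numeric uid:gid so the values can be compared per node.
# ASSUMPTION: the script is run on every node; "hdfstr" is the article's example user.
uid_gid() {
    user="$1"
    if id "$user" >/dev/null 2>&1; then
        printf '%s %s:%s\n' "$user" "$(id -u "$user")" "$(id -g "$user")"
    else
        printf '%s missing\n' "$user"
    fi
}

uid_gid hdfstr
```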

The KeyTab files created should be copied to /etc/security/keytabs/ on all the nodes in the HDFS Transparency cluster.

Update /var/mmfs/hadoop/etc/hadoop/core-site.xml on the HDFS Transparency NameNode based on the native Hadoop system information.

The fs.defaultFS needs to be set to the HDFS Transparency NameNode definition.
For example, hdfs://namenode-hostname:8020

Enable Kerberos, as on the native Hadoop system, by changing the following parameters in /var/mmfs/hadoop/etc/hadoop/core-site.xml:
- hadoop.security.authentication
- hadoop.security.authorization
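The core-site.xml settings described above could be generated as in the following sketch, which prints the property stanzas to paste inside the file's configuration element. The host name nn01.example.com is a placeholder; the values kerberos and true are the standard Hadoop settings for enabling Kerberos authentication and service authorization.

```shell
#!/bin/sh
# Sketch: print core-site.xml property stanzas for a Kerberized HDFS Transparency.
# ASSUMPTION: nn01.example.com is a placeholder NameNode host name.
property() {
    # $1 = property name, $2 = property value
    printf '<property>\n  <name>%s</name>\n  <value>%s</value>\n</property>\n' "$1" "$2"
}

core_site_stanzas() {
    property fs.defaultFS "hdfs://$1:8020"
    property hadoop.security.authentication kerberos
    property hadoop.security.authorization true
}

core_site_stanzas nn01.example.com
```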

Update /var/mmfs/hadoop/etc/hadoop/hdfs-site.xml on the HDFS Transparency NameNode based on the native Hadoop system information.
- dfs.namenode.keytab.file
- dfs.namenode.kerberos.principal
- dfs.datanode.keytab.file
- dfs.datanode.kerberos.principal
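A sketch of the four hdfs-site.xml properties follows. The _HOST token is Hadoop's placeholder that each daemon replaces with its own host name. The realm, the nn/ and dn/ principal prefixes and the dn.service.keytab file name are assumptions modelled on a common convention, so match them to the principals and KeyTabs actually created on the KDC.

```shell
#!/bin/sh
# Sketch: print the Kerberos-related hdfs-site.xml properties.
# ASSUMPTIONS: realm, "nn"/"dn" prefixes and dn.service.keytab are placeholders.
hdfs_site_stanzas() {
    realm="$1"
    dir="$2"
    cat <<EOF
<property><name>dfs.namenode.keytab.file</name><value>$dir/nn.service.keytab</value></property>
<property><name>dfs.namenode.kerberos.principal</name><value>nn/_HOST@$realm</value></property>
<property><name>dfs.datanode.keytab.file</name><value>$dir/dn.service.keytab</value></property>
<property><name>dfs.datanode.kerberos.principal</name><value>dn/_HOST@$realm</value></property>
EOF
}

hdfs_site_stanzas EXAMPLE.COM /etc/security/keytabs
```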

Update /var/mmfs/hadoop/etc/hadoop/gpfs-site.xml, the IBM Spectrum Scale specific configuration file, on the HDFS Transparency NameNode with the following values so that HDFS Transparency can access the IBM Spectrum Scale file system properly.
- gpfs.mnt.dir
- gpfs.data.dir
- gpfs.storage.type
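The three gpfs-site.xml values could be emitted as in this sketch. Only gpfs.storage.type=shared is taken from this article; the mount point /gpfs/fs1 and the data directory name hadoop are placeholders, and the exact meaning of gpfs.data.dir should be confirmed in the HDFS Transparency documentation for your release.

```shell
#!/bin/sh
# Sketch: print gpfs-site.xml property stanzas for HDFS Transparency.
# ASSUMPTION: /gpfs/fs1 and "hadoop" are placeholder values.
gpfs_site_stanzas() {
    mnt="$1"
    data="$2"
    type="$3"
    for kv in "gpfs.mnt.dir=$mnt" "gpfs.data.dir=$data" "gpfs.storage.type=$type"; do
        # Split each "name=value" pair at the first "=".
        printf '<property>\n  <name>%s</name>\n  <value>%s</value>\n</property>\n' \
            "${kv%%=*}" "${kv#*=}"
    done
}

gpfs_site_stanzas /gpfs/fs1 hadoop shared
```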

Synchronize the updated config files from the HDFS Transparency NameNode.
Execute mmhadoopctl connector syncconf /var/mmfs/hadoop/etc/hadoop/ and check that no errors are reported.
Check that all the config files on all the NameNodes/DataNodes are correct.
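To check that the config files match on every node after syncconf, one option is to compare a digest of the configuration directory, as sketched here. The cksum-based digest is an illustrative technique and not a feature of mmhadoopctl; the script prints nothing on a machine where the directory does not exist.

```shell
#!/bin/sh
# Sketch: print a digest of the config directory for comparison across nodes.
# ASSUMPTION: the cksum-based digest is an illustrative choice, not part of
# mmhadoopctl.
CONF_DIR="${CONF_DIR:-/var/mmfs/hadoop/etc/hadoop}"

conf_digest() {
    # Checksum each regular file (in glob order), then checksum that listing.
    ( cd "$1" 2>/dev/null && for f in *; do
          if [ -f "$f" ]; then cksum "$f"; fi
      done ) | cksum | cut -d' ' -f1
}

# Print "<node name> <digest>" only when the directory exists on this machine.
if [ -d "$CONF_DIR" ]; then
    printf '%s %s\n' "$(uname -n)" "$(conf_digest "$CONF_DIR")"
fi
```

Running the script on each NameNode and DataNode and comparing the printed digests shows whether any node holds different file contents.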

Start HDFS Transparency with the mmhadoopctl connector start command.

On the native Hadoop client, log in as the user authenticated by Kerberos to access the IBM Spectrum Scale file system via HDFS Transparency.
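The client-side access can be sketched as the command sequence printed below. The user hdfstr is the example used in this article, while the principal hdfstr@EXAMPLE.COM is a placeholder; kinit prompts for the password unless a KeyTab is supplied.

```shell
#!/bin/sh
# Sketch: print the commands a Kerberos-authenticated user runs on a Hadoop client.
# ASSUMPTION: the realm passed below is a placeholder.
client_access_cmds() {
    user="$1"
    realm="$2"
    printf 'su - %s\n' "$user"
    printf 'kinit %s@%s\n' "$user" "$realm"
    # Listing the root directory goes through the HDFS Transparency NameNode.
    printf 'hdfs dfs -ls /\n'
}

client_access_cmds hdfstr EXAMPLE.COM
```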
