How to configure Kerberos in AWS EMR?

Rishi Jain
3 min read · Apr 15, 2020

What is Kerberos?

Kerberos is an ancient name: in Greek mythology it belongs to the mighty three-headed watchdog!

If you are reading this article, I assume you already know about Kerberos or have heard of it somewhere.

Just to recap:

Kerberos is a network authentication protocol created by MIT. It uses secret-key cryptography to provide strong authentication, so that passwords and other credentials aren’t sent over the network in an unencrypted format.

In this article, we will learn how to configure Kerberos on an AWS EMR cluster using the cluster-dedicated KDC architecture. Whichever architecture you choose, the configuration steps remain almost the same.

There are basically two supported architectures:

1. Cluster-dedicated KDC: the KDC is installed on the EMR master node itself.

2. External KDC: the cluster authenticates against a KDC that runs outside the cluster.

Advantages of a Cluster-Dedicated KDC

  • Amazon EMR has full ownership of the KDC.
  • The KDC on the EMR cluster is independent from centralized KDC implementations such as Microsoft Active Directory or AWS Managed Microsoft AD.
  • Performance impact is minimal because the KDC manages authentication only for local nodes within the cluster.

Considerations and Limitations

  • Kerberized clusters cannot authenticate to one another, so applications cannot interoperate. If cluster applications need to interoperate, you must establish a cross-realm trust between clusters, or set up one cluster as the external KDC for other clusters. If a cross-realm trust is established, the KDCs must have different Kerberos realms.
  • You must create Linux users on the EC2 instance of the master node that correspond to KDC user principals, along with the HDFS directories for each user.
  • User principals must use an EC2 private key file and kinit credentials to connect to the cluster using SSH.

Let’s Dive into the Configuration

Log in to AWS, go to the EMR console, create a security configuration, and choose the “Enable Kerberos Authentication” option.
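
If you prefer the command line to the console, the same kind of security configuration can be created with the AWS CLI. The following is a minimal sketch, assuming the AWS CLI is installed and configured with credentials. The configuration name `kerberos-dedicated-kdc` and the 24-hour ticket lifetime are illustrative values, not taken from the console steps above. The call is wrapped in a function so nothing is sent to AWS until you invoke it.

```shell
# JSON body for a security configuration that enables Kerberos with a
# cluster-dedicated KDC. TicketLifetimeInHours is an assumed example value.
kerberos_security_config_json() {
  cat <<'EOF'
{
  "AuthenticationConfiguration": {
    "KerberosConfiguration": {
      "Provider": "ClusterDedicatedKdc",
      "ClusterDedicatedKdcConfiguration": {
        "TicketLifetimeInHours": 24
      }
    }
  }
}
EOF
}

# Create the security configuration. The default name is a placeholder.
create_kerberos_security_config() {
  local name="${1:-kerberos-dedicated-kdc}"
  aws emr create-security-configuration \
    --name "$name" \
    --security-configuration "$(kerberos_security_config_json)"
}
```

Calling `create_kerberos_security_config my-kerberos-config` would then register the configuration under that name in the current region.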

Now create an EMR cluster, making sure to use the security configuration we just created.

Go to Advanced Options -> Security -> Security Configuration and select the configuration we created earlier.

Fill in the security settings “Realm” and “KDC admin password”. You will need this information later, when creating principals.

Press Create Cluster
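
For reference, a roughly equivalent cluster launch from the AWS CLI could look like the sketch below. It assumes the AWS CLI is configured and the default EMR roles already exist. The cluster name, release label, instance type, and instance count are placeholder choices, not values from this walkthrough; the four function arguments stand in for the security configuration name, EC2 key pair, realm, and KDC admin password you chose.

```shell
# Launch a Kerberized EMR cluster from the CLI.
# Every literal value below is an assumed placeholder; adjust to your setup.
create_kerberized_cluster() {
  local security_config="$1"      # name of the security configuration
  local key_name="$2"             # EC2 key pair name
  local realm="$3"                # Kerberos realm, e.g. EC2.INTERNAL
  local kdc_admin_password="$4"   # KDC admin password (avoid commas here)

  aws emr create-cluster \
    --name "kerberized-emr" \
    --release-label emr-5.29.0 \
    --applications Name=Hadoop Name=Spark \
    --instance-type m5.xlarge \
    --instance-count 3 \
    --use-default-roles \
    --ec2-attributes KeyName="$key_name" \
    --security-configuration "$security_config" \
    --kerberos-attributes Realm="$realm",KdcAdminPassword="$kdc_admin_password"
}
```

The important parts are `--security-configuration` and `--kerberos-attributes`: they correspond to the security configuration selection and the “Realm and KDC admin password” fields in the console.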

Now SSH to the master node using the private key that was used while provisioning the cluster:

$ ssh -i rishi-basis.pem hadoop@ec2-XX-16X-XXX-XX.us-west-2.compute.amazonaws.com

Let’s add a new Linux user called “rishi” and add a matching Kerberos principal so that the user can run Spark on EMR.
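
A minimal sketch of this step is shown below, assuming you are logged in to the master node (where the cluster-dedicated KDC runs) as a user with sudo rights. The function name `add_kerberos_user` is a hypothetical helper introduced for illustration.

```shell
# Run on the EMR master node, which hosts the cluster-dedicated KDC.
add_kerberos_user() {
  local user="$1"
  # Create the Linux account that corresponds to the KDC principal.
  sudo adduser "$user"
  # Create the principal; kadmin.local prompts for the new password.
  sudo kadmin.local -q "addprinc $user"
}
```

Running `add_kerberos_user rishi` creates both the account and the principal. You can then check the result with `sudo kadmin.local -q "listprincs"`, which should list a `rishi@<your realm>` entry.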

Also, create an HDFS directory for the user “rishi”.
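
One way to do this is sketched below. It assumes the commands run on the master node as a user allowed to create directories under `/user` in HDFS; on a Kerberized cluster that user may first need a valid Kerberos ticket. The function name `create_hdfs_home` is a hypothetical helper.

```shell
# Create an HDFS home directory for a user and hand ownership to them.
create_hdfs_home() {
  local user="$1"
  # -p avoids an error if the directory already exists.
  hdfs dfs -mkdir -p "/user/$user"
  hdfs dfs -chown "$user:$user" "/user/$user"
}
```

After `create_hdfs_home rishi`, the new user can obtain a ticket with `kinit rishi`, confirm it with `klist`, and check access with `hdfs dfs -ls /user/rishi`.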

Conclusion: We have learned how to configure an AWS EMR cluster with Kerberos.

Written by Rishi Jain

Software Support Engineer @StreamSets | Hadoop | DataOps | RHCA | Ex-RedHatter | Ex-Cloudera
