How to configure Kerberos in AWS EMR?
What is Kerberos?
Kerberos is an ancient name: in Greek mythology it belongs to the mighty three-headed watchdog!
If you are reading this article, I assume you already know about Kerberos or have at least heard of it somewhere.
Just to recap:
Kerberos is a network authentication protocol created at MIT. It uses secret-key cryptography to provide strong authentication, so that passwords or other credentials aren’t sent over the network in an unencrypted format.
In this article, we will learn how to configure Kerberos on an AWS EMR cluster using the cluster-dedicated KDC architecture. Whichever architecture you choose, the configuration steps remain almost the same.
There are basically two supported architectures:
1. Cluster-dedicated KDC: the KDC is installed on the EMR master node itself.
2. External KDC: the cluster authenticates against a KDC that runs outside the cluster.
Advantages of a Cluster-dedicated KDC
- Amazon EMR has full ownership of the KDC.
- The KDC on the EMR cluster is independent from centralized KDC implementations such as Microsoft Active Directory or AWS Managed Microsoft AD.
- Performance impact is minimal because the KDC manages authentication only for local nodes within the cluster.
Considerations and Limitations
- Kerberized clusters cannot authenticate to one another, so applications cannot interoperate. If cluster applications need to interoperate, you must establish a cross-realm trust between clusters, or set up one cluster as the external KDC for other clusters. If a cross-realm trust is established, the KDCs must have different Kerberos realms.
- You must create Linux users on the EC2 instance of the master node that correspond to KDC user principals, along with the HDFS directories for each user.
- User principals must use an EC2 private key file and kinit credentials to connect to the cluster using SSH.
Let’s Dive into the Configuration
Log in to AWS, go to the EMR console, create a Security Configuration, and choose the “Enable Kerberos Authentication” option.
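If you prefer the AWS CLI to the console, the same security configuration can be sketched roughly as follows. This is only a minimal sketch: the configuration name kerberos-dedicated-kdc and the 24-hour ticket lifetime are assumptions chosen for illustration, not values from this article. The script only prints the command by default; set DRY_RUN=0 to actually call AWS.

```shell
#!/bin/sh
set -eu

# Hypothetical configuration name; choose your own.
CONFIG_NAME="${CONFIG_NAME:-kerberos-dedicated-kdc}"

# Kerberos settings for a cluster-dedicated KDC.
# TicketLifetimeInHours is optional; 24 is used here for illustration.
SECURITY_CONFIG='{
  "AuthenticationConfiguration": {
    "KerberosConfiguration": {
      "Provider": "ClusterDedicatedKdc",
      "ClusterDedicatedKdcConfiguration": {
        "TicketLifetimeInHours": 24
      }
    }
  }
}'

# run: print the command; execute it only when DRY_RUN=0.
run() {
  printf '+ %s\n' "$*"
  if [ "${DRY_RUN:-1}" != "1" ]; then "$@"; fi
}

run aws emr create-security-configuration \
  --name "$CONFIG_NAME" \
  --security-configuration "$SECURITY_CONFIG"
```

The printed preview is not shell-quoted, so treat it as a sanity check rather than something to copy and paste.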
Now create an EMR cluster, making sure to use the security configuration we just created.
Go to Advanced Options -> Security -> Security Configuration and select the configuration we created earlier.
Fill in the “Realm” and “KDC admin password” security settings. You will need this information at a later stage, during principal creation.
Press Create Cluster
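For reference, a CLI equivalent of the console steps might look like the sketch below. Every value here is a placeholder and an assumption: the cluster name, release label, instance type and count, the key pair name (guessed from the .pem file name used in this article), the security configuration name, the realm EC2.INTERNAL, and the KDC admin password. As a safety measure, the script only prints the command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
set -eu

# All values are placeholders (assumptions); adjust them to your account.
CLUSTER_NAME="${CLUSTER_NAME:-kerberized-emr}"
RELEASE_LABEL="${RELEASE_LABEL:-emr-5.30.0}"
KEY_NAME="${KEY_NAME:-rishi-basis}"                   # EC2 key pair name (assumed)
CONFIG_NAME="${CONFIG_NAME:-kerberos-dedicated-kdc}"  # name of your security configuration (hypothetical)
REALM="${REALM:-EC2.INTERNAL}"                        # Kerberos realm (assumed)
KDC_ADMIN_PASSWORD="${KDC_ADMIN_PASSWORD:-REPLACE_ME}"

# run: print the command; execute it only when DRY_RUN=0.
# Note that the preview includes the KDC admin password.
run() {
  printf '+ %s\n' "$*"
  if [ "${DRY_RUN:-1}" != "1" ]; then "$@"; fi
}

run aws emr create-cluster \
  --name "$CLUSTER_NAME" \
  --release-label "$RELEASE_LABEL" \
  --applications Name=Hadoop Name=Spark \
  --instance-type m5.xlarge \
  --instance-count 3 \
  --use-default-roles \
  --ec2-attributes "KeyName=$KEY_NAME" \
  --security-configuration "$CONFIG_NAME" \
  --kerberos-attributes "Realm=$REALM,KdcAdminPassword=$KDC_ADMIN_PASSWORD"
```

The realm and KDC admin password passed in --kerberos-attributes correspond to the two security settings filled in on the console.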
Now SSH into the cluster using the private key that was used while provisioning it:
$ ssh -i rishi-basis.pem hadoop@ec2-XX-16X-XXX-XX.us-west-2.compute.amazonaws.com
Let’s add a new Linux user called “rishi” and a matching Kerberos principal, so that the user can run Spark on EMR.
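The exact commands are not shown here, so the following is only a sketch of one common way to do it on the master node, assuming you are logged in as a user with sudo rights (such as hadoop) and that the KDC runs locally, as it does with a cluster-dedicated KDC. The variable EMR_USER and the run helper are illustrative names. Commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
set -eu

# User to create; "rishi" matches the example in this article.
EMR_USER="${EMR_USER:-rishi}"

# run: print the command; execute it only when DRY_RUN=0.
run() {
  printf '+ %s\n' "$*"
  if [ "${DRY_RUN:-1}" != "1" ]; then "$@"; fi
}

# 1. Linux account on the master node, named like the principal.
run sudo adduser "$EMR_USER"

# 2. Kerberos user principal in the local KDC.
#    kadmin.local prompts interactively for the new principal's password.
run sudo kadmin.local -q "addprinc $EMR_USER"
```

Once the principal exists, running kinit rishi followed by klist as that user should show a ticket for the realm chosen during cluster creation.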
Also, create an HDFS directory for the user “rishi”.
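A minimal sketch of this step, assuming the session runs as a user with HDFS superuser rights and, on a Kerberized cluster, a valid Kerberos ticket. The /user/rishi path follows the usual HDFS home directory convention and is an assumption, as are the EMR_USER variable and the run helper. Commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
set -eu

# User whose HDFS home directory is created; "rishi" matches this article.
EMR_USER="${EMR_USER:-rishi}"
HDFS_HOME="/user/$EMR_USER"

# run: print the command; execute it only when DRY_RUN=0.
run() {
  printf '+ %s\n' "$*"
  if [ "${DRY_RUN:-1}" != "1" ]; then "$@"; fi
}

# Create the home directory (no error if it already exists).
run hdfs dfs -mkdir -p "$HDFS_HOME"

# Hand ownership of the directory to the user.
run hdfs dfs -chown "$EMR_USER:$EMR_USER" "$HDFS_HOME"
```

Afterwards, hdfs dfs -ls /user should list the new directory with rishi as its owner.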
Conclusion: We have learned how to configure an AWS EMR cluster with Kerberos.