hadoop.security.auth_to_local examples

In my previous post, “An important Hadoop security configuration parameter you may have missed”, I discussed the importance of the hadoop.security.auth_to_local configuration parameter and promised to provide some solutions that use it.

I want to focus on a couple of practical examples in this post. If you want to learn more, here are links to the existing documentation:


Solution for Lenovo ThinkPad problem with sleeping mode in Windows 8.1

upd: An even better solution is to use the latest Windows 8.1 ISO, which can be downloaded using the following Microsoft tool. It instantly detects the video adapter and installs the required driver.

This seems to be a common problem with many Lenovo laptops: after Windows 8 enters sleep mode, the computer does not wake up. All the lights and fans are on, but the display stays black until a reset.

Support forums did not help much. People have been discussing this problem for a couple of years already; some advise updating the BIOS, but this did not work in my case. Only the stock image from the Lenovo Recovery CDs seemed not to have this problem (but it did have Superfish and stuff ;-) )

The solution turned out to be quite easy: I just needed to install the latest Intel HD Graphics Driver :) That’s it. Here’s the location of this driver:

It just seems that the Lenovo System Update software does not update this driver automatically.



Kaspersky Antivirus appears to have become just another bloatware nowadays

So disappointed… After all these years I finally decided to buy Kaspersky, and it turns out it has become just another piece of bloatware. I guess the folks don’t care about their firm’s karma anymore…


Configuring Cloudera Navigator to use external authentication

Cloudera, author of one of the most popular Hadoop distributions, has created a great tool for Hadoop security monitoring and auditing, called Cloudera Navigator. I find its initial configuration process a little bit tricky, so I wanted to document it in this post. Cloudera’s original document on how to do this is located here:

I currently use the latest version of the Cloudera Hadoop distribution with Cloudera Manager 5.3.1 (trial enterprise license) and Navigator 2.2.1. It openly shows its full version and build in a tool-tip on its logo and in the ‘About’ section right on the login page (so if a vulnerability is published in the future, hackers won’t need to spend time finding out the target’s version ;-) ):



Plans for the weekend – firmware for my OLED business card project

Making another attempt to reactivate my OLED business card project. Due to lack of time and other priorities, I had to stop it for several months.
I think that I’ve found an optimal hardware design and power source. Given the limitations of the ATtiny85, I started writing the firmware in assembly; now I am starting it again in C.

The board in the picture was assembled to allow me to work at the kitchen table while watching my kids :)

In the picture (left to right): Adafruit Monochrome 128×32 I2C OLED graphic display (SSD1306), 9V battery, DIY ATtiny85 dev board, AVRISP mkII programmer


An important Hadoop security configuration parameter you may have missed

Hadoop has one security parameter whose importance, I think, is not stressed enough in the currently published documentation. While there are instructions on how to configure it, I have not seen anyone discuss the consequences of leaving this parameter at its default value, and as far as I know, almost nobody ever changes it because of its complexity. This parameter is

hadoop.security.auth_to_local – “Maps kerberos principals to local user names”

(description from current core-default.xml)

It tells Hadoop how to translate Kerberos principals into Hadoop user names. By default, it simply translates <user>/<part2>@<DOMAIN> into <user> for the default domain (ignoring the second part of the Kerberos principal). Here’s what the current Apache Hadoop documentation says about it:

“By default, it picks the first component of principal name as a user name if the realms matches to the default_realm (usually defined in /etc/krb5.conf). For example, host/full.qualified.domain.name@REALM.TLD is mapped to host by default rule.”

This means that, for example, if you have users named hdfs, Alyce and Bob, and they use the following principals to authenticate with your cluster:

Alyce – alyce@YOUR.DOMAIN,

If auth_to_local is not configured in your cluster, those are not the only principals that can authenticate as your Hadoop users: the following principals, if they exist, will also become your HDFS, Alyce and Bob under the default mapping:

hdfs/host123.your.domain@YOUR.DOMAIN => hdfs
hdfs/clusterB@YOUR.DOMAIN => hdfs
alyce/team2@YOUR.DOMAIN => Alyce
alyce/something.else@YOUR.DOMAIN => Alyce
bob/library@YOUR.DOMAIN => Bob
bob/research@YOUR.DOMAIN => Bob

… (very, very large list of possible combinations of second part of Kerberos principal and domain name) …

hdfs/<anything>@YOUR.DOMAIN is HDFS
alyce/<anything>@YOUR.DOMAIN is Alyce
bob/<anything>@YOUR.DOMAIN is Bob
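To make the effect of the default mapping easier to see, here is a minimal Python sketch that imitates it. It is only a simplified model of the behaviour described above, not Hadoop's actual implementation; the function name default_auth_to_local and the DEFAULT_REALM constant are made up for this illustration.

```python
from typing import Optional

# Assumption for this sketch: the default_realm configured in /etc/krb5.conf.
DEFAULT_REALM = "YOUR.DOMAIN"


def default_auth_to_local(principal: str,
                          default_realm: str = DEFAULT_REALM) -> Optional[str]:
    """Simplified model of the default rule (hypothetical helper).

    Keeps the first component of the principal when the realm equals the
    default realm; returns None when the default rule does not apply.
    """
    name, sep, realm = principal.rpartition("@")
    if not sep or realm != default_realm:
        return None  # no realm, or a realm other than the default one
    # Everything after the first "/" (the second component) is ignored.
    return name.split("/", 1)[0]


if __name__ == "__main__":
    for p in [
        "hdfs@YOUR.DOMAIN",
        "hdfs/host123.your.domain@YOUR.DOMAIN",
        "alyce/team2@YOUR.DOMAIN",
        "bob/research@YOUR.DOMAIN",
        "bob@OTHER.REALM",
    ]:
        print(p, "=>", default_auth_to_local(p))
```

In this model every principal of the form bob/<anything>@YOUR.DOMAIN collapses to the same short name, which is exactly the many-to-one behaviour the list above describes.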

For many regulatory bodies and auditing companies, it is a baseline security requirement that every user on the system has exactly one unique identity. As we just learned, in Hadoop, by default, a user can de facto be identified by an almost unlimited number of IDs. Malicious users inside the company can exploit this to get access to sensitive data or to take full control of the cluster.

Let’s look at an example:

First, user Bob with principal bob@LAB.LOCAL uploads a file ‘secret.txt’ to his home directory in HDFS and ensures it is protected by access lists:



Myth about hard-coded ‘hdfs’ superuser in Hadoop

I often hear about the hard-coded ‘hdfs’ superuser in Hadoop clusters, and about various challenges around managing it when more than one team in the same organization uses Hadoop in their projects.

I think it’s very important to mention that there is no hard-coded ‘hdfs’ superuser in Hadoop. The NameNode simply gives admin rights to the system user that started its process. So if you start the NameNode as root (please don’t do this), your superuser name will be ‘root’. If you start it as ‘namenode’, the ‘namenode’ user becomes the superuser.

Here’s what HDFS Permissions Guide says about this (quoting entire ‘Super-User’ section):

The super-user is the user with the same identity as name node process itself. Loosely, if you started the name node, then you are the super-user. The super-user can do anything in that permissions checks never fail for the super-user. There is no persistent notion of who was the super-user; when the name node is started the process identity determines who is the super-user for now. The HDFS super-user does not have to be the super-user of the name node host, nor is it necessary that all clusters have the same super-user. Also, an experimenter running HDFS on a personal workstation, conveniently becomes that installation’s super-user without any configuration.

In addition, the administrator may identify a distinguished group using a configuration parameter. If set, members of this group are also super-users.
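The rule quoted above can be restated as a tiny Python sketch. This is only a model of the two conditions in the quote, not Hadoop code; the function is_hdfs_superuser and its parameters are hypothetical names chosen for this illustration.

```python
from typing import Iterable, Optional


def is_hdfs_superuser(user: str,
                      user_groups: Iterable[str],
                      namenode_process_user: str,
                      superuser_group: Optional[str] = None) -> bool:
    """Model of the quoted rule (hypothetical helper, not a Hadoop API).

    A user is a super-user if it is the identity that started the name
    node process, or if it belongs to the distinguished group, when one
    has been configured.
    """
    if user == namenode_process_user:
        return True
    return superuser_group is not None and superuser_group in set(user_groups)


if __name__ == "__main__":
    # Name node started as 'namenode': 'hdfs' has no special meaning here.
    print(is_hdfs_superuser("namenode", [], "namenode"))
    print(is_hdfs_superuser("hdfs", ["hadoop"], "namenode"))
    # With a distinguished group set, its members are super-users too.
    print(is_hdfs_superuser("alyce", ["admins"], "namenode", "admins"))
```

Note that the name ‘hdfs’ appears nowhere in the check itself; it only matters if it happens to be the account that launched the process.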

And that’s just the HDFS admin. Other components of the Hadoop ecosystem all have their own admin users, but some, in their default configurations, allow other components’ admin users to manage them.

I guess this myth exists because the majority of automated Hadoop installations use ‘hdfs’ as the default system user name for starting the HDFS daemons.

(and of course don’t forget about dfs.permissions.superusergroup and dfs.cluster.administrators)