The myth of the hard-coded ‘hdfs’ superuser in Hadoop

I often hear about the hard-coded ‘hdfs’ superuser in Hadoop clusters, and about the challenges of managing it when more than one team in the same organization uses Hadoop in its projects.

I think it’s very important to mention that there is no hard-coded ‘hdfs’ superuser in Hadoop. The NameNode simply grants admin rights to the system user that started its process. So if you start the NameNode as root (please don’t do this), your superuser will be ‘root’. If you start it as ‘namenode’, the ‘namenode’ user becomes the superuser.

Here’s what the HDFS Permissions Guide says about this (quoting the entire ‘Super-User’ section):

The super-user is the user with the same identity as name node process itself. Loosely, if you started the name node, then you are the super-user. The super-user can do anything in that permissions checks never fail for the super-user. There is no persistent notion of who was the super-user; when the name node is started the process identity determines who is the super-user for now. The HDFS super-user does not have to be the super-user of the name node host, nor is it necessary that all clusters have the same super-user. Also, an experimenter running HDFS on a personal workstation, conveniently becomes that installation’s super-user without any configuration.

In addition, the administrator may identify a distinguished group using a configuration parameter. If set, members of this group are also super-users.
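The rule quoted above can be sketched as a small toy model. This is only an illustration of the logic as the guide describes it, not Hadoop's actual code: the function names are made up, and the group name ‘hadoop-admins’ is a hypothetical value standing in for whatever the distinguished group is configured to be.

```python
import getpass


def make_superuser_check(process_user=None, supergroup=None):
    """Build a toy super-user check for one simulated name node process.

    process_user: the OS user that started the process (defaults to the
                  current user, mirroring "if you started it, you are it").
    supergroup:   optional distinguished group whose members are also
                  super-users; None means no such group is configured.
    """
    if process_user is None:
        process_user = getpass.getuser()

    def is_superuser(user, groups=()):
        # Rule 1: the identity of the process itself is the super-user.
        if user == process_user:
            return True
        # Rule 2: members of the distinguished group, if one is set.
        return supergroup is not None and supergroup in groups

    return is_superuser


# Two simulated clusters started under different system users.
started_as_hdfs = make_superuser_check(process_user="hdfs", supergroup="hadoop-admins")
started_as_namenode = make_superuser_check(process_user="namenode")

print(started_as_hdfs("hdfs"))                             # True
print(started_as_namenode("hdfs"))                         # False
print(started_as_namenode("namenode"))                     # True
print(started_as_hdfs("alice", groups={"hadoop-admins"}))  # True
```

Note that ‘hdfs’ is a super-user only on the cluster that happened to be started as ‘hdfs’; on the other one it is an ordinary user.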

And that’s just the HDFS admin. The other components of the Hadoop ecosystem each have their own admin users, although some, in their default configurations, allow other components’ admin users to manage them.

I guess this myth exists because the majority of automated Hadoop installations use ‘hdfs’ as the default system user name for starting the HDFS daemons.

(And of course, don’t forget about dfs.permissions.superusergroup and dfs.cluster.administrators.)

Nice remark on password complexity

pwgen, my favorite Linux tool to generate random passwords that can be memorized:

-s, --secure
Generate completely random, hard-to-memorize passwords. These should only be used for machine passwords, since otherwise it’s almost guaranteed that users will simply write the password on a piece of paper taped to the monitor…
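To make the trade-off in that option concrete, here is a rough sketch of the two styles of password. It is not pwgen's actual algorithm: the consonant/vowel alternation is a simplified stand-in for "can be memorized", and both function names and the character sets are my own assumptions.

```python
import secrets
import string

# Simplified letter sets; an assumption for illustration only.
CONSONANTS = "bcdfghjklmnprstvz"
VOWELS = "aeiou"


def pronounceable_password(length=12):
    """Alternate consonants and vowels so the result can be read aloud."""
    return "".join(
        secrets.choice(CONSONANTS if i % 2 == 0 else VOWELS)
        for i in range(length)
    )


def secure_password(length=12):
    """Draw every character independently from letters and digits."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


print(pronounceable_password())  # easier to remember, fewer possible values
print(secure_password())         # harder to remember, more possible values
```

The first style gives up some randomness per character in exchange for something a person might actually remember, which is the point the man page is making.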

I like seeing alternative thinking emerge in the Information Security community. It is good that we have started to realize that people are not robots, and that commands and programming will never work on them. Considering the ‘human element’ in any program is basic risk management. And while I am sure there are people who will disagree and bring up some very good and solid arguments, I don’t understand why, in the world of Information Security, one of the most modern and fast-evolving professions, we still try to rely on ideas that are decades old and have never really worked since…