Analyze Big Data Platforms For Security and Performance

HDFS Data Activity Monitoring Quick Start

Available since Apache Eagle 0.3.0-incubating. Apache Eagle is referred to as Eagle in the following.

This guide describes the steps to enable data activity monitoring of the HDFS file system.

  • Prerequisite
  • Stream HDFS audit logs into Kafka [1]
  • Demos: HDFS Data Activity Monitoring

Prerequisite

Stream HDFS audit logs into Kafka

Note: This section describes how to configure a log4j Kafka appender to stream audit logs into the Eagle platform. For an alternative that streams HDFS audit logs into Kafka using Logstash, refer to the Logstash-based guide.

  • Step 1: Configure Advanced hdfs-log4j via the Ambari UI [2], adding the “KAFKA_HDFS_AUDIT” log4j appender below to the HDFS audit logger.

     log4j.appender.KAFKA_HDFS_AUDIT=org.apache.eagle.log4j.kafka.KafkaLog4jAppender
     log4j.appender.KAFKA_HDFS_AUDIT.Topic=sandbox_hdfs_audit_log
     log4j.appender.KAFKA_HDFS_AUDIT.BrokerList=sandbox.hortonworks.com:6667
     log4j.appender.KAFKA_HDFS_AUDIT.KeyClass=org.apache.eagle.log4j.kafka.hadoop.AuditLogKeyer
     log4j.appender.KAFKA_HDFS_AUDIT.Layout=org.apache.log4j.PatternLayout
     log4j.appender.KAFKA_HDFS_AUDIT.Layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
     log4j.appender.KAFKA_HDFS_AUDIT.ProducerType=async
    

    HDFS LOG4J Configuration
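    As a rough illustration of what this appender publishes, the sketch below formats a sample audit event the way the PatternLayout above (%d{ISO8601} %p %c{2}: %m%n) would render it. The helper function and the sample event content are hypothetical; in a real deployment the NameNode produces these lines.

    ```python
    from datetime import datetime

    def format_audit_line(level, logger, message, when=None):
        """Approximate log4j's '%d{ISO8601} %p %c{2}: %m' rendering (illustrative only)."""
        when = when or datetime(2015, 4, 24, 12, 51, 1, 14000)
        # log4j's ISO8601 pattern uses a comma before milliseconds, e.g. 12:51:01,014
        ts = when.strftime("%Y-%m-%d %H:%M:%S") + ",%03d" % (when.microsecond // 1000)
        # %c{2} keeps only the last two segments of the logger name
        short_logger = ".".join(logger.split(".")[-2:])
        return "%s %s %s: %s" % (ts, level, short_logger, message)

    # A made-up example of one HDFS audit event as it would land in the Kafka topic
    line = format_audit_line(
        "INFO",
        "org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit",
        "allowed=true  ugi=hdfs (auth:SIMPLE)  ip=/127.0.0.1  cmd=getfileinfo  "
        "src=/tmp/private  dst=null  perm=null",
    )
    print(line)
    ```

    Each such line becomes one Kafka message on the sandbox_hdfs_audit_log topic.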

  • Step 2: Edit Advanced hadoop-env via the Ambari UI and add a reference to KAFKA_HDFS_AUDIT in HADOOP_NAMENODE_OPTS.

    -Dhdfs.audit.logger=INFO,DRFAAUDIT,KAFKA_HDFS_AUDIT
    

    HDFS Environment Configuration

  • Step 3: Edit Advanced hadoop-env via the Ambari UI and append the following line to it.

    export HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:/usr/hdp/current/eagle/lib/log4jkafka/lib/*
    

    HDFS Environment Configuration

  • Step 4: Save the changes.

  • Step 5: Click “Restart All” for Storm [3] and Kafka in Ambari.

  • Step 6: Restart the NameNode.

Restart Services

  • Step 7: Check that logs from “/var/log/hadoop/hdfs/hdfs-audit.log” are flowing into the Kafka topic sandbox_hdfs_audit_log:

      $ /usr/hdp/2.2.4.2-2/kafka/bin/kafka-console-consumer.sh --zookeeper sandbox.hortonworks.com:2181 --topic sandbox_hdfs_audit_log      
    
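Once messages appear on the topic, each record is a raw audit line. The sketch below pulls the key fields out of one such line; the regular expression and sample line are illustrative, based on the standard hdfs-audit.log field layout (allowed/ugi/ip/cmd/src/dst/perm), not part of Eagle itself.

```python
import re

# Illustrative parser for one hdfs-audit.log record as consumed from Kafka
AUDIT_RE = re.compile(
    r"allowed=(?P<allowed>\S+)\s+"
    r"ugi=(?P<ugi>.+?)\s+"
    r"ip=/(?P<ip>\S+)\s+"
    r"cmd=(?P<cmd>\S+)\s+"
    r"src=(?P<src>\S+)\s+"
    r"dst=(?P<dst>\S+)\s+"
    r"perm=(?P<perm>\S+)"
)

def parse_audit(line):
    """Return the audit fields as a dict, or None if the line doesn't match."""
    m = AUDIT_RE.search(line)
    return m.groupdict() if m else None

# Hypothetical sample record (tab-separated fields, as in hdfs-audit.log)
sample = ("2015-04-24 12:51:01,014 INFO FSNamesystem.audit: allowed=true\t"
          "ugi=hdfs (auth:SIMPLE)\tip=/127.0.0.1\tcmd=open\t"
          "src=/tmp/private\tdst=null\tperm=null\tproto=rpc")
event = parse_audit(sample)
```

This is the kind of structured event Eagle's policy engine evaluates downstream.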


Demos

  • Log in to the Eagle UI at http://localhost:9099/eagle-service/ with username “admin” and password “secret”.
  • HDFS:
    1. Click the “DAM” menu and select “HDFS” to view the HDFS policies.
    2. You should see a policy named “viewPrivate”. This policy generates an alert whenever any user reads the HDFS file “private” under the “tmp” folder.
    3. In the sandbox, read the restricted HDFS file “/tmp/private” with the command

      hadoop fs -cat /tmp/private

    From the UI, click the alerts tab; you should see an alert for the attempted read of the restricted file.
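    Conceptually, the “viewPrivate” policy can be thought of as a predicate over audit events, as in the sketch below. The read-command set and field names here are assumptions for illustration; Eagle's actual policy engine expresses this as a declarative policy, not Python code.

    ```python
    # Assumed set of audit commands that count as "reading" a file;
    # HDFS logs a file read as cmd=open
    READ_CMDS = {"open"}

    def view_private_matches(event):
        """Illustrative predicate: does this audit event read /tmp/private?"""
        return event.get("cmd") in READ_CMDS and event.get("src") == "/tmp/private"

    # A read of the restricted file should trigger the policy...
    alert = view_private_matches({"cmd": "open", "src": "/tmp/private", "allowed": "true"})
    # ...while unrelated operations should not
    no_alert = view_private_matches({"cmd": "mkdirs", "src": "/tmp/other"})
    ```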


Footnotes

  1. All mentions of “kafka” on this page represent Apache Kafka.

  2. All mentions of “ambari” on this page represent Apache Ambari.

  3. Apache Storm.

Copyright © 2015 The Apache Software Foundation, Licensed under the Apache License, Version 2.0.
Apache Eagle, Eagle, Apache Hadoop, Hadoop, Apache HBase, HBase, Apache Hive, Hive, Apache Ambari, Ambari, Apache Spark, Spark, Apache Kafka, Kafka, Apache Storm, Storm, Apache Maven, Maven, Apache Tomcat, Tomcat, Apache Derby, Derby, Apache Cassandra, Cassandra, Apache ZooKeeper, ZooKeeper, Apache, the Apache feather logo, and the Apache project logo are trademarks of The Apache Software Foundation.