This project has retired. For details please refer to its Attic page.

Analyze Big Data Platforms For Security and Performance

Data Classification Tutorial

The Apache Eagle data classification feature lets you classify data by sensitivity level. Currently this feature is available only for applications that monitor HDFS, Apache Hive, and Apache HBase, for example HdfsAuditLog, HiveQueryLog, and HBaseSecurityLog.

This page covers:

  • Connection Configuration
  • Data Classification

Connection Configuration

To monitor a remote cluster, first make sure the connection to the cluster is configured. For more details, refer to Site Management.

Data Classification

Once the connection is configured, data classification has two parts. The first part shows how to add or remove sensitivity marks on files and directories; the second part shows how to monitor this sensitive data. The following uses HdfsAuditLog as an example.

Part 1: Sensitivity Edit

  • Add the sensitivity mark to files/directories.

    • Basic: label sensitive files directly (recommended)

      [Screenshots: HDFS classification]

    • Advanced: import a JSON file or JSON content

      [Screenshots: HDFS classification]

  • Remove the sensitivity mark from files/directories.

    • Basic: remove the label directly

      [Screenshots: HDFS classification]

    • Advanced: delete in batch

      [Screenshot: HDFS classification]
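
For the advanced import option, the JSON content can be generated rather than written by hand. The following is a minimal sketch only: the field names (tags, site, filedir, sensitivityType), the site name "sandbox", and the helper build_sensitivity_entries are assumptions made for illustration, not a confirmed Eagle schema. Check the format the import dialog expects before using the output.

```python
import json


def build_sensitivity_entries(site, marks):
    """Return a JSON array string with one entry per marked path.

    `marks` maps an absolute HDFS path to a sensitivity type.
    The field names used here are assumptions for this sketch,
    not a confirmed Eagle schema.
    """
    entries = []
    # Sort by path so the generated file is stable between runs.
    for path, sensitivity_type in sorted(marks.items()):
        if not path.startswith("/"):
            raise ValueError(f"expected an absolute HDFS path, got {path!r}")
        entries.append({
            "tags": {"site": site, "filedir": path},
            "sensitivityType": sensitivity_type,
        })
    return json.dumps(entries, indent=2)


if __name__ == "__main__":
    # Hypothetical site name and paths, for illustration only.
    print(build_sensitivity_entries("sandbox", {
        "/tmp/private": "PRIVATE",
        "/data/finance": "SECRET",
    }))
```

Generating the content from a single mapping keeps the marks reviewable in one place, and the same mapping could be reused to prepare a batch delete.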

Part 2: Sensitivity Usage in Policy Definition

You can mark a particular folder or file as “PRIVATE”. Once a resource is marked, you can create policies that use this label.

For example, the following policy monitors all operations on resources with sensitivity type “PRIVATE”.

[Screenshot: sensitivity type policy]
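
To make the effect of such a policy concrete, the sketch below filters a list of audit events down to those that touch a resource marked “PRIVATE”. It illustrates the idea only and is not Eagle's implementation: the event fields (user, cmd, src), the helper names, and the rule that a mark on a directory also covers everything beneath it are all assumptions.

```python
def sensitivity_of(path, marks):
    """Return the sensitivity type of the longest marked path covering
    `path`, or None if no marked path covers it.

    Assumption for this sketch: a mark on a directory also applies to
    everything beneath it, and the most specific mark wins.
    """
    best = None
    for marked, sensitivity_type in marks.items():
        prefix = marked.rstrip("/")
        # Match the marked path itself or anything below it, but not
        # siblings that merely share a name prefix (/tmp/privateer).
        if path == prefix or path.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best[0]):
                best = (prefix, sensitivity_type)
    return best[1] if best else None


def matching_events(events, marks, wanted="PRIVATE"):
    """Keep only the events whose `src` path has the wanted sensitivity."""
    return [e for e in events if sensitivity_of(e["src"], marks) == wanted]


if __name__ == "__main__":
    # Hypothetical marks and audit events, for illustration only.
    marks = {"/tmp/private": "PRIVATE"}
    events = [
        {"user": "alice", "cmd": "open", "src": "/tmp/private/a.txt"},
        {"user": "bob", "cmd": "open", "src": "/tmp/public/b.txt"},
    ]
    for event in matching_events(events, marks):
        print(event["user"], event["cmd"], event["src"])
```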

Copyright © 2015 The Apache Software Foundation, Licensed under the Apache License, Version 2.0.