
User Profile Tutorial

This document describes how to start online processing of user profiles. It assumes that Apache Eagle has been installed and the Eagle service is running.

User Profile Offline Training

  • Step 1: Start Apache Spark if it is not already running

  • Step 2: Start the offline scheduler

    • Option 1: command line

      $ cd <eagle-home>/bin
      $ bin/ --site sandbox start
    • Option 2: start via Apache Ambari (click "ops")

  • Step 3: Generate a model

    Click "ops", click "Update Now", click "Confirm", then check the result.

User Profile Online Detection

Two options to start the topology are provided.

  • Option 1: command line

    Submit the userProfiles topology if it is not already listed in the topology UI:

    $ bin/ --main --config conf/sandbox-userprofile-topology.conf start
  • Option 2: Apache Ambari

    Online userProfiles
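
The "submit only if it is not already running" check in Option 1 can also be made from the command line instead of the topology UI. The sketch below is an illustration rather than part of Eagle: topology_listed is a hypothetical helper that scans the output of Storm's `storm list` command, and it assumes the topology is registered under the name userProfiles.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Eagle): succeeds when a topology
# with the given name appears in `storm list` output read from stdin.
topology_listed() {
  local name="$1"
  # Compare the first whitespace-separated column exactly; exit status 0 means found.
  awk -v name="$name" '$1 == name { found = 1 } END { exit !found }'
}

# Usage (assumes the `storm` CLI is on PATH and the topology is named userProfiles):
#   if storm list | topology_listed userProfiles; then
#     echo "userProfiles is already running"
#   else
#     echo "userProfiles is not running; submit it as shown in Option 1"
#   fi
```

Reading from standard input keeps the helper independent of how Storm is installed, so it can be tried against saved `storm list` output first.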

Evaluate User Profile in Sandbox

  1. Prepare sample data for ML training and validation
    • a. Download the following sample data to be used for training
  2. Copy the files (downloaded in the previous step) to a location in the sandbox, for example: /usr/hdp/current/eagle/lib/userprofile/data/
  3. Modify <Eagle-home>/conf/sandbox-userprofile-scheduler.conf:
    • Set training-audit-path to the path of the training data sample (the path you used for Step 1.a)
    • Set detection-audit-path to the path of the validation data sample (the path you used for Step 1.b)
  4. Run the ML training program from the Eagle UI
  5. Produce Apache Kafka data using the contents of the validation file (Step 1.b). Run the following command (assuming the Eagle configuration uses the Kafka topic sandbox_hdfs_audit_log):

     ./ --broker-list --topic sandbox_hdfs_audit_log
  6. Paste a few lines of data from the validation file into kafka-console-producer, then check http://localhost:9099/eagle-service/#/dam/alertList for generated alerts
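
Steps 3 and 6 can be sanity-checked from a shell. The following is a minimal sketch under stated assumptions, not an Eagle tool: show_audit_paths and sample_lines are hypothetical helpers, and the Kafka install location, broker address, and validation file name in the usage comments are placeholders to be replaced with your own values.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; neither ships with Eagle.

# Print the lines of a scheduler config that mention the two audit paths (Step 3),
# so the edited values can be reviewed before training is run.
show_audit_paths() {
  local conf="$1"
  grep -E 'training-audit-path|detection-audit-path' "$conf"
}

# Print the first N lines (default 5) of the validation file (Step 6).
sample_lines() {
  local file="$1" count="${2:-5}"
  head -n "$count" "$file"
}

# Usage (placeholders: <eagle-home>, <kafka-home>, <broker-host:port>, <validation-file>):
#   show_audit_paths <eagle-home>/conf/sandbox-userprofile-scheduler.conf
#   sample_lines <validation-file> 5 | \
#     <kafka-home>/bin/kafka-console-producer.sh --broker-list <broker-host:port> --topic sandbox_hdfs_audit_log
```

Piping a handful of lines into the producer is equivalent to pasting them by hand and makes the test repeatable.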
Copyright © 2015 The Apache Software Foundation, Licensed under the Apache License, Version 2.0.