Since Eagle 0.4.0
Configuring Apache Eagle on Cloudera is very similar to configuring it on Hortonworks, but there are some differences. This tutorial addresses those differences before you continue with the tutorials originally prepared for Hortonworks.
To get Apache Eagle working on Cloudera, we need:
- Zookeeper (installed through Cloudera Manager)
- Kafka (installed through Cloudera Manager)
- Storm (0.10.x, installed manually)
- Logstash (2.X, installed manually on NameNode)
Two configuration points need to be mentioned:
Open Cloudera Manager and open the “kafka” configuration, then set
If Kafka cannot be started successfully, check Kafka’s log. If the stack trace shows “java.lang.OutOfMemoryError: Java heap space”, increase the heap size by setting:
export KAFKA_HEAP_OPTS="-Xmx2G -Xms2G"
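The log check described above can be sketched as a small shell helper. This is only an illustration: the function name `kafka_heap_exhausted` and the log path in the usage comment are assumptions, so point it at wherever your broker actually writes its log.

```shell
# Succeed (exit status 0) if the given Kafka log file contains the
# heap-space OutOfMemoryError quoted above. -F treats the pattern as a
# fixed string, -q suppresses output.
kafka_heap_exhausted() {
    grep -Fq 'java.lang.OutOfMemoryError: Java heap space' "$1"
}

# Usage sketch (the log path below is hypothetical):
# if kafka_heap_exhausted /var/log/kafka/server.log; then
#     echo "Heap exhausted; raise KAFKA_HEAP_OPTS and restart Kafka"
# fi
```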
- Step1: create a Kafka topic. Here I created a topic called “test”, which will be used in the Logstash configuration file to receive hdfsAudit log messages from Cloudera.
bin/kafka-topics.sh --create --zookeeper 127.0.0.1:2181 --replication-factor 1 --partitions 1 --topic test
- Step2: check that the topic has been created successfully.
bin/kafka-topics.sh --list --zookeeper 127.0.0.1:2181
This command shows all created topics.
- Step3: open two terminals and start a “producer” and a “consumer” separately.
/usr/bin/kafka-console-producer --broker-list hostname:9092 --topic test
/usr/bin/kafka-console-consumer --zookeeper hostname:2181 --topic test
- Step4: type some messages into the producer. If the consumer receives the messages sent from the producer, Kafka is working fine. Otherwise, check the configuration and logs to identify the root cause.
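The topic check in Step2 can also be done without reading the list by eye. The helper below is a minimal sketch: `topic_exists` is a name made up for this example, and it only assumes that the list command prints one topic name per line.

```shell
# Read a topic list on stdin (one name per line) and succeed if the
# topic named in "$1" is present. -F: fixed string, -x: whole-line match,
# so "test" does not match "testing".
topic_exists() {
    grep -Fxq -- "$1"
}

# Usage sketch, reusing the list command from Step2:
# bin/kafka-topics.sh --list --zookeeper 127.0.0.1:2181 | topic_exists test \
#     && echo "topic test is present"
```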
You can follow the Logstash online documentation to download and install Logstash on your machine:
Or you can install it through yum if you are using CentOS:
- Download and install the public signing key:
rpm --import https://packages.elastic.co/GPG-KEY-elasticsearch
- Add the following lines to a file with a .repo suffix in the /etc/yum.repos.d/ directory, for example:
[logstash-2.3]
name=Logstash repository for 2.3.x packages
baseurl=https://packages.elastic.co/logstash/2.3/centos
gpgcheck=1
gpgkey=https://packages.elastic.co/GPG-KEY-elasticsearch
enabled=1
- Then install it using
yum install logstash
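The repository step above can be scripted if you prefer not to edit the file by hand. This is a sketch under assumptions: the function name `write_logstash_repo` and the file name `logstash.repo` are chosen for illustration (any name ending in .repo should do), and writing to /etc/yum.repos.d normally requires root.

```shell
# Write the Logstash 2.3 yum repository definition into the given
# directory (defaults to /etc/yum.repos.d). The file name logstash.repo
# is an assumption made for this sketch.
write_logstash_repo() {
    repo_dir="${1:-/etc/yum.repos.d}"
    cat > "$repo_dir/logstash.repo" <<'EOF'
[logstash-2.3]
name=Logstash repository for 2.3.x packages
baseurl=https://packages.elastic.co/logstash/2.3/centos
gpgcheck=1
gpgkey=https://packages.elastic.co/GPG-KEY-elasticsearch
enabled=1
EOF
}

# Usage sketch (run as root), followed by the install command above:
# write_logstash_repo && yum install logstash
```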
Create conf file
Follow the Apache Eagle online documentation to create the Logstash configuration file for Apache Eagle.
bin/logstash -f conf/first-pipeline.conf
Open a terminal and start a Kafka consumer to see whether it receives the messages sent by Logstash. If no messages arrive, double-check the configuration parameters in the conf file. Otherwise, Logstash is all set.
As Apache Storm is not in Cloudera’s stack, we need to install Storm manually.
In /etc/profile, add this:
Save the profile and then type:
to make it work.
In storm/conf/storm.yaml, change the hostname to your own host.
Start Apache Storm
In a terminal, type:
$: storm nimbus
$: storm supervisor
$: storm ui
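Each of these commands stays in the foreground, so running them as typed needs one terminal per daemon. As an alternative, the sketch below starts all three in the background and keeps their output in log files. The function name `start_storm_daemons` and the `STORM_LOG_DIR` variable (with its /tmp default) are assumptions made for this example, and `storm` is expected to be on your PATH.

```shell
# Start nimbus, supervisor and the UI in the background. nohup keeps
# them running after the terminal closes; each daemon gets its own log.
# STORM_LOG_DIR is a hypothetical variable introduced for this sketch.
start_storm_daemons() {
    log_dir="${STORM_LOG_DIR:-/tmp/storm-logs}"
    mkdir -p "$log_dir" || return 1
    for daemon in nimbus supervisor ui; do
        nohup storm "$daemon" > "$log_dir/$daemon.log" 2>&1 &
    done
}

# Usage sketch:
# start_storm_daemons
```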
Open the Storm UI in your browser. The default URL is:
To download and install Apache Eagle, please refer to Get Started with Sandbox.
One thing to mention: in “/bin/eagle-topology.sh”, line 102:
If you are not using the default port number, change this to your own Storm UI URL.
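Before editing, it can help to print the line in question and confirm it really holds the Storm UI URL, since line numbers may shift between Eagle versions. The helper below is a small sketch: `show_line` is a made-up name, and `EAGLE_HOME` in the usage comment is a hypothetical variable standing for your Eagle installation directory.

```shell
# Print line number "$2" of file "$1", prefixed with the line number.
show_line() {
    awk -v n="$2" 'NR == n { print NR ": " $0 }' "$1"
}

# Usage sketch (EAGLE_HOME is an assumption for this example):
# show_line "$EAGLE_HOME/bin/eagle-topology.sh" 102
```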
I know it takes time to finish these configurations, but now it is time to have fun!
Try HDFS Data Activity Monitoring with the Demo listed in HDFS Data Activity Monitoring.