Setting up StormCrawler

Install Apache Maven

$ brew install maven

Install Zookeeper

$ brew install zookeeper

  • After the installation completes, Homebrew displays the following message:

To have launchd start zookeeper at login:
$ ln -sfv /usr/local/opt/zookeeper/*.plist ~/Library/LaunchAgents

  • Start Zookeeper:

$ launchctl load ~/Library/LaunchAgents/homebrew.mxcl.zookeeper.plist

NOTE: If you don’t want/need launchctl, you can just run:
$ zkServer start

Install ZeroMQ

$ brew install zeromq

Install Apache Storm

$ brew install storm

NOTE: The above command installs everything into Homebrew’s Cellar folder and creates a symlink at “/usr/local/opt/storm”.

Setup Apache Storm and its components

  • Edit the Storm config file storm.yaml, located in the /usr/local/opt/storm/libexec/conf folder. Add the following lines:

storm.zookeeper.servers:
  - "localhost"
  # - "server2"

nimbus.host: "localhost"
nimbus.thrift.port: 6627
ui.port: 8772
storm.local.dir: "/Users/gmohr/storm/data"
java.library.path: "/usr/lib/jvm"
supervisor.slots.ports:
  - 6700
  - 6701
  - 6702
  - 6703
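The storm.local.dir above must exist and be writable by the user running Storm before the daemons start. A quick sketch to create it (STORM_DATA defaults to an example path here; point it at your own storm.local.dir value):

```shell
# Create the Storm local data directory and make sure it is writable.
# STORM_DATA is an example path; adjust it to match your storm.yaml.
STORM_DATA="${STORM_DATA:-$HOME/storm/data}"
mkdir -p "$STORM_DATA"
chmod u+rwx "$STORM_DATA"
[ -w "$STORM_DATA" ] && echo "ok: $STORM_DATA is writable"
```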

  • Make sure any folders you create have the right permissions. I initially used “/home/user/storm/data” and did not realize that on macOS you cannot create directories under “/home”; this caused problems when I started Nimbus and the Supervisor.
  • Start Zookeeper, the Storm Nimbus, the Supervisor, and the UI, in the following order:

$ zkServer start
$ /usr/local/opt/storm/libexec/bin/storm nimbus
$ /usr/local/opt/storm/libexec/bin/storm supervisor
$ /usr/local/opt/storm/libexec/bin/storm ui

  • Check that everything is running by listing the Java processes:

$ jps
NOTE: The output should look similar to the following (the PIDs will differ).
5282 supervisor
5267 nimbus
5460 core
5735 Jps
4235 QuorumPeerMain
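If you want to script this check, a small helper can grep the jps output for the expected process names (names taken from the sample output above; this is a sketch, not part of Storm):

```shell
# Check that each expected Storm/Zookeeper process appears in `jps` output.
# Expected names come from the sample jps output above.
check_storm_procs() {
  local jps_output="$1" proc ok=0
  for proc in nimbus supervisor core QuorumPeerMain; do
    if ! printf '%s\n' "$jps_output" | grep -qw "$proc"; then
      echo "missing: $proc"
      ok=1
    fi
  done
  return $ok
}

# Usage: check_storm_procs "$(jps)" && echo "all Storm processes running"
```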

Create a StormCrawler project

  • First, generate a StormCrawler project from the Maven archetype:

$ mvn archetype:generate -DarchetypeGroupId=com.digitalpebble.stormcrawler -DarchetypeArtifactId=storm-crawler-archetype -DarchetypeVersion=1.5.1

  • Delete the following files under the StormCrawler project folder
    • src/main/java folder
    • crawler.flux file
  • Go back to the StormCrawler source folder and copy over:
    • kibana folder
    • es-conf.yaml
    • es-crawler.flux
    • ES_IndexInit.sh
    • es-injector.flux
    • README.md
    • NOTE: Also add a seeds.txt file containing the URLs to crawl
  • Build the package:

$ mvn clean package
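For reference, the seeds.txt mentioned above is just a plain list of start URLs, one per line. The URLs below are placeholders; use your own:

```shell
# seeds.txt: one start URL per line (placeholder URLs; replace with real targets)
cat > seeds.txt <<'EOF'
https://example.com/
https://example.org/
EOF
```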

Start crawling process

  • Inject the seed URLs into the crawler so it has something to fetch:

$ storm jar target/stormcrawler-0.1.jar org.apache.storm.flux.Flux --local es-injector.flux --sleep 30000

  • Begin crawling

$ storm jar target/stormcrawler-0.1.jar org.apache.storm.flux.Flux --local es-crawler.flux --sleep 30000

  • Once the local run looks good, submit the topology to Nimbus. It will appear in the Topology Summary section of the Storm UI dashboard, where you can check the crawler topology’s status:

$ storm jar target/stormcrawler-0.1.jar org.apache.storm.flux.Flux --remote es-crawler.flux
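Besides the UI, you can check the topology from the command line with `storm list`. A small helper to pull out the status column (the row format assumed here is based on Storm 1.x output, roughly `name status num_tasks num_workers uptime_secs`; verify against your version):

```shell
# Extract the status column for a named topology from `storm list` output.
# Assumes rows of the form: "name status num_tasks num_workers uptime_secs".
topology_status() {
  local name="$1" list_output="$2"
  printf '%s\n' "$list_output" | awk -v n="$name" '$1 == n { print $2 }'
}

# Usage: topology_status crawler "$(storm list)"
```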

[Screenshot: Storm UI dashboard — storm_ui.png]

  • NOTE: For debugging, the log files are located in $STORM_HOME/libexec/logs

Screenshot of Elasticsearch index in Kibana

[Screenshot: Stormcrawler.png]