Vital Hadoop Tools for Crunching Big Data _ Big Data Analytics News

May 29, 2016 | Author: mihirhota | Category: Types, Presentations


Vital Hadoop tools for crunching Big Data

by bigdata, 07 February 2014

Categories: Analytics, Big Data, Cassandra, Hadoop, Hadoop Tutorials, HBase, Hive, Impala, MapReduce, MongoDB, NoSQL

Tags: Ambari, Apache Flume, Apache Pig, Apache Spark, Avro, Big Data, Hadoop, Hadoop Interview Questions, Hadoop Tutorials, HBase, HDFS, Hive, Hive Interview Questions, Mahout, NoSQL, Oozie, Solr, SQL on Hadoop, Sqoop, Zookeeper

http://bigdataanalyticsnews.com/vital-hadoop-tools-crunching-big-data/

Today, the most popular term in the IT world is ‘Hadoop’. Within a short span of time, Hadoop has grown massively and has proved useful for a large collection of diverse projects. The Hadoop community is evolving fast and plays a prominent role in its ecosystem. Here is a look at the essential tools and code that come under the collective heading ‘Hadoop’.

Hadoop: When we think of Hadoop, the first thing that comes to mind is the ‘map’ and ‘reduce’ tools. The entire group of map and reduce tools is often called Hadoop, but strictly the name refers to the small pile of code at the center, which is distributed under the Apache License. This code is Java based, and it synchronizes worker nodes as each one executes a function on data stored locally. The results are then aggregated and reported. In this process, the first step, running the function on each node's local data, is called ‘Map’, and the second step, aggregating and reporting the results, is called ‘Reduce’. Hadoop allows programmers to concentrate on writing code for data analysis. Hadoop is also designed to work around the faults and errors that are to be expected from individual machines.
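To make the two steps concrete, here is a minimal single-process sketch of the map and reduce idea, using a word count as the job. It is written in plain Python rather than against the Hadoop Java API, and the function names (map_phase, shuffle, reduce_phase, word_count) are made up for this illustration. On a real cluster each chunk would be mapped on the node that stores it.

```python
from collections import defaultdict

def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one node's local chunk."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs):
    """Group the emitted values by key so each key is reduced in one place."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: aggregate all the values emitted for one key."""
    return key, sum(values)

def word_count(chunks):
    pairs = []
    for chunk in chunks:  # stands in for the worker nodes, one chunk each
        pairs.extend(map_phase(chunk))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["big data big", "data tools"])
print(counts)
```

The programmer only supplies the map and reduce functions; everything else (splitting the input, grouping by key, retrying failed workers) is the part a framework like Hadoop takes care of.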


Ambari: Ambari is an Apache project supported by Hortonworks. It offers a web-based GUI (Graphical User Interface) with wizard scripts for setting up clusters with most of the standard components. Ambari provisions, manages, and monitors Hadoop clusters.

HDFS (Hadoop Distributed File System): HDFS, distributed under the Apache License, offers a basic framework for splitting up data collections between multiple nodes. In HDFS, large files are broken into blocks, and several nodes between them hold all of the blocks of a file. The file system is designed to combine fault tolerance with high throughput. Blocks are loaded for steady streaming and are not usually cached, since the design favors throughput over low latency.
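The block idea can be sketched in a few lines of Python. This is only a toy model, not how HDFS is implemented: the tiny block size, the round-robin placement, and the replication parameter are all assumptions made for the illustration (the article does not describe placement, and real HDFS uses far larger blocks and its own placement policy).

```python
def split_into_blocks(data, block_size):
    """Cut a file's bytes into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_blocks(num_blocks, nodes, replication):
    """Assumed round-robin policy: block i is copied to `replication` consecutive nodes."""
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            nodes[(block_id + r) % len(nodes)] for r in range(replication)
        ]
    return placement

blocks = split_into_blocks(b"abcdefghij", block_size=4)
placement = place_blocks(len(blocks), ["node1", "node2", "node3"], replication=2)
print(blocks)
print(placement)
```

Because every block lives on more than one node in this sketch, losing a single node still leaves a copy of each block, which is the kind of fault tolerance the paragraph above refers to.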


HBase: HBase is a column-oriented database management system that runs on top of HDFS. HBase applications are written in Java, much like MapReduce applications. It comprises a set of tables, where each table contains rows and columns like a traditional database. When data falls into the big table, HBase stores the data, searches it, and automatically shares the table across multiple nodes so that MapReduce jobs can run on it locally. HBase offers a limited guarantee for some local changes: the changes made to a single row either all succeed or all fail together.
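The single-row guarantee is easier to see in code. The sketch below is a toy in-memory table, not the HBase client API; the class TinyTable, its put_row method, and the validate hook are hypothetical names invented for this example. It only shows the all-or-nothing behaviour for changes to one row.

```python
class TinyTable:
    """Toy in-memory table: row key -> {column: value}. Not the HBase API."""

    def __init__(self):
        self.rows = {}

    def put_row(self, row_key, changes, validate=lambda column, value: True):
        """Apply every change to one row, or none of them."""
        staged = dict(self.rows.get(row_key, {}))  # work on a copy of the row
        for column, value in changes.items():
            if not validate(column, value):
                return False  # drop the copy; the stored row is untouched
            staged[column] = value
        self.rows[row_key] = staged  # publish all changes in a single step
        return True

def age_is_valid(column, value):
    """Hypothetical rule used to force a failure in the second write."""
    return not (column == "info:age" and value < 0)

table = TinyTable()
first = table.put_row("user1", {"info:name": "Ann", "info:age": 30})
second = table.put_row("user1", {"info:name": "Bob", "info:age": -1}, age_is_valid)
print(first, second, table.rows["user1"])
```

The second write fails on the age column, so the name change that came with it is discarded too, and the row keeps its earlier values.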



Hive: If you are already fluent in SQL, you can leverage Hadoop using Hive. Hive was originally developed at Facebook. Apache Hive regularizes the process of extracting data from files stored in Hadoop, including data held in HBase. It supports analysis of large datasets stored in Hadoop’s HDFS and compatible file systems. It also provides an SQL-like language called HiveQL (HQL) that reaches into the files and pulls out the snippets the code needs.

Sqoop: Apache Sqoop is designed to transfer bulk data efficiently from traditional databases into Hive or HBase. It can also extract data from Hadoop and export it to external structured datastores such as relational databases and enterprise data warehouses. Sqoop is a command-line tool that maps between the tables and the data storage layer, translating the tables into a configurable combination of HDFS, HBase, or Hive.

Pig: Once the data is stored where Hadoop can see it, Apache Pig dives into the data and runs code written in its own language, called Pig Latin. Pig Latin is filled with abstractions for handling data. Pig comes with standard functions for common tasks such as averaging data, working with dates, or finding differences between strings. When the standard functions fall short, Pig also lets users write their own functions, called UDFs (User Defined Functions).

Zookeeper: Zookeeper is a centralized service that maintains configuration information, provides naming, and provides distributed synchronization across a cluster. It imposes a file system-like hierarchy on the cluster and stores all of the metadata for the machines, so the work of the various machines can be synchronized.
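The file system-like hierarchy that Zookeeper imposes can be pictured with a small in-memory tree. This is a sketch only, not the Zookeeper client API: the class ZNodeTree, its methods, and the example paths are assumptions for illustration, and a real service would also replicate this tree across servers.

```python
class ZNodeTree:
    """Toy in-memory tree mapping a path to its data. Not the Zookeeper API."""

    def __init__(self):
        self.nodes = {"/": b""}

    def create(self, path, data=b""):
        """Add a node; its parent must already exist, as in a file system."""
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.nodes:
            raise KeyError("parent does not exist: " + parent)
        if path in self.nodes:
            raise KeyError("node already exists: " + path)
        self.nodes[path] = data

    def get(self, path):
        return self.nodes[path]

    def children(self, path):
        """List the names directly under `path`, sorted."""
        prefix = path.rstrip("/") + "/"
        return sorted(
            p[len(prefix):]
            for p in self.nodes
            if p != path and p.startswith(prefix) and "/" not in p[len(prefix):]
        )

tree = ZNodeTree()
tree.create("/config", b"replication=3")
tree.create("/workers")
tree.create("/workers/node1", b"host=10.0.0.1")
tree.create("/workers/node2", b"host=10.0.0.2")
print(tree.children("/workers"))
```

Every machine that reads the same tree sees the same list of workers and the same configuration, which is what makes it usable as a shared point of coordination.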

NoSQL: Some Hadoop clusters integrate with NoSQL data stores that come with their own mechanisms for storing data across a cluster of nodes. This lets them store and retrieve data with all the features of the NoSQL database, after which Hadoop can be used to schedule data analysis jobs on the same cluster.

Mahout: Mahout implements a large number of data analysis algorithms, including classification and filtering, for Hadoop clusters. Many standard algorithms, such as K-means, Dirichlet, parallel pattern, and Bayesian classification, are ready to run on the data with a Hadoop-style map and reduce.

Lucene/Solr: Lucene, written in Java, integrates easily with Hadoop and is a natural companion for it. It is a tool for indexing large blocks of unstructured text. Lucene handles the indexing, while Hadoop handles the distributed queries across the cluster. Lucene-Hadoop features are evolving rapidly as new projects are developed.

Avro: Avro is a serialization system that bundles the data together with a schema for understanding it. Each packet comes with a JSON data structure that explains how the data can be parsed. This header specifies the structure of the data, which avoids the need to write extra tags in the data to mark the fields. The output is considerably more compact than traditional formats like XML.

Oozie: A job can be simplified by breaking it into steps. When a project is broken into multiple Hadoop jobs, Oozie starts processing them in the right sequence. It manages the workflow as specified by a DAG (Directed Acyclic Graph), so there is no need to keep watch over each step.
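Running jobs "in the right sequence" from a DAG amounts to a topological sort. The sketch below shows that idea in plain Python; it is not Oozie's workflow format or API, and the function run_order and the job names in the example workflow are hypothetical.

```python
from collections import deque

def run_order(dependencies):
    """Return job names in an order that respects every dependency.

    `dependencies` maps each job to the jobs that must finish before it.
    """
    remaining = {job: set(deps) for job, deps in dependencies.items()}
    ready = deque(sorted(job for job, deps in remaining.items() if not deps))
    order = []
    while ready:
        job = ready.popleft()
        order.append(job)
        for other in sorted(remaining):
            deps = remaining[other]
            if job in deps:
                deps.remove(job)
                if not deps:  # all prerequisites done, so this job can run
                    ready.append(other)
    if len(order) != len(remaining):
        raise ValueError("workflow contains a cycle, so it is not a DAG")
    return order

workflow = {
    "import": [],
    "clean": ["import"],
    "aggregate": ["clean"],
    "index": ["clean"],
    "report": ["aggregate", "index"],
}
order = run_order(workflow)
print(order)
```

A job is released only when everything it depends on has finished, and a workflow with a cycle is rejected because no valid order exists for it.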

GIS Tools: Working with geographic maps is a big job for clusters running Hadoop. The GIS (Geographic Information System) tools for Hadoop projects have adapted some of the best Java-based tools for understanding geographic information to run with Hadoop. Databases can now handle geographic queries using coordinates, and code can deploy the GIS tools.

Flume: Gathering the data is a job equal to storing and analyzing it. Apache Flume dispatches ‘special agents’ to gather information that will be stored in HDFS. The information gathered can come from log files, the Twitter API, or website scrapes. These data can be chained and subjected to analysis.

Spark: Spark is the next generation; it works much like Hadoop but processes data cached in memory. Its objective is to make data analysis fast to run and fast to write, with a general execution model. That model can optimize arbitrary operator graphs and supports in-memory computing, which lets it query data faster than disk-based engines like Hadoop.

SQL on Hadoop: When a quick ad-hoc query of all the data in the cluster is required, a new Hadoop job can be written, but this takes some time. As programmers found themselves doing this more often, they came up with tools that accept queries written in the simple language of SQL. These tools offer quick access to the results.
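To show why SQL is attractive for ad-hoc questions, here is a small sketch in which one query does the grouping and counting that would otherwise need a hand-written job. Python's built-in sqlite3 module stands in for a SQL-on-Hadoop engine purely for illustration; it is not one of those tools, and the page_views table and its rows are invented for the example.

```python
import sqlite3

# An in-memory database stands in for data already loaded into the cluster.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (page TEXT, visitor TEXT)")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?)",
    [("home", "ann"), ("home", "bob"), ("docs", "ann"), ("home", "ann")],
)

# One declarative query replaces a custom map step (emit page, 1)
# and a custom reduce step (sum the ones per page).
rows = conn.execute(
    "SELECT page, COUNT(*) AS views FROM page_views "
    "GROUP BY page ORDER BY views DESC"
).fetchall()
conn.close()
print(rows)
```

The query states what result is wanted and leaves the execution plan to the engine, which is the convenience the SQL-on-Hadoop tools bring to cluster data.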


Copyright © 2014. Big Data Analytics News
