Hadoop – Big data
Hadoop – Big data… sounds good with an elephant picture!
Yes. An elephant is huge, and so are our collections of large datasets.
Hadoop is an open-source software platform managed by the Apache Software Foundation. It lets huge volumes of data be stored and managed cheaply.
Basically, it is a way of storing enormous data sets across distributed clusters of servers and then running distributed analysis applications efficiently on the servers of each cluster.
It is designed to be robust: failures of individual machines are expected, and the software detects and handles them.
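To make "stored across a cluster" and "robust" concrete, here is a toy model, not Hadoop code. It assumes HDFS-style storage: the data is cut into fixed-size blocks, and each block is copied to several different nodes. The node names, the 4-byte block size and the simple round-robin placement are illustrative assumptions. Real HDFS uses much larger blocks (128 MB by default) and rack-aware placement, and it keeps 3 replicas by default.

```python
# Toy model of HDFS-style storage: split data into blocks and keep several
# replicas of each block on different nodes. Sizes are illustrative only.

def split_into_blocks(data: bytes, block_size: int) -> list:
    """Cut the data into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def store(data: bytes, nodes: list, block_size: int = 4, replication: int = 3):
    """Place `replication` copies of every block on distinct nodes (round-robin)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    blocks = split_into_blocks(data, block_size)
    storage = {node: {} for node in nodes}  # node name -> {block_id: bytes}
    for block_id, block in enumerate(blocks):
        first = block_id % len(nodes)
        for k in range(replication):
            storage[nodes[(first + k) % len(nodes)]][block_id] = block
    return storage, len(blocks)


def read(storage: dict, num_blocks: int, failed=frozenset()) -> bytes:
    """Rebuild the data from any surviving replica of each block."""
    parts = []
    for block_id in range(num_blocks):
        replica = next(
            (blocks[block_id] for node, blocks in storage.items()
             if node not in failed and block_id in blocks),
            None,
        )
        if replica is None:
            raise IOError(f"block {block_id} lost: every replica is on a failed node")
        parts.append(replica)
    return b"".join(parts)


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3", "node4"]
    data = b"enormous data set spread over a cluster"
    storage, n = store(data, nodes)
    # Two machines go down, but every block still has a live replica.
    print(read(storage, n, failed={"node2", "node3"}) == data)  # True
```

With 3 replicas of every block, any two machines can fail and the file can still be read. The data is lost only when every node holding a copy of some block is down.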
Why do organizations need it?
Well, big data is a large volume of structured and unstructured data. The amount of data matters less than the relevant information gathered from it; that is the vital key. Hadoop helps with this by offering:
- Robust computing power
- Fault tolerance
- Low cost
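The computing power comes from running the analysis in pieces, one per block, as described earlier. In Hadoop this is classically done with MapReduce. That term is not used in this post, so treat the framing as an assumption. The sketch runs the idea on a single machine and is not Hadoop's MapReduce API. A thread pool stands in for cluster nodes, and hypothetical in-memory text blocks stand in for HDFS files. Each block is mapped to a partial word count on its own, and the partial counts are then reduced into one result.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_block(block: str) -> Counter:
    """Map step: count the words in one block, independently of all others."""
    return Counter(block.lower().split())


def reduce_counts(left: Counter, right: Counter) -> Counter:
    """Reduce step: merge two partial word counts."""
    return left + right


def word_count(blocks: list) -> Counter:
    # Threads stand in for cluster nodes: each block is mapped separately.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(map_block, blocks))
    return reduce(reduce_counts, partials, Counter())


if __name__ == "__main__":
    blocks = ["big data big clusters", "data on many nodes", "big results"]
    print(word_count(blocks).most_common(2))  # [('big', 3), ('data', 2)]
```

The toy assumes every block holds whole words. Real Hadoop input splits take care of records that cross block boundaries.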
What are the ways of getting data into Hadoop?
- Use third-party vendor tools (SAS/ACCESS or SAS Data Loader for Hadoop)
- Use Sqoop to import structured data from a relational database into HDFS (Hadoop Distributed File System)
- Use Flume to collect streaming data, such as log files, into HDFS
- Load files into the system using simple Java commands
- Mount HDFS as a file system, then copy, write, and read files on it
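As an illustration of the Sqoop route and of copying local files into HDFS, here is a small helper that builds the command lines and by default only prints them. The database URL, user, table, and paths are hypothetical placeholders. Actually running the commands (`dry_run=False`) assumes Hadoop and Sqoop are installed and on the PATH. The flags used (`sqoop import --connect/--username/-P/--table/--target-dir/--num-mappers`, `hdfs dfs -mkdir -p`, `hdfs dfs -put`) are standard. Check them against the versions you run.

```python
import shlex
import subprocess

# Hypothetical connection string: replace with your own database.
JDBC_URL = "jdbc:mysql://dbhost:3306/sales"


def sqoop_import_cmd(jdbc_url: str, user: str, table: str,
                     target_dir: str, mappers: int = 1) -> list:
    """Build a `sqoop import` command that copies one relational table into HDFS."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", user,
        "-P",  # prompt for the password instead of putting it on the command line
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", str(mappers),
    ]


def hdfs_put_cmds(local_path: str, hdfs_dir: str) -> list:
    """Build HDFS shell commands that create a directory and copy a local file into it."""
    return [
        ["hdfs", "dfs", "-mkdir", "-p", hdfs_dir],
        ["hdfs", "dfs", "-put", local_path, hdfs_dir],
    ]


def run(cmd: list, dry_run: bool = True) -> None:
    """Print the command; execute it only when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    commands = [sqoop_import_cmd(JDBC_URL, "analyst", "orders", "/data/sales/orders")]
    commands += hdfs_put_cmds("clicks.log", "/data/raw")
    for cmd in commands:
        run(cmd)  # dry run: only prints
```

Keeping the commands as argument lists, not shell strings, avoids quoting problems when paths or table names contain spaces.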
What comes under big data?
- Social media data: sites such as Facebook and Twitter hold information and views posted by millions of people across the globe.
- Stock exchange data: information about the buy and sell decisions that customers make on the shares of different companies.
- Black box data: recorded by the black boxes of helicopters, airplanes, and jets, including the voices of the flight crew and recordings from microphones in the aircraft.
- Power grid data and search engine data.
There are three types of data:
- Structured data: relational data
- Semi-structured data: XML
- Unstructured data: Word documents, PDFs, plain text, etc.
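To see the difference between the three types, the snippet below holds the same kind of customer information in each form. The names, cities, and email address are made-up sample data. The CSV rows all share fixed columns, like a relational table. The XML tags describe each field, but the two records do not have the same fields. The free text has no schema at all, so only generic processing such as keyword search applies.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: every row has the same columns, like a relational table.
structured = io.StringIO("id,name,city\n1,Asha,Pune\n2,Ben,Leeds\n")
rows = list(csv.DictReader(structured))

# Semi-structured: tags describe the fields, but records need not share a schema.
semi_structured = """
<customers>
  <customer id="1"><name>Asha</name><city>Pune</city></customer>
  <customer id="2"><name>Ben</name><email>ben@example.com</email></customer>
</customers>
""".strip()


def parse_customers(xml_text: str) -> list:
    """Turn each <customer> element into a dict of whatever fields it has."""
    records = []
    for elem in ET.fromstring(xml_text).findall("customer"):
        record = {"id": elem.get("id")}
        record.update({child.tag: child.text for child in elem})
        records.append(record)
    return records


customers = parse_customers(semi_structured)

# Unstructured: plain text with no schema; only generic processing applies.
unstructured = "Asha from Pune called about a late delivery and asked for a refund."
mentions_refund = "refund" in unstructured.lower()

if __name__ == "__main__":
    print(rows[0]["city"], customers[1].get("email"), mentions_refund)
```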
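There are two classes of technology for handling big data:
- Operational Big Data: real-time, interactive workloads in which data is mainly captured and stored
- Analytical Big Data: complex, retrospective analysis that can touch most or all of the data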
Overall, it is a vast subject that is hard to cover in a few words, and it includes, but is not limited to, the following challenges:
Who should go for this course?
- Software Developers and Architects
- Analytics Professionals
- Data management professionals
- Project managers
- Graduates looking to build a career in big data analytics
- Anybody who is interested in big data analytics
Prerequisites for this course:
- Basics of a programming language
- Concepts of OOP (object-oriented programming)
- Basics of the Linux/Unix operating system
- Understanding of basic SQL statements