Introduction to Big Data & Hadoop

Course in Big Data Analytics, in association with IBM. Every day a huge amount of data is created. It comes from everywhere: sensors that gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals, to name a few. This is Big Data: a blanket term for any collection of data sets so large and complex that it becomes difficult to process using on-hand data management tools or traditional data processing applications. The challenges include capture, storage, search, sharing, transfer, analysis, and visualization. Anyone with knowledge of Java, basic UNIX, and basic SQL can take the Big Data training course.
  • 2. What is Module 1 about?
      After completing this module, you should be able to:
      - Understand what Big Data is and what its characteristics are
      - Understand in detail the need for a Big Data solution
      - Understand where Big Data is appropriate
      - List the IBM products that make up IBM’s Big Data strategy
      - Describe the type of data appropriate for:
          - InfoSphere BigInsights
          - InfoSphere Streams
      - List the open source programs that are part of InfoSphere BigInsights
  • 3. System of Units / Binary System of Units

      Name        Symbol   SI (decimal)   Binary usage (deprecated)
      kilobyte    KB       10^3           2^10
      megabyte    MB       10^6           2^20
      gigabyte    GB       10^9           2^30
      terabyte    TB       10^12          2^40
      petabyte    PB       10^15          2^50
      exabyte     EB       10^18          2^60
      zettabyte   ZB       10^21          2^70
      yottabyte   YB       10^24          2^80
  • 4. BigData @ Scale
      - 2.5 petabytes: memory capacity of the human brain
      - 13 petabytes: amount that could be downloaded from the internet in two minutes if every American (300M) got on a computer at the same time
      - 4.75 exabytes: total genome sequences of all people on Earth
      - 422 exabytes: total digital data created in 2008
      - 1 zettabyte: world’s current digital storage capacity
      - 1.8 zettabytes: total digital data expected to be created in 2011
  • 5. Explosion in data and real world events (Source: IBM internal)
  • 6. Examples of BigData
      Commercial:
      - Web events / database logs
      - Sensor networks
      - RFID
      - Internet text and documents
      - Internet search indexing
      - CDR (Call Detail Records)
      - Medical records, etc.
      Government:
      - Regular government business and commerce needs
      - Military and homeland security surveillance
  • 7. Examples of BigData
      Science:
      - Astronomy
      - Atmosphere
      - Biological
      - Genomics
      Social:
      - Social networks
      - Social data
  • 8. BigData @ Organizations
  • 9. Perception gap surrounding social media (Source: IBM internal)
  • 10. Big Data Characteristics
  • 11. Challenge @ BigData to find new insights (Source: IBM internal)
  • 12. Is there really a need for Big Data?
  • 13. Case Study and Implementation @ Vestas
      Vestas Wind Systems has 43,000 wind turbines in 65 countries across 5 continents.
      Customer pain point:
      - Finding the optimal place to install a wind turbine
      - A large number of location-dependent factors must be considered, such as temperature, precipitation, wind velocity, and humidity
      - The existing legacy process cannot analyze all of the data
      - Analysis of the data must be completed in hours
      Solution required:
      - Leverage all available data, drastically reduce modeling time, and support future expansion in modeling techniques
      - Improve the accuracy of wind turbine placement decisions
  • 14. Case Study and Implementation @ Vestas
      Implementation using InfoSphere BigInsights:
      - Vestas has created a “wind and site competence center”
      - Engineers will model data and forecast optimal turbine locations
      - Initially, publicly available weather data from national weather services will be used along with Vestas's own recorded weather data
      - Data sources considered: global deforestation metrics, satellite images, historical metrics, geospatial data
      - InfoSphere BigInsights will be used as the core infrastructure to hold the generated weather data
  • 15. Big Data presents big opportunities? (Source: IBM internal)
  • 16. Traditional vs. BigData approaches
  • 17. Merging the Traditional and Big Data Approaches (Source: IBM internal)
  • 18. Enterprise information architecture (Source: IBM internal)
      - Big Data will be a permanent part of your information architecture
      - It cannot be a silo; it must be fully integrated
      - In order to leverage its value, it must be easy to deploy and integrate
  • 19. IBM Big Data platform strategy:
      - Integrate and manage the full variety, velocity, and volume of Big Data
      - Apply advanced analytics to information in its native form
      - Visualize all available data for ad hoc analysis
      - Provide a development environment for building new analytic applications
      - Support workload optimization and scheduling
      - Provide for security and governance
      - Integrate with enterprise software
  • 20. IBM Big Data platform strategy
  • 21. Enterprise-class BigData product @ IBM (Source: IBM internal)
      Failure tolerance:
      - High-availability architecture that tolerates hardware or application failure
      Scale economically:
      - Runs on scalable hardware, with the ability to add nodes dynamically
      Security & privacy:
      - Security protection for granular data access control
  • 22. Different BigInsights editions for varying needs (Source: IBM internal)
  • 23. Different BigInsights editions for varying needs
      Characteristics that distinguish BigInsights include its built-in support for analytics, its integration with other enterprise software, and its production readiness. InfoSphere BigInsights comes in two editions:
      - Basic Edition
      - Enterprise Edition
  • 24. InfoSphere Streams (Source: IBM internal)
  • 25. To Summarize
      - An enterprise-ready Big Data platform
      - Innovative, customer-tested products: InfoSphere BigInsights and InfoSphere Streams
      - Platform and products enabled for integration with the overall enterprise infrastructure
      - Even though BigInsights contains open source code, it is licensed like other IBM software offerings
  • 26. To Summarize
      Having completed this module, you should be able to:
      - Understand the need for a Big Data solution
      - List the IBM products that make up IBM’s Big Data strategy
      - Describe the type of data appropriate for:
          - InfoSphere BigInsights
          - InfoSphere Streams
      - List the open source programs that are part of InfoSphere BigInsights
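
The unit table on slide 3 can be checked with a few lines of Python. This is a minimal sketch and not part of the original deck: the names `PREFIXES`, `decimal_bytes`, `binary_bytes`, and `binary_excess_percent` are invented here for illustration. It compares the SI value of each prefix with the deprecated binary reading and shows how far the two drift apart as the units grow.

```python
# Illustrative sketch only: these names are hypothetical, not from the course deck.

# Prefixes in the order the slide 3 table lists them (kilo through yotta).
PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]


def decimal_bytes(prefix: str) -> int:
    """SI meaning: 10^(3n) bytes, where n is the 1-based position of the prefix."""
    n = PREFIXES.index(prefix) + 1
    return 10 ** (3 * n)


def binary_bytes(prefix: str) -> int:
    """Deprecated binary usage: 2^(10n) bytes for the same prefix."""
    n = PREFIXES.index(prefix) + 1
    return 2 ** (10 * n)


def binary_excess_percent(prefix: str) -> float:
    """How much larger the binary reading is than the SI value, in percent."""
    return (binary_bytes(prefix) / decimal_bytes(prefix) - 1) * 100


if __name__ == "__main__":
    # Print one row per prefix: SI size, binary size, and the relative gap.
    for prefix in PREFIXES:
        print(
            f"{prefix}byte: SI = {decimal_bytes(prefix):.0e} bytes, "
            f"binary = {binary_bytes(prefix)} bytes, "
            f"gap = {binary_excess_percent(prefix):.1f}%"
        )
```

The gap is 2.4% at the kilobyte and grows to roughly 20.9% at the yottabyte, so at the petabyte-to-zettabyte scales quoted on slide 4 it matters which convention a figure uses.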