Pentaho Big Data Analytics - Blue Prints for Big Data Success

Description
Though the big data opportunity is growing rapidly, research indicates that the top two big data challenges organizations face are, in order, determining how to get value out of big data and defining a big data strategy.
Transcript
  • 1. Blueprints for Big Data Success: Succeeding with Four Common Scenarios
    Copyright ©2014 Pentaho Corporation. Redistribution permitted. All trademarks are the property of their respective owners. For the latest information, please visit our web site at pentaho.com.
  • 2. Blueprints for Big Data Success PENTAHO 2
    Contents
    Contents (p. 2)
    Introduction (p. 3)
    Definitions (p. 4)
        Current Adoption (p. 4)
        Emerging Adoption (p. 4)
    Data Warehouse Optimization (p. 5)
        What Is It and Why Are Companies Investing In It? (p. 5)
        What Does It Look Like? (p. 5)
        Key Project Considerations (p. 5)
    Streamlined Data Refinery (p. 6)
        What Is It and Why Are Companies Investing In It? (p. 6)
        What Does It Look Like? (p. 6)
        Key Project Considerations (p. 6)
    Customer 360-Degree View (p. 7)
        What Is It and Why Are Companies Investing In It? (p. 7)
        What Does It Look Like? (p. 7)
        Key Project Considerations (p. 7)
    Monetize My Data (p. 9)
        What Is It and Why Are Companies Investing In It? (p. 9)
        What Does It Look Like? (p. 9)
        Key Project Considerations (p. 9)
  • 3. Introduction
    By now it’s become fairly clear that big data represents a big shift in the enterprise technology landscape. IDC estimates that the amount of useful data worldwide will increase 20x between 2010 and 2020, while 77% of the data relevant to enterprises will be unstructured through 2015.[1] As these volume and variety trends continue, companies are increasingly turning to Hadoop, NoSQL, and other tools to tackle information issues not readily addressable with older relational database and data warehouse technologies.

    Though the big data opportunity is growing rapidly, research indicates that the top two big data challenges organizations face are, in order, determining how to get value out of big data and defining a big data strategy.[2] In light of these challenges, this piece intends to identify and explain big data use cases generating business results for companies today, and to shed light on emerging use cases expected in the near future. The following pages address what these use cases are, why companies are investing in them, and their common reference architectures.

    Mapped out below are 10 key enterprise use cases for big data, categorized according to their ability to generate business impact (Y axis) and their level of implementation complexity (X axis). Business impact ranges from optimizing current processes to transforming entire business models. Complexity ranges from entry-level implementations relying on fairly standard technologies to advanced cases relying on combinations of technologies, some of which are not yet widely commercialized. The use cases are marked either ‘Current Adoption’ or ‘Emerging Adoption.’ The former indicates more widely implemented use cases that follow fairly repeatable guidelines, while the latter indicates implementations that are less common today but are expected to appear more often in the future. This paper discusses the ‘Current Adoption’ use cases in detail.

    [Figure: use cases plotted by use case complexity (X axis, Entry to Advanced) and business impact (Y axis, Optimize to Transform). Current Adoption: Data Warehouse Optimization, Streamlined Data Refinery, Customer 360-Degree View, Monetize My Data. Emerging Adoption: Big Data Exploration, Internal Big Data as a Service, On-Demand Big Data Blending, Big Data Predictive Analytics, Next Generation Applications.]

    [1] IDC Digital Universe Study, 2012.
    [2] Gartner, “Big Data Adoption in 2013 Shows Substance Behind the Hype,” 2013.
  • 4. Definitions
    Included below is a brief definition for each of these use cases:

    CURRENT ADOPTION
    Data Warehouse Optimization – The traditional data warehouse (DW) is strained by rising data volumes, meaning stakeholders can’t get the analytics they need on time. Expanding DW capacity can be costly, so organizations tap big data to offload less frequently used data and improve DW performance.

    Streamlined Data Refinery – Here the big data store becomes the landing and processing zone for data from many diverse sources, before it is pushed downstream for low-latency analytics (most likely to an analytical database for rapid queries). ETL and data management cost savings are scaled up, and big data becomes an essential part of the analytics process.

    Customer 360-Degree View – The 360 View blends a variety of operational and transactional data sources to create an on-demand analytical view across customer touch points. It also includes providing customer-facing employees and partners with information made available inside everyday line-of-business applications.

    Monetize My Data – In this case, enriched and de-identified data sets are delivered as a service to third-party customers. It leverages powerful data processing and embedded analytics to generate a new revenue stream for the enterprise.

    EMERGING ADOPTION
    Big Data Exploration – Companies are dumping massive amounts of data into big data stores, but they aren’t always sure what information is in there (“dark data”) or whether it can be leveraged in a productive way. To “get their feet wet,” analysts will run basic data mining algorithms and work to correlate the patterns they find with data from other sources.

    Harnessing Machine and Sensor Data – Until recently it has been cost-prohibitive to run analytics on high-volume data from devices like sensors, routers, and set-top boxes. Today, however, big data makes it possible to harness this information for data mining and low-latency analytics, ultimately empowering organizations to take quick action on operations and service issues.

    Big Data Predictive Analytics – Big data offers a new set of tools for optimizing machine-learning algorithms (for training and evaluation) and using them to predict or influence outcomes (scoring). Running predictive analytics in the big data store has applications in fraud detection, recommendation engines, and offer optimization.

    Next Generation Applications – While cloud computing and SaaS are not new trends, their next phase will likely hinge on big data. Application providers are innovating around data and analytics architecture to make their products more powerful, intelligent, and valuable to customers. An embedded analytics interface inside the end-user application allows the vendor to fully capitalize on this innovation.

    On-Demand Big Data Blending – Once big data stores are implemented, teams are often still subject to the time constraints of existing data warehouse infrastructure. Time-sensitive needs may require bypassing the DW altogether. “Just in time” blending avoids the need to stage data, delivering accurate, timely data from all sources to analytics.

    Internal Big Data as a Service – Enterprises are tapping into big data as a shared database service, to be provisioned across a number of application development teams for data ingestion and access. The goal is to achieve economies of scale and cost savings relative to a more silo-based approach. ETL and analytics solutions are included as components of the centralized enterprise stack.
  • 5. Data Warehouse Optimization
    WHAT IS IT AND WHY ARE COMPANIES INVESTING IN IT?
    Data warehouse optimization is one of the most commonly seen business use cases for big data, driven primarily by two pains: cost and operational performance. As the volume of data a company needs to store and access grows, existing data warehouse capacity becomes strained. This degrades query performance and data access for IT and business users. In addition, it creates pressure to buy additional data warehouse storage capacity from incumbent vendors, a very pricey and possibly only temporary solution as data keeps expanding.

    As a result, enterprises have looked to big data, specifically Hadoop, to reduce this pressure. Hadoop’s distributed computing model provides for powerful processing on commodity hardware, and storing data in HDFS (Hadoop Distributed File System) can be an order of magnitude cheaper than traditional data warehouse storage. Specifically, Hadoop storage costs approximately $1,000 per terabyte (TB), versus approximately $5,000 to $10,000 or more per TB for fully loaded data warehouse storage, including required hardware, servers, etc.[3] As such, IT organizations will transfer less frequently used data from their DW to Hadoop to save on data storage costs, while satisfying SLAs and compliance requirements to deliver data on time.

    WHAT DOES IT LOOK LIKE?
    In this example, we have an enterprise that is leveraging data from CRM and ERP systems as well as other sources. A Hadoop cluster has been implemented to offload less frequently used data from the existing data warehouse, saving on storage costs and speeding query performance when analysts need to access information from the analytical data mart.

    [Figure: reference architecture. Labels: CRM, ERP Systems, Other Data Sources, Ingest, PDI, Data Warehouse, Relational Layer, Hadoop Cluster, Analytical Data Mart.]

    KEY PROJECT CONSIDERATIONS
    While Data Warehouse Optimization is one of the most common big data use cases seen today, it still requires time, effort, and planning to execute. Hadoop is still an emerging technology, and using the ‘out of box’ tools accompanying Hadoop distributions requires Java coding expertise to create the routines that actually offload the DW data into Hadoop. Developers and analysts with Hadoop expertise are often difficult for enterprises to hire in sufficient numbers, and can command compensation approximately 50% higher than staff with skills in SQL and other more traditional tools.[4]

    Pentaho is valuable in providing an intuitive graphical user interface (GUI) for big data integration that eliminates manual coding and makes Hadoop accessible to all data developers. This accelerates time to value and reduces labor costs. Even if enterprises already have a data integration solution in place, legacy platforms don’t have complete no-coding solutions to integrate existing data sources and databases with Hadoop.

    [3] Information Week, “How Hadoop Cuts Big Data Costs,” 2012.
    [4] O’Reilly, “2013 Data Science Salary Survey,” 2013.
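
The storage cost comparison above can be turned into a quick back-of-the-envelope estimate. The sketch below is illustrative only: it uses the approximate per-terabyte figures quoted in the text, while the function name `offload_savings` and the 40 TB volume of rarely queried data are hypothetical values chosen for the example, not figures from the paper.

```python
"""Rough storage-cost comparison for offloading cold data from a DW to Hadoop."""

# Approximate per-terabyte costs quoted in the text (USD).
HADOOP_COST_PER_TB = 1_000
DW_COST_PER_TB_LOW = 5_000
DW_COST_PER_TB_HIGH = 10_000


def offload_savings(cold_tb, dw_cost_per_tb, hadoop_cost_per_tb=HADOOP_COST_PER_TB):
    """Return the storage cost avoided by holding cold_tb in Hadoop instead of the DW."""
    if cold_tb < 0:
        raise ValueError("cold_tb must be non-negative")
    return cold_tb * (dw_cost_per_tb - hadoop_cost_per_tb)


if __name__ == "__main__":
    cold_tb = 40  # hypothetical volume of less frequently used data
    low = offload_savings(cold_tb, DW_COST_PER_TB_LOW)
    high = offload_savings(cold_tb, DW_COST_PER_TB_HIGH)
    print(f"Estimated savings for {cold_tb} TB: ${low:,.0f} to ${high:,.0f}")
```

An estimate like this covers storage only. It leaves out the staffing and integration costs discussed under Key Project Considerations, which would need to be weighed separately.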
  • 6. Streamlined Data Refinery
    WHAT IS IT AND WHY ARE COMPANIES INVESTING IN IT?
    In the face of exploding volumes of structured transaction, customer, and other data, traditional ETL systems slow down, making analytics unworkable. The “Data Refinery” solution streamlines most data sources through a scalable big data processing hub, using Hadoop for transformation. Refined data is pushed to an analytical database for low-latency, self-service analytics across diverse data.

    This use case is often a logical extension of the cost savings and operational enhancements of DW Optimization. At this point, a greater amount and variety of data is being loaded into Hadoop, making Hadoop not just an archive but a source of valuable multi-source business information, just waiting to be queried.

    As such, this use case is more transformative than DW Optimization. The organization can establish usable analytics on diverse sources of data at high volume, thanks to faster queries, rapid ingestion, and powerful processing provided by the combination of Hadoop and an analytical database (such as Vertica or Greenplum). By the same token, teams can engineer data sets for predictive analytics more quickly.

    WHAT DOES IT LOOK LIKE?
    The example below shows a refinery architecture for an electronic marketing firm that delivers personalized offers. Online campaign, enrollment, and transactional data is ingested via Hadoop, processed, and then sent on to an analytical database. A business analytics front-end includes reporting and ad hoc analysis for business users.

    [Figure: refinery architecture. Labels: Transactions – Batch, Real-time; Enrollments; Redemptions; Location, Email, Other Data; PDI; Hadoop Cluster; Analytical Database; Analyzer; Reports.]

    KEY PROJECT CONSIDERATIONS
    The staff and productivity challenges from DW Optimization still ring true in this case. Not surprisingly, return on investment can be enhanced with tools that eliminate coding and simplify the process of integrating big data stores with various relational systems.

    Otherwise, this use case is generally a more expansive and lengthier integration project, which may involve consolidating many point-to-point system connections into a centralized hub model. The project becomes more complex to execute as the variety of data types and sources increases. This underlines the importance of selecting data integration and analytics platforms with highly flexible connectivity to a wide variety of current and emerging data systems.

    Given the emerging importance of analytical insights from Hadoop in this use case, collaboration between data developers and business analysts becomes more important. An integrated platform is needed for data connectivity and business intelligence; it’s much more difficult to coordinate effectively when IT and business users are leveraging isolated toolsets.

    Finally, an analytical database is normally a key part of this architecture. These databases are optimized for business intelligence, usually through faster query performance, greater scalability, multi-dimensional analysis ‘cubes’, and/or in-memory functionality. By comparison, traditional transactional databases may not provide the required level of query performance and analytics functionality.
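
To make the ingest, refine, and load flow described above more concrete, here is a toy sketch in plain Python. It is not how Pentaho, Hadoop, or any analytical database works internally: the in-memory lists stand in for raw feeds landed in Hadoop, the `refine` function stands in for the transformation step, and an in-memory SQLite table stands in for the analytical database. All record fields, function names, and the `campaign_summary` table are hypothetical.

```python
"""Toy data-refinery flow: land raw records, refine them, load an analytical table."""
import sqlite3
from collections import defaultdict

# Hypothetical raw feeds; in the architecture above these would land in Hadoop.
RAW_ENROLLMENTS = [
    {"customer_id": "c1", "campaign": "spring"},
    {"customer_id": "c2", "campaign": "spring"},
]
RAW_REDEMPTIONS = [
    {"customer_id": "c1", "amount": "12.50"},
    {"customer_id": "c1", "amount": "7.50"},
    {"customer_id": "c3", "amount": "bad-value"},  # malformed row, dropped below
]


def refine(enrollments, redemptions):
    """Join the two feeds per customer and total the valid redemption amounts."""
    totals = defaultdict(float)
    for row in redemptions:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
        totals[row["customer_id"]] += amount
    return [
        (e["customer_id"], e["campaign"], totals.get(e["customer_id"], 0.0))
        for e in enrollments
    ]


def load(rows):
    """Load refined rows into in-memory SQLite (stand-in for the analytical database)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE campaign_summary (customer_id TEXT, campaign TEXT, redeemed REAL)"
    )
    conn.executemany("INSERT INTO campaign_summary VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = load(refine(RAW_ENROLLMENTS, RAW_REDEMPTIONS))
    query = "SELECT campaign, SUM(redeemed) FROM campaign_summary GROUP BY campaign"
    for row in conn.execute(query):
        print(row)
```

The point of the split is that cleansing and joining happen once, upstream, so the query side only ever sees refined rows. That is the division of labor the refinery pattern assigns to Hadoop and the analytical database respectively.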
  • 7. Customer 360-Degree View
    WHAT IS IT AND WHY ARE COMPANIES INVESTING IN IT?
    Companies have long sought to bring a variety of data sources together to create an on-demand analytical view across customer touch points. By leveraging both big data and traditional data sources in a fully integrated environment, organizations can accomplish this and achieve tremendous actionable customer insight.

    Whereas DW Optimization and Streamlined Data Refinery are primarily cost- and efficiency-driven use cases, the Customer 360 is clearly aimed at boosting customer lifetime value, especially in competitive consumer markets where churn is a key concern (such as telecommunications, hospitality, and consumer financial services). The two main levers for success are raising cross-sell/up-sell revenues and minimizing churn risk.

    This use case is enabled on the back-end by bringing virtually all customer touch point data into a single repository for fast queries (most likely NoSQL or Hadoop). It’s enabled on the front-end by bringing relevant metrics into a centralized location for business users. By blending together previously isolated data, the Customer 360 gives sales and services teams a more complete understanding of the buyer, while providing a better picture of how a brand’s products and services are perceived. Equipping employees with this insight at the point of their interaction with customers gives them the power to make more productive and profitable decisions on the fly.

    WHAT DOES IT LOOK LIKE?
    [Figure: Customer 360 architecture. Labels: CRM System, Online Interactions, Claims, Admin. Info, Documents, Images, PDI, NoSQL, Call Center View, Research Analysts, Predictive Analytics.]

    In the example above, a financial services company ingests data from various sources into a single big data store, in this case NoSQL. From there, the data is processed and summarized at the customer unique ID level in order to build the 360-degree view. Accurate and governed customer data is then routed to the appropriate analytics views for each role, including call center staff, research analysts, and data scientists.

    KEY PROJECT CONSIDERATIONS
    While this implementation can be transformative for businesses, it can also be highly complex and resource-intensive. On top of the big data labor resource challenges and point-to-point integration challenges described in previous use cases, the Customer 360 requires significant strategic planning from a business perspective.
  • 8. First, specific revenue-related goals should be tied to the project. Stakeholders must identify both the potential drivers of customer satisfaction and potential opportunities for customer-facing staff to take advantage of that data. At the same time, the relevant business end-users must be a part of the planning process, so that information gets delivered from the right sources to the right people in the right fashion. Analytics must be presented to users in a way they will be sure to adopt. This means making them easy to access and intuitive to understand, as well as embedding the analytics into crucial operational applications.

    From a technical perspective, a NoSQL solution such as MongoDB may be the big data store of choice if an enterprise is looking to route many time-sensitive streams of customer info into a single collection that can be distributed across servers quickly and easily. Hadoop is a better fit where data can be processed in batches and must be stored historically. Often bot