White Paper
Big Data Analytics:
Future Architectures, Skills and Roadmaps for the CIO
September 2011
By Philip Carter
Sponsored by
Brave New World of Big Data

The ‘Big Data Era’ has arrived – multi-petabyte data warehouses, social media interactions, real-time sensory data feeds, geospatial information and other new data sources are presenting organisations with a range of challenges, but also significant opportunities. IDC believes that as CIOs start to adopt the new class of technologies required to process, discover and analyse these massive data sets, which cannot be dealt with using traditional databases and architectures, it will become clear that the real value will be derived from the high-end analytics that can be performed on the increasing volumes, velocity and variety of data that organisations are generating – in other words, Big Data analytics.

One of the key differences between analytics in the traditional mode and what we are dealing with in the Big Data era is that we are gathering data that we may or may not need. From the perspective of analysis, this means ‘we don’t know what we don’t know’. Hence, the variables and models are likely to be entirely new, requiring a different infrastructure strategy and, perhaps most importantly, new skill sets.

The objective of this white paper is to explore the initial impact that Big Data is having on organisations, particularly on IT departments, which are being forced to re-assess architectures, delivery models and future roadmaps. It will explore the following areas in more detail:
Defining Big Data. This is not in the context of the quantity or threshold that actually quantifies Big Data (as this is changing all the time, and will be applied differently, depending on the vertical and market segment), but more in terms of a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high-speed capture, discovery and/or analysis.
Hadoop, MapReduce, Key Value Store? There is a lot of hype around the new technologies that are being used by the market to deal with the Big Data phenomenon. We will highlight some of these and their relative importance.

The Value of Big Data… in Analytics. The bottom line here is that it is getting more complicated to process and analyse these large and growing data sets – and it essentially requires a re-assessment of the broader information management strategies for the majority of organisations that have started their business analytics journey.

Why Big Data Analytics is Important (and Different). Many have asked the question – what is new with this trend? This section will highlight the traditional use of business analytics in the old ‘pre-Big Data’ world, versus Big Data analytics in the ‘New World’. It will also look at the use cases that IDC expects to see most commonly across a variety of industries.

The Skill Factor – the Rise of the Data Scientist. With the raft of new technologies and organisational structures that need to be put in place as the Big Data phenomenon becomes a reality, there will be increasing demand for ‘data scientists’ – the next-generation analytical professionals who are able to extract information from large data sets and present it in a form that has business value to non-data experts, and who also have the unique skill of understanding the new models that need to be put in place.

Mapping out the Big Data Analytics Journey. The Big Data analytics journey will be an iterative one – it is therefore important to map this out in the context of a broader framework. This section aims to do exactly that, and also provides some recommendations to CIOs as they embark on this exciting journey into the brave new world of Big Data analytics.
Situation Overview

The Rise of Business Analytics

Much has been written on how the amount of data in the world is exploding in volume. According to the recent IDC Digital Universe study, the amount of information created and replicated will surpass 1.8 zettabytes (1.8 trillion gigabytes) in 2011 – growing by a factor of 9 in just five years. Big Data is a dynamic that seemed to appear from almost nowhere. In reality, Big Data is not new; it is moving into the mainstream and getting a lot more attention. The growth of Big Data is being enabled by inexpensive storage, a proliferation of sensor and data capture technology, increasing connections to information via the cloud and virtualised storage infrastructure, as well as innovative software and analysis tools.

It is no surprise then that business analytics as a technology area is rising on the radars of CIOs and line-of-business (LOB) executives. In a recent survey of 5,722 end users in the US market, business analytics ranked among the top five IT initiatives of organisations. The key drivers for business analytics adoption remained conservative or defensive. The focus on cost control, customer retention and optimising operations is likely a reflection of the continued economic uncertainty. However, top drivers vary significantly by organisation size and industry. Similarly, IDC surveyed 693 European organisations in February 2011, and 51% of respondents said that BI and analytics are high-priority technologies. In emerging markets such as Asia/Pacific, the focus is very much on capturing the next wave of growth.

According to more than 1,000 CIOs and LOB executives who were interviewed as part of the Asia/Pacific C-Suite Barometer in February 2011, business analytics was rated as the number one technology area that would enable their organisations to gain a competitive edge in the year ahead.
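As a quick sanity check on the Digital Universe figures quoted earlier in this section, the short Python sketch below converts 1.8 trillion gigabytes into zettabytes and derives the yearly growth rate implied by ‘a factor of 9 in five years’. It assumes decimal (SI) units and a constant compound growth rate, neither of which is stated in the study excerpt; the function names are illustrative only.

```python
# Figures quoted in the text (IDC Digital Universe study).
GIGABYTES_2011 = 1.8e12   # 1.8 trillion gigabytes
GROWTH_FACTOR = 9         # total growth over the period
YEARS = 5                 # length of the period

# Assumption: decimal (SI) units, i.e. 1 GB = 10**9 bytes, 1 ZB = 10**21 bytes.
BYTES_PER_GB = 10**9
BYTES_PER_ZB = 10**21


def to_zettabytes(gigabytes: float) -> float:
    """Convert a volume in gigabytes to zettabytes."""
    return gigabytes * BYTES_PER_GB / BYTES_PER_ZB


def implied_annual_growth(factor: float, years: int) -> float:
    """Constant yearly rate that compounds to `factor` over `years`.

    Assumption: growth is spread evenly across the period.
    """
    return factor ** (1 / years) - 1


if __name__ == "__main__":
    print(f"{to_zettabytes(GIGABYTES_2011):.1f} ZB")
    print(f"{implied_annual_growth(GROWTH_FACTOR, YEARS):.0%} per year")
```

Under these assumptions, ninefold growth over five years corresponds to roughly 55% growth per year, which gives a feel for how quickly capacity planning assumptions can go out of date.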
Figure 1: The Rise of Business Analytics
Q: You (CIO/CTO) mentioned ‘harnessing ICT to gain competitive advantage’… which of the following technologies or solutions would be your leading choice to better harness ICT?
Top 5 (bar chart, % of respondents, axis 0 to 35): Business intelligence/analytics; Network; Social media/online channel; Collaboration (including video, mobility); Cloud computing/services
Source: IDC, 2011
With more businesses in Asia investing in IT to ride the hyper-growth wave in emerging markets, they are harnessing analytics-led solutions to gain better customer insights, manage risk and financial metrics more effectively and, at the same time, strive for unique market differentiation.

Historically, organisations have made significant investments in applications with the objective of automating business processes and capturing data to improve operational efficiency. Many of these projects are still ongoing, but what is becoming increasingly clear to the senior management of these entities is that they (and their business managers) have not been able to get the right information (mainly due to poorly integrated systems and questionable data quality) at the right time (due to performance and scalability issues) to the right stakeholders within their organisations to support the critical decisions needed to drive the necessary business impact. Where they are unable to do this, lines of business are procuring and deploying their own solutions in a new wave of ‘shadow IT’ investments focusing on business analytics, thereby forcing CIOs to re-examine these issues with a specific focus on driving better IT-business alignment. These developments are taking place even without the ‘Big Data’ dynamic in the picture. When it is added, it creates the ‘perfect storm’ for Big Data analytics to take centre stage.
A Note on Terminology: BI or Analytics?

We have some challenges when defining and using terminology for business analytics. Because the BI market is mature, many terms have been around for a long time and have either become obsolete or have been redefined over the years. For example, the term ‘BI’ itself is sometimes used in a narrow sense (only query, reporting, and analysis [QRA] technology) and at times in a broad sense, to refer to the whole of what IDC calls business analytics (including data warehousing and analytic applications in addition to front-end tools). The term ‘analytics’ is relatively new and its meaning is often unclear – does it refer to advanced analytics (including predictive analytics, optimisation and forecasting) or to analytic applications? In some submarkets, such as Web analytics, the term ‘analytics’ simply means a dashboard on top of some data.

For the purpose of this white paper, we interpret ‘BI’ to mean either QRA tools (in its narrow definition) or BI across the board, that is, ‘business analytics’ in IDC terminology (in its broad definition). We interpret ‘analytics’ to mean either advanced analytics (data mining, statistics, optimisation and forecasting) or analytic applications (financial performance and strategy management [FPSM], CRM and marketing analytics, supply chain analytics, etc.). Business analytics is a combination of the above (and also includes data warehousing technologies), as highlighted by IDC’s Business Analytics Taxonomy for 2011 (see Figure 2 below):
Figure 2: IDC Business Analytics Taxonomy

Performance Management & Analytic Applications
  Financial Performance & Strategy Management: budgeting, planning, consolidation, profitability, strategy management
  CRM Analytic Applications: sales, customer service, contact centre, marketing, Web site analytics, price optimisation
  Supply Chain Analytic Applications: procurement, logistics, inventory, manufacturing
  Production Planning Analytic Applications: demand, supply, and production planning
  Services Operations Analytic Applications: financial services, education, government, healthcare, communications services, etc.
  Workforce Analytic Applications
Business Intelligence Tools
  Query, Reporting, and Analysis Tools: dashboards, production reporting, OLAP, ad-hoc query
  Advanced Analytics Tools: data mining and statistics
Content Analysis Tools
Spatial Information Analytics Tools
Data Warehouse Management Platform
  Data Warehouse Management
  Data Warehouse Generation: data extraction, transformation, loading; data quality
Source: IDC, 2011
Defining ‘Big Data’

Big Data is not so much about the content that is created, nor is it even about consumption. It is more about the analysis of the data and how that needs to be done. It is not really a ‘thing’, but instead a dynamic/activity that crosses many IT borders. IDC defines Big Data in this way:
“Big Data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high velocity capture, discovery and/or analysis.”
Figure 3: Defining ‘Big Data’
[Chart of data volumes against time. Labels: Unstructured Data (video, rich media etc.); Semi-Structured (e.g. weblogs, social media feeds); Data = Big, Complex, High Velocity & Wide Variety]
Source: IDC, 2011

The Volume. The volume comes from two places. One is in the structured data realm: some of this is held in transactional data stores and is linked to the ever-present electronic trail that individuals and businesses create in the wake of rapidly increasing online activity. Sensory (machine-to-machine) data contributes to this area too. The other is in existing data warehouses or data marts, which have over time grown to petabyte scale.

The Variety. The other aspect of this Big Data phenomenon is the need to analyse semi-structured and unstructured data. Text, video and other forms of media will require a completely different architecture and set of technologies to perform the required analysis. For example, if you look at the social media phenomenon, many marketing departments are looking at ways to do sentiment and brand analysis based on what is being posted on Facebook, Twitter and YouTube. This dynamic becomes more complex in Asia with local social media sites like RenRen in China and Nate in Korea.

The Velocity. There will also be demand to analyse this data on a more regular basis – for example, taking into account all transactions rather than a sample to obtain a more complete view of risk on a trade in real time.
In summary, Big Data refers to data sets whose volume, variety, velocity and complexity make it impossible for current databases and architectures to store and manage. IDC intentionally does not define Big Data as larger than a certain threshold (i.e. terabytes), mainly since this threshold would be a moving target depending on the sector, as well as the fact that it will obviously grow over time. More important is the value that organisations can derive from this phenomenon – and the resulting need to rethink their information strategies to extract the value.
Other Definitions: Hadoop, MapReduce, Key Value Store

With the focus on Big Data going mainstream, a range of new technologies has hit the market. The table below gives an overview of these technologies, with associated context (note that the list is not exhaustive).
Table 1: Big Data Technologies (Terminology)
Technology | Context
Big Table | Proprietary distributed database system built on the Google File System. Inspiration for HBase.
Cassandra | An open source (free) database management system designed to handle huge amounts of data on a distributed system. This system was originally developed at Facebook and is now managed as a project of the Apache Software Foundation.
Data Warehouse & Analytical Appliance | Consists of an integrated set of servers, storage, operating system(s), database, business intelligence, data mining and other software specifically pre-installed and pre-optimised for data warehousing.
Distributed System | Multiple computers, communicating through a network, used to solve a common computational problem. The problem is divided into multiple tasks, each of which is solved by one or more computers working in parallel. Benefits: improved price:performance ratio, higher reliability and more scalability.
Google File System | Proprietary distributed file system developed by Google; part of the inspiration for Hadoop.
Hadoop | An open source (free) software framework for processing huge data sets on certain kinds of problems on a distributed system. Its development was inspired by Google’s MapReduce and Google File System. It was originally developed at Yahoo! and is now managed as a project of the Apache Software Foundation.
HBase | An open source (free) distributed, non-relational database modeled on Google’s Big Table. It was originally developed by Powerset and is now managed as a project by the Apache Software Foundation as part of Hadoop.
MapReduce | A software framework introduced by Google for processing huge data sets on certain kinds of problems on a distributed system. Also implemented in Hadoop.
Non-relational database/Key Value Store | A non-relational database is one that does not store data in tables (rows and columns) – in contrast to a relational database. Key Value Stores allow for the management of schema-less (NoSQL) entities.
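To make the MapReduce entry in Table 1 more concrete, the sketch below shows the general map, shuffle and reduce pattern as a word count in plain Python. It is a single-process illustration only: it does not use Hadoop or any Google API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical. In a real distributed system, the map and reduce tasks would run in parallel on different machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document: str):
    """Map: emit a (word, 1) key-value pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of documents."""
    # Each document could be mapped independently (and so in parallel).
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    # Each key could likewise be reduced independently.
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big analytics"]))
```

Because every intermediate result is a key-value pair, the same pattern also hints at why key value stores sit naturally alongside this style of processing.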
Although some of these terms will be used throughout this white paper, the focus is not to examine them in too much detail because, as one IT executive recently mentioned, ‘to know the technology is one thing, but to apply it in the right environment is something entirely different’. The new technology needs to be tied back to business requirements as much as possible, rather than examined for its own sake. Having said that, most IT executives are not aware of the technologies and trends developing in this area – and where they are aware of them, their strategy is to put a couple of people in their enterprise architecture team to experiment with the new technologies (e.g. in-memory, Hadoop, MapReduce, Key Value Stores etc) that are being used to deal with the ‘Big Data’ phenomenon.
Big Data Analytics: The Old World vs. The New Era

Many have asked the question – what is new with this trend? This section highlights the traditional use of business analytics in the old ‘pre-Big Data’ world, versus Big Data analytics in the ‘Brave New World’. It also looks at the use cases that IDC expects to see most commonly across a variety of industries.

The majority of IT organisations have progressed in terms of their infrastructure architectures over time: from predominantly mainframe-based environments in the 1980s, to a focus on client-server in the 1990s and the Web at the turn of the century, to what is now popularly known as ‘private cloud’. This supposed state of ‘nirvana’ constitutes a consolidated, virtualised set of infrastructure resources (server, storage and network) that can be self-provisioned in an automated fashion by business users – complete with SLAs whose security, performance, availability and cost profiles are transparent to all in the form of a service catalog. Very few organisations, if any, have achieved this state of infrastructure ‘nirvana’, and most are still battling with a spaghetti-like tangle of compute resources in their datacenter. And now, the external force of Big Data mentioned earlier is forcing CIOs to rearchitect their infrastructure – particularly in the context of how analytics capabilities are deployed in an enterprise-wide fashion. Below is an overview of the changes that IDC sees happening in the infrastructure world that are increasingly impacting the Big Data analytics world:
Table 2: Old World vs. New Era (Big Data Infrastructure)
Dimension | Old World | New Era
Tenancy | Infrastructure silos | Pooled resources
Architecture | Performance ‘tuned’ | Linear scalability (linked to distributed parallel processing and ‘in memory’ storage)
Delivery Model | On premise | Hybrid (with cloud bursting capabilities) and widespread use of the appliance
Based on IDC’s research in this space, here are three suggestions for CIOs in dealing with these issues:

Cloud Bursting. The private cloud journey will line up well with the enterprise-wide analytical requirements highlighted earlier, but CIOs need to ensure that workload assessments are conducted rigorously and that risk is mitigated where possible. Critical to this approach will be the evaluation of cloud bursting capabilities from external vendors (e.g. infrastructure as a service), particularly as organisations start to leverage more real-time analytics environments, to ensure that the use of infrastructure resources maps closely to demand – and that there are no issues in terms of performance and availability.

Analytical Appliance. In terms of delivery models, IDC has seen significant performance benefits from analytical appliances for customers that are dealing with the impact of Big Data. In addition, since the software is optimised and pre-integrated with appliances, the deployment timeframes are typically shorter. As part of a recent global survey of CIOs, 10% of the respondents indicated that they will be looking at analytical appliances as a delivery model in 2011. IDC also believes that the demand for reference architectures will rise as CIOs look to integrate these appliances within existing data warehousing environments. In line with this increased adoption of the analytical appliance as a delivery model, IDC believes that IT departments will allocate less budget towards technical skills (i.e. installation, configuration and management), and more towards the high-end analytical skills needed to help drive the necessary business impact across multiple functions.

Enterprise Architecture. Enterprise analytics needs an enterprise architecture that scales effectively with growth – and the rise of Big Data analytics means that this issue needs to be addressed more urgently. Organisations need to look at creating a ‘high performance analytical environment’ that leverages in-database analytics, parallel processing and in-memory storage to deal with the increased volume, velocity and variety of data. In terms of dealing with unstructured data in particular, more attention needs to be paid to Hadoop – an open source software framework, managed by the Apache Software Foundation, that allows for the distributed processing of large data sets across clusters of computers. However, there will be an ongoing tension between global standards and local requirements – and the use of Hadoop would be a good example of this. Another would be the ability to process mixed workloads (e.g. analytical and operational) in the same infrastructure environment, such as the appliance mentioned earlier. CIOs need to consider ways in which they can deliver value in terms of solving specific business problems while, at the same time, being cognizant of global architecture standards and specifications. While certain global governance models will not allow for the usage of some of these technologies in a production environment, business expectations will force IT departments to re-assess the way the enterprise architecture agenda is utilised at a local level.
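The cloud bursting idea described above, where infrastructure use is meant to map closely to demand, can be sketched as a simple placement rule: serve demand from the private cloud first and send only the overflow to an external provider. The Python below is a minimal illustration under that assumption. The `Placement` type, the `place_workload` function and the `burst_allowed` flag (standing in for the outcome of a workload and risk assessment) are hypothetical names, not part of any vendor's API.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    private_units: int  # demand served by the private cloud
    burst_units: int    # overflow sent to the external IaaS provider


def place_workload(demand_units: int, private_capacity: int,
                   burst_allowed: bool = True) -> Placement:
    """Fill private capacity first, then burst any overflow externally.

    `burst_allowed` is a hypothetical stand-in for a governance decision:
    some workloads may not be permitted to leave the private environment.
    """
    if demand_units < 0 or private_capacity < 0:
        raise ValueError("demand and capacity must be non-negative")
    private = min(demand_units, private_capacity)
    overflow = demand_units - private
    if overflow and not burst_allowed:
        raise RuntimeError(
            f"{overflow} units exceed private capacity and may not be burst"
        )
    return Placement(private_units=private, burst_units=overflow)


if __name__ == "__main__":
    print(place_workload(80, 100))   # fits entirely in the private cloud
    print(place_workload(120, 100))  # 20 units burst to the external provider
```

A real assessment would also weigh data movement, latency and availability, which is why the text stresses evaluating external vendors before relying on bursting for real-time analytics.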
The bottom line here is that it is getting more complicated to process and analyse these large, complex and growing data sets – and it essentially requires a re-assessment of the broader information management strategy for the majority of organisations that have started their business analytics journey. But the impact is potentially enormous. If you look at optimising the price on every item in a global retail chain or detecting fraud in real time – you get a sense of the type of problems that Big Data analytics can be used to solve.
Table 3: Old World vs. New Era (Big Data Analytics)
Dimension | Old World | New Era
Data Sets | Predefined | All-encompassing and iterative
Data Velocity | Batch | Proactive and dynamic (real-time where appropriate)
Data Analysis | Predominantly historic | Predictive, forecasting and optimisation
However, despite the clear potential of such analytics – it is important to understand that it will not necessarily be relevant or applicable to every use case. IDC believes that these use cases can be best mapped out across two of the Big Data dimensions – namely velocity and variety as outlined below:
Figure 4: Potential Use Cases for Big Data Analytics
[Chart plotting use cases by Data Velocity (batch to real time) against Data Variety (structured, semi-structured, unstructured). Use cases shown: Credit & Market Risk in Banks; Fraud Detection (Credit Card) & Financial Crimes (AML) in Banks (including Social Network Analysis); Event-based Marketing in Financial Services and Telecoms; Markdown Optimization in Retail; Claims and Tax Fraud in Public Sector; Predictive Maintenance in Aerospace; Demand Forecasting in Manufacturing; Traditional Data Warehousing; Social Media Sentiment Analysis; Disease Analysis on Electronic Health Records; Text Mining; Video Surveillance/Analysis]
A better sense of the potential impact of deploying Big Data analytics can be gained by exploring these use cases in more detail:

Real-time Fraud Detection in Banks. This involves the ability to detect, prevent and manage fraud across multiple products, lines of business and channels for a bank. It requires the ability to capture the history for different types of entities (e.g. card, account, customer, terminal ID or IP address) involved in transactions, improving accuracy in detecting customer behaviours that fall outside the norm during point-of-sale (POS) transactions. This information can be used by multiple predictive models, for fraud detection and credit risk assessment.

Markdown Optimisation in Retail. The ability for retailers to optimise prices for a wide range of products in real time based on demand forecasting scenarios (that include the impact of promotions, seasonality and important calendar events) has a major impact on margins. These capabilities can also be augmented by social media sentiment analysis to ascertain customer demand for certain products on a more real-time basis.

Disease Analysis on Electronic Health Records. As healthcare services evolve, analysts can get hold of a patient’s entire medical history in electronic format. This will present a major opportunity for Big Data analytics. For example, in the case of a disease such as diabetes, the ability to correlate patient medical history with dietary data (potentially from market basket analysis in retail) and optimised exercise schedules will provide medical practitioners with insights that they could previously only have dreamt of.
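The fraud detection use case above rests on two ideas: keeping a history per entity, and flagging behaviour that falls outside that entity's norm. The Python sketch below illustrates both with a deliberately simple rule, a z-score on transaction amounts computed from running statistics (Welford's method). This is a stand-in chosen for illustration, not the predictive models a bank would actually deploy; the class names, the threshold of 3 standard deviations and the minimum history of 5 transactions are all assumptions.

```python
import math
from collections import defaultdict


class EntityHistory:
    """Running mean and variance of amounts for one entity (Welford's method)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, amount: float) -> None:
        self.count += 1
        delta = amount - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (amount - self.mean)

    def stddev(self) -> float:
        if self.count < 2:
            return 0.0
        return math.sqrt(self._m2 / (self.count - 1))


class OutlierDetector:
    """Flags amounts far from an entity's own history (hypothetical rule)."""

    def __init__(self, threshold: float = 3.0, min_history: int = 5):
        self.threshold = threshold      # assumed cut-off, in standard deviations
        self.min_history = min_history  # assumed minimum transactions before scoring
        self.histories = defaultdict(EntityHistory)

    def score(self, entity_id: str, amount: float) -> bool:
        """Return True if the amount is outside the norm, then record it."""
        history = self.histories[entity_id]
        suspicious = False
        if history.count >= self.min_history:
            spread = history.stddev()
            if spread > 0:
                suspicious = abs(amount - history.mean) / spread > self.threshold
        history.update(amount)
        return suspicious


if __name__ == "__main__":
    detector = OutlierDetector()
    for amount in [20, 22, 19, 21, 20, 23]:
        detector.score("card-1", amount)
    print(detector.score("card-1", 500))
```

Because the statistics are updated one transaction at a time, the history never needs to be rescanned, which is the property that makes this style of check feasible at point-of-sale speed. The `entity_id` key could equally be an account, a terminal ID or an IP address, as listed in the text.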
The Skill Factor

As highlighted earlier, IDC believes that the real value from Big Data will be derived from the high-end analytics that can be performed on the increasing volumes, velocity and variety of data that organisations are generating. In Asia (outside some of the MNCs, because this is mainly being driven out of the US and Europe), most organisations are not aware of the type and level of skills that are required. IDC also believes that this is linked to the general lack of awareness and skills historically available in the high-end analytics arena (regardless of the Big Data phenomenon). High-end analytics will require new sets of skills in two key categories:

Technical skills. These are needed for the new class of technologies required to process, discover and analyse the massive data sets that cannot be dealt with using traditional databases and architectures (e.g. in-memory, Hadoop, MapReduce, Key Value Stores etc). Some of these technologies will be delivered as an appliance, and skills to better understand how the software interacts with the hardware to leverage the data will be required.

The new type of business analyst/statistician. One of the key differences between analytics in the ‘Old World’ and what we are dealing with in the Big Data era is that we are gathering data that we may or may not need. From the perspective of analysis, this means ‘we don’t know what we don’t know’ – i.e. there is so much unstructured data that the variables and analytical models are likely to be entirely new. This means that there is a need to re-think the way analytical power users approach their work by creating a ‘Sandbox Mentality’ where discovery is always the starting point. Generally, a background in data mining and statistics would be a good starting point for this type of analysis.

Moving forward, there will be increasing demand for ‘data scientists’ – the next-generation business analysts with strong statistical skills who are able to extract information from large data sets and then present value to non-analytical experts, and who have the unique skill of understanding the new algorithms and analytical models that will have the most significant business impact in the short term. Globally, IDC is seeing a lot of interest in this more analytically inclined skill set. Roles and responsibilities have not yet been defined, which fits with the earlier ‘we don’t know what we don’t know’ point. The work requires a very ‘out-of-the-box’ type of thinking and creativity in terms of the analytics that needs to be done on these new data types and structures.

For example, if you look at the social media phenomenon (contributing to the semi-structured and unstructured data part of Big Data), many marketing departments are looking at ways to do sentiment and brand analysis based on what is being posted on Facebook, Twitter and YouTube (massive amounts, as you can expect). This dynamic becomes more complex in Asia with local social media sites like RenRen in China and Nate in Korea. Currently, IT is not the first port of call for the chief marketing officer, since it lacks the skills to understand what needs to be done (and in many cases, is still trying to work out what role it should play in the policy or governance of the use of social media). So the make-up of the IT department needs to be re-assessed in terms of technical, business and relationship skills.

The maturity model below highlights how IDC sees these skills (both technical and business) mapping out in the context of the organisations that have adopted business analytics over time – with a view to how this could evolve in the era of Big Data analytics:
Figure 5: The Big Data Analytics Maturity Model
[Axis labels: Old World to New Era (phase); Impact]
Phase | Pilot | Departmental Analytics | Enterprise Analytics | Big Data Analytics
Staff Skills (IT) | Little or no expertise in analytics – basic knowledge of BI tools | Data warehouse team focused on performance, availability and security | Advanced data modelers and stewards key part of the IT department | Business Analytics Competency Centre (BACC) that includes ‘data scientists’
Staff Skills (Business/IT) | Functional knowledge of BI tools | Few business analysts – limited usage of advanced analytics | Savvy analytical modelers and statisticians utilised | Complex problem solving integrated into Business Analytics Competency Centre (BACC)
Technology & Tools | Simple historical BI reporting and dashboards | Data warehouse implemented, broad usage of BI tools, limited analytical data marts | In-database mining, and limited usage of parallel processing and analytical appliance | Widespread adoption of appliance for multiple workloads. Architecture and governance for emerging technologies
Financial Impact | No substantial financial impact. No ROI models in place | Certain revenue generating KPIs in place with ROI clearly understood | Significant revenue impact (measured and monitored on a regular basis) | Business strategy and competitive differentiation is based on analytics
Data Governance | Little or none (skunk works) | Initial data warehouse model and architecture | Data definitions and models standardised | Clear master data management strategy
Line of Business (LOB) | Frustrated | Visible | Aligned (including LOB executives) | Cross-departmental (with CEO visibility)
CIO Engagement | Hidden | Limited | Involved | Transformative
% of Customers (IDC Estimates) | 20% | 65% | 10% | 5%

In terms of capturing and developing the right skills in the era of Big Data analytics, the creation of a Business Analytics Competency Centre that sits across the business and IT departments will be critical. IDC believes that this type of structure not only clarifies the roles and responsibilities of key stakeholders for this transformation, but also drives internal visibility, provides a mechanism for education and bridges the IT/business gap (with the marketing and sales teams in particular, as key individuals from these departments will need to be represented), since improving decision making amongst front-office staff will be the primary focus of these projects. In conjunction with the skills dimension, IDC believes that this structure should be involved in the following areas:

- Technology identification/deployment
- Business case creation and ROI justification
- Data governance frameworks with clear policies and guidelines around master data management, data quality and data models
- Ensuring IT/business alignment by involving the critical stakeholders at the right time
- Involving the CIO as the supporter of the necessary transformation from an IT perspective that will in turn create the necessary business impact

Very few organisations have reached the level of maturity that can truly harness the potential that Big Data analytics represents. Practically speaking, it is a major challenge to tick off all the relevant boxes, but this transformation is a necessary one in order for organisations to truly differentiate themselves in the current economic environment. The CIO (and the IT department) needs to play a critical role in this transformation. The next section highlights some suggestions that IDC believes should be taken into account in the context of this journey.
The CIO ‘Big Data Analytics’ Checklist
Architect for the Future. Historically, a lot of work in analytics has focused on ‘workarounds’ for the limited scalability of the underlying hardware. Many IT departments would create materialised views or pre-calculated data structures so that business users could work from these without affecting the performance of the systems processing the underlying data. Clustering, parallel processing and in-memory technologies mean that all of that underlying data can now be used in the analytical environment. However, it is important not to fall into the trap of blindly adding capacity simply because it is available. Multiple delivery models (cloud, particularly for bursting capabilities; analytical appliances; and the traditional client/server or 3-tiered Web architecture) need to be assessed on a case-by-case basis, as one size will definitely not fit all.
Create a ‘Sandbox Mentality’. One of the key differences between analytics in the traditional batch mode and analytics in the Big Data era is that we are gathering data that we may or may not need. From an analysis perspective, this means ‘we don’t know what we don’t know’: there is so much unstructured data that the variables and analytical models are likely to be entirely new. Analytical power users therefore need to re-think the way they develop their models, adopting a ‘Sandbox Mentality’ in which a discovery process is always the starting point, particularly when drawing linkages between unstructured, semi-structured and structured data. As part of this, new types of skills will need to be brought on board to understand the nuances of social media, and these are more likely to come from younger generations such as Gen Y (the Millennials) or Gen Z.
Not Too Much ‘Tinkering’. Whenever a new set of cool technologies hits the market, there is a tendency for IT departments to ‘tinker’, which delays the immediate business benefits. A certain amount of experimentation is a good thing (as outlined in the ‘Sandbox Mentality’ above, and Hadoop and MapReduce definitely fit into this category), but CIOs need to be careful that time spent on experimentation does not come at the expense of delivering business value.
Get the Team Right. The first step in this process involves the CIO assessing his or her own IT department to examine relevant skill levels and organisational structures. In some cases, an internal transformation will be needed to get the business to take notice of the change. The right people must then be empowered to execute the IT analytics strategy, with the relevant processes and governance structures in place to enable them to deliver on business expectations. Part of this will require the CIO to gain a much deeper understanding of the capabilities of the underlying analytics technology, but it will also involve working with LOB executives to hire analytically minded managers and knowledge workers who can make the most of those capabilities.
Take Analytics to the Enterprise. The majority of IT projects in this space have focused on building a data warehouse combined with a variety of BI tools to surface the underlying information to end users. However, when it comes to sophisticated analytics functionality, a lack of IT skills has meant that these projects have been largely departmental and tactical in nature, leading to a ‘siloed’ mentality. As a result, assessing something such as risk-adjusted profitability (which combines financial, credit scoring and customer data) would be impossible. This needs to change, and doing so requires a different level of IT/business collaboration, with the CIO personally focused on an enterprise-wide approach to deploying analytics to ensure that these projects are successful.
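To make the risk-adjusted profitability example concrete, the following is a minimal sketch of what combining the three departmental data sets could look like. Everything in it is an illustrative assumption rather than anything defined in this paper: the data sets, customer IDs and figures are invented, and risk-adjusted profit is simplified to revenue minus cost minus expected loss (default probability multiplied by exposure).

```python
# Hypothetical departmental data sets, each keyed by customer ID.
# All names and figures are invented for illustration.
finance = {
    "C001": {"revenue": 12000.0, "cost": 7000.0},
    "C002": {"revenue": 8000.0, "cost": 3000.0},
}
credit_scoring = {
    "C001": {"default_probability": 0.02},
    "C002": {"default_probability": 0.25},
}
customers = {
    "C001": {"segment": "retail", "exposure": 50000.0},
    "C002": {"segment": "sme", "exposure": 40000.0},
}


def risk_adjusted_profit(customer_id):
    """Assumed simplification: revenue - cost - (default probability x exposure)."""
    fin = finance[customer_id]
    expected_loss = (
        credit_scoring[customer_id]["default_probability"]
        * customers[customer_id]["exposure"]
    )
    return fin["revenue"] - fin["cost"] - expected_loss


def enterprise_view():
    """Join the three data sets on customer ID and score each shared customer."""
    shared_ids = finance.keys() & credit_scoring.keys() & customers.keys()
    return {cid: round(risk_adjusted_profit(cid), 2) for cid in sorted(shared_ids)}


if __name__ == "__main__":
    for customer_id, profit in enterprise_view().items():
        print(customer_id, customers[customer_id]["segment"], profit)
```

In this invented data both customers show the same unadjusted margin in the finance data alone, and only the joined view separates them, which is the kind of question a departmental silo cannot answer.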
Governance and Enablement. This is where existing investments in data warehousing technologies, if made correctly, will pay dividends. The data models and reference architecture that IT has in place will ensure that data definitions and standards are consistent across the various business departments. Further work is needed in the master data management (MDM) space to bridge the operational and analytical gap around data governance, but fundamentally this platform should provide the management and control that IT requires. When it comes to business enablement, IDC sees a new class of projects emerging that combines business analytics with business process management capabilities. More specifically, these projects draw on decision management software components that include tools for rule management, data mining, query and reporting, complex event processing (CEP), collaboration, BPM suites, search and content analysis. IDC believes that the IT departments best placed to manage the IT governance versus business enablement dilemma will be those that complement previous investments in data warehousing and business intelligence technologies with a better understanding of both the decision-making process in their organisation and the underlying decision management software.
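As a rough illustration of how rule management and complex event processing can work together, the sketch below applies a single business rule to a stream of timestamped events using a sliding time window. It is a toy under stated assumptions, not the API of any decision management product: the `ThresholdRule` class, the event names and the thresholds are all hypothetical.

```python
from collections import deque


class ThresholdRule:
    """Hypothetical rule: fire when `limit` or more events of one type
    arrive within a sliding window of `window_seconds`."""

    def __init__(self, event_type, limit, window_seconds):
        self.event_type = event_type
        self.limit = limit
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def observe(self, timestamp, event_type):
        """Record one event and return True if the rule fires on it."""
        if event_type != self.event_type:
            return False
        self._timestamps.append(timestamp)
        # Discard events that have fallen out of the sliding window.
        while timestamp - self._timestamps[0] > self.window_seconds:
            self._timestamps.popleft()
        return len(self._timestamps) >= self.limit


# Invented event stream: (seconds since start, event type).
events = [
    (0, "card_declined"),
    (20, "login"),
    (30, "card_declined"),
    (50, "card_declined"),
    (200, "card_declined"),
]

rule = ThresholdRule("card_declined", limit=3, window_seconds=60)
decisions = [rule.observe(ts, kind) for ts, kind in events]

if __name__ == "__main__":
    for (ts, kind), fired in zip(events, decisions):
        print(ts, kind, "-> escalate" if fired else "-> no action")
```

Keeping the rule's parameters (event type, limit, window) separate from the event-handling loop is what would let business users adjust the decision logic without IT rewriting the processing code.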
Conclusion
Despite the varying levels of maturity and adoption of business analytics, businesses are definitely gearing up to use more advanced solutions and offerings in this space. In line with this, organisations need to plan strategically and build a robust roadmap before adopting business analytics. The new generation of business managers is more aware of the benefits of competing on business analytics and will be looking to drive adoption of this technology area more aggressively. Moving forward, IDC believes that a new approach is required to proactively ‘effect’ the necessary change, with a specific focus on the following areas:
Elevating the status of the CIO to one with more transformative impact on the organisation, by playing an integral role in the deployment of the enterprise analytics strategy and ensuring that these technologies have the expected business impact
Assessing alternative delivery models (such as the appliance, in-memory and Hadoop for Big Data)
Capturing higher-level LOB attention and visibility as the next wave of business analytics projects is integrated with complex event processing (CEP) and business activity monitoring (BAM) technologies to drive a new class of projects that IDC defines as ‘decision management’
The CIO is gradually becoming much more important in the boardroom and is playing a key role in purchasing decisions for advanced applications such as business analytics. Moreover, the CIO and the IT department need to leverage a broader set of business analytics capabilities to create a new information management strategy, one that deals with the emerging Big Data dynamic and delivers improved decision-making capabilities to business stakeholders across the organisation.
#AP14962U
ABOUT THIS PUBLICATION
This publication was produced by IDC Go-to-Market Services. IDC Go-to-Market Services makes IDC content available in a wide range of formats for distribution by various companies. A license to distribute IDC content does not imply endorsement of or opinion about the licensee.
COPYRIGHT AND RESTRICTIONS
Any IDC information or reference to IDC that is to be used in advertising, press releases, or promotional materials requires prior written approval from IDC. For permission requests, contact the GMS information line at 65-6829-7757 or [email protected]. Translation and/or localization of this document requires an additional license from IDC. For more information on IDC, visit www.idc.com. For more information on IDC GMS, visit www.idc.com/gms.
IDC Asia/Pacific, 80 Anson Road, #38-00 Fuji Xerox Towers, Singapore 079970. P. 65.6226.0330 F. 65.6220.6116 www.idc.com.
Copyright 2011 IDC. Reproduction is forbidden unless authorized. All rights reserved.