Data Science and Its Relationship To Bid Data and Data Driven Decision Making

October 14, 2022 | Author: Anonymous | Category: N/A
Share Embed Donate


Short Description

Download Data Science and Its Relationship To Bid Data and Data Driven Decision Making...

Description

 

ORIGINAL ARTIC ORIGINAL ARTICLE LE

DATA SCIENCE  AND ITS RELATIONSHIP  TO BIG DATA AND DATA-DRIVEN DECISION MAKING

 .   y    l   n   o   e   s   u    l   a 1 2    n Foster Provost  and Tom Fawcett    o   s   r   e   p   r   o    F  .    7    1    /    0    2    / Abstract    9    0    t   a   Companies have realized they need to hire data scientists, academic institutions are scrambling to put together    m   o   c  . data-science programs, and publications are touting data science as a hot—even ‘‘sexy’’—career choice. However,    b   uthere is confusion about what exactly data science is, and this confusion could lead to disillusionment as the    p    t   r   econcept diffuses into meaningless buzz. In this article, we argue that there are good reasons why it has been hard to     b   e    i    l  . pin down exactly what is data science. One reason is that data science is intricately intertwined with other    e   nimportant concepts also of growing importance, such as big data and data-driven decision making. Another reason     i    l   n   o natural tendency to associat associatee what a practition practitioner er does with the definition definition of the practitioner practitioner’s ’s field; this can    is the natural   m result in overlooking the fundamentals of the field. We believe that trying to define the boundaries of data science    o   r    f  precisely is not of the utmost importance. We can debate the boundaries of the field in an academic setting, but in     4    8  .    4  . order for data science to serve business effectively, it is important (i) to understand its relationships to other     2    8  . important related concepts, and (ii) to begin to identify the fundamental principles underlying data science. Once     4    7we embrace (ii), we can much better understand and explain exactly what data science has to offer. Furthermore,   y    b    d  science. In this article, we present a perspective    eonly once we embrace (ii) should we be comfortable calling it data  science.    dtha addresses ses all these these concep concepts. ts. We close close by off offeri ering, ng, as exampl examples, es, a parti partial al list list of fundam fundament ental al pri princi nciple ples  s    athatt addres   o    l underlying data science.   n   w   o    D

analyses than previously possible. The convergence of these phenom phe nomena ena has given given rise rise to the inc increa reasin singly gly widesp widesprea read d business busin ess appli application cation of data science. science.

Introduction With vast amounts amounts of data data  now available, companies in almost alm ost every every indust industry ry are foc focuse used d on exp exploi loitin tingg dat dataa for competitive advantage. The volume and variety of data have far outstripped the capacity of manual analysis, and in some cases have exceeded the capacity of conventional databases. At the same time, computers have become far more powerful, networking is ubiquitous, and algorithms have been devel-

Companies across industries have realized that they need to hire more data scientists. Academic institutions are scrambling to put together programs to train data scientists. Publications are touting data science as a hot career choice and even ‘‘sexy.’’1 However, there is confusion about what exactly  is da data ta scie scienc nce, e, an and d th this is co conf nfus usio ion n co coul uld d we well ll le lead ad to

oped that can connect datasets to enable broader and deeper 1 2

Leonard N. Stern School of Business, New York University, New York, New York. Data Scientists, LLC, New York, New York and Mountain View, California.

DOI: 10.1089/big.2013.1508

  

 MARY ANN LIEBERT, INC.

  

  VOL VOL. 1 NO. NO. 1

  

  MARCH MARCH 2013 2013 BIG DA DATA  TA 

BD51

 

DATA SCIENCE AND BIG DATA  Provost Prov ost and Fawce Fawcett tt

 .   y    l   n   o   e   s   u    l   a   n   o   s   r   e   p   r   o    F  .    7    1    /    0    2    /    9    0    t   a     m   o   c  .    b   u   p    t   r   e    b   e    i    l  .   e   n    i    l   n   o     m   o   r    f    4    8  .    4  .    2    8  .    4    7   y    b    d   e    d   a   o    l   n   w   o    D

disillusionment as the concept diffuses into meaningless buzz. In this article, we argue that there are good reasons why it has been hard to pin down what exactly is data science. One reason is that data science is intricately intertwined with other important concepts, like big data and data-driven decision making, which are also growing in importance and attention. Anothe Ano therr rea reason son is the natural natural tenden tendency, cy, in the absen absence ce of  academic programs to teach one otherwise, to associate what

rel relatio ationship nship manageme management nt to analyze analyze customer customer beha behavior vior in order to manage attrition and maximize expected customer value. The finance industry uses data science for credit scoring and trading and in operations via fraud detection and workforce management. Major retailers from Wal-Mart to Amazon applyy data science appl science throughout throughout their business businesses, es, from marketing to supply-chain management. Many firms have differentiated themselves strategically with data science, sometimes

a practitioner actually does with the definition of the practitioner’s field; this can result in overlooking the fundamentals of the field.

to the point of evolving into data-mining companies.

sta stand nd and ex expl plain ain ex exac actl tlyy what what CHOICE data science has to offer. Furthermore, only once we embrace (ii) should we be comfortable calling it data  science .

sense, and knowledge of a particular application must be brought to bear. A data-science perspective provides practitioners with structure and principles, which give give the data data sci scient entist ist a fra framew mework ork to system systemati atical cally ly treat treat problems of extracting useful knowledge from data.

But data science involves much more than just data-mining algorithms algor ithms.. Successful Successful data scientists scientists must be able to view  At the moment, trying to define the boundaries of data scibusiness problems from a data perspective. There is a funence preci precisely sely is not of fore foremost most importance importance.. Data Data-sc -scienc iencee dament dam ental al str struct ucture ure to data-an data-analy alytic tic thi thinkin nking, g, and basic basic academic programs are being developed, and in an academic pri princi nciple pless that that should should be unders understood tood.. Data sci scienc encee draws draws setting we can debate its boundaries. However, in order for from many ‘‘traditional’’ fields of study. Fundamental prindata science to serve business effectively, it is important (i) to ciples of causal analysis must be understood. A large portion unders und erstand tand its rel relatio ationshi nships ps to these these other other importan importantt and of what what has tra traditi ditiona onally lly been stu studie died d wit within hin the field field of  closely related concepts, and (ii) to statis statistic ticss is fun fundam dament ental al to data data begin to understand what are the science. scien ce. Methods Methods and methology  methology  fundamental principles underlying for visualizing data are vital. There ‘‘PUBLICATIONS ARE TOUTING dataa scienc dat science. e. Once Once we em embr brac acee are are al also so pa part rtic icul ular ar area areass wh wher eree (i (ii) i),, we ca can n much much be bett tter er un unde derr- DATA SCIENCE AS A HOT CAREER in intu tuit itio ion, n, crea creativ tivit ity, y, comm common on

AND EVEN ‘SEXY.’’’

In this article, we present a perspective that addresses all these concepts. We first work to disentangle this set of closely interrelated concepts. In the process, we highlight data science as the connective tissue between data-processing technologies (including (inc luding those for ‘‘big ‘‘big data’’) data’’) and data-driven data-driven decision making. We discuss the complicated issue of data science as a field versus data science as a profession. Finally, we offer as examples a list of some fundamental principles underlying data science.

Data Science At a high level, data level,  data science  is  is a set of fundamental principles that support and guide the principled extraction of information and knowledge from data. Possibly the most closely  related concept to data science is   data mining —the —the actual extraction of knowledge from data via technologies that incorporate these principles. There are hundreds of different data-m dat a-mini ining ng alg algori orithm thms, s, and a great great dea deall of det detail ail to the methods of the field. We argue that underlying all these many  details is a much smaller and more concise set of fundamental principles. These principles and techniques are applied broadly across functional areas in business. Probably the broadest business applic app licatio ations ns are in market marketing ing for tas tasks ks such such as tar target geted ed market mar keting ing,, online online advert advertisi ising, ng, and recomm recommend endatio ations ns for cross-selling. Data science also is applied for general customer

52BD

 

Data Science in Action For concreteness, let’s look at two brief case studies of analyzing data to extract predictive patterns. These studies illustrate lustr ate different sorts of applic applications ations of data science. The first was reported in the New the  New York Times : Hurricane Frances was on its way, barreling across the Caribbean, threatening a direct hit on Florida’s Atlantic Atla ntic coast. Residents Residents made for higher ground, but far away, in Bentonville, Ark., executives at WalMart Stores decided that the situation offered a great opport opp ortuni unity ty for one of the their ir newest newest data-d data-driv riven en weapons.predictive technology. A we week ek ah ahea ead d of th thee stor storm’ m’ss la land ndfa fall ll,, Li Lind ndaa M. Dil Dillma lman, n, Wal Wal-Mar -Mart’s t’s chi chief ef inf inform ormati ation on offi officer cer,, pressed her staff to come up with forecasts based on what had happened when Hurricane Charley struck  several weeks earlier. Backed by the trillions of bytes’ wo wort rth h of shopp shopper er hi hist stor oryy th that at is stor stored ed in WalWalMart’s Mar t’s data data wareho warehouse use,, she fel feltt that that the company  company  could ‘‘start predicting what’s going to happen, instead of waiting for it to happen,’’ as she put it. 2 Consider   why  data-driven Consider   data-driven prediction might be useful in this scenario. It might be useful to predict that people in the path BIG DATA DATA

MARC MARCH H 2013 2013

 

ORIGINAL ARTICLE

Provost Provo st and Fawcett Fawcett

of the hurricane would buy more bottled water. Maybe, but it seems a bit obvious, and why do we need data science to discover this? It might be useful to project the   amount of    increase   in sales due to the hurricane, to ensure that local Wal-Ma Wal -Marts rts are proper properly ly sto stocke cked. d. Per Perhaps haps mining mining the data data could reveal that a particular DVD sold out in the hurricane’s path—but maybe it sold out that week at Wal-Marts across the country, not just where the hurricane landing was im-

has already designed a special retention offer. Your task is to devise a precise, step-by-step plan for how the data science team should use MegaTelCo’s vast data resources to decide which customers customers shoul should d be offered the special special retention retention deal prior to the expiration of their contracts. Specifically, how  should MegaTelCo decide on the set of customers to target to bestt reduce bes reduce churn churn for a partic particula ularr inc incent entive ive budget budget?? Answerin swe ringg thi thiss questio question n is much much more more compli complicat cated ed than than it

mine minent nt.. The The pr pred edic icti tion on coul could d be so some mewh what at us usef eful ul,, but but probably more general than Ms. Dillman was intending.

seems initially.

It would be more valuable to discover patterns due to the

Data Science and Data-Driven

 .   yhurricane that were not obvious. To do this, analysts might Decision Making    l   n examin minee the hug hugee vol volume ume of Wal-Ma Wal-Mart rt data data fro from m pri prior, or,   oexa   e   ssimilar situations (such as Hurricane Charley earlier in the Data scien science ce involves involves principles, principles, processes, processes, and techniques techniques   u    l same season) to identify unusual local demand for products. for understanding phenomena via the (automated) analysis   a   n of data. For the perspective of this article, the ultimate goal of  From such patterns, the company might be able to anticipate   o   s   r data science is improving deciunusuall demand demand for products and   eunusua   p   rrush stock to the stores ahead of  sion making, as this generally is   o    Fthe hurricane’s landfall. of paramo paramount unt int intere erest st to busibusi . ‘‘FROM SUCH PATTERNS, THE    7 ness. Figure 1 places data science    1    /    0Inde COMPANY MIGHT BE ABLE TO in the context context of other other closel closely  y  Indeed ed,, th that at is wh what at ha happ ppen ened ed..    2    /    9The   New relate related d and data-r data-rela elated ted proproNew Yo York rk Ti Time mes  s    reported  ANTICIPATE UNUSUAL DEMAND    0

that: t: ‘‘. th thee expe expert rtss mi mine ned d the the FOR PRODUCTS AND RUSH STOCK  cesses in the organization. Let’s    t   a   tha   m start at the top. da data ta an and d foun fo und d th that at the th e stor st ores es   o TO THE STORES AHEAD OF THE   c  . would indeed need certain prod   b   uuc HURRICANE’S LANDFALL.’’ Data-d Dat a-driv riven en decisi decision on making making ts—a —and nd not not just just th thee usua usuall   pucts    t   r 3 (DDD) refers to the practice of    eflashlights. ‘We didn’t know in the    b   e pa past st that th at stra st rawb wber erry ry PopPo p-Ta Tart rts s basing decisions on the analysis    i    l  .   eincrea se in sales, sales, lik likee seven seven tim times es their their normal normal sales sales rat rate, e, of da data ta rath rather er th than an pu pure rely ly on in intu tuit itio ion. n. For For ex exam ampl ple, e, a   nincrease    i    l ahead of a hurricane,’ Ms. Dillman said in a recent interview.’ marketer marke ter could select select advertisem advertisements ents based purely purely on her   n   o   And the pre-hurricane top-selling item was beer. *’’’2 long experience in the field and her eye for what will work.   m   o Or, she could base her selection on the analysis of data re  r    f Consider a second, more typical business scenario and how it garding how consumers react to different ads. She could also    4    8  . might be treated from a data perspective. Assume you just use a combination of these approaches. DDD is not an all-or   4  .    2lan landed ded a great great anal analyti ytical cal job wit with h MegaTe MegaTelCo lCo,, one of the nothin not hingg practi practice, ce, and dif differ ferent ent firms firms eng engage age in DDD to    8  .    4largest telecommunication firms in the United States. They  greater great er or lesse lesserr degre degrees. es.    7   yare having a major problem proble m with customer custome r retention reten tion in their    b    d The benefits of data-driven decision making have been dem  ewireless business. In the mid-Atlantic region, 20% of cell   dphone customers leave when their contracts expire, and it is   a onstrated onstrat ed conclusivel conclusively. y. Economist Economist Erik Brynjolfsson Brynjolfsson and his   o    l getting increasingly difficult to acquire new customers. Since col collea leagues gues from fro m MIT and Penn’s Pen n’s Wha Wharto rton n Sch School ool rec recentl ently  y    n   w conducte cond ucted d a study study of how DDD affects affects firm performan performance. ce.3   othe cell-phone market is now saturated, the huge growth in    D the wireless market has tapered off. Communications comThey developed a measure of DDD that rates firms as to how  panies are now engaged panies engaged in battle battless to attrac attractt eac each h other’ other’ss customers custo mers while while retaining retaining their own. Customers switching from one company to another is called  churn , and it is expensive all around: one company must spend on incentives to att attrac ractt a custom customer er whi while le ano anothe therr compan companyy loses loses revenu revenuee when the customer departs. You have been called in to help understand the problem and to devise a solution. Attracting new customers is much more exp expens ensive ive than retain retis aining ing existing existing ones, sochurn. a good gooMarketing d deal deal of  marketing budget allocated to prevent

strongly they use data to make decisions across the company. They show statistically that the more data-driven a firm is, the more productive it is—even controlling for a wide range of  possible confounding factors. And the differences are not small: one standard deviation higher on the DDD scale is associated with a 4–6% increase in productivity. DDD also is correlated with higher return on assets, return on equit equity, y, asset utiliz utilization, ation, and market value, and the relationship seems to be causal. Our two examp example le case studies studie s illustrate illustrate two different different sorts of  decisions: (1) decisions for which ‘‘discoveries’’ need to be

*Of course! What goes better with strawberry Pop-Tarts than a nice cold beer?

MARY ANN LIEBERT, INC.

  

  VOL. VOL. 1

NO. NO. 1

  

  MARCH MARCH 2013 2013

BIG DA DATA  TA 

BD53

 

DATA SCIENCE AND BIG DATA  Provost Prov ost and Fawce Fawcett tt

made within data, and (2) decisions that repeat, especially at massive scale, and so decision making can benefit from even small increases in accuracy based on data analysis. The WalMartt exampl Mar examplee above above illust illustrat rates es a typ type-1 e-1 proble problem. m. Linda Linda Dillman would like to discover knowledge that will help WalMart prepare for Hurri Hurricane cane Frances’s imminent arrival. Our churn example illustrates a type2 DDD probl problem. em. A lar large ge tel telee-

 .   y    l   n   o   e   s   u    l   a   n   o   s   r   e   p   r   o    F  .    7    1    /    0    2    /    9    0    t   a     m   o   c  .    b   u   p    t   r   e    b   e    i    l  .   e   n    i    l   n   o     m   o   r    f    4    8  .    4  .    2    8  .    4    7   y    b    d   e    d   a   o    l   n   w   o    D

industries have adopted automatic decision making at different rates. The finance and telecommunications industries were early adopters. In the 1990s, automated decision making changed chang ed the banking banking and consumer-c consumer-credit redit industries dramatica mat ically lly.. In the 1990s, 1990s, ban banks ks and teleco telecommu mmunic nicati ations ons compan com panies ies als also o implem implement ented ed massiv massive-s e-scal calee system systemss for managing data-driven fraud control trol dec decisi isions ons.. As retail retail system systemss

communicat comm unications ions compan companyy may  ‘‘THE BENEFITS OF DATA-DRIVEN were increasingl increasinglyy computeriz computerized, ed, ha have ve hu hund ndre reds ds of mi mill llio ions ns of  me merc rcha hand ndis isin ingg de deci cisi sions ons we were re DECISION MAKING HAVE BEEN customers, each a candidate for automated. Famous examples indef defect ection ion.. Tens Tens of mil millio lions ns of  DEMONSTRATED CONCLUSIVELY.’’ clu clude de Harrah Harrah’s ’s casinos casinos’’ reward reward customers have contracts expirprog progra rams ms an and d th thee au auto toma mate ted d ing each month, so each one of  recommendations of Amazon and them them has an inc increa reased sed lik likeli elihoo hood d of def defect ection ion in the near near Netflix. Currently we are seeing a revolution in advertising, future. If we can improve our ability to estimate, for a given due in large part to a huge increase in the amount of time customer, how profitable it would be for us to focus on her, consumers are spending online and the ability online to make we can potentially reap large benefits by applying this ability  (literally) split-second advertising decisions. to the millions of customers in the population. This same logic applies to many of the areas where we have seen the Data Processing and ‘‘Big Data’’ most intense application of data science and data mining: direct marketing, online advertising, credit scoring, financial Despite the impression one might get from the media, there is tradin trading, g, help-d help-desk esk manage managemen ment, t, fraud fraud det detect ection ion,, search search ranking, product recommendation, and so on. aneer loting to data processing thatcritical is not science.data-sci Data engineering and processing proce ssing are criti cal data to support data-science ence The diagram in Figure 1 shows data science supporting datadriven decision making, but also overlapping with it. This highlights the fact that, increasingly, business decisions are beingg made automatical bein automatically ly by computer computer systems. systems. Diffe Different rent

activities, as shown in Figure 1, but they are more general and are useful for much more. Data-processing technologies are important for many business tasks that do not involve extracting knowledge or data-driven decision making, such as efficient transaction processing, modern web system processing, online advertising campaign management, and others. ‘‘Big data’’ technologies, such as Hadoop, Hbase, CouchDB, and others have received received considerab considerable le media attention attention recently. For this article, we will simply take  big data  to   to mean datase dat asets ts tha thatt are too lar large ge for tradit tradition ional al data-pr data-proce ocessi ssing ng systems and that therefore require new technologies. As with the traditional technologies, big data technologies are used for many tasks, including data engineering. Occasionally, big data technologies technologies are actually actually used for   implementing   implementing   datamining techniques, but more often the well-known big data technologies are used for data processing   in support of   the data-mining techniques and other data-science activities, as represented in Figure 1. Economist Prasanna Tambe of New York University’s Stern School has examined the extent to which the utilization of big data technologies seems to help firms. 4 He finds that, after controlling for various possible confounding factors, the use of big data technologies correlates with significant additional productivity growth. Specifically, one standard deviation higher utiliza utilization tion of big dat dataa tec technol hnologi ogies es is associa associated ted wit with h 1–3% 1–3%

FIG. 1. Data sc science ience in the co context ntext of cclosel loselyy related pr processe ocessess in the organization.

54BD

 

higher productivity than the average firm; one standard deviation lower in terms of big data utilization is associated with 1–3% lower productivity. This leads to potentially very large productivity differences between the firms at the extremes. BIG DATA DATA

MARC MARCH H 2013 2013

 

ORIGINAL ARTICLE

Provost Provo st and Fawcett Fawcett

From Big Data 1.0 to Big Data 2.0

Data-Analytic Thinking

One way to think about the state of big data technologies is to draw dra w an analog analogyy wit with h the busine business ss adopti adoption on of int intern ernet et technologies. In Web 1.0, businesses busied themselves with getting the basic internet technologies in place so that they  could establish a web presence, build electronic commerce capability, and improve operating efficiency. We can think of 

One of the most critical aspects of data science is the support of data-analytic thinking. Skill at thinking data-analytically is important not just for the data scientist but throughout the organization. For example, managers and line employees in other functional areas will only get the best from the company’s pan y’s data-s data-scie cience nce resourc resources es if the theyy have have some some bas basic ic un-

ours oursel elve vess as be bein ingg in th thee era era of Big Big Da Data ta 1. 1.0, 0, wi with th fir firms ms engaged in building capabilities to process large data. These primarily support their current operations—for example, to make themselves more efficient.

 .   y    l   n With We Web b 1.0, 1.0, on once ce fir firms ms ha had d   oWith   e   sincorpo incorporat rated ed basic basic technol technologi ogies es   u    l thoroughly (and in the process had   a   n driven down prices) they started to   o   s   r look furt rthe her. r. They They be bega gan n to as ask  k    elook fu   p   rwhat the web could do for them,   o    Fand how it could could improv improvee upon upon  .    7 what wh at they th ey’d ’d alwa al ways ys done. don e. This This    1    /    0ushered in the era of Web 2.0, in    2    /    9which new systems and companies    0

derstandi dersta nding ng of the fundam fundament ental al pri princi nciple ples. s. Manage Managers rs in enterprises without substantial data-science resources should still understand basic principles in order to engage consultants on an informed basis. Investors in data-science ventures ne need ed to un unde ders rsta tand nd th thee fu fund ndaamental principles in order to assess sess in inve vest stme ment nt oppo opport rtun unit itie iess ‘‘SIMILARLY, WE SHOULD accurately. accur ately. More generally generally,, busiEXPECT A BIG DATA 2.0 PHASE nesses increasingly are driven by  TO FOLLOW BIG DATA 1.0 . THIS data analytics, and there is great profes pro fession sional al advant advantage age in bei being ng IS LIKELY TO USHER IN THE able able to in inte tera ract ct co comp mpet eten entl tly  y  GOLDEN ERA OF DATA SCIENCE.’’ with and within such businesses. Understandi Under standing ng the fundamental fundamental concepts, conce pts, and having having frameworks frameworks

started d to exp exploi loitt the inter interact active ive natur naturee of the web. The    t   a   starte   m changes brought on by this shift in thinking are extensive and   o   c  . pervasive; the most obvious are the incorporation of social   b   unetworking components and the rise of the ‘‘voice’’ of the   p    t   r   eindividual consumer (and citizen).    b   e    i    l  .   eSimilarly, we should expect a Big Data 2.0 phase to follow Big   n    i    l Dat a 1.0. 1.0. Onc Oncee firms firms have have bec become ome capabl capablee of process processing ing   nData   o   massive data in a flexible fashion, they should begin asking:   m   o   rWhat can I now do that I couldn’t do before, or do better than I     f  This is likely to usher in the golden era of data    4could do before?  This    8  .    4  . science. The principles and techniques of data science will be    2applied far more broadly and far more deeply than they are    8  .    4today.    7   y    b    d   eIt is important to note that in the Web-1.0 era, some pre   d   acocious companies began applying Web-2.0 ideas far ahead   o    l   nof the mainstream. Amazon is a prime example, incorpo  w ratingg the consu consume mer’ r’ss ‘‘voic ‘voice’ e’’’ early early on in the the ratin ratingg of    oratin    D products and product reviews (and deeper, in the rating of 

for organizing data-analytic thinking, not only will allow one to interact competently, but will help to envision opportunities for improving data-driven decision making or to see data-oriented competitive threats. Firms in many traditional industries are exploiting new and existing data resources for competitive advantage. They employ data-science data-science teams to bring advance advanced d technologie technologiess to bear to increase revenue and to decrease costs. In addition, many new companies are being developed with data mining as a key strategic component. Facebook and Twitter, along with many other ‘‘Digital 100’’ companies,5 have high valuations due primarily to data assets they are committed to capturing or creating.{ Increasingly, managers need to manage data-analytics teams and data-analysis projects, marketers havee to org hav organi anize ze and und unders erstan tand d data-d data-driv riven en cam campai paigns gns,, venture capitalists must be able to invest wisely in businesses with substantial data assets, and business strategists must be able to devise plans that exploit data.

review rev iewers ers). ). Sim Simila ilarly rly,, we see some companie companiess alr alread eadyy applyi plying ng Big Data 2.0. Amazon Amazon again again is a compan companyy at the forefro forefront, nt, providi providing ng data-dr data-drive iven n recomme recommendat ndations ions fro from m massive data. There are other examples as well. Online advertisers must process extremely large volumes of data (billions of ad impressions per day is not unusual) and maintain a very very high throughp throughput ut (real(real-tim timee bidding bidding systems systems make decisions in tens of milliseconds). We should look to these and similar industries for signs of advances in big data and

As a few examples, if a consultant presents a proposal to exploit a data asset to improve your business, you should be able to assess whether the proposal makes sense. If a competitor announces a new data partnership, you should recognize when it may put you at a strategic disadvantage. Or, let’s say you take a position with a venture firm and your first project is to assess the potential for investing in an advertising company. The founders present a convincing argument that they will realize significant value from a unique body of data

dataa sc dat scie ienc ncee th that at subse subsequ quen entl tlyy wi will ll be adopte adopted d by other other industries.

they will collect, and on that basis, are arguing for a subst stan anti tial ally ly hi highe gherr valu valuat ation ion.. Is th this is reas reasona onabl ble? e? With With an

{

Of course, this is not a new phenomenon. Amazon and Google are well-established companies that obtain tremendous value from their data assets.

MARY ANN LIEBERT, INC.

  

  VOL. VOL. 1

NO. NO. 1

  

  MARCH MARCH 2013 2013

BIG DA DATA  TA 

BD55

 

DATA SCIENCE AND BIG DATA  Provost Prov ost and Fawce Fawcett tt

understan unders tanding ding of the fundam fundament entals als of data data scienc science, e, you should be able to devise a few probing questions to determine whether their valuation arguments are plausible. On a scale less grand, but probably more common, dataanalytics analy tics projects reach into all business business units units.. Employees Employees throughout these units must interact with the data-science team team.. If th thes esee em empl ploy oyee eess do not not ha have ve a fu fund ndam amen enta tall

 .   y    l   n   o   e   s   u    l   a   n   o   s   r   e   p   r   o    F  .    7    1    /    0    2    /    9    0    t   a     m   o   c  .    b   u   p    t   r   e    b   e    i    l  .   e   n    i    l   n   o     m   o   r    f    4    8  .    4  .    2    8  .    4    7   y    b    d   e    d   a   o    l   n   w   o    D

lems. For example, in actual practice one repeatedly sees analytical ‘‘solutions’’ that are not based on careful analysis of  the problem or are not carefully evaluated. Structured thinking about analytics emphasizes these often underappreciated aspect asp ectss of suppor supportin tingg decisi decision on making making wit with h data. data. Such Such structur stru ctured ed thinking thinking also cont contrasts rasts critical critical points points at which which human intuition and creativity is necessary versus points at which high-powered analytical tools can be brought to bear.

grounding in the principles of data-analyti grounding data-analyticc thinking, thinking, they  will not really understand what is happening in the business. Fundamental concept:  Evaluating data-science results requires  This lack of understanding is much more damaging in datacareful consideration of the context in which they will be used. science projects than in other technical projects, because the Whether knowledge extracted from data will aid in decision dataa scienc dat sciencee sup suppor ports ts imp improv roved ed making mak ing depend dependss critic criticall allyy on the de deci cisi sion on ma maki king ng.. Da Data ta-s -sci cien ence ce applic app licatio ation n in questi question. on. For our projects proje cts requi require re close inter interaction action churn-manage churnmanagement ment example, example, how  ‘‘FACEBOOK AND TWITTER, be betw twee een n th thee scie scient ntis ists ts an and d the the ex exac actl tlyy are are we go goin ingg to use use th thee  ALONG WITH MANY OTHER business people responsible for the pa patt tter erns ns th that at are are ex extr trac acted ted from from ‘DIGITAL ‘DIGI TAL 100’ COMPANIE COMP ANIES, S, HAVE decisi dec ision on mak making ing.. Firms Firms in which which hi hist stori orica call da data ta?? More More ge gene nera rall lly, y, the business people do not underdoes the pattern lead to better deHIGH VALUATIONS DUE sta stand nd what what the data scient scientist istss are cisions than some reasonable alterPRIMARILY TO DATA ASSETS doing doi ng are at a substa substanti ntial al disaddisadnative? How well would one have THEY ARE COMMITTED TO vantag van tage, e, becaus becausee the theyy waste waste time time done by chance? How well would CAPTURING OR CREATING.’’ and effort or, worse, because they  one do with a smart ‘‘default’’ alultimatel ulti matelyy make wrong decisions. decisions. ternative? terna tive? Many data science evaA rec recent ent art articl iclee in Harvar Harvard d BusiBusilua luatio tion n framew framework orkss are bas based ed on ness Review concludes: ‘‘For all the breathless promises about this fundamental concept. the return on investment in Big Data, however, companies Fundamental concept:  The relationship between the business  face a challenge. Investments in analytics can be useless, even  problem and the analytics solution often can be decomposed into  harmfu har mful, l, unless unless emp employ loyees ees can inc incorpo orporat ratee that that dat dataa int into o 6 tractable tract able subprobl subproblems ems via the fram framework ework of analyzing analyzing expected  complex decision making.’’ value.   Variou Variouss tool toolss for min mining ing data data exi exist, st, but busine business ss problems rarely come neatly prepared for their application. Some Fundamental Concepts Breaking the business problem up into components corresponding to estimating probabilities and computing or estiof Data Science mating values, along with a structure for recombining the There is a set of well-studied, fundamental concepts undercomponents, is broadly useful. We have many specific tools lyingg the principled lyin principled extraction extraction of knowledge knowledge from data, with for estimating probabilities and values from data. For our both theoretical and empirical backing. These fundamental churn example, should the   value  of   of the customer be taken concepts of data science are drawn from many fields that into account in addition to the likelihood of leaving? It is study data analytics. analytics. Some reflect reflect the relationshi relationship p betw between een difficult to realistically assess any customer-targeting solution data science and the business problems to be solved. Some without phrasing the problem as one of expected value. reflect the sorts of knowledge discoveries that can be made Fundamental concept:  Information technology can be used to  and are the bas basis is for techn technica icall soluti solutions ons.. Others Others are caucautionary and prescriptive. We briefly discuss a few here. This  find informative data items from within a large body of data. list is not intended to be exhaustive; detailed discussions even One of the first data-science concepts encountered in busiof the handful below would fill a book.* The important thing ness-analy nessanalytics tics scenarios scenarios is the notion of finding finding correlation correlations. s. is that we understand these fundamental concepts. ‘‘Correlation’’ often is used loosely to mean data items that provide provi de infor information mation about other data items—spec items—specifical ifically, ly, Fundamental concept:  Extracting useful knowledge from data  known kno wn quanti quantitie tiess that that reduce reduce our unc uncert ertain ainty ty about about unto solve business problems can be treated systematically by folknown quantities. In our churn example, a quantity of inlowing a process with reasonably well-defined stages. The stages. The Crossterest is the likelihood that a particular customer will leave Industry Standard Process for Data Mining7 (CRISP-DM) is after her contract expires. Before the contract expires, this one of our this thinking process. Keeping such a process in mindcodification can structure about data analytics prob-

would wou ld be unknow unknown n quanti qua ntity. ty.history, Howeve However, r, the there re may be known dataanitems (usage, service how many friends

*And they do; see http://data-science-for-biz.com.

56BD

 

BIG DATA DATA

MARC MARCH H 2013 2013

 

ORIGINAL ARTICLE

Provost Provo st and Fawcett Fawcett

have canceled contracts) that correlate with our quantity of  interest. This fundamental concept underlies a vast number of techniques for statistical analysis, predictive modeling, and other data mining. Fundamental concept:  Entities that are similar with respect to  known features or attributes often are similar with respect to  unknown features or attributes. Computing attributes. Computing similarity is one of 

ciplines. Universities clearly see the need for such programs, and it is only a matter of time before this first complication will be resolved. For example, in New York City alone, two top universities are creating degree programs in data science. Columbia University is in the process of creating a master’s degree program within its new Institute for Data Sciences and Engineering (and has founded a center focusing on the found foundati ations ons of data data sci scienc ence), e), and NYU NYU wil willl

the ma the main in tool toolss of da data ta scie scienc nce. e. Ther Theree ar aree many many wa ways ys to compute similarity and more are invented each year.

commence commen ce a master master’s ’s degree degree program program in data data sci scien ence ce in fall 2013.

The second second complic complicatio ation n builds builds  . Fundamental concept:  If you look    y    l too hard at a set of data, you will  on confusion caused by the first.   n ‘‘WITHOUT ACADEMIC   o  find something—but it might not  Worke Wor kers rs ten tend d to ass associ ociate ate wit with h   e   s PROGRAMS DEFINING THE FIELD   ugen eraliz lizee beyond beyond the data data you’re  you’re  their the ir field field the tasks tasks the theyy spend spend    l genera   aobserving.   This FOR US, WE NEED TO DEFINE Th is is refe re ferr rred ed to as consi co nsider derabl able e ti time me on or th those ose   n   o   s they find challenging or reward  r‘‘overfitting’’ a dataset. Techniques THE FIELD FOR OURSELVES.’’   e   pfor mining data can be very powing. This is in contrast to the tasks   r   oerful, and the need to detect and that   differentiate   differentiate   the field field from from    F  . avoid overfitting is one of the most important concepts to other oth er field fields. s. Forsyt Forsythe he de descr scribe ibed d thi thiss phe phenom nomen enon on in an    7    1    / grasp when applying data-mining tools to real problems. The ethnographic study of practitioners in artificial intelligence    0    2    / concept of overfitting and its avoidance permeates data sci(AI):    9    0    t   a   ence processes, algorithms, and evaluation methods.   m   o   c  . Fundamental concept:  To draw causal conclusions, one must     b   u pay very close attention to the presence of confounding factors,   p    t   r possibly unseen ones. Often, ones.  Often, it is not enough simply to un  e    bcover correlations in data; we may want to use our models to   e    i    l  .   eguide decisions on how to influence the behavior producing   n    i the data. For our churn problem, we want to intervene and    l   n   o cause    customer customer retenti retention. on. All methods for drawing drawing causal   cause    m conclusions—from interpreting the coefficients of regression   o   r    f models to randomized controlled experiments—incorporate    4    8  . assu assump mpti tion onss re rega gard rdin ingg th thee pr pres esen ence ce or ab abse senc ncee of concon   4  .    2founding factors. In applying such methods, it is important    8  .    4to understand their assumptions clearly in order to under   7   ystand the scope of any causal claims.    b    d   e    d   a   oChemistry Is Not About Test Tubes: Data    l   n   w   oScience vs. the Work of the Data Scientist    D

Two additional, related complications combine to make it more difficult difficult to reac reach h a common common understan understanding ding of just what what is da data ta scie scienc ncee an and d ho how w it fit fitss with with ot othe herr re rela late ted d concepts. First is the dearth of academic programs focusing on data science. Without academic programs defining the field for us, we need to define the field for ourselves. However, each of us sees the field from a different perspective and thereby  forms a different conception. The dearth of academic programs is largely due to the inertia associated with academia and the concomitant effort involved in creating new academic programs—especially ones that span traditional disMARY ANN LIEBERT, INC.

  

  VOL. VOL. 1

NO. NO. 1

  

  MARCH MARCH 2013 2013

BIG DA DATA  TA 

The AI specialists I describe view their professional wo work rk as scie scienc ncee (a (and nd in some some ca case sess en engi gine neer er-ing).The sci scient entist ists’ s’ work work and the approa approach ch the they  y  take to it make sense in relation to a particular view  of th thee worl world d th that at is ta take ken n for for gran grante ted d in th thee laboratory .Wondering what it means to ‘‘do AI,’’ I have asked many practitioners to describe their own work. Their answers invariably focus on one or more of the following: problem solving, writing code, and building systems.8 Forsythe goes on to explain that the AI practitioners focus on these three activities even when it is clear that they spend much time doing other things (even less related specifically to AI). Importantly,   none   of these three tasks differentiates AI from other scientific and engineering fields. Clearly just being very good at these three things does not an AI scientist scientist make. And as Forsythe points out, technically the latter two are not even necessary, as the lab director, a famous AI Scientist, had not written code or built systems for years. Nonetheless, these are the tasks the AI scientists saw as defining their work— they apparently did not explicitly consider the notion of what makes doing AI different from doing other tasks that involve problem solving, writing code, and system building. (This is possibly due to the fact that in AI, there were academic distinctions to call on.) Taken togeth Taken together, er, the these se two com compli plicat cation ionss cau cause se par partic ticula ularr confusion in data science, because there are few academic distinctions to fall back on, and moreover, due to the state of  the art in data processing, data scientists tend to spend a majority of their problem-solving time on data preparation and proces processin sing. g. The goal of such such prepar preparati ation on is eit either her to

BD57

 

DATA SCIENCE AND BIG DATA  Provost Prov ost and Fawce Fawcett tt

 .   y    l   n   o   e   s   u    l   a   n   o   s   r   e   p   r   o    F  .    7    1    /    0    2    /    9    0

subsequently apply data-science methods or to understand the results. However, that does not change the fact that the day-to-day work of a data scientist—especially an entry-level one—may be largely data processing. This is directly analogous to an entry-level chemist spending the majority of her time doing technical lab work. If this were all she were trained to do, she likely would not be rightly called a chemist but rather a lab technician. Important for being a chemist is that

thinking that fosters thinking fosters succe success ss in data-driven data-driven decision making. These The se data data sci scienc encee concep concepts ts are genera generall and ver veryy broadl broadly  y  applicable.

this work is in support of the application of the science of  chemistry, and hopefully the eventual advancement to jobs involving more chemistry and less technical work. Similarly  for data science: a chief scientist in a data-science-oriented company will do much less data processing and more dataanalytics design and interpretation.

themselves are part of data science. For example, the automated extraction of patterns from data is a process with welldefined stages. Understanding this process and its stages helps struct structure ure problem problem solvin solving, g, makes makes it mor moree system systemati atic, c, and thus less prone to error.

At the time of this writing, discussions of data science inevitably mention not just the analytical skills but the popular tools used in such analysis. For example, it is common to see  job advertisements mentioning data-mining techniques (random forests, support vector machines), specific application areas (recommendation systems, ad placement optimization), alongside popular software tools for processing big data (SQL, Hadoop, MongoDB). This is natural. The partic   t   a   ular concerns of data science in business are fairly new, and   m   o businesses are still working to figure out how best to address   c  .    b them. Continuing our analogy, the state of data science may    u   p be likened to that of chemistry in the mid-19th century, when    t   r   e    b theories and general principles were being formulated and the   e    i    l  . field was largely experimental. Every good chemist had to be a   e   n competent lab technician. Similarly, it is hard to imagine a    i    l   n working data scientist who is not proficient with certain sorts   o     m of software tools. A firm may be well served by requiring that   o   r their data scientists have skills to access, prepare, and process    f    4 data using tools the firm has adopted.    8  .    4  .    2    8  .    4    7   y    b    d   e    d   a   o    l   n   w   o    D

Nevertheless, we emphasize that there is an important reason to focus here on the general principles of data science. In ten

Success in today’s Success today’s data-orient data-oriented ed business business envi environmen ronmentt requires qui res bei being ng abl ablee to thi think nk abo about ut how the these se fundam fundament ental al concepts conce pts apply to particular particular business problems—to problems—to think  data-analytically. This is aided by conceptual frameworks that

There is strong evidence that business performance can be improved substantially via data-driven decision making, 3 big data technologies,4 and data-scienc data-sciencee techniques techniques based on big 9,10 data. Data science supports supports data-driven data-driven decision making—and sometimes allows making decisions automatically  at massive scale—and depends upon technologies for ‘‘big data’’’ storage data’ storage and engineerin engineering. g. Howeve However, r, the principles principles of  data science are its own and should be considered and discu cuss ssed ed ex expl plic icit itly ly in orde orderr for for da data ta scie scienc ncee to real realize ize it itss potential.

Author Disclosure Statement F.P. and T.F. F.P. T.F. are authors authors of the forthc forthcomi oming ng book, book,   Data  Science Scien ce for Busine Business  ss .

References 1. Daven Davenport port T.H., and Patil D.J. Data scientist: scientist: the sexiest  job of the 21st century. Harv Bus Rev, Oct 2012. 2. Hays C. L. What they know abo about ut you. N Y Times, Nov. 14, 2004. 3. Bry Brynjo njolf lfsson sson E., Hit Hittt L.M., L.M., and Kim H.H. H.H. Streng Strength th in

 years’ time, the predominant technologies will likely have changed or advanced enough that today’s choices would seem quaint qua int.. On the other han hand, d, the genera generall pri princi nciple pless of dat dataa science are not so differerent than they were 20 years ago and likely will change little over the coming decades.

4.

Conclusion

5.

Underlying the extensive collection of techniques for mining data is a much smaller set of fundamental concepts comprising data science  data  science . In order for data science to flourish as a field, rather than to drown in the flood of popular attention, we must think beyond the algorithms, techniques, and tools in common use. We must think about the core principles and concepts that underlie the techniques, and also the systematic

6.

58BD

 

numbers: How does data-driven decision making affect firm performance? Working paper, 2011. SSRN working paper. pap er. Ava Availa ilable ble at SSRN: SSRN: htt http:/ p://ss /ssrn. rn.com com/ab /abstr stract act 1819486. Tambe Tam be P. Bi Bigg da data ta kn know ow-h -how ow an and d busi busine ness ss valu value. e. Workin Wor kingg paper, paper, NYU Stern Stern School School of Busine Business, ss, NY, New York, 2012. Fus Fusfel feld d A. The digital digital 100: 100: the world’ world’ss most most val valuab uable le startups. Bus Insider. Sep. 23, 2010. Shah S., Shah S., Hor Horne ne A., A., an and d Ca Cape pell llaa´   J. Go Good od da data ta wo won’ n’tt guarantee good decisions. Harv Bus Rev, Apr 2012. Wirth, R., and Hip Wirth, Hipp, p, J. CRI CRISPSP-DM: DM: Towards Towards a sta stanndard process model for data mining. In Proceedings of  the 4th Internatio International nal Conference Conference on the Practical Practical Appli plicat cation ionss of Knowle Knowledge dge Discov Discovery ery and Data Data Mining Mining,, 2000, pp. 29–39. =

7.

BIG DATA DATA

MARC MARCH H 2013 2013

 

ORIGINAL ARTICLE

Provost Provo st and Fawcett Fawcett

8. Forsythe, Forsythe, Diana E. The construction construction of work in artificial artificial intelligen intel ligence. ce. Science, Science,   Techn Technol olog ogyy & Huma Human n Va Valu lues es,, 18(4), 1993, pp. 460–479. 9. Hil Hill, l, S., Provos Provost, t, F., and Volins Volinsky, ky, C. Networ Network-b k-base ased d market mar keting ing:: Identi Identifyi fying ng lik likely ely adopte adopters rs via consum consumer er networks. Statistical networks.  Statistical Science, Science, 21(2), 2006, pp. 256–276. 10 10.. Marten Martenss D. and Pro Provos vostt F. Pseudo Pseudo-so -socia ciall networ network k tar tar-geting from consumer transaction data. Working paper, CEDER-11-05, Stern School of Business, 2011. Available at SSRN: http://ssrn.com/abstract 1934670. =

Address correspondence to: F. Provost  Department of Information, Operations, and Management Sciences  Leonard N. Stern School of Business  New York University  44 W. 4th Street, 8 th  Floor  New York, NY 10012  E-mail: [email protected] E-mail:  [email protected]

 .   y    l   n   o   e   s   u    l   a   n   o   s   r   e   p   r   o    F  .    7    1    /    0    2    /    9    0    t   a     m   o   c  .    b   u   p    t   r   e    b   e    i    l  .   e   n    i    l   n   o     m   o   r    f    4    8  .    4  .    2    8  .    4    7   y    b    d   e    d   a   o    l   n   w   o    D

MARY ANN LIEBERT, INC.

  

  VOL. VOL. 1

NO. NO. 1

  

  MARCH MARCH 2013 2013

BIG DA DATA  TA 

BD59

 

This article has been cited by:

1. Kamaluddeen Usman Danyaro, M. S. Liew. A Proposed Methodology for Integrating Oil and Gas Data Using Semantic Big  Crossref ] Data Technology Technology 30 30-38. -38. [[Crossref  2. Ettore Bolisani, Constantin Bratianu. Strategic Performance Performance and Knowledge Measuremen Measurementt 175-198. [Crossref ] 3. Weisheng Lu, Yi Peng, Fa Fann Xue, Ke Chen, Yuhan Niu, Xi Chen. The Fusion of GIS and Building Information Modeling for Big  Crossref ] Data Analytics in Managing Developmen Developmentt Sites 34 345-359. 5-359. [[Crossref  4. Ming-Chi Liu, Y Yueh-Min ueh-Min Huang. 2017. The use of data science for educati education: on: The case of social-emotional learning. Smart 

 .   y    l   n   o   e   s   u    l   a   n   o   s   r   e   p   r   o    F  .    7    1    /    0    2    /    9    0    t   a     m   o   c  .    b   u   p    t   r   e    b   e    i    l  .   e   n    i    l   n   o     m   o   r    f    4    8  .    4  .    2    8  .    4    7   y    b    d   e    d   a   o    l   n   w   o    D

 4:1. . [C Learning Environments  4 [Crossref ] 5. Ivana Semanjski, Sidharta Gautama, Rein Ahas, FFrank rank Witlox. 2017. Spatial context mining approach for transport mode recognition from mobile sensed big data. Computers, Environment and Urban Systems  66, 38-52. [Crossref  [Crossref ] 6. S. Vijayakumar Bharathi. 2017. Prioritizing an andd Ranking the Big Data Information Security Risk Spectrum. Spectrum. G llobal obal Journal of  Flexible Systems Management  18:3, 183-201. [Crossref  [Crossref ] 7. Jie Sheng, Joseph Amankwah-Am Amankwah-Amoah, oah, Xiaojun W Wan ang. g. 2017. 2017. A  A multidisciplinary perspective of big data in management research. [Crossref ] International Journal of Production Economics  191, 97-112. [Crossref  8. Sarah Giest. 2017. Big data for policymaking: fad or fasttrack?. Poli cy cy Scienc es  es  50:3, 367-382. [Crossref  [Crossref ] 9. Carlos Costa, Maribel Yasmina San Santos. tos. 2017. The data scientist profile and its representativeness representativeness in the European e-Competence framework and the skills framework for fo r the inf oormation rmation age. International Journal of Information Management   .. [[Crossref  Crossref ] 10. Sarah Cheah, Shenghui W Wang. ang. 2017. Big data-driven business model innovati innovation on by traditional industries in the Chinese economy. ef ]  Journal of Chinese Economic and Foreign Trade Studies  1, 00-00. [Crossr [Crossref  11. Marko Bohanec, Marko Robnik Robnik-Šikonja, -Šikonja, Mirjana Kljajić Borštnar. 2017. Decision-making Dec ision-making framework with double-loop double-loop learning 

through interpretable black-box machine learning models. Industrial Management & Data Systems  117:7, 1389-1406. [Crossref  [Crossref ] 12. Patri Patrick ck Mikalef, Ilias O. Pappas, John Krogstie, Michail Giannakos. 2017. Big data analytics capabilities: a systematic literature review and research agenda. Information Systems and e-Business Management  17. . [Crossref  [Crossref ] 13. Thomas M. Kreuzer, Martina Wilde, Birgit T Terhorst, erhorst, Bodo Damm. 2017. A landslide in invent ventory ory system as a base for automat automated ed process and risk analyses. Earth Science Informatics  58. . [Crossref  [Crossref ] 14. Adrian Gardiner, Cheryl Aasheim, Paige R Rutner, utner, Susan Williams. 2017. Skill Requirements Requirements in Big Data: A Content Analysis of   Job Advertisements. Advertisements. Journal of Computer Information Systems  35, 1-11. [Crossref  [Crossref ] 15. Rafael de Oliveira W Werneck, erneck, W Waldir aldir Rodrigues de Almeida, Bernardo Vecchia Stein, Daniel Vatanabe Pazinato, Pedro Ribeiro Mendes Júnior, Otávio Augusto Bizetto Penatti, Anderson Rocha, Ricardo da Silva Torres. 2017. Kuaa: A unified framework  for design, deployment, execution, and recommendation of machine learning experiments. Futu Future re Gener ation ation Computer Systems  . [[Crossref  Crossref ] 16. Marynia Kolak. 2017. Ian Foster, Rayid Ghani, Ron Ron S Jarmin, et al. (eds), Big data and social science: A practical guide to methods and tools FosterIan, GhaniRayid, JarminRon S, et al. (eds), Big data and social science: A practical guide to methods and tools. CRC Press: Boca Raton, FL, 2016; 356 pp; ISBN: 9781498751407, US$49.95 (hbk). Environment and Planning B: Urban Analytics and City Science  69, 239980831771198. [Crossref  [Crossref ] 17. Matthias Murawski, Markus Bick. 2017. Digital competences competences of the workforce ? a research topic?. Business Process Management  [Crossref ]  Journal   23 23:3, 721-734. [Crossref  18. Kevin Daniel Andr? Carillo Carillo.. 2017. Let?s stop trying to be ?sex ?sexy? y? ? preparing managers for the (big) data-driven business era. [Crossref ] Business Process Management Journal   23 23:3, 598-622. [Crossref  19. Samuel T T.. McAbee, Ronald S. Landis, Maura I. Burke. 2017. 20 17. Inductive reasonin reasoning: g: The promise of big data. Human Resource  [Crossref ] Management Review 27  27:2, 277-290. [Crossref  20. Andrea De Mauro, Marco Greco, Mi Michele chele Grimaldi, Paa Paavo vo R iitala. tala. 20 2017. 17. Human resources for Big Data professions: A systematic classification of job roles and required skill sets. Information Processing & Management  . . [Crossref  [Crossref ] 21. Diana Heredia Vizcaino, Wilson Nieto Bernal. Proposal for a methodology for the adoption of the big data 1-6. [Crossref  Crossref ] 22. Guillaum Guillaumee Coqueret. 2017 2017.. Approximate NORT NORTA A simulations for virtual sample generation. Expert Systems with Applications  73, 69-81. [Crossref  [Crossref ] 23. Allen D. Allen. 2017. extract] knowledge knowledge from fuzzy big data: Conserving Conserv ing traditional science1. science1. Journal of Intelligent  & Fuzzy Systems   32:5,Algorithms 3689-3694.that [Crossref  [Crossref  24. Saša Baškarada, Andy Koroni Koronios. os. 2017. Unicorn data scien scientist: tist: the rarest of breeds. Program 51:1, 65-74. [Crossref  [Crossref ]

 

25. Marko Bohanec, Mirjana Kljajić Borštnar, Marko Robnik-Šikonja. Robnik-Šikonja. 2017. Explaining machine learning models in sales s ales predictions. Expert Systems with Applications  71, 416-428. [Crossref  [Crossref ] 26. Andoni Elola, Javi Javier er Del Ser, Miren Nekane Bilbao, Cristina Perfecto, Enrique Alexandre, Sancho Salc Salcedo-Sanz. edo-Sanz. 2017. Hybridizing  Cartesian Genetic Programming and Harmony Harmony Search for adaptive feature construction in supervised super vised learning problems.  Applied  Soft Computing  52, 760-770. [Crossref  [Crossref ] 27. Elias G. Carayannis, Evangelos Grigoroudis, Manlio Del Giudice, Maria Rosaria Rosaria Della Peruta, Stavros Sindakis. 2017. An exploration of contemporary contemporary organizational artifacts and routines in a sustainable excellence context.  Journal of Knowledge  Management  21  21:1, 35-56. [Crossref  [Crossref ] 28. Ali Intezari, Sim Simone one Gressel. 2017. Information of Knowledge Management    21 21:1, 71-91. [Crossref and [Crossref  ] reformation in KM systems: big data and strategic decision-making. Journal  29. Dominikus Kleindienst. 2017. The data qualit qualityy improvemen improvementt plan: deciding on choice and sequence of data quality improvements. Electronic Markets   .. [[Crossref  Crossref ]  .   y 30. Xiaohong Liu, Junfei Chu, P Pengzhen engzhen Yin, Jiasen Sun. 2017. DEA cross-efficiency evaluation considering undesirable output and    l   n   o ranking priority : a case study of eco-efficiency analysis of coal-fired power plants.  Journal of Cleaner Production   142, 877-885.   e   s [Crossref ]   u    l   a 31. Andrzej Szwabe, Paw Paweł eł Misiorek, Michał Ciesielczyk. Evaluation Evaluation ooff Tensor-Based Algorithms for Real-Time Bidding    n   o   s Optimization 160-169. [Crossref  [Crossref ]   r   e   p 32. Muhammad Fahim, Thar Baker. Kn Knowledge-Based owledge-Based Decision Support Systems for Personalized u-lifecare Big Data Services   r   o 187-203. [Crossref  [Crossref ]    F  .    7 Developing eloping Modified Classifier for Big Data Paradigm: An Approach Through Bio-Inspired    1    / 33. Youakim Badr, Soumya Banerjee. Dev    0 Soft Computing 109-122. [ [Crossref  Crossref  ]    2    /    9    0 34. Carlos Costa, Maribel Yasmina San Santos. tos. A Conceptual Model Model for the Professional Profile of a Data Scientist 453-463. [Crossref  [Crossref ]    t   a   35. Jiang Zeyu, Yu Shuiping, Zho Zhouu Mingduan, Chen Yong qqiang, iang, L Liu iu Yi. 2017. Model Study for Intelligen Intelligentt Transportation System   m   o with Big Data. Procedia Computer Science  107, 418-426. [Crossref  [Crossref ]   c  .    b   u 36. Stavros Sindakis. Applying Data Analytics for Innovation and Sustainable Enterprise Enterprise Excellence 271-275. [Crossref  [Crossref ]   p    t   r Michal chal Ciesielczyk. Tensor-Based Modeling of Temporal Temporal Features Features f oorr Big Data CTR Estimation   e 37. Andrzej Szwabe, Pawel Misiorek, Mi    b   e 16-27. [Crossref  [Crossref ]    i    l  .   e   n 38. Florin Gheorghe Filip, Constantin-Bălă Zamfirescu, Cristian Ciurea. Essential Enabling T Technologies echnologies 121-176. [Crossref  [Crossref ]    i    l   n Eunkyun unkyungg Kweon Kweon,, Minkyun Kim, Sangmi Chai. 2017. Does Implementation of Big Data Analytics Improve Firms’   o   39. Hansol Lee, E   m Market Value? Investors’ Reaction in Stock Market. Sustainability 9:6, 978. [Crossref  [Crossref ]   o   r    f Pankaj ankaj Sharma, Unai Gorostegui Gabiria, Erkki Jantunen, Davi Davidd Baglee. 2017. A Big Data Analytical Archi Architecture tecture    4 40. Jaime Campos, P    8  . for the Asset Managemen Management.t. Procedia CIRP  64, 369-374. [Crossref  [Crossref ]    4  .    2 Martinez. ez. Data Scienc Sciencee 1-4. [[Crossref  Crossref ]    8  . 41. Lourdes S. Martin    4    7 42. Fahim Ahmed, Ky Kyoung-Y oung-Yun un Kim. 20 2017. 17. Data-driven Weld Nugget Width Prediction with Decision Tree Algorithm. Procedia   y    b    d   e    d   a   o    l   n   w   o    D

Manufacturing  10, 1009-1019. [Crossref  [Crossref ] 43. Wardah Zainal Abidin, Nur Amie Ismail, Nuraz Nurazean ean Maarop, Rose Alinda Alias. Skills Sets Towards Towards Becoming Effective Data  Scientists 97-106. [Cross [Crossref  ref ] 44. Chhaya S. Dule, H. A. Girijamma. Guaging the Effectivity of Existing Security Measures for Big Data in Cloud Environment Environment 209-219. [Crossref  [Crossref ] 45. Abeer Ahmed Abdullah AL-Hakimi. 752, 381. [Crossref  [Crossref ] 46. Rudolp udolphh Piena aar,r, Ata Turk, Jorge Bernal-Rusiel, Nicolas Rannou, Daniel Haehn, P. Ellen Grant, Orran Krieger. 10494, 29. [Crossref ] 47. Hiroshi Mamiya, Arash Shaban-Nejad, David L. Buckeridge. Online Public Public Health Intelligenc Intelligence: e: Ethical Considerations at the Big Data Era 129-148. [Crossref  [Crossref ] 48. Md T Tarique arique Hasan Khan, Fahim Ahmed, Ky Kyoung-Y oung-Yun un Kim. 2017. Weldabili Weldability ty Knowledge Visualization of Resistance Spot Welded Welded  Assembly Design. Procedia Manufacturing  11, 1609-1616. 1609-1616. [[Cro Crossref  ssref ] 49. Yan Li, Manoj Thomas, Kw Kweku-Muata eku-Muata Osei-Bryson, Jason Lev Levy. y. 2016. Problem Form Formulation ulation in Knowledge Discov Discovery ery via Data 

 Analytics (KDDA) Environmental vironmental Ris Riskk Manag eement. ment. International Journal of Environmental Research and Public Health  13:12, 1245. [Crossref  [Crossref  ] for En

 

50. Leonardo M. Millefiori, Dimitri Dimitrios os Zissis, Luca C Cazzanti, azzanti, Gianfranc Gianfrancoo Arcieri. Scalable and Distributed Sea Port Operational Areas Estimation from AIS Data 374-381. [Crossref  [ Crossref ] 51. Babak Y Yadranjiaghdam, adranjiaghdam, Nathan Pool, Nasseh Tabrizi. A Survey on Real-Tim Real-Timee Big Data Analytics: Applications and Tools 404-409. [Crossref  [Crossref ] 52. Ivana Semanjski, Rik Bellens, Sidharta Gautama, FFrank rank Witlox. 2016. Integrating Big Data into a Sustainable Mobility Policy 2.0 Planning Support System. Sustainability 8:11, 1142. [Crossref  [Crossref ] 53. Yan Li, Manoj A. Thomas, Kw Kweku-Muata eku-Muata Osei-Bryson. 2016. A snail shell process model for knowledge knowledge dis discovery covery via data  analytics. Decision Support Systems  91, 1-12. [Crossref  [Crossref ]

 .   y    l   n   o   e   s   u    l   a   n   o   s   r   e   p   r   o    F  .    7    1    /    0    2    /    9    0    t   a     m   o   c  .    b   u   p    t   r   e    b   e    i    l  .   e   n    i    l   n   o     m   o   r    f    4    8  .    4  .    2    8  .    4    7   y    b    d   e    d   a   o    l   n   w   o    D

54. SMEs ShirleyBenefit Coleman, Göb, Challenges Giuseppe Manco, Antonio Pievatolo, PievatQuality olo, Xavier T Tort-Martorell, ort-Martorell, MarcoInternational  Seab Seabra ra Reis. 322016. How Can fromRainer Big Data? and a Path FForward. orward. and Reliability Engineering :6, 2151-2164. [Crossref ] 55. Kayode Ayank Ayankoya, oya, Andre PP.. Calitz, Jean H. Greyling. 2016. Real-Time Grain Commodi Commodities ties Price Predictions in South Africa: A Big Data and Neural Networks Approach.  Agrekon 55:4, 483-508. 483-508. [Crossr [Crossref  ef ] 56. Ajaya K. Swain. 2016. Mining big data ttoo support decision making in healthcare. healthcare.  Journal of o f Information Technology Case and  [Crossref ]  Application Research 18:3, 141-154. [C 57. Shanliang Ya Yang, ng, Mei Yang, Song W Wang, ang, Kedi Huang. 2016. Adaptive immun immunee genetic algorithm for weapon system portfolio optimization in military big data environmen environment.t. Cluster Computing  19:3, 1359-1372. [Crossref  [Crossref ] 58. Xiangzheng Deng. 2016. Urgent need for a data sharing platform to prom promote ote ecol ecological ogical research in China. Ecosystem Health and  [Crossref ] Sustainability  22:9, e01241. [Crossref  59. Satyanarayana V Nandury, Beneyaz A Begum. Strategies to han handle dle big data for traffic managemen managementt in smart cities 356-364. [Crossref ] 60. segmentation. Shaofu Du, Wenzhi Tang, Jiajia Zhao, T Tengfei engfei Nie. 2016. competition etition facing market Crossref  ] Sell to whom? Firm’s green production in comp  . [[Crossref   AnnalsT ofang, Operations Research 61. Russell Newman, Victor Chang, Robert John Wa Walters, lters, Gary Brian Wills. 2016. Model and experimental development development for Business Data Science. International Journal of Information Management  36:4, 607-617. [Crossref  [Crossref ] 62. Lisbeth Rodríguez-Mazahua, Cristian-Aarón R Rodríguez-Enríquez, odríguez-Enríquez, José Luis Sánchez-Cervantes, Jair Cervantes, Jorge Luis García-Alcaraz, Giner Alor-Hernández. 2016. A general perspective of Big Data: applications, tools, challenges and trends. The   Journal of Supercomputing  72:8, 3073-3113. [Crossref  [Crossref ] 63. Il-Y Il-Yeol eol Song, Yongjun Zhu. 2016. Big data and data science: what should w wee teach?. Expert Systems  33:4, 364-373. [Crossref  [Crossref ] 64. Rong T Tang, ang, Watin Watinee ee Sae-Lim. 2016. Data science programs in U.S. higher education: An exploratory content content analysis of program description, curriculum structure, and course focus. Education for Information 32:3, 269-290. [Crossref  [Crossref ] 65. Hao Peifeng, Cui Yuzh Yuzhe, e, Song Jingping, Hu Zhaomu. Smart wardrobe system based on Android platform 279-285. [Crossref ] 66. Alberto FFernández, ernández, Sara del Río, Abdullah Bawakid, Francisc Franciscoo Herrer Herrera. a. 2016. Fuzzy rule based classification systems for big data  Crossref ] with MapReduce: granularity granularity analysis. Advances in Data Analysis and Classification . [[Crossref  67. Jianwei Yin, Y Ya  a n Tang, Wei Lo, Zhaohui Wu. From Big Data to Great Services 165-172. [[Crossref  Crossref ] 68. Carolina Lagos, Sebastian Gutierrez, Felisa Cordova, Guillerm Guillermoo Fuert Fuertes, es, Raul Carrasco. Data analysis methods related to energetic consumption in copper mining — A Test est case  case in Chile 244-249. [[Crossref  Crossref ] 69. Zhenning Xu, Gary L. FFrankwick, rankwick, Edward Ramirez. 2016. Effects of big data analytics and traditional marketing analytics on new  ssref ] product success: A knowledge fusion perspective.  Journal of Business Research 69:5, 1562-1566. [Cro [Crossref  70. Myongho Yi. 2016. A Study on th thee Curriculums of Data Science. Science. Journal of the Korean BIBLIA Society for library and Information Science  27  27:1, 263-290. [Crossref  [Crossref ] 71. Thomas Hart, Lei Xie. 2016. Providing data science science support for systems pharmacology and its implicati implications ons to drug discovery. Expert Opinion on Drug Discovery 11:3, 241-256. [Crossref  [Crossref ] 72. Meiyu Fan, Jian Sun, Bin Zhou, Min Chen. 2016. The Smart Health Initiative Initiative in China: The Case of Wuhan, Hubei Province.  Journal of Medi  Medi cal cal Syste m ms  s   40 40:3. . [Crossref  [Crossref ] 73. 2016. A Project-Based Case Study of Data SScience cience Education. Data Science Journal  15. . [Crossref  [Crossref ] 74. David M. SSteinberg. teinberg. 2016. Industrial statistics: The challenges challenges and the the research. Quality Engineering   28 28:1, 45-59. [Crossref  [Crossref ] 75. Kurt St Stockinger, ockinger, Thilo Stadelmann, Andreas R Ruckstuhl. uckstuhl. Data Scientist als Beruf 59-81. [Crossref  [Crossref ] 76. Christos Alexakos, Konstan Konstantinos tinos Arvanitis, Andreas Papalambrou, Papalambrou,   Thoma s Amorgianiotis, George Raptis, Nikolaos Zervos. ERMIS: Extracting Knowledge from Unstructured Big Data for Supporting Business Decision Making 611-622. [ Crossref ]

 

77. Harald Foi Foidl, dl, Michael Feld Felderer. erer. Data Science Challenges to Improve Quality Assurance Ass urance of Intern Internet et of Things Applications 707-726. [Crossref  [Crossref ] 78. Yurik Yurikoo Yan Yano, o, Yukari Shirota. SVD and T Text ext Mining Integrated Approach to Measure Effects of Disasters Dis asters on Japanese Economics 20-29. [Crossref  [Crossref ] 79. Go Muan Sang, Lai Xu, Pa Paul ul de Vrieze. A reference architecture for big data systems 370-375. [[Crossref  Crossref ] 80. Tien-Chi Huang, Chieh Hsu, Zih-Jin Ci Ciou. ou. 2015. Systemati Systematicc Methodology for Ex ccavating avating Sleeping Beauty Publications and Their Princes from Medical and Biological Engineering Studies.  Journal of Medical Medical and Biol oogical gical En gineering   35:6, 749-758. [Crossref ] 81. of Pedro Quelhas Brito, Carlos Soares, SérgioRobotics Almeida, Almeida,and AnaComputer-Integrated Monte, Michel Byvoet. Michel Byvoet. 2015.  2015. Customer segmentation in ]a large database an online customized fashion business. Manufacturing   36, 93-100. [Crossref  [Crossref   .   y    l   n   o   e   s   u    l   a   n   o   s   r   e   p   r   o    F  .    7    1    /    0    2    /    9    0    t   a     m   o   c  .    b   u   p    t   r   e    b   e    i    l  .   e   n    i    l   n   o     m   o   r    f    4    8  .    4  .    2    8  .    4    7   y    b    d   e    d   a   o    l   n   w   o    D

82. Hamid Afshari, Qingjin Peng. 2015. Mod Modeling eling and quantifying uncertainty uncertainty in the product design phase for effects of user preference changes. Industrial Management & Data Systems  115:9, 1637-1665. [Crossref  [Crossref ] 83. Banage T.G. T.G.S. S. Kumara, Incheon Paik, Jia Zhang, T.H.A.S T.H.A.S.. Siriweera, Koswatte R.C. Koswatte. Ont Ontology-Based ology-Based Workflow  Generation for Intelligent Intelligent Big Data Analy Analytics tics 495-502. [[Crossref  Crossref ] 84. Rameshwa Rameshwarr Dubey, Angappa Gunasekaran. 2015. Education and training for successful successful career in Big Data and Business Analytics. Industrial and Commercial Training   47 47:4, 174-181. [Crossref  [Crossref ] 85. Seth C. Lewis. 2015. Journalism In An Era Of Big Data. Digital Journalism 3:3, 321-330. [Crossref  [Crossref ] 86. Giulio Aliberti, Alessandro Colant Colantonio, onio, Robert Robertoo Di Pietro, Riccardo Mariani. 2015. EXPEDITE: EXPress closED IT ITemset emset Enumeration. Expert Systems with Applications  42  42:8, 3933-3944. [Crossref  [Crossref ] 87. Li Kung, Hsiao-Fan W Wang. ang. A recommender system for the optimal combination of energy resources with cost-benefit analysis 1-10. [Crossref  [Crossref ] 88. Jussi Ronkainen, Antti Iivari. 2015. Designing Communication cation S ystems. [Crossref a] Data Management Pipeline for Pervasive Sensor Communi Procedia Computer Science  56, 183-188. [Crossref  89. Yi Shen. 2015. Strategic planning for a data-driven, data-driven, shared-access research enterprise: virginia virginia tec techh research data assessment and landscape study. Proceedings of the Association for Information Science and Technology 52:1, 1-4. [Crossref  [Crossref ] 90. Conrad Boton, Gilles Halin, Sylvain K Kubicki, ubicki, Daniel Forgu Forgues. es. Chal Challenges lenges of Big Data in the Age of Building Information Modeling: A High-Level Conceptual Pipeline 48-56. [Crossref  [ Crossref ] 91. Stefan Debort Debortoli, oli, Oliver Müller, Jan vom Brock Brocke. e. 2014. Compari Comparing ng Busin Business ess Intelligence and Big Data Skills. Business &  [Crossref ] Information Systems Engineering  6:5, 289-300. [Crossref  92. Stefan Debortoli, Oliv Oliver er Müller, Jan vom Brocke. 2014. Verglei Vergleich ch von Kompetenzanforderungen an Business Business-Intellig  -Intellig eencence- und Big-Data-Spezialisten. WIRTSCHAFTSINFORMATIK  56:5, 315-328. [Crossref  [Crossref ] 93. A.H.M. Sarowar Sattar, Ji Jiuyong uyong Li, Jixue Liu, Raymo Raymond nd Heat Heatherly, herly, Bradley Malin. 2014. A probabilistic approach to mitigate composition compositi on attacks on privac privacyy in non-coordinat non-coordinated ed environments. Knowledge-Based Systems  67, 361-372. [Crossref  [Crossref ] 94. Alberto FFernández, ernández, Sa rraa del R ío, ío, Victoria López, Abdullah Bawakid, María J. del Jesus, José M. Benítez, FFrancisco rancisco Herrera. 2014. Big Data with Cloud Computing: an insight on the computing environment, Crossref ] and programming frameworks. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4  4environmen :5, 380-409.t, [Crossref  [MapReduce, 95. Provost FFoster. oster. 2014. ACM SIGKDD 2014 ttoo be Held August 24–27 in Manhattan. Big Data  22:2, 71-72. [Citation [Citation]] [Full [Full Text

HTML] [Full HTML] [Full Text PDF PDF]] [Full [Full Text PDF with Links Links]] 96. Prasanna T Tambe. ambe. 2014. Big Data Investmen Investment,t, Skills, and Firm Value. Management Science  60:6, 1452-1469. [Crossref  [Crossref ] 97. Jay Lee, Hung-An Kao, Shanhu Y Yang. ang. 2014. Service Innovation and and Smart Analytics for Industry 4.0 4 .0 and Big Data Environmen Environment.t. Procedia CIRP  16, 3-8. [Crossref  [Crossref ] 98. Matthew A. Wal Waller, ler, Stanley E. Fawcett. Fawcett. 2013. Data Science, Predictive Analytics, and Big Data: A Rev Revolution olution That Will Transform Supply Chain Design and Management. Journal of Business Logistics  34:2, 77-84. [Crossref  [Crossref ] 99. Zhecheng Zhu, Heng Bee Hoon, Kiok-Liang T Teow. eow. Interactive Data Visualization Techniques Techniques Applied to Healthcare Decision Making 46-59. [Crossref  [Crossref ] 100. Siddhartha Duggirala. Big Data Architecture: 315-344. [Crossref  [Crossref ] 101. Sa Saurabh urabh Brajesh Brajesh.. Big D Data ata Analytics in Retail Supply Chain 1473-1494. [Crossref ] 102. Xiuli He, Satyajit Sara Saravane, vane, Qiannong Gu. SSupply upply Chain Analytics 2364-2375. [Crossref  [ Crossref ] 103. Saurabh Brajesh. Big Data Analytics in Retail Supply Chain 269-289. [Crossref  [Crossref ]

 

104. Zhecheng Zhu, Heng Bee Hoon, Kiok-Liang T Teow. eow. Interactive Data Visualization Techniques Techniques Applied to Healthcare Decision Making 1157-1171. [Crossref  [Crossref ]

 .   y    l   n   o   e   s   u    l   a   n   o   s   r   e   p   r   o    F  .    7    1    /    0    2    /    9    0    t   a     m   o   c  .    b   u   p    t   r   e    b   e    i    l  .   e   n    i    l   n   o     m   o   r    f    4    8  .    4  .    2    8  .    4    7   y    b    d   e    d   a   o    l   n   w   o    D

View more...

Comments

Copyright ©2017 KUPDF Inc.
SUPPORT KUPDF