Data Analytics

September 5, 2022 | Author: Anonymous | Category: N/A

MCA SEMESTER – IV
Subject Name: Data Analytics with R
Subject Code: 3640005

UNIT – I Introduction to Data Analysis

Overview of Data Analytics (DA)

Analysis of data, also known as data analytics, is the process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, suggesting conclusions, and supporting decision-making. Data analytics technologies and techniques are widely used in commercial industries to help organizations make better-informed business decisions, and by scientists and researchers to verify or disprove scientific models, theories, and hypotheses. Data analytics is the science of extracting patterns, trends, and actionable information from large sets of data. As a term, data analytics predominantly refers to an assortment of applications, ranging from basic business intelligence (BI), reporting, and online analytical processing (OLAP) to various forms of advanced analytics.

Business Intelligence (BI) is a broad category of software solutions that enables a company or organization to gain insight into its critical operations through reporting applications and analysis tools. OLAP is an acronym for Online Analytical Processing. OLAP performs multidimensional analysis of business data and provides the capability for complex calculations, trend analysis, and sophisticated data modeling. Advanced Analytics is the autonomous or semi-autonomous examination of data or content using sophisticated techniques and tools, typically beyond those of traditional business intelligence (BI), to discover deeper insights, make predictions, or generate recommendations.
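OLAP's multidimensional analysis can be illustrated with a tiny "roll-up": the same sales facts summarized along different dimensions. The sketch below is a minimal plain-Python illustration, assuming hypothetical sales records with two dimensions (`region`, `quarter`) and one measure (`amount`); the helper `roll_up` is illustrative only. Real OLAP engines organize such facts into cubes and precompute aggregates at scale.

```python
from collections import defaultdict

# Hypothetical fact records: two dimensions (region, quarter) and one measure (amount).
sales = [
    {"region": "North", "quarter": "Q1", "amount": 120.0},
    {"region": "North", "quarter": "Q2", "amount": 80.0},
    {"region": "South", "quarter": "Q1", "amount": 200.0},
    {"region": "South", "quarter": "Q2", "amount": 150.0},
    {"region": "North", "quarter": "Q1", "amount": 30.0},
]


def roll_up(records, dims):
    """Sum the 'amount' measure, grouped by the given dimension names."""
    totals = defaultdict(float)
    for rec in records:
        key = tuple(rec[d] for d in dims)
        totals[key] += rec["amount"]
    return dict(totals)


# Finest level: one total per (region, quarter) cell.
by_region_quarter = roll_up(sales, ["region", "quarter"])
# Rolled up along the quarter dimension: one total per region.
by_region = roll_up(sales, ["region"])
# Rolled up along the region dimension: one total per quarter.
by_quarter = roll_up(sales, ["quarter"])

if __name__ == "__main__":
    print(by_region_quarter)
    print(by_region)
    print(by_quarter)
```

The same facts answer different business questions depending on which dimensions are kept, which is the core idea behind OLAP's multidimensional view.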

Data analytics initiatives can help businesses increase revenues, improve operational efficiency, optimize marketing campaigns and customer service efforts, respond more quickly to emerging market trends, and gain a competitive edge over rivals, all with the ultimate goal of boosting business performance. Depending on the particular application, the data that is analyzed can consist of either historical records or new information that has been processed for real-time analytics uses. In addition, it can come from a mix of internal systems and external data sources.

Why is big data analytics important? (Need of Data Analytics)

There are four types of big data analytics that really aid business:
1. Prescriptive: This type of analysis reveals what actions should be taken. It is the most valuable kind of analysis and usually results in rules and recommendations for next steps.
2. Predictive: An analysis of likely scenarios of what might happen. The deliverable is usually a predictive forecast.
3. Diagnostic: A look at past performance to determine what happened and why. The result of the analysis is often an analytic dashboard.
4. Descriptive: What is happening now, based on incoming data. To mine the analytics, you typically use a real-time dashboard and/or email reports.
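To make the difference between these types concrete, here is a minimal sketch on hypothetical monthly sales figures (invented for illustration, not taken from any real business). The descriptive step summarizes the data; the predictive step fits a straight-line trend by ordinary least squares and extrapolates one month ahead; the prescriptive step is a toy rule with a hypothetical threshold. Diagnostic analysis is omitted because it needs extra explanatory data about *why* values changed. Real forecasting would use a proper model (in R or elsewhere), not a single linear trend.

```python
from statistics import mean

# Hypothetical monthly sales figures, used only for illustration.
monthly_sales = [100.0, 110.0, 120.0, 130.0, 140.0]

# Descriptive: summarize what the data shows so far.
summary = {
    "mean": mean(monthly_sales),
    "min": min(monthly_sales),
    "max": max(monthly_sales),
}


def linear_trend_forecast(values, steps_ahead=1):
    """Predictive: fit y = a + b*t by ordinary least squares, then extrapolate.

    Requires at least two values so the slope is defined.
    """
    n = len(values)
    ts = list(range(n))
    t_mean = mean(ts)
    y_mean = mean(values)
    slope = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, values)) / sum(
        (t - t_mean) ** 2 for t in ts
    )
    intercept = y_mean - slope * t_mean
    return intercept + slope * (n - 1 + steps_ahead)


forecast = linear_trend_forecast(monthly_sales)

# Prescriptive (toy rule, hypothetical threshold): turn the forecast into an action.
action = "increase inventory" if forecast > summary["max"] else "hold inventory"

if __name__ == "__main__":
    print(summary, forecast, action)
```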


Big data analytics helps organizations harness their data and use it to identify new opportunities. That, in turn, leads to smarter business moves, more efficient operations, higher profits, and happier customers.
1. Cost reduction. Big data technologies such as Hadoop and cloud-based analytics bring significant cost advantages when storing large amounts of data, and they can also help identify more efficient ways of doing business.
2. Faster, better decision making. With the speed of Hadoop and in-memory analytics, combined with the ability to analyze new sources of data, businesses are able to analyze information almost immediately and make decisions based on what they have learned.
3. New products and services. With the ability to gauge customer needs and satisfaction through analytics comes the power to give customers what they want. Davenport points out that with big data analytics, more companies are creating new products to meet customers' needs.

Classification of Data

Structured Data

Structured data is any data that can be stored in a SQL database as tables with rows and columns. It has relational keys and can easily be mapped into pre-designed fields. Today, structured data is the most processed kind of data in development and the simplest way to manage information. However, structured data represents only 5 to 10% of all data.

Semi-structured Data

Semi-structured data is information that does not reside in a relational database but does have some organizational properties that make it easier to analyze. With some processing, it can be stored in a relational database (this can be very hard for some kinds of semi-structured data), but the semi-structured form exists to save space or to improve clarity or computation. Examples of semi-structured data: CSV, XML, and JSON documents are semi-structured documents, and NoSQL databases are considered semi-structured. Like structured data, semi-structured data represents only a small share of all data (5 to 10%).
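The "with some processing" step can be sketched as follows: JSON records that share some fields but not others are mapped onto one fixed schema and loaded into a relational table. This is a minimal illustration, assuming hypothetical flat JSON records and an in-memory SQLite table named `people` (both invented for the example); nested JSON would first need flattening, which this sketch does not handle.

```python
import json
import sqlite3

# Hypothetical semi-structured input: records share some fields but not all
# (the second has no "email", the third adds a "phone").
raw = """
[
  {"id": 1, "name": "Asha", "email": "asha@example.com"},
  {"id": 2, "name": "Ravi"},
  {"id": 3, "name": "Meera", "email": "meera@example.com", "phone": "555-0100"}
]
"""

records = json.loads(raw)

# Derive a fixed schema: the union of all keys, in first-seen order.
columns = []
for rec in records:
    for key in rec:
        if key not in columns:
            columns.append(key)

# Map every record onto that schema; absent fields become None (NULL in SQL).
rows = [tuple(rec.get(col) for col in columns) for rec in records]

# Load the now-structured rows into a relational table.
# Column names come from the trusted sample data above, so quoting them is enough here.
conn = sqlite3.connect(":memory:")
col_defs = ", ".join(f'"{c}"' for c in columns)
conn.execute(f"CREATE TABLE people ({col_defs})")
placeholders = ", ".join("?" for _ in columns)
conn.executemany(f"INSERT INTO people VALUES ({placeholders})", rows)

# Ordinary SQL now works on what used to be semi-structured data.
missing_email = conn.execute('SELECT name FROM people WHERE "email" IS NULL').fetchall()

if __name__ == "__main__":
    print(columns)
    print(missing_email)
```

The price of the conversion is visible in the result: every record now carries every column, and missing values must be represented explicitly as NULL.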


Unstructured Data

Unstructured data represents around 80% of all data. It often includes text and multimedia content. Examples include e-mail messages, word-processing documents, videos, photos, audio files, presentations, web pages, and many other kinds of business documents. Note that while these sorts of files may have an internal structure, they are still considered "unstructured" because the data they contain does not fit neatly into a database. Unstructured data is everywhere; in fact, most individuals and organizations conduct their lives around unstructured data. Just as with structured data, unstructured data is either machine-generated or human-generated. Here are some examples of machine-generated unstructured data:

Satellite images: This includes weather data or the data that the government captures in its satellite surveillance imagery. Just think about Google Earth, and you get the picture.
Scientific data: This includes seismic imagery, atmospheric data, and high-energy physics.
Photographs and video: This includes security, surveillance, and traffic video.
Radar or sonar data: This includes vehicular, meteorological, and oceanographic seismic profiles.

The following list shows a few examples of human-generated unstructured data:

Text internal to your company: Think of all the text within documents, logs, survey results, and e-mails. Enterprise information actually represents a large percentage of the text information in the world today.
Social media data: This data is generated from social media platforms such as YouTube, Facebook, Twitter, LinkedIn, and Flickr.
Mobile data: This includes data such as text messages and location information.
Website content: This comes from any site delivering unstructured content, like YouTube, Flickr, or Instagram.

And the list goes on. Unstructured data is growing faster than the other types of data, and exploiting it can help in business decision-making.


A group called the Organization for the Advancement of Structured Information Standards (OASIS) has published the Unstructured Information Management Architecture (UIMA) standard. The UIMA "defines platform-independent data representations and interfaces for software components or services called analytics, which analyze unstructured information and assign semantics to regions of that unstructured information." Many industry watchers say that Hadoop has become the de facto industry standard for managing Big Data.

Characteristics of Data

There is a lot of buzz around data these days. Businesses, big and small, have started relying on data analytics for critical business decisions. However, not all businesses are able to leverage the benefits of data analytics to the same degree. Let us try to understand why. Five data characteristics are the building blocks of an efficient data analytics solution: accuracy, completeness, consistency, uniqueness, and timeliness. Understanding each of these helps explain why different businesses are not able to leverage the benefits of data analytics to the same degree.

Accuracy

When we rely on insights extracted from a well-developed and well-tested data analytics solution, we are assuming that the data is reliable and accurate. However, flaws in data collection, storage, or retrieval will result in unreliable data, and this reduces the accuracy of the insights extracted by the solution.

Completeness

The insights or information extracted by a data analytics solution depend a great deal on the completeness of the data. Partial data, or a dataset with many missing values, represents an incomplete picture. Thus, the degree of completeness of the data determines the accuracy of a data analytics solution.

Consistency

The consistency within a dataset is another important factor that determines the accuracy of a data analytics solution. In a consistent dataset, the same kind of value is recorded in the same format and units everywhere it appears. A consistent dataset is less prone to errors and results in more accurate analysis.

Uniqueness

One of the essential components of any business is high-quality data. Used properly, this data can make a company competitive or keep it competitive. Thus, the degree of uniqueness of data explains the efficiency of a data analytics solution. To add value to any business, the data should be unique and distinctive. In data-quality terms, uniqueness also means that each real-world entity is recorded only once, with no duplicate records.

Timeliness

A data analytics solution that uses outdated data can keep a company from achieving its goals or from surviving in a competitive arena. New and current data is more valuable to a business than old, outdated data. Although old data should not be completely overlooked by a data analytics solution, emphasis should be placed on current data.
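Several of these characteristics can be measured directly before any analysis is run. The following is a minimal sketch, assuming hypothetical customer records with `id`, `city`, and `updated` fields; the check functions are illustrative names, not a standard library. Uniqueness is checked in the common data-quality sense (no duplicate ids), consistency only as case variants of the same value, and accuracy is not checked at all, because it needs a trusted reference to compare against.

```python
from datetime import date

# Hypothetical customer records, invented only to illustrate the checks.
records = [
    {"id": 1, "city": "Pune", "updated": date(2022, 8, 30)},
    {"id": 2, "city": None, "updated": date(2022, 9, 1)},
    {"id": 2, "city": "Mumbai", "updated": date(2022, 9, 1)},
    {"id": 3, "city": "pune", "updated": date(2019, 1, 15)},
]


def completeness(rows, field):
    """Completeness: share of rows where `field` is present (not None)."""
    filled = sum(1 for r in rows if r.get(field) is not None)
    return filled / len(rows)


def duplicate_ids(rows):
    """Uniqueness: ids that occur more than once."""
    seen, dups = set(), set()
    for r in rows:
        if r["id"] in seen:
            dups.add(r["id"])
        else:
            seen.add(r["id"])
    return dups


def inconsistent_values(rows, field):
    """Consistency: values that differ only by letter case (e.g. 'Pune' vs 'pune')."""
    variants = {}
    for r in rows:
        value = r.get(field)
        if value is not None:
            variants.setdefault(value.lower(), set()).add(value)
    return {k: v for k, v in variants.items() if len(v) > 1}


def stale_ids(rows, as_of, max_age_days):
    """Timeliness: ids whose last update is older than max_age_days."""
    return {r["id"] for r in rows if (as_of - r["updated"]).days > max_age_days}


if __name__ == "__main__":
    print(completeness(records, "city"))
    print(duplicate_ids(records))
    print(inconsistent_values(records, "city"))
    print(stale_ids(records, date(2022, 9, 5), 365))
```

Each check returns something a team can act on (a ratio to monitor, or the specific ids to fix), which is usually more useful than a single overall "quality score".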

Applications of Data Analytics / Uses of Data Science

Using data science, companies have become intelligent enough to push and sell products according to customers' purchasing power and interests. Here's how they are ruling our hearts and minds:

Internet Search

When we speak of search, we think 'Google'. Right? But there are many other search engines, like Yahoo, Bing, Ask, AOL, DuckDuckGo, etc. All of these search engines (including Google) use data science algorithms to deliver the best results for our search queries in a fraction of a second. Consider that Google processes more than 20 petabytes of data every day. Had there been no data science, Google wouldn't be the 'Google' we know today.


Digital Advertisements (Targeted Advertising and re-targeting)

If you thought search would be the biggest application of data science and machine learning, here is a challenger: the entire digital marketing spectrum. From display banners on websites to digital billboards at airports, almost all of them are decided using data science algorithms. This is why digital ads achieve a much higher click-through rate (CTR) than traditional advertisements: they can be targeted based on a user's past behavior. This is why I see ads for analytics training while my friend sees ads for apparel in the same place at the same time.

Recommender Systems

Who can forget the suggestions about similar products on Amazon? They not only help you find relevant products among the billions of products available, but also add a lot to the user experience. Many companies have eagerly used this kind of engine to promote their products and suggestions according to a user's interests and the relevance of the information. Internet giants like Amazon, Twitter, Google Play, Netflix, LinkedIn, IMDb, and many more use such systems to improve the user experience. The recommendations are made based on a user's previous search results.
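One common way to produce "similar product" suggestions is item-to-item collaborative filtering: items are considered similar when the same users rated them in similar ways. The sketch below is a minimal illustration of that technique under stated assumptions: it uses hypothetical explicit ratings (not search history) and cosine similarity, and it is not a description of how Amazon, Netflix, or any other company named above actually builds its recommendations.

```python
from math import sqrt

# Hypothetical user -> {item: rating} data; real systems use far larger, sparser data.
ratings = {
    "u1": {"laptop": 5, "mouse": 4, "keyboard": 4},
    "u2": {"laptop": 4, "mouse": 5},
    "u3": {"phone": 5, "case": 4},
    "u4": {"laptop": 5, "keyboard": 5, "phone": 2},
}


def item_vectors(data):
    """Invert user->item ratings into item->{user: rating} vectors."""
    vecs = {}
    for user, items in data.items():
        for item, rating in items.items():
            vecs.setdefault(item, {})[user] = rating
    return vecs


def cosine(a, b):
    """Cosine similarity between two sparse rating vectors (dicts keyed by user)."""
    common = set(a) & set(b)
    dot = sum(a[u] * b[u] for u in common)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(data, user, k=2):
    """Score items the user has not rated by their similarity to items they rated."""
    vecs = item_vectors(data)
    seen = data[user]
    scores = {}
    for candidate in vecs:
        if candidate in seen:
            continue
        scores[candidate] = sum(
            cosine(vecs[candidate], vecs[item]) * rating for item, rating in seen.items()
        )
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    print(recommend(ratings, "u2"))
```

Here user `u2` rated a laptop and a mouse; the keyboard ranks first because other users who rated those items also rated the keyboard highly.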


Image Recognition

You upload a photo with friends on Facebook and start getting suggestions to tag your friends. This automatic tag suggestion feature uses a face recognition algorithm. Similarly, when using WhatsApp Web, you scan a QR code shown in your web browser using your mobile phone. In addition, Google gives you the option to search for images by uploading them; it uses image recognition and returns related search results. To learn more about image recognition, check out the 1:31-minute video linked from: https://www.analyticsvidhya.com/blog/2015/09/applications-data-science/


Speech Recognition

Some of the best examples of speech recognition products are Google Voice, Siri, Cortana, etc. With speech recognition, even if you aren't in a position to type a message, your life doesn't stop: simply speak the message and it will be converted to text. However, at times you will notice that speech recognition doesn't perform accurately. Just for a laugh, check out this hilarious 1:30-minute video of a conversation between Cortana and Satya Nadella (CEO, Microsoft): https://www.analyticsvidhya.com/blog/2015/09/applications-data-science/

Gaming

EA Sports, Zynga, Sony, Nintendo, and Activision Blizzard have taken the gaming experience to the next level using data science. Games are now designed using machine learning algorithms that improve or upgrade themselves as the player moves up to a higher level. In motion gaming, too, your computer opponent analyzes your previous moves and shapes its game accordingly.


Price Comparison Websites

At a basic level, these websites are driven by large amounts of data fetched using APIs and RSS feeds. If you have ever used these websites, you know the convenience of comparing the price of a product from multiple vendors in one place. PriceGrabber, PriceRunner, Junglee, Shopzilla, and DealTime are some examples of price comparison websites. Nowadays, price comparison websites can be found in almost every domain, such as technology, hospitality, automobiles, durables, and apparel.

Airline Route Planning

The airline industry across the world is known for bearing heavy losses. Except for a few airline service providers, companies are struggling to maintain their occupancy ratios and operating profits. Rising fuel prices and the need to offer heavy discounts to customers have made the situation worse. It wasn't long before airline companies started using data science to identify strategic areas of improvement. Now, using data science, airline companies can:
1. Predict flight delays
2. Decide which class of airplanes to buy
3. Decide whether to fly directly to the destination or make a stop in between (for example, a flight from New Delhi to New York can take a direct route or stop in another country)
4. Effectively drive customer loyalty programs

Southwest Airlines and Alaska Airlines are among the top companies that have embraced data science to change the way they work.

Fraud and Risk Detection


One of the first applications of data science originated in the finance discipline. Companies were fed up with bad debts and losses every year. However, they had a lot of data that used to be collected during the initial paperwork when sanctioning loans. They decided to bring in data science practices to rescue themselves from these losses. Over the years, banking companies learned to divide and conquer data via customer profiling, past expenditures, and other essential variables to analyze the probabilities of risk and default. Moreover, it also helped them push their banking products based on customers' purchasing power.

Delivery logistics

Who says data science has limited applications? Logistics companies like DHL, FedEx, UPS, and Kuehne+Nagel have used data science to improve their operational efficiency. Using data science, these companies have discovered the best routes to ship, the best times to deliver, and the best modes of transport to choose, leading to cost efficiency, among other benefits. Furthermore, the data these companies generate from their installed GPS devices offers many more possibilities to explore using data science.


Miscellaneous

Apart from the applications mentioned above, data science is also used in marketing, finance, human resources, health care, government policy, and every other industry where data is generated. Using data science, the marketing departments of companies decide which products are best for up-selling and cross-selling, based on behavioral data from customers. In addition, questions such as a customer's wallet share, which customers are likely to churn, and which customers should be pitched a high-value product can be answered with data science. In finance (credit risk, fraud), human resources (which employees are most likely to leave, employee performance, deciding employee bonuses), and many other disciplines, tasks like these are accomplished using data science.
