Analytics in a Nutshell

January 14, 2018 | Author: Saurabh Kumar | Category: Predictive Analytics, Analytics, Statistical Analysis, Data Analysis, Statistics


Short Description

An overview of analytics process...

Description

Analytics can be divided into three domains:
1. Descriptive Analytics
2. Predictive Analytics
3. Prescriptive Analytics

A. Descriptive Analytics: This is the first step. It consists of gathering data from an experiment and running initial checks on it. “Experiment” is meant broadly here, in both its literal and its looser sense, e.g. a promotional campaign, website visit data etc. The collected data is checked for the points below.
1. Detection of mistakes made during data entry.
2. Checking of assumptions and constraints.
3. Pattern recognition, such as correlation, linear regression, autoregressive behaviour etc.
4. Determining relationships between explanatory variables.
5. A rough direction for how the later stages of the analysis will be designed.

The steps involve arranging unstructured data into structured data. Unstructured data are data that are not arranged in well-defined tables, for example in Excel. The data must be arranged in tables in such a way that any mismatch is easy to spot visually. The type of each variable (categorical, ordinal, binary etc.) must also be taken into account.

The other steps consist of:
1. Finding central tendency: This includes identifying the type of distribution and finding potential outliers. The most common measure of central tendency is the mean. For skewed distributions and for outliers that cannot be avoided, the median is preferred. Modes are used for grouping purposes (examples are available on the net).
2. Skewness and kurtosis.
3. Histogram: The challenge is choosing the bin size. With wide bins a data set may look normally distributed, while the same data with narrower bins may not, so the bin size must be selected carefully. Clustering can be used to place similar data in one bin.
4. QQ plot: Used to check for and interpret a normal distribution.
5. Correlation and covariance: Cor(X, Y) = Cov(X, Y) / (SDx × SDy). Covariance is difficult to interpret because its value depends on the units of X and Y. Correlation is unit-free and only varies between -1 and 1.
6. Box plot: A very handy tool for data exploration, especially for non-normally distributed data.

Further exploratory data analysis, such as the t-test, ANOVA etc., can be used for a detailed initial report.

B. Predictive Analytics: Consists of:
1. Regression (Linear/Multiple): Can be used for forecasting and election prediction.
2. Logistic Regression: Used when the outcome is categorical. Highly useful in the medical and insurance industries, loan default prediction, cricket, basketball etc.
3. CART/Forest: Can be used for both categorical and continuous (numeric) data. Useful because it produces rules, and new records can then be assigned to the groups those rules define. Eg: the medical field (D2Hawkeye, a medical analytics company, uses such models), Sensex data, vote prediction etc.
4. Text Analytics: Used for sentiment analysis, understanding trends etc. It is paired with specialized libraries in R and can be used for Twitter analytics and Facebook analytics. I was personally able to extract, from the page of The Hindu, the Facebook IDs of the people who commented most on page posts. Such IDs can be used to create focus groups (connect with marketing terminology) and some kinds of loyalty programs (connect with IMC), helping us further understand what the customer wants (connect with consumer behaviour). Google Trends can also be accessed from R; it gives scaled search interest for a particular word or group of words on Google. Text analytics can also be used for stock price prediction from news, as news affects the sentiment of shareholders for a very short time.
5. Clustering: Can be hierarchical or k-means. Used for market segmentation, and used by IMDB and Netflix to suggest movies.
6. Time Series: Used in finance and economics, especially for stock price prediction etc. It captures autoregression, i.e. how previous data affect new data. The effect of historic data may be exponential in nature. Similarly, how errors affect future data must be studied too. Examples are moving average, weighted average, exponential smoothing, seasonality-corrected exponential smoothing, the ARMA model, the ARIMA model etc. R supports most of the above-mentioned models in the “forecast” package.
7. Affinity Analysis: Used by Amazon for product suggestions.
8. Conjoint Analysis: …… one use is to find willingness to pay …... not much information available to me.
9. Deep Learning: Consists of advanced neural networks with very complicated interactions between the various predictors, in which errors are fed back as input in the form of feedback, loosely like how the brain works. Used for image processing, voice recognition etc. These are self-learning models that improve over time. Such models are what people usually have in mind when they speak of AI (Artificial Intelligence).

C. Prescriptive Analytics: Consists of two main types of models, optimization and simulation. As the literal meaning suggests, it provides a prescription (medicine) for a higher-level business problem, the kind management consulting firms solve, such as “increase profit by 2%”. The prescription is designed on the basis of the two types of analytics above. It is an iterative process, and in case of failure the entire cycle must be revisited, i.e. Descriptive → Predictive → Prescriptive.
1. Optimization: Iteratively check all the possible combinations to find the best result. Excel Solver can be used for this. Many sophisticated software packages are available for this, and R also has advanced packages for it.
2. Simulation Model: A model is built and dry runs are made with various inputs to find the best input.
3. Artificial Neural Network Model: …. Yet to explore ….

Decision variables: These are gathered through the predictive modeling techniques above. Eg. education data, to minimize teacher recruitment what are the factors which affect…...we can somehow connect this.
Constraints: Gathered using PESTEL analysis, organizational constraints and physical constraints.
The idea is to come up with the best combination (just like a doctor’s medicine) which will cure the disease (objective function).
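The optimization idea above (decision variables, constraints, an objective function, and iterating over all possible combinations) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not a replacement for Excel Solver or the R packages mentioned: the function name best_plan, the two products and every number are hypothetical and do not come from the notes.

```python
from itertools import product


def best_plan(profit, hours, hours_available, max_units):
    """Try every combination of unit counts and keep the most profitable feasible one."""
    best_value, best_combo = None, None
    # Decision variables: how many units (0..max_units) to make of each product.
    for combo in product(range(max_units + 1), repeat=len(profit)):
        used = sum(h * n for h, n in zip(hours, combo))
        if used > hours_available:  # constraint violated, skip this combination
            continue
        value = sum(p * n for p, n in zip(profit, combo))  # objective function
        if best_value is None or value > best_value:
            best_value, best_combo = value, combo
    return best_combo, best_value


# Hypothetical numbers: profit per unit and machine hours per unit for two products.
plan, total_profit = best_plan(
    profit=[30, 50], hours=[1, 2], hours_available=8, max_units=5
)
print(plan, total_profit)
```

With these made-up numbers the search settles on 4 units of the first product and 2 of the second, for a profit of 220. Checking every combination is only practical for small problems, which is why dedicated solvers are normally used.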
Such a prescription is the end objective of any analytics effort. Data mining of any type is ultimately used for prescriptive analytics.
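Two points from the descriptive analytics section lend themselves to a small worked example: the formula Cor(X, Y) = Cov(X, Y) / (SDx × SDy), and the preference for the median when outliers are present. The sketch below uses only the Python standard library; the helper names and the data (ad spend, site visits, incomes) are made up for illustration.

```python
from math import sqrt
from statistics import mean, median


def covariance(xs, ys):
    """Sample covariance of two equal-length sequences (divides by n - 1)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Cor(X, Y) = Cov(X, Y) / (SDx * SDy)."""
    sd_x = sqrt(covariance(xs, xs))  # Cov(X, X) is the variance of X
    sd_y = sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sd_x * sd_y)


# Hypothetical data: ad spend and resulting site visits.
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
visits = [12.0, 19.0, 33.0, 41.0, 48.0]

# Rescaling a variable changes the covariance but not the correlation.
spend_in_cents = [100 * s for s in spend]
cov_a = covariance(spend, visits)
cov_b = covariance(spend_in_cents, visits)
cor_a = correlation(spend, visits)
cor_b = correlation(spend_in_cents, visits)

# One extreme value pulls the mean far more than the median.
incomes = [30, 32, 35, 38, 40]
incomes_with_outlier = incomes + [1000]
print(mean(incomes), median(incomes))
print(mean(incomes_with_outlier), median(incomes_with_outlier))
```

Here the covariance grows by a factor of 100 when spend is expressed in cents, while the correlation stays at about 0.99. Adding the outlier moves the median of the incomes from 35 to 36.5, but the mean from 35 to about 196.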

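The time series item mentions moving averages and exponential smoothing, and notes that the effect of historic data may be exponential in nature. A minimal Python sketch of both techniques follows. It is not the R “forecast” package; the function names, the sales figures and the choice to seed the smoother with the first observation are assumptions made for the example.

```python
def moving_average(series, window):
    """Mean of each run of `window` consecutive points."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        sum(series[i : i + window]) / window
        for i in range(len(series) - window + 1)
    ]


def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: s[t] = alpha * x[t] + (1 - alpha) * s[t-1]."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [series[0]]  # assumption: seed with the first observation
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical monthly sales figures.
sales = [10.0, 12.0, 11.0, 15.0, 14.0, 18.0]
ma3 = moving_average(sales, 3)
ses = exponential_smoothing(sales, alpha=0.5)
# The last smoothed value serves as the one-step-ahead forecast.
forecast = ses[-1]
print(ma3)
print(ses, forecast)
```

With alpha = 0.5 the smoothed series is 10, 11, 11, 13, 13.5, 15.75. Each older observation carries half the weight of the one after it, which is the exponential decay of historic influence the notes refer to.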