Business Analytics: Basic Statistics

© EduPristine

© EduPristine – www.edupristine.com

Agenda
• Introduction
• Data
• Basic Statistics

3. Basic Statistics
I. Probability
II. Random variables
III. Probability distribution
IV. The Central Limit Theorem
V. Sampling and statistical inference
VI. Confidence intervals
VII. Hypothesis testing

3.a. Probability
Probability is a numerical way of describing how likely something is to happen. One of the fundamental methods of calculating probability uses set theory. A set is a collection of objects, and each individual object is called an element of that set.
• Example from the credit card data: the distinct numbers of credit cards owned form a set: # Cards = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
• The numbers on a die form a set: Dice = {1, 2, 3, 4, 5, 6}

The sample space (S) is the set of all possible outcomes that might be observed for an event/experiment. If all elements of the sample space are equally likely, the probability of an event A is:
• P(A) = (# elements in A)/(# elements in sample space)
• e.g. P(# Cards = 1) = (# of customers having 1 card)/(Total number of customers) = 100/1000 = 0.10 = 10%
• e.g. Probability of rolling an even number on a die:
  Sample space (S) = {1, 2, 3, 4, 5, 6}
  Event (A) = {2, 4, 6}
  P(A) = 3/6 = 0.5 = 50%
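The counting rule above can be sketched in a few lines of Python. This is only an illustration under the slide's assumption of equally likely outcomes; the helper name classical_probability is hypothetical and not part of the course material.

```python
from fractions import Fraction


def classical_probability(event, sample_space):
    """Return P(A) = |A| / |S|; valid only for equally likely outcomes."""
    event = set(event)
    sample_space = set(sample_space)
    if not event <= sample_space:
        raise ValueError("event must be a subset of the sample space")
    return Fraction(len(event), len(sample_space))


# Rolling an even number on a die.
die = {1, 2, 3, 4, 5, 6}
even = {n for n in die if n % 2 == 0}
p_even = classical_probability(even, die)
print(p_even, float(p_even))

# Credit card example from the slide: 100 of 1000 customers hold one card.
p_one_card = Fraction(100, 1000)
print(float(p_one_card))
```

Fractions are used so that the ratios stay exact; converting to float gives the decimal form quoted on the slide.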

Why is it important from an analytics perspective?
• What we do: analyze historical data to find patterns, under the assumption that the past is a reflection of the future.
• By means of probability theory, we predict the future using historical patterns.

3.a. Probability - Other topics
Set operations
• Union (A ∪ B)
• Intersection (A ∩ B)

Venn diagrams
• Basic operations on Venn diagrams

Basic probability axioms
1. P(S) = 1
2. P(A) ≥ 0 for all A ⊆ S
3. P(A ∪ B) = P(A) + P(B) - P(A ∩ B)

Conditional probability
• P(A|B) = P(A ∩ B) / P(B)

Bayes theorem
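As a small check of the union rule and the conditional probability formula, the sketch below uses a fair die with two hypothetical events, A (even number) and B (number greater than 3); neither event comes from the slides. The slide only names Bayes theorem, so the last step uses its standard form, P(B|A) = P(A|B) P(B) / P(A), as an assumption about what the deck goes on to cover.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # fair die, outcomes assumed equally likely
A = {2, 4, 6}            # hypothetical event: even number
B = {4, 5, 6}            # hypothetical event: number greater than 3


def prob(event):
    """P(E) = |E| / |S| for an event E contained in S."""
    return Fraction(len(event & S), len(S))


# Union rule: P(A U B) = P(A) + P(B) - P(A n B)
p_union = prob(A | B)
addition_rule = prob(A) + prob(B) - prob(A & B)

# Conditional probability: P(A|B) = P(A n B) / P(B)
p_a_given_b = prob(A & B) / prob(B)

# Bayes theorem (standard form): P(B|A) = P(A|B) * P(B) / P(A)
p_b_given_a = p_a_given_b * prob(B) / prob(A)

print(p_union, addition_rule, p_a_given_b, p_b_given_a)
```

Python's set operators | and & play the role of union and intersection here, which keeps the code close to the notation on the slide.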

3.b. Random variables
I. Definition
II. Types of Random Variables
   1. Discrete
   2. Continuous
III. Distribution and Probability Density functions of Random Variables
IV. Expected value (or Mean) of Random Variables
V. Variance of Random Variables
VI. Coefficient of skewness of Random Variables

3.b. Random variables - Definition
A random variable is a function or rule which maps each outcome in a sample space to a real number: X(w) = x.

[Diagram: the random variable X maps the elements w1, w2, w3, ... of the sample space S to the values x1, x2, x3, ... in the set of real numbers]

So, if w is an element of the sample space S (i.e. w is one of the possible outcomes of the experiment concerned) and the number x is associated with this outcome, then X(w) = x.
Convention:
• Denote the random variable by a capital letter “X”
• Denote the outcome or possible values by a small letter “x”, i.e. X(w) = x

3.b. Random variables - Definition
Example: Suppose there are 8 balls in a bag. The random variable X is the weight, in kg, of a ball selected at random. Balls 1, 2 and 3 weigh 0.1 kg, balls 4 and 5 weigh 0.15 kg and balls 6, 7 and 8 weigh 0.2 kg. Using the notation above, write down this information.
Solution:
X(b1) = 0.10 kg, X(b2) = 0.10 kg, X(b3) = 0.10 kg
X(b4) = 0.15 kg, X(b5) = 0.15 kg
X(b6) = 0.20 kg, X(b7) = 0.20 kg, X(b8) = 0.20 kg

[Diagram: the random variable X(bi) = x (weight) maps the sample space S of individual balls {b1, b2, b3, b4, b5, b6, b7, b8} to the set of real numbers {0.10, 0.15, 0.20}, the weights in kg]
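One way to picture the ball example is as a lookup table from outcomes to real numbers. The sketch below is a minimal illustration; the dictionary X and the names sample_space and range_of_X are chosen here for readability and are not part of the slides.

```python
# The random variable X: each ball (an outcome) maps to its weight in kg.
X = {
    "b1": 0.10, "b2": 0.10, "b3": 0.10,
    "b4": 0.15, "b5": 0.15,
    "b6": 0.20, "b7": 0.20, "b8": 0.20,
}

# The sample space is the set of balls; the range of X is the set of weights.
sample_space = set(X)
range_of_X = sorted(set(X.values()))

print(len(sample_space))
print(range_of_X)
```

Eight outcomes collapse onto three distinct values, which is the point of the diagram: several outcomes can share the same value of X.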

3.b. Types of Random variables
There are two types of Random Variables:
1. Discrete Random Variables
2. Continuous Random Variables

3.b. Discrete Random variables
Definition: The set of all possible values of the outcome (x) is discrete
• e.g. Outcome of rolling a die = {1, 2, 3, 4, 5, 6}
• Or # credit cards owned by an individual = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}

Probabilities: Probabilities are defined on events (subsets of the sample space S). So what is meant by “P(X = x)”?
• Suppose the sample space consists of eight outcomes {s1, s2, s3, s4, s5, s6, s7, s8}
• Let the outcomes in
  – E1 = {s1, s2, s3} be associated with the number x1
  – E2 = {s4, s5} be associated with the number x2
  – E3 = {s6, s7, s8} be associated with the number x3
• P(X = x1) means P(E1)
• P(X = x2) means P(E2)
• P(X = x3) means P(E3)
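The reading of P(X = x) as the probability of an event can be sketched as follows. Two things are assumptions made for illustration only: the numeric values chosen for x1, x2 and x3 (borrowed from the ball weights), and the equal likelihood of the eight outcomes, which the slide does not state.

```python
from fractions import Fraction

# Hypothetical values: x1 = 0.10, x2 = 0.15, x3 = 0.20.
X = {
    "s1": 0.10, "s2": 0.10, "s3": 0.10,   # E1 -> x1
    "s4": 0.15, "s5": 0.15,               # E2 -> x2
    "s6": 0.20, "s7": 0.20, "s8": 0.20,   # E3 -> x3
}


def event_for(x):
    """Return the event {s in S : X(s) = x}."""
    return {s for s, value in X.items() if value == x}


def prob_X_equals(x):
    """P(X = x) = P(event_for(x)), assuming equally likely outcomes."""
    return Fraction(len(event_for(x)), len(X))


for x in sorted(set(X.values())):
    print(x, sorted(event_for(x)), prob_X_equals(x))
```

Under these assumptions the three probabilities are 3/8, 2/8 and 3/8, and they add up to 1 because E1, E2 and E3 partition the sample space.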

3.b. Discrete Random variables
Probability functions
• The function fX(x) = P(X = x) for each x in the range of X is the probability function (PF) of X
• It specifies how the total probability of 1 is divided up amongst the possible values of X
• Thus, it gives the probability distribution of X
• Also known as the “probability distribution function” (pdf); in the discrete case many texts call it the probability mass function (pmf)

The following are the requirements for a function to qualify as the probability function of a discrete random variable:
• fX(x) ≥ 0 for all x within the range of X
• ∑ fX(x) = 1, with the sum taken over all x in the range of X
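The two requirements can be checked mechanically. The sketch below represents a probability function as a dictionary from values to probabilities; the function name is_probability_function and both example tables are hypothetical.

```python
from fractions import Fraction


def is_probability_function(f):
    """Check that every f(x) is non-negative and that the values sum to 1."""
    return all(p >= 0 for p in f.values()) and sum(f.values()) == 1


# A fair die: each face has probability 1/6.
fair_die = {x: Fraction(1, 6) for x in range(1, 7)}

# A table whose probabilities sum to 3/4, so it fails the second requirement.
not_normalised = {0: Fraction(1, 2), 1: Fraction(1, 4)}

print(is_probability_function(fair_die))
print(is_probability_function(not_normalised))
```

Exact fractions make the "sums to 1" test reliable; with floating-point probabilities a tolerance check such as math.isclose would be the safer choice.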

Cumulative distribution functions
• Gives the probability that X assumes a value that does not exceed x.
• Denoted as FX(x) = P(X ≤ x)
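Reading "does not exceed x" as P(X ≤ x), the cumulative distribution function of a discrete random variable is a running sum of its probability function. The sketch below uses a fair die as a hypothetical example; the names f and cdf are illustrative.

```python
from fractions import Fraction

# Probability function of a fair die (hypothetical example).
f = {x: Fraction(1, 6) for x in range(1, 7)}


def cdf(x):
    """F_X(x) = P(X <= x): add up f over all values that do not exceed x."""
    return sum((p for value, p in f.items() if value <= x), Fraction(0))


for x in (0, 1, 2.5, 3, 6):
    print(x, cdf(x))
```

The function is defined for any real x, not only for values in the range of X: it is 0 below the smallest value, 1 from the largest value onward, and it jumps at each possible value of X.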
