Econometrics Cheat Sheet Stock and Watson
December 16, 2016 | Author: peathepeanut
Cheat sheet for econometrics, from Introduction to Econometrics by Stock and Watson.
Linear Regression with 1 Regressor (CHAPTER 4)
Binary variable interpretation: b0 = E[Y|X=0] (mean of Y in the X=0 group); b1 = E[Y|X=1] - E[Y|X=0] (difference in group means).
Aim: estimate the causal effect on Y of a unit change in X. Slope: expected change in Y for a unit change in X: E[Y|X=x] = b0 + b1x. Method: minimize the sum of squared errors, the average squared difference between actual Yi and predicted Yi (OLS). The error u contains omitted factors that influence Y and are not captured by X, plus measurement error in Y. b0 and b1 are population parameters; the hats are the estimates, picked so that the sum of squared residuals is minimized. Interpretation: one more unit of X is associated on average with a beta1 change in Y.
Measures of fit:
1) R^2: the fraction of the variance of Y explained by X, between [0,1]. R^2 = ESS/TSS = sum(Yhat_i - Ybar)^2 / sum(Yi - Ybar)^2
2) SER: the magnitude of a typical regression residual, in the units of Y; measures the spread of the distribution of u. SER = sqrt(SSR/(n-2))
3) RMSE: the same as SER but divides by n, not n-2.
Assumptions on sampling: see (1)-(3) below.
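As a quick sketch (the data here are simulated and hypothetical, not from the text), the OLS estimates and the fit measures above can be computed directly with numpy:

```python
import numpy as np

# Hypothetical simulated data: true model Y = 50 + 3X + u, noise sd = 5
rng = np.random.default_rng(0)
n = 100
x = rng.uniform(0, 10, n)
y = 50 + 3.0 * x + rng.normal(0, 5, n)

# OLS estimates: beta1_hat = s_XY / s_X^2, beta0_hat = Ybar - beta1_hat * Xbar
beta1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
beta0 = y.mean() - beta1 * x.mean()

y_hat = beta0 + beta1 * x
u_hat = y - y_hat                       # residuals

tss = np.sum((y - y.mean()) ** 2)       # total sum of squares
ess = np.sum((y_hat - y.mean()) ** 2)   # explained sum of squares
ssr = np.sum(u_hat ** 2)                # sum of squared residuals
r2 = ess / tss                          # R^2 = ESS/TSS
ser = np.sqrt(ssr / (n - 2))            # SER divides by n - 2
rmse = np.sqrt(ssr / n)                 # RMSE divides by n

print(beta0, beta1, r2, ser, rmse)
```

Note that R^2 = ESS/TSS = 1 - SSR/TSS for OLS with an intercept, and SER is slightly larger than RMSE because it divides by n - 2 rather than n.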
Multiple Regressors: Hypothesis testing and CI (CHAPTER 7). If var(u|X=x) is constant, that is, if the conditional distribution of u given X does not depend on X, then u is said to be homoskedastic; otherwise heteroskedastic. If the 3 least squares assumptions hold and u is homoskedastic, then beta1_hat has the smallest variance among all linear unbiased estimators (Gauss-Markov theorem).
Linear Regression with Multiple Regressors (CHAPTER 6)
The sampling distribution of beta1_hat is approximately normal when n is large, and the estimators converge to the population parameters in the limit, i.e. they are consistent. The larger the variance of X, the smaller the variance of beta1_hat.
We use the F-test for joint hypotheses; larger t-statistics give a larger F. F approaches the chi^2_q/q distribution as n approaches infinity, where q = number of restrictions being tested.
When t1 and t2 are independent, F = 0.5(t1^2 + t2^2).
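A tiny worked example of this rule. The t-statistics are made-up numbers; the p-value uses the large-sample result that qF ~ chi^2_q under H0, plus the fact that for 2 degrees of freedom the chi^2 tail probability has the closed form exp(-x/2):

```python
import math

# Hypothetical t-statistics on two coefficients, assumed independent
t1, t2 = 1.5, 2.1
F = 0.5 * (t1**2 + t2**2)      # F = (t1^2 + t2^2)/2 under independence

# Large-sample null distribution: q*F ~ chi^2_q, with q = 2 restrictions.
# For 2 degrees of freedom the chi^2 tail probability is exactly exp(-x/2).
q = 2
p_value = math.exp(-q * F / 2)
print(F, p_value)              # F = 3.33, p approx 0.036: reject at 5%
```

Note that neither individual t-statistic exceeds 1.96, yet the joint test rejects: joint and individual tests can disagree.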
OVB: The error u arises because of factors, or variables, that influence Y but are not included in the regression function. There are always omitted variables.
p-value = tail probability of the chi^2_q/q distribution beyond the F-statistic actually computed.
2 conditions for OVB, where Z is the omitted variable:
Restricted vs unrestricted regression: compare R^2 under H0 and H1. The restricted regression imposes H0 (e.g. b1 = b2 = 0); the unrestricted regression does not.
1) Z is a determinant of Y
2) Z is correlated with the regressor X: corr(Z,X) != 0
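A minimal simulation sketch of the two conditions (all numbers hypothetical): Z determines Y and is correlated with X, so the short regression that omits Z is biased. Here corr(X,u) > 0 because u absorbs 3Z, so the bias is upward:

```python
import numpy as np

# Z is a determinant of Y (coefficient 3) and corr(Z, X) != 0
rng = np.random.default_rng(1)
n = 200_000
z = rng.normal(size=n)
x = 0.8 * z + rng.normal(size=n)                   # condition 2: corr(Z, X) != 0
y = 1.0 + 2.0 * x + 3.0 * z + rng.normal(size=n)   # condition 1: Z determines Y

# Short regression of Y on X alone (Z absorbed into the error u):
beta1_short = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
print(beta1_short)   # well above the true causal effect 2.0: upward bias
```

Increasing n does not fix this: the estimator converges to the wrong number (bias and inconsistency, not sampling noise).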
Assuming homoskedasticity when u is actually heteroskedastic gives wrong standard errors and can make you more likely to reject H0 (shown via example).
Direction of bias = direction of corr(X,u). For small n, use the F distribution, as it is more conservative than chi^2 in rejecting the null.
3) E(X^4) < infinity, i.e. large outliers are rare; OLS can be sensitive to outliers. beta1_hat = s_XY/s_X^2 is the OLS estimator of beta1, the object of interest (the causal effect of X on Y).
The ideas here are the same as in Chapter 5, but testing a joint hypothesis coefficient-by-coefficient with t-tests is WRONG: the rejection rate does not equal the stated significance level.
If u is homoskedastic and distributed N(0, sigma^2), then beta0_hat and beta1_hat are exactly normally distributed for all n, and the t-stat has a Student t distribution with n-2 degrees of freedom.
1) E(u|Xi = xi) = 0: the conditional distribution of u given X has mean 0; this implies beta1_hat is unbiased. RESULT: E(beta_hat) = beta, var(beta_hat) ~ 1/n.
2) (Xi,Yi) are iid: true if the sample is a simple random sample; problematic when we have panel data.
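A Monte Carlo sketch of this RESULT (the data-generating process is hypothetical): the mean of beta1_hat across repeated samples is close to the true beta1 at every n, and its variance shrinks roughly like 1/n:

```python
import numpy as np

# Monte Carlo: draw many samples, estimate beta1 each time, compare across n.
rng = np.random.default_rng(2)
beta1_true = 2.0

def slope_draws(n, reps=2000):
    draws = np.empty(reps)
    for r in range(reps):
        x = rng.normal(size=n)
        y = 1.0 + beta1_true * x + rng.normal(size=n)   # assumption 1 holds
        draws[r] = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    return draws

small, large = slope_draws(50), slope_draws(500)
print(small.mean(), large.mean())    # both close to 2.0: unbiased
print(small.var() / large.var())     # close to 10 = 500/50: var ~ 1/n
```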
There are multiple categories and every observation falls in one and only one category (Freshmen, Sophomores, Juniors, Seniors, Other). If you include all these dummy variables and a constant, you will have perfect multicollinearity; this is called the dummy variable trap. SOLUTION: omit one group, or omit the intercept b0.
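The trap can be seen as a rank condition. In this hypothetical sketch the dummy columns sum to the constant column, so the design matrix with all dummies plus a constant loses a rank:

```python
import numpy as np

# 4 mutually exclusive, exhaustive categories; 100 observations
categories = np.tile(np.arange(4), 25)
dummies = np.eye(4)[categories]            # one-hot: each row sums to 1
const = np.ones((100, 1))

X_trap = np.hstack([const, dummies])       # constant + ALL 4 dummies
X_ok = np.hstack([const, dummies[:, 1:]])  # constant + 3 dummies (omit one group)

rank_trap = np.linalg.matrix_rank(X_trap)
rank_ok = np.linalg.matrix_rank(X_ok)
print(rank_trap, X_trap.shape[1])   # 4 < 5: perfect multicollinearity
print(rank_ok, X_ok.shape[1])       # 4 = 4: full column rank, OLS works
```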
If both conditions hold, then the OLS estimator in the regression omitting that variable is biased and inconsistent: beta_hat does not approach beta1 even when n is large. Causal effect: defined as the effect measured in an ideal randomized controlled experiment, i.e. the difference E[Y|X = x*] - E[Y|X = x] when the value of X is assigned randomly by us.
Testing b1 = b2: there are 2 methods. 1) Re-arrange the regressors so that the restriction becomes a restriction on a single coefficient in an equivalent regression. 2) Test directly in Stata (the test command).
For beta_hat to approach beta, forecasting needs assumptions (2, 3); a causal interpretation also needs (1).
1 Regressor: Hypothesis testing and CI (CHAPTER 5)
Sampling dist of beta1_hat when n is large:
Interpretation of beta_j: change in Y for a unit change in Xj, holding all other Xi constant. beta0 = predicted value of Y when all Xi = 0.
Objective: test various hypotheses H0: beta1 = 0, one- or two-sided.
Adjusted R^2: 1 - ((n-1)/(n-k-1)) SSR/TSS; penalizes additional regressors, but converges to R^2 when n is large.
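A one-liner sketch of the formula, using SSR/TSS = 1 - R^2 (the numbers are hypothetical):

```python
# Adjusted R^2 = 1 - ((n-1)/(n-k-1)) * SSR/TSS, and SSR/TSS = 1 - R^2
def adjusted_r2(r2, n, k):
    """r2: ordinary R^2; n: observations; k: number of regressors."""
    return 1 - (n - 1) / (n - k - 1) * (1 - r2)

# Same R^2, more regressors -> larger penalty, lower adjusted R^2:
print(adjusted_r2(0.80, n=100, k=2))
print(adjusted_r2(0.80, n=100, k=20))
# With n large and k fixed, adjusted R^2 converges to R^2:
print(adjusted_r2(0.80, n=10**6, k=5))
```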
Method: compute the t-stat and p-value, then reject or fail to reject.
Assumptions on sampling: 1,2,3
Reject at the 5% significance level if |t| > 1.96.
4) There is no perfect multicollinearity, i.e. no regressor is an exact linear function of the others. Dummy variable trap: suppose you have a set of multiple binary (dummy) variables which are mutually exclusive and exhaustive, that is, there are multiple categories and every observation falls in exactly one.
Confidence sets based on the F-stat: an ellipse for 2 coefficients. Control variables: a control variable W is a variable that is correlated with, and controls for, an omitted causal factor in the regression of Y on X, but which itself does not necessarily have a causal effect on Y. An effective control variable is one which, when included in the regression, makes the error term uncorrelated with the variable of interest. This is conditional mean independence: E(u|X,W) = E(u|W). A high R^2 shows predictive power, not a causal effect.
Nonlinear Regression Functions (Chapter 8). If the relation between Y and X is nonlinear:
(1) The effect on Y of a change in X depends on the value of X, that is, the marginal effect of X is not constant.
(2) A linear regression is mis-specified: the functional form is wrong.
(3) The estimator of the effect on Y of X is biased: in general it isn't even right on average.
(4) The solution is to estimate a regression function that is nonlinear in X.
A case can arise where each coefficient individually is accepted (by its t-test) but the joint hypothesis is rejected (by the F-test), or vice versa.
Polynomial case: Yi = b0 + b1 Xi + b2 Xi^2 + ... + br Xi^r + ui
Regression with Panel data (Chapter 10)
The regression itself can still show the predicted change: just compute the change in predicted Y for a given change in X. [KEY] To interpret the estimated regression function: (1) plot predicted values as a function of x; (2) compute predicted ΔY/ΔX at different values of x.
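A sketch of step (2) for a quadratic regression with simulated (hypothetical) data: the predicted change in Y for ΔX = 1 is computed at two different values of x and differs, because the marginal effect is not constant:

```python
import numpy as np

# Simulated data from a quadratic relation: Y = 1 + 2X - 0.1X^2 + u
rng = np.random.default_rng(4)
x = rng.uniform(0, 10, 500)
y = 1.0 + 2.0 * x - 0.1 * x**2 + rng.normal(0, 0.5, 500)

# Fit Y on (1, X, X^2); np.polyfit returns the highest power first
b2, b1, b0 = np.polyfit(x, y, deg=2)

def predicted_change(x0, dx=1.0):
    """Predicted change in Y when X moves from x0 to x0 + dx."""
    f = lambda v: b0 + b1 * v + b2 * v**2
    return f(x0 + dx) - f(x0)

# The marginal effect depends on where it is evaluated:
print(predicted_change(1.0))   # roughly 1.7 at x = 1
print(predicted_change(8.0))   # roughly 0.3 at x = 8
```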
Notations: i = entity, t = time
Time fixed effect
Panel Data:
State-fixed effect: some unobserved effect for CA that is constant over time. Time-fixed effect: some unobserved effect for 1999 that is constant over all states.
1) control factors that vary across entity/state, but not time
Assumptions on sampling: 1,2,3,4
2) control factors that vary across time but not entity; 3) control the unobserved or unmeasured. It also lets us eliminate omitted variable bias that is constant over time within a given state, because the regression works in changes over time. E.g. any change in the fatality rate from 1982 to 1988 cannot be caused by Zi, because Zi (by assumption) does not change between 1982 and 1988. We look at the difference equation:
The last case (log-log) is just an elasticity; note you can't compare R^2 across cases with different dependent variables. Non-linear least squares is used when the parameters enter the regression equation non-linearly: use Stata.
Binary-Binary
(1) The new error term, (ui1988 - ui1982), is uncorrelated with either BeerTaxi1988 or BeerTaxi1982.
(2) This "difference" equation can be estimated by OLS, even though Zi isn't observed.
(3) The omitted variable Zi doesn't change, so it cannot be a determinant of the change in Y.
(4) This differences regression doesn't have an intercept; it was eliminated by the subtraction step.
"n-1 binary regressors"
Continuous and Binary
Entity-demeaned
Continuous-Continuous
1) uit has mean zero, given the entity fixed effect and the entire history of the X's for that entity. 2) Observations are iid across entities: satisfied if entities are randomly sampled from their population by simple random sampling. This does not require observations to be iid over time for the same entity; that would be unrealistic. Whether a state has a high beer tax this year is a good predictor of (correlated with) whether it will have a high beer tax next year. Similarly, the error term for an entity in one year is plausibly correlated with its value in the next year, that is, corr(uit, uit+1) is often plausibly nonzero. Autocorrelation: correlation over time; this brings out the need for clustered standard errors!
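A minimal sketch of the entity-demeaned ("within") estimator on a simulated balanced panel (all parameters hypothetical): the entity effect alpha_i plays the role of the time-constant omitted variable, so pooled OLS is biased, while demeaning within entities wipes out alpha_i and recovers the true coefficient:

```python
import numpy as np

# Hypothetical balanced panel: y_it = alpha_i + 2*x_it + u_it,
# where the entity effect alpha_i is unobserved and correlated with x.
rng = np.random.default_rng(5)
n_entities, T = 100, 7
alpha = rng.normal(0, 5, size=(n_entities, 1))        # entity fixed effects
x = 0.3 * alpha + rng.normal(size=(n_entities, T))    # x correlated with alpha
y = alpha + 2.0 * x + rng.normal(size=(n_entities, T))

def ols_slope(a, b):
    return np.cov(a, b, ddof=1)[0, 1] / np.var(a, ddof=1)

# Pooled OLS ignores alpha_i, which acts like an omitted variable Z:
beta_pooled = ols_slope(x.ravel(), y.ravel())

# Entity-demeaning subtracts each entity's time mean, wiping out alpha_i:
x_dm = x - x.mean(axis=1, keepdims=True)
y_dm = y - y.mean(axis=1, keepdims=True)
beta_fe = ols_slope(x_dm.ravel(), y_dm.ravel())

print(beta_pooled, beta_fe)   # pooled biased upward; demeaned close to 2.0
```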