R Studio Cheat Sheet for Math1041

April 23, 2017 | Author: Oliver | Category: N/A


Short Description

Cheat Sheet for all the code/formulas needed for the Statistics for Life and Social Sciences computer test at UNSW....

Description

R Studio Cheat Sheet

Function
    Code

Combine, create a vector with these values
    name = c(#, #, #, #..., #)
Set the working directory (RStudio menu)
    Session > Set Working Directory > Choose Directory
Importing files
    name = read.table("name.txt", header=T)
Call file (display all data)
    name
Call first ten rows
    name[1:10,]
Call second and fourth columns
    name[,c(2,4)]
General glimpse of data
    head(name)
Table of frequencies comparing two aspects
    tab = table(name$column1, name$column2)
In dataset A, find column B
    A$B
Scatterplot of columnname's data
    plot(name$columnname)
Labelled scatterplot
    plot(name$columnname, xlab="label", ylab="label", main="title")
Boxplot
    boxplot(name$columnname)
Histogram
    hist(name$columnname)
Barplot of labelled subjects
    x = c(number1, number2)
    names(x) = c("name1", "name2")
    barplot(x)
Clustered Bar Chart (beside makes clustered)
    barplot(tab, beside=T, legend=T)
Mean
    mean(name)    ← name must be a numeric vector, e.g. name$columnname
Standard Deviation
    sd(name)
First quartile (similar percentages for others)
    quantile(name, 0.25)
Five number summary (+mean)
    summary(name)
Correlation Coefficient (numeric comparison)
    cor(name$column1, name$column2)
Least Squares Regression Line (lm = linear model)
    model = lm(column1~column2, data=name)
    model    ← produces Y-intercept (Intercept) and gradient (column2)
Log Transformation
    log.data = log10(data)    (also sqrt(data), log(data) for base e, 1/data)
    hist(log.data)    ← produces histogram
Compare two graphs
    par(mfrow=c(2,1))
    hist(data1)
    hist(data2)
Randomise order of subjects
    randomnumbers = runif(6)
    sort(randomnumbers, index.return=T)
    -OR-
    names = c("1", "2", "3", "David")
    ran = runif(4)    ← random, uniform; one number per name
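The summary and regression entries on this page can be tried end to end on a data set that ships with R. The sketch below is only an illustration: it assumes the built-in `cars` data frame (columns `speed` and `dist`) as a stand-in for a file loaded with read.table(), and the variable names are chosen for the example rather than taken from the course.

```r
# Built-in data frame `cars` stands in for an imported file (an assumption)
name = cars

head(name)                          # general glimpse: first six rows
name[1:10, ]                        # first ten rows

m  = mean(name$dist)                # mean of one column
s  = sd(name$dist)                  # standard deviation
q1 = quantile(name$dist, 0.25)      # first quartile
summary(name$dist)                  # five-number summary plus the mean

r = cor(name$speed, name$dist)      # correlation coefficient

# Least squares regression of dist (response) on speed (explanatory)
model = lm(dist ~ speed, data = name)
b = coef(model)                     # b[1] = intercept, b[2] = gradient
print(b)

# For simple linear regression the gradient equals r * sd(y) / sd(x)
gradient.check = r * sd(name$dist) / sd(name$speed)
```

Comparing `b[2]` with `gradient.check` is a quick way to confirm which column lm() treated as the response.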

Oliver Bogdanovski
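The recipe for randomising the order of subjects can be sketched as one complete run. The seed value is arbitrary, and drawing `length(names)` random numbers is an assumption made here so that the index vector has exactly one entry per subject; `sample()` is shown only as a base R alternative.

```r
set.seed(1041)                       # arbitrary seed, only so the run is repeatable

names = c("1", "2", "3", "David")    # subject labels as on the sheet
ran = runif(length(names))           # one uniform random number per subject
sort.ran = sort(ran, index.return = TRUE)
shuffled = names[sort.ran$ix]        # subjects in the order of their random numbers
print(shuffled)

# Base R's sample() returns a random permutation in one call
shuffled2 = sample(names)
```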

Randomise order of subjects (continued)
    sort.ran = sort(ran, index.return=T)
    names[sort.ran$ix]
Probability to left of point on normal distribution, P(Z≤z)
    pnorm(z)    → e.g. pnorm(1.96) for standardised
    pnorm(z, μ, σ)    ← for unstandardised
Point (quantile) where, given the probability, we can find the z-value, i.e. find c in P(Z≤c)=p given p
    qnorm(p)    → e.g. qnorm(0.975) for P(Z≤c)=0.975
    qnorm(p, μ, σ)    ← unstandardised
Create a normal quantile plot
    qqnorm(data$column)
Multiply (×)
    *
Find how many values in data are greater than a particular number n
    sum(dataname>n)    Also: ==, !=
Generate normally distributed numbers
Generate binomially distributed numbers
Combine whole datasets (of equal n) by summing the two and dividing it all by 2
Find P(X=k) where X~B(n,p)
Find P(X≤k) where X~B(n,p)
    (code for these five entries is missing from the source)
Create range of values from x to y increasing by 1
    name = x:y
Create probability histogram of binomial distribution and overlay normal curve
    xbin = 0:24    ← creates set of numbers
    fbin = dbinom(xbin, n, p)    ← n must be bigger than 24 in this case, doubled is best for normal
    barplot(fbin, names.arg=xbin, space=0)    ← space=0 means no gaps between bars; names.arg gives x-values
    lines(xbin, dnorm(xbin, 12, 3))    ← overlays normal curve; note the mean 12 is half of 24
Hypothesis tests - using z statistic (with σ)
    set up values (m, mu.0, sigma, n)
    z = (m - mu.0)/(sigma/sqrt(n))
    pnorm(z)
Confidence interval
    z.star = qnorm(0.975)    ← for 95% confidence
    then use the rest of the formula: m ± z.star*sigma/sqrt(n)
Find P(T≤x) in T~t(k)    ← k degrees of freedom
    pt(x, k)    ← works like pnorm, also probability to left
Find P(T>x) in T~t(k)
    1-pt(x, k)
Find c in P(T≤c)=p in T~t(k)
    qt(p, k)    → e.g. to find upper 0.5%, use qt(0.995, k)
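The distribution entries and the z procedures on this page can be checked with a short run. This is a sketch under stated assumptions: every number in it is invented for illustration, and `rnorm`, `rbinom`, `dbinom` and `pbinom` are the standard base R functions for the tasks described, not necessarily the calls printed on the original sheet.

```r
# --- Normal and binomial probabilities (all values are illustrative) ---
p.left = pnorm(1.96)                  # P(Z <= 1.96), about 0.975
z.cut  = qnorm(0.975)                 # c with P(Z <= c) = 0.975, about 1.96

n = 48; p = 0.25                      # assumed parameters, X ~ B(48, 0.25)
p.eq = dbinom(12, n, p)               # P(X = 12)
p.le = pbinom(12, n, p)               # P(X <= 12)

set.seed(1)                           # arbitrary seed for repeatability
x.norm = rnorm(10, 100, 15)           # 10 draws from a normal, mean 100, sd 15
x.bin  = rbinom(10, n, p)             # 10 draws from B(48, 0.25)

# --- z test and confidence interval with known sigma (assumed numbers) ---
m = 52; mu.0 = 50; sigma = 6; n.obs = 36
z = (m - mu.0) / (sigma / sqrt(n.obs))    # test statistic
p.lower = pnorm(z)                        # P-value for Ha: mu < mu.0
p.upper = 1 - pnorm(z)                    # P-value for Ha: mu > mu.0
p.two   = 2 * (1 - pnorm(abs(z)))         # P-value for Ha: mu != mu.0

z.star = qnorm(0.975)                     # critical value for 95% confidence
ci = m + c(-1, 1) * z.star * sigma / sqrt(n.obs)
print(c(z = z, p.upper = p.upper, p.two = p.two))
print(ci)
```

Which tail probability serves as the P-value depends on the direction of Ha, so all three are computed side by side.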

Hypothesis Testing (with t-tests)

if Ha: μ > μ0 (one-sided)
    t.test(sampledata, mu=μ0, alternative="greater")
if Ha: μ < μ0 (one-sided)
    t.test(sampledata, mu=μ0, alternative="less")
if Ha: μ ≠ μ0 (two-sided)
    t.test(sampledata, mu=μ0)
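A minimal sketch of the three t-test calls, run on made-up data. The eight measurements and the null value `mu.0 = 10` are hypothetical; the hand calculation is included only to show where the reported statistic and P-value come from.

```r
# Hypothetical sample of eight measurements; mu.0 = 10 is an assumed null value
sampledata = c(10.2, 11.1, 9.8, 10.9, 11.4, 10.6, 9.9, 11.0)
mu.0 = 10

greater = t.test(sampledata, mu = mu.0, alternative = "greater")
less    = t.test(sampledata, mu = mu.0, alternative = "less")
two     = t.test(sampledata, mu = mu.0)       # default is two-sided

# The t statistic by hand matches what t.test() reports
n = length(sampledata)
t.stat = (mean(sampledata) - mu.0) / (sd(sampledata) / sqrt(n))
p.greater = 1 - pt(t.stat, n - 1)             # upper-tail P-value, n - 1 df

print(two)
```

The individual results can be pulled out of the returned object, for example `two$p.value` or `two$statistic`.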


if Ha: μ ≠ μ0 (two-sided) - choose 98% CI
    t.test(sampledata, conf.level=0.98)
    No mu is needed here: mu only affects the P-value, not the confidence interval, so it can be left out if the interval is all you are interested in.
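To see what the 98% interval from t.test() contains, the sketch below extracts it and rebuilds it by hand from qt(). The data are hypothetical, and the manual formula (mean ± t* × s/√n) is the usual one-sample t interval rather than something quoted from the sheet.

```r
sampledata = c(10.2, 11.1, 9.8, 10.9, 11.4, 10.6, 9.9, 11.0)   # hypothetical data

res = t.test(sampledata, conf.level = 0.98)
ci = res$conf.int                   # lower and upper limits

# The same interval by hand, with t* taken from qt()
n = length(sampledata)
t.star = qt(0.99, n - 1)            # upper 1% point, leaving 98% in the middle
ci.manual = mean(sampledata) + c(-1, 1) * t.star * sd(sampledata) / sqrt(n)
print(ci)
```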

If the test is about a parameter other than μ, such as a proportion p, it has to be done manually.
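One way the manual calculation for a proportion might look is sketched below. It assumes the large-sample normal approximation (a z test), which the sheet does not specify, and the counts and null value are invented for the example.

```r
# Assumed data: 58 successes in 100 trials, testing H0: p = 0.5
x = 58; n = 100; p.0 = 0.5
p.hat = x / n                                  # sample proportion

# The test statistic uses the null value p.0 in the standard error
z = (p.hat - p.0) / sqrt(p.0 * (1 - p.0) / n)
p.value = 2 * (1 - pnorm(abs(z)))              # two-sided Ha: p != p.0

# The 95% confidence interval uses p.hat in the standard error
z.star = qnorm(0.975)
ci = p.hat + c(-1, 1) * z.star * sqrt(p.hat * (1 - p.hat) / n)
print(c(z = z, p.value = p.value))
print(ci)
```

For a one-sided alternative, replace the P-value line with `1 - pnorm(z)` (Ha: p > p.0) or `pnorm(z)` (Ha: p < p.0).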
