# 15. Regression Basics

Transcript
• 1. Regression Basics: Predicting a DV with a Single IV
• 2. Questions
  – What are predictors and criteria?
  – Write an equation for the linear regression. Describe each term.
  – How do changes in the slope and intercept affect (move) the regression line?
  – What does it mean to test the significance of the regression sum of squares? R-square?
  – What is R-square?
  – What does it mean to choose a regression line to satisfy the loss function of least squares?
  – How do we find the slope and intercept for the regression line with a single independent variable? (Either formula for the slope is acceptable.)
  – Why does testing for the regression sum of squares turn out to have the same result as testing for R-square?
• 3. Basic Ideas
  – Jargon: IV = X = Predictor (pl. predictors); DV = Y = Criterion (pl. criteria); "regression of Y on X," e.g., GPA on SAT.
  – Linear model = relations between IV and DV represented by a straight line.
  – A score on Y has 2 parts: (1) a linear function of X and (2) error:
    Y_i = α + βX_i + ε_i (population values)
• 4. Basic Ideas (2)
  – Sample values: Y_i = a + bX_i + e_i
  – Intercept (a): the value of Y where X = 0.
  – Slope (b): the change in Y if X changes 1 unit; rise over run.
  – If error is removed, we have a predicted value for each person at X (the line): Y′ = a + bX
  Suppose on average houses are worth about $75.00 a square foot. Then the equation relating price to size would be Y′ = 0 + 75X. The predicted price for a 2,000-square-foot house would be $150,000.
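The house-price example above can be sketched in a few lines of Python (the function name and its defaults are illustrative, not from the slides):

```python
# Predicted value Y' = a + bX, with the slide's house-price numbers:
# intercept a = 0 and slope b = 75 dollars per square foot.
def predicted_price(sq_feet, intercept=0.0, slope=75.0):
    """Return the predicted value Y' = a + b*X for a house of a given size."""
    return intercept + slope * sq_feet

print(predicted_price(2000))  # 2000 sq ft at $75/sq ft -> 150000.0
```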
• 5. Linear Transformation
  – A 1-to-1 mapping of variables via a line: Y′ = a + bX.
  – Permissible operations are addition and multiplication (interval data).
  – [Figure: "Changing the Y Intercept" plots Y = 5 + 2X, Y = 10 + 2X, and Y = 15 + 2X (add a constant); "Changing the Slope" plots Y = 5 + .5X, Y = 5 + X, and Y = 5 + 2X (multiply by a constant).]
• 6. Linear Transformation (2)
  – Centigrade to Fahrenheit. Note the 1-to-1 map. Intercept? Slope?
  – [Figure: Degrees F plotted against Degrees C, with points at (0 C, 32 F) and (100 C, 212 F).]
  – Intercept is 32: when X (Cent) is 0, Y (Fahr) is 32.
  – Slope is 1.8: when Cent goes from 0 to 100 (run), Fahr goes from 32 to 212 (rise), and 212 − 32 = 180. Then 180/100 = 1.8, rise over run, is the slope. Y = 32 + 1.8X, i.e., F = 32 + 1.8C.
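The temperature example gives a concrete check of reading the intercept and slope off a line; a minimal sketch (function name is mine):

```python
# Fahrenheit as a linear transformation of Centigrade: Y' = a + bX
# with intercept a = 32 and slope b = 180/100 = 1.8 (rise over run).
def c_to_f(celsius):
    return 32 + 1.8 * celsius

print(c_to_f(0))    # 32.0  (the intercept: value of Y when X = 0)
print(c_to_f(100))  # 212.0
```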
• 7. Review
  – What are predictors and criteria?
  – Write an equation for the linear regression with 1 IV. Describe each term.
  – How do changes in the slope and intercept affect (move) the regression line?
• 8. Regression of Weight on Height

  | Ht | Wt |
  |----|-----|
  | 61 | 105 |
  | 62 | 120 |
  | 63 | 120 |
  | 65 | 160 |
  | 65 | 120 |
  | 68 | 145 |
  | 69 | 175 |
  | 70 | 160 |
  | 72 | 185 |
  | 75 | 210 |

  N = 10 for both; M = 67 (Ht) and 150 (Wt); SD = 4.57 (Ht) and 33.99 (Wt).
  Correlation (r) = .94. Regression equation: Y′ = −316.86 + 6.97X.
  [Figure: scatterplot of weight against height in inches with the fitted line, rise over run marked.]
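The slide's slope, intercept, and correlation can be recomputed from the raw data with only the standard library; a sketch (variable names are mine):

```python
# Recompute the slide's fit for weight (Y) on height (X) from the raw data.
import math

ht = [61, 62, 63, 65, 65, 68, 69, 70, 72, 75]
wt = [105, 120, 120, 160, 120, 145, 175, 160, 185, 210]
n = len(ht)
mx, my = sum(ht) / n, sum(wt) / n

sxy = sum((x - mx) * (y - my) for x, y in zip(ht, wt))  # sum of cross products
sxx = sum((x - mx) ** 2 for x in ht)                    # SS of X deviations
syy = sum((y - my) ** 2 for y in wt)                    # SS of Y deviations

b = sxy / sxx                   # slope
a = my - b * mx                 # intercept
r = sxy / math.sqrt(sxx * syy)  # correlation

print(round(b, 2), round(a, 2), round(r, 2))  # 6.97 -316.86 0.94
```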
• 9. Illustration of the Linear Model. This concept is vital!
  – [Figure: scatterplot of weight on height with the regression line, the means of X and Y marked, and the point (65, 120) decomposed into a deviation from the mean of Y = linear part + error part.]
  – Population: Y_i = α + βX_i + ε_i. Sample: Y_i = a + bX_i + e_i, with Y′ = a + bX and e_i = Y_i − Y′_i.
  – Consider Y as a deviation from the mean. Part of that deviation can be associated with X (the linear part) and part cannot (the error).
• 10. Predicted Values & Residuals

  | N  | Ht   | Wt    | Y′     | Resid  |
  |----|------|-------|--------|--------|
  | 1  | 61   | 105   | 108.19 | −3.19  |
  | 2  | 62   | 120   | 115.16 | 4.84   |
  | 3  | 63   | 120   | 122.13 | −2.13  |
  | 4  | 65   | 160   | 136.06 | 23.94  |
  | 5  | 65   | 120   | 136.06 | −16.06 |
  | 6  | 68   | 145   | 156.97 | −11.97 |
  | 7  | 69   | 175   | 163.94 | 11.06  |
  | 8  | 70   | 160   | 170.91 | −10.91 |
  | 9  | 72   | 185   | 184.84 | 0.16   |
  | 10 | 75   | 210   | 205.75 | 4.25   |
  | M  | 67   | 150   | 150.00 | 0.00   |
  | SD | 4.57 | 33.99 | 31.85  | 11.89  |
  | V  | 20.89 | 1155.56 | 1014.37 | 141.32 |

  Numbers for the linear part and the error. Note the mean of Y′ and of the residuals. Note that the variance of Y is V(Y′) + V(res).
• 11. Finding the Regression Line
  Need to know the correlation, SDs, and means of X and Y. The correlation is the slope when both X and Y are expressed as z scores; to translate to raw scores, just bring back the original SDs for both:
  r_XY = Σ(z_X z_Y)/N
  b = r_XY(SD_Y/SD_X) (rise over run)
  To find the intercept, use: a = M_Y − b·M_X
  Suppose r = .50, SD_X = .5, M_X = 10, SD_Y = 2, M_Y = 5.
  Slope: b = .50(2/.5) = 2. Intercept: a = 5 − 2(10) = −15. Equation: Y′ = −15 + 2X.
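The slide's shortcut, slope and intercept from the correlation, SDs, and means, can be sketched directly (function name is mine); running it on the worked example reproduces the slide's line:

```python
# Slope and intercept from summary statistics: b = r * SD_Y/SD_X, a = M_Y - b*M_X.
def regression_line(r, sd_x, m_x, sd_y, m_y):
    b = r * sd_y / sd_x  # correlation rescaled into raw-score units
    a = m_y - b * m_x    # the line passes through (M_X, M_Y)
    return a, b

a, b = regression_line(r=.50, sd_x=.5, m_x=10, sd_y=2, m_y=5)
print(a, b)  # -15.0 2.0, so Y' = -15 + 2X
```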
• 12. Line of Least Squares
  We have some points. Assume a linear relation is reasonable, so the two variables can be represented by a line. Where should the line go?
  [Figure: scatterplot of weight on height with the fitted regression line and a residual marked.]
  Place the line so the errors (residuals) are small. The line we calculate has a sum of errors = 0, and a sum of squared errors that is as small as possible; the line provides the smallest sum of squared errors, or least squares.
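Both least-squares properties can be checked numerically on the height/weight data: the residuals of the fitted line sum to (essentially) zero, and nudging the intercept or slope can only increase the sum of squared errors.

```python
# Least-squares properties of the fitted line on the slide's data.
ht = [61, 62, 63, 65, 65, 68, 69, 70, 72, 75]
wt = [105, 120, 120, 160, 120, 145, 175, 160, 185, 210]
mx, my = sum(ht) / len(ht), sum(wt) / len(wt)
b = sum((x - mx) * (y - my) for x, y in zip(ht, wt)) / sum((x - mx) ** 2 for x in ht)
a = my - b * mx

def sse(a, b):
    """Sum of squared errors for the line Y' = a + bX."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(ht, wt))

resid_sum = sum(y - (a + b * x) for x, y in zip(ht, wt))
print(abs(resid_sum) < 1e-9)        # True: the errors sum to zero
print(sse(a, b) < sse(a + 1, b))    # True: shifting the intercept does worse
print(sse(a, b) < sse(a, b + 0.1))  # True: tilting the slope does worse
```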
• 13. Least Squares (2)
• 14. Review
  – What does it mean to choose a regression line to satisfy the loss function of least squares?
  – What are predicted values and residuals?
  – Suppose r = .25, SD_X = 1, M_X = 10, SD_Y = 2, M_Y = 5. What is the regression equation (line)?
• 15. Partitioning the Sum of Squares
  Definitions: Y = a + bX + e; Y′ = a + bX; Y = Y′ + e; e = Y − Y′
  (Y − M_Y) = (Y′ − M_Y) + (Y − Y′) = y, the deviation from the mean
  Sum of squares: Σ(Y − M_Y)² = Σ[(Y′ − M_Y) + (Y − Y′)]²
  Σy² = Σ(Y′ − M_Y)² + Σ(Y − Y′)² (the cross products drop out)
  Sum of squared deviations from the mean = sum of squares due to regression + sum of squared residuals (reg + error).
  Analog: SS_tot = SS_B + SS_W.
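The partition can be verified numerically on the height/weight data. Note that the 9129.31 in the later slides comes from predictions rounded to two decimals; the unrounded fit gives about 9128.19, but the identity itself holds exactly:

```python
# Verify SS_Y = SS_reg + SS_res on the slide's data, using the unrounded fit.
ht = [61, 62, 63, 65, 65, 68, 69, 70, 72, 75]
wt = [105, 120, 120, 160, 120, 145, 175, 160, 185, 210]
n = len(ht)
mx, my = sum(ht) / n, sum(wt) / n
b = sum((x - mx) * (y - my) for x, y in zip(ht, wt)) / sum((x - mx) ** 2 for x in ht)
a = my - b * mx
pred = [a + b * x for x in ht]

ss_tot = sum((y - my) ** 2 for y in wt)               # squared deviations from the mean
ss_reg = sum((p - my) ** 2 for p in pred)             # SS due to regression
ss_res = sum((y - p) ** 2 for y, p in zip(wt, pred))  # SS of residuals

print(ss_tot, round(ss_reg, 2), round(ss_res, 2))  # 10400 split into the two parts
assert abs(ss_tot - (ss_reg + ss_res)) < 1e-9
```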
• 16. Partitioning SS (2)
  SS_Y = SS_Reg + SS_Res: total SS is regression SS plus residual SS. Can also get proportions of each, and can get variance by dividing SS by N if you want. The proportion of total SS due to regression = the proportion of total variance due to regression = R² (R-square):
  SS_Y/SS_Y = SS_Reg/SS_Y + SS_Res/SS_Y, i.e., 1 = R² + (1 − R²)
• 17. Partitioning SS (3)

  | Wt (Y) | (Y − M)² | Y′     | Y′ − M | (Y′ − M)² | Resid (Y − Y′) | Resid²   |
  |--------|----------|--------|--------|-----------|----------------|----------|
  | 105    | 2025     | 108.19 | −41.81 | 1748.076  | −3.19          | 10.1761  |
  | 120    | 900      | 115.16 | −34.84 | 1213.826  | 4.84           | 23.4256  |
  | 120    | 900      | 122.13 | −27.87 | 776.7369  | −2.13          | 4.5369   |
  | 160    | 100      | 136.06 | −13.94 | 194.3236  | 23.94          | 573.1236 |
  | 120    | 900      | 136.06 | −13.94 | 194.3236  | −16.06         | 257.9236 |
  | 145    | 25       | 156.97 | 6.97   | 48.5809   | −11.97         | 143.2809 |
  | 175    | 625      | 163.94 | 13.94  | 194.3236  | 11.06          | 122.3236 |
  | 160    | 100      | 170.91 | 20.91  | 437.2281  | −10.91         | 119.0281 |
  | 185    | 1225     | 184.84 | 34.84  | 1213.826  | 0.16           | 0.0256   |
  | 210    | 3600     | 205.75 | 55.75  | 3108.063  | 4.25           | 18.0625  |
  | Sum = 1500 | 10400 | 1500.01 | 0.01 | 9129.307  | −0.01          | 1271.907 |
  | Variance | 1155.56 |        |        | 1014.37   |                | 141.32   |

  M = 150.
• 18. Partitioning SS (4)

  |          | Total   | Regress | Residual |
  |----------|---------|---------|----------|
  | SS       | 10400   | 9129.31 | 1271.91  |
  | Variance | 1155.56 | 1014.37 | 141.32   |

  Proportion of SS: 10400/10400 = 9129.31/10400 + 1271.91/10400, i.e., 1 = .88 + .12.
  Proportion of variance: 1155.56/1155.56 = 1014.37/1155.56 + 141.32/1155.56, i.e., 1 = .88 + .12.
  R² = .88. Note that Y′ is a linear function of X, so r_YY′ = r_XY = .94.
  r²_YY′ = R² = .88; r_YE = .35, so r²_YE = .12; r_Y′E = 0.
• 19. Significance Testing
  Testing for the SS due to regression = testing for the variance due to regression = testing the significance of R². All are the same. H0: R²(population) = 0.
  F = (SS_reg/df_reg)/(SS_res/df_res) = (SS_reg/k)/(SS_res/(N − k − 1)), where k = the number of IVs (here it's 1) and N is the sample size (# people). F has k and (N − k − 1) df.
  F = (9129.31/1)/(1271.91/(10 − 1 − 1)) = 57.42
  Equivalent test using R-square instead of SS:
  F = (R²/k)/((1 − R²)/(N − k − 1)) = (.88/1)/((1 − .88)/(10 − 1 − 1)) = 58.67
  Results will be the same within rounding error.
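The algebraic equivalence of the two F forms can be checked numerically; with the unrounded fit the two forms agree exactly rather than just within rounding error:

```python
# F test two ways on the slide's data: from sums of squares and from R-square.
ht = [61, 62, 63, 65, 65, 68, 69, 70, 72, 75]
wt = [105, 120, 120, 160, 120, 145, 175, 160, 185, 210]
n, k = len(ht), 1  # N people, k = 1 IV
mx, my = sum(ht) / n, sum(wt) / n
sxy = sum((x - mx) * (y - my) for x, y in zip(ht, wt))
sxx = sum((x - mx) ** 2 for x in ht)
syy = sum((y - my) ** 2 for y in wt)
b = sxy / sxx

ss_reg = b * sxy       # SS_reg = b * (sum of cross products)
ss_res = syy - ss_reg
r2 = ss_reg / syy      # R-square

F_ss = (ss_reg / k) / (ss_res / (n - k - 1))
F_r2 = (r2 / k) / ((1 - r2) / (n - k - 1))
print(round(F_ss, 2), round(F_r2, 2))  # identical, about 57.42
```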
• 20. Review
  – What does it mean to test the significance of the regression sum of squares? R-square?
  – What is R-square?
  – Why does testing for the regression sum of squares turn out to have the same result as testing for R-square?