Hi everyone,
I want to run a weighted regression in Stata. I already know which command to use: reg y v1 v2 v3 [pweight=weights]. But I would like to find out exactly how Stata uses the weights and how it weights the individual observations.
In the Stata syntax file I read the attached concept.

I tried to do the regression manually in Stata by first multiplying all variables of observation i by sqrt(wi) and then running a multiple linear regression. However, I don't get the same results as when I run the regression with the option [pweight=weights].
Does anyone know why my calculation is wrong and how Stata applies the weights to the observations?

Thank you for your help!

Last edited by Yolanda Schmidt; 20 Jul 2020, 04:35.

Joro Kolev 20 Jul 2020, 04:46

You get different results because pweights and aweights are different. The picture you posted describes aweights, whereas in your post you speak of pweights.

Here is an explanation of what is going on

Joro Kolev 20 Jul 2020, 04:50

Here is another tutorial on weights in Stata. In short, it is a bit of a headache to figure out (1) what you need to do (how to weight) and (2) what Stata does, and (3) it all depends on the estimator.

Andrew Musau 20 Jul 2020, 05:00

Here is an example showing the equivalence in #1:

sysuse auto
keep in 1/10
keep mpg weight
gen w=_n^2
l, sep(10)
regress mpg weight [aweight=w]
gen wcons= sqrt(w)
gen wmpg= sqrt(w)*mpg
gen wweight= sqrt(w)*weight
regress wmpg wweight wcons, nocons

. l, sep(10)

     +--------------------+
     | mpg   weight     w |
     |--------------------|
  1. |  22    2,930     1 |
  2. |  17    3,350     4 |
  3. |  22    2,640     9 |
  4. |  20    3,250    16 |
  5. |  15    4,080    25 |
  6. |  18    3,670    36 |
  7. |  26    2,230    49 |
  8. |  20    3,280    64 |
  9. |  16    3,880    81 |
 10. |  19    3,400   100 |
     +--------------------+

. regress mpg weight [aweight=w]
(sum of wgt is 385)

      Source |       SS           df       MS      Number of obs   =        10
-------------+----------------------------------   F(1, 8)         =    477.32
       Model |  95.5590801         1  95.5590801   Prob > F        =    0.0000
    Residual |  1.60158778         8  .200198473   R-squared       =    0.9835
-------------+----------------------------------   Adj R-squared   =    0.9815
       Total |  97.1606679         9  10.7956298   Root MSE        =    .44744

------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      weight |    -.00588   .0002691   -21.85   0.000    -.0065007   -.0052594
       _cons |   39.02116   .9195017    42.44   0.000     36.90078    41.14153
------------------------------------------------------------------------------

. regress wmpg wweight wcons, nocons

      Source |       SS           df       MS      Number of obs   =        10
-------------+----------------------------------   F(2, 8)         =   9418.14
       Model |  145183.339         2  72591.6694   Prob > F        =    0.0000
    Residual |  61.6611296         8   7.7076412   R-squared       =    0.9996
-------------+----------------------------------   Adj R-squared   =    0.9995
       Total |      145245        10     14524.5   Root MSE        =    2.7763

------------------------------------------------------------------------------
        wmpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     wweight |    -.00588   .0002691   -21.85   0.000    -.0065007   -.0052594
       wcons |   39.02116   .9195017    42.44   0.000     36.90078    41.14153
------------------------------------------------------------------------------
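Why the transformed regression above reproduces the weighted one can be written out as a short sketch. The notation is an assumption made for illustration and does not come from the thread: y_i is the outcome, x_i the single regressor, w_i the weight, and N the number of observations.

```latex
\begin{align*}
\min_{\beta_0,\beta_1}\ \sum_{i=1}^{N} w_i\,(y_i - \beta_0 - \beta_1 x_i)^2
  &= \min_{\beta_0,\beta_1}\ \sum_{i=1}^{N}
     \bigl(\sqrt{w_i}\,y_i - \beta_0\sqrt{w_i} - \beta_1\sqrt{w_i}\,x_i\bigr)^2 \\
\tilde w_i &= w_i\,\frac{N}{\sum_{j=1}^{N} w_j}
\end{align*}
```

The first line shows that the constant itself turns into the regressor sqrt(w_i) (wcons in the example), which is why the manual regression needs nocons. Multiplying only y and the x variables by sqrt(w_i) while keeping an ordinary constant solves a different problem and gives different coefficients. The second line is the rescaling Stata applies to aweights so that they sum to N. It leaves the coefficients and standard errors unchanged but scales the residual sum of squares by N divided by the sum of the weights: in the output above, 61.6611296 × 10/385 ≈ 1.6016.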

Yolanda Schmidt 20 Jul 2020, 05:44
Originally posted by Joro Kolev

You get different results because pweights and aweights are different. The picture you posted describes aweights, whereas in your post you speak of pweights.

Here is an explanation of what is going on

Thanks for the document, it's very helpful.

But the instructions mention:
Note that point estimates are the same than the one obtained using aweight

I also get the same results when I use pweights and aweights.
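The point that aweights and pweights give the same coefficients can be checked on the toy data from the example earlier in the thread. The following is only a sketch: the matrix names (b_aw, V_pw, and so on) are made up for illustration. It relies on regress using the robust variance estimator whenever pweights are specified, so the two weight types should agree on the point estimates and differ only in the standard errors.

```stata
* Toy data as in the earlier example: first 10 cars, weights w = _n^2
sysuse auto, clear
keep in 1/10
gen w = _n^2

* Analytic weights: weighted least squares, conventional standard errors
regress mpg weight [aweight=w]
matrix b_aw = e(b)
matrix V_aw = e(V)

* Sampling weights: same minimization problem, robust standard errors
regress mpg weight [pweight=w]
matrix b_pw = e(b)
matrix V_pw = e(V)

* aweights combined with vce(robust) should reproduce the pweight fit
regress mpg weight [aweight=w], vce(robust)
matrix b_awr = e(b)
matrix V_awr = e(V)

* Relative differences: expected to be near zero for the coefficients,
* clearly nonzero for aweight vs pweight variances,
* and near zero again for pweight vs aweight + robust variances
display mreldif(b_aw, b_pw)
display mreldif(V_aw, V_pw)
display mreldif(V_pw, V_awr)
```

If the first and third displayed values are essentially zero while the second is not, the "same results" seen with both weight types refer to the coefficients only; the standard errors, t statistics, and confidence intervals can still differ.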

Yolanda Schmidt 20 Jul 2020, 05:45
Originally posted by Andrew Musau

Here is an example showing the equivalence in #1:
