Homework #3 - An unknown data set

This assignment is somewhat similar in spirit to the "unknowns" you may have had to solve in chemistry class. Below is a data set consisting of four variables: dependent variable Y, and candidate independent variables X1, X2 and X3. I don't have "names" for the variables because I simply made these data up.

Find the best model you can for the dependent variable Y. There is one model that is (hopefully) better than any other. To find it, you may need to transform variables and/or remove selected cases. Then again, you may not. There is no guarantee that the optimal model uses all three variables; it may use only two, or even just one.

NOTE: the goal is model specification, not prediction at any cost. If you have time, though, think about the model you would construct if prediction were the overriding goal... In what ways would it differ?


options linesize=72;
* there should be 40 cases in the data set below;
data unknown;
  * expandtabs lets list input read values separated by tabs or blanks;
  infile cards expandtabs;
  input Y X1 X2 X3;
  casenum = _N_; * this constructs a variable for tracking the case number;
  * do any needed transformations, case deletions here;
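  /* Illustrative syntax only, not a hint about the intended model.
     The names logY and X2sq are hypothetical. Whether any
     transformation or deletion is warranted is for you to decide
     from the diagnostics. */
  logY = log(Y);   /* natural log; every Y below is positive */
  X2sq = X2**2;    /* a squared term */
  /* To drop a case, remove the comment markers from the next line
     and replace 0 with the case number to delete:
     if casenum = 0 then delete; */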
 cards;
41.476532	-0.980000	-0.177000	0.517000
15.012185	-0.305000	-0.966000	-0.636000
34.095757	1.117000	-1.148000	0.473000
79.121658	0.011000	1.220000	-1.513000
77.338843	-1.506000	0.566000	-0.112000
63.534287	1.134000	0.175000	0.119000
44.031970	0.269000	0.207000	0.602000
45.967824	-0.381000	0.641000	-0.070000
46.853216	-0.470000	0.348000	1.625000
43.713493	-1.066000	-0.028000	-0.252000
67.552964	-1.248000	0.541000	0.010000
51.097503	1.399000	0.267000	-0.195000
134.468353	0.777000	1.838000	0.362000
25.126183	-0.749000	-1.745000	0.415000
40.356803	0.309000	0.002000	-0.344000
34.808482	-0.757000	-0.151000	-0.883000
30.278860	0.498000	-0.674000	-0.256000
55.885217	1.307000	0.076000	0.947000
47.026038	0.513000	0.676000	0.640000
168.355813	2.035000	1.642000	0.343000
45.198592	-0.480000	0.616000	-2.639000
67.589368	1.012000	0.886000	-2.495000
41.007865	1.169000	-0.853000	0.804000
88.158704	1.291000	1.208000	-1.034000
28.423509	0.670000	-0.834000	-0.120000
20.260507	0.308000	-0.525000	-1.530000
32.261649	0.649000	-0.064000	-0.402000
31.717682	0.055000	-0.078000	-1.797000
28.475147	1.094000	-0.660000	-0.872000
26.104881	0.622000	-1.387000	0.180000
153.581928	1.293000	1.787000	0.373000
29.931081	-0.196000	-0.513000	-2.054000
31.762139	-0.459000	-0.514000	-0.997000
39.063836	1.079000	-1.331000	-2.004000
82.418881	0.634000	1.118000	0.015000
22.347592	0.180000	-0.465000	-0.600000
5.898834	0.094000	-1.108000	-0.832000
125.763731	2.632000	0.337000	1.363000
33.162800	0.177000	0.175000	0.419000
53.716938	-0.366000	0.776000	-1.117000
;
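
/* A possible starting point, sketched under assumptions: list the
   data, look at the pairwise correlations, fit the full model with
   case diagnostics, and plot residuals against fitted values.
   The data set name diag and the variables yhat, resid, stud,
   cooksd and lev are hypothetical names chosen for this sketch.
   Nothing here is claimed to be the intended model. */

proc print data=unknown;   /* check that all 40 cases were read */
run;

proc corr data=unknown;    /* pairwise correlations */
  var Y X1 X2 X3;
run;

proc reg data=unknown;
  id casenum;              /* label cases in the diagnostic output */
  full: model Y = X1 X2 X3 / r influence vif;
  output out=diag p=yhat r=resid student=stud cookd=cooksd h=lev;
run;
quit;

proc plot data=diag;       /* residuals against fitted values */
  plot resid*yhat / vref=0;
run;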