What is SAS?
SAS stands for statistical analysis
system. SAS is a collection of modules that are used
to process and analyze data. SAS
is also an extremely powerful, general-purpose programming language.
is also an extremely powerful, general-purpose programming language.
Here we are going to see
examples from base SAS programming which are particularly solved problems from
the ‘SAS by example by Ron Cody’.
Getting data
file into SAS
Here is an example of
getting your data from text file to SAS dataset. Here default delimiter is a
blank space (‘ ‘).
The program:
libname myprac1 '/folders/myfolders/';
data myprac1.veg;
infile "/folders/myfolders/veggies.txt";
input Name $ Code $ Days Number Price;
CostPerSeed = Price / Number;
run;
proc print data=myprac1.veg;
run;
data myprac1.veg;
infile "/folders/myfolders/veggies.txt";
input Name $ Code $ Days Number Price;
CostPerSeed = Price / Number;
run;
proc print data=myprac1.veg;
run;
Output:
Obs
|
Name
|
Code
|
Days
|
Number
|
Price
|
CostPerSeed
|
1
|
Cucumber
|
50104-A
|
55
|
30
|
195
|
6.50000
|
2
|
Cucumber
|
51789-A
|
56
|
30
|
215
|
7.16667
|
3
|
Carrot
|
50179-A
|
68
|
1500
|
355
|
0.23667
|
4
|
Carrot
|
50872-A
|
65
|
1500
|
325
|
0.21667
|
5
|
Beetroot
|
57224-A
|
75
|
200
|
375
|
1.87500
|
6
|
Beetroot
|
62471-A
|
80
|
200
|
395
|
1.97500
|
7
|
Brinjal
|
57828-A
|
66
|
200
|
295
|
1.47500
|
8
|
Brinjal
|
52233-A
|
70
|
30
|
225
|
7.50000
|
Above code starts from
defining private library name with statement ‘libname’. Default library for SAS
datasets are ‘Work’ library. Proc step is used to perform various process on
your dataset. Here, we are printing dataset.
Conditional processing
SAS
evaluates the expression in an IF-THEN statement to produce a result that is
either non-zero, zero, or missing. A non-zero and nonmissing result causes the
expression to be true; a result of zero or missing causes the expression to be
false.
If the conditions that are specified in the IF clause
are met, the IF-THEN statement executes a SAS statement for observations that
are read from a SAS data set, for records in an external file, or for computed
values. An optional ELSE statement gives an alternative action if the THEN
clause is not executed. The ELSE statement, if used, must immediately follow
the IF-THEN statement.
Below
are some problems from Chapter 7 of Cody`s book given with their solution code
and output.
NOTE: Problem statements are given in each code snippet in first comment section. Various comments have been incorporated in order to improve the readability. Learning explanations are given below every code snippet.
Code: ch7_1.sas
Please note that problem statements are given in code snippet in comment section as below.
/* Problem 1 Run the program here to create a temporary SAS data set called School: data school; input Age Quiz : $1. Midterm Final; /* Add you statements here datalines; 12 A 92 95 12 B 88 88 13 C 78 75 13 A 92 93 12 F 55 62 13 B 88 82 ;*//* Using IF and ELSE IF statements, compute two new variables as follows: Grade (numeric), with a value of 6 if Age is 12 and a value of 8 if Age is 13. The quiz grades have numerical equivalents as follows: A = 95, B = 85, C = 75, D = 70, and F = 65. Using this information, compute a course grade (Course) as a weighted average of the Quiz (20%), Midterm (30%) and Final (50%).;*/ libname a15022 '/folders/myfolders/my_prac'; data a15022.school; input Age Quiz : $1. Midterm Final; /* Add you statements here */ if Age eq 12 then Grade=6; else if Age eq 13 then Grade=8; if Quiz eq 'A' then Course_Grade=(95*20+Midterm*30+Final*50)/100; if Quiz eq 'B' then Course_Grade=(85*20+Midterm*30+Final*50)/100; if Quiz eq 'C' then Course_Grade=(75*20+Midterm*30+Final*50)/100; if Quiz eq 'F' then Course_Grade=(65*20+Midterm*30+Final*50)/100; datalines; 12 A 92 95 12 B 88 88 13 C 78 75 13 A 92 93 12 F 55 62 13 B 88 82 ; proc print data=a15022.school; run;
Learning: Use
of conditional statements to assign grades based on scores.
Output:
Obs
|
Age
|
Quiz
|
Midterm
|
Final
|
Grade
|
Course_Grade
|
1
|
12
|
A
|
92
|
95
|
6
|
94.1
|
2
|
12
|
B
|
88
|
88
|
6
|
87.4
|
3
|
13
|
C
|
78
|
75
|
8
|
75.9
|
4
|
13
|
A
|
92
|
93
|
8
|
93.1
|
5
|
12
|
F
|
55
|
62
|
6
|
60.5
|
6
|
13
|
B
|
88
|
82
|
8
|
84.4
|
Code:
ch7_2.sas
/* 2. Using the SAS data set Hosp, use PROC PRINT to list observations for Subject values of 5, 100, 150, and 200. Do this twice, once using OR operators and once using the IN operator. Note: Subject is a numeric variable. */ libname a15022 '/folders/myfolders/my_prac'; **Creating Database; data a15022.hosp; do j = 1 to 1000; AdmitDate = int(ranuni(1234)*1200 + 15500); quarter = intck('qtr','01jan2002'd,AdmitDate); do i = 1 to quarter; if ranuni(0) lt .1 and weekday(AdmitDate) eq 1 then AdmitDate = AdmitDate + 1; if ranuni(0) lt .1 and weekday(AdmitDate) eq 7 then AdmitDate = AdmitDate - int(3*ranuni(0) + 1); DOB = int(25000*Ranuni(0) + '01jan1920'd); DischrDate = AdmitDate + abs(10*rannor(0) + 1); Subject + 1; output; end; end; drop i j; format AdmitDate DOB DischrDate mmddyy10.; run; proc print data=a15022.hosp; where subject eq 5 or subject eq 100 or subject eq 150 or subject eq 200; run; proc print data=a15022.hosp; where subject in (5,100,150,200); run;
Learning: PROC
PRINT with or and in operator. Multiple ‘or’ can be replaced with ‘in’
statement.
Output:
Obs
|
AdmitDate
|
quarter
|
DOB
|
DischrDate
|
Subject
|
5
|
08/03/2003
|
6
|
06/28/1928
|
08/12/2003
|
5
|
100
|
11/05/2004
|
11
|
09/10/1980
|
11/15/2004
|
100
|
150
|
03/31/2005
|
12
|
02/07/1923
|
04/11/2005
|
150
|
200
|
07/02/2004
|
10
|
09/25/1931
|
07/10/2004
|
200
|
Obs
|
AdmitDate
|
quarter
|
DOB
|
DischrDate
|
Subject
|
5
|
08/03/2003
|
6
|
06/28/1928
|
08/12/2003
|
5
|
100
|
11/05/2004
|
11
|
09/10/1980
|
11/15/2004
|
100
|
150
|
03/31/2005
|
12
|
02/07/1923
|
04/11/2005
|
150
|
200
|
07/02/2004
|
10
|
09/25/1931
|
07/10/2004
|
200
|
Code: Ch7_7.sas
/*Using the Sales data set, list all the observations where Region is North and Quantity is less than 60. Include in this list any observations where the customer name (Customer) is Pet's are Us*/ libname a15022 '/folders/myfolders/my_prac'; *Data set SALES; data a15022.sales; input EmpID : $4. Name & $15. Region : $5. Customer & $18. Date : mmddyy10. Item : $8. Quantity : 5. UnitCost : dollar9.; TotalSales = Quantity * UnitCost; /* format date mmddyy10. UnitCost TotalSales dollar9.;*/ drop Date; datalines; 1843 George Smith North Barco Corporation 10/10/2006 144L 50 $8.99 1843 George Smith South Cost Cutter's 10/11/2006 122 100 $5.99 1843 George Smith North Minimart Inc. 10/11/2006 188S 3 $5,199 1843 George Smith North Barco Corporation 10/15/2006 908X 1 $5,129 1843 George Smith South Ely Corp. 10/15/2006 122L 10 $29.95 0177 Glenda Johnson East Food Unlimited 9/1/2006 188X 100 $6.99 0177 Glenda Johnson East Shop and Drop 9/2/2006 144L 100 $8.99 1843 George Smith South Cost Cutter's 10/18/2006 855W 1 $9,109 9888 Sharon Lu West Cost Cutter's 11/14/2006 122 50 $5.99 9888 Sharon Lu West Pet's are Us 11/15/2006 100W 1000 $1.99 0017 Jason Nguyen East Roger's Spirits 11/15/2006 122L 500 $39.99 0017 Jason Nguyen South Spirited Spirits 12/22/2006 407XX 100 $19.95 0177 Glenda Johnson North Minimart Inc. 12/21/2006 777 5 $10.500 0177 Glenda Johnson East Barco Corporation 12/20/2006 733 2 $10,000 1843 George Smith North Minimart Inc. 11/19/2006 188S 3 $5,199 ; proc print data=a15022.sales; run; proc print data=a15022.sales; where Region eq 'North' and Quantity lt 60 and Customer like "%Pet's are Us%"; run;
Learning: ‘where’
condition is used to apply
constraints on dataset in PROC PRINT statement.
Conditional operator used here are
eq-> Equal to, lt-> Less than.
Output:
Obs
|
EmpID
|
Name
|
Region
|
put
|
Item
|
Quantity
|
UnitCost
|
TotalSales
|
1
|
1843
|
George Smith
|
North
|
Barco Corporation
|
144L
|
50
|
8.99
|
449.5
|
2
|
1843
|
George Smith
|
South
|
Cost Cutter's
|
122
|
100
|
5.99
|
599.0
|
3
|
1843
|
George Smith
|
North
|
Minimart Inc.
|
188S
|
3
|
5199.00
|
15597.0
|
4
|
1843
|
George Smith
|
North
|
Barco Corporation
|
908X
|
1
|
5129.00
|
5129.0
|
5
|
1843
|
George Smith
|
South
|
Ely Corp.
|
122L
|
10
|
29.95
|
299.5
|
6
|
0177
|
Glenda Johnson
|
East
|
Food Unlimited
|
188X
|
100
|
6.99
|
699.0
|
7
|
0177
|
Glenda Johnson
|
East
|
Shop and Drop
|
144L
|
100
|
8.99
|
899.0
|
8
|
1843
|
George Smith
|
South
|
Cost Cutter's
|
855W
|
1
|
9109.00
|
9109.0
|
9
|
9888
|
Sharon Lu
|
West
|
Cost Cutter's
|
122
|
50
|
5.99
|
299.5
|
10
|
9888
|
Sharon Lu
|
West
|
Pet's are Us
|
100W
|
1000
|
1.99
|
1990.0
|
11
|
0017
|
Jason Nguyen
|
East
|
Roger's Spirits
|
122L
|
500
|
39.99
|
19995.0
|
12
|
0017
|
Jason Nguyen
|
South
|
Spirited Spirits
|
407XX
|
100
|
19.95
|
1995.0
|
13
|
0177
|
Glenda Johnson
|
North
|
Minimart Inc.
|
777
|
5
|
10.50
|
52.5
|
14
|
0177
|
Glenda Johnson
|
East
|
Barco Corporation
|
733
|
2
|
10000.00
|
20000.0
|
15
|
1843
|
George Smith
|
North
|
Minimart Inc.
|
188S
|
3
|
5199.00
|
15597.0
|
Performing
iterative processing
Looping
is very important programming structure, especially when you have to run
particular type of instructions multiple times.
Code: Ch8_1.sas
/*Run the program here to create a temporary SAS data set called Vitals Using this data set, create a new data set (NewVitals) with the following new variables: For subjects less than 50 years of age: If Pulse is less than 70, set PulseGroup equal to Low; otherwise, set PulseGroup equal to High. If SBP is less than 130, set SBPGroup equal to Low; otherwise, set SBPGroup equal to High. For subjects greater than or equal to 50 years of age: If Pulse is less than 74, set PulseGroup equal to Low; otherwise, set PulseGroup equal to High. If SBP is less than 140, set SBPGroup equal to Low; otherwise, set SBPGroup equal to High. You may assume there are no missing values for Pulse or SBP*/ libname a15022 '/folders/myfolders/my_prac'; data a15022.vitals; input ID : $3. Age Pulse SBP DBP; label SBP = "Systolic Blood Pressure" DBP = "Diastolic Blood Pressure"; datalines; 001 23 68 120 80 002 55 72 188 96 003 78 82 200 100 004 18 58 110 70 005 43 52 120 82 006 37 74 150 98 007 . 82 140 100 ; data a15022.NewVitals; set a15022.vitals; if age lt 50 then do; if Pulse lt 70 then PulseGroup='Low'; else PulseGroup='High'; if SBP lt 130 then SBPGroup='Low'; else SBPGroup='High'; end; if age ge 50 then do; if Pulse lt 74 then PulseGroup='Low'; else PulseGroup='High'; if SBP lt 140 then SBPGroup='Low'; else SBPGroup='High'; end; run; proc print data=a15022.NewVitals; run;
Learning: ‘do’
in a if statement enables us to perform number of operations within that if
condition.
Output:
Obs
|
ID
|
Age
|
Pulse
|
SBP
|
DBP
|
PulseGroup
|
SBPGroup
|
1
|
001
|
23
|
68
|
120
|
80
|
Low
|
Low
|
2
|
002
|
55
|
72
|
188
|
96
|
Low
|
Hig
|
3
|
003
|
78
|
82
|
200
|
100
|
Hig
|
Hig
|
4
|
004
|
18
|
58
|
110
|
70
|
Low
|
Low
|
5
|
005
|
43
|
52
|
120
|
82
|
Low
|
Low
|
6
|
006
|
37
|
74
|
150
|
98
|
Hig
|
Hig
|
7
|
007
|
.
|
82
|
140
|
100
|
Hig
|
Hig
|
Code: Ch8_2.sas
/* Run the program here to create a temporary SAS data set (MonthSales): data monthsales; input month sales @@; /* add your line(s) here */ /*datalines; 1 4000 2 5000 3 . 4 5500 5 5000 6 6000 7 6500 8 4500 9 5100 10 5700 11 6500 12 7500 ; Modify this program so that a new variable, SumSales, representing Sales to date, is added to the data set. Be sure that the missing value for Sales in month 3 does not result in a missing value for SumSales. */ libname a15022 '/folders/myfolders/my_prac'; data a15022.monthsales; input month sales @@; /* add your line(s) here */ SumSales+sales; datalines; 1 4000 2 5000 3 . 4 5500 5 5000 6 6000 7 6500 8 4500 9 5100 10 5700 11 6500 12 7500 ; proc print data=a15022.monthsales; run;
Learning: Sum
function is another way of writing x=x+1. In above problem, we are
creating new column SumSales as SumSales=Sumsales+sales.
Output:
Obs
|
month
|
sales
|
SumSales
|
1
|
1
|
4000
|
4000
|
2
|
2
|
5000
|
9000
|
3
|
3
|
.
|
9000
|
4
|
4
|
5500
|
14500
|
5
|
5
|
5000
|
19500
|
6
|
6
|
6000
|
25500
|
7
|
7
|
6500
|
32000
|
8
|
8
|
4500
|
36500
|
9
|
9
|
5100
|
41600
|
10
|
10
|
5700
|
47300
|
11
|
11
|
6500
|
53800
|
12
|
12
|
7500
|
61300
|
Code:
ch8_4.sas
/* Count the number of missing values for the variables A, B, and C in the Missing data set. Add the cumulative number of missing values to each observation (use variable names MissA, MissB, and MissC). Use the MISSING function to test for the missing values. */ *Setting up private library; libname a15022 '/folders/myfolders/my_prac'; * data step ; data a15022.mss; *reading from missing.txt file; infile '/folders/myfolders/Data_Sources/missing.txt'; input A B C; * conditional statement with 'do'; if missing(A) then do; MissA+1; **Using sum statement; end; if missing(B) then do; MissB+1; end; if missing(C) then do; MissC+1; end; run; proc print data=a15022.mss; run;
Learning: Above example illustrates use of sum
function for calculating number of missing values.
Output given below is just upto 10
observations.
Output:
Obs
|
A
|
B
|
C
|
MissA
|
MissB
|
MissC
|
1
|
1
|
2
|
3
|
0
|
0
|
0
|
2
|
4
|
5
|
.
|
0
|
0
|
1
|
3
|
6
|
7
|
8
|
0
|
0
|
1
|
4
|
9
|
10
|
11
|
0
|
0
|
1
|
Sub setting
and combining datasets
SAS provides
various method for sub setting as well as combining different datasets.
Code: Ch10_1.sas
/*1.
Using the SAS data set Blood, create two temporary SAS data sets called
Subset_A
and Subset_B. Include in both of these data sets a variable called Combined equal to
.001 times WBC plus RBC. Subset_A should consist of observations from Blood
where Gender is equal to Female and BloodType is equal to AB. Subset_B should
consist of all observations from Blood where Gender is equal to Female, BloodType
is equal to AB, and Combined is greater than or equal to 14. */
* Creating dataset;
libname a15022'/folders/myfolders/my_prac';
data a15022.blood;
infile '/folders/myfolders/Data_Sources/blood.txt' truncover;
length Gender $ 6 BloodType $ 2 AgeGroup $ 5;
input Subject
Gender
BloodType
AgeGroup
WBC
RBC
Chol;
label Gender = "Gender"
BloodType = "Blood Type"
AgeGroup = "Age Group"
Chol = "Cholesterol";
run;
proc print data=a15022.blood;
data a15022.subset_A;
set a15022.blood;
Combine=(WBC+RBC)*0.001;
where Gender='Female' and BloodType='AB';
run;
proc print data=a15022.subset_A;
run;
data a15022.subset_B;
set a15022.blood;
Combine=(WBC+RBC)*0.001;
where Gender='Female' and BloodType='AB' and ((WBC+RBC)*0.001) >= 14;
run;
proc print data=a15022.subset_B;
run;
and Subset_B. Include in both of these data sets a variable called Combined equal to
.001 times WBC plus RBC. Subset_A should consist of observations from Blood
where Gender is equal to Female and BloodType is equal to AB. Subset_B should
consist of all observations from Blood where Gender is equal to Female, BloodType
is equal to AB, and Combined is greater than or equal to 14. */
* Creating dataset;
libname a15022'/folders/myfolders/my_prac';
data a15022.blood;
infile '/folders/myfolders/Data_Sources/blood.txt' truncover;
length Gender $ 6 BloodType $ 2 AgeGroup $ 5;
input Subject
Gender
BloodType
AgeGroup
WBC
RBC
Chol;
label Gender = "Gender"
BloodType = "Blood Type"
AgeGroup = "Age Group"
Chol = "Cholesterol";
run;
proc print data=a15022.blood;
data a15022.subset_A;
set a15022.blood;
Combine=(WBC+RBC)*0.001;
where Gender='Female' and BloodType='AB';
run;
proc print data=a15022.subset_A;
run;
data a15022.subset_B;
set a15022.blood;
Combine=(WBC+RBC)*0.001;
where Gender='Female' and BloodType='AB' and ((WBC+RBC)*0.001) >= 14;
run;
proc print data=a15022.subset_B;
run;
Learning: In above
problem, subset A and subset B are being derived based on condition given in ‘where’
statement.
Output:
Obs
|
Gender
|
BloodType
|
AgeGroup
|
Subject
|
WBC
|
RBC
|
Chol
|
Combine
|
1
|
Female
|
AB
|
Young
|
1
|
7710
|
7.40
|
258
|
7.71740
|
2
|
Female
|
AB
|
Old
|
25
|
7480
|
4.70
|
183
|
7.48470
|
3
|
Female
|
AB
|
Young
|
78
|
7410
|
5.82
|
175
|
7.41582
|
4
|
Female
|
AB
|
Young
|
79
|
.
|
4.61
|
69
|
.
|
5
|
Female
|
AB
|
Young
|
101
|
7610
|
4.60
|
162
|
7.61460
|
6
|
Female
|
AB
|
Old
|
142
|
9380
|
5.82
|
218
|
9.38582
|
7
|
Female
|
AB
|
Young
|
180
|
6220
|
5.58
|
.
|
6.22558
|
8
|
Female
|
AB
|
Old
|
199
|
6810
|
5.54
|
.
|
6.81554
|
9
|
Female
|
AB
|
Old
|
255
|
8080
|
5.45
|
144
|
8.08545
|
10
|
Female
|
AB
|
Young
|
260
|
7680
|
.
|
127
|
.
|
11
|
Female
|
AB
|
Young
|
288
|
6810
|
8.26
|
138
|
6.81826
|
12
|
Female
|
AB
|
Young
|
363
|
7950
|
5.24
|
.
|
7.95524
|
13
|
Female
|
AB
|
Old
|
366
|
7350
|
4.72
|
135
|
7.35472
|
14
|
Female
|
AB
|
Old
|
414
|
.
|
5.98
|
.
|
.
|
15
|
Female
|
AB
|
Old
|
449
|
7480
|
3.37
|
186
|
7.48337
|
16
|
Female
|
AB
|
Young
|
459
|
6620
|
6.19
|
184
|
6.62619
|
17
|
Female
|
AB
|
Old
|
528
|
6310
|
6.30
|
.
|
6.31630
|
18
|
Female
|
AB
|
Old
|
544
|
6360
|
4.88
|
.
|
6.36488
|
19
|
Female
|
AB
|
Old
|
652
|
9110
|
5.41
|
.
|
9.11541
|
20
|
Female
|
AB
|
Old
|
802
|
7200
|
3.37
|
185
|
7.20337
|
Code: Ch10_2.sas
/*
2. Using the a15022S data set Hosp, create a temporary a15022S data set called Monday2002,
consisting of observations from Hosp where the admission date (AdmitDate) falls on
a Monday and the year is 2002. Include in this new data set a variable called Age,
computed as the person’s age as of the admission date, rounded to the nearest year.
*/
*Setting up library;
libname a15022'/folders/myfolders/my_prac';
*create dataset Monday2002;
data a15022.Monday2002;
set a15022.hosp;
Age= round(yrdif(DOB,AdmitDate),1); *Age calculation using yrdif function;
where weekday(AdmitDate)=2 and year(AdmitDate)=2002; *Condition for subsetting;
run;
proc print data=a15022.Monday2002;
run;
*Data set HOSP;
data a15022.hosp;
do j = 1 to 1000;
AdmitDate = int(ranuni(1234)*1200 + 15500);
quarter = intck('qtr','01jan2002'd,AdmitDate);
do i = 1 to quarter;
if ranuni(0) lt .1 and weekday(AdmitDate) eq 1 then
AdmitDate = AdmitDate + 1;
if ranuni(0) lt .1 and weekday(AdmitDate) eq 7 then
AdmitDate = AdmitDate - int(3*ranuni(0) + 1);
DOB = int(25000*Ranuni(0) + '01jan1920'd);
DischrDate = AdmitDate + abs(10*rannor(0) + 1);
Subject + 1;
output;
end;
end;
drop i j;
format AdmitDate DOB DischrDate mmddyy10.;
run;
proc print data=a15022.hosp;
run;
2. Using the a15022S data set Hosp, create a temporary a15022S data set called Monday2002,
consisting of observations from Hosp where the admission date (AdmitDate) falls on
a Monday and the year is 2002. Include in this new data set a variable called Age,
computed as the person’s age as of the admission date, rounded to the nearest year.
*/
*Setting up library;
libname a15022'/folders/myfolders/my_prac';
*create dataset Monday2002;
data a15022.Monday2002;
set a15022.hosp;
Age= round(yrdif(DOB,AdmitDate),1); *Age calculation using yrdif function;
where weekday(AdmitDate)=2 and year(AdmitDate)=2002; *Condition for subsetting;
run;
proc print data=a15022.Monday2002;
run;
*Data set HOSP;
data a15022.hosp;
do j = 1 to 1000;
AdmitDate = int(ranuni(1234)*1200 + 15500);
quarter = intck('qtr','01jan2002'd,AdmitDate);
do i = 1 to quarter;
if ranuni(0) lt .1 and weekday(AdmitDate) eq 1 then
AdmitDate = AdmitDate + 1;
if ranuni(0) lt .1 and weekday(AdmitDate) eq 7 then
AdmitDate = AdmitDate - int(3*ranuni(0) + 1);
DOB = int(25000*Ranuni(0) + '01jan1920'd);
DischrDate = AdmitDate + abs(10*rannor(0) + 1);
Subject + 1;
output;
end;
end;
drop i j;
format AdmitDate DOB DischrDate mmddyy10.;
run;
proc print data=a15022.hosp;
run;
Learning: ‘where’
condition in data step enables us to derive a subset from given dataset.
Output:
Obs
|
AdmitDate
|
quarter
|
DOB
|
DischrDate
|
Subject
|
Age
|
1
|
11/11/2002
|
3
|
10/01/1928
|
11/18/2002
|
48
|
74
|
2
|
11/11/2002
|
3
|
05/08/1940
|
11/19/2002
|
49
|
63
|
3
|
11/11/2002
|
3
|
09/26/1942
|
11/15/2002
|
50
|
60
|
4
|
07/22/2002
|
2
|
10/14/1986
|
07/25/2002
|
69
|
16
|
5
|
07/22/2002
|
2
|
09/05/1981
|
07/23/2002
|
70
|
21
|
6
|
10/14/2002
|
3
|
01/28/1961
|
10/20/2002
|
289
|
42
|
7
|
10/14/2002
|
3
|
04/25/1922
|
10/25/2002
|
290
|
80
|
8
|
10/14/2002
|
3
|
11/06/1948
|
10/29/2002
|
291
|
54
|
9
|
12/23/2002
|
3
|
09/07/1949
|
01/02/2003
|
303
|
53
|
10
|
12/23/2002
|
3
|
10/31/1927
|
12/26/2002
|
304
|
75
|
Working
with numeric functions
SAS functions
are an essential tool in DATA step programming. They perform such tasks
as rounding numbers, computing dates from month-day-year values, summing and
averaging the values of SAS variables, and hundreds of other tasks.
as rounding numbers, computing dates from month-day-year values, summing and
averaging the values of SAS variables, and hundreds of other tasks.
Code: ch11_2.sas
/* Problem
2. Count the number of missing values for WBC, RBC, and Chol in the Blood data set.
Use the MISSING function to detect missing values.
*/
libname a15022 '/folders/myfolders/my_prac';
*Data set BLOOD;
data a15022.blood;
infile '/folders/myfolders/Data_Sources/blood.txt' truncover;
length Gender $ 6 BloodType $ 2 AgeGroup $ 5;
input Subject
Gender
BloodType
AgeGroup
WBC
RBC
Chol;
label Gender = "Gender"
BloodType = "Blood Type"
AgeGroup = "Age Group"
Chol = "Cholesterol";
run;
data a15022.miss;
set a15022.blood;
if missing(Chol) then MissChol + 1;
if missing(WBC) then MissWBC + 1;
if missing(RBC) then MissRBC + 1;
drop Subject Gender BloodType AgeGroup WBC RBC Chol;
run;
proc print data=a15022.miss;
run;
2. Count the number of missing values for WBC, RBC, and Chol in the Blood data set.
Use the MISSING function to detect missing values.
*/
libname a15022 '/folders/myfolders/my_prac';
*Data set BLOOD;
data a15022.blood;
infile '/folders/myfolders/Data_Sources/blood.txt' truncover;
length Gender $ 6 BloodType $ 2 AgeGroup $ 5;
input Subject
Gender
BloodType
AgeGroup
WBC
RBC
Chol;
label Gender = "Gender"
BloodType = "Blood Type"
AgeGroup = "Age Group"
Chol = "Cholesterol";
run;
data a15022.miss;
set a15022.blood;
if missing(Chol) then MissChol + 1;
if missing(WBC) then MissWBC + 1;
if missing(RBC) then MissRBC + 1;
drop Subject Gender BloodType AgeGroup WBC RBC Chol;
run;
proc print data=a15022.miss;
run;
Learning:
‘missing’
function lets you detect missing value in dataset.
Output:
Obs
|
MissChol
|
MissWBC
|
MissRBC
|
1
|
0
|
0
|
0
|
2
|
1
|
0
|
0
|
3
|
1
|
0
|
0
|
4
|
2
|
0
|
0
|
5
|
2
|
1
|
0
|
6
|
2
|
1
|
0
|
7
|
2
|
1
|
0
|
8
|
2
|
1
|
0
|
9
|
2
|
2
|
0
|
10
|
3
|
2
|
0
|
Working
with Character function
These
functions enable you to search for character strings, take strings apart and
put them back
together again, and remove selected characters from a string.
together again, and remove selected characters from a string.
Code: ch12_2.sas
/*
2. Using the data set Mixed, create a temporary SAS data set (also called Mixed) with
the following new variables:
a. NameLow – Name in lowercase
b. NameProp – Name in proper case
c.(Bonus – difficult) NameHard – Name in proper case without using the
PROPCASE function
*/
libname a15022 '/folders/myfolders/my_prac';
*Data set MIXED;
data a15022.mixed;
input Name & $20. ID;
datalines;
Daniel Fields 123
Patrice Helms 233
Thomas chien 998
;
data a15022.mixed;
set a15022.mixed;
Namelow=lowcase(Name); * fun lowcase to covert it into lower case;
Nameprop=propcase(Name); * fun uppercase to convert it into upper case;
* taking substring of first character in every word and making it upcase;
* then, concatenating all substrings to form full name with proper case;
fchar=upcase(substr(Name,1,1));
fname=cats(fchar,substr(Name,2,index(Name,' ')-1));
lchar=upcase(substr(Name,index(Name,' ')+1,1));
lname= cats(lchar,substr(Name,index(Name,' ')+2,length(Name)));
Namehard=catx(' ',fname,lname);
drop fname lname fchar lchar;
run;
proc print data=a15022.mixed;
run;
2. Using the data set Mixed, create a temporary SAS data set (also called Mixed) with
the following new variables:
a. NameLow – Name in lowercase
b. NameProp – Name in proper case
c.(Bonus – difficult) NameHard – Name in proper case without using the
PROPCASE function
*/
libname a15022 '/folders/myfolders/my_prac';
*Data set MIXED;
data a15022.mixed;
input Name & $20. ID;
datalines;
Daniel Fields 123
Patrice Helms 233
Thomas chien 998
;
data a15022.mixed;
set a15022.mixed;
Namelow=lowcase(Name); * fun lowcase to covert it into lower case;
Nameprop=propcase(Name); * fun uppercase to convert it into upper case;
* taking substring of first character in every word and making it upcase;
* then, concatenating all substrings to form full name with proper case;
fchar=upcase(substr(Name,1,1));
fname=cats(fchar,substr(Name,2,index(Name,' ')-1));
lchar=upcase(substr(Name,index(Name,' ')+1,1));
lname= cats(lchar,substr(Name,index(Name,' ')+2,length(Name)));
Namehard=catx(' ',fname,lname);
drop fname lname fchar lchar;
run;
proc print data=a15022.mixed;
run;
Learning: In above
code, up case function has been used to make observation uppercase. Another
functions such as cats-for concatenation, substr- for taking out substring,
catx- concatenating with space in between has been used to make observation in
Propercase which can also be done with in-built function ‘Propcase’.
Output :
Obs
|
Name
|
ID
|
Namelow
|
Nameprop
|
Namehard
|
1
|
Daniel Fields
|
123
|
daniel fields
|
Daniel Fields
|
Daniel Fields
|
2
|
Patrice Helms
|
233
|
patrice helms
|
Patrice Helms
|
Patrice Helms
|
3
|
Thomas chien
|
998
|
thomas chien
|
Thomas Chien
|
Thomas Chien
|
Arrays
Typically
arrays are used to perform a similar operation on a group of variables.
Code: ch13_2.sas
/*
. Redo Problem 1, except use data set Survey2.
Note: Ques1–Ques5 are numeric variables.*/
*Data set SURVEY2;
data a15022.survey2;
input ID
(Q1-Q5)(1.);
datalines;
535 13542
012 55443
723 21211
007 35142
;
* making dataset named survey1;
data a15022.survey1;
set a15022.survey;
* Creating a temporary array;
array temp{5} _temporary_;
*creating array for survey1;
array Q{5} Ques1-Ques5;
do i=1 to 5;
temp{6-i}=Q{i};
end;
do i=1 to 5;
Q{i}=temp{i};
end;
drop i;
proc print data=a15022.survey1;
run;
. Redo Problem 1, except use data set Survey2.
Note: Ques1–Ques5 are numeric variables.*/
*Data set SURVEY2;
data a15022.survey2;
input ID
(Q1-Q5)(1.);
datalines;
535 13542
012 55443
723 21211
007 35142
;
* making dataset named survey1;
data a15022.survey1;
set a15022.survey;
* Creating a temporary array;
array temp{5} _temporary_;
*creating array for survey1;
array Q{5} Ques1-Ques5;
do i=1 to 5;
temp{6-i}=Q{i};
end;
do i=1 to 5;
Q{i}=temp{i};
end;
drop i;
proc print data=a15022.survey1;
run;
Learning: In above
example, arrays have been used to perform operation on Question column of
Survey2 dataset. Here, we are reversing values of column Ques1 to Ques5 in
resulting survey1 table. _temporary_ lets us create a temporary array.
Output:
Obs
|
ID
|
Gender
|
Age
|
Salary
|
Ques1
|
Ques2
|
Ques3
|
Ques4
|
Ques5
|
1
|
1
|
M
|
23
|
28000
|
3
|
2
|
1
|
2
|
1
|
2
|
2
|
F
|
55
|
76123
|
1
|
1
|
2
|
5
|
4
|
3
|
3
|
M
|
38
|
36500
|
1
|
2
|
2
|
2
|
2
|
4
|
4
|
F
|
67
|
128000
|
4
|
2
|
2
|
3
|
5
|
5
|
5
|
M
|
22
|
23060
|
2
|
4
|
3
|
3
|
3
|
6
|
6
|
M
|
63
|
90000
|
3
|
4
|
5
|
3
|
2
|
7
|
7
|
F
|
45
|
76100
|
3
|
3
|
4
|
3
|
5
|
Displaying
your data
Code: Ch14_2.sas
/*
Use PROC PRINT (without any DATA steps) to create a listing like the one here.
Note: The variables in the Hosp data set are Subject, AdmitDate (Admission Date),
DischrDate (Discharge Date), and DOB (Date of Birth).
Selected Patients from HOSP Data Set
Admitted in September of 2004
Older than 83 years of age
--------------------------------------
Date of Admission Discharge
Subject Birth Date Date
401 03/21/1921 09/13/2004 09/22/2004
407 08/26/1920 09/13/2004 09/18/2004
409 01/01/1921 09/13/2004 10/02/2004
2577 04/30/1920 09/27/2004 09/27/2004
6889 10/26/1920 09/17/2004 09/22/2004
7495 02/11/1921 09/21/2004 09/22/2004
Number of Patients = 6
*/
libname a15022 '/folders/myfolders/my_prac';
title "Hosp Dataset";
title2 "Patients admitted in sept 2004 and age >83";
title3 "Older than 83 years of age";
proc print data=a15022.hosp
n='Number of Patients = '
label
double;
where Year(AdmitDate) eq 2004 and
Month(AdmitDate) eq 9 and
yrdif(DOB,AdmitDate,'Actual') ge 83;
id Subject;
var DOB AdmitDate DischrDate;
label AdmitDate = 'Admission Date'
DischrDate = 'Discharge Date'
DOB = 'Date of Birth';
run;
Use PROC PRINT (without any DATA steps) to create a listing like the one here.
Note: The variables in the Hosp data set are Subject, AdmitDate (Admission Date),
DischrDate (Discharge Date), and DOB (Date of Birth).
Selected Patients from HOSP Data Set
Admitted in September of 2004
Older than 83 years of age
--------------------------------------
Date of Admission Discharge
Subject Birth Date Date
401 03/21/1921 09/13/2004 09/22/2004
407 08/26/1920 09/13/2004 09/18/2004
409 01/01/1921 09/13/2004 10/02/2004
2577 04/30/1920 09/27/2004 09/27/2004
6889 10/26/1920 09/17/2004 09/22/2004
7495 02/11/1921 09/21/2004 09/22/2004
Number of Patients = 6
*/
libname a15022 '/folders/myfolders/my_prac';
title "Hosp Dataset";
title2 "Patients admitted in sept 2004 and age >83";
title3 "Older than 83 years of age";
proc print data=a15022.hosp
n='Number of Patients = '
label
double;
where Year(AdmitDate) eq 2004 and
Month(AdmitDate) eq 9 and
yrdif(DOB,AdmitDate,'Actual') ge 83;
id Subject;
var DOB AdmitDate DischrDate;
label AdmitDate = 'Admission Date'
DischrDate = 'Discharge Date'
DOB = 'Date of Birth';
run;
Learning: Label
statement lets you present data well. In given example data from Hosp dataset
is filtered and presented with label names.
Output:
Hosp Dataset
Patients admitted in
sept 2004 and age >83
Older than 83 years of age
Subject
|
Date of Birth
|
Admission Date
|
Discharge Date
|
401
|
03/21/1921
|
09/13/2004
|
09/22/2004
|
407
|
08/26/1920
|
09/13/2004
|
09/18/2004
|
409
|
01/01/1921
|
09/13/2004
|
10/02/2004
|
2577
|
04/30/1920
|
09/27/2004
|
09/27/2004
|
6889
|
10/26/1920
|
09/17/2004
|
09/22/2004
|
7495
|
02/11/1921
|
09/21/2004
|
09/22/2004
|
Number of Patients =
6
|
Creating
customized reports
Although
you can customize the output produced by PROC PRINT, there are times when
you need a bit more control over the appearance of your report. PROC REPORT was
developed to fit this need.
you need a bit more control over the appearance of your report. PROC REPORT was
developed to fit this need.
Code: Ch15_1.sas
/*
Using the SAS data set BloodPressure, produce a report showing Gender and Age,
along with a new variable called AgeGroup. Values of AgeGroup are <= 50 or
> 50 depending on the value of Age. Label this variable “Age Group.” Your report
should look like this:
Patient Age Groups
Age
Gender Age Group
M 23 <= 50
F 68 > 50
M 55 > 50
F 28 <= 50
M 35 <= 50
M 45 <= 50
F 48 <= 50
F 78 > 50
316 Learning SAS by Example: A Programmer’s Guide
6. Using the SAS data set BloodPressure, produce a report showing Gender, Age, SBP,
and DBP. Order the report in Gender and Age order as shown here:
Subject's in Gender and Age Order
Systolic Diastolic
Blood Blood
Gender Age Pressure Pressure
F 28 120 70
48 138 88
68 110
*/
libname a15022 '/folders/myfolders/my_prac';
*Data set BLOODPRESSURE;
data a15022.bloodpressure;
input Gender : $1.
Age
SBP
DBP;
datalines;
M 23 144 90
F 68 110 62
M 55 130 80
F 28 120 70
M 35 142 82
M 45 150 96
F 48 138 88
F 78 132 76
;
title "Patient Age Groups";
proc report data=a15022.bloodpressure nowd;
column Gender Age AgeGroup;
define Gender / width=6;
define Age / display width=5;
define AgeGroup / computed "Age Group";
compute AgeGroup / character length=5;
if Age gt 50 then AgeGroup = '> 50';
else if not missing(Age) then AgeGroup = '<= 50';
endcomp;
run;
Using the SAS data set BloodPressure, produce a report showing Gender and Age,
along with a new variable called AgeGroup. Values of AgeGroup are <= 50 or
> 50 depending on the value of Age. Label this variable “Age Group.” Your report
should look like this:
Patient Age Groups
Age
Gender Age Group
M 23 <= 50
F 68 > 50
M 55 > 50
F 28 <= 50
M 35 <= 50
M 45 <= 50
F 48 <= 50
F 78 > 50
316 Learning SAS by Example: A Programmer’s Guide
6. Using the SAS data set BloodPressure, produce a report showing Gender, Age, SBP,
and DBP. Order the report in Gender and Age order as shown here:
Subject's in Gender and Age Order
Systolic Diastolic
Blood Blood
Gender Age Pressure Pressure
F 28 120 70
48 138 88
68 110
*/
libname a15022 '/folders/myfolders/my_prac';
*Data set BLOODPRESSURE;
data a15022.bloodpressure;
input Gender : $1.
Age
SBP
DBP;
datalines;
M 23 144 90
F 68 110 62
M 55 130 80
F 28 120 70
M 35 142 82
M 45 150 96
F 48 138 88
F 78 132 76
;
title "Patient Age Groups";
proc report data=a15022.bloodpressure nowd;
column Gender Age AgeGroup;
define Gender / width=6;
define Age / display width=5;
define AgeGroup / computed "Age Group";
compute AgeGroup / character length=5;
if Age gt 50 then AgeGroup = '> 50';
else if not missing(Age) then AgeGroup = '<= 50';
endcomp;
run;
Learning: Above example
shows usage of PROC report to generate report with a new column as AgeGroup and
defining width og existing table columns.
Output:
Patient Age Groups
Gender
|
Age
|
Age Group
|
M
|
23
|
<= 50
|
F
|
68
|
> 50
|
M
|
55
|
> 50
|
F
|
28
|
<= 50
|
M
|
35
|
<= 50
|
M
|
45
|
<= 50
|
F
|
48
|
<= 50
|
F
|
78
|
> 50
|
Programs and datasets...
Your blog looks clean and structured. You need to add more problems to make it more substantive.
ReplyDeleteAlso, put the structure you have created in to the play, by having separate page for each chapter and add an index page, linking the pages (for the chapters). You can also add links at each chapter to go back to index, previous (if any) and next (if any) pages. Alternatively, you can create a tabbed structure (one tab for each page)
Further:
1. Introduce each of the chapters briefly ~ a bit more than you did.
2. Make the blog page wider, so that the code or output results do not wrap. \
3. Keep the introduction and learning for each problem.
4. Write a conclusion or summary for each page and
5. Remember to number the problems solved by giving Chapter # and Problem #
You are not very far from what I am looking for. For the next submission, if time permits, you can give it a try.
Cheers,
Thank you sir. I will incorporate all review comments shortly after end term examinations.
ReplyDelete