A step towards 'Data Science'..!!: Introduction to SAS by example...

What is SAS?

SAS stands for statistical analysis system. SAS is a collection of modules that are used to process and analyze data. SAS
is also an extremely powerful, general-purpose programming language.

Here we are going to see examples from base SAS programming which are particularly solved problems from the ‘SAS by example by Ron Cody’.

Getting data file into SAS

Here is an example of getting your data from text file to SAS dataset. Here default delimiter is a blank space (‘ ‘).

The program:

libname myprac1 '/folders/myfolders/';

data myprac1.veg;
   infile "/folders/myfolders/veggies.txt";
   input Name $ Code $ Days Number Price;
   CostPerSeed = Price / Number;
run;
proc print data=myprac1.veg;
run;

Output:

Obs	Name	Code	Days	Number	Price	CostPerSeed
1	Cucumber	50104-A	55	30	195	6.50000
2	Cucumber	51789-A	56	30	215	7.16667
3	Carrot	50179-A	68	1500	355	0.23667
4	Carrot	50872-A	65	1500	325	0.21667
5	Beetroot	57224-A	75	200	375	1.87500
6	Beetroot	62471-A	80	200	395	1.97500
7	Brinjal	57828-A	66	200	295	1.47500
8	Brinjal	52233-A	70	30	225	7.50000

Above code starts from defining private library name with statement ‘libname’. Default library for SAS datasets are ‘Work’ library. Proc step is used to perform various process on your dataset. Here, we are printing dataset.

Conditional processing

SAS evaluates the expression in an IF-THEN statement to produce a result that is either non-zero, zero, or missing. A non-zero and nonmissing result causes the expression to be true; a result of zero or missing causes the expression to be false.

If the conditions that are specified in the IF clause are met, the IF-THEN statement executes a SAS statement for observations that are read from a SAS data set, for records in an external file, or for computed values. An optional ELSE statement gives an alternative action if the THEN clause is not executed. The ELSE statement, if used, must immediately follow the IF-THEN statement.

Below are some problems from Chapter 7 of Cody`s book given with their solution code and output.

NOTE: Problem statements are given in each code snippet in first comment section. Various comments have been incorporated in order to improve the readability. Learning explanations are given below every code snippet.

Code: ch7_1.sas

Please note that problem statements are given in code snippet in comment section as below.

/*

Problem 1

Run the program here to create a temporary SAS data set called School:

data school;

input Age Quiz : $1. Midterm Final;

/* Add you statements here 

datalines;

12 A 92 95

12 B 88 88

13 C 78 75

13 A 92 93

12 F 55 62

13 B 88 82

;*//*

Using IF and ELSE IF statements, compute two new variables as follows: Grade

(numeric), with a value of 6 if Age is 12 and a value of 8 if Age is 13.

The quiz grades have numerical equivalents as follows: A = 95, B = 85, C = 75,

D = 70, and F = 65. Using this information, compute a course grade (Course) as a

weighted average of the Quiz (20%), Midterm (30%) and Final (50%).;*/



libname a15022 '/folders/myfolders/my_prac';



data a15022.school;

input Age Quiz : $1. Midterm Final;

/* Add you statements here */



if Age eq 12 then Grade=6;

else if Age eq 13 then Grade=8;



if Quiz eq 'A' then Course_Grade=(95*20+Midterm*30+Final*50)/100;

if Quiz eq 'B' then Course_Grade=(85*20+Midterm*30+Final*50)/100;

if Quiz eq 'C' then Course_Grade=(75*20+Midterm*30+Final*50)/100;

if Quiz eq 'F' then Course_Grade=(65*20+Midterm*30+Final*50)/100;

datalines;

12 A 92 95

12 B 88 88

13 C 78 75

13 A 92 93

12 F 55 62

13 B 88 82

;



proc print data=a15022.school;

run;

Learning: Use of conditional statements to assign grades based on scores.

Output:

Obs	Age	Quiz	Midterm	Final	Grade	Course_Grade
1	12	A	92	95	6	94.1
2	12	B	88	88	6	87.4
3	13	C	78	75	8	75.9
4	13	A	92	93	8	93.1
5	12	F	55	62	6	60.5
6	13	B	88	82	8	84.4

Code: ch7_2.sas

/*

2. Using the SAS data set Hosp, use PROC PRINT to list observations for Subject

values of 5, 100, 150, and 200. Do this twice, once using OR operators and once

using the IN operator.

Note: Subject is a numeric variable.

*/

libname a15022 '/folders/myfolders/my_prac';

**Creating Database;

data a15022.hosp;

   do j = 1 to 1000;

      AdmitDate = int(ranuni(1234)*1200 + 15500);

      quarter = intck('qtr','01jan2002'd,AdmitDate);

      do i = 1 to quarter;

         if ranuni(0) lt .1 and weekday(AdmitDate) eq 1 then

            AdmitDate = AdmitDate + 1;

         if ranuni(0) lt .1 and weekday(AdmitDate) eq 7 then

            AdmitDate = AdmitDate - int(3*ranuni(0) + 1);

         DOB = int(25000*Ranuni(0) + '01jan1920'd);

         DischrDate = AdmitDate + abs(10*rannor(0) + 1);

         Subject + 1;

         output;

      end;

   end;

   drop i j;

   format AdmitDate DOB DischrDate mmddyy10.;

run;



proc print data=a15022.hosp;

      where subject eq 5 or subject eq 100

      or subject eq 150 or subject eq 200;

      run;

      

      

proc print data=a15022.hosp;

      where subject in (5,100,150,200);

run;

Learning: PROC PRINT with or and in operator. Multiple ‘or’ can be replaced with ‘in’ statement.

Output:

Obs	AdmitDate	quarter	DOB	DischrDate	Subject
5	08/03/2003	6	06/28/1928	08/12/2003	5
100	11/05/2004	11	09/10/1980	11/15/2004	100
150	03/31/2005	12	02/07/1923	04/11/2005	150
200	07/02/2004	10	09/25/1931	07/10/2004	200

Obs	AdmitDate	quarter	DOB	DischrDate	Subject
5	08/03/2003	6	06/28/1928	08/12/2003	5
100	11/05/2004	11	09/10/1980	11/15/2004	100
150	03/31/2005	12	02/07/1923	04/11/2005	150
200	07/02/2004	10	09/25/1931	07/10/2004	200

Code: Ch7_7.sas

/*Using the Sales data set, list all the observations where Region is North and

Quantity is less than 60. Include in this list any observations where the customer

name (Customer) is Pet's are Us*/



libname a15022 '/folders/myfolders/my_prac';



*Data set SALES;

data a15022.sales;

   input    EmpID     :       $4. 

            Name      &      $15.

            Region    :       $5.

            Customer  &      $18.

            Date      : mmddyy10.

            Item      :       $8.

            Quantity  :        5.

            UnitCost  :  dollar9.;

   TotalSales = Quantity * UnitCost;

/*   format date mmddyy10. UnitCost TotalSales dollar9.;*/

   drop Date;

datalines;

1843 George Smith  North Barco Corporation  10/10/2006 144L 50 $8.99

1843 George Smith  South Cost Cutter's  10/11/2006 122 100 $5.99

1843 George Smith  North Minimart Inc.  10/11/2006 188S 3 $5,199

1843 George Smith  North Barco Corporation  10/15/2006 908X 1 $5,129

1843 George Smith  South Ely Corp.  10/15/2006 122L 10 $29.95

0177 Glenda Johnson  East Food Unlimited  9/1/2006 188X 100 $6.99

0177 Glenda Johnson  East Shop and Drop  9/2/2006 144L 100 $8.99

1843 George Smith  South Cost Cutter's  10/18/2006 855W 1 $9,109

9888 Sharon Lu  West Cost Cutter's  11/14/2006 122 50 $5.99

9888 Sharon Lu  West Pet's are Us  11/15/2006 100W 1000 $1.99

0017 Jason Nguyen  East Roger's Spirits  11/15/2006 122L 500 $39.99

0017 Jason Nguyen  South Spirited Spirits  12/22/2006 407XX 100 $19.95

0177 Glenda Johnson  North Minimart Inc.  12/21/2006 777 5 $10.500

0177 Glenda Johnson  East Barco Corporation  12/20/2006 733 2 $10,000

1843 George Smith  North Minimart Inc.  11/19/2006 188S 3 $5,199

;

proc print data=a15022.sales;

run;



proc print data=a15022.sales;

where Region eq 'North' and Quantity lt 60 and Customer like "%Pet's are Us%";

run;

Learning: ‘where’ condition is used to apply constraints on dataset in PROC PRINT statement.

Conditional operator used here are eq-> Equal to, lt-> Less than.

Output:

Obs	EmpID	Name	Region	put	Item	Quantity	UnitCost	TotalSales
1	1843	George Smith	North	Barco Corporation	144L	50	8.99	449.5
2	1843	George Smith	South	Cost Cutter's	122	100	5.99	599.0
3	1843	George Smith	North	Minimart Inc.	188S	3	5199.00	15597.0
4	1843	George Smith	North	Barco Corporation	908X	1	5129.00	5129.0
5	1843	George Smith	South	Ely Corp.	122L	10	29.95	299.5
6	0177	Glenda Johnson	East	Food Unlimited	188X	100	6.99	699.0
7	0177	Glenda Johnson	East	Shop and Drop	144L	100	8.99	899.0
8	1843	George Smith	South	Cost Cutter's	855W	1	9109.00	9109.0
9	9888	Sharon Lu	West	Cost Cutter's	122	50	5.99	299.5
10	9888	Sharon Lu	West	Pet's are Us	100W	1000	1.99	1990.0
11	0017	Jason Nguyen	East	Roger's Spirits	122L	500	39.99	19995.0
12	0017	Jason Nguyen	South	Spirited Spirits	407XX	100	19.95	1995.0
13	0177	Glenda Johnson	North	Minimart Inc.	777	5	10.50	52.5
14	0177	Glenda Johnson	East	Barco Corporation	733	2	10000.00	20000.0
15	1843	George Smith	North	Minimart Inc.	188S	3	5199.00	15597.0

Performing iterative processing

Looping is very important programming structure, especially when you have to run particular type of instructions multiple times.

Code: Ch8_1.sas

/*Run the program here to create a temporary SAS data set called Vitals

Using this data set, create a new data set (NewVitals) with the following new

variables:

For subjects less than 50 years of age:

If Pulse is less than 70, set PulseGroup equal to Low;

otherwise, set PulseGroup equal to High.

If SBP is less than 130, set SBPGroup equal to Low;

otherwise, set SBPGroup equal to High.



For subjects greater than or equal to 50 years of age:

If Pulse is less than 74, set PulseGroup equal to Low;

otherwise, set PulseGroup equal to High.

If SBP is less than 140, set SBPGroup equal to Low;

otherwise, set SBPGroup equal to High.

You may assume there are no missing values for Pulse or SBP*/



libname a15022 '/folders/myfolders/my_prac';

data a15022.vitals;

input ID : $3.

Age

Pulse

SBP

DBP;

label SBP = "Systolic Blood Pressure"

DBP = "Diastolic Blood Pressure";

datalines;

001 23 68 120 80

002 55 72 188 96

003 78 82 200 100

004 18 58 110 70

005 43 52 120 82

006 37 74 150 98

007 . 82 140 100

;

data a15022.NewVitals;

      set a15022.vitals;

if age lt 50 then do;

      if Pulse lt 70 then PulseGroup='Low';

      else PulseGroup='High';

      if SBP lt 130 then SBPGroup='Low';

      else SBPGroup='High';

      end;

if age ge 50 then do;

      if Pulse lt 74 then PulseGroup='Low';

      else PulseGroup='High';

      if SBP lt 140 then SBPGroup='Low';

      else SBPGroup='High';

      end;

      run;

      

proc print data=a15022.NewVitals;

run;

Learning: ‘do’ in a if statement enables us to perform number of operations within that if condition.

Output:

Obs	ID	Age	Pulse	SBP	DBP	PulseGroup	SBPGroup
1	001	23	68	120	80	Low	Low
2	002	55	72	188	96	Low	Hig
3	003	78	82	200	100	Hig	Hig
4	004	18	58	110	70	Low	Low
5	005	43	52	120	82	Low	Low
6	006	37	74	150	98	Hig	Hig
7	007	.	82	140	100	Hig	Hig

Code: Ch8_2.sas

/*

Run the program here to create a temporary SAS data set (MonthSales):

data monthsales;

input month sales @@;

/* add your line(s) here */

/*datalines;

1 4000 2 5000 3 . 4 5500 5 5000 6 6000 7 6500 8 4500

9 5100 10 5700 11 6500 12 7500

;

Modify this program so that a new variable, SumSales, representing Sales to date, is

added to the data set. Be sure that the missing value for Sales in month 3 does not

result in a missing value for SumSales.

*/

libname a15022 '/folders/myfolders/my_prac';

data a15022.monthsales;

input month sales @@;

/* add your line(s) here */

SumSales+sales;

datalines;

1 4000 2 5000 3 . 4 5500 5 5000 6 6000 7 6500 8 4500

9 5100 10 5700 11 6500 12 7500

;



proc print data=a15022.monthsales;

run;

Learning: Sum function is another way of writing x=x+1. In above problem, we are creating new column SumSales as SumSales=Sumsales+sales.

Output:

Obs	month	sales	SumSales
1	1	4000	4000
2	2	5000	9000
3	3	.	9000
4	4	5500	14500
5	5	5000	19500
6	6	6000	25500
7	7	6500	32000
8	8	4500	36500
9	9	5100	41600
10	10	5700	47300
11	11	6500	53800
12	12	7500	61300

Code: ch8_4.sas

/*

Count the number of missing values for the variables A, B, and C in the Missing data

set. Add the cumulative number of missing values to each observation (use variable

names MissA, MissB, and MissC). Use the MISSING function to test for the missing

values.

*/

*Setting up private library;

libname a15022 '/folders/myfolders/my_prac';

* data step ;
data a15022.mss;

*reading from missing.txt file;

infile '/folders/myfolders/Data_Sources/missing.txt';

input A B C;

* conditional statement with 'do';

if missing(A) then do;

 MissA+1;  **Using sum statement;

end;

if missing(B) then do;

 MissB+1;

end;

if missing(C) then do;

 MissC+1;

end;

run;



proc print data=a15022.mss;

run;

Learning: Above example illustrates use of sum function for calculating number of missing values.

Output given below is just upto 10 observations.

Output:

Obs	A	B	C	MissA	MissB	MissC
1	1	2	3	0	0	0
2	4	5	.	0	0	1
3	6	7	8	0	0	1
4	9	10	11	0	0	1

Sub setting and combining datasets

SAS provides various method for sub setting as well as combining different datasets.

Code: Ch10_1.sas

/*1. Using the SAS data set Blood, create two temporary SAS data sets called Subset_A
and Subset_B. Include in both of these data sets a variable called Combined equal to
.001 times WBC plus RBC. Subset_A should consist of observations from Blood
where Gender is equal to Female and BloodType is equal to AB. Subset_B should
consist of all observations from Blood where Gender is equal to Female, BloodType
is equal to AB, and Combined is greater than or equal to 14. */

* Creating dataset;
libname a15022'/folders/myfolders/my_prac';
data a15022.blood;
   infile '/folders/myfolders/Data_Sources/blood.txt' truncover;
   length Gender $ 6 BloodType $ 2 AgeGroup $ 5;
   input Subject
         Gender
         BloodType
         AgeGroup
         WBC
         RBC
         Chol;
   label Gender = "Gender"
         BloodType = "Blood Type"
         AgeGroup = "Age Group"
         Chol = "Cholesterol";
run;

proc print data=a15022.blood;

data a15022.subset_A;
set a15022.blood;
Combine=(WBC+RBC)*0.001;
where Gender='Female' and BloodType='AB';
run;

proc print data=a15022.subset_A;
run;

data a15022.subset_B;
set a15022.blood;
Combine=(WBC+RBC)*0.001;
where Gender='Female' and BloodType='AB' and ((WBC+RBC)*0.001) >= 14;
run;

proc print data=a15022.subset_B;
run;

Learning: In above problem, subset A and subset B are being derived based on condition given in ‘where’ statement.

Output:

Obs	Gender	BloodType	AgeGroup	Subject	WBC	RBC	Chol	Combine
1	Female	AB	Young	1	7710	7.40	258	7.71740
2	Female	AB	Old	25	7480	4.70	183	7.48470
3	Female	AB	Young	78	7410	5.82	175	7.41582
4	Female	AB	Young	79	.	4.61	69	.
5	Female	AB	Young	101	7610	4.60	162	7.61460
6	Female	AB	Old	142	9380	5.82	218	9.38582
7	Female	AB	Young	180	6220	5.58	.	6.22558
8	Female	AB	Old	199	6810	5.54	.	6.81554
9	Female	AB	Old	255	8080	5.45	144	8.08545
10	Female	AB	Young	260	7680	.	127	.
11	Female	AB	Young	288	6810	8.26	138	6.81826
12	Female	AB	Young	363	7950	5.24	.	7.95524
13	Female	AB	Old	366	7350	4.72	135	7.35472
14	Female	AB	Old	414	.	5.98	.	.
15	Female	AB	Old	449	7480	3.37	186	7.48337
16	Female	AB	Young	459	6620	6.19	184	6.62619
17	Female	AB	Old	528	6310	6.30	.	6.31630
18	Female	AB	Old	544	6360	4.88	.	6.36488
19	Female	AB	Old	652	9110	5.41	.	9.11541
20	Female	AB	Old	802	7200	3.37	185	7.20337

Code: Ch10_2.sas

/*
2. Using the a15022S data set Hosp, create a temporary a15022S data set called Monday2002,
consisting of observations from Hosp where the admission date (AdmitDate) falls on
a Monday and the year is 2002. Include in this new data set a variable called Age,
computed as the person’s age as of the admission date, rounded to the nearest year.

*/

*Setting up library;
libname a15022'/folders/myfolders/my_prac';

*create dataset Monday2002;
data a15022.Monday2002;
      set a15022.hosp;
Age= round(yrdif(DOB,AdmitDate),1); *Age calculation using yrdif function;
where weekday(AdmitDate)=2 and year(AdmitDate)=2002; *Condition for subsetting;
run;
proc print data=a15022.Monday2002;
run;

*Data set HOSP;
data a15022.hosp;
   do j = 1 to 1000;
      AdmitDate = int(ranuni(1234)*1200 + 15500);
      quarter = intck('qtr','01jan2002'd,AdmitDate);
      do i = 1 to quarter;
         if ranuni(0) lt .1 and weekday(AdmitDate) eq 1 then
            AdmitDate = AdmitDate + 1;
         if ranuni(0) lt .1 and weekday(AdmitDate) eq 7 then
            AdmitDate = AdmitDate - int(3*ranuni(0) + 1);
         DOB = int(25000*Ranuni(0) + '01jan1920'd);
         DischrDate = AdmitDate + abs(10*rannor(0) + 1);
         Subject + 1;
         output;
      end;
   end;
   drop i j;
   format AdmitDate DOB DischrDate mmddyy10.;
run;

proc print data=a15022.hosp;
run;

Learning: ‘where’ condition in data step enables us to derive a subset from given dataset.

Output:

Obs	AdmitDate	quarter	DOB	DischrDate	Subject	Age
1	11/11/2002	3	10/01/1928	11/18/2002	48	74
2	11/11/2002	3	05/08/1940	11/19/2002	49	63
3	11/11/2002	3	09/26/1942	11/15/2002	50	60
4	07/22/2002	2	10/14/1986	07/25/2002	69	16
5	07/22/2002	2	09/05/1981	07/23/2002	70	21
6	10/14/2002	3	01/28/1961	10/20/2002	289	42
7	10/14/2002	3	04/25/1922	10/25/2002	290	80
8	10/14/2002	3	11/06/1948	10/29/2002	291	54
9	12/23/2002	3	09/07/1949	01/02/2003	303	53
10	12/23/2002	3	10/31/1927	12/26/2002	304	75

Working with numeric functions

SAS functions are an essential tool in DATA step programming. They perform such tasks
as rounding numbers, computing dates from month-day-year values, summing and
averaging the values of SAS variables, and hundreds of other tasks.

Code: ch11_2.sas

/* Problem
2. Count the number of missing values for WBC, RBC, and Chol in the Blood data set.
Use the MISSING function to detect missing values.
*/
libname a15022 '/folders/myfolders/my_prac';

*Data set BLOOD;
data a15022.blood;
   infile '/folders/myfolders/Data_Sources/blood.txt' truncover;
   length Gender $ 6 BloodType $ 2 AgeGroup $ 5;
   input Subject
         Gender
         BloodType
         AgeGroup
         WBC
         RBC
         Chol;
   label Gender = "Gender"
         BloodType = "Blood Type"
         AgeGroup = "Age Group"
         Chol = "Cholesterol";
run;

data a15022.miss;
set a15022.blood;
if missing(Chol) then MissChol + 1;
if missing(WBC) then MissWBC + 1;
if missing(RBC) then MissRBC + 1;
drop Subject Gender BloodType AgeGroup WBC RBC Chol;
run;

proc print data=a15022.miss;
run;

Learning:

‘missing’ function lets you detect missing value in dataset.

Output:

Obs	MissChol	MissWBC	MissRBC
1	0	0	0
2	1	0	0
3	1	0	0
4	2	0	0
5	2	1	0
6	2	1	0
7	2	1	0
8	2	1	0
9	2	2	0
10	3	2	0

Working with Character function

These functions enable you to search for character strings, take strings apart and put them back
together again, and remove selected characters from a string.

Code: ch12_2.sas

/*
2. Using the data set Mixed, create a temporary SAS data set (also called Mixed) with
the following new variables:
a. NameLow – Name in lowercase
b. NameProp – Name in proper case
c.(Bonus – difficult) NameHard – Name in proper case without using the
PROPCASE function
*/

libname a15022 '/folders/myfolders/my_prac';

*Data set MIXED;
data a15022.mixed;
input Name & $20. ID;
datalines;
Daniel Fields 123
Patrice Helms 233
Thomas chien 998
;

data a15022.mixed;
set a15022.mixed;
Namelow=lowcase(Name); * fun lowcase to covert it into lower case;
Nameprop=propcase(Name); * fun uppercase to convert it into upper case;

* taking substring of first character in every word and making it upcase;
* then, concatenating all substrings to form full name with proper case;
fchar=upcase(substr(Name,1,1));
fname=cats(fchar,substr(Name,2,index(Name,' ')-1));
lchar=upcase(substr(Name,index(Name,' ')+1,1));
lname= cats(lchar,substr(Name,index(Name,' ')+2,length(Name)));
Namehard=catx(' ',fname,lname);
drop fname lname fchar lchar;
run;

proc print data=a15022.mixed;
run;

Learning: In above code, up case function has been used to make observation uppercase. Another functions such as cats-for concatenation, substr- for taking out substring, catx- concatenating with space in between has been used to make observation in Propercase which can also be done with in-built function ‘Propcase’.

Output :

Obs	Name	ID	Namelow	Nameprop	Namehard
1	Daniel Fields	123	daniel fields	Daniel Fields	Daniel Fields
2	Patrice Helms	233	patrice helms	Patrice Helms	Patrice Helms
3	Thomas chien	998	thomas chien	Thomas Chien	Thomas Chien

Arrays

Typically arrays are used to perform a similar operation on a group of variables.

Code: ch13_2.sas

/*
. Redo Problem 1, except use data set Survey2.
Note: Ques1–Ques5 are numeric variables.*/

*Data set SURVEY2;
data a15022.survey2;
input ID
(Q1-Q5)(1.);
datalines;
535 13542
012 55443
723 21211
007 35142
;

* making dataset named survey1;
data a15022.survey1;
set a15022.survey;
* Creating a temporary array;
array temp{5} _temporary_;
*creating array for survey1;
array Q{5} Ques1-Ques5;
do i=1 to 5;
temp{6-i}=Q{i};
end;

do i=1 to 5;
Q{i}=temp{i};
end;
drop i;
proc print data=a15022.survey1;
run;

Learning: In above example, arrays have been used to perform operation on Question column of Survey2 dataset. Here, we are reversing values of column Ques1 to Ques5 in resulting survey1 table. _temporary_ lets us create a temporary array.

Output:

Obs	ID	Gender	Age	Salary	Ques1	Ques2	Ques3	Ques4	Ques5
1	1	M	23	28000	3	2	1	2	1
2	2	F	55	76123	1	1	2	5	4
3	3	M	38	36500	1	2	2	2	2
4	4	F	67	128000	4	2	2	3	5
5	5	M	22	23060	2	4	3	3	3
6	6	M	63	90000	3	4	5	3	2
7	7	F	45	76100	3	3	4	3	5

Displaying your data

Code: Ch14_2.sas

/*
Use PROC PRINT (without any DATA steps) to create a listing like the one here.
Note: The variables in the Hosp data set are Subject, AdmitDate (Admission Date),
DischrDate (Discharge Date), and DOB (Date of Birth).
Selected Patients from HOSP Data Set
Admitted in September of 2004
Older than 83 years of age
--------------------------------------
Date of Admission Discharge
Subject Birth Date Date
401 03/21/1921 09/13/2004 09/22/2004
407 08/26/1920 09/13/2004 09/18/2004
409 01/01/1921 09/13/2004 10/02/2004
2577 04/30/1920 09/27/2004 09/27/2004
6889 10/26/1920 09/17/2004 09/22/2004
7495 02/11/1921 09/21/2004 09/22/2004
Number of Patients = 6
*/

libname a15022 '/folders/myfolders/my_prac';

title "Hosp Dataset";
title2 "Patients admitted in sept 2004 and age >83";
title3 "Older than 83 years of age";

proc print data=a15022.hosp
           n='Number of Patients = '
           label
           double;
   where Year(AdmitDate) eq 2004 and
         Month(AdmitDate) eq 9 and
         yrdif(DOB,AdmitDate,'Actual') ge 83;
   id Subject;
   var DOB AdmitDate DischrDate;
   label AdmitDate = 'Admission Date'
         DischrDate = 'Discharge Date'
         DOB = 'Date of Birth';
run;

Learning: Label statement lets you present data well. In given example data from Hosp dataset is filtered and presented with label names.

Output:

Hosp Dataset

Patients admitted in sept 2004 and age >83

Older than 83 years of age

Subject	Date of Birth	Admission Date	Discharge Date
401	03/21/1921	09/13/2004	09/22/2004
407	08/26/1920	09/13/2004	09/18/2004
409	01/01/1921	09/13/2004	10/02/2004
2577	04/30/1920	09/27/2004	09/27/2004
6889	10/26/1920	09/17/2004	09/22/2004
7495	02/11/1921	09/21/2004	09/22/2004
Number of Patients = 6

Creating customized reports

Although you can customize the output produced by PROC PRINT, there are times when
you need a bit more control over the appearance of your report. PROC REPORT was
developed to fit this need.

Code: Ch15_1.sas

/*
Using the SAS data set BloodPressure, produce a report showing Gender and Age,
along with a new variable called AgeGroup. Values of AgeGroup are <= 50 or
> 50 depending on the value of Age. Label this variable “Age Group.” Your report
should look like this:
Patient Age Groups
Age
Gender Age Group
M 23 <= 50
F 68 > 50
M 55 > 50
F 28 <= 50
M 35 <= 50
M 45 <= 50
F 48 <= 50
F 78 > 50
316 Learning SAS by Example: A Programmer’s Guide
6. Using the SAS data set BloodPressure, produce a report showing Gender, Age, SBP,
and DBP. Order the report in Gender and Age order as shown here:
Subject's in Gender and Age Order
Systolic Diastolic
Blood Blood
Gender Age Pressure Pressure
F 28 120 70
48 138 88
68 110
*/
libname a15022 '/folders/myfolders/my_prac';

*Data set BLOODPRESSURE;
data a15022.bloodpressure;
   input Gender : $1.
         Age
         SBP
         DBP;
datalines;
M 23 144 90
F 68 110 62
M 55 130 80
F 28 120 70
M 35 142 82
M 45 150 96
F 48 138 88
F 78 132 76
;

title "Patient Age Groups";
proc report data=a15022.bloodpressure nowd;
   column Gender Age AgeGroup;
   define Gender / width=6;
   define Age / display width=5;
   define AgeGroup / computed "Age Group";
   compute AgeGroup / character length=5;
      if Age gt 50 then AgeGroup = '> 50';
      else if not missing(Age) then AgeGroup = '<= 50';
   endcomp;
run;

Learning: Above example shows usage of PROC report to generate report with a new column as AgeGroup and defining width og existing table columns.

Output:

Patient Age Groups

Gender	Age	Age Group
M	23	<= 50
F	68	> 50
M	55	> 50
F	28	<= 50
M	35	<= 50
M	45	<= 50
F	48	<= 50
F	78	> 50

Please click on below link to get all the program with data sets..
Programs and datasets...

2 comments:

Unknown23 September 2015 at 06:35
Your blog looks clean and structured. You need to add more problems to make it more substantive.

Also, put the structure you have created in to the play, by having separate page for each chapter and add an index page, linking the pages (for the chapters). You can also add links at each chapter to go back to index, previous (if any) and next (if any) pages. Alternatively, you can create a tabbed structure (one tab for each page)

Further:
1. Introduce each of the chapters briefly ~ a bit more than you did.
2. Make the blog page wider, so that the code or output results do not wrap. \
3. Keep the introduction and learning for each problem.
4. Write a conclusion or summary for each page and
5. Remember to number the problems solved by giving Chapter # and Problem #

You are not very far from what I am looking for. For the next submission, if time permits, you can give it a try.

Cheers,
Sagar23 September 2015 at 09:49
Thank you sir. I will incorporate all review comments shortly after end term examinations.

Sunday, 20 September 2015

Introduction to SAS by example...

Code: ch7_1.sas

Code: ch7_2.sas

Code: Ch7_7.sas

Code: Ch8_1.sas

Code: Ch8_2.sas

Code: ch8_4.sas

2 comments: