It depends on what types of analysis datasets are required for creating tables but I am more of a data step programmer as it gives me more flexibility. For e.g creating a change from baseline data set for blood pressure sometimes I have to retain certain values …use arrays ….or use the first. -and last. variables.
There are two ways to display duplicate observations:
1)Compilation Phase:
When you submit a DATA step, it reads the input statements it creates an input buffer and brings the variables and observations. it is a logical memory area and pdv brings the observations at a time from input buffer
and checks the errors. PDV contains 2 automatic variables _n_ & _error_, these checks the errors in observations.
_n_: indicates the no of obs.
_error_: 1 if error occured
0 if no error
After that it assigns the data values to appropriate variable and builds a sas dataset.
Use KEEP option on a dataset to only select few variables from 100 variables. We can use KEEP option either on set statement or data step statement. If we use on SET statement then only the five variables are created on pdv and only these variables are sent to the output dataset. If we use KEEP option on data step statement then all the variables are copied into pdv and after any manipulation only the selected variables on data step statement are processed and sent to output dataset.
data abc;
set xyz (keep= ab cd ef gh);
run;
data abc(keep= ab cd ef gh ij);
set xyz;
ij=ab+jk;
run;
When you check for ordinary missing numeric values, you can use code that is similar to the following:
if numvar=. then do;
If your data contains special missing values, you can check for either an ordinary or special missing value with a statement that is similar to the following:
if numvar<=.z then do;
To check for a missing character value, you can use a statement that is similar to the following:
if charvar=' ' then do;
The MISSING function enables you to check for either a character or numeric missing value, as in:
if missing(var) then do;
Only PDV is generated at compilation time, whereas the automatic variables _N_ , _ERROR_ are generated at execution time only.
Use the statements INFILE to point to the file reference(should be defined using Filename) / to the file path. Use INPUT statement to read the data into the sas dataset.
line pointer is used for multiple lines per observation
@@ is used for multiple observations per line
No. I don’t think it can be used.
Proc SQL is very convenient for performing table joins compared to a data step merge as it does not require the key columns to be sorted prior to join. A data step is more suitable for sequential observation-by-observation processing.PROC SQL can save a great deal of time if u want to filter the variables while selecting or u can modify them …apply format….creating new variables , macro variables…as well as sub setting the data.PROC SQL offers great flexibility for joining tables.
1)Use OPTIONS OBS = 0
2) Creating a data set by using the like clause.
ex: proc sql;
create table latha.emp like oracle.emp;
quit;
In this the like clause triggers the existing table structure to be copied to the new table. using this method
result in the creation of an empty table.
3) data check;
set _null_;
run;
4) data test;
delete;
run;
The three types of join are inner join, left join and right join. The inner join option takes the matching values from both the tables by the ON option. The left join selects all the variables from the first table and joins second table to it. The right join selects all the variables of table b first and join the table a to it.
we can remove the format by using proc datasets:
Proc datasets;
modify <data set name>;
format <variable name>(which variable format needs to
modify>;
run;
quit;
I have used the SAS/ACCESS SQL pass thru facility for connection with external databases and importing tables from them and also Microsoft access and excel files.Besides this, lot of times I have used PROC SQL for joining tables.
When data set is created sas create descriptor portion and data portion .that me sas stores the information like variable name ,length,type etc.
Proc SQL is very convenient for performing table joins compared to a data step merge as it does not require the key columns to be sorted prior to join. A data step is more suitable for sequential observation-by-observation processing.
PROC SQL can save a great deal of time if u want to filter the variables while selecting or we can modify them, apply format and creating new variables, macro variables. as well as subsetting the data. PROC SQL offers great flexibility for joining tables.
When a sas code is submitted, SAS performs syntactical checks before executing the program/ code. In that case, one of the ways could be - at the beginning of the code, write OPTIONS OBS=0 in addition to other options and then RUN it.
This way data will not be processed and the log shows error messages/ warnings, if any.
If you are executing the SAS code on PC SAS, the highlighted colors itself shows the syntactical errors, if any.
A typical SAS program could contain DATA steps, PROC steps and macros. Macros are preprocessed. DATA steps are just in time compiled. PROC steps are interpreted in the order they appear in program. So when we submit a SAS program consisting of all these three components, the macro is compiled and executed first. If a DATA step is encountered, then it is compiled and executed. Note that the DATA step will not be executed if there is an error in the compilation. If a PROC step is encountered, it is interpreted and executed line by line. However i am not certain on this PROC step behavior.
different types of functions:
TRIM : Removing the trailing blanks from character expressions.
syntax=trim(argument)
SUBSTR: substr extracts the substring from an argument
syntax=substr(argument, position<,n>)
Abs: Returns the absolute of the argument
Syntax=abs(argument)
The most common functions that would be used are-
Conversion functions - Input / Put / int / ceil / floor
Character functions - Scan / substr / index / Left / trim / compress / cat / catx / upcase,lowcase
Arithmetic functions - Sum / abs /
Attribute info functions – Attrn / length
Dataset – open / close / exist
Directory - dexist / dopen / dclose / dcreate / dinfo
File functions – fexist / fopen/ filename / fileref
SQL functions – coalesce / count / sum/ mean
Date functions – date / today / datdif / datepart / datetime / intck / mdy
Array functions – dim
Arrays is used for processing for large data set.
INDEX: Searches a character expression for a string of characters, and returns the position of the string's first character for the first occurrence of the string.
INDEX (source, excerpt)
it returns the position where the 2nd field is in the source
Eg:
str1 = 'Hi i am fine here. how are u there ? .....';
str2 = index(str1,'how'); --> str2 = 20
SCAN is to get a substring upto mentioned character. Scan function the best for locating the particular word specified in a argument,scan function default length is 200bytes.
data k;
r='ganesh kumar';
u=scan(r,2);
proc print;
run;
result: kumar
INDEXC to locate only for mentioned one or more single character
INDEXC(character-value, 'char1','char2','char3', ..)
FIND: Searches for a specific substring of characters within a character string
FIND(string, substring<,modifiers><,startups>)
it returns the position the substring is in
FINDC: To locate a character that appears or does not appear within a string. used to search for any one in a list of character values
INDEXW: Searches for the substring as a word that follows a space in the provided sentence
eg;string1= "there is a the here" ;
INDEXW(STRING1,"the")
result: 12 (the word "the")
If the data is continuosly in data set SAS would read the first words only from each line in the `datelines' block and it will ignore the rest of the line. if we use Trailing @@'it will read completly.and another type of trailing is using single @ this is a line hold specifier.
They are used for the records such as
001F38 H
002 F 40 G
To read these values to the data step
Data example:
input @10 type $ @;
if type='H' then
input @1 id @@4 gender $@@5 age2.;
else if type='G' then
input @1 id@@5 gender $@@7 age 2.;
end;
cards;
001F38 H
002 F 40 G
;
run;
The double trailing holds the until the end of the record.
Data example2:
input id age @@;
cards;
001 23 002 43 003 65 004 32 005 54
;
run;
Yes I have used it for summarization at times…For e.g if I have to calculate the max value of BP for patients 101 102 and 103 then I use the max (bpd) function to get the maximum value and use group by statement to group the patients accordingly.
I prefer Proc report as it is highly customizable and flexible where I can define each column in whatever way I want to and even make use of SAS functions, logic processing, and assignment statements and create new.
Variables for report making use of the compute block of proc report. I referred proc report because it is more efficient tool than tabulate because with report we can do me frequency and tabulate also.
NMISS OPTION is used for missing values.
The three types of join are inner join, left join and right join. The inner join option takes the matching values from both the tables by the ON option. The left join selects all the variables from the first table and joins second table to it. The right join selects all the variables of table b first and join the table a to it.
PROC SQL;
CREATE TABLE BOTH AS
SELECT A.PATIENT,
A.DATE FORMAT=DATE@AS DATE,
A.PULSE,B.MED, B.DOSES,
B.AMT FORMAT=4.1
FROM VITALS A INNER JOIN DOSING B
ON (A.PATIENT = B.PATIENT)
AND(A.DATE = B.DATE)
ORDER BY PATIENT, DATE;
QUIT;