Top 27 Base Sas Interview Questions You Must Prepare 25.Apr.2024

It depends on what types of analysis datasets are required for creating tables but I am more of a data step programmer as it gives me more flexibility. For e.g creating a change from baseline data set for blood pressure sometimes I have to retain certain values …use arrays ….or use the first. -and last. variables.

There are two ways to display duplicate observations:

  1. In data step, using first.var and last.var
  2. Using proc sort with option dupout option.

1)Compilation Phase:

When you submit a DATA step, it reads the input statements it creates an input buffer and brings the variables and observations. it is a logical memory area and pdv brings the observations at a time from input buffer 

and checks the errors. PDV contains 2 automatic variables _n_ & _error_, these checks the errors in observations.

_n_: indicates the no of obs.

_error_: 1 if error occured

             0 if no error

After that it assigns the data values to appropriate variable and builds a sas dataset.

Use KEEP option on a dataset to only select few variables from 100 variables. We can use KEEP option either on set statement or data step statement. If we use on SET statement then only the five variables are created on pdv and only these variables are sent to the output dataset. If we use KEEP option on data step statement then all the variables are copied into pdv and after any manipulation only the selected variables on data step statement are processed and sent to output dataset.

data abc;

  set xyz (keep= ab cd ef gh);

run;

data abc(keep= ab cd ef gh ij);

  set xyz;

 ij=ab+jk;

run;

When you check for ordinary missing numeric values, you can use code that is similar to the following: 

if numvar=. then do;

If your data contains special missing values, you can check for either an ordinary or special missing value with a statement that is similar to the following: 

if numvar<=.z then do;

To check for a missing character value, you can use a statement that is similar to the following: 

if charvar=' ' then do;

The MISSING function enables you to check for either a character or numeric missing value, as in: 

if missing(var) then do;

Only PDV is generated at compilation time, whereas the automatic variables _N_ , _ERROR_ are generated at execution time only.

Use the statements INFILE to point to the file reference(should be defined using Filename) / to the file path. Use INPUT statement to read the data into the sas dataset.

line pointer is used for multiple lines per observation

  @@ is used for multiple observations per line

Proc SQL is very convenient for performing table joins compared to a data step merge as it does not require the key columns to be sorted prior to join. A data step is more suitable for sequential observation-by-observation processing.PROC SQL can save a great deal of time if u want to filter the variables while selecting or u can modify them …apply format….creating new variables , macro variables…as well as sub setting the data.PROC SQL offers great flexibility for joining tables.

1)Use OPTIONS OBS = 0
2) Creating a data set by using the like clause.
 ex: proc sql;
create table latha.emp like oracle.emp;
quit;
In this the like clause triggers the existing table structure to be copied to the new table. using this method 
result in the creation of an empty table.
3)  data check;
           set _null_;
     run;
4) data test;
        delete;
    run;

The three types of join are inner join, left join and right join. The inner join option takes the matching values from both the tables by the ON option. The left join selects all the variables from the first table and joins second table to it. The right join selects all the variables of table b first and join the table a to it.

we can remove the format by using proc datasets:

Proc datasets;

modify <data set name>;

format <variable name>(which variable format needs to 

modify>;

run;

quit;

I have used the SAS/ACCESS SQL pass thru facility for connection with external databases and importing tables from them and also Microsoft access and excel files.Besides this, lot of times I have used PROC SQL for joining tables.

When data set is created sas create descriptor portion and data portion .that me sas stores the information like variable name ,length,type etc.

Proc SQL is very convenient for performing table joins compared to a data step merge as it does not require the key columns to be sorted prior to join. A data step is more suitable for sequential observation-by-observation processing.

PROC SQL can save a great deal of time if u want to filter the variables while selecting or we can modify them, apply format and creating new variables, macro variables. as well as subsetting the data. PROC SQL offers great flexibility for joining tables.

When a sas code is submitted, SAS performs syntactical checks before executing the program/ code. In that case, one of the ways could be - at the beginning of the code, write OPTIONS OBS=0 in addition to other options and then RUN it. 

This way data will not be processed and the log shows error messages/ warnings, if any. 

If you are executing the SAS code on PC SAS, the highlighted colors itself shows the syntactical errors, if any.

A typical SAS program could contain DATA steps, PROC steps and macros. Macros are preprocessed. DATA steps are just in time compiled. PROC steps are interpreted in the order they appear in program. So when we submit a SAS program consisting of all these three components, the macro is compiled and executed first. If a DATA step is encountered, then it is compiled and executed. Note that the DATA step will not be executed if there is an error in the compilation. If a PROC step is encountered, it is interpreted and executed line by line. However i am not certain on this PROC step behavior.

  1. Compress - sas function which is used to remove spaces in string value and concatenate two values Without spaces.
  2. Input is another function of sas, it is one of conversion function in sas. It converts numeric into char.
  3. Put is another conversion function in sas. It converts char to numeric.
  4. SAS functions can be used to convert data and manipulate character variable values.

different types of functions:

  1. TRIM
  2. SUBSTR
  3. ABS
  4. SCAN

TRIM : Removing the trailing blanks from character expressions. 

               syntax=trim(argument)

SUBSTR: substr extracts the substring from an argument

               syntax=substr(argument, position<,n>)

Abs: Returns the absolute of the argument

                Syntax=abs(argument)

The most common functions that would be used are-

Conversion functions - Input / Put / int / ceil / floor

Character functions - Scan / substr / index / Left / trim / compress / cat / catx / upcase,lowcase

Arithmetic functions - Sum / abs /

Attribute info functions – Attrn / length

Dataset – open / close / exist

Directory  - dexist / dopen / dclose / dcreate / dinfo

File functions – fexist / fopen/ filename / fileref

SQL functions – coalesce / count / sum/ mean

Date functions – date / today / datdif / datepart / datetime / intck / mdy

Array functions – dim

INDEX: Searches a character expression for a string of characters, and returns the position of the string's first character for the first occurrence of the string.

INDEX (source, excerpt)

it returns the position where the 2nd field is in the source

Eg:  

     str1 = 'Hi i am fine here. how are u there ? .....';

     str2 = index(str1,'how'); --> str2 = 20

SCAN is to get a substring upto mentioned character. Scan function the best for locating the particular word specified in a argument,scan function default length is 200bytes.

data k;

r='ganesh kumar';

u=scan(r,2);

proc print;

run;

result:  kumar

INDEXC to locate only for mentioned one or more single character

        INDEXC(character-value, 'char1','char2','char3', ..)

FIND: Searches for a specific substring of characters within a character string

 FIND(string, substring<,modifiers><,startups>)

it returns the position the substring is in

FINDC: To locate a character that appears or does not appear within a string. used to search for any one in a list of character values

INDEXW: Searches for the substring as a word that follows a space in the provided sentence

eg;string1= "there is a the here" ;

INDEXW(STRING1,"the")   

result: 12 (the word "the")

If the data is continuosly in data set SAS would read the first words only from each line in the `datelines' block and it will ignore the rest of the line. if we use Trailing @@'it will read completly.and another type of trailing is using single @ this is a line hold specifier.

  • Trailing @ is used to hold the record in input buffer to execute another input statement on the same datelines.
  • Trailing @@ is used to hold the record in input buffer to execute same input statement on same datelines intel eof record.
  • The trailing @ or more technically, line hold specifiers are used to hold the pointer in the same record for multiple iterations. 
  • The two tyoes of line hold specifiers are single trailing(@) and double trailing(@@).
  • The single trailing hold the record until it encounters either another input statement or end of the data step.  

They are used for the records such as 

001F38   H

002 F 40 G

To read these values to the data step

Data example:

  input @10 type $ @;

  if type='H' then

    input @1 id @@4 gender $@@5 age2.;

  else if type='G' then

    input @1 id@@5 gender $@@7 age 2.;

  end;

cards;

001F38   H

002 F 40 G

;

run;

The double trailing holds the until the end of the record.

Data example2:

  input id age @@;

cards;

001 23 002 43 003 65 004 32 005 54

;

run;

Yes I have used it for summarization at times…For e.g if I have to calculate the max value of BP for patients 101 102 and 103 then I use the max (bpd) function to get the maximum value and use group by statement to group the patients accordingly.

I prefer Proc report as it is highly customizable and flexible where I can define each column in whatever way I  want to and even make use of SAS functions, logic processing, and assignment statements and create new. 

Variables for report making use of the compute block of proc report. I referred proc report because it is more efficient tool than tabulate because with report we can do me frequency and tabulate also.

  • Proc Summary defaults to NOPRINT Proc Me defaults to PRINT
  • If you omit the VAR statement, then PROC SUMMARY produces a simple count of observations, whereas PROC MEANS tries to analyze all the numeric variables that are not listed in the other statements
  • If you specify statistics on the PROC SUMMARY statement and the VAR statement is omitted, then PROC SUMMARY stops processing and an error message is written to the SAS log.
  • If you omit the VAR statement, then PROC MEANS analyzes all numeric variables that are not listed in the other statements. When all variables are character variables, PROC MEANS produces a simple count of observations.

NMISS OPTION is used for missing values.

The three types of join are inner join, left join and right join. The inner join option takes the matching values from both the tables by the ON option. The left join selects all the variables from the first table and joins second table to it. The right join selects all the variables of table b first and join the table a to it.

PROC SQL;

CREATE TABLE BOTH AS

SELECT A.PATIENT,

A.DATE FORMAT=DATE@AS DATE,

A.PULSE,B.MED, B.DOSES,

B.AMT FORMAT=4.1

FROM VITALS A INNER JOIN DOSING B

ON (A.PATIENT = B.PATIENT)

AND(A.DATE = B.DATE)

ORDER BY PATIENT, DATE;

QUIT;