Top 50 SAS Programming Interview Questions You Must Prepare 19.Mar.2024

PROC SQL is a powerful tool in SAS that combines the functionality of DATA and PROC steps. PROC SQL can sort, summarize, subset, join (merge), and concatenate data sets, create new variables, and print the results or create a new data set, all in one step. PROC SQL can use fewer resources than the equivalent DATA and PROC steps. Joining tables in PROC SQL does not require the data to be sorted beforehand, which is a must for a DATA step match-merge.
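As a rough illustration of the point about sorting, the sketch below joins two small tables that are deliberately left unsorted. The data set names (one, two, matched) and their contents are made up for this example.

```sas
/* Hypothetical sample data, deliberately not sorted by id */
data one;
  input id name $;
  datalines;
3 Carol
1 Alice
2 Bob
;
run;

data two;
  input id score;
  datalines;
2 85
3 90
4 70
;
run;

/* Inner join on id; no PROC SORT is needed beforehand */
proc sql;
  create table matched as
  select a.id, a.name, b.score
  from one as a
       inner join two as b
       on a.id = b.id
  order by a.id;
quit;

proc print data=matched noobs;
run;
```

A DATA step MERGE with BY id on the same inputs would first need both data sets sorted (or indexed) by id.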

Proc Freq, Proc univariate, Proc Tabulate & Proc Report.

Is anyone wondering why you wouldn’t just use total=field1+field2+field3;

Use a sampling method (the OBS= option or subsetting), comment out lines, and use DATA _NULL_.

A procedure can be used with a wider scope, and its results can be sent to a different data set. Functions usually operate on values in the existing data set.

SAS 9.1.3,9.0, 8.2 in Windows and UNIX, SAS 7 and 6.12.

The IN= variables

What if you want to keep in the output data set of a merge only the matches (only those observations to which both input data sets contribute)? SAS will set up for you special temporary variables, called the "IN=" variables, so that you can do this and more. Here's what you have to do:

  • signal to SAS on the MERGE statement that you need the IN= variables for the input data set(s)
  • use the IN= variables in the data step appropriately

So to keep only the matches in the match-merge above, ask for the IN= variables and use them:

data three;
merge one(in=x) two(in=y);  /* x & y are your choices of names */
by id;                      /* for the IN= variables for data */
if x=1 and y=1;             /* sets one and two respectively */
run;

A missing value is assigned as missing in an assignment statement. In the sort order, the ordinary numeric missing value (.) is the second smallest value, preceded only by the special missing value underscore (._).

The RETAIN statement is used to hold the values of variables across iterations of the DATA step. Normally, variables created by INPUT or assignment statements are set to missing at the start of each iteration of the DATA step.

What is the order of evaluation of the arithmetic operators: + - * / ** ()?
A) (), **, then * and /, then + and -.
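A minimal sketch of both points, using made-up data: the first step keeps a running total with RETAIN, and the second shows how parentheses change the order of evaluation. The data set and variable names are illustrative only.

```sas
/* RETAIN: total keeps its value from one iteration to the next */
data running_total;
  input amount;
  retain total 0;
  total = total + amount;   /* 10, then 30, then 60 */
  datalines;
10
20
30
;
run;

proc print data=running_total noobs;
run;

/* Operator precedence: ** first, then *, then + */
data _null_;
  x = 2 + 3 * 2 ** 2;       /* 2 + (3 * 4) = 14 */
  y = (2 + 3) * 2 ** 2;     /* 5 * 4 = 20 */
  put x= y=;
run;
```

Without the RETAIN statement, total would be reset to missing at the start of each iteration and the sum would stay missing.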

The first thing is to look in the log for errors and warnings (and, in some cases, notes), or to use the DATA step debugger.

MOD: The modulo is a constant or numeric variable; the function returns the remainder after the numeric value is divided by the modulo.

INT: It returns the integer portion of a numeric value truncating the decimal portion.

PAD: It pads each record with blanks so that all data lines have the same length. It is used in the INFILE statement and is useful only when missing data occurs at the end of the record.

CATX: concatenate character strings, removes leading and trailing blanks and inserts separators.

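SCAN: It returns a specified word from a character value. In the SAS 9.1 era, the SCAN function assigned a length of 200 to a target variable that had not already been given a length; newer releases use the length of the first argument instead.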

SUBSTR: extracts a substring and replaces character values.
Extraction of a substring: middleinitial=substr(middlename,1,1);
Replacing character values: substr(phone,1,3)='433';
If the SUBSTR function is on the left side of an assignment statement, it replaces the contents of the character variable.

TRIM: trims the trailing blanks from the character values.

SCAN vs. SUBSTR: SCAN extracts words within a value that are marked by delimiters. SUBSTR extracts a portion of the value from a specific location. SUBSTR is best used when we know the exact position of the substring to extract from a character value.
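The functions described above can be tried together in one short DATA step. This is only a sketch: all values and variable names are invented for illustration, and the results are written to the log with PUT.

```sas
data _null_;
  length full $ 40;
  remainder = mod(10, 3);                                /* 1 */
  whole     = int(3.75);                                 /* 3 */
  full      = catx(' ', '  John ', 'Q', ' Public  ');    /* John Q Public */
  second    = scan('red,green,blue', 2, ',');            /* green */
  initial   = substr('Quincy', 1, 1);                    /* Q */
  phone     = '212-555-0100';
  substr(phone, 1, 3) = '433';                           /* 433-555-0100 */
  joined    = trim('abc   ') || '!';                     /* abc! */
  put remainder= whole= full= second= initial= phone= joined=;
run;
```

Note how SCAN needs only the word number and the delimiter, while SUBSTR needs an exact starting position and length.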

SAS is considered self-documenting because, when a data set is created, SAS stores descriptive information inside it: the date and time of creation, the number of variables, their labels, and so on. You can look at that information using PROC CONTENTS.

It has only two values: 1 for error and 0 for no error.

It implies that automatic conversion took place to make character functions possible.

Missing semicolons, not checking the log after submitting a program, not using debugging techniques, and not making full use of FSVIEW.

When you use the POINT= option, you must include a STOP statement to stop DATA step processing, programming logic that checks for an invalid value of the POINT= variable, or both. Because POINT= reads only those observations that are specified in the DO statement, SAS cannot read an end-of-file indicator as it would if the file were being read sequentially. Because reading an end-of-file indicator ends a DATA step automatically, failure to substitute another means of ending the DATA step when you use POINT= can cause the DATA step to go into a continuous loop.
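A common pattern for this is sketched below. It assumes the SASHELP.CLASS sample data set that ships with SAS; the output name every_other and the choice of reading every second observation are arbitrary.

```sas
data every_other;
  /* total is filled in from NOBS= at compile time */
  do i = 1 to total by 2;
    set sashelp.class point=i nobs=total;  /* direct access by observation number */
    output;
  end;
  stop;   /* required: with POINT= no end-of-file is ever detected */
run;
```

Removing the STOP statement would make the DATA step repeat the DO loop indefinitely.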

Data _NULL_ statement, Proc Means, Proc Report, Proc tabulate, Proc freq and Proc print, Proc Univariate etc.

NODUP compares all the variables in our dataset while NODUPKEY compares just the BY variables.
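The difference can be seen on a tiny made-up data set. This sketch uses NODUPRECS, of which NODUP is an alias; the data set names are hypothetical.

```sas
data visits;
  input id visit $;
  datalines;
1 A
1 A
1 B
2 A
;
run;

/* Compares all variables: only the repeated "1 A" record is dropped (3 observations remain) */
proc sort data=visits out=no_dup_recs noduprecs;
  by id;
run;

/* Compares only the BY variable: one observation per id is kept (2 observations remain) */
proc sort data=visits out=no_dup_keys nodupkey;
  by id;
run;
```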

I prefer to use PROC REPORT until I have to create cross-tabulation tables, because it gives me many options to modify the look of my table (for example, the WIDTH= option, which changes the width of each column in the table), whereas PROC TABULATE is unable to produce some of the things I need. For example, PROC TABULATE doesn't produce n (%) in the desired format.

It is an approach to import text files with SAS (It comes free with Base SAS version 9.0).

To create CSV file, we have to open notepad, then, declare the variables.

proc import datafile='E:age.csv'
            out=sarath dbms=csv replace;
  getnames=yes;   /* take variable names from the first row */
run;

Using subsetting statements such as IF-THEN/ELSE, WHERE, and SELECT.

To determine the number of missing values that are excluded in a computation, use the NMISS function.

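data _null_;
m = . ;
y = 4 ;
z = 0 ;
N = N(m , y, z);          /* number of non-missing values */
NMISS = NMISS(m , y, z);  /* number of missing values */
put N= NMISS=;            /* write the results to the log */
run;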

The above program results in N = 2 (Number of non missing values) and NMISS = 1 (number of missing values).

SAS/ACCESS is used only to work with databases such as Oracle, SQL Server, MS Access, etc.
SAS/CONNECT is used only for server connections.

When SAS encounters a RUN statement, it compiles and executes the preceding DATA or PROC step. If you have more than one DATA or PROC step, or a PROC step following a DATA step, you can omit the RUN statement, because the start of the next step marks the end of the previous one.

Creating a data set by using the LIKE clause. Ex:

proc sql;
create table latha.emp like oracle.emp;
quit;

Here the LIKE clause causes the structure of the existing table to be copied to the new table. This method results in the creation of an empty table.

In the editor window we write:

%include 'path of the sas file';
run;

In a non-windowing environment there is no need to give the RUN statement.

The main advantage of version 9 is faster execution of applications and centralized access of data and support.

Many changes have been made in version 9 compared with version 8. The following are a few:

  • SAS version 9 supports formats longer than 8 bytes, which is not possible with version 8.
  • Length for numeric formats allowed in version 9 is 32, whereas it is 8 in version 8.
  • Length for character names in version 9 is 31, whereas in version 8 it is 32.
  • Length for numeric informats in version 9 is 31, and 8 in version 8.
  • Length for character names is 30, and 32 in version 8.

Three new informats are available in version 9 to convert various date, time, and datetime forms of data into a SAS date or SAS time.

  • ANYDTDTEw. - Converts to a SAS date value.
  • ANYDTTMEw. - Converts to a SAS time value.
  • ANYDTDTMw. - Converts to a SAS datetime value.

The CALL SYMPUTX routine is added in version 9. It creates a macro variable at execution time in the DATA step by
  • Trimming leading and trailing blanks
  • Automatically converting numeric values to character.
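A short sketch of CALL SYMPUTX and one of the ANYDT informats follows. The macro variable names (n_obs, label) and the sample values are assumptions made for this example.

```sas
data _null_;
  count = 42;
  call symputx('n_obs', count);           /* numeric value converted to character, no padding */
  call symputx('label', '  trimmed  ');   /* leading and trailing blanks removed */

  d = input('19MAR2024', anydtdte9.);     /* read a date string into a SAS date value */
  put d= date9.;
run;

/* The brackets show that no blanks surround the stored values */
%put n_obs=[&n_obs] label=[&label];
```

The %PUT line should write n_obs=[42] label=[trimmed] to the log.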

A new ODS option (COLUMN OPTION) is included to create multiple columns in the output.

By default, PROC MEANS calculates summary statistics such as N, mean, standard deviation, minimum, and maximum, whereas the MEAN function computes only the mean of its arguments.

It will either read or write all numeric and character variables in the data set.

Efficiency and performance strategies can be classified into 5 different areas:
  • CPU time
  • Data storage
  • Elapsed time
  • Input/Output
  • Memory

CPU time and elapsed time are the baseline measurements.

It will display the execution of the whole program and its logic. It will also display each error with its line number so that you can edit the program.

Data set options used on input include OBS=, FIRSTOBS=, and WHERE=; options used on output include COMPRESS= and REUSE=. KEEP=, DROP=, and RENAME= can be used on both input and output data sets.

The SUM function returns the sum of non-missing values. If you choose addition, you will get a missing value for the result if any of the fields are missing. Which one is appropriate depends upon your needs.

However, there is an advantage to using the SUM function even if you want the results to be missing. If you have more than a couple of fields, you can often use shortcuts in writing the field names. If your fields are not numbered sequentially but are stored in the program data vector together, then you can use: total=SUM(of fielda--zfield); Just make sure you remember the "of" and the double dashes, or your code will run but you won't get your intended results. MEAN is another function that will calculate differently from the written-out formula if you have missing values.

There is a field containing a date. It needs to be displayed in the format "ddmonyy" if it's before 1975, "dd mon ccyy" if it's after 1985, and as 'Disco Years' if it's between 1975 and 1985.
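One possible way to approach the date-display question is sketched below. It is not the only answer (a user-defined format would also work), and it assumes that "between 1975 and 1985" includes both of those years. The data set name, variable names, and sample dates are made up.

```sas
data dates;
  input date :date9.;
  length display $ 11;
  if missing(date) then display = ' ';
  else if year(date) < 1975 then
    display = put(date, date7.);                  /* ddmonyy */
  else if year(date) > 1985 then
    display = catx(' ', put(day(date), z2.),      /* dd */
                        put(date, monname3.),     /* first three letters of the month name */
                        put(year(date), 4.));     /* ccyy */
  else display = 'Disco Years';
  datalines;
04JUL1969
15JUN1980
01JAN1990
;
run;

proc print data=dates noobs;
  format date date9.;
run;
```

The result is a character variable, so it is suitable for display but not for further date arithmetic.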

Any arithmetic operation involving a missing value results in a missing value. Most SAS statistical procedures exclude observations with any missing variable values from an analysis.

Declare an array for the variables in the record and then use a DO loop, or use PROC TRANSPOSE with a VAR statement.

The _ERROR_ variable has a value of 1 if there is an error in the data for that observation and 0 if there is not.

INTNX: INTNX function advances a date, time, or datetime value by a given interval, and returns a date, time, or datetime value.
Ex: INTNX(interval,start-from,number-of-increments,alignment)

INTCK: INTCK(interval,start-of-period,end-of-period) is an interval function that counts the number of intervals between two given SAS date, time, or datetime values.

DATETIME () returns the current date and time of day.

DATDIF (sdate,edate,basis): returns the number of days between two dates.
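The date functions above can be compared side by side in a small DATA step. The dates are arbitrary, and the 'same' alignment and 'act/act' basis are just one choice among the documented values.

```sas
data _null_;
  start = '15JAN2024'd;
  next_month = intnx('month', start, 1);            /* 01FEB2024: default alignment is the beginning of the interval */
  same_day   = intnx('month', start, 1, 'same');    /* 15FEB2024 */
  months     = intck('month', start, '01FEB2024'd); /* 1: one month boundary is crossed */
  days       = datdif(start, '31JAN2024'd, 'act/act');  /* 16 */
  now        = datetime();
  put next_month= date9. same_day= date9. months= days= now= datetime20.;
run;
```

Note that INTCK counts interval boundaries, so 15 January to 1 February counts as one month even though only 17 days have passed.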

When you submit a DATA step, SAS processes the DATA step in two phases and then creates a new SAS data set:
  • Compilation phase (creation of the input buffer and PDV)
  • Execution phase

If you don't use the OF keyword, the argument list might not be interpreted as you expect. For example, the function above calculates the sum of a1 minus a4 plus a6 and a9, and not the whole sum of a1 to a4, a6, and a9. The same is true for the MEAN function.
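The effect of leaving out OF, and the difference between SUM and the + operator when a value is missing, can be checked with a few invented values:

```sas
data _null_;
  a1 = 10; a2 = 1; a3 = 1; a4 = 4; a6 = 6; a9 = .;

  with_of    = sum(of a1-a4, a6, a9);   /* 10+1+1+4+6 = 22; the missing a9 is ignored */
  without_of = sum(a1-a4, a6, a9);      /* (10-4) + 6 = 12; a1-a4 is read as a subtraction */
  plus       = a1 + a2 + a3 + a4 + a6 + a9;   /* missing, because a9 is missing */

  put with_of= without_of= plus=;
run;
```

Both SUM calls run without error, which is why a forgotten OF is easy to miss.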

Use PROC SORT with the NODUPKEY option, because it eliminates observations with duplicate BY values.

Just use: missing_value=MISSING(field1);
This function takes a single argument and returns 1 if the value is missing or 0 if it is not. To check several fields at once, or if you need to know how many missing values you have, use num_missing=NMISS(field1,field2,field3); a result greater than 0 means at least one field is missing.
You can also find the number of non-missing values with non_missing=N(field1,field2,field3);

PROC MEANS by default gives you the output in the output window; you can stop this with the NOPRINT option and send the results to a separate data set with the OUTPUT OUT= statement. PROC SUMMARY doesn't give output by default; we have to explicitly give the OUTPUT statement, and give the PRINT option to see the printed result.
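The two defaults can be contrasted with a short sketch. It assumes the SASHELP.CLASS sample data set; the output data set name class_stats and the statistic names are arbitrary.

```sas
/* PROC MEANS: suppress the printed report and save the statistics instead */
proc means data=sashelp.class noprint;
  var height weight;
  output out=class_stats mean=mean_height mean_weight;
run;

/* PROC SUMMARY: prints nothing by default, so PRINT is requested explicitly */
proc summary data=sashelp.class print;
  var height weight;
run;
```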