Using the built-in function: setdiff(c(1, 3, 5, 7, 10), c(1, 5, 10, 11, 13))

Without using a built-in function: c(1, 3, 5, 7, 10)[!c(1, 3, 5, 7, 10) %in% c(1, 5, 10, 11, 13)]
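The two approaches can be compared side by side. This is a minimal sketch; the variable names a, b, with_builtin and without_builtin are introduced here only for illustration.

```r
a <- c(1, 3, 5, 7, 10)
b <- c(1, 5, 10, 11, 13)

# Built-in: elements of a that do not appear in b
with_builtin <- setdiff(a, b)

# Manual: keep the elements of a whose membership test against b is FALSE
without_builtin <- a[!a %in% b]

print(with_builtin)     # 3 7
print(without_builtin)  # 3 7
```

One difference worth knowing: setdiff() also drops duplicate values from its result, while the indexing form keeps them.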

Number.

The read.csv() function is used to read a .csv file in R.

**Below is a simple example –**

filecontent

print(filecontent)
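A self-contained way to try read.csv() is sketched below. The temporary file and its two sample rows are assumptions made up for the example, not part of the original answer.

```r
# Write a small CSV to a temporary file so the example needs no external data
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,age", "Ann,31", "Bob,45"), csv_path)

# Read the file into a data frame; the first line is used as the header
filecontent <- read.csv(csv_path)

print(filecontent)
```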

save(x, file = "x.Rdata")

mat<-matrix(rep(c(TRUE,FALSE),8),nrow=4)

sum(mat)

8

If an operation is performed on two vectors of different lengths, the elements of the shorter vector are reused to complete the operation. This is referred to as element recycling.

Example: if A <- c(1, 2, 0, 4) and B <- c(3, 6), then A * B gives (3, 12, 0, 24). Here 3 and 6 from vector B are repeated when computing the result.
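The recycling example can be run directly. The second part of the sketch, with the extra vector C, is an added illustration of what happens when the longer length is not a multiple of the shorter one.

```r
A <- c(1, 2, 0, 4)
B <- c(3, 6)

# B is recycled to 3, 6, 3, 6 before the element-wise multiplication
result <- A * B
print(result)  # 3 12 0 24

# If the longer length is not a multiple of the shorter one, R still
# recycles but raises a warning; it is suppressed here to keep output clean
C <- c(1, 2, 3)
partial <- suppressWarnings(C * B)
print(partial)  # 3 12 9
```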

A matrix of scatter plots can be produced using the pairs() function, which takes various parameters such as formula, data, subset and labels.

**The two key parameters required to build a scatter plot matrix are –**

**formula-** A formula such as ~a+b+c. Each term gives a separate variable in the pairs plot, and each term should be a numeric vector. It lists the variables used in pairs().

**data-** The dataset from which the variables are taken to build the scatter plot matrix.
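A minimal sketch of a pairs() call with both parameters follows. It assumes the built-in mtcars dataset, and the three columns are an arbitrary choice; the plot is written to a temporary PDF so the code also runs without a graphics display.

```r
# Open a PDF device backed by a temporary file
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)

# formula: the variables to plot against each other
# data: the data frame that holds those variables
pairs(~ mpg + disp + hp, data = mtcars, main = "Scatter plot matrix")

# Close the device so the file is written out
dev.off()
```

In an interactive session the pdf() and dev.off() lines can be dropped, and the plot appears in the graphics window.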

**Using the below line of code-**

data(package = .packages(all.available = TRUE))

A factor variable can be converted to numeric using the as.numeric() function in R. However, the variable first needs to be converted to character, because calling as.numeric() directly on a factor does not return the original values; it returns the internal integer codes of the factor levels.

X <- factor(c(4, 5, 6, 6, 4))

X1 = as.numeric(as.character(X))
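The difference between the two conversions is easy to see by running both. The names codes and values are introduced here for illustration.

```r
X <- factor(c(4, 5, 6, 6, 4))

# Direct conversion returns the internal level codes, not the data
codes <- as.numeric(X)
print(codes)   # 1 2 3 3 1

# Going through character first recovers the original numbers
values <- as.numeric(as.character(X))
print(values)  # 4 5 6 6 4
```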

mean_impute <- function(x) { x[is.na(x)] <- mean(x, na.rm = TRUE); x }
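A short usage sketch for the imputation function: it is named mean_impute here because an R identifier cannot contain a space, and the scores vector is made-up sample data.

```r
# Replace every NA in x with the mean of the non-missing values
mean_impute <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

scores <- c(10, NA, 20, 30, NA)
imputed <- mean_impute(scores)
print(imputed)  # 10 20 20 30 20
```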

- The sample() function can be used to select a random sample of size 'n' from a large dataset.
- The subset() function is used to select variables and observations from a given dataset.
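Both functions can be sketched on the built-in mtcars dataset. The seed, the sample size of 5 and the mpg > 25 filter are arbitrary choices made for this example.

```r
set.seed(42)  # arbitrary seed so the random draw is repeatable

# sample(): draw 5 row numbers at random, then pull those rows
picked_rows <- sample(nrow(mtcars), size = 5)
random_sample <- mtcars[picked_rows, ]

# subset(): keep observations with mpg above 25 and only two variables
efficient <- subset(mtcars, mpg > 25, select = c(mpg, cyl))

print(random_sample)
print(efficient)
```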

To comment out a line of code in R, begin the line with a hash symbol (#).

**R Commander can be used to import data in R. To start the R Commander GUI, the user loads the package by typing library(Rcmdr) into the console. There are 3 different ways in which data can be imported in R-**

- Users can select the data set in the dialog box or enter the name of the data set (if they know it).
- Data can also be entered directly using the editor of R Commander via Data->New Data Set. However, this works well when the data set is not too large.
- Data can also be imported from a URL or from a plain text file (ASCII), from any other statistical package or from the clipboard.

**There are various ways to do this**-

- It can be done using the match() function, which returns the position of the first occurrence of a particular element.
- The other is to use %in%, which returns a logical value, either TRUE or FALSE.
- The is.element() function also returns a logical value, either TRUE or FALSE, based on whether the element is present in a vector or not.
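The three options above can be tried on a small vector. The vector v and the result names are assumptions introduced for the example.

```r
v <- c(10, 20, 30, 20)

# match(): position of the first occurrence, or NA when absent
first_pos <- match(20, v)
missing_pos <- match(40, v)

# %in%: TRUE or FALSE for membership
found <- 20 %in% v

# is.element(): same membership test written as a function call
absent <- is.element(40, v)

print(first_pos)    # 2
print(missing_pos)  # NA
print(found)        # TRUE
print(absent)       # FALSE
```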

R code can be tested using Hadley’s testthat package.

Transpose, t(), is the easiest method for reshaping the data before analysis.
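A minimal illustration of t() on a small matrix follows; the matrix m is made up for the example.

```r
# A 2 x 3 matrix, filled column by column
m <- matrix(1:6, nrow = 2)

# t() swaps rows and columns, giving a 3 x 2 matrix
tm <- t(m)

print(dim(m))   # 2 3
print(dim(tm))  # 3 2
```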

CRAN package repository in R has more than 6000 packages, so a data scientist needs to follow a well-defined process and criteria to select the right one for a specific task. When looking for a package in the CRAN repository a data scientist should list out all the requirements and issues so that an ideal R package can address all those needs and issues.

The best way to answer this question is to look for an R package that follows good software development principles and practices. For example, you might want to look at the quality of its documentation and unit tests. The next step is to check how a particular R package is used and read the reviews posted by other users of the package. It is important to know whether other data scientists or data analysts have been able to solve a problem similar to yours. When in doubt about choosing a particular R package, I would always ask for feedback from R community members or other colleagues to ensure that I am making the right choice.

R language has Homogeneous and Heterogeneous data structures.

**Homogeneous data structures have the same type of objects –** Vector, Matrix and Array.

**Heterogeneous data structures have different types of objects –** Data frames and lists.

Scalars, Matrices and Vectors.

8TB is the memory limit for 64-bit system memory and 3GB is the limit for 32-bit system memory.

Using the loglm() function from the MASS package.

- Bucket Sort
- Selection Sort
- Quick Sort
- Bubble Sort
- Merge Sort

Emp_sal = 2000 + 2.5(emp_age)^2

Yes, it is a linear equation, because it is linear in its coefficients even though emp_age is squared.

unclass(as.Date("2016-10-05"))

boxplot() or text()
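To see what this expression returns, it can be evaluated and then converted back. A Date is stored internally as the number of days since 1970-01-01; the names days and round_trip are introduced for illustration.

```r
# Strip the Date class to expose the underlying day count
days <- unclass(as.Date("2016-10-05"))
print(days)  # 17079

# Converting the number back with the same origin recovers the date
round_trip <- as.Date(days, origin = "1970-01-01")
print(round_trip)
```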

character

It will generate 7 random numbers between 0 and 1.

The merge() function is used to combine two dataframes by identifying the rows or columns they have in common. By default, merge() finds the intersection between two different sets of data.

**The merge() function in R takes a long list of arguments, as follows –**

**Syntax for using the merge() function in R -**

merge(x, y, by.x, by.y, all.x or all.y or all)

x represents the first dataframe.

y represents the second dataframe.

**by.x-** Variable name in dataframe x that is common to y.

**by.y-** Variable name in dataframe y that is common to x.

**all.x -** A logical value that specifies the type of merge. all.x should be set to TRUE if we want all the observations from dataframe x. This results in a Left Join.

**all.y -** A logical value that specifies the type of merge. all.y should be set to TRUE if we want all the observations from dataframe y. This results in a Right Join.

**all –** The default value for this is FALSE, which means that only matching rows are returned, resulting in an Inner Join. This should be set to TRUE if you want all the observations from dataframes x and y, resulting in an Outer Join.
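The four join types can be sketched with two small dataframes. The employees and salaries tables, their column names and their values are all made up for this example.

```r
employees <- data.frame(emp_id = c(1, 2, 3), name = c("Ann", "Bob", "Cy"))
salaries  <- data.frame(id = c(2, 3, 4), salary = c(50, 60, 70))

# Inner join (default, all = FALSE): only ids present in both tables
inner <- merge(employees, salaries, by.x = "emp_id", by.y = "id")

# Left join: every row of employees, NA salary where there is no match
left <- merge(employees, salaries, by.x = "emp_id", by.y = "id", all.x = TRUE)

# Right join: every row of salaries, NA name where there is no match
right <- merge(employees, salaries, by.x = "emp_id", by.y = "id", all.y = TRUE)

# Outer join: every row from both tables
outer <- merge(employees, salaries, by.x = "emp_id", by.y = "id", all = TRUE)

print(nrow(inner))  # 2
print(nrow(left))   # 3
print(nrow(right))  # 3
print(nrow(outer))  # 4
```

The key column in the result takes its name from by.x, so it appears as emp_id in all four outputs.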