Slip 1
Q1. Write an R program to add, multiply and divide two vectors of integer type. (Vector length should be minimum 4.)
Answer:
vector1 = seq(10,40 , length.out=4)
vector2 = c(20, 10, 40, 40)
print("Original Vectors:")
print(vector1)
print(vector2)
add= vector1+vector2
cat("Sum of vector is ",add, "\n")
sub_vector= vector1-vector2
cat("Substraction of vector is ",sub_vector, "\n")
mul_vector= vector1 * vector2
cat("Multiplication of vector is ",mul_vector, "\n")
print("Division of two Vectors:")
"Division of two Vectors:"
div_vector = vector1 / vector2
print(div_vector)
Q2. Consider the student data set (it can be downloaded from the given link). Write a programme in Python to apply simple linear regression and find out mean absolute error, mean squared error and root mean squared error.
Answer:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Load the dataset
data = pd.read_csv('student_data.csv')

# Assuming you have two columns: 'independent_variable' and 'dependent_variable'
X = data['independent_variable'].values.reshape(-1, 1)
y = data['dependent_variable']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a linear regression model
model = LinearRegression()

# Fit the model on the training data
model.fit(X_train, y_train)

# Predict the values on the test data
y_pred = model.predict(X_test)

# Calculate Mean Absolute Error (MAE)
mae = mean_absolute_error(y_test, y_pred)

# Calculate Mean Squared Error (MSE)
mse = mean_squared_error(y_test, y_pred)

# Calculate Root Mean Squared Error (RMSE)
rmse = np.sqrt(mse)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)
print("Root Mean Squared Error:", rmse)
Slip 2
Q1. Write an R program to calculate the multiplication table using a function.
Answer:
table <- function(number)
{
  for (t in 1:10)
  {
    print(paste(number, 'x', t, '=', number * t))
  }
}
table(2)
Q2. Write a python program to implement k-means algorithms on a synthetic
dataset.
Answer:
from sklearn.cluster import KMeans
import numpy as np
import matplotlib.pyplot as plt

# Create a synthetic dataset
np.random.seed(0)
points = np.vstack(((np.random.randn(150, 2) * 0.75 + np.array([1, 0])),
                    (np.random.randn(50, 2) * 0.25 + np.array([-0.5, 0.5])),
                    (np.random.randn(50, 2) * 0.5 + np.array([-0.5, -0.5]))))

# Specify the number of clusters
kmeans = KMeans(n_clusters=3)

# Fit the model to the data
kmeans.fit(points)

# Get the cluster assignments for each data point
labels = kmeans.labels_

# Plot the data colored by the cluster assignments
plt.scatter(points[:, 0], points[:, 1], c=labels)
plt.show()
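If required, the fitted cluster centres can also be shown on the same plot; a minimal optional sketch (kmeans.cluster_centers_ holds one centre per cluster):

# Optional: overlay the cluster centres on the scatter plot
centers = kmeans.cluster_centers_
plt.scatter(points[:, 0], points[:, 1], c=labels)
plt.scatter(centers[:, 0], centers[:, 1], c='red', marker='x', s=100)
plt.show()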
Slip 3
Q1. Write an R program to reverse a number and also calculate the sum of digits of
that number.
Answer:
n=567
Reverse=function(n)
{
sum=0
rev=0
while(n>0)
{
r=n%%10
sum=sum+r
rev=rev*10+r
n=n%/%10
}
print(paste("Reversed number:", rev))
print(paste("Sum of digits:", sum))
}
Reverse(n)
Q2. Consider the following observations/data. Apply simple linear regression and find out the estimated coefficients b0 and b1. (Use the numpy package.)
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 13]
y = [1, 3, 2, 5, 7, 8, 8, 9, 10, 12, 16, 18]
Answer:
import numpy as np

# Given data
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 13])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12, 16, 18])

# Calculate the means of x and y
x_mean = np.mean(x)
y_mean = np.mean(y)

# Calculate the terms needed for the numerator and denominator of beta
xycov = (x * y).mean() - x_mean * y_mean
xvar = (x ** 2).mean() - x_mean ** 2

# Calculate beta (the slope b1)
beta = xycov / xvar

# Calculate alpha (the intercept b0)
alpha = y_mean - (beta * x_mean)
print(f'The estimated coefficients are b0 = {alpha} and b1 = {beta}')
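As an optional sanity check (not part of the slip), np.polyfit with degree 1 fits the same straight line and returns the slope followed by the intercept, which should agree with b1 and b0:

# Optional cross-check of the hand-computed coefficients using numpy
b1_check, b0_check = np.polyfit(x, y, 1)
print(f'polyfit check: b0 = {b0_check} and b1 = {b1_check}')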
Slip 4
Q1. Write an R program to calculate the sum of two matrices of given size.
Answer:
matrix1<-matrix(c(1,2,3,4,5,6),nrow=2)
print(matrix1)
matrix2<-matrix(c(7,8,9,10,11,12),nrow=2)
print(matrix2)
result<-matrix1+matrix2
cat("Addition : ","\n")
print(result)
Q2. Consider following dataset
weather = ['Sunny','Sunny','Overcast','Rainy','Rainy','Rainy','Overcast','Sunny','Sunny','Rainy','Sunny','Overcast','Overcast','Rainy']
temp = ['Hot','Hot','Hot','Mild','Cool','Cool','Cool','Mild','Cool','Mild','Mild','Mild','Hot','Mild']
play = ['No','No','Yes','Yes','Yes','No','Yes','No','Yes','Yes','Yes','Yes','Yes','No']
Use Naïve Bayes algorithm to predict [0: Overcast, 2: Mild] tuple belongs to
which class whether to
play the sports or not.
Answer:
from sklearn import preprocessing
from sklearn.naive_bayes import GaussianNB

# Given data
weather = ['Sunny','Sunny','Overcast','Rainy','Rainy','Rainy','Overcast','Sunny','Sunny','Rainy','Sunny','Overcast','Overcast','Rainy']
temp = ['Hot','Hot','Hot','Mild','Cool','Cool','Cool','Mild','Cool','Mild','Mild','Mild','Hot','Mild']
play = ['No','No','Yes','Yes','Yes','No','Yes','No','Yes','Yes','Yes','Yes','Yes','No']

# Creating labelEncoder
le = preprocessing.LabelEncoder()

# Converting string labels into numbers
weather_encoded = le.fit_transform(weather)
temp_encoded = le.fit_transform(temp)
label = le.fit_transform(play)

# Combining weather and temp into a single list of tuples
features = list(zip(weather_encoded, temp_encoded))

# Create a Gaussian Classifier
model = GaussianNB()

# Train the model using the training sets
model.fit(features, label)

# Predict Output
predicted = model.predict([[0, 2]])  # 0: Overcast, 2: Mild
print(f"The predicted value is: {'Yes' if predicted==1 else 'No'}")
Slip 5
Q1. Write an R program to concatenate two given factors.
Answer:
f1 <- factor(sample(LETTERS, size=6, replace=TRUE))
f2 <- factor(sample(LETTERS, size=6, replace=TRUE))
print("Original factors:")
print(f1)
print(f2)
f = factor(c(levels(f1)[f1], levels(f2)[f2]))
print("After concatenate factor becomes:")
print(f)
Q2. Write a Python program to build a Decision Tree Classifier using the scikit-learn package for the diabetes data set (download the database from the given link).
Answer:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load the dataset
data = pd.read_csv('diabetes.csv')

# Split the dataset into features and target variable
X = data.drop('Outcome', axis=1)
y = data['Outcome']

# Split the dataset into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# Create a Decision Tree Classifier object
clf = DecisionTreeClassifier()

# Train Decision Tree Classifier
clf = clf.fit(X_train, y_train)

# Predict the response for test dataset
y_pred = clf.predict(X_test)

# Model Accuracy
print("Accuracy:", accuracy_score(y_test, y_pred))
Slip 6
Q1. Write an R program to create a data frame using two given vectors and display
the duplicate
elements.
Answer:
companies <- data.frame(Shares = c("TCS", "Reliance", "HDFC Bank", "Infosys",
"Reliance"),
Price = c(3200, 1900, 1500, 2200, 1900))
companies
cat("After removing Duplicates ", "\n")
companies[duplicated(companies),]
Q2. Write a python program to implement hierarchical Agglomerative clustering
algorithm.
(Download Customer.csv dataset from github.com).
Answer:
import pandas as pd
from sklearn.cluster import AgglomerativeClustering
import matplotlib.pyplot as plt

# Load the dataset
data = pd.read_csv('Customer.csv')

# Specify the number of clusters
cluster = AgglomerativeClustering(n_clusters=5, affinity='euclidean', linkage='ward')

# Fit the model to the data
cluster.fit_predict(data)

# Plot the clusters
plt.scatter(data.iloc[:, 0], data.iloc[:, 1], c=cluster.labels_, cmap='rainbow')
plt.show()
Slip 7
Q1. Write an R program to create a sequence of numbers from 20 to 50 and find the
mean of
numbers from 20 to 60 and sum of numbers from 51 to 91.
Answer:
print("Sequence of numbers from 20 to 50:")
print(seq(20,50))
print("Mean of numbers from 20 to 60:")
print(mean(20:60))
print("Sum of numbers from 51 to 91:")
print(sum(51:91))
Q2. Consider the following observations/data. Apply simple linear regression and find out the estimated coefficients b0 and b1. Also analyse the performance of the model. (Use the sklearn package.)
x = np.array([1,2,3,4,5,6,7,8])
y = np.array([7,14,15,18,19,21,26,23])
Answer:
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats
x = np.array([1,2,3,4,5,6,7,8])
y = np.array([7,14,15,18,19,21,26,23])
slope, intercept, r, p, std_err = stats.linregress(x, y)
def myfunc(x):
    return slope * x + intercept
mymodel = list(map(myfunc, x))
plt.scatter(x, y)
plt.plot(x, mymodel)
plt.show()
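The slip also asks for the estimated coefficients and an assessment of the model; linregress already returns them, so printing the intercept (b0), slope (b1) and the coefficient of determination completes the answer:

# Report the estimated coefficients and goodness of fit
print("Estimated coefficients: b0 (intercept) =", intercept, ", b1 (slope) =", slope)
print("R-squared:", r ** 2)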
Slip 8
Q1. Write an R program to get the first 10 Fibonacci numbers.
Answer:
Fibonacci <- numeric(10)
Fibonacci[1] <- Fibonacci[2] <- 1
for (i in 3:10) Fibonacci[i] <- Fibonacci[i - 2] + Fibonacci[i - 1]
print("First 10 Fibonacci numbers:")
print(Fibonacci)
Q2. Write a python program to implement the k-means algorithm to build a prediction model. (Use the Credit Card dataset; download it from the given link.)
Answer:
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Load the dataset
data = pd.read_csv('CC GENERAL.csv')

# Preprocess the data
data = data.drop('CUST_ID', axis=1)
data.fillna(method='ffill', inplace=True)

# Standardize the data
scaler = StandardScaler()
data = scaler.fit_transform(data)

# Specify the number of clusters
kmeans = KMeans(n_clusters=5)

# Fit the model to the data
kmeans.fit(data)

# Get the cluster assignments for each data point
labels = kmeans.labels_
print(f'The cluster assignments for the data points are: {labels}')
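Since the slip asks for a prediction model, the fitted KMeans object can also assign new (scaled) observations to clusters via its predict method; a small illustrative call on the first few rows (an assumption, since no separate test data is given):

# Optional: use the fitted model to predict cluster membership for scaled samples
print('Predicted clusters for the first 5 rows:', kmeans.predict(data[:5]))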
Slip 9
Q1. Write an R program to create a Data frames which contain details of 5
employees and display
summary of the data.
Answer:
Employees = data.frame(Name = c("Alexa", "OK Google", "Siri", "Shreyas", "Bilimosa"),
                       Gender = c("M", "M", "F", "M", "F"),
                       Age = c(23, 22, 25, 20, 32),
                       Designation = c("Clerk", "Manager", "Executive", "CEO", "ASSISTANT"),
                       SSN = c("123-34-2346", "123-44-779", "556-24-433", "123-98-987", "679-77-576"))
print("Details of the employees:")
print(Employees)
print("Summary of the data:")
print(summary(Employees))
Q2. Write a Python program to build an SVM model on the Cancer dataset. The dataset is available in the scikit-learn library. Check the accuracy of the model with precision and recall.
Answer:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn import svm
from sklearn import metrics

# Load dataset
cancer = datasets.load_breast_cancer()

# Split dataset into training set and test set
X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target, test_size=0.3, random_state=109)

# Create a svm Classifier
clf = svm.SVC(kernel='linear')

# Train the model using the training sets
clf.fit(X_train, y_train)

# Predict the response for test dataset
y_pred = clf.predict(X_test)

# Model Accuracy
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))

# Model Precision
print("Precision:", metrics.precision_score(y_test, y_pred))

# Model Recall
print("Recall:",metrics.recall_score(y_test, y_pred))
Slip 10
Q1. Write an R program to find the maximum and the minimum value of a given
vector.
Answer:
nums = c(10, 20, 30, 40, 50, 60)
print('Original vector:')
print(nums)
print(paste("Maximum value of the said vector:",max(nums)))
print(paste("Minimum value of the said vector:",min(nums)))
Q2. Write a Python programme to read the given dataset (download it from the given link) and apply the Apriori algorithm.
Answer:
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori

# Load the dataset
data = pd.read_csv('Iris.csv')

# Preprocess the data
data = data.astype(str)
dataset = data.values.tolist()

# Apply the transaction encoder
te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)
df = pd.DataFrame(te_ary, columns=te.columns_)

# Apply the Apriori algorithm
frequent_itemsets = apriori(df, min_support=0.6, use_colnames=True)
print(frequent_itemsets)
Slip 11
Q1. Write an R program to find all elements of a given list that are not in another given list.
A = list("x", "y", "z")
B = list("X", "Y", "Z", "x", "y", "z")
Answer:
l1 = list("x", "y", "z")
l2 = list("X", "Y", "Z", "x", "y", "z")
print("Original lists:")
print(l1)
print(l2)
print("All elements of l2 that are not in l1:")
setdiff(l2, l1)
Q2. Write a python program to implement a hierarchical clustering algorithm. (Download the Wholesale customers data dataset from the given link.)
Answer:

import pandas as pd
from sklearn.cluster import AgglomerativeClustering
import matplotlib.pyplot as plt

# Load the dataset
data = pd.read_csv('Wholesale customers data.csv')

# Preprocess the data
data = data.drop('Channel', axis=1)
data = data.drop('Region', axis=1)

# Specify the number of clusters
cluster = AgglomerativeClustering(n_clusters=5, affinity='euclidean', linkage='ward')

# Fit the model to the data
cluster.fit_predict(data)

# Plot the clusters
plt.scatter(data.iloc[:, 0], data.iloc[:, 1], c=cluster.labels_, cmap='rainbow')
plt.show()
Slip 12
Q1. Write an R program to create a data frame which contains details of 5 employees and display the details.
Employee contains (empno, empname, gender, age, designation)
Answer:
Employees = data.frame(empno=c(1,2,3,4,5),
empname=c("Amit S","Dikish R","Shweta J", "Jikita A","Riya M"),
Gender=c("M","M","F","F","F"),
Age=c(23,22,25,26,32),
Designation=c("Clerk","Manager","Exective","CEO","ASSISTANT"))
print("Details of the employees:")
print(Employees)
Q2. Write a python program to implement a multiple Linear Regression model for a car dataset. The dataset can be downloaded from the given link.
Answer:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn import metrics
import numpy as np

# Load the dataset
data = pd.read_csv('cars.csv')

# Split the dataset into features and target variable
X = data[['Mileage', 'Age']]
y = data['Sell Price']

# Split the dataset into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Create a linear regression model
model = LinearRegression()

# Train the model using the training sets
model.fit(X_train, y_train)

# Predict the response for test dataset
y_pred = model.predict(X_test)

# Model error metrics
print('Mean Absolute Error:', metrics.mean_absolute_error(y_test, y_pred))
print('Mean Squared Error:', metrics.mean_squared_error(y_test, y_pred))
print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
Slip 13
Q1. Draw a pie chart using R programming for the following data distribution:
Answer:

# Create data for the graph
digits <- c(7, 2, 6, 3, 4, 8)
Frequency <- c(1, 2, 3, 4, 5, 6)

# Plot the chart
pie(digits, Frequency)
Q2. Write a Python program to read “StudentsPerformance.csv” file. Solve
following:

  • To display the shape of dataset.
  • To display the top rows of the dataset with their columns. Note: Download
    dataset from following
    link :

Answer:
import pandas as pd

# Load the dataset
data = pd.read_csv('StudentsPerformance.csv')

# Display the shape of the dataset
print('Shape of the dataset:', data.shape)

# Display the top rows of the dataset with their columns
print(data.head())
Slip 14
Q1. Write a script in R to create a list of employees (name) and perform the
following:
a. Display names of employees in the list.
b. Add an employee at the end of the list
c. Remove the third element of the list
Answer:
list_data <- list("Ram Sharma","Sham Varma","Raj Jadhav", "Ved Sharma")
print(list_data)
new_Emp <-"Kavya Anjali"
list_data <-append(list_data,new_Emp)
print(list_data)
list_data[3] <- NULL
print(list_data)
Q2. Write a Python programme to apply the Apriori algorithm on the Groceries dataset. The dataset can be downloaded from the given link. Also display support and confidence for each rule.
Answer:
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori

# Load the dataset
data = pd.read_csv('Groceries_dataset.csv')

# Preprocess the data
data = data.groupby(['Member_number', 'Date'])['itemDescription'].apply(list).reset_index()
dataset = data['itemDescription'].tolist()

# Apply the transaction encoder
te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)
df = pd.DataFrame(te_ary, columns=te.columns_)

# Apply the Apriori algorithm
frequent_itemsets = apriori(df, min_support=0.01, use_colnames=True)
print(frequent_itemsets)
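The slip also asks for the support and confidence of each rule; mlxtend's association_rules derives these from the frequent itemsets (the 0.2 confidence threshold here is an assumed example value):

# Generate association rules and display their support and confidence
from mlxtend.frequent_patterns import association_rules
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.2)
print(rules[['antecedents', 'consequents', 'support', 'confidence']])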
Slip 15
Q1. Write an R program to add, multiply and divide two vectors of integer type. (Vector length should be minimum 4.)
Answer:
vector1 = seq(10,40 , length.out=4)
vector2 = c(20, 10, 40, 40)
print("Original Vectors:")
print(vector1)
print(vector2)
add= vector1+vector2
cat("Sum of vector is ",add, "\n")
sub_vector= vector1-vector2
cat("Substraction of vector is ",sub_vector, "\n")
mul_vector= vector1 * vector2
cat("Multiplication of vector is ",mul_vector, "\n")
print("Division of two Vectors:")
div_vector = vector1 / vector2
print(div_vector)
Q2. Write a Python program to build a Decision Tree Classifier (reading the data with pandas) and predict the class label for a show starring a 40-year-old American comedian with 10 years of experience and a comedy ranking of 7. Create a CSV file as shown in the given link.
Answer:
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Load the dataset
data = pd.read_csv('shows.csv')

# Split the dataset into features and target variable
X = data[['Age', 'Experience', 'Rank']]
y = data['Nationality']

# Create a Decision Tree Classifier object
clf = DecisionTreeClassifier()

# Train the classifier on the full dataset
clf = clf.fit(X, y)

# Predict the class label for a show starring a 40-year-old American comedian,
# with 10 years of experience, and a comedy ranking of 7
prediction = clf.predict([[40, 10, 7]])
print(f'The predicted class label is: {prediction[0]}')
OR (alternative answer, using the diabetes data set):
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load the dataset
data = pd.read_csv('diabetes.csv')

# Split the dataset into features and target variable
X = data.drop('Outcome', axis=1)
y = data['Outcome']

# Split the dataset into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# Create a Decision Tree Classifier object
clf = DecisionTreeClassifier()

# Train Decision Tree Classifier
clf = clf.fit(X_train, y_train)

# Predict the response for test dataset
y_pred = clf.predict(X_test)

# Model Accuracy
print("Accuracy:", accuracy_score(y_test, y_pred))
Slip 16
Q1. Write an R program to create a simple bar plot of given data.
Answer:

# Import lattice
library(lattice)

# Create data
gfg <- data.frame(x = c(26, 35, 32, 40, 35, 50),
                  grp = rep(c("group 1", "group 2", "group 3"), each = 2),
                  subgroup = LETTERS[1:2])

# Create grouped barplot using lattice
barchart(x ~ grp, data = gfg, groups = subgroup)
Q2. Write a Python program to build a Decision Tree Classifier using the scikit-learn package for the diabetes data set (download the database from the given link).
Answer:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load the dataset
data = pd.read_csv('diabetes.csv')

# Split the dataset into features and target variable
X = data.drop('Outcome', axis=1)
y = data['Outcome']

# Split the dataset into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# Create a Decision Tree Classifier object
clf = DecisionTreeClassifier()

# Train Decision Tree Classifier
clf = clf.fit(X_train, y_train)

# Predict the response for test dataset
y_pred = clf.predict(X_test)

# Model Accuracy
print("Accuracy:", accuracy_score(y_test, y_pred))
Slip 17
Q1. Write an R program to get the first 20 Fibonacci numbers.
Answer:
Fibonacci <- numeric(20)
Fibonacci[1] <- Fibonacci[2] <- 1
for (i in 3:20) Fibonacci[i] <- Fibonacci[i - 2] + Fibonacci[i - 1]
print("First 20 Fibonacci numbers:")
print(Fibonacci)
Q2. Write a python programme to implement a multiple linear regression model for the stock market data frame as follows:
Stock_Market = {'Year': [2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016],
'Month': [12,11,10,9,8,7,6,5,4,3,2,1,12,11,10,9,8,7,6,5,4,3,2,1],
'Interest_Rate': [2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75],
'Unemployment_Rate': [5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1],
'Stock_Index_Price': [1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1047,965,943,958,971,949,884,866,876,822,704,719]}
And draw a graph of stock market price versus interest rate.
Answer:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Provided dataset
Stock_Market = {
    'Year': [2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016],
    'Month': [12,11,10,9,8,7,6,5,4,3,2,1,12,11,10,9,8,7,6,5,4,3,2,1],
    'Interest_Rate': [2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75],
    'Unemployment_Rate': [5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1],
    'Stock_Index_Price': [1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1047,965,943,958,971,949,884,866,876,822,704,719]
}

# Creating a DataFrame from the dictionary
df = pd.DataFrame(Stock_Market)

# Selecting columns for regression
X = df[['Interest_Rate', 'Unemployment_Rate']]
y = df['Stock_Index_Price']

# Creating and fitting the regression model
model = LinearRegression()
model.fit(X, y)

# Predicting values
predicted_stock_price = model.predict(X)

# Plotting Stock Market Price versus Interest Rate
plt.scatter(df['Interest_Rate'], df['Stock_Index_Price'], color='blue', label='Actual')
plt.plot(df['Interest_Rate'], predicted_stock_price, color='red', label='Predicted')
plt.title('Stock Market Price vs Interest Rate')
plt.xlabel('Interest Rate')
plt.ylabel('Stock Market Price')
plt.legend()
plt.show()
Slip 18
Q1. Write an R program to find the maximum and the minimum value of a given vector.
Answer:
nums = c(10, 20, 30, 40, 50, 60)
print('Original vector:')
print(nums)
print(paste("Maximum value of the said vector:",max(nums)))
print(paste("Minimum value of the said vector:",min(nums)))
Q2. Consider the following observations/data. Apply simple linear regression and find out the estimated coefficients b0 and b1. Also analyse the performance of the model. (Use the sklearn package.)
x = np.array([1,2,3,4,5,6,7,8])
y = np.array([7,14,15,18,19,21,26,23])
Answer:
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Given data
x = np.array([1, 2, 3, 4, 5, 6, 7, 8]).reshape(-1, 1)
y = np.array([7, 14, 15, 18, 19, 21, 26, 23])

# Creating a linear regression model
model = LinearRegression()

# Fitting the model
model.fit(x, y)

# Estimated coefficients
b0 = model.intercept_
b1 = model.coef_[0]
print(f"Coefficient b0 (intercept): {b0}")
print(f"Coefficient b1: {b1}")

# Predicting y values based on the model
y_pred = model.predict(x)

# Model evaluation
mse = mean_squared_error(y, y_pred)
r_squared = r2_score(y, y_pred)
print(f"Mean Squared Error: {mse}")
print(f"R-squared: {r_squared}")
Slip 19
Q1. Write an R program to create a data frame which contains details of 5 students and display the details.
Students contain (Rollno, Studname, Address, Marks)
Answer:
Students = data.frame(Rollno = c(21, 22, 23, 24, 25),
                      Name = c("Riya M", "Shweta J", "Aarya D", "JAMES A", "LAURA M"),
                      Address = c("Bhekrai nagar", "Hadapsar", "Uruli kanchan", "Hadapsar", "Bhekrai nagar"),
                      Marks = c(80, 67, 90, 92, 70))
print("Details of the Students:")
print(Students)
Q2. Write a python program to implement a multiple Linear Regression model for a car dataset. The dataset can be downloaded from the given link.
Answer:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Load the dataset using pandas
# Replace 'path_to_your_file.csv' with the actual path to your downloaded CSV file
data = pd.read_csv('path_to_your_file.csv')

# Assuming your dataset has columns like 'X1', 'X2', 'X3', and 'Y'
# Replace these column names with the actual columns in your dataset
X = data[['X1', 'X2', 'X3']]  # Features
y = data['Y']  # Target variable

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a linear regression model
model = LinearRegression()

# Fit the model using the training data
model.fit(X_train, y_train)

# Make predictions using the testing set
y_pred = model.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r_squared = r2_score(y_test, y_pred)
print(f"Mean Squared Error: {mse}")
print(f"R-squared: {r_squared}")
Slip 20
Q1. Write an R program to create a data frame from four given vectors.
Answer:
name = c('Aarya', 'Riya', 'Shweta', 'Anjali', 'Geeta', 'Mayuri', 'Kirti',
'Akansha', 'Kavita', 'Jagruti')
score = c(12.5, 9, 16.5, 12, 9, 20, 14.5, 13.5, 8, 19)
attempts = c(1, 3, 2, 3, 2, 3, 1, 1, 2, 1)
qualify = c('yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes')
print("Original data frame:")
print(name)
print(score)
print(attempts)
print(qualify)
df = data.frame(name, score, attempts, qualify)
print(df)
Q2. Write a python program to implement the hierarchical Agglomerative clustering algorithm. (Download the dataset from the given link.)
Answer:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import AgglomerativeClustering
from scipy.cluster.hierarchy import dendrogram

# Load the dataset using pandas
# Replace 'Customer.csv' with the actual path to your downloaded CSV file
data = pd.read_csv('Customer.csv')

# Display the first few rows to understand the structure of the data
print(data.head())

# Assuming you want to use 'Age', 'Annual Income (k$)', and 'Spending Score (1-100)' for clustering
X = data[['Age', 'Annual Income (k$)', 'Spending Score (1-100)']]

# Perform hierarchical agglomerative clustering
model = AgglomerativeClustering(n_clusters=5, affinity='euclidean', linkage='ward')
clusters = model.fit_predict(X)

# Visualize the dendrogram
from scipy.cluster import hierarchy
Z = hierarchy.linkage(X, 'ward')
plt.figure(figsize=(12, 6))
dendrogram(Z)
plt.title('Dendrogram')
plt.xlabel('Samples')
plt.ylabel('Distance')
plt.show()

# Add the cluster labels to the original dataset
data['Cluster'] = clusters

# Display the dataset with assigned cluster labels
print(data.head())