DS16

Q. 1) Write an Ajax program to get book details from an XML file when the user selects a book name. Create an XML
file for storing the details of each book (title, author, year, price). [Marks 15]

<?xml version="1.0" encoding="UTF-8"?>
<books>
  <book>
    <title>Book 1</title>
    <author>Author 1</author>
    <year>2000</year>
    <price>20</price>
  </book>
  <book>
    <title>Book 2</title>
    <author>Author 2</author>
    <year>2005</year>
    <price>25</price>
  </book>
  <book>
    <title>Book 3</title>
    <author>Author 3</author>
    <year>2010</year>
    <price>30</price>
  </book>
</books>
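The Ajax handler itself would be written in JavaScript and is not shown here. As a quick check of the XML file, the lookup such a handler performs (find the `<book>` whose `<title>` matches the selected name and read its fields) can be sketched in Python with the standard library. The function name `get_book_details` and the inlined `BOOKS_XML` string are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Same structure as the books XML above, inlined so the sketch is self-contained.
# The XML declaration is omitted because the data is already a Python string.
BOOKS_XML = """<books>
  <book>
    <title>Book 1</title>
    <author>Author 1</author>
    <year>2000</year>
    <price>20</price>
  </book>
  <book>
    <title>Book 2</title>
    <author>Author 2</author>
    <year>2005</year>
    <price>25</price>
  </book>
  <book>
    <title>Book 3</title>
    <author>Author 3</author>
    <year>2010</year>
    <price>30</price>
  </book>
</books>"""


def get_book_details(xml_text, title):
    """Return the fields of the book with the given title, or None if absent."""
    root = ET.fromstring(xml_text)
    for book in root.findall("book"):
        if book.findtext("title") == title:
            # All values are returned as strings, exactly as stored in the XML
            return {
                field: book.findtext(field)
                for field in ("title", "author", "year", "price")
            }
    return None


details = get_book_details(BOOKS_XML, "Book 2")
print(details)
```

An Ajax version would run the same search in the browser over the parsed response, triggered when the selection changes.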

Q. 2) Consider any text paragraph. Preprocess the text to remove any special characters and digits.
Generate the summary using the extractive summarization process.
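import re
import nltk
from nltk.tokenize import sent_tokenize
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# One-time download of the NLTK data used below (recent NLTK releases may also need "punkt_tab")
nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)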

# Sample text paragraph

text = """
Extractive summarization is a text summarization technique

"""

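# Preprocess the text to remove special characters and digits
# (sentence-ending punctuation is kept so that sent_tokenize can still split sentences)

processed_text = re.sub(r'\d+', ' ', text) # Remove digits
processed_text = re.sub(r'[^A-Za-z\s.!?]', ' ', processed_text) # Remove special characters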

# Tokenize the text into sentences

sentences = sent_tokenize(processed_text)

# Remove stopwords word by word; the cleaned copies are used only for scoring

stop_words = set(stopwords.words("english"))
filtered_sentences = [
    " ".join(word for word in sentence.split() if word.strip(".!?").lower() not in stop_words)
    for sentence in sentences
]

# Calculate an importance score for each sentence using CountVectorizer and cosine similarity

vectors = CountVectorizer().fit_transform(filtered_sentences)
cosine_matrix = cosine_similarity(vectors, vectors)
sentence_scores = cosine_matrix.sum(axis=1) # Total similarity of each sentence to all sentences

# Generate the summary by selecting the top-ranked sentences

summary_length = 3 # Number of sentences in the summary
summary_indices = sentence_scores.argsort()[::-1][:summary_length] # Indices of top-ranked sentences
summary = [sentences[i] for i in sorted(summary_indices)] # Keep the original sentence order

# Print the summary

print("\n".join(summary))
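To see how the ranking step behaves without NLTK or scikit-learn installed, the same idea (score each sentence by its total cosine similarity to every sentence, then keep the top few in their original order) can be sketched with the standard library alone. The three sample sentences and the helper names `bag_of_words`, `cosine`, and `rank_sentences` are illustrative assumptions, and stopword removal is left out to keep the sketch short.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    """Count lowercase alphabetic words; digits and special characters are ignored."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_sentences(sentences, summary_length):
    """Return the highest-scoring sentences, kept in their original order."""
    vectors = [bag_of_words(sentence) for sentence in sentences]
    # A sentence's score is its summed similarity to all sentences (itself included)
    scores = [sum(cosine(v, other) for other in vectors) for v in vectors]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(top[:summary_length])]


# Hypothetical sample sentences: the first two share words, the third shares none
sample = [
    "Cats chase mice",
    "Dogs chase cats",
    "Birds sing songs",
]
summary = rank_sentences(sample, 2)
print("\n".join(summary))
```

The first two sentences overlap in vocabulary and so score higher than the third, which is why a two-sentence summary keeps them and drops the unrelated one.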