DS16
Q. 1) Write an Ajax program to fetch book details from an XML file when the user selects a book name. Create an XML
file that stores the details of each book (title, author, year, price). [Marks 15]
Q. 2) Consider any text paragraph. Preprocess the text to remove any special characters and digits.
Generate the summary using an extractive summarization process.
import re
from nltk.tokenize import sent_tokenize
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Needs the NLTK "stopwords" and "punkt" data (newer NLTK releases use "punkt_tab"),
# installed once with nltk.download(...)

# Sample text paragraph
text = """
Extractive summarization is a text summarization technique
"""

# Preprocess the text to remove special characters and digits.
# Sentence-ending punctuation (. ! ?) is kept so the text can still be split into sentences.
processed_text = re.sub(r"\d+", " ", text)  # Remove digits
processed_text = re.sub(r"[^A-Za-z\s.!?]", " ", processed_text)  # Remove special characters
processed_text = re.sub(r"\s+", " ", processed_text).strip()  # Collapse extra whitespace

# Tokenize the text into sentences
sentences = sent_tokenize(processed_text)

# Remove stopwords from each sentence (the filtered copies are used only for scoring)
stop_words = set(stopwords.words("english"))
filtered_sentences = [
    " ".join(word for word in sentence.lower().split() if word.strip(".!?") not in stop_words)
    for sentence in sentences
]

# Calculate importance score for each sentence using CountVectorizer and cosine similarity
vectors = CountVectorizer().fit_transform(filtered_sentences)
cosine_matrix = cosine_similarity(vectors, vectors)
sentence_scores = cosine_matrix.sum(axis=1)  # Total similarity of each sentence to all sentences

# Generate summary by selecting top-ranked sentences
summary_length = 3  # Number of sentences in the summary
summary_indices = sentence_scores.argsort()[::-1][:summary_length]  # Indices of top-ranked sentences
summary = [sentences[i] for i in sorted(summary_indices)]  # Keep the original sentence order

# Print the summary
print("\n".join(summary))
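The scoring idea used for Q. 2 (rank each sentence by how similar it is to the rest of the text) can also be traced by hand on a small input. The block below is a minimal sketch using only the standard library. It makes several assumptions that are not part of the original answer: a tiny hand-picked stopword list instead of NLTK's, a naive regex sentence split instead of `sent_tokenize`, and the hypothetical helper names `cosine` and `summarize`.

```python
import math
import re
from collections import Counter
from typing import List

# Tiny illustrative stopword list (an assumption; NLTK's English list is much larger)
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "it", "on", "at"}


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summarize(text: str, summary_length: int = 2) -> List[str]:
    """Return the top-scoring sentences in their original order (hypothetical helper)."""
    # Remove digits and special characters, keeping sentence-ending punctuation
    cleaned = re.sub(r"[^A-Za-z\s.!?]", " ", text)
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    # Naive sentence split: break on whitespace that follows ., ! or ?
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", cleaned) if s.strip()]
    # One bag of words per sentence, with stopwords removed
    bags = [
        Counter(w for w in re.findall(r"[a-z]+", s.lower()) if w not in STOP_WORDS)
        for s in sentences
    ]
    # Score of a sentence = summed cosine similarity to every other sentence
    scores = [
        sum(cosine(bags[i], bags[j]) for j in range(len(bags)) if j != i)
        for i in range(len(bags))
    ]
    # Pick the highest-scoring indices, then restore document order
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:summary_length]
    return [sentences[i] for i in sorted(top)]


sample = (
    "Cats sleep a lot. Cats and dogs are pets. Dogs bark at cats! "
    "In 2023 the weather was sunny."
)
print("\n".join(summarize(sample)))
```

On this sample the second and third sentences are selected: they share the words "cats" and "dogs" with each other and "cats" with the first sentence, while the weather sentence shares no words with the others and scores zero.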