"""
from sklearn import feature_extraction, metrics
from pyxll import xl_func, xl_return
from typing import List, Any
import pandas as pd


@xl_func(name="NLP.VLOOKUP")
@xl_return("dataframe<index=False, columns=False>")
def nlp_vlookup(value: str,
                table: List[List[Any]],
                col_index: int = None,
                include_score: bool = False,
                all_matches: bool = False,
                threshold: float = 0.5) -> pd.DataFrame:
    """
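    Do a 'VLOOKUP' for value in table and return the value at col_index
    of the best-matching row, where rows are matched by the cosine similarity
    of their bag-of-words vectors.
    :param value: Value to match.
    :param table: Table of input values with the candidates in the leftmost column.
    :param col_index: 1-based index of the column to return, as in Excel's VLOOKUP.
        If omitted, all columns are returned.
    :param include_score: If True, include the match score as the first column.
    :param all_matches: If True, return every match sorted by score (best first);
        otherwise return only the best match.
    :param threshold: Exclude matches with a score below this threshold (0-1).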
    """
    # Get the first column in the table as the list of words to match.
    words = [str(x[0]) for x in table]

    # Vectorize all the strings by creating a Bag-of-Words matrix: the vectorizer
    # extracts the vocabulary from the corpus and counts how many times each word
    # appears in each string. Row 0 is the lookup value; rows 1..n are the candidates.
    vectorizer = feature_extraction.text.CountVectorizer()
    vectors = vectorizer.fit_transform([value] + words)

    # Then calculate the cosine similarity, a measure based on the angle between two
    # non-zero vectors: it equals the inner product of the two vectors after each has
    # been normalized to length 1. Only the lookup value needs to be compared with the
    # candidates, so there is no need to build the full pairwise matrix.
    cosine_sim = metrics.pairwise.cosine_similarity(vectors[0:1], vectors[1:])

    # Get the score for each candidate word.
    scores = cosine_sim[0]
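    # Put the input table into a DataFrame and add the scores as a new column.
    # Assigning the column by position (rather than joining on the word index)
    # keeps every row paired with its own score even if the same word appears
    # more than once in the first column.
    table_df = pd.DataFrame(table, index=words)
    df = table_df.assign(score=scores)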

    # Filter out any with a score below the threshold.
    df = df[df["score"] >= threshold]

    # If there are no matches then raise an exception.
    if not len(df.index):
        raise ValueError("No matches found")

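    # Sort by score, best first. A stable sort keeps tied rows in their original
    # table order, so the first of several equally good matches wins, as in VLOOKUP.
    df = df.sort_values(by="score", ascending=False, kind="mergesort")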

    # Get the top result if not returning all matches.
    if not all_matches:
        df = df.head(1)

    # Reindex to get only the columns we're interested in and put the score first.
    columns = table_df.columns.to_list() if col_index is None else [col_index-1]
    if include_score:
        columns = ["score"] + columns
    df = df.reindex(columns=columns)

    return df


if __name__ == "__main__":
    words = [
        ["Bridgewater Associates", "United States Westport, CT", "$98,918"],
        ["Renaissance Technologies", "United States East Setauket, NY", "$70,000"],
        ["Man Group", "United Kingdom London", "$62,300"],
        ["Millennium Management", "United States New York City, NY", "$43,912"],
        ["Elliott Management", "United States New York City, NY", "$42,000"],
        ["BlackRock", "United States New York City, NY", "$39,907"],
        ["Two Sigma Investments", "United States New York City, NY", "$38,842"],
        ["The Children's Investment Fund Management", "United Kingdom London", "$35,000"],
        ["Citadel LLC", "United States Chicago, IL", "$34,340"],
        ["D.E. Shaw & Co.", "United States New York City, NY", "$34,264"],
        ["AQR Capital Management", "United States Greenwich, CT", "$32,100"],
        ["Davidson Kempner Capital Management", "United States New York City, NY", "$31,850"],
        ["Farallon Capital", "United States San Francisco, CA", "$30,000"],
        ["Baupost Group", "United States Boston, MA", "$29,100"],
        ["Marshall Wace", "United Kingdom London", "$27,800"],
        ["Capula Investment Management", "United Kingdom London", "$23,000"],
        ["Canyon Capital Advisors", "United States Los Angeles, CA", "$22,800"],
        ["Wellington Management Company", "United States Boston, MA", "$21,000"],
        ["Viking Global Investors", "United States Greenwich, CT", "$19,950"],
        ["PIMCO", "United States Newport Beach, CA", "$17,453"],
    ]

    match = nlp_vlookup("Capital", words, include_score=True, all_matches=True, threshold=0.5)
    print(match)

created 3 years ago by Kieu Nguyen
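The comments in `nlp_vlookup` describe cosine similarity over bag-of-words counts. The sketch below recomputes that score by hand with only the standard library, so the numbers can be checked without Excel or scikit-learn. It is not the script's own code. It assumes the tokenizer mimics CountVectorizer's defaults (lowercasing, and the token pattern `(?u)\b\w\w+\b`, which keeps words of two or more characters). The helper names `tokenize` and `bow_cosine` are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    # Assumption: mimics CountVectorizer's defaults by lowercasing the text and
    # keeping tokens of two or more word characters.
    return re.findall(r"(?u)\b\w\w+\b", text.lower())


def bow_cosine(a: str, b: str) -> float:
    # Bag-of-words count vectors for both strings.
    counts_a, counts_b = Counter(tokenize(a)), Counter(tokenize(b))
    # Inner product: only words present in both strings contribute.
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a.keys() & counts_b.keys())
    # Euclidean lengths of the two count vectors.
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    # Treat an empty (all-zero) vector as having similarity 0.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


for name in ["Farallon Capital", "AQR Capital Management",
             "Davidson Kempner Capital Management", "Man Group"]:
    print(f"{name}: {bow_cosine('Capital', name):.3f}")
```

Suppose a one-word lookup value appears once in a candidate made of k distinct words, each appearing once. The score is then 1/sqrt(k). With a threshold of 0.5, a lookup of "Capital" can therefore only match names of up to four such words. "Davidson Kempner Capital Management" scores exactly 0.5, so it still passes the `>=` filter.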

Python Online Compiler

Write, run and share Python code online for free using OneCompiler's Python online compiler. It is a robust, feature-rich online compiler for the Python language and supports both Python 3 and Python 2.7. Getting started with OneCompiler's Python editor is quick. When you choose Python or Python2 as the language, the editor shows sample boilerplate code and you can start coding.

Taking inputs (stdin)

OneCompiler's Python online editor supports stdin: you can pass input to your program through the STDIN textbox under the I/O tab. The following sample program reads a name from the input and prints a greeting:

import sys

name = sys.stdin.readline().strip()  # strip() removes the trailing newline
print("Hello " + name)
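`sys.stdin.readline()` reads only the first line of input. To greet every name entered in the STDIN box, one option is to iterate over the input stream line by line. The sketch below takes the stream as a parameter so it can be tried with an in-memory `io.StringIO`; you would pass `sys.stdin` to read real input. The function name `greet_all` is hypothetical.

```python
import io
from typing import List, TextIO


def greet_all(stream: TextIO) -> List[str]:
    greetings = []
    for line in stream:          # a text stream yields one line at a time
        name = line.strip()      # drop the trailing newline and surrounding spaces
        if name:                 # skip blank lines
            greetings.append("Hello " + name)
    return greetings


# Demo with an in-memory stream; pass sys.stdin instead to read real input.
for greeting in greet_all(io.StringIO("Ada\n\nGuido\n")):
    print(greeting)
```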

About Python

Python is a very popular general-purpose programming language created by Guido van Rossum and first released in 1991. It is widely used for web development, and you can build almost anything with it, including mobile apps, web apps, tools, data analytics and machine learning applications. It is designed to be simple and readable, close to plain English, which makes programmers highly productive.

Tutorial & Syntax help

Conditionals and Loops

1. If-Else:

Whenever you want to perform a set of operations based on a condition, use an if-elif-else statement. The elif and else branches are optional.

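if conditional_expression:
    ...  # runs if conditional_expression is true
elif other_conditional_expression:
    ...  # runs if the first condition is false and this one is true
else:
    ...  # runs if none of the conditions above is true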

Note:

Indentation is significant in Python: it defines which statements belong to each block, so indent every block consistently (four spaces is the convention).
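The following is a runnable version of the template above. It uses a hypothetical `grade` function with illustrative cut-offs. Python checks the conditions from top to bottom and runs only the first branch whose condition is true.

```python
def grade(score: int) -> str:
    # Only the first true branch runs: 95 satisfies both score >= 90 and
    # score >= 80, but it gets "A" because that condition is checked first.
    if score >= 90:
        return "A"
    elif score >= 80:
        return "B"
    else:
        return "C"


for score in (95, 85, 40):
    print(score, grade(score))
```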

2. For:

A for loop iterates over the items of a sequence (a list, tuple or string) or any other iterable, such as a set or a dictionary.

Example:

mylist = ["iPhone", "Pixel", "Samsung"]
for item in mylist:
    print(item)
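Because `for` works on any iterable, the same syntax covers strings and dictionaries. The sketch below also uses the built-ins `enumerate` (for a running position) and `dict.items` (for key-value pairs). The stock counts are made-up illustration data.

```python
phones = ["iPhone", "Pixel", "Samsung"]

# enumerate() yields (index, item) pairs; start=1 numbers from 1 instead of 0.
for position, phone in enumerate(phones, start=1):
    print(position, phone)

# Iterating over a string yields one character at a time.
letters = []
for ch in "Pixel":
    letters.append(ch)
print(letters)

# Iterating over a dictionary yields its keys; .items() yields (key, value) pairs.
stock = {"iPhone": 3, "Pixel": 5}  # illustrative counts
total = 0
for model, count in stock.items():
    print(model, count)
    total += count
print("total:", total)
```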

3. While:

A while loop repeats a block of statements as long as a condition is true. It is usually preferred when the number of iterations is not known in advance.

while condition:
    ...  # repeated while condition is true
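The example below is a runnable case where the number of iterations is not known in advance. It counts how many times a number can be halved (with integer division) before it drops to 1. The function name `halvings` is hypothetical.

```python
def halvings(n: int) -> int:
    steps = 0
    while n > 1:     # the condition is re-checked before every iteration
        n //= 2      # integer division; the loop ends once n is 1 or less
        steps += 1
    return steps


print(halvings(8))     # 8 -> 4 -> 2 -> 1, so 3
print(halvings(1000))
```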

Collections

Python has four commonly used built-in collection types.

1. List:

A list is an ordered collection that can be changed (it is mutable). Lists are written with square brackets.

Example:

mylist = ["iPhone", "Pixel", "Samsung"]
print(mylist)
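The following sketch shows what "can be changed" means in practice. It replaces an item by index, appends a new item and removes one by value, all in place. The phone names are only example data.

```python
mylist = ["iPhone", "Pixel", "Samsung"]

mylist[1] = "OnePlus"      # replace an item by index (lists are mutable)
mylist.append("Nokia")     # add an item to the end
mylist.remove("Samsung")   # remove the first item equal to "Samsung"
print(mylist)              # ['iPhone', 'OnePlus', 'Nokia']
```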

2. Tuple:

A tuple is an ordered collection that cannot be changed (it is immutable). Tuples are written with round brackets.

Example:

myTuple = ("iPhone", "Pixel", "Samsung")
print(myTuple)

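The following code raises a TypeError, because you cannot assign a new value to an item of a tuple:

myTuple = ("iPhone", "Pixel", "Samsung")
print(myTuple)
myTuple[1] = "onePlus"  # TypeError: 'tuple' object does not support item assignment
print(myTuple)          # never reached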
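An uncaught TypeError stops the program, so one way to demonstrate the error safely is to catch it. The sketch below also shows a common workaround: build a new tuple from slices of the old one instead of modifying it.

```python
myTuple = ("iPhone", "Pixel", "Samsung")

try:
    myTuple[1] = "onePlus"
except TypeError as err:
    message = str(err)
    print("Error:", message)

# The original tuple is untouched; slicing and + produce a brand-new tuple.
newTuple = myTuple[:1] + ("onePlus",) + myTuple[2:]
print(newTuple)
```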

3. Set:

A set is an unordered collection of unique items that does not support indexing. Sets are written with curly brackets.

Example:

myset = {"iPhone", "Pixel", "Samsung"}
print(myset)  # the printed order may differ from the order written
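The main practical uses of a set follow from its uniqueness. Duplicates are dropped automatically, membership tests are fast, and sets support mathematical operations such as intersection and union. Note that `{}` creates an empty dictionary, so an empty set must be written as `set()`.

```python
myset = {"iPhone", "Pixel", "Samsung", "Pixel"}  # the duplicate "Pixel" is dropped
print(len(myset))          # 3
print("Pixel" in myset)    # True

others = {"Pixel", "OnePlus"}
both = myset & others      # intersection: items in both sets
either = myset | others    # union: items in either set

empty = set()              # {} would create an empty dict, not a set
print(both, either, empty)
```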

4. Dictionary:

A dictionary is a mutable collection of key-value pairs whose values are looked up by key. Since Python 3.7, dictionaries preserve insertion order. They are written with curly brackets containing key: value pairs.

Example:

mydict = {
    "brand": "Apple",
    "model": "iPhone 11"
}
print(mydict)
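The sketch below shows the everyday dictionary operations: looking up a value by key, using `.get()` to supply a default for a missing key, and adding or updating entries. It also iterates in insertion order. The "color" key and the "iPhone 12" update are only illustrative.

```python
mydict = {"brand": "Apple", "model": "iPhone 11"}

print(mydict["model"])                  # look up a value by its key
color = mydict.get("color", "unknown")  # .get() returns a default for a missing key
print(color)

mydict["color"] = "black"      # add a new key-value pair
mydict["model"] = "iPhone 12"  # update the value of an existing key

# Keys come back in insertion order (guaranteed since Python 3.7).
for key, value in mydict.items():
    print(key, value)
```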

Supported Libraries

The following libraries are supported by OneCompiler's Python compiler:

Name	Description
NumPy	The NumPy library makes it easy to work with N-dimensional arrays
SciPy	SciPy is a scientific computation library that builds on NumPy's fast N-dimensional arrays
SKLearn/Scikit-learn	Scikit-learn (sklearn) is a widely used machine learning library for Python
Pandas	Pandas is a widely used Python library for data manipulation and analysis
DOcplex	DOcplex (IBM Decision Optimization CPLEX Modeling for Python) is a library for Mathematical Programming Modeling and Constraint Programming Modeling