Using Databricks and Pinecone to create and index vector embeddings at scale
withColumn adds a column to the dataframe, containing a simple increasing identifier that you cast to a string.
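As a minimal sketch of that step, the helper below wraps the withColumn call. It assumes the identifier is generated with PySpark's monotonically_increasing_id; the helper name add_string_id and the column name "id" are hypothetical, chosen only for illustration.

```python
def add_string_id(df, id_col="id"):
    """Return df with an extra string column holding an increasing identifier.

    Assumption: the identifier comes from monotonically_increasing_id, which
    yields unique, increasing 64-bit integers that are not necessarily
    consecutive across partitions.
    """
    # Imported lazily so this module still loads where PySpark is absent.
    from pyspark.sql.functions import monotonically_increasing_id

    # Cast to string because the identifier is later used as a string key.
    return df.withColumn(id_col, monotonically_increasing_id().cast("string"))
```

On a running cluster you might call it as df = add_string_id(df) and then inspect the result with df.select("id").show(5).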
The mapPartitions function provides finer control over the execution of the UDF by explicitly applying it to each partition of the RDD.
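The per-partition pattern could look roughly like the sketch below. Everything named here is an assumption for illustration: embed_texts is a toy stand-in for a real embedding model, and the "id" and "text" column names are hypothetical. The point is only that the function passed to mapPartitions receives an iterator over one partition's rows, so any expensive setup or batching happens once per partition rather than once per row.

```python
from typing import Iterable, Iterator, List, Tuple


def embed_texts(texts: List[str]) -> List[List[float]]:
    """Hypothetical stand-in for a real embedding model (toy 2-d vectors)."""
    return [[float(len(t)), float(sum(map(ord, t)) % 97)] for t in texts]


def embed_partition(
    rows: Iterable[Tuple[str, str]],
) -> Iterator[Tuple[str, List[float]]]:
    """Called once per partition with an iterator of (id, text) pairs."""
    rows = list(rows)  # materialise only this partition's rows
    ids = [row_id for row_id, _ in rows]
    # One batched call per partition instead of one call per row.
    vectors = embed_texts([text for _, text in rows])
    yield from zip(ids, vectors)


def embed_dataframe(df):
    """Apply embed_partition to each partition of the DataFrame's RDD.

    Assumes df has string columns named "id" and "text".
    """
    pairs = df.rdd.map(lambda row: (row["id"], row["text"]))
    return pairs.mapPartitions(embed_partition)
```

Because embed_partition is a plain generator over tuples, it can be checked locally without a cluster, for example list(embed_partition([("0", "hello")])), before being handed to Spark.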