Document has a ton of useful information, but depending on which Loader you choose, you may have to clean your data. In this case, you need to remove things like remaining \n characters and broken, hyphenated words (e.g., algo-\nrithms → algorithms).
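The cleanup described above can be sketched with two regular expressions. This is a minimal illustration, not the tutorial's own cleaning code: the function name clean_text and the sample string are hypothetical, and it assumes the text arrives as a plain Python string.

```python
import re


def clean_text(text: str) -> str:
    """Repair line-break artifacts left behind by a document loader."""
    # Rejoin words hyphenated across a line break: "algo-\nrithms" -> "algorithms".
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", text)
    # Replace every remaining newline (and the whitespace around it) with one space.
    text = re.sub(r"\s*\n\s*", " ", text)
    # Collapse any runs of spaces produced by the steps above.
    text = re.sub(r" {2,}", " ", text)
    return text.strip()


# Hypothetical sample input, for illustration only.
sample = "Sorting algo-\nrithms are\nwidely studied."
print(clean_text(sample))
```

Note that the first pattern also merges genuine compounds that happen to break at a hyphen (for example, a line ending in "state-of-" followed by "the-art"), so you may want to spot-check the output on your own documents.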
pipeline and run it.
.describe_index_stats():
SemanticSplitterNodeParser split your list of Documents into 46 Nodes.
vector_index object by calling vector_index.as_query_engine().query('some query'), but then you wouldn’t be able to specify the number of Pinecone search results you’d like to use as context.
To control how many search results your RAG app uses from your Pinecone index, you will instead create your Query Engine using the RetrieverQueryEngine class. This class allows you to pass in the retriever created above, which you configured to retrieve the top 5 search results.
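To make "retrieve the top 5 search results" concrete, here is a library-free sketch of what a top-k retriever does conceptually: score every stored vector against the query and keep the k best matches. This is not how LlamaIndex or Pinecone implement retrieval; the names cosine_similarity, retrieve_top_k, and toy_index are hypothetical, and the two-dimensional vectors are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve_top_k(query_vec, index, k=5):
    """Return the ids of the k stored vectors most similar to query_vec."""
    ranked = sorted(
        index.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,  # highest similarity first
    )
    return [node_id for node_id, _ in ranked[:k]]


# Hypothetical toy index: node id -> embedding (real embeddings have many more dimensions).
toy_index = {
    "n1": [1.0, 0.0],
    "n2": [0.9, 0.1],
    "n3": [0.7, 0.3],
    "n4": [0.5, 0.5],
    "n5": [0.3, 0.7],
    "n6": [0.0, 1.0],
}

# With k=5, the least similar node ("n6") is left out of the context.
print(retrieve_top_k([1.0, 0.0], toy_index, k=5))
```

A larger k gives the language model more context to draw on but also more tokens to process, which is why it helps to set this value explicitly rather than rely on a default.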
.source_nodes attribute. Let’s inspect the first Node:
.passing attribute.
Let’s see what happens when we send a totally out-of-scope query through your RAG app. Issue a random query you know your RAG app won’t be able to answer, given what’s in your index: