PDF / web page / document
↓
DocumentLoader
↓
TextSplitter
↓
Embeddings (OpenAI, etc.)
↓
VectorStore (FAISS, etc.)
↓
.as_retriever()
↓
Used by RetrievalQA / ConversationalRetrievalChain
🧪 Sample code that runs the whole pipeline end to end (a typical RAG setup):
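# Import paths as in older LangChain releases; newer releases expose these
# classes from langchain_community, langchain_text_splitters and langchain_openai.
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS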
# 1. Load the document
loader = PyPDFLoader("10-K.pdf")
docs = loader.load()
# 2. Split into chunks
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(docs)
# 3. Embed the chunks and build the vector store
embedding = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(chunks, embedding)
# 4. Build the retriever
retriever = vectorstore.as_retriever()
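
To see what the split / embed / store / retrieve stages do without an API key, here is a toy, dependency-free sketch. It is not how LangChain implements them. The names split_text, embed, cosine and ToyVectorStore are hypothetical. The word-count "embedding" stands in for a real model such as OpenAIEmbeddings, the linear scan stands in for a FAISS index, and the fixed-size window is a simplification of RecursiveCharacterTextSplitter, which prefers to break on separators.

```python
import math
import re
from collections import Counter


def split_text(text, chunk_size=500, chunk_overlap=50):
    """Cut text into fixed-size character windows that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def embed(text):
    """Toy embedding (assumption): lowercase word counts instead of a dense vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Keeps (chunk, vector) pairs and ranks them by similarity to a query."""

    def __init__(self, chunks):
        self.entries = [(chunk, embed(chunk)) for chunk in chunks]

    def search(self, query, k=4):
        query_vec = embed(query)
        ranked = sorted(self.entries,
                        key=lambda entry: cosine(query_vec, entry[1]),
                        reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


# Hand-written chunks so the ranking is easy to follow.
sample_chunks = [
    "Revenue grew twelve percent.",
    "Risk factors include supply chain disruption.",
    "The company opened three offices.",
]
store = ToyVectorStore(sample_chunks)
print(store.search("risk factors", k=1))
# → ['Risk factors include supply chain disruption.']
```

The retriever from step 4 plays the same role as search here: it takes a query string and returns the chunks that score highest, and a chain such as RetrievalQA then hands those chunks to the LLM as context.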