LangChain: load multiple PDFs

 
Compute the embeddings with LangChain's OpenAIEmbeddings wrapper.

I have installed langchain (multiple times), pyPDF and streamlit. In this video, we will look at how to build a system that lets us both summarize and chat with PDF documents using the LangChain library and the OpenAI API. This example goes over how to load data from JSONLines or JSONL files. This property contains the JSON. Index and store the vector embeddings in Pinecone. If you are not familiar with LangChain, check out my previous blog post and video. There are reasonable limits to concurrent requests, defaulting to 2 per second. Unstructured currently supports loading of text files, PowerPoints, HTML, PDFs, images, and more. Click on New Token. But if I use it for a second PDF (that is, I change the file path to another PDF), it still puts out the summary for the first PDF, as if the embeddings from the first PDF or previous round are somehow stored and not deleted. This video will guide you through step-by-step process about how c. xpath: XPath inside the XML representation of the document, for the chunk. Pinecone is a vectorstore for storing embeddings and your PDF in text to later retrieve similar. To load and extract data from files using LangChain, you can follow these steps. The pymilvus and milvus libraries are for our vector database and python. By default we combine those together, but you can easily keep that separation by specifying mode="elements". When I use the following code, which summarizes long PDFs, it works fine for the first PDF. Next, move the documents for training inside the "docs" folder. We use vector similarity search to find the chunks needed to answer our question. Chroma is a vectorstore for storing. LangChain provides modular components and off-the-shelf chains for working with language models, as well as integrations with other tools and platforms. First, you need to load your document into LangChain’s `Document` class.
Step 5: Retrieve Data from the Vector Database. data can include many things, including: Unstructured data (e. pdf", {// you may need to add `. Start by installing LangChain and some dependencies we’ll need for the rest of the tutorial: pip install langchain==0. So let's load the API key from a file: Create a directory called. result = pdf_qa({"question": query, "chat_history": ""}) print(result["answer"]) This behavior holds true even when restarting Python, or when I try a number of other PDFs. By leveraging this API and using LangChain & LlamaIndex, developers can integrate the power of these models into their own applications, products, or services. We can convert a PDF file containing tabular data directly to a CSV file using the convert_into() method in the tabula library. To be able to look up our document splits, we first need to store them where we can later look them up. One document will be created for each row in the CSV file. LangChain is a framework that makes it easier to build scalable AI/LLM apps and chatbots. load_hidden - Whether to load hidden files. OpenAI’s API provides access to some of the most advanced language models available today. Suppose we want to summarize a blog post. We’ll work with three example papers and cover the following steps: Set up and dependencies; Setting up the large language model (LLM); Summarizing PDFs. def main(): load_dotenv() st. Overview of the Flan-T5 Model. In this example, we can actually re-use our chain for combining our docs to also. Tech stack used includes LangChain, Pinecone, TypeScript, OpenAI, and Next. This example goes over how to load data from CSV files. langchain/ document_loaders/ fs/ text. You can update the second parameter here in the similarity_search. Navigate to the directory where your chatbot file is located.
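The "one document per CSV row" behavior mentioned above can be illustrated with the standard library alone. This is a minimal sketch, not LangChain's CSV loader: the function name `csv_rows_to_documents`, the dict layout, and the `source` value are all assumptions made for illustration.

```python
import csv
import io


def csv_rows_to_documents(csv_text, source="example.csv"):
    """Return one document dict per CSV row (hypothetical helper).

    The header row supplies the field names; each data row becomes a
    block of "column: value" lines plus a little metadata.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    documents = []
    for row_number, row in enumerate(reader):
        content = "\n".join(f"{key}: {value}" for key, value in row.items())
        documents.append(
            {
                "page_content": content,
                "metadata": {"source": source, "row": row_number},
            }
        )
    return documents


sample = "title,pages\nAnnual report,12\nUser guide,40\n"
docs = csv_rows_to_documents(sample)
print(len(docs))
```

Keeping the row number in the metadata makes it possible to trace an answer back to the row it came from.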
Private Chatbot with Local LLM (Falcon 7B) and LangChain; Private GPT4All: Chat with PDF Files; 🔒 CryptoGPT: Crypto Twitter Sentiment Analysis; 🔒 Fine-Tuning LLM on Custom Dataset with QLoRA; 🔒 Deploy LLM to Production; 🔒 Support Chatbot using Custom Knowledge; 🔒 Chat with Multiple PDFs using Llama 2 and LangChain. split_documents (documents). from_loaders (loaders) from the langchain package, where loaders is a list of UnstructuredPDFLoader instances, each intended to load a different PDF file. What you will need: be registered in Hugging Face website (https://huggingface. The second argument is the column name to extract from the CSV file. In our chat functionality, we will use Langchain to split the PDF text into smaller chunks, convert the chunks into embeddings using OpenAIEmbeddings, and create a knowledge base using F. This can be useful for distilling long documents into the core pieces of information. By default we use the pdfjs build bundled with pdf-parse, which is compatible with most environments, including Node. Step 2. 55 requests openai transformers faiss-cpu. load (inp) And finally define your build_retrieval_qa () as follows:. Step 1: Installing Required Libraries. Document loaders make it easy to load data into documents, while text splitters break down long pieces of text into smaller chunks for better processing. To be able to look up our document splits, we first need to store them where we can later look them up. indexes import VectorstoreIndexCreator. Loading PDF data into Langchain : To Use or Not to Use Unstructured. js library. It provides indices over structured and unstructured data, helping to abstract away the differences across data sources. Conveniently, LangChain has utilities just for this purpose. CharacterTextSplitter from langchain. Import Dependencies. Here, we are using a very simple TextLoader, which reads a single file. 4: Fetching Numerical Embeddings for the Text. 
Next, we will initialize the Embedding and the Chroma database. qa = ConversationalRetrievalChain. Langchain gpt-3. chat_models import ChatOpenAI from langchain. We use LangChain's PyPDFLoader to load the document and split it into individual pages. Create a LangChain pipeline using the language model and. Picture feeding a PDF or maybe multiple PDF files to a machine and then asking it questions about those files. concatenate_pages - If True, concatenate all PDF pages into a single document. streamlit at the root of your app. Writes a pickle file with the questions and answers about a candidate. from PyPDF2 import PdfReader from langchain. PDF Loading: The app reads multiple . class langchain. We will then put the data loading logic in LangChain, put the prompts in LangChainHub, and put the examples in the LangChain documentation to make it as easy as possible for others to get started. langchain/ chat_models/ openai. and thus giving the result for only that pdf. pdf") pages = loader. Windows/Linux: Papercrop is a free, simple utility that automatically restructures PDF files to fit more comfortably on small smartphone and eBook reader screens. One of the most common ways to store and search over unstructured data is to embed it and store the resulting embedding vectors, and then at query time to embed the unstructured query and retrieve the embedding vectors that are 'most similar' to the embedded query. Here the LLM is AzureOpenAI and the vector store is Pinecone, used with the LangChain framework. Check out the document loader integrations here to browse the set. The Chat with Multiple PDF Files App is a Python application that allows you to chat with multiple PDF documents.
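The embed-then-retrieve idea described above (embed the stored text, embed the query, return the "most similar" entries) can be sketched without any external service. The example below is an illustration under loud assumptions: `embed` is a toy bag-of-words counter standing in for a real embedding model such as OpenAIEmbeddings, and `most_similar` is a hypothetical helper, not a vector store API.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: word counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, chunks, k=2):
    """Return the k chunks whose vectors are closest to the query vector."""
    query_vec = embed(query)
    # A real vector store would embed the chunks once and keep the vectors;
    # re-embedding on every query keeps this sketch short.
    scored = [(cosine_similarity(query_vec, embed(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]


chunks = [
    "solar panels convert sunlight into electricity",
    "the invoice is due at the end of the month",
    "wind turbines generate electricity from wind",
]
top = most_similar("how do solar panels make electricity", chunks, k=1)
print(top)
```

Swapping `embed` for a learned embedding model changes the quality of the match, not the shape of the procedure.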
Let's start by building a function that will tell you where the page breaks need to be. One possible option would be to use os. from langchain. The third step is to load PDF files from a directory using the PyPDFDirectoryLoader class, which extracts text from PDF documents and returns it in a list of tuples (file name, text extracted from. from langchain. persist () Now, after storing the data, I want to get a list of all the documents and. Load PDF files The next section of code allows the user to upload a PDF. Running LLMs like GPT with your own data allows you to quickly build personalized applications. First, let’s get all our imports set up and set an environment variable to contain our OpenAI key. Query the papers using LangChain. when I use the following code - which summarizes long PDFs -, it works fine for the first PDF. Our step-by-step guide. It uses the getDocument function from the PDF. run ingest will automatically ingest all directories and all PDF files in those directories, and will create namespaces which match the subdirectory name. We'll use the LangChain library to create a chain that can retrieve relevant documents and answer questions from them. A LLMChain is the most common type of chain. The second argument is a map of file extensions to loader factories. The steps we need to take include: Use LangChain to upload and preprocess multiple documents. Next, we add the OpenAI api key and load the documents present in the data folder. One example of this is a text splitter that splits a large document into many smaller. Microsoft PowerPoint. endswith (". The PDF document is split into individual pages using the PagedPDFSplitter class. If you use "elements" mode, the unstructured library will split the document into elements such as Title and NarrativeText. Unstructured data can be loaded from many sources. Finally, it uses the OutputParser (if provided) to parse the output of the LLM. 
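Two ideas from the paragraph above, listing a directory with `os.listdir` and an `endswith(".pdf")` filter, and passing a map of file extensions to loader factories, can be sketched with the standard library. The helper names `find_files`, `load_directory`, and `read_text_file` are hypothetical; a real setup would plug in a PDF-capable loader where this sketch uses a plain text reader.

```python
import os
import tempfile


def read_text_file(path):
    """Simplest possible loader: return the file contents as a string."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()


def find_files(directory, extension=".pdf"):
    """Return sorted paths of files in `directory` ending with `extension`."""
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.lower().endswith(extension)
    )


def load_directory(directory, loaders):
    """Apply loaders[extension] to every matching file; skip unknown types."""
    documents = []
    for name in sorted(os.listdir(directory)):
        extension = os.path.splitext(name)[1].lower()
        loader = loaders.get(extension)
        if loader is None:
            continue  # no loader registered for this extension
        path = os.path.join(directory, name)
        documents.append({"source": path, "text": loader(path)})
    return documents


# Small demonstration on a throwaway folder.
with tempfile.TemporaryDirectory() as folder:
    for name, text in [("a.txt", "hello"), ("b.pdf", ""), ("notes.md", "skip me")]:
        with open(os.path.join(folder, name), "w", encoding="utf-8") as handle:
            handle.write(text)
    pdf_names = [os.path.basename(path) for path in find_files(folder)]
    loaded = load_directory(folder, {".txt": read_text_file})

print(pdf_names, len(loaded))
```

Registering loaders by extension keeps the directory walk independent of how any single file type is parsed.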
In the context of building LLM-related applications, chunking is the process of breaking down large pieces of text into smaller segments. gpt4all_path = 'path to your llm bin file'. If you still need an answer, you must convert the blob data into BytesIO object, and save it locally (whether temporarily or forever) before processing the files. The user can then switch between topics on the home page. When there are multiple ways to solve a single challenge, then choosing the solution with least cost and time pays off. Load file. If there is, it loads the documents. Next, move the documents for training inside the “docs” folder. ChatGPT For Your DATA | Chat with Multiple Documents Using LangChainIn this video, I will show you, how you can chat with any document. convert_into ("pdf_file_name", "Name_of_csv_file. The GUI supports all common features of the command line tool in a comfortable way. Use a pre-trained sentence-transformers model to embed each chunk. The web pages are then automatically scraped and de-HTMLized. This object is pretty simple and consists of (1) the text itself, (2) any metadata associated with that text (where it came from, etc). To use paper-qa, you need to have a list of paths (valid extensions include:. Chroma from langchain. Step 3. Having looked through the langchain website, I haven't found a tutorial for multiple documents. Langchain Chatbot for Multiple PDFs: Harnessing GPT and Free Huggingface LLM. Langchain Chatbot for Multiple PDFs: Harnessing GPT and Free Huggingface LLM Alternatives. Chatting with Multiple PDFs at once. from langchain. Use Pythons PyPDF2 library to extract text. LangChain - Prompt Templates (what all the best prompt engineers use) by Nick Daigler. A lazy loader for Documents. Also presented with a drop down for PDF analytics. Step 5: Retrieve Data from the Vector Database. This example goes over how to load data from folders with multiple files. 
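Chunking as described above, breaking a large text into smaller segments, can be shown with fixed-size character windows. This is a deliberately simple sketch: `split_text` is a hypothetical helper, and it does not reproduce the separator-based behavior of LangChain's text splitters. The `chunk_size` and `chunk_overlap` parameter names are borrowed only because they appear elsewhere in this page.

```python
def split_text(text, chunk_size=800, chunk_overlap=0):
    """Split `text` into windows of `chunk_size` characters.

    Consecutive windows share `chunk_overlap` characters so that a sentence
    cut at a boundary still appears whole in one of the two chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


chunks = split_text("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)
```

A larger overlap costs more storage and embedding calls but reduces the chance that an answer is split across two chunks.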
If you have a mix of text files, PDF documents, HTML web pages, etc, you can use the document loaders in Langchain. Eagerly parse the blob into a document or documents. However, extracting information from PDFs can be a challenging task for developers. Chains are an important feature of LangChain enable users to combine multiple components together to create a single, coherent application. The third step is to load PDF files from a directory using the PyPDFDirectoryLoader class, which extracts text from PDF documents and returns it in a list of tuples (file name, text extracted from. Conveniently, LangChain has utilities just for this purpose. This covers how to use WebBaseLoader to load all text from HTML webpages into a document format that we can use downstream. Therefore, it is neccessary to split them up into smaller chunks. Background & Problem Statment. One way is to input multiple smaller documents, after they have been divided into chunks, and operate over them with a MapReduceDocumentsChain. This covers how to use the DirectoryLoader to load all documents in a directory. By default we use the pdfjs build bundled with pdf-parse, which is compatible with most environments, including Node. Instead, they shift the load to a subsequent model once the current one hits its limit. , SQL); Code (e. This sections shows results of using the refine Chain to do question answering with sources. Unstructured File. OpenAI recently announced GPT-4 (it's most powerful AI) that can process up to 25,000 words - about eight times as many as GPT-3 - process images and handle much more. A model can read PDF file and I can then ask him questions about specific PDF file. Ask your question. In this example we will see some strategies that can be useful when loading a large list of arbitrary files from a directory using the TextLoader class. 
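The map-reduce strategy mentioned above (process each chunk separately, then combine the partial results) has a simple shape that can be shown without an LLM. In this sketch `first_sentence` is a stand-in for a per-chunk summarization call and `" ".join` is a stand-in for the combine step; both are assumptions for illustration and say nothing about how MapReduceDocumentsChain is implemented.

```python
def first_sentence(text):
    """Stand-in 'summarizer': keep everything up to the first period."""
    head, _, _ = text.partition(".")
    return head.strip() + "."


def map_reduce(chunks, map_fn, reduce_fn):
    """Apply map_fn to every chunk, then combine the results with reduce_fn."""
    partial_results = [map_fn(chunk) for chunk in chunks]  # map step
    return reduce_fn(partial_results)                      # reduce step


chunks = [
    "Solar output rose. Details follow.",
    "Wind output fell. More details.",
]
summary = map_reduce(chunks, first_sentence, " ".join)
print(summary)
```

Because each chunk is handled on its own, the map step can run in parallel and no single call has to fit the whole document.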
It appears that when working with PDF documents, there's a consistent issue with splitting at page breaks taking precedence over separators, especially when the chunk size exceeds the page length. Again, because this tutorial is focused on text data, the common format will be a LangChain Document object. Next, we need data to build our chatbot. openai import OpenAIEmbeddings from langchain. Now, we will use Langchain's PDFLoader to preprocess and load our PDF into text. PDF Loading: The app reads multiple PDF documents and extracts their text content. Callbacks 27. We will start first by creating a pdf file loader and loading the pdf file and after that, we will split it into separate pages. I am able to do this when I can download the file locally. Code: import os os. That said, there are, e. Having looked through the langchain website, I haven't found a tutorial for multiple documents. Query the papers using LangChain. pdf") pages = loader. As you can see, we first loaded the document and then created an index over it. data can include many things, including: Unstructured data (e. 44 and older). This example goes over how to load data from folders with multiple files. endswith (". How to Talk to a PDF using LangChain and ChatGPT by Automata Learning Lab. 19 may 2023. Load PDF. There are several occasions where we wanted to merge PDF files, so as to organize them to reduce clutter or to share it with someone else. Load a chain from LangchainHub or local filesystem. LangChain supports various popular LLM architectures, such as GPT-3, enabling developers to work with state-of-the-art models for their applications. Luckily, LangChain can help us load external data, calculate text embeddings, and store the documents in a vector database of our choice. user_api_key = st. Use a PDF-to-text converter: There are several online tools and software that can convert your PDF to plain text. Select a PDF document related to renewable energy from your local storage. 
If you have a mix of text files, PDF documents, HTML web pages, etc, you can use the document loaders in Langchain. Chat with Multiple PDFs using Llama 2 and LangChain (Use Private LLM & Free Embeddings for QA) · Details · Related Courses · Reviews. By default, the loader will utilize the specialized loaders in this library to parse common file extensions (e. llm = OpenAI() chain = load_qa_chain(llm, chain_type="stuff") chain. The second argument is a map of file extensions to loader factories. To use this loader, you need to pass in a Path to a local file. OpenAIEmbeddings from langchain. JSONLines files. Read how to migrate your code. Example folder:. The user can then switch between topics on the home page. Each record consists of one or more fields, separated by commas. Welcome to PDF Chain. Chat Models 26. document_loaders import PyPDFLoader loader=PyPDFLoader (file) pages = loader. memory import ConversationBufferMemory from langchain import PromptTemplate from langchain. Then I proceed to install langchain (pip install langchain if I try conda install langchain it does not work). Each record consists of one or more fields, separated by commas. You can optionally pass in your own custom loaders. Actually as far as I understand, SequentialChain is made to receive one or more input for the first chain and then feed the output of the n-1 chain into the n chain. GPT-4 & LangChain - Create a ChatGPT Chatbot for Your PDF Files. Load and split the data ## load the PDF using pypdf from langchain. You have 5$ of credit, but. We use vector similarity search to find the chunks needed to answer our question. In this video we'll learn how to use OpenAI's new GPT-4 api to 'chat' with and analyze multiple PDF files. Simple Diagram of creating a Vector Store. , if you are building a legal-specific. document_loaders import UnstructuredPDFLoader from. 45 (compatible with PDFtk 1. 📄️ PDF files. The most common way to do this is to embed the contents of each document split. 
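The description of SequentialChain above, where the output of chain n-1 becomes the input of chain n, is ordinary function composition. The sketch below shows only that data flow; `run_sequential` and the three step functions are hypothetical and do not call LangChain.

```python
def run_sequential(steps, initial_input):
    """Feed the output of each step into the next one, in order."""
    value = initial_input
    for step in steps:
        value = step(value)
    return value


def normalize_whitespace(text):
    """Collapse runs of whitespace into single spaces."""
    return " ".join(text.split())


def truncate(text, limit=20):
    """Keep at most `limit` characters."""
    return text[:limit]


def build_prompt(text):
    """Wrap the text in a summarization instruction."""
    return f"Summarize: {text}"


result = run_sequential(
    [normalize_whitespace, truncate, build_prompt],
    "  Quarterly   report\nfor the   energy sector  ",
)
print(result)
```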
5 and GPT-4. This class uses the pdfminer library to extract the text from each page of the PDF. pip install tiktoken - #load required packages from langchain. Step 4: Store the Data in Vector Storage. Language Model: The application utilizes a language model to generate vector representations (embeddings) of the text chunks. Solutions # To run the code examples, make sure you have the latest versions of openai and langchain installed: pip install openai --upgrade pip install langchain --upgrade In this post, we'll be using openai==0. A Document is the base class in LangChain, which chains use to interact with information. docx, etc. The loader will load all strings it finds in the JSON object. Therefore, your function should look like this: def get_response (query): #print (query) result = index. We use vector similarity search to find the chunks needed to answer our question. It can store context required for prompt engineering, deal. 9 Who can help? No response Information The official example notebooks/scripts My own modified scripts Related Components LLMs/Chat Models Embedding Models Prompts / Prompt Templates / Prompt Select. Step 4: Consider formatting and file size: Ensure that the formatting of the PDF document is preserved and intact in. Use the new GPT-4 api to build a chatGPT chatbot for multiple Large PDF, CSV, TET files. Using this library, one can load documents available in s3, blob storage,google storage, URL and many more. document_loaders module to load and split the PDF document into separate pages or sections. One example of this is a text splitter that splits a large document into many smaller. Load the Obsidian notes. Use PyPDF to convert those bytes into string text. for the quarter ended March 31. docx, etc). If you're looking to harness the power of large language models for your data, this is the video for you. Information in such streams is coded in XML. For example, there are document loaders for loading a simple. 
load() → List[Document]. In simple terms, a stuff chain will. You can use the ChatOpenAI wrapper that supports OpenAI chat models. We will chat with PDFs using just a few lines of Python code. If you use “single” mode, the document will be returned as a single LangChain Document object. def load_doc (file): from langchain. We have a public Discord server. OpenAI recently announced GPT-4 (its most powerful AI), which can process up to 25,000 words (about eight times as many as GPT-3), process images, and handle much more. Use the new GPT-4 API to build a ChatGPT chatbot for multiple large PDF files. from_documents (content, OpenAIEmbeddings ()) else: faiss_index_i = FAISS. Takes an input, formats it, and passes it to an LLM for processing. Let's Dive into Building the Document Query System. text_splitter import CharacterTextSplitter from langchain import OpenAI. To get started, we need to set up our libraries. By default, this LLM uses the “text-davinci-003” model. For example, there are document loaders for loading a simple.

Here is the link from LangChain.

I can parse documents using LangChain's document loaders.

One of the main ways they do this is with an open source Python package. It loads the PDF using the PyPDFLoader and splits the content into smaller parts. Chat with your PDF: Using Langchain, F. so you don’t have to convert files to text by yourself. Using GPT-3 and LangChain's question_answering to query these documents. By the end of this tutorial, you'll have the knowledge and tools to tackle large volumes of text efficiently. When initializing tools, we either create a custom tool or load a prebuilt tool. Create an index with the information. This example goes over how to load data from text files. In our chat functionality, we will use Langchain to split the PDF text into smaller chunks, convert the chunks into embeddings using OpenAIEmbeddings, and create a knowledge base using F. Showing how with a few minor changes, we can speed parts of the process up by a factor of 4x or more. Azure Blob Storage Container. Once you've created your search engine, click on "Control Panel". Loading Data. Embeddings can be used to create a numerical representation of textual data. Every document loader exposes two methods: 1. openai import OpenAIEmbeddings from langchain. Document loaders provide a "load" method for loading data as documents from a configured source. The final step is to load our chain and start querying. LangChain is a powerful framework designed to help developers build end-to-end applications using language models. Example JSON file:. Use FAISS to create our vector database with the embeddings. Code: import os os. ) Provides ways to structure your data (indices, graphs) so that this data can be. Conveniently, LangChain has utilities just for this purpose. Ecosystems of hugging face, LangChain and Pytorch make open-source models easy to infer and finetune for specific use cases. In this example we will see some strategies that can be useful when loading a large list of arbitrary files from a directory using the TextLoader class. 
You can also add SQL database files, as explained in this Langchain AI tweet. Langchain is a powerful tool that enables efficient information retrieval from multiple PDF files. Step 3. This video will guide you through step-by-step process about how c. First, let’s get all our imports set up and set an environment variable to contain our OpenAI key. load(text SecretMap = {}, optionalImportsMap Promise. For more custom logic for loading webpages look at some child class examples such as IMSDbLoader, AZLyricsLoader, and CollegeConfidentialLoader. langchain/ chat_models/ openai. To use paper-qa, you need to have a list of paths (valid extensions include:. [docs] class CSVLoader(BaseLoader): """Load a `CSV` file into a list of Documents. Document transformers. You can refer to the official documentation if you want to load a large text document and split it with a Text Splitter. By combining LangChain's PDF loader with the capabilities of ChatGPT, you can create a powerful system that interacts with PDFs in various ways. OpenAI recently announced GPT-4 (it's most powerful AI) that can process up to 25,000 words - about eight times as many as GPT-3 - process images and handle much more. If you use "single" mode, the document will be returned as a single langchain Document object. first commit 4 months ago utils upgrade langchain and pinecone, migrate from pnpm to yarn 3 months ago visual-guide first commit 4 months ago. In this tutorial, we'll use the latest Llama 2 13B GPTQ model to chat with multiple PDFs. def main(): load_dotenv() st. Compute the embeddings with LangChain's OpenAIEmbeddings wrapper. Document loaders make it easy to load data into documents, while text splitters break down long pieces of text into smaller chunks for better processing. A static method that creates an instance of MultiPromptChain from a BaseLanguageModel and a set of prompts. We begin by loading a basic text file and us. 
To load and extract data from files using LangChain, you can follow these steps. GPT and LangChain APIs are powerful tools for summarizing long PDF documents quickly and efficiently. chains import RetrievalQA from langchain. 2K subscribers in the Streamlit community. run(input_documents=docs, question. The video is a tutorial on how to load multiple PDF files into LangChain for efficient information retrieval using open AI models. Next, we need data to build our chatbot. Sign in to comment. Dosu-beta suggested modifying the load_file method in the DirectoryLoader to log the name of the file being processed at the debug level, which can help pinpoint the problematic file. [docs] class CSVLoader(BaseLoader): """Load a `CSV` file into a list of Documents. Language Model: The application utilizes a language model to generate vector representations (embeddings) of the text chunks. The loader. Next, we add the OpenAI api key and load the documents present in the data folder. Ingest data: loading the data from arbitrary sources in the form of text into the document loader. The text_to_docs() function converts a list of strings (e. Text Chunking: The extracted text is split into smaller chunks to improve the efficiency of retrieval and provide more precise answers. Working with MULTIPLE PDF Files in LangChain: ChatGPT for your Data. Language models take text as input - that text is commonly referred to as a prompt. LangChain makes it easy to manage interactions with. Create an index with the information. Chroma runs in various modes. Interacting with a single pdf. Instantiate langchain libraries class ‘AnalyzeDocumentChain’ with chain_type = ‘map_reduce’ and run it with extracted text to get the summary. Create an index with the VectorStore. Luckily, LangChain can help us load external data, calculate text embeddings, and store the documents in a vector database of our choice. document_loaders import PyPDFLoader loader = PyPDFLoader (". The JSONLoader uses a specified jq. 
Get started. The 3 key ingredients used in this recipe are: The document loader (here PyPDFLoader): one of Langchain's tools to easily load data from various files and sources. But similarly, I have a folder which contains many pdf documents. How to create a PDF summarizer app using LangChain In the last section, we created a basic text summarization app using langchain summarization chains. , PyPDFLoader) for pdfs. This can be useful for distilling long documents into the core pieces of information. Unstructured data is data that doesn't adhere to a particular data model or definition, such as text or binary data. But similarly, I have a folder which contains many pdf documents. The Langchain Chatbot for Multiple PDFs is implemented using Python and utilizes several libraries and components to provide its functionality. The Chat with Multiple PDF Files App is a Python application that allows you to chat with multiple PDF documents. Reload to refresh your session. py and import Streamlit and the functions we made earlier. 23 jul 2023. # # Install package ! pip install "unstructured [local-inference]" ! pip install layoutparser [ layoutmodels,tesseract]. You can add multiple text or PDF files (even scanned ones). You can take a look at the source code here. from langchain. Defaults to False. from langchain. run ingest will automatically ingest all directories and all PDF files in those direc. from_loaders(loaders) from the langchain package, where loaders is a list of UnstructuredPDFLoader instances, each intended to load a different PDF file. We can pass in the argument model_name = ‘gpt-3. See here for setup instructions for these LLMs. This PR allows users to add multiple subdirectories in docs and to include multiple files in each subdirectory. parse(blob: Blob) → List[Document] ¶. load_and_split ( [text_splitter]) Load Documents and split into chunks. 2K subscribers in the Streamlit community. LangSmith Python Docs. 
So, in a way, LangChain provides a way to feed LLMs new data that they have not been trained on. The primary index and retrieval types supported by LangChain are currently centered around vector databases, and therefore a lot of the functionality we dive deep on relates to those topics. document_loaders import PyPDFDirectoryLoader loader = PyPDFDirectoryLoader(pdf_folder_path) docs = loader. In this video we'll learn how to use OpenAI's new GPT-4 API to 'chat' with and analyze multiple PDF files. Then I create a rapid prototype using Streamlit. Now, to dive into the step-by-step code explanation. document_loaders import DirectoryLoader loader = DirectoryLoader("data",. in-memory - in a Python script or Jupyter notebook; in-memory with persistence - in a script or notebook and save/load to disk; in a Docker container - as a server running on your local machine or in the cloud; Like any other database, you can:. listdir(data_directory) if f. text_splitter = CharacterTextSplitter(chunk_size=800, chunk_overlap=0) texts = text_splitter. If you use "elements" mode, the unstructured library will split the document into elements such as Title and NarrativeText. Production applications should favor the lazy_parse method instead. LangChain as my LLM framework. It offers a suite of tools, components, and interfaces that simplify the process of creating applications powered by large language models (LLMs) and chat models. 2) A PDF chatbot is built using the ChatGPT turbo model. langchain/ document_loaders/ web/ sort_xyz_blockchain. import customtkinter. A lazy loader for Documents. Not sure whether you want to integrate multiple CSV files for your query or compare among them. Unleash the full potential of language model-powered applications as you revolutionize your interactions with PDF documents through the synergy of.
Here you will read the PDF file using PyMuPDFLoader from LangChain. The document_loaders and text_splitter modules from the LangChain library. In short, LangChain just composes large amounts of data that can easily be referenced by an LLM with as little computation power as possible. Pinecone is a vectorstore for storing embeddings and your PDF in text to later retrieve similar. # dotenv is a library that allows us to securely load env variables from dotenv import load_dotenv # used to load an individual file (TextLoader) or multiple files (DirectoryLoader) from langchain. LangChain is a framework that makes it easier to build scalable AI/LLM apps and chatbots.