How Do I Start Learning to Set Up a Local LLM and Scrape Data?

Asked By CuriousCoder42 On

Hey everyone! I'm looking to get started on building a local LLM that can scrape data relevant to our business so it can learn from various files and databases. The idea is to have over 1,000 employees interact with this LLM to get information efficiently. I want to dive deep into how this all works, including the programming side of things, instead of just creating a simple agent. Where should I begin? Should I focus on learning Python or other programming languages? Also, which LLM is best for running locally without any restrictions? What skills or knowledge should I aim for if I want to modify parameters in the LLM?

3 Answers

Answered By DataNinja On

A great tool for scraping is SeleniX, which makes the process pretty straightforward. After scraping data, you can export it using n8n webhooks to integrate with your AI model effectively. This way, you'll have a streamlined approach for managing data and enabling user interactions.
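
If it helps to see the shape of that pipeline, here's a rough Python sketch of the same idea using plain requests and BeautifulSoup instead of a dedicated scraping tool. The page URL, the n8n webhook URL, and the JSON field names are all placeholders you would swap for your own setup.

```python
# Minimal sketch: scrape one page and push the extracted text to an n8n webhook.
# This uses generic requests + BeautifulSoup as a stand-in for whatever scraping
# tool you prefer. The URLs and payload fields below are placeholder assumptions.
import requests
from bs4 import BeautifulSoup

PAGE_URL = "https://example.com/internal-report"          # hypothetical source page
WEBHOOK_URL = "https://n8n.example.com/webhook/ingest"    # hypothetical n8n webhook

def scrape_and_forward():
    # Fetch the page and pull out the visible text.
    resp = requests.get(PAGE_URL, timeout=30)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    text = soup.get_text(separator="\n", strip=True)

    # Forward the scraped text to n8n, which can then route it into
    # whatever store your AI model reads from.
    payload = {"source": PAGE_URL, "content": text}
    requests.post(WEBHOOK_URL, json=payload, timeout=30).raise_for_status()

if __name__ == "__main__":
    scrape_and_forward()
```

From there n8n takes over: the webhook node receives the payload and you can route it into a database or vector store that your model reads from.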

Answered By TechGuru99 On

To kick things off, a strong foundation in computer science would really help. Consider taking some relevant courses or even pursuing a degree if you can. But keep in mind that an LLM itself doesn't do any scraping—it's mainly for processing language. You'll need to build or use a web scraper separately. You might want to check out the AI Engineer roadmap for guidance on skills to develop!

LearnToScrape replied:

For sure! Web scraping works well as the ingestion step of a Retrieval-Augmented Generation (RAG) system, so pairing a scraper with an LLM is a solid approach.
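
To make that concrete, here's a minimal RAG sketch in Python. It assumes you have sentence-transformers installed and an Ollama server running locally; the model name, the example documents, and the prompt wording are placeholder assumptions, and in practice the documents list would be filled by your scraper or file loader.

```python
# Minimal RAG sketch: embed documents, retrieve the best match for a question,
# and pass it as context to a locally hosted LLM served by Ollama.
import requests
from sentence_transformers import SentenceTransformer, util

# In practice these come from your scraper or file loader.
documents = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support hours are 9am-5pm, Monday through Friday.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_embeddings = embedder.encode(documents, convert_to_tensor=True)

def answer(question: str) -> str:
    # Retrieve the most relevant document by cosine similarity.
    q_emb = embedder.encode(question, convert_to_tensor=True)
    scores = util.cos_sim(q_emb, doc_embeddings)[0]
    context = documents[int(scores.argmax())]

    # Ask the local model, grounding it in the retrieved context.
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3", "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]

print(answer("What are the support hours?"))
```

The retrieval step is what keeps the model grounded in your own data; it's usually what people mean when they say the LLM "learns from" company files without actually retraining it.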

Answered By DataWhizKid On

It sounds like there might be some confusion around the term "scrape." If you have direct access to the files you need, you might not need to scrape at all. Just build a pipeline that reads those files directly, which could simplify things quite a bit!
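
As a rough illustration, here's a small Python sketch that simply walks a folder of files you already have access to and splits them into chunks ready for indexing, no scraper involved. The directory path, file extensions, and chunk size are placeholder assumptions.

```python
# Minimal sketch of the "no scraping needed" approach: read files you already
# have, split them into chunks, and hand the chunks to whatever index or RAG
# pipeline feeds your LLM. Path, extensions, and chunk size are placeholders.
from pathlib import Path

DOCS_DIR = Path("/srv/company-docs")          # hypothetical shared folder
EXTENSIONS = {".txt", ".md", ".csv"}          # plain-text formats for this sketch
CHUNK_SIZE = 1000                             # characters per chunk

def load_chunks():
    chunks = []
    for path in DOCS_DIR.rglob("*"):
        if not path.is_file() or path.suffix.lower() not in EXTENSIONS:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        # Fixed-size chunks so each piece fits comfortably in a prompt
        # or an embedding model's input window.
        for start in range(0, len(text), CHUNK_SIZE):
            chunks.append({"source": str(path), "text": text[start:start + CHUNK_SIZE]})
    return chunks

if __name__ == "__main__":
    print(f"Loaded {len(load_chunks())} chunks")
```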
