So let’s start with the steps that something like ChatGPT, for example, needs to take to answer a question. Again, just like search engines, it needs to collect data first.
Next, it needs to store that data in an accessible format, and finally it needs to return an answer, much like a ranking. Starting with data collection, this is the closest part to the search engines we are familiar with. These models are basically visiting web pages and crawling the internet, and if they haven’t visited a page or gotten the information from some other source, they just don’t know the answer. Large language models are at a disadvantage here in some ways because they are still in their infancy, whereas search engines have been doing this and recording this information for decades.
So they have a lot of work to do to catch up. The internet is full of places that they haven’t actually been able to visit yet. One thing they can collect that search engines don’t have access to is chat data. So when you use the platform, it collects data about what you type and how you interact with it, and that feeds into training the model.
One thing to keep in mind when using a platform like ChatGPT is that if you enter private data there, that data doesn’t necessarily stay private after you enter it. So you might want to check your settings or consider using the API, as providers tend to promise not to train on API data. Moving on to the second stage, storing that information: this is what’s called indexing in search, and this is where things get a little different, but there are still quite a few similarities.
So in the early days of search engines, the indexes, the stored data, weren’t updated live in the way that we’re used to now. Just because something had been published on the internet didn’t mean you could be sure it would show up in a search engine right away. Updating these indexes was expensive in terms of time and money, so it was only done every few months. We currently have a similar situation with large language models.
You may have noticed that they sometimes say something like, “My information is current up to April or so.” That’s because if you want to add more information to the model, you actually have to retrain the whole thing. Again, this is very expensive to do. Both of these limitations are reflected in the final answer you get.
I’m sure you’ve seen this before. If you’re using ChatGPT, the information you’re asking for may be missing, or the information you get may be out of date.
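
To make the three stages concrete, here is a rough toy sketch in Python of collecting, storing and answering, with a cutoff date standing in for the “up to April or so” limit. It uses a simple search-style inverted index, which is only an analogy: a real language model stores what it learned in its trained weights, not in a lookup table like this. The page URLs, the dates (including the year) and the function names are all made up for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical "crawled" pages: (url, date collected, text).
PAGES = [
    ("example.com/a", date(2023, 1, 10), "search engines crawl web pages"),
    ("example.com/b", date(2023, 3, 2), "language models learn from collected text"),
    ("example.com/c", date(2023, 9, 15), "new language models crawl web pages too"),
]

def collect(pages, cutoff):
    """Stage 1: keep only what was gathered on or before the cutoff date."""
    return [(url, text) for url, seen, text in pages if seen <= cutoff]

def build_index(docs):
    """Stage 2: store the data in an accessible form (word -> set of URLs)."""
    index = defaultdict(set)
    for url, text in docs:
        for word in text.lower().split():
            index[word].add(url)
    return index

def answer(index, query):
    """Stage 3: rank URLs by how many of the query words they contain."""
    scores = defaultdict(int)
    for word in query.lower().split():
        for url in index.get(word, ()):
            scores[url] += 1
    # Highest score first; ties broken alphabetically by URL.
    return sorted(scores, key=lambda u: (-scores[u], u))

# Build a snapshot with an assumed April cutoff, then query it.
snapshot = build_index(collect(PAGES, cutoff=date(2023, 4, 30)))
results = answer(snapshot, "language models crawl")
print(results)
```

With the April cutoff, the page collected in September never makes it into the snapshot, so it can’t show up in the answer no matter how relevant it is. The only way to get it in is to rebuild the whole snapshot with a later cutoff, which is the toy version of having to retrain the model.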