Google Echo Chamber

A few years ago, I was scrolling through one of my news feeds and came across an article about someone who believed in a conspiracy theory and ended up making the news because of the actions they took. I thought to myself, “How can Person A believe that Theory B is real? It’s clearly fake.” Normally I would keep scrolling, but on that particular day it really got me thinking.

With the current state of the internet, anything and everything can be found: cooking recipes, how to tie your shoes, and even 10-minute videos summarizing quantum physics. As long as you know what to type, you can find pretty much anything.

Search Text to Article Matching

But how do search engines know what to give you when you look something up? There are a few main levels of matching that search engines go through to decide what to show you (I’m sure there are plenty more, but that’s outside my field of study and out of scope for this article).

Direct Matching

The first type is direct text matching, which is pretty straightforward. If I search “How to make delicious pecan pie”, the search engine will give me articles that contain that exact phrase.
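Here is a minimal sketch of what direct matching could look like. The `ARTICLES` dictionary and the `direct_match` function are names I made up for illustration; a real search engine works on an index of billions of pages rather than a tiny dictionary.

```python
# Hypothetical mini "index": article titles mapped to article text.
ARTICLES = {
    "Grandma's pie": "How to make delicious pecan pie in under an hour.",
    "Pie basics": "Instructions on assembling pecan pie from scratch.",
    "Shoe guide": "How to tie your shoes.",
}


def direct_match(query, articles):
    """Return titles of articles whose text contains the exact query phrase."""
    needle = query.lower()
    return [title for title, text in articles.items() if needle in text.lower()]


print(direct_match("How to make delicious pecan pie", ARTICLES))
# ["Grandma's pie"]
```

Notice that “Pie basics” is also about making pecan pie, but it is missed because it doesn’t contain the exact phrase.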

Post Language Processing

The second type happens after language processing. Search engines parse the text you’ve entered and determine its root meanings, then serve you articles that match those same root meanings even if they don’t contain the exact same text. Take “How to make pecan pie”:

  • “How to”: Looking for instructions, so it may also match “make”, “instructions on”, and “steps to”
  • “Make”: Would also match “create”, “put together”, and “assemble”
  • “Pecan Pie”: It’s pretty specific, so it may not match other phrases, but it may match just “pecan” or “pie”.

So not only would you get articles matching “How to make pecan pie”, but you could also get articles that say “Instructions on assembling pecan pie” or “Steps to put together pecan pie”.
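The idea above can be sketched roughly like this. The `ROOTS` table is hand-written from the pecan pie example, and the labels `INSTRUCT` and `MAKE` are made up; real search engines use far more sophisticated language models, so treat this as a toy under those assumptions.

```python
import re

# Hypothetical phrase-to-root table based on the pecan pie example.
ROOTS = {
    "how to": "INSTRUCT",
    "instructions on": "INSTRUCT",
    "steps to": "INSTRUCT",
    "make": "MAKE",
    "create": "MAKE",
    "put together": "MAKE",
    "assemble": "MAKE",
    "assembling": "MAKE",
}


def to_roots(text):
    """Replace each known phrase with its root label; other words stay as-is."""
    result = text.lower()
    # Handle longer phrases first so multi-word phrases are matched whole.
    for phrase in sorted(ROOTS, key=len, reverse=True):
        result = re.sub(rf"\b{re.escape(phrase)}\b", ROOTS[phrase], result)
    return result


query = to_roots("How to make pecan pie")
print(query)  # INSTRUCT MAKE pecan pie
print(to_roots("Instructions on assembling pecan pie") == query)  # True
print(to_roots("Steps to put together pecan pie") == query)  # True
```

Once the query and the article titles are reduced to the same roots, all three phrasings look identical to the matcher.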

Other Users Also Clicked On

The last step for search engines is to show the links that other users were more likely to click on.

If someone searches for “WordPress” and they’re looking for www.wordpress.com, they will flip through a few pages until they find www.wordpress.com, then click on that link. Over time, as more people use that search term and then click on the www.wordpress.com link, the search engine recognizes the pattern as statistically significant: a user searching for WordPress most likely wants to go to www.wordpress.com. The search engine will then start showing www.wordpress.com higher up in the list of results when a user searches for WordPress.
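A rough sketch of click-based ranking, assuming the engine simply counts clicks per search term. The `record_click` and `rank` functions and the two example.org/example.net URLs are hypothetical; real engines weigh many more signals than a raw count.

```python
from collections import Counter, defaultdict

# clicks[query][url] = how many times users clicked url after searching query
clicks = defaultdict(Counter)


def record_click(query, url):
    """Remember that a user searching for `query` clicked on `url`."""
    clicks[query.lower()][url] += 1


def rank(query, candidates):
    """Order candidates by click count; ties keep their original order."""
    counts = clicks[query.lower()]
    return sorted(candidates, key=lambda url: counts[url], reverse=True)


candidates = ["example.org/wordpress-tips", "example.net/wp-forum", "www.wordpress.com"]

# Three users dig down to www.wordpress.com, one settles for the tips page.
for _ in range(3):
    record_click("WordPress", "www.wordpress.com")
record_click("WordPress", "example.org/wordpress-tips")

print(rank("WordPress", candidates))
# ['www.wordpress.com', 'example.org/wordpress-tips', 'example.net/wp-forum']
```

The link that started at the bottom of the list ends up at the top purely because of what earlier users clicked.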

This makes searching online very accurate to what you’re actually looking for, but it can also put you into a bubble that you may not be aware of or approve of.

False Information Use Case

Let’s consider this thought experiment: Say you are watching a news channel that you’ve never seen before. During one of their reports, they claim that air is 74.8% nitrogen. You become curious, so you go to your search engine to find out more about it. Then you type in “Air 74.8 Nitrogen” and hit search.

The articles that show up will be the ones that people who made that search before you also clicked on. Who are those people? They could be people who are also curious about that claim, or they could be people looking for data to support it.

However, the ones who go to the 2nd or 3rd page are likely the ones looking for something to support their claim. They believe that air is 74.8% nitrogen and are looking for online articles that confirm it, even if it’s incorrect. The articles that are true may get buried under all of the false articles. But as someone learning about this for the first time, are you going to go digging to page 3 to see if it’s true or not? Most likely not.
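To make that feedback loop concrete, here is a toy simulation. Everything in it is an assumption for illustration: the three article names, the `simulate` function, and especially the idea that for this niche query the motivated searchers outnumber the casually curious ones.

```python
def simulate(rounds, curious, believers):
    """Toy feedback loop: clicks decide the ranking, and the ranking steers clicks."""
    # Hypothetical starting order: the accurate article is on top.
    results = ["accurate article", "unrelated article", "confirming article"]
    clicks = {title: 0 for title in results}
    for _ in range(rounds):
        # Curious users click whatever is ranked first.
        clicks[results[0]] += curious
        # Believers keep digging until they find the article that agrees with them.
        clicks["confirming article"] += believers
        # Re-rank by clicks; ties keep their current order.
        results.sort(key=lambda title: clicks[title], reverse=True)
    return results, clicks


ranking, counts = simulate(rounds=3, curious=1, believers=2)
print(ranking)  # ['confirming article', 'accurate article', 'unrelated article']
print(counts["confirming article"])  # 8
```

After the first round the confirming article takes the top spot, and from then on even the curious users are clicking on it. If the numbers are flipped (say 3 curious users and 1 believer per round), the accurate article stays on top, so the outcome in this sketch depends entirely on who is doing the clicking.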

Ultimately, you type in “Search A”, and you’re going to get articles that support that claim. Whether or not the claim is correct is a different issue entirely. Search engines can’t easily police all of the articles that are cached by their engines; it’s simply way too much data.

Related Searches

Some search engines have also found a big improvement in accuracy if they know more about your personal life. People who type their searches in English probably want English-only sites, people living in Switzerland probably want locations near them rather than locations in Canada, and people who identify as male probably want to see more masculine clothing stores. All of these data points can be fed into an AI program (a.k.a. statistics) to predict which links the user may want to pick.

Another thing a search engine most likely does is track what you’ve clicked on before. Knowing what you’ve clicked on means it can serve up that same article again, along with other articles that other people have also clicked on. If someone searched for how to make pecan pie but also clicked on an article on how to make apple pie, you may be served both articles when you look up how to make pecan pie.
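A minimal sketch of the “people who clicked this also clicked that” idea, assuming the engine keeps a click history per user. The `HISTORIES` data and the function names are invented for this example.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical click histories: each set holds the articles one user clicked.
HISTORIES = [
    {"pecan pie recipe", "apple pie recipe"},
    {"pecan pie recipe", "apple pie recipe", "pie crust tips"},
    {"pecan pie recipe", "shoe tying guide"},
]


def build_co_clicks(histories):
    """Count how often each pair of articles was clicked by the same user."""
    co = defaultdict(Counter)
    for history in histories:
        # sorted() keeps the counting order the same from run to run.
        for a, b in permutations(sorted(history), 2):
            co[a][b] += 1
    return co


def also_clicked(article, co, top_n=1):
    """Return the articles most often clicked alongside `article`."""
    return [other for other, _ in co[article].most_common(top_n)]


co_clicks = build_co_clicks(HISTORIES)
print(also_clicked("pecan pie recipe", co_clicks))  # ['apple pie recipe']
```

Two of the three users who clicked the pecan pie recipe also clicked the apple pie recipe, so that is what gets suggested next to it.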

The Bubble

We’re in a position now where we know that search engines track as much as they can in order to provide better search results (and to make money, but that’s not in the scope of this article either). But the main thing is that a lot of the search results you see are driven by other people similar to you, what you’ve looked up in the past, and your current search query.

Let’s say you want to look up something that is polarizing: ripped jeans vs. non-ripped jeans. A fashion trend right now is to wear ripped jeans, but the opposite view is that ripped jeans look cheap and low quality. I’ve noticed a trend in the real world where women tend to wear ripped jeans and men do not. I acknowledge that it could totally be confirmation bias, but that’s what I’ve noticed.

Google – “Jeans” shows mostly women’s jeans, where some women’s jeans are ripped

In theory, if the search engine knows that I identify as a woman, then it will be more likely to serve me articles that contain ripped jeans when I use the search term “Jeans”. As I keep browsing, I’ll see more and more ripped jeans, to the point where I assume that ripped jeans are everywhere. If I notice that ripped jeans are everywhere, I’ll probably consider them to be normal and socially acceptable. To fit in, I will then be likely to accept ripped jeans as the norm and buy a pair.

Conclusion

The power behind good search engines is theoretically a big problem. Polarizing topics will tend to create a huge divide among users, because they force everyone into their own groups: {statement} is either A or B. Milk then cereal, or cereal then milk; Donald Trump or Biden; vaccine or not. While I do approve of improving technology, I can’t support companies collecting personal information without oversight. Originally it was because I didn’t want targeted advertisements, but now it’s also because I don’t want to be forced into a bubble that I can’t see out of.