Since I originally wrote this post, I have stumbled across the subreddit r/OSINT (OSINT stands for open-source intelligence). If you browse the top posts of all time, you can find megalists of hundreds of tools to help you find information on the internet. There’s also an “awesome” list for OSINT tools and Michael Bazzell’s OSINT Techniques: Resources for Uncovering Online Information.
Needless to say, this post is a bit useless in comparison, at least until I go through these comprehensive lists and filter for the most useful resources.
Isn’t the internet such a magically useful tool? Thirty years ago, if you wanted to know how many plays Shakespeare wrote, you would have to physically walk to your local library and find a relevant book. Now, you can find the answer in less than ten seconds, at any time, wherever you are.
However, the internet is not a truthful, superintelligent oracle. Rather, it’s a dangerous jungle of knowledge you must learn to navigate if you wish to find the truth. Good information is censored, hidden behind paywalls or within piles of spam, and difficult to differentiate from untrustworthy information. This post won’t be a complete guide on how to navigate the world wide web of knowledge, but it will give you some tools I’ve discovered over the years that you can throw in your digital rucksack to aid your journey.
- The great internet sage Gwern Branwen wrote an advanced guide on finding references, papers, and books online.
- The search engines Brave Search and Kagi have the features “Goggles” and “Lenses” respectively, which are presets that filter or re-rank entire categories of websites in your results.
- Blog Surf is a search engine for posts from manually approved personal blogs and newsletters.
- SearXNG is a highly customizable internet metasearch engine.
- Perplexity uses natural language processing (NLP) to answer your query with a paragraph (with sources) and allows you to ask follow-up questions.
- Metaphor allows you to find websites by writing creative and long-form prompts, also using NLP.
- Elicit is a research assistant that helps you find relevant research papers, also using NLP.
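If you run your own SearXNG instance, you can also query it from a script rather than the browser. The snippet below is only a rough sketch: it assumes an instance at a hypothetical `http://localhost:8080` with the JSON output format enabled in its settings (many public instances disable it), and the helper names `build_search_url` and `top_results` are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(instance, query, engines=None):
    """Build a SearXNG search URL that asks for JSON output.

    `instance` is the base URL of a SearXNG instance (assumed here,
    not something the post specifies).
    """
    params = {"q": query, "format": "json"}
    if engines:
        # Restrict the metasearch to specific engines, comma-separated.
        params["engines"] = ",".join(engines)
    return f"{instance.rstrip('/')}/search?{urlencode(params)}"


def top_results(payload, limit=5):
    """Return (title, url) pairs from a decoded JSON response."""
    results = payload.get("results", [])[:limit]
    return [(r.get("title", ""), r.get("url", "")) for r in results]


def search(instance, query, engines=None, limit=5):
    """Run the query against the instance (needs network access)."""
    url = build_search_url(instance, query, engines)
    with urlopen(url, timeout=10) as response:
        return top_results(json.load(response), limit)
```

Calling `search("http://localhost:8080", "how many plays did Shakespeare write")` would then give you a short list of titles and links, assuming the instance allows JSON responses.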
Sometimes you know exactly where to find a piece of information, but it’s locked behind a paywall or deleted from the internet.
- Internet Archive is a non-profit library of free books, movies, websites, et cetera. It’s famous for the Wayback Machine, which displays past archived snapshots of a given URL.
- Bypass Paywalls is a browser extension to help bypass paywalls on selected sites.
- The subreddit r/piracy has a wiki with loads of resources on obtaining copyrighted material for free.
- Anna’s Archive is a shadow library metasearch engine that aggregates results from websites that host copyrighted books, academic papers, magazines, et cetera.
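For the Wayback Machine in particular, you don't have to click through the website to check whether a deleted page was saved. Here is a minimal sketch using the Internet Archive's public availability endpoint; the helper names (`build_availability_url`, `closest_snapshot`, `lookup`) are my own, and the response handling assumes the JSON shape that endpoint returned at the time of writing.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

WAYBACK_API = "https://archive.org/wayback/available"


def build_availability_url(page_url, timestamp=None):
    """Build the availability query for a page.

    `timestamp` is optional and uses the YYYYMMDD[hhmmss] form; the
    API then looks for the snapshot closest to that date.
    """
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{WAYBACK_API}?{urlencode(params)}"


def closest_snapshot(payload):
    """Return the archived URL from a decoded response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def lookup(page_url, timestamp=None):
    """Ask the Wayback Machine for a snapshot (needs network access)."""
    url = build_availability_url(page_url, timestamp)
    with urlopen(url, timeout=10) as response:
        return closest_snapshot(json.load(response))
```

Something like `lookup("example.com", "20060101")` should return a `web.archive.org` link if a snapshot exists, and `None` if the page was never archived.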
It is particularly frustrating to find trustworthy knowledge about certain topics because of misaligned incentives: researching which product to buy or which supplements actually work is hard because everyone’s trying to sell you something.