Build your own search engine with YaCy
Search the web on your own terms
Mainstream search engines like Google are pretty good at what they do, but many people choose not to use them because of privacy concerns. Then there are those who are concerned about content falling through the cracks just because the creator hasn’t followed the best practices for search engine optimization (SEO).
YaCy, an open source distributed search engine, works pretty much like its mainstream peers, but doesn’t suffer from any of their ills. YaCy uses a peer-to-peer (P2P) network, so every user running an instance of the search engine joins in the effort to index the internet. The index is distributed and redundant across all YaCy users.
To further bolster its privacy credentials, YaCy ensures that no one can tell who has searched for what words, in essence making all searches functionally anonymous.
YaCy only indexes publicly accessible, non-password-protected pages. You can also use it as a search engine for your website, or use it to index pages on the intranet, which it ensures aren’t accessible to anyone outside your network.
Installation
YaCy is written in Java and runs on Windows, macOS, and Linux. Search engines are complex beasts, but thanks to YaCy’s distributed nature, you don’t need a fast machine, nor a lot of space to run a YaCy client.
Installation is fairly simple. Before you begin, ensure you have Java installed on the machine. Windows and macOS users can obtain pre-built binaries from Adoptium, while Linux users can pull it from their official repositories.
For instance, Debian users can use sudo apt install default-jdk, while Fedora users can search for the available versions with sudo dnf search openjdk, before installing the latest version with sudo dnf install followed by the name of the package they picked from the search results.
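Before moving on, it can help to confirm that a Java runtime is actually available on your PATH. The following is a minimal sketch of such a check; java_status is a hypothetical helper name introduced here for illustration, and is not part of YaCy or Java.

```shell
#!/bin/sh
# java_status: hypothetical helper that prints "found" if a java
# executable is on the PATH, and "missing" otherwise.
java_status() {
    if command -v java >/dev/null 2>&1; then
        echo "found"
    else
        echo "missing"
    fi
}

case "$(java_status)" in
    # java -version writes to stderr, so redirect it to show the first line.
    found)   java -version 2>&1 | head -n 1 ;;
    missing) echo "Java not found; install it before running YaCy." ;;
esac
```

If the script reports that Java is missing, install it with your platform's package manager as described above and run the check again.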
Once you have Java installed, download the YaCy release for your platform, and extract it. For instance, the command sudo tar --extract --file yacy_*z --directory /opt -v will extract the archive under the /opt directory on Linux. Now simply change into the extracted directory and start YaCy:
cd /opt/yacy
./startYACY.sh
YaCy is now running on port 8090 on your computer. Fire up a web browser, and head to http://localhost:8090 to access the YaCy instance. You can now search the internet just as you would using a regular search engine.
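You can also query the instance from the command line instead of the browser. The sketch below assumes that your YaCy version answers JSON searches at /yacysearch.json and accepts the query and maximumRecords parameters; check this against your own installation, since the article itself only covers the web interface. The names yacy_search and YACY_BASE are hypothetical and used only for illustration.

```shell
#!/bin/sh
# Base URL of the local instance; port 8090 is the one used in this article.
YACY_BASE="${YACY_BASE:-http://localhost:8090}"

# yacy_search QUERY [COUNT]
# Hypothetical helper: sends QUERY to the instance and prints the raw response.
# Assumption: the instance answers JSON searches at /yacysearch.json and
# accepts the "query" and "maximumRecords" parameters.
yacy_search() {
    curl --silent --show-error --get \
        --data-urlencode "query=$1" \
        --data-urlencode "maximumRecords=${2:-10}" \
        "$YACY_BASE/yacysearch.json"
}

# Example (requires a running instance):
#   yacy_search "open source search" 5
```

Letting curl handle the URL encoding with --data-urlencode keeps the helper safe for queries that contain spaces or punctuation.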
Crawl the internet
There’s much more you can do with the YaCy search engine than just search passively. For instance, since P2P indexing is user-driven, you can ask YaCy to crawl any website.
To access the advanced administrative controls of your search engine, click the Administration button in the top-right corner. This brings up the admin panel, which among other things lets you tweak how your YaCy instance interacts with other YaCy clients in the network.
To initiate a manual web crawl, navigate to the Load Web Pages, Crawler option under the First Steps menu. Enter the URL in the space provided and hit Start New Crawl. As the crawler gets underway, it’ll start showing all kinds of statistics about the crawl, and you can scroll down to view the structure of the crawled website graphically.
After initiating the crawl, head to Monitoring > Index Browser to view how many pages have been indexed and view other details, such as their name and number of outbound links.
For now you can go with the default options, and explore the others, such as limiting the crawler, once you get comfortable with YaCy. The search engine can run multiple crawls at the same time, and you can either initiate them one after another from under the First Steps section, or head to Production > Advanced Crawler to crawl multiple websites at the same time.
Once the crawl job starts, YaCy indexes the URLs you enter and stores the index on your local machine. To ensure your index is available to YaCy users all over the globe, you’ll have to join YaCy’s P2P network.
For this you must open port 8090 in your router’s firewall. Log into your router’s administration page and look for a configuration panel controlling the firewall or port forwarding.
Once you find the preferences for your router’s firewall, add port 8090 to the whitelist. If your router is doing port forwarding, then you must forward the incoming traffic to your computer’s IP address, using the same port.
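Once the rule is in place, it is worth confirming that the port actually accepts connections. The following is a rough sketch that assumes a netcat (nc) variant supporting the -z and -w options is installed; port_status is a hypothetical helper name, not a YaCy tool.

```shell
#!/bin/sh
# port_status HOST [PORT]
# Hypothetical helper: prints "open" if a TCP connection to HOST:PORT
# succeeds within 3 seconds, and "closed" otherwise.
# Assumption: the installed netcat (nc) supports the -z and -w options.
port_status() {
    if nc -z -w 3 "$1" "${2:-8090}" >/dev/null 2>&1; then
        echo "open"
    else
        echo "closed"
    fi
}

# Check the local instance first, then repeat with your public IP address
# from a machine outside your network.
echo "localhost:8090 is $(port_status localhost 8090)"
```

Testing your public address from inside your own network can give a misleading result on some routers, so the second check is best run from an outside machine.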
After you’ve joined the YaCy network, you can toggle the Do remote indexing option under the Advanced Crawler. This enables your client to broadcast the URLs it is indexing, and other clients on the network that have opted to accept requests can help you perform the crawl.
Your very own Google
Instead of searching the web, you can use YaCy to search through your own data or to implement a search system for local file shares inside your corporate intranet.
For this you’ll need to run YaCy as an internal indexer. In this mode, only people in your local network can use your personalized instance of YaCy to find shared files, and none of the data is shared with users outside your network.
Head to Administration > First steps > Use Case & Account. Here you can specify basic details such as the language for YaCy’s interface.
You’ll also be able to change the behavior of your YaCy instance from here. The default option is to use your client as part of YaCy’s global P2P network to help crawl and index the web.
To create a search portal for your own website, you need to select the Search portal for your own web pages option. Then scroll down and press the Set Configuration button. Next, you need to crawl your domain to generate the content that will be available through your search tool.
To integrate the search into your website, scroll down the left-side column to the Search Portal Integration section. You’re dropped to the Portal Configuration page, from where you can customize YaCy’s appearance with your corporate branding to blend it into your website. When you are done, hit the Change Search Page button. You can now use any of the generated iframe code snippets to integrate the YaCy-powered customized search into your website.
Similarly, to use YaCy to index the local network, you’ll have to select the third option in the First Steps section. You can then use the Advanced Crawler to crawl your intranet.
Conclusion
There’s so much more you can do with YaCy. The project doesn’t offer enough documentation to cover all the features of the search engine. However, the project is fairly intuitive, and its interface is verbose enough to help you toggle the correct option.
All things considered, YaCy is one of the best options for users who want an unbiased, ad-free, privacy-respecting, anonymous web search engine that can also help visitors search for content on their website or privately inside their intranet.
With almost two decades of writing and reporting on Linux, Mayank Sharma would like everyone to think he’s TechRadar Pro’s expert on the topic. Of course, he’s just as interested in other computing topics, particularly cybersecurity, cloud, containers, and coding.