Date: Fri, 04 May 2012 07:56:21 +0200
From: Michael Przybilla <micha>
To: 44net
Subject: [44net] Introducing YaCy Searchengine to HAMNET


Hi Folks

I'd like to introduce YaCy, a decentralized search system, as a distributed search engine for HAMNET.

The important information on how to use it can be found here on our Webserver at DB0FHW:
http://www.afu-ag.de/joomla/index.php?option=com_content&view=article&id=83:hamnet-yacy&catid=16&Itemid=186


If you let Google translate the page, unfortunately, you do not get a very good translation...
http://translate.google.de/translate?sl=de&tl=en&js=n&prev=_t&hl=de&ie=UTF-8&layout=2&eotf=1&u=www.afu-ag.de%2Fjoomla%2Findex.php%3Foption%3Dcom_content%26view%3Darticle%26id%3D83%3Ahamnet-yacy%26catid%3D16%26Itemid%3D186

... So I'll explain it in a bit more detail.

YaCy is a free decentralized search engine that anyone can use to build a search portal for their intranet.
General information can be found here: http://yacy.de/en/ ... So let's consider HAMNET as a kind of intranet.

We defined a root.unit for the system at DB0FHW, which has quite a good connection to the German HAMNET, and anybody can now connect a peer to this root.unit.

This unit is shared with all peers, so the network will stay alive even if DB0FHW goes down.
Peers temporarily connect to each other like a full mesh, so a peer only needs to know one other peer to find its way back into the network.


How to connect a peer:
1. Download and install YaCy http://yacy.de/en/index.html
2. Do not start the YaCy service yet.
3. Edit the line "network.unit.definition" in your file "defaults/yacy.init". It should point to this URL now: http://yacy.db0fhw.ampr.org:8090/yacy.hamnet.unit
4. Start the YaCy Service
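Step 3 can also be scripted. The following is only a minimal Python sketch, assuming that "defaults/yacy.init" uses plain "key = value" lines; the function name set_network_unit is made up for this example and is not part of YaCy. It replaces the value of "network.unit.definition" and leaves all other lines alone:

```python
import re
from pathlib import Path

# URL taken from step 3 above.
HAMNET_UNIT_URL = "http://yacy.db0fhw.ampr.org:8090/yacy.hamnet.unit"


def set_network_unit(init_file, url=HAMNET_UNIT_URL):
    """Rewrite the network.unit.definition line in a yacy.init-style file.

    Hypothetical helper: returns True if a line was replaced,
    False if the key was not found (the file is then left untouched).
    """
    path = Path(init_file)
    lines = path.read_text(encoding="utf-8").splitlines(keepends=True)
    # Only match active lines; commented-out lines starting with '#' are skipped.
    pattern = re.compile(r"^\s*network\.unit\.definition\s*=")
    changed = False
    for i, line in enumerate(lines):
        if pattern.match(line):
            newline = "\n" if line.endswith("\n") else ""
            lines[i] = f"network.unit.definition = {url}{newline}"
            changed = True
    if changed:
        path.write_text("".join(lines), encoding="utf-8")
    return changed
```

Run it while the YaCy service is stopped, e.g. set_network_unit("defaults/yacy.init"), then continue with step 4.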


Philosophy:
There should be at least one YaCy peer in each AS of the HAMNET, crawling the servers INSIDE its own AS.
How often you crawl is up to you. It just depends on how often the servers' content changes.
If a server holds "hot news", you may host your YaCy peer quite close to it and crawl every hour.
If content changes weekly, crawl weekly... (if you have a huge AS, you may have more peers and define which of them crawls which server)

So... it doesn't sound that hard in the end.
Give it a try.

73
Micha, DD2MIC


Translated from: http://www.amateurfunk-wiki.de/index.php/Suchmaschine

To add a search function to HAMNET, the YaCy software is used.

YaCy is a peer-to-peer search engine. Put simply, this means that there are locally administered search servers that make their (local) search results available to one another.

For HAMNET, (at least) one YaCy server per AS would be desirable. Ideally this is a so-called "Principal Peer": a YaCy peer that stores information about the reachability of other YaCy peers on a web server (a so-called seed list). Other YaCy servers in the network can retrieve these lists.

Installation

The easiest way to install YaCy is to download it from the YaCy site. A .deb package is also available for Debian systems; see [1]. The installation is straightforward. However, on first start YaCy tries to connect to the public YaCy network on the Internet (freeworld). This connection attempt is not critical, because HAMNET uses its own network definition:

Integrating a peer into the common network

The first goal, therefore, is to set up a YaCy peer in each HAMNET AS, at a site where IT equipment (e.g. a web server) is already present anyway, and to connect it to the unit "hamnet.yacy". (This should currently work from every AS.)

To do this, change the network definition on your own peer at http://localhost:8090/ConfigNetwork_p.html. Enter one of the following addresses there:

http://search.db0uhi.as64636.de.ampr.org/www/hamnet.yacy

http://dk0mav.as64643.de.ampr.org/yacy/hamnet.yacy
http://db0hmk.as64636.de.ampr.org/hamnet.yacy

After restarting the YaCy server service, your peer should connect to the "HAMNET" network.

Network Definitions

For the YaCy servers in HAMNET to find each other, there must be a uniform definition of the search engine network. This network definition is the above-mentioned text file "hamnet.yacy". In case you want to host your own copy of the network definition file ("hamnet.yacy") for your (HAMNET) environment, here is the file currently in use.

 network.unit.name = HAMNET
 network.unit.description = HAMNET YaCy Community

 network.unit.domain = global
 network.unit.dht = true
 network.unit.dhtredundancy.junior = 1
 network.unit.dhtredundancy.senior = 1


 # The vertical partition of the DHT: this applies a division
 # of the DHT into 2^<partitionExponent> fragments which get
 # all the same word-partition targets but a document-DHT computed
 # fragment of all documents
 network.unit.dht.partitionExponent = 0

 # Maximum search time for remote queries (milliseconds)
 network.unit.remotesearch.maxtime = 10000

 # Maximum number of results per remote query
 network.unit.remotesearch.maxcount = 50

 # Multi-word burst: percentage of the number of all peers that
 # shall be accessed for multi-word searches. Multi-word search is
 # a hard problem when the distributed search network is divided by
 # term (as done with YaCy, partly).
 # The scientific solution for this problem is to apply heuristics.
 # This heuristic makes it possible to switch on a full network scan to also get
 # non-distributed multi-word positions. For a full scan set this value to 100.
 # Attention: this may outnumber the maxcount of available httpc network connections.
 network.unit.dht.burst.multiword = 30

 network.unit.bootstrap.seedlist1 = http://search.db0uhi.as64636.de.ampr.org/www/seed.txt
 network.unit.bootstrap.seedlist2 = http://dk0mav.as64643.de.ampr.org/yacy/seed.txt
 network.unit.bootstrap.seedlist3 = http://db0bi.as64635.de.ampr.org:8090/www/seed.txt
 network.unit.bootstrap.seedlist4 = http://db0hmk.ampr.org/seed.txt
 network.unit.bootstrap.seedlist5 = http://44.225.93.8:8090/www/seed.txt
 network.unit.bootstrap.seedlist6 = http://search.db0sda.ampr.org/yacy/seedlist.html
 network.unit.bootstrap.seedlist7 = http://44.142.162.18/yacy/seed.txt
 network.unit.bootstrap.seedlist8 = http://44.225.164.3:8090/yacy/seedlist.html

 # Properties for in-protocol response authentication:
 network.unit.protocol.control = uncontrolled


 # Greedy learning: fast information acquisition heuristic for new peers
 greedylearning.enabled = true
 greedylearning.limit.doccount = 1000
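To check a copy of the definition file before publishing it, the "key = value" lines can be read with a few lines of Python. This is only a sketch under the assumption that the file contains nothing but such lines, blank lines and "#" comments; the function names are invented for this example and are not part of YaCy.

```python
def parse_unit_definition(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so URLs containing '=' stay intact.
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def seedlist_urls(settings):
    """Return the bootstrap seed list URLs ordered by their numeric suffix."""
    prefix = "network.unit.bootstrap.seedlist"
    found = []
    for key, value in settings.items():
        suffix = key[len(prefix):]
        if key.startswith(prefix) and suffix.isdigit():
            found.append((int(suffix), value))
    return [url for _, url in sorted(found)]


# Shortened excerpt of the file shown above, used as sample input.
SAMPLE = """
 network.unit.name = HAMNET
 network.unit.dht = true
 # Maximum search time for remote queries (milliseconds)
 network.unit.remotesearch.maxtime = 10000
 network.unit.bootstrap.seedlist2 = http://dk0mav.as64643.de.ampr.org/yacy/seed.txt
 network.unit.bootstrap.seedlist1 = http://search.db0uhi.as64636.de.ampr.org/www/seed.txt
"""

settings = parse_unit_definition(SAMPLE)
print(settings["network.unit.name"])
print(seedlist_urls(settings))
```

Sorting by the numeric suffix keeps seedlist10 after seedlist9, which a plain string sort would not.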

Seed Lists

So-called seed lists are required to build up the network. These are text files that announce the presence and reachability of other YaCy servers. This matters most when a new YaCy peer starts for the first time. The stability of the YaCy network therefore depends directly on up-to-date seed lists. I therefore ask you to configure your YaCy server as a "Principal server" where possible, i.e. so that your server writes a seed list itself and makes it available via a (probably already existing) web server. I propose maintaining the reachable seed lists and the current network definition in this wiki article, so that anyone interested is always up to date.

Currently known Seed lists:

 http://search.db0uhi.as64636.de.ampr.org/www/seed.txt
 http://web.do4bz.as64636.de.ampr.org/seed.txt
 http://dk0mav.as64643.de.ampr.org/yacy/seed.txt
 http://db0bi.as64635.de.ampr.org:8090/www/seed.txt
 http://db0hmk.ampr.org/seed.txt
 http://44.225.93.8:8090/www/seed.txt
 http://search.db0sda.ampr.org/yacy/seedlist.html
 http://44.142.162.18/yacy/seed.txt
 http://44.225.164.3:8090/yacy/seedlist.html
 http://db0alg.ampr.org/yacy/seed.txt
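Since the network depends on reachable seed lists, it can help to test the list above from time to time. The following Python sketch is one possible way to do that, not an established tool: it treats a seed list as reachable if the web server answers with HTTP status 200. The function names and the 10 second timeout are assumptions of this example.

```python
import http.client
import urllib.error
import urllib.request


def fetch_status(url, timeout=10.0):
    """Return the HTTP status code for url, or None if the host is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code
    except (OSError, ValueError, http.client.HTTPException):
        # DNS failure, timeout, refused connection, malformed URL or reply.
        return None


def reachable_seed_lists(urls, fetch=fetch_status):
    """Split urls into (reachable, unreachable) based on an HTTP 200 reply."""
    reachable, unreachable = [], []
    for url in urls:
        if fetch(url) == 200:
            reachable.append(url)
        else:
            unreachable.append(url)
    return reachable, unreachable
```

Called from a host inside HAMNET with the URLs listed above, reachable_seed_lists(urls) returns the two lists; the fetch parameter exists only so the function can be tried out without network access.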

Problems

Due to operation

The dynamics of HAMNET and the urge of users and admins to experiment cause high turnover. Since DB0FHW was shut down (this is where the search engine project was started), the YaCy network in HAMNET has collapsed repeatedly. As a result, some admins no longer offer the "search engine" service.

When links fail for longer periods, the YaCy servers become "orphaned". Unreachable peers are removed from the active list and are no longer contacted until they announce themselves again (mutual deadlock). With the currently small number of peers, this happens quite often. Workaround if the peers no longer connect: copy the entries from "<yacy folder>/DATA/INDEX/HAMNET/NETWORK/seed.old.heap" into "seed.new.heap" and restart.
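The workaround can be scripted roughly as follows. This Python sketch rests on two assumptions that the text above does not confirm: that the YaCy service is stopped while it runs, and that overwriting "seed.new.heap" with the whole of "seed.old.heap" is an acceptable way to copy the entries. The function name and the ".bak" backup file are invented for this example.

```python
import shutil
from pathlib import Path


def restore_seed_heap(yacy_dir, network="HAMNET"):
    """Overwrite seed.new.heap with seed.old.heap, keeping a backup copy.

    Hypothetical helper; run it only while the YaCy service is stopped.
    """
    network_dir = Path(yacy_dir) / "DATA" / "INDEX" / network / "NETWORK"
    old_heap = network_dir / "seed.old.heap"
    new_heap = network_dir / "seed.new.heap"
    if not old_heap.is_file():
        raise FileNotFoundError(old_heap)
    if new_heap.exists():
        # Keep the current file so the step can be undone by hand.
        shutil.copy2(new_heap, new_heap.with_name("seed.new.heap.bak"))
    shutil.copy2(old_heap, new_heap)
    return new_heap
```

After restore_seed_heap("<yacy folder>") has run, start the YaCy service again.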

The star topology of HAMNET and the resulting "islanding" when links fail inevitably cause problems for peer-to-peer applications. Increasing the density of participants can counteract this.

In addition, restarting the server service regularly is advisable. On startup, the network definition ("hamnet.yacy") is read again and the network is rebuilt accordingly.

Due to the system

You will quickly notice that data exchange via DHT (apparently) does not take place. This is due to the currently low number of participants. (YaCy only exchanges data via the DHT procedure once there are 32 participants.) This is not a hindrance for HAMNET: in a search, all available participants are queried. With fewer than 32 participants, the only consequence is that no redundancy of data is created.

Do's and don'ts

To keep traffic in HAMNET low, initially only web servers in your own environment should be indexed. Automatically crawling HAMWEB servers and the web front ends of PR mailboxes can generate large amounts of data. Further parameters can be adjusted under "Expert Crawl Start".

Please note the following parameters:

^44\.([0-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5]))\.([0-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5]))\.([0-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5]))$
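To see what this expression accepts, it can be tried out in Python. The sketch below only tests the pattern itself; the wiki text does not say which crawl filter field it belongs in, and the helper name is made up for this example. Note that because of the anchors "^" and "$", the expression matches a bare dotted-quad address starting with 44., not a full URL.

```python
import re

# The expression from above, split over three lines for readability.
OCTET = r"([0-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5]))"
HAMNET_IP = re.compile(r"^44\." + OCTET + r"\." + OCTET + r"\." + OCTET + r"$")


def is_hamnet_address(text):
    """Return True if text is a dotted-quad IPv4 address inside 44.0.0.0/8."""
    return HAMNET_IP.match(text) is not None


print(is_hamnet_address("44.225.93.8"))          # True
print(is_hamnet_address("44.255.255.255"))       # True
print(is_hamnet_address("45.225.93.8"))          # False: first octet is not 44
print(is_hamnet_address("44.256.1.1"))           # False: 256 is not a valid octet
print(is_hamnet_address("http://44.225.93.8/"))  # False: anchored to a bare address
```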