shadowFT -- re-sharing the FastTrack network
version 1.4, 09 Nov 2001
Copyright (C) 2001 J.A. Bezemer
Released under GNU GPL >=2
http://www.dddi.nl/~costar/shadowFT


FEATURES
--------

- Fully automatic scanning and indexing of FastTrack nodes
- Indexing of manually found FastTrack nodes
- Calculation of interesting FastTrack network statistics
- Sharing of scanned indexes on the shadowFT network
- Searching in all indexes re-shared on the shadowFT network
- Downloading from FastTrack nodes as indicated by shadowFT search results
- Sharing local content on the shadowFT network
- !HOT! KZA Aided Scanning[tm]


QUICKSTART
----------

~john$ su -                                 Install the necessary packages
Password:                                   (everyone runs Debian, right? ;-)
/root# apt-get install bc screen netcat \
         sed bash grep textutils shellutils \
         findutils fileutils bsdutils gcc \
         libc6-dev make patch libreadline-dev
/root# exit                                 We do EVERYTHING else as non-root

~john$ mkdir shadowFT; cd shadowFT          Make a new dir

~john/shadowFT$ wget http://www.dddi.nl/~costar/shadowFT/shadowFT-1.4.tar.gz
                                            Download the package
~john/shadowFT$ tar xzvf shadowFT-1.4.tar.gz
                                            Unpack it
~john/shadowFT$ cd shadowFT-1.4
~john/shadowFT/shadowFT-1.4$ make           Compile a few things
                                            (if this fails, see below)

~john/shadowFT/shadowFT-1.4$ ./sft scan auto 24       Start scanning
~john/shadowFT/shadowFT-1.4$ ./sft scan auto 24
~john/shadowFT/shadowFT-1.4$ ./sft scan auto 65
~john/shadowFT/shadowFT-1.4$ ./sft scan auto 66
~john/shadowFT/shadowFT-1.4$ ./sft scan auto 128
~john/shadowFT/shadowFT-1.4$ ./sft scan update        Start the updater
~john/shadowFT/shadowFT-1.4$ ./sft share start        Share scanned indexes
~john/shadowFT/shadowFT-1.4$ ./sft web start          Start web interface

~john/shadowFT/shadowFT-1.4$ ./sft search             Search for something
gnut-sFT(search)> find mich jack smooth criminal mp3

[After a while, press Spaces/Enters to view results]

:  24) michael jackson - smooth criminal.mp3
       REF: x4118279x12124x (97x)  207.46.197.159:1214
       size: 4.118M  speed: 28  rating: *
:
gnut-sFT(search)> get 24                    Download something we like
Starting shadowFT download(s) in background.
gnut-sFT(search)> exit

~john/shadowFT/shadowFT-1.4$ cd ~/shadowFT/downloads  Check the progress
~john/shadowFT/downloads$ ls -alF
-rw-r--r--  1 john  john  2089894 Nov  7 13:45 x4118279x12124x
-rw-r--r--  1 john  john     2895 Nov  7 13:45 x4118279x12124x.ipp
-rw-r--r--  1 john  john      667 Nov  7 13:45 x4118279x12124x.log
-rw-r--r--  1 john  john       24 Nov  7 13:45 x4118279x12124x.sta
-rw-r--r--  1 john  john      204 Nov  7 13:45 x4118279x12124x.url

[And after a short while:]

~john/shadowFT/downloads$ ls -alF
-rw-r--r--  1 john  john  4118279 Nov  7 13:46 Michael Jackson - Smooth Criminal.mp3

Impressed? And this is only the beginning.. ;-)


INTRODUCTION
------------

The FastTrack network (better known by the client names KaZaA, Grokster
and MusicCity/Morpheus) is at this moment the biggest file sharing
network, with user counts that equal or surpass those of the Napster
network before it was effectively shut down. The FT network has many
interesting properties, but the most important ones in this context are
that it is entirely closed source and that it uses strong cryptography
to prevent unauthorized (non-advertising/open source) clients from
accessing the network. There are some efforts to create a completely
open FastTrack alternative, under the name "openFT". However, any such
new network would require large amounts of popular content before
people will switch to using it.
The FT network, with on average 500,000+ users online, provides enormous
amounts of readily available content, but its closed nature seems to
prevent transferring this content to other networks' search facilities.

Fortunately, the FT protocol apparently specifies that every FT "node"
(i.e. computer running FT software) should have a small HTTP-like server
running on port 1214 that can produce a plaintext list or index of the
files shared on that node, when asked for it. So, when the IP address of
a FT node is known, the index can be requested and shared via different
means than the FT network. This is what shadowFT is all about.

The problem is how to get the IP addresses of the FT nodes. One
possibility that works very well is simply to generate random IP
addresses and try to access port 1214. Certain ranges of IP addresses
have a relatively high hit probability; for example, approximately one
of every 100 addresses in the 24.x.x.x range runs a FT client. This is
multiplied by (nearly) two, as most FT clients point to one other FT
client that can be scanned as well (this is a so-called "supernode", but
that's not relevant in this context). When trying only one "smart"
random IP address per second, this gives an average of about one FT node
per minute, or well over 5,000 FT nodes per week.

Another possibility is to monitor official clients in one way or
another, and extract IP addresses from data gathered that way. One such
method is available in this package and it works very well, resulting in
many more FT nodes per minute, of course at the cost of notably
increased scanning bandwidth.

The ultimate goal of shadowFT is to provide start-up content to an open
FastTrack alternative network; however, such a network does not yet
exist. In the meantime, one can use the scanned indexes for one's own
pleasure, but it is of course much more fun to share that data on a
temporary dedicated small network. For that purpose, a heavily patched
version of the excellent "gnut" (originally a Gnutella network client)
is supplied, which forms the basis of the shadowFT network. This version
of "gnut" also allows for quick searching in (big parts of) the
collectively scanned indexes. Combined with an enhanced version of
"giFTget", this enables fast, reliable and completely backgrounded
downloading, unlike many other systems. Especially for people who want
to share their own content (rather than indexes) on the shadowFT
network, a version of "giFT" is available that essentially functions as
a non-networked web server that can be scanned and downloaded from in
exactly the same way as any other "official" FT node.

I want to stress again that the shadowFT network is a temporary hack to
do useful things with the scanned indexes during the time that an open
FastTrack alternative is not yet available. The shadowFT network is
intended to disappear as soon as there is something better to share the
scanned indexes on.

By the way, you can forget all program names mentioned here, since
there's one single front-end command "sft" that knows how to do
everything requested.


INSTALLATION
------------

First make sure you have all necessary software packages installed.
These are (GNU versions preferred where applicable):

- textutils, shellutils, findutils, fileutils and bsdutils
- sed
- grep
- bc
- netcat (command "nc")
- a C development environment, as in gcc, make, patch and (if available)
  a libreadline development package

And you need a functioning /dev/urandom.
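Netcat, by the way, is also all you need to try the core idea from the
introduction by hand. A minimal sketch, assuming you already know the IP
address of a FT node (the request line shown here is a plain guess at
what the port-1214 server accepts; the scanning scripts in this package
are the authoritative source for the request they really send):

  #!/bin/sh
  # fetch-index.sh -- ask a (hypothetical) FT node for its plaintext index
  IP=${1:?usage: fetch-index.sh <ip-address>}
  printf 'GET / HTTP/1.0\r\n\r\n' | nc -w 10 "$IP" 1214

If port 1214 is open and the node is willing, you should get back an
HTTP-style response containing a plaintext file listing; if not, nc
simply times out after 10 seconds and you try the next address.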
The netcat source code is available everywhere, for example
http://ftp.us.debian.org/debian/pool/main/n/netcat/netcat_1.10.orig.tar.gz

For KZA Aided Scanning[tm] (which works only on i386 Linux), screen is
required as well; its source code is everywhere too, for example
ftp://ftp.uni-erlangen.de/pub/utilities/screen/

Note: the not-really-standard programs "nc" and "screen" do not have to
be in your $PATH; their exact locations can be configured in config.sh
(search for NC= and SCREEN=).

The shadowFT system has a default configuration which requires that the
(non-root!!) user running the stuff is able to write in the shadowFT
directory structure at all times. The scanned indexes require large
amounts of disk space; when scanning only one host per second, expect
about 20 MB of growth per day, or 600 MB per month. For multi-user and
NFS-networked computers, having the shadowFT directory structure
accessible by all interested users is very handy too. Therefore it is
strongly advised that you place the shadowFT stuff somewhere under your
home directory, because that is usually the only location fitting all
needs. (Note: you _can_ install all shadowFT stuff in places like
/usr/local, but that requires many not-really-trivial changes to the
config file. Public access is easily possible via a single symlink, see
below.)

So extract the .tar.gz archive in some directory under your $HOME.
Anywhere will do, but ~/shadowFT/ or ~/src/shadowFT/ are the most usual
locations. Then cd to the newly created directory and give the command
"make". This will configure and compile all necessary things. If
something fails here, it should be relatively easy to figure out; the
top-level Makefile is simple enough. The last item, the kza stuff, is
optional (only useful for i386 Linux systems) and is allowed to fail.


CONFIGURATION
-------------

For easy personal access, you can symlink to the "sft" script from your
$HOME/bin directory. For easy system-wide access, you can symlink to it
from /usr/local/bin or the like. If you do any symlinking at all, you
MUST edit the sft script and change the BASETREE="..." line to point it
to the base tree of the shadowFT installation. That is, the full path to
the config.sh file is supposed to be $BASETREE/config.sh. If you don't
do this, you'll notice.

Try sft (if it's in your $PATH) or ./sft (if it's not) to see if things
work all right. You should get a short usage note. At the bottom of
that, a few directories are mentioned that you can change in the
config.sh file (but the defaults should be fine).

Note: if you experience any problem at all (and especially problems
related to `['), install GNU bash as your /bin/sh and try again; or
alternatively point all shell scripts to the proper location of a GNU
bash version 2 or higher. Having GNU versions of textutils, shellutils,
findutils and fileutils in your $PATH can also make a difference.
Portability was not a particularly important goal when programming this,
but since most things are shell scripts, it should be relatively easy to
make them work on all *UX-like systems.

If you are going to run shadowFT things on a (firewall-like) computer
that has both an "inside" and an "outside" network connection, edit the
config.sh file and change the GNUTIP variable to the IP address of the
outside network connection. Also, if you have assigned multiple IP
addresses to your "outside" network card, you can have the scanning of
FT nodes appear to come from any one of these addresses by changing the
NCIP variable. By default, the first address will be taken.
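For illustration, such an override might look roughly like this (just a
sketch: GNUTIP and NCIP are the real variable names from config.sh, but
the exact shape of the surrounding block in the shipped file may differ
-- see the note about the hostname check directly below -- and the
addresses are of course placeholders for your own):

  # in config.sh -- settings for the dual-homed machine "gateway" only
  if [ "`hostname`" = "gateway" ]; then
    GNUTIP=192.0.2.10     # IP of the "outside" interface, used by gnut
    NCIP=192.0.2.11       # source address that the nc-based scans use
  fi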
As the sketch above already suggests, these settings are
computer-specific and are therefore grouped inside a hostname check,
which you should edit as well. If you run shadowFT stuff from multiple
NFS-sharing computers, you can still use a single shadowFT installation
and simply have multiple hostname checks here (if applicable). Computers
operating from _inside_ a masqueraded/NAT'ed environment don't need
GNUTIP or NCIP settings, since the IPs will be changed by the
masquerading anyway.

If you have a firewall, make sure it allows (at least) outgoing
connections to ports 1214, 1412 and 1413, otherwise shadowFT is useless
for you.

In case any user wants other settings than the system-wide defaults in
sft and config.sh, several override features are available. Check the
scripts for details.

In the following, I will assume sft is available in the $PATH (otherwise
use ./sft) and I will use $BASETREE to indicate the shadowFT
installation directory.


STEP 1 - SCANNING
-----------------

Scanning can be done in several ways.

A. Automatic scanning

     sft scan auto [<prefix>]

This will automatically scan random IP addresses. You can specify a
prefix to limit scanning to a specific range; for example, "sft scan
auto 24" will only scan 24.x.x.x addresses and "sft scan auto 128.125"
will only scan 128.125.x.x addresses. If you specify no prefix, fully
random addresses will be generated. Specifying a 3- or 4-byte prefix is
possible but relatively useless, since scanning will be completed within
an hour or a second respectively, while the automatic scanning is
designed to run for weeks or months. (Exception: sharing files yourself,
see below.)

The interesting question of course is where to scan. The "quickstart"
example above already mentioned the ranges with the most FT nodes,
namely 24.x.x.x, 128.x.x.x, 65.x.x.x and 66.x.x.x; more than one out of
every 100 IP addresses in these regions actually runs FT. But when you
have scanned for a day or two, you can have a look at your own
statistics (see below) and draw your own conclusions. The most
interesting are the supernode statistics, since these give an accurate
image of the node distribution no matter what regions you have scanned.

Every autoscan process scans about one host per 5 seconds. You can start
many simultaneous processes to scan more; 5 processes are advised (and
still doable for modem connections), while more than 20 processes will
show some notable increase in bandwidth. To stop an autoscan process,
just kill it. Or "killall shadowFTautoscan" to kill all of them at once.

B. Update scanning

     sft scan update

This will automatically re-scan all known FT-running hosts that you've
encountered so far. Running at least one such process is necessary to
keep your indexes current, since automatic scanning takes on average
more than 100 days to generate the same random IP again. Update scanning
will re-visit each known host every day or two.

An updatescan process re-scans about one host every 15 seconds. This is
sufficient for all practical purposes, but you can run more simultaneous
processes if you really want. Killing the updatescan process stops it;
"killall shadowFTupdatescan" stops them all.

C. Manual scanning

     sft scan manu [<hosts-list> <scanned-list>]

This will scan only IPs that are specified in the manual-scanning list,
which defaults to $BASETREE/pub/manuscan.hosts but can be overridden
from the command line. The IPs of successfully scanned hosts are saved
in $BASETREE/pub/manuscan.scanned, or whatever is specified on the
command line.
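For example, suggesting an address you found elsewhere and then scanning
it could look like this (the address is of course just a placeholder,
and the default list locations from above are used):

  # add a candidate FT node to the shared manual-scanning list
  echo 24.1.2.3 >> $BASETREE/pub/manuscan.hosts

  # run one manual scanner in the foreground; it picks the address up
  # on its next pass and, on success, records it in manuscan.scanned
  sft scan manu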
Hosts that are scanned once this way will never be scanned again --
unless someone empties the scanned-list. The intended use on
multi-user/multi-computer setups is that manuscan processes run on only
one computer and are controlled by one user. Making the
manuscan.{hosts,scanned} files writable by everyone allows people to
suggest IPs to try. Note that this requires a certain amount of trust in
all users that can write to these files.

One manuscan process scans about 1 host per 10 seconds. You can run more
processes simultaneously to scan faster (especially in combination with
KZA Aided Scanning[tm], see below). Before scanning, the lists are
sorted in one random way out of 384 possibilities, so multiple
simultaneous manuscans should never interfere (and even if they did, it
wouldn't harm). However, do keep an eye on the bandwidth, especially if
many of the tried hosts are hits.

The manuscan processes run in the foreground, as these are usually
interesting to watch. To kill one, press Ctrl-C and hold it down until
the process has really stopped (this may take some time). But you can of
course also background them in the usual way.

Note that the hit rate will naturally go down over time, so if there are
relatively many misses in your back-scroll buffer, it may be time to
clean up the old manuscan.{hosts,scanned} files and start with a new
data set. Doing this once per day should be fine. You don't have to stop
and restart the manuscan processes while you do this; they will use the
new data when their current pass over the old data is finished.

All scanning methods store their results in a tree under (by default)
$BASETREE/indexes/$HOSTNAME/data. A one-level hash is used, namely the
last byte of the IP address (this byte is stripped from the index
filenames). Summary information on hosts that were tried, succeeded and
failed is saved in $BASETREE/indexes/$HOSTNAME/hosts.* .

Hosts that are tried a second time (mostly during updatescan and
manuscan) may have updated content. The new index is always stored in
the standard location, but in case too much content is gone or changed
(10+ items), the old index is appended to
$BASETREE/indexes/$HOSTNAME/oldindexesbackup. This is done to prevent
loss of valuable information in the event that FastTrack clients are
instructed to stop serving plaintext indexes or to start serving bogus
data. If you are very sure that this is not yet the case, it's safe to
empty the oldindexesbackup once in a while (every few weeks or so); it's
always safe to rotate and compress it like a logfile.

In the interest of full disclosure: your ISP may receive complaints from
people who have absolutely no clue and run extremely paranoid software
that thinks you're trying to break into their system. If you're asked
what you're doing, either say that you're involved in a scientific study
to map the FT network and its shared content, or say that you have
recently installed a very interesting screensaver but have noticed some
strange decrease in Internet speed since then. Your choice. But DO scan,
see below why.


STEP 1.5 - STATISTICS
---------------------

     sft stats [<N>]

This command will calculate interesting statistics, among which a
top-<N> (default 20) of most-shared files. Note that this command may
start taking considerable time after a week of scanning.


STEP 2 - SHARING
----------------

     sft share start
     sft share stop

These commands start and stop the heavily patched version of gnut that
shares your scanned indexes on the shadowFT network.
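To get a feel for what this gnut actually reads, you can poke around in
the on-disk tree described under STEP 1 by hand. A small sketch (the
directory-per-last-byte layout follows from the hashing description
above, but the exact file naming inside it is not spelled out here and
may differ from what you see on disk):

  # all indexes scanned on this machine
  ls $BASETREE/indexes/$HOSTNAME/data/

  # which of the scanned nodes share something matching a keyword?
  grep -ril "smooth criminal" $BASETREE/indexes/$HOSTNAME/data/ | head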
At startup, gnut will read your scanned indexes and store them in memory
for fast access. This is done in a special way that could use some
explanation.

The FT network does not reference files by name, but rather by content,
in the form of a checksum. This is the fundamental feature that enables
multi-source parallel downloads, even when the file names are different
on each source. The "official" FT checksum is very long (20 bytes), but
only a derived short checksum of 2 bytes is available via the HTTP index
on port 1214. However, combined with the filesize (4 bytes), this forms
a "reference" that is very useful for all practical purposes. In
plaintext notation, this "shadowFT reference number" is written as
x<size>x<checksum>x, for example x4118279x12124x.

A shadowFT search is done in two steps. First, keywords are matched
against file names, and each result contains the reference number of the
content that has a matching name on one of the scanned FT nodes. The
second step is searching again, but now for the reference number, which
results in (hopefully) many FT node IP addresses that share that content
under one name or another. The download program then queries all found
IP addresses for their indexes and searches them for the name belonging
to the reference. Note that the user only has to do the first (keyword)
search interactively; the download program just needs the reference and
will do the rest itself.

To prevent the sharing gnut (process "gnut-sFT") from allocating too
much memory, not all available name->reference and reference->IP pairs
are stored in memory, but only the most recently (re-)scanned ones. This
is done by looking at the modify-time of the saved index files, using
the "find" command. Reference->IP pairs cost only 9 bytes (on average)
of memory, so many can be stored; name->reference pairs are much more
expensive, 50-150 bytes, so these need a much lower recentness limit.
The standard limits are 4 days for references (also called checksums in
some places) and 5 hours for names, but this is tunable with the
CHKFINDOPT and NAMFINDOPT settings in config.sh. When running five
autoscan processes (so one scan per second) and one updatescan process,
the default limits will cause a memory usage of about 25 MB for
gnut-sFT. (This is a multithreaded process, so "top" and "ps" will show
the same amount of memory for all spawned processes, but it is actually
allocated only once.) NOTE: if you have less than 128 MB of memory, or
are heavily using KZA Aided Scanning[tm] (which gives many, many recent
results), you are strongly advised to scale down both limits. In
general, if you change the config.sh file, you have to stop and re-start
the affected services for the changes to take effect.

The sharing gnut-sFT process will be restarted every half hour or so.
This is done to 1) read new data (which can be done reliably only once),
and 2) prevent gnut's once-famous memory leaks from having too much
impact. The output of gnut-sFT is appended to ~/shadowFT/tmp/gnutsFT.log.
Note that each user can only run one instance of gnut-sFT per computer
(pid and info are saved to host-specific files in ~/shadowFT/tmp), but
more aren't needed anyway.

Some more things about this version of gnut. It runs on port 1412 (or a
bit higher). It is probably the only version of gnut that can be used to
build a stable Gnutella-like network in and of itself, because automatic
ping generation ("updating") was added.
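On the subject of those CHKFINDOPT/NAMFINDOPT limits: since the scripts
use "find" on the modify-times of the index files, the settings are most
likely plain find options, but treat the exact syntax below as an
assumption and check config.sh itself before copying it. A sketch of
scaling both limits down on a low-memory machine:

  # in config.sh -- roughly halve the default recentness limits
  CHKFINDOPT="-mtime -2"     # references: 2 days instead of 4
  NAMFINDOPT="-mmin -150"    # names: 2.5 hours instead of 5

Remember to stop and re-start "sft share" afterwards, as mentioned
above.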
Further: because the matching takes a relatively large amount of CPU
time, only one query per second will be processed (QUERYSECS in
connection.c); this is also the reason it is run at nice +18. To reduce
bandwidth usage, at most 25 results will be returned; at 100 bytes per
result that means 2.5 kB packets, which take about half a second to
transfer over a modem. For overbroad queries we will always reach that
limit, simply because we know so much. But experience shows that usually
no more than 15-20 results can be found for accurate, specific queries,
so this limit encourages such behaviour. Also note that this is a
per-host limit and that the query will be processed by many hosts in
parallel.

The "availability count" after the reference ("(32x)" etc.) is
calculated by gnut-sFT on demand at the first encounter of a particular
result, and saved for later use. At most 100 occurrences are counted,
but only 25 of these will be returned on a reference query; the idea is
that other shadowFT nodes will return more results for very popular
files (even if they don't currently know them by name).

The shadowFT network is intentionally fully incompatible with the
existing Gnutella network and all existing clients for it. It is
impossible to couple both networks directly; while it is possible to
write a bridge program, please don't do that. Why? Simply because it
would completely defeat the scalability of the shadowFT network (also
see below).


STEP 3 - SEARCHING
------------------

Searching is possible via several means.

A. Gnut searching

     sft search

This will start a (text mode) gnut process that you can use
interactively. This is the preferred search (and download) method
because it runs everywhere, doesn't require additional programs and has
some very nice features. If you have used the "real" gnut before, you
should feel immediately comfortable, and even more so when you realize
that you don't (shouldn't) have to worry about getting and/or staying
connected to the shadowFT network. If you're new to gnut, here is a
short introduction.

- find <keywords>
  Start searching for keyword(s). Results will come in and a counter
  will be shown. When enough results have arrived, press Space or Enter
  to view them (see "r"). Results may keep coming in while you are doing
  other things; use "r" to see an updated list.

- set wait_after_find [0|1]
  Either display the result counter after a "find" command or return
  the prompt immediately.

- r [<regexp>]  or  response [<regexp>]
  View or re-view the current results of the last search; view all, or
  only those matching <regexp>. Space or Enter for the next page, Q to
  return to the prompt. By default, results are sorted by decreasing
  size, so that identical files (possibly with different names) are
  grouped together.

- set paginate [0|1]
  Wait for a key between pages or show everything at once (in case you
  prefer your scroll-back buffer). Note that paginating is a little
  buggy, but that's usually no problem.

- get <number(s)>
  Start downloading the specified result number(s), which is/are taken
  from the last displayed result list (i.e. results that have come in
  after the last "r" command will be ignored here). The download is
  started in the background, so you can quit gnut at any time. Any
  number of downloads can run simultaneously, limited only by available
  process memory. Note that you can download any unique reference only
  once until it is completed; you also need to run a gnut web service
  for this, see below.

- play <number(s)>
  As "get" above, but the files will be played/viewed once the download
  is completed.
  Both downloading and playing/viewing are done in the background. See
  below for more info.

- i  or  info
  Show a few statistics and the shadowFT nodes you're currently
  connected to.

- q or quit or e or exit  (or Ctrl-C, but _not_ Ctrl-D)
  Quit the program.

- help [<command>]
  Information on commands.

B. Web browser searching

     sft web start
     sft web stop

These commands start and stop a gnut instance that provides a search
facility for use with your web browser (it's also used for downloading).
This service runs on port 1412 or 1413, or maybe a bit higher, and you
can point your web browser at it. The search interface is very simple.
Two problems with this method are that results are not visible until the
search is over, and that results are not sorted in any useful way.

Access to the web interface is not limited to your local system;
everyone who knows where it is can access it. If you have a problem with
that, you can add a "-p <port>" option to the $GNUTWEB command in
4-share/gnutrunnerweb, so that it starts on a non-standard port, and
also add a "set hidden 1" setting to the gnutrc script that is generated
in 4-share/start-gnutweb, so that the web service's location will not be
advertised on the shadowFT network. The actual port and IP address are
saved to $BASETREE/pub/gnutweb.savedconfig, search for "local_"; this is
also how other programs (anywhere on the NFS network, if applicable) can
automatically find the correct location of the web service. Note that
there can be only one instance of the gnut web service running per
machine or multi-machine NFS network (pid and info are saved to
$BASETREE/pub and ~/shadowFT/tmp), but that's all that's needed.

C. Long-term searching

     sft keepsearching <keywords>

If you are searching for a very specific file that is not widely shared,
you will notice one of the limitations of the system, which is that much
more data is available than can possibly be shared. To solve this
problem, you can use the "keepsearching" command, which sends queries to
the local gnut web interface (which must be running, see above)
approximately once every half hour. After 10 minutes of waiting for new
results (which are shown immediately), they are merged with the previous
results, sorted by decreasing size, and shown again; wait, repeat. This
is very suitable to leave running overnight, hoping that your
back-scroll buffer is large enough (otherwise look in
~/shadowFT/tmp/shadowFTkeepsearching*). Or you can redirect its output
to a file and leave it running in the background.

About the search results of all methods: look at the "availability
counts" to see how popular files are. For very popular files, often a
few "unpopular" matches are found that have exactly the same length but
a different checksum; these are bad or incomplete downloads and you
should ignore them. While you will usually search for keywords (file
names), it's also possible to search directly for reference numbers and
see how many IP addresses sharing that content are known on the shadowFT
network.

Also remember that the gnut-share processes only consider one search
request per second to keep CPU usage within limits. So please don't
generate more than one query per minute, otherwise you will not only
cause inconvenience for others, but certainly for yourself as well.


STEP 4 - DOWNLOADING
--------------------

There are several methods to _start_ a download, but the actual
downloading is always done by the same program (well, set of programs
really). This program needs a gnut web interface to be running, see
above.
On NFS-sharing local networks, only one web interface needs to run
(anywhere); its presence will be detected via
$BASETREE/pub/gnutwebrunner.pid and its exact location via
$BASETREE/pub/gnutweb.savedconfig (grep "local_").

A. Command-line downloading

     sft get <reference> [view|noview]

This is the command that every method eventually calls, but you can also
call it directly. The <reference> is the x<size>x<checksum>x "number",
for example x4118279x12124x. The download is always started in the
background, and is saved by default in ~/shadowFT/downloads. You can
also watch the progress there.

A few details: "shadowFTget" starts three processes. Process one queries
the gnut web interface with the specified reference number and saves the
resulting IP addresses and ports in the .ipp file. Process two takes the
addresses from that file, tries to connect and get the full index; the
matching full filename (URL) is extracted and saved to the .url file.
Process three finally starts the giFTget process which actually does the
download, and repeatedly passes the full URLs to it. giFTget produces a
.log file with some progress details, and a .sta file which indicates
which portions of the file have been downloaded already. The downloaded
file itself has a temporary name equal to the reference. Once the
download is completed, the file is renamed to the most popular name from
the URLs file (with a .<number> extension if that name already existed);
also, several semaphore-like .stop* files are used to tell the other
processes to stop.

If "view" is specified on the command line, or RUNVIEWER is set to 1 in
config.sh and _no_ "noview" is specified, the downloaded file will be
viewed or played by an appropriate program. See the "viewers" file for
details.

Because of the file names, the same content (i.e. the same reference
number) cannot be downloaded more than once simultaneously; you have to
wait until all associated processes are finished before the download can
be started again (this may take up to 10 minutes).

To stop a download halfway, kill -KILL the associated giFTget process.
All other associated processes will be stopped automatically a while
later. Already downloaded data is kept intact, and the download can be
continued by simply starting it again. If you want to continue a
download that wasn't stopped properly (for example after a system
reboot), first delete everything except the <reference> and
<reference>.sta files.

B. Gnut-search downloading

     gnut-sFT(search)> get <number(s)>  or  play <number(s)>

As indicated above, these can be used to start downloading. The
mechanism used is that "sft helperget" is called with fake URLs of the
specified results. The helperget program extracts the reference
number(s) from its command-line arguments and starts the appropriate
"sft get" processes. Note that "get" forces "noview" behaviour, and
"play" forces "view"; the RUNVIEWER default in config.sh is not used in
this case.

C. Web browser downloading

The result list of a web-based search action contains interesting URLs
for all results. Clicking on one of those results contacts the gnut web
service again with an "echo" request. The web service then sends a file
of type "application/x-gnutsft" that contains that same URL. If you have
configured a viewer application for that file type, the browser will
save the contents of the transferred file (so in this case the URL) to a
temporary file and start the viewer application with the name of the
tempfile as parameter. We can use that feature in combination with
"sft helperget" to start downloading a certain file.
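For browsers and file tools that read a ~/.mailcap file instead of
offering a configuration dialog, a one-line entry along these lines
might do the same job (this is an assumption -- the instructions below
only describe the Netscape dialog -- and you may need the full path to
sft):

  # ~/.mailcap -- hand gnut-sFT "echo" replies to the helperget program
  application/x-gnutsft; sft helperget %s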
In Netscape, for example, select menu "Edit", "Preferences", then expand
the "Navigator" category and click on "Applications". The "New" button
brings up a form; enter "application/x-gnutsft" (without the quotes) in
the "MIMEType" field, select the "Application" radio button and enter
"sft helperget %s" (without the quotes) on the "Application" text line.
Leave the rest blank, "OK" twice to confirm. In some browsers, you may
have to enter extra quotes as in "sft helperget '%s'" or "sft helperget
"%s"", or something entirely different; look at similar entries to see
what's needed. If "sft" is not in your $PATH, you of course need to
supply its full path.

If your browser can only pass a URL to a viewer program (and not the
name of the file downloaded from that URL), that's okay too. The
"helperget" program can handle everything.

When used in this way, "helperget" does honour the RUNVIEWER default
setting from config.sh, but you can specify either "sft helperget view
%s" or "sft helperget noview %s" to override that.

If things are configured correctly, you should be able to click on one
of the search results and get a message "starting download in
background" a second later.

With both gnut-search and web browser downloading, it doesn't matter
whether a name->reference or a reference->IP search was performed; the
reference is mentioned in all returned search results and that's all the
helperget program needs.

Note that a downloaded file may sometimes have glitches or corrupt spots
partway through the file. This is caused by the FastTrack checksumming
method, which only looks at a limited number of intervals in the file
and does not care about the rest. So if someone shares a file which is
incorrect/corrupted in places where the checksum doesn't look, the
checksum will be equal to that of the original file and we will happily
download it. If this happens to you, just wait a few hours and try to
download the same reference again; you'll get either a fully correct
download or possibly corruption in other places. A different (and more
difficult) approach is to fake a stopped download with missing data only
in the corrupted parts and start the download again to fill it up; the
.sta file is the key here.


STEP 5 - SHARING FILES
----------------------

     sft sharefiles start [<directory>]
     sft sharefiles stop
     sft scan auto <your outside IP>

The shadowFT network was never intended for sharing files (only scanned
indexes), but if you insist, sharing files is possible too. This is
actually done in the same index-sharing way, but now with the index of
your own shared files.

The "sharefiles start" command starts a modified giFT process in the
background that does not try to connect to any network, but just sits
waiting for incoming requests like a web server. By default, your
download directory (~/shadowFT/downloads) will be shared, but you can
override this from the command line.

To make your content available for searching on the shadowFT network, it
has to be scanned. This is possible with manual scanning, but running a
dedicated autoscan process is more convenient. Be sure to specify your
"outside" IP address, since that address is the only one that can be
used to connect to you. When auto-scanning a single host, it will be
re-scanned every 10 minutes (in addition to the regular update
scanning), which means that your content will always be available for
searching.
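Putting the two halves together, a typical session might look like this
(a sketch only: the directory and the outside IP address are obviously
placeholders for your own):

  # serve my own files on port 1214, just like an "official" FT node
  sft sharefiles start ~/shadowFT/downloads

  # keep auto-scanning my own outside address so the index stays fresh
  sft scan auto 192.0.2.10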
Note, however, that the current sharing implementation in giFT is
apparently relatively unfeatureful, which means that an unlimited number
of simultaneous connections will be accepted and that everyone can
download at unlimited speeds.


KZA AIDED SCANNING[tm]
----------------------

     sft kza <wordlist> [<hosts-list>]
     sft scan manu [<hosts-list> <scanned-list>]

If you are searching for very new or relatively unpopular content, or
just want to scan really large numbers of hosts very quickly, then use
KZA Aided Scanning[tm]. This uses a should-be-patent-pending ingenious
method to automate the use of, and extract data from, any console
application -- in this case the i386 Linux version of the closed-source
"kza" program.

Download the i386 Linux version of kza from http://download.kazaa.com
and extract it somewhere (anywhere). Version 0.4 ("alpha release") is
currently the only available version and is therefore the only version
that the current automation implementation is designed for. First run
kza manually to make sure logging in and searching work correctly, and
especially that you don't have to log in explicitly any more after the
first run (since the automation won't do that for you). If everything is
okay, copy the kza binary to the X-kza/ directory, or point the KZA=
setting in config.sh to the correct location.

Then find or create a list of words to search for -- or actually lines
of (possibly multiple) words, since one complete line is considered one
search query. A list with fewer than 30 lines is considered "specific
searching": all lines will be entered completely and in sequence in one
kza run. More than 30 lines is considered "generic searching" (for
example /usr/share/dict/american-english): per kza run, 15 random words
are selected and the first 4 characters of each are entered as search
queries, which usually results in many files from many IP addresses. As
implied, kza is stopped after one round of searching, and a waiting
period of 15 minutes is observed to let the scanners catch up a bit.
After that, kza is started again, with either the same specific queries
or other random generic queries.

Finally start the process with "sft kza <wordlist>". This must be done
in the foreground in a (possibly virtual, possibly remote) terminal, not
in the background, but a minimized X terminal window is fine. Everything
will then run automatically, and the (default)
$BASETREE/pub/manuscan.hosts file will be filled with high-probability
IP addresses. Run one or more manuscan processes to scan them. If you
don't want to use the default manuscan lists, you can override them on
the command line of both actions. The scanned indexes will become
available on the shadowFT network after the next sharing-gnut restart
(about every 30 minutes).

During the kza searching, a few strange things may be shown on the
screen, most notably parts of the current search words. This is normal
behaviour. Do not press any key while the kza search is running, as that
will disturb the process (even Ctrl-C may do quite weird things). To
stop the kza process, wait until the "waiting" notice is displayed, then
press Ctrl-C. To stop the manual scanners, press & hold Ctrl-C (as
described above). You shouldn't let the kza stuff run for more than a
few hours, since it will usually generate so many IP addresses that the
manuscan process(es) need a day or more to properly check them all.
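In practice this boils down to two terminals (a sketch; the dictionary
file is just an example of a "generic" word list, as described above):

  # terminal 1: drive kza in the foreground, generating candidate IPs
  sft kza /usr/share/dict/american-english

  # terminal 2 (and maybe 3, 4, ...): chew through the generated list
  sft scan manu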
Note that, when the shadowFT network gets very big, it might happen that
you can't directly find your own kza-provided results, because your
gnut-share and gnut-search processes are connected at "opposite far
ends" of the network. To solve this, you can have your gnut-search
connect directly to your gnut-share with gnut's "open" command; see the
gnut documentation for details (and don't forget to specify the exact
port number). An easier alternative is to let a "keepsearching" process
run for some time, or simply grep through your own tree of saved
indexes.


FAQS
----

1. Where do all those files get stored, and why?

Scanned indexes go to $BASETREE/indexes/$HOSTNAME/
Downloads go to ~/shadowFT/downloads/

And there are (by default) three locations for "temporary"/"generated"
files:
  Public files:                                  $BASETREE/pub/
  Private files, either long-term or very big:   ~/shadowFT/tmp/
  Private files, short-term _and_ small:         /tmp/shadowFT-$USER/tmp/

Public files (like manuscan lists and the web interface config) need to
go to a place where everyone can find them (for multi-user and
multi-computer setups), and the "only" publicly known and available
place is under $BASETREE. Big tempfiles can get _very_ big (hundreds of
MBs, especially when running "sft stats") and that space may not be
available in /tmp. Long-term tempfiles should be kept safe from
automatic /tmp cleaners. Note that it is recommended to check your
~/shadowFT/tmp/ directory every once in a while and delete tempfiles
that definitely aren't used any more. But of course you can change
everything in config.sh and the other scripts.

2. Does it scale?

The disk space required for the scanned indexes scales linearly with
time and with the number of simultaneous scanning processes. But that's
probably not what you were asking.

The shadowFT network should scale well, but only if two conditions are
met:

1) There should be no parasites, i.e. every host connected to the
   shadowFT network for any purpose SHOULD SCAN AND SHARE. Even if it's
   just a little bit; one autoscan and one updatescan process are
   enough. If you don't do that and still run a gnut-search or gnut-web,
   you will only decrease the ttl of queries that are routed through
   you -- and with the ttl the chance of finding results -- without
   adding any content back to the network.

2) There should be no strange or altered clients. The shipped version of
   gnut is specially tuned to keep everything within limits. Every
   change (including to that "strangely low" ttl value, the number of
   min/max connections and the maximum number of returned results) will
   do bad things to bandwidth and other scalability aspects.

And of course some netiquette should be observed. Don't search more than
once per minute. Don't search with overbroad queries that won't return
useful results anyway (use a web search engine to get more information
and find better keywords). Stop the gnut-search and gnut-web processes
when you won't need them for a while.

3. I have a problem running <something> on <some other platform>. What
   to do?

This stuff currently runs fine on Debian GNU/Linux 2.2 "Potato" for
i386, and that's all I care about. Solve any problems yourself (also see
Q. Z).

4. When will the shadowFT network be obsoleted?

As soon as openFT is running and has a way to re-share the scanned
indexes there. OpenFT development is done mainly using the giFT
facilities (see Q. Z) and could probably use a few more talented people,
both those with *UX and those with M$ development skills.
When openFT is ready for your indexes, this will be announced on the
giFT forum and on the shadowFT home page at
http://www.dddi.nl/~costar/shadowFT, so check back there every once in a
while.

5. Why didn't you integrate this into the Gnutella network?

The first tests actually _were_ "integrated" into Gnutella (as far as
that was fundamentally possible), which is in fact the cause of the
particular format of the reference numbers. But running two Gnutella
clients on the same host exposed some interesting properties of the
Gnutella network, namely that queries take 15-30 minutes(!) to reach the
"other side" of the network, IF they reach it at all, and that query
replies never arrive back because one of the intermediate nodes has been
disconnected in the meantime.

The idea of the shadowFT network is that eventually many redundant
copies of the scanned indexes will be present, so that a "very" small
search horizon (ttl=6, a few thousand shadowFT nodes) will be sufficient
to find everything we want -- possibly combined with "keepsearching" to
try again every once in a while, not only because new indexes are loaded
regularly, but also because the network keeps slowly reconnecting
itself. (The download program already keeps searching automatically.)
Now you hopefully understand the answer to Q. 2 better, and also why
gnut-sFT is intentionally incompatible with the Gnutella network.

6. Shouldn't you explain the details of <whatever> in this README?

No. Go read the source. And the gnut & giFT docs if you're looking for
info on those things.

7. Do I have to shut everything down before a reboot? And: I can't
   restart things after my system crashed, what should I do?

You don't have to shut down anything manually before a reboot; no
important state information is kept in memory that needs to be written
to disk. But when you try to start things again after a reboot, some
programs may complain that they are still running and can't be started
again. This is because some pid files and other information did
(intentionally) not get removed automatically. The remedy is to simply
run the appropriate "stop" command first, which will neatly clean things
up.

8. Does shadowFT contain spyware?

Yes: in gnut-search, see the "hosts" and "monitor" commands. So don't
include personal information or credit card numbers in search queries.
And of course we spy on FT nodes, if that's what you mean.

Y. Why did you start this? How long did it take? What did it cost you?

Because ST:ENT won't be broadcast here until 2006 or so. About 4 weeks,
almost full-time. About US$200 in lost income (or actually, in extra
debts).

Z. Where can I post my other questions?

Unfortunately I don't have any time left to answer questions via private
e-mail. Please use the giFT forum on
http://sourceforge.net/projects/gift/ (also accessible from
http://www.giftproject.org) or the #gift channel on
irc.openprojects.net. Contact me by e-mail only if you want to donate me
something (preferably money, any amount in any currency as long as it's
not coins) at J.A.Bezemer@dddi.nl.


LEGALESE
--------

Unless indicated otherwise, everything in the shadowFT package is
Copyright (C) 2001 J.A. Bezemer

This program is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 2 of the License, or (at your
option) any later version.
This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
Public License for more details.

You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
59 Temple Place, Suite 330, Boston, MA 02111-1307 USA