shadowFT -- re-sharing the FastTrack network
version 1.4, 09 Nov 2001
Copyright (C) 2001 J.A. Bezemer
Released under GNU GPL >=2
http://www.dddi.nl/~costar/shadowFT


FEATURES
--------

- Fully automatic scanning and indexing of FastTrack nodes
- Indexing of manually found FastTrack nodes
- Calculation of interesting FastTrack network statistics
- Sharing of scanned indexes on the shadowFT network
- Searching in all indexes re-shared on the shadowFT network
- Downloading from FastTrack nodes as indicated by shadowFT search results
- Sharing local content on the shadowFT network
- !HOT! KZA Aided Scanning[tm]


QUICKSTART
----------

~john$ su -                                 Install the necessary packages
Password:                                   (everyone runs Debian, right? ;-)
/root# apt-get install bc screen netcat \
         sed bash grep textutils shellutils \
         findutils fileutils bsdutils gcc \
         libc6-dev make patch libreadline-dev
/root# exit                                 We do EVERYTHING else as non-root

~john$ mkdir shadowFT; cd shadowFT          Make a new dir

~john/shadowFT$ wget http://www.dddi.nl/~costar/shadowFT/shadowFT-1.4.tar.gz
                                            Download the package
~john/shadowFT$ tar xzvf shadowFT-1.4.tar.gz
                                            Unpack it
~john/shadowFT$ cd shadowFT-1.4
~john/shadowFT/shadowFT-1.4$ make           Compile a few things
                                            (if this fails, see below)

~john/shadowFT/shadowFT-1.4$ ./sft scan auto 24       Start scanning
~john/shadowFT/shadowFT-1.4$ ./sft scan auto 24
~john/shadowFT/shadowFT-1.4$ ./sft scan auto 65
~john/shadowFT/shadowFT-1.4$ ./sft scan auto 66
~john/shadowFT/shadowFT-1.4$ ./sft scan auto 128
~john/shadowFT/shadowFT-1.4$ ./sft scan update        Start the updater
~john/shadowFT/shadowFT-1.4$ ./sft share start        Share scanned indexes
~john/shadowFT/shadowFT-1.4$ ./sft web start          Start web interface

~john/shadowFT/shadowFT-1.4$ ./sft search             Search for something
gnut-sFT(search)> find mich jack smooth criminal mp3

[After a while, press Spaces/Enters to view results]

:  24) michael jackson - smooth criminal.mp3
       REF: x4118279x12124x (97x)  207.46.197.159:1214
       size: 4.118M  speed: 28  rating: *
:
gnut-sFT(search)> get 24                    Download something we like
Starting shadowFT download(s) in background.
gnut-sFT(search)> exit

~john/shadowFT/shadowFT-1.4$ cd ~/shadowFT/downloads  Check the progress
~john/shadowFT/downloads$ ls -alF
-rw-r--r--  1 john  john  2089894 Nov  7 13:45 x4118279x12124x
-rw-r--r--  1 john  john     2895 Nov  7 13:45 x4118279x12124x.ipp
-rw-r--r--  1 john  john      667 Nov  7 13:45 x4118279x12124x.log
-rw-r--r--  1 john  john       24 Nov  7 13:45 x4118279x12124x.sta
-rw-r--r--  1 john  john      204 Nov  7 13:45 x4118279x12124x.url

[And after a short while:]

~john/shadowFT/downloads$ ls -alF
-rw-r--r--  1 john  john  4118279 Nov  7 13:46 Michael Jackson - Smooth Criminal.mp3

Impressed? And this is only the beginning.. ;-)


INTRODUCTION
------------

The FastTrack network (better known by the client names KaZaA, Grokster
and MusicCity/Morpheus) is at this moment the biggest file sharing
network, with user counts that equal or surpass those of the Napster
network before it was effectively shut down. The FT network has many
interesting properties, but the most important ones in this context are
that it is entirely closed source and that it uses strong cryptography
to prevent unauthorized (non-advertising/open source) clients from
accessing the network. There are some efforts to create a completely
open FastTrack alternative, under the name "openFT". However, any such
new network would require large amounts of popular content before
people will switch to using it.
The FT network, with on average 500,000+ users online, provides enormous
amounts of readily available content, but its closed nature seems to
prevent transferring this content to other networks' search facilities.

Fortunately, the FT protocol apparently specifies that every FT "node"
(i.e. computer running FT software) should have a small HTTP-like server
running on port 1214 that can produce a plaintext list or index of the
files shared on that node, when asked for it. So, when the IP address of
a FT node is known, the index can be requested and shared via different
means than the FT network. This is what shadowFT is all about.

The problem is how to get the IP addresses of the FT nodes. One
possibility that works very well is simply to generate random IP
addresses and try to access port 1214. Certain ranges of IP addresses
have a relatively high hit probability; for example, approximately one
of every 100 addresses in the 24.x.x.x range runs a FT client. This is
multiplied by (nearly) two, as most FT clients point to one other FT
client that can be scanned as well (this is a so-called "supernode", but
that's not relevant in this context). When trying only one "smart"
random IP address per second, this gives an average of about one FT node
per minute, or well over 5,000 FT nodes per week.

Another possibility is to monitor official clients in one way or
another, and extract IP addresses from data gathered that way. One such
method is available in this package and it works very well, resulting in
many more FT nodes per minute, of course at the cost of notably
increased scanning bandwidth.

The ultimate goal of shadowFT is to provide start-up content to an open
FastTrack alternative network; however, such a network does not yet
exist. In the meantime, one can use the scanned indexes for one's own
pleasure, but it is of course much more fun to share that data on a
temporary dedicated small network. For that purpose, a heavily patched
version of the excellent "gnut" (originally a Gnutella network client)
is supplied, which forms the basis of the shadowFT network. This version
of "gnut" also allows for quick searching in (big parts of) the
collectively scanned indexes. Combined with an enhanced version of
"giFTget", this enables fast, reliable and completely backgrounded
downloading, unlike many other systems. Especially for people who want
to share their own content (rather than indexes) on the shadowFT
network, a version of "giFT" is available that essentially functions as
a non-networked web server that can be scanned and downloaded from in
exactly the same way as any other "official" FT node.

I want to stress again that the shadowFT network is a temporary hack to
do useful things with the scanned indexes during the time that an open
FastTrack alternative is not yet available. The shadowFT network is
intended to disappear as soon as there is something better to share the
scanned indexes on.

By the way, you can forget all program names mentioned here, since
there's one single front-end command "sft" that knows how to do
everything requested.


INSTALLATION
------------

First make sure you have all necessary software packages installed.
These are (GNU versions preferred where applicable):

- textutils, shellutils, findutils, fileutils and bsdutils
- sed
- grep
- bc
- netcat (command "nc")
- a C development environment, as in gcc, make, patch and (if available)
  a libreadline development package

And you need a functioning /dev/urandom.
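Netcat, by the way, is also all you need to try the core idea from the
introduction by hand. A minimal sketch, assuming you already know the IP
address of a FT node (the request line shown here is a plain guess at
what the port-1214 server accepts; the scanning scripts in this package
are the authoritative source for the request they really send):

  #!/bin/sh
  # fetch-index.sh -- ask a (hypothetical) FT node for its plaintext index
  IP=${1:?usage: fetch-index.sh <ip-address>}
  printf 'GET / HTTP/1.0\r\n\r\n' | nc -w 10 "$IP" 1214

If port 1214 is open and the node is willing, you should get back an
HTTP-style response containing a plaintext file listing; if not, nc
simply times out after 10 seconds and you try the next address.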
The netcat source code is available everywhere, for example
http://ftp.us.debian.org/debian/pool/main/n/netcat/netcat_1.10.orig.tar.gz

For KZA Aided Scanning[tm] (which works only on i386 Linux), screen is
required as well; its source code is everywhere too, for example
ftp://ftp.uni-erlangen.de/pub/utilities/screen/

Note: the not-really-standard programs "nc" and "screen" do not have to
be in your $PATH; their exact locations can be configured in config.sh
(search for NC= and SCREEN=).

The shadowFT system has a default configuration which requires that the
(non-root!!) user running the stuff is able to write in the shadowFT
directory structure at all times. The scanned indexes require large
amounts of disk space; when scanning only one host per second, expect
about 20 MB of growth per day, or 600 MB per month. For multi-user and
NFS-networked computers, having the shadowFT directory structure
accessible by all interested users is very handy too. Therefore it is
strongly advised that you place the shadowFT stuff somewhere under your
home directory, because that is usually the only location fitting all
needs. (Note: you _can_ install all shadowFT stuff in places like
/usr/local, but that requires many not-really-trivial changes to the
config file. Public access is easily possible via a single symlink, see
below.)

So extract the .tar.gz archive in some directory under your $HOME.
Anywhere will do, but ~/shadowFT/ or ~/src/shadowFT/ are the most usual
locations. Then cd to the newly created directory and give the command
"make". This will configure and compile all necessary things. If
something fails here, it should be relatively easy to figure out; the
top-level Makefile is simple enough. The last item, the kza stuff, is
optional (only useful for i386 Linux systems) and is allowed to fail.


CONFIGURATION
-------------

For easy personal access, you can symlink to the "sft" script from your
$HOME/bin directory. For easy system-wide access, you can symlink to it
from /usr/local/bin or the like. If you do any symlinking at all, you
MUST edit the sft script and change the BASETREE="..." line to point it
to the base tree of the shadowFT installation. That is, the full path to
the config.sh file is supposed to be $BASETREE/config.sh. If you don't
do this, you'll notice.

Try sft (if it's in your $PATH) or ./sft (if it's not) to see if things
work all right. You should get a short usage note. At the bottom of
that, a few directories are mentioned that you can change in the
config.sh file (but the defaults should be fine).

Note: if you experience any problem at all (and especially problems
related to `['), install GNU bash as your /bin/sh and try again; or
alternatively point all shell scripts to the proper location of a GNU
bash version 2 or higher. Having GNU versions of textutils, shellutils,
findutils and fileutils in your $PATH can also make a difference.
Portability was not a particularly important goal when programming this,
but since most things are shell scripts, it should be relatively easy to
make them work on all *UX-like systems.

If you are going to run shadowFT things on a (firewall-like) computer
that has both an "inside" and an "outside" network connection, edit the
config.sh file and change the GNUTIP variable to the IP address of the
outside network connection. Also, if you have assigned multiple IP
addresses to your "outside" network card, you can have the scanning of
FT nodes appear to come from any one of these addresses by changing the
NCIP variable. By default, the first address will be taken.
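For illustration, such an override might look roughly like this (just a
sketch: GNUTIP and NCIP are the real variable names from config.sh, but
the exact shape of the surrounding block in the shipped file may differ
-- see the note about the hostname check directly below -- and the
addresses are of course placeholders for your own):

  # in config.sh -- settings for the dual-homed machine "gateway" only
  if [ "`hostname`" = "gateway" ]; then
    GNUTIP=192.0.2.10     # IP of the "outside" interface, used by gnut
    NCIP=192.0.2.11       # source address that the nc-based scans use
  fi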
As the sketch above already suggests, these settings are
computer-specific and are therefore grouped inside a hostname check,
which you should edit as well. If you run shadowFT stuff from multiple
NFS-sharing computers, you can still use a single shadowFT installation
and simply have multiple hostname checks here (if applicable). Computers
operating from _inside_ a masqueraded/NAT'ed environment don't need
GNUTIP or NCIP settings, since the IPs will be changed by the
masquerading anyway.

If you have a firewall, make sure it allows (at least) outgoing
connections to ports 1214, 1412 and 1413, otherwise shadowFT is useless
for you.

In case any user wants other settings than the system-wide defaults in
sft and config.sh, several override features are available. Check the
scripts for details.

In the following, I will assume sft is available in the $PATH (otherwise
use ./sft) and I will use $BASETREE to indicate the shadowFT
installation directory.


STEP 1 - SCANNING
-----------------

Scanning can be done in several ways.

A. Automatic scanning

     sft scan auto [<prefix>]

This will automatically scan random IP addresses. You can specify a
prefix to limit scanning to a specific range; for example, "sft scan
auto 24" will only scan 24.x.x.x addresses and "sft scan auto 128.125"
will only scan 128.125.x.x addresses. If you specify no prefix, fully
random addresses will be generated. Specifying a 3- or 4-byte prefix is
possible but relatively useless, since scanning will be completed within
an hour or a second respectively, while the automatic scanning is
designed to run for weeks or months. (Exception: sharing files yourself,
see below.)

The interesting question of course is where to scan. The "quickstart"
example above already mentioned the ranges with the most FT nodes,
namely 24.x.x.x, 128.x.x.x, 65.x.x.x and 66.x.x.x; more than one out of
every 100 IP addresses in these regions actually runs FT. But when you
have scanned for a day or two, you can have a look at your own
statistics (see below) and draw your own conclusions. The most
interesting are the supernode statistics, since these give an accurate
image of the node distribution no matter what regions you have scanned.

Every autoscan process scans about one host per 5 seconds. You can start
many simultaneous processes to scan more; 5 processes are advised (and
still doable for modem connections), while more than 20 processes will
show some notable increase in bandwidth. To stop an autoscan process,
just kill it. Or "killall shadowFTautoscan" to kill all of them at once.

B. Update scanning

     sft scan update

This will automatically re-scan all known FT-running hosts that you've
encountered so far. Running at least one such process is necessary to
keep your indexes current, since automatic scanning takes on average
more than 100 days to generate the same random IP again. Update scanning
will re-visit each known host every day or two.

An updatescan process re-scans about one host every 15 seconds. This is
sufficient for all practical purposes, but you can run more simultaneous
processes if you really want. Killing the updatescan process stops it;
"killall shadowFTupdatescan" stops them all.

C. Manual scanning

     sft scan manu [<hosts-list> <scanned-list>]

This will scan only IPs that are specified in the manual-scanning list,
which defaults to $BASETREE/pub/manuscan.hosts but can be overridden
from the command line. The IPs of successfully scanned hosts are saved
in $BASETREE/pub/manuscan.scanned, or whatever is specified on the
command line.
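For example, suggesting an address you found elsewhere and then scanning
it could look like this (the address is of course just a placeholder,
and the default list locations from above are used):

  # add a candidate FT node to the shared manual-scanning list
  echo 24.1.2.3 >> $BASETREE/pub/manuscan.hosts

  # run one manual scanner in the foreground; it picks the address up
  # on its next pass and, on success, records it in manuscan.scanned
  sft scan manu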
Hosts that are scanned once this way will never be scanned again --
unless someone empties the scanned-list. The intended use on
multi-user/multi-computer setups is that manuscan processes run on only
one computer and are controlled by one user. Making the
manuscan.{hosts,scanned} files writable by everyone allows people to
suggest IPs to try. Note that this requires a certain amount of trust in
all users that can write to these files.

One manuscan process scans about 1 host per 10 seconds. You can run more
processes simultaneously to scan faster (especially in combination with
KZA Aided Scanning[tm], see below). Before scanning, the lists are
sorted in one random way out of 384 possibilities, so multiple
simultaneous manuscans should never interfere (and even if they did, it
wouldn't harm). However, do keep an eye on the bandwidth, especially if
many of the tried hosts are hits.

The manuscan processes run in the foreground, as these are usually
interesting to watch. To kill one, press Ctrl-C and hold it down until
the process has really stopped (this may take some time). But you can of
course also background them in the usual way.

Note that the hit rate will naturally go down over time, so if there are
relatively many misses in your back-scroll buffer, it may be time to
clean up the old manuscan.{hosts,scanned} files and start with a new
data set. Doing this once per day should be fine. You don't have to stop
and restart the manuscan processes while you do this; they will use the
new data when their current pass over the old data is finished.

All scanning methods store their results in a tree under (by default)
$BASETREE/indexes/$HOSTNAME/data. A one-level hash is used, namely the
last byte of the IP address (this byte is stripped from the index
filenames). Summary information on hosts that were tried, succeeded and
failed is saved in $BASETREE/indexes/$HOSTNAME/hosts.* .

Hosts that are tried a second time (mostly during updatescan and
manuscan) may have updated content. The new index is always stored in
the standard location, but in case too much content is gone or changed
(10+ items), the old index is appended to
$BASETREE/indexes/$HOSTNAME/oldindexesbackup. This is done to prevent
loss of valuable information in the event that FastTrack clients are
instructed to stop serving plaintext indexes or to start serving bogus
data. If you are very sure that this is not yet the case, it's safe to
empty the oldindexesbackup once in a while (every few weeks or so); it's
always safe to rotate and compress it like a logfile.

In the interest of full disclosure: your ISP may receive complaints from
people who have absolutely no clue and run extremely paranoid software
that thinks you're trying to break into their system. If you're asked
what you're doing, either say that you're involved in a scientific study
to map the FT network and its shared content, or say that you have
recently installed a very interesting screensaver but have noticed some
strange decrease in Internet speed since then. Your choice. But DO scan,
see below why.


STEP 1.5 - STATISTICS
---------------------

     sft stats [<N>]

This command will calculate interesting statistics, among which a
top-<N> (default 20) of most-shared files. Note that this command may
start taking considerable time after a week of scanning.


STEP 2 - SHARING
----------------

     sft share start
     sft share stop

These commands start and stop the heavily patched version of gnut that
shares your scanned indexes on the shadowFT network.
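To get a feel for what this gnut actually reads, you can poke around in
the on-disk tree described under STEP 1 by hand. A small sketch (the
directory-per-last-byte layout follows from the hashing description
above, but the exact file naming inside it is not spelled out here and
may differ from what you see on disk):

  # all indexes scanned on this machine
  ls $BASETREE/indexes/$HOSTNAME/data/

  # which of the scanned nodes share something matching a keyword?
  grep -ril "smooth criminal" $BASETREE/indexes/$HOSTNAME/data/ | head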
At startup, gnut will read your scanned indexes and store them in memory
for fast access. This is done in a special way that could use some
explanation.

The FT network does not reference files by name, but rather by content,
in the form of a checksum. This is the fundamental feature that enables
multi-source parallel downloads, even when the file names are different
on each source. The "official" FT checksum is very long (20 bytes), but
only a derived short checksum of 2 bytes is available via the HTTP index
on port 1214. However, combined with the filesize (4 bytes), this forms
a "reference" that is very useful for all practical purposes. In
plaintext notation, this "shadowFT reference number" is written as
x<size>x<checksum>x, for example x4118279x12124x.

A shadowFT search is done in two steps. First, keywords are matched
against file names, and each result contains the reference number of the
content that has a matching name on one of the scanned FT nodes. The
second step is searching again, but now for the reference number, which
results in (hopefully) many FT node IP addresses that share that content
under one name or another. The download program then queries all found
IP addresses for their indexes and searches them for the name belonging
to the reference. Note that the user only has to do the first (keyword)
search interactively; the download program just needs the reference and
will do the rest itself.

To prevent the sharing gnut (process "gnut-sFT") from allocating too
much memory, not all available name->reference and reference->IP pairs
are stored in memory, but only the most recently (re-)scanned ones. This
is done by looking at the modify-time of the saved index files, using
the "find" command. Reference->IP pairs cost only 9 bytes (on average)
of memory, so many can be stored; name->reference pairs are much more
expensive, 50-150 bytes, so these need a much lower recentness limit.
The standard limits are 4 days for references (also called checksums in
some places) and 5 hours for names, but this is tunable with the
CHKFINDOPT and NAMFINDOPT settings in config.sh. When running five
autoscan processes (so one scan per second) and one updatescan process,
the default limits will cause a memory usage of about 25 MB for
gnut-sFT. (This is a multithreaded process, so "top" and "ps" will show
the same amount of memory for all spawned processes, but it is actually
allocated only once.) NOTE: if you have less than 128 MB of memory, or
are heavily using KZA Aided Scanning[tm] (which gives many, many recent
results), you are strongly advised to scale down both limits. In
general, if you change the config.sh file, you have to stop and re-start
the affected services for the changes to take effect.

The sharing gnut-sFT process will be restarted every half hour or so.
This is done to 1) read new data (which can be done reliably only once),
and 2) prevent gnut's once-famous memory leaks from having too much
impact. The output of gnut-sFT is appended to ~/shadowFT/tmp/gnutsFT.log.
Note that each user can only run one instance of gnut-sFT per computer
(pid and info are saved to host-specific files in ~/shadowFT/tmp), but
more aren't needed anyway.

Some more things about this version of gnut. It runs on port 1412 (or a
bit higher). It is probably the only version of gnut that can be used to
build a stable Gnutella-like network in and of itself, because automatic
ping generation ("updating") was added.
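On the subject of those CHKFINDOPT/NAMFINDOPT limits: since the scripts
use "find" on the modify-times of the index files, the settings are most
likely plain find options, but treat the exact syntax below as an
assumption and check config.sh itself before copying it. A sketch of
scaling both limits down on a low-memory machine:

  # in config.sh -- roughly halve the default recentness limits
  CHKFINDOPT="-mtime -2"     # references: 2 days instead of 4
  NAMFINDOPT="-mmin -150"    # names: 2.5 hours instead of 5

Remember to stop and re-start "sft share" afterwards, as mentioned
above.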
Further: because the matching takes a relatively large amount of CPU
time, only one query per second will be processed (QUERYSECS in
connection.c); this is also the reason it is run at nice +18. To reduce
bandwidth usage, at most 25 results will be returned; at 100 bytes per
result that means 2.5 kB packets, which take about half a second to
transfer over a modem. For overbroad queries we will always reach that
limit, simply because we know so much. But experience shows that usually
no more than 15-20 results can be found for accurate, specific queries,
so this limit encourages such behaviour. Also note that this is a
per-host limit and that the query will be processed by many hosts in
parallel.

The "availability count" after the reference ("(32x)" etc.) is
calculated by gnut-sFT on demand at the first encounter of a particular
result, and saved for later use. At most 100 occurrences are counted,
but only 25 of these will be returned on a reference query; the idea is
that other shadowFT nodes will return more results for very popular
files (even if they don't currently know them by name).

The shadowFT network is intentionally fully incompatible with the
existing Gnutella network and all existing clients for it. It is
impossible to couple both networks directly; while it is possible to
write a bridge program, please don't do that. Why? Simply because it
would completely defeat the scalability of the shadowFT network (also
see below).


STEP 3 - SEARCHING
------------------

Searching is possible via several means.

A. Gnut searching

     sft search

This will start a (text mode) gnut process that you can use
interactively. This is the preferred search (and download) method
because it runs everywhere, doesn't require additional programs and has
some very nice features. If you have used the "real" gnut before, you
should feel immediately comfortable, and even more so when you realize
that you don't (shouldn't) have to worry about getting and/or staying
connected to the shadowFT network. If you're new to gnut, here is a
short introduction.

- find <keywords>
  Start searching for keyword(s). Results will come in and a counter
  will be shown. When enough results have arrived, press Space or Enter
  to view them (see "r"). Results may keep coming in while you are doing
  other things; use "r" to see an updated list.

- set wait_after_find [0|1]
  Either display the result counter after a "find" command or return
  the prompt immediately.

- r [<regexp>]  or  response [<regexp>]
  View or re-view the current results of the last search; view all, or
  only those matching <regexp>. Space or Enter for the next page, Q to
  return to the prompt. By default, results are sorted by decreasing
  size, so that identical files (possibly with different names) are
  grouped together.

- set paginate [0|1]
  Wait for a key between pages or show everything at once (in case you
  prefer your scroll-back buffer). Note that paginating is a little
  buggy, but that's usually no problem.

- get <number(s)>
  Start downloading the specified result number(s), which is/are taken
  from the last displayed result list (i.e. results that have come in
  after the last "r" command will be ignored here). The download is
  started in the background, so you can quit gnut at any time. Any
  number of downloads can run simultaneously, limited only by available
  process memory. Note that you can download any unique reference only
  once until it is completed; you also need to run a gnut web service
  for this, see below.

- play <number(s)>
  As "get" above, but the files will be played/viewed once the download
  is completed.
  Both downloading and playing/viewing are done in the background. See
  below for more info.

- i  or  info
  Show a few statistics and the shadowFT nodes you're currently
  connected to.

- q or quit or e or exit  (or Ctrl-C, but _not_ Ctrl-D)
  Quit the program.

- help [<command>]
  Information on commands.

B. Web browser searching

     sft web start
     sft web stop

These commands start and stop a gnut instance that provides a search
facility for use with your web browser (it's also used for downloading).
This service runs on port 1412 or 1413, or maybe a bit higher, and you
can point your web browser at it. The search interface is very simple.
Two problems with this method are that results are not visible until the
search is over, and that results are not sorted in any useful way.

Access to the web interface is not limited to your local system;
everyone who knows where it is can access it. If you have a problem with
that, you can add a "-p <port>" option to the $GNUTWEB command in
4-share/gnutrunnerweb, so that it starts on a non-standard port, and
also add a "set hidden 1" setting to the gnutrc script that is generated
in 4-share/start-gnutweb, so that the web service's location will not be
advertised on the shadowFT network. The actual port and IP address are
saved to $BASETREE/pub/gnutweb.savedconfig, search for "local_"; this is
also how other programs (anywhere on the NFS network, if applicable) can
automatically find the correct location of the web service. Note that
there can be only one instance of the gnut web service running per
machine or multi-machine NFS network (pid and info are saved to
$BASETREE/pub and ~/shadowFT/tmp), but that's all that's needed.

C. Long-term searching

     sft keepsearching <keywords>

If you are searching for a very specific file that is not widely shared,
you will notice one of the limitations of the system, which is that much
more data is available than can possibly be shared. To solve this
problem, you can use the "keepsearching" command, which sends queries to
the local gnut web interface (which must be running, see above)
approximately once every half hour. After 10 minutes of waiting for new
results (which are shown immediately), they are merged with the previous
results, sorted by decreasing size, and shown again; wait, repeat. This
is very suitable to leave running overnight, hoping that your
back-scroll buffer is large enough (otherwise look in
~/shadowFT/tmp/shadowFTkeepsearching*). Or you can redirect its output
to a file and leave it running in the background.

About the search results of all methods: look at the "availability
counts" to see how popular files are. For very popular files, often a
few "unpopular" matches are found that have exactly the same length but
a different checksum; these are bad or incomplete downloads and you
should ignore them. While you will usually search for keywords (file
names), it's also possible to search directly for reference numbers and
see how many IP addresses sharing that content are known on the shadowFT
network.

Also remember that the gnut-share processes only consider one search
request per second to keep CPU usage within limits. So please don't
generate more than one query per minute, otherwise you will not only
cause inconvenience for others, but certainly for yourself as well.


STEP 4 - DOWNLOADING
--------------------

There are several methods to _start_ a download, but the actual
downloading is always done by the same program (well, set of programs
really). This program needs a gnut web interface to be running, see
above.
On NFS-sharing local networks, only one web interface needs to run
(anywhere); its presence will be detected via
$BASETREE/pub/gnutwebrunner.pid and its exact location via
$BASETREE/pub/gnutweb.savedconfig (grep "local_").

A. Command-line downloading

     sft get <reference> [view|noview]

This is the command that every method eventually calls, but you can also
call it directly. The <reference> is the x<size>x<checksum>x "number",
for example x4118279x12124x. The download is always started in the
background, and is saved by default in ~/shadowFT/downloads. You can
also watch the progress there.

A few details: "shadowFTget" starts three processes. Process one queries
the gnut web interface with the specified reference number and saves the
resulting IP addresses and ports in the .ipp file. Process two takes the
addresses from that file, tries to connect and get the full index; the
matching full filename (URL) is extracted and saved to the .url file.
Process three finally starts the giFTget process which actually does the
download, and repeatedly passes the full URLs to it. giFTget produces a
.log file with some progress details, and a .sta file which indicates
which portions of the file have been downloaded already. The downloaded
file itself has a temporary name equal to the reference. Once the
download is completed, the file is renamed to the most popular name from
the URLs file (with a .<number> extension if that name already existed);
also, several semaphore-like .stop* files are used to tell the other
processes to stop.

If "view" is specified on the command line, or RUNVIEWER is set to 1 in
config.sh and _no_ "noview" is specified, the downloaded file will be
viewed or played by an appropriate program. See the "viewers" file for
details.

Because of the file names, the same content (i.e. the same reference
number) cannot be downloaded more than once simultaneously; you have to
wait until all associated processes are finished before the download can
be started again (this may take up to 10 minutes).

To stop a download halfway, kill -KILL the associated giFTget process.
All other associated processes will be stopped automatically a while
later. Already downloaded data is kept intact, and the download can be
continued by simply starting it again. If you want to continue a
download that wasn't stopped properly (for example after a system
reboot), first delete everything except the <reference> and
<reference>.sta files.

B. Gnut-search downloading

     gnut-sFT(search)> get <number(s)>  or  play <number(s)>

As indicated above, these can be used to start downloading. The
mechanism used is that "sft helperget" is called with fake URLs of the
specified results. The helperget program extracts the reference
number(s) from its command-line arguments and starts the appropriate
"sft get" processes. Note that "get" forces "noview" behaviour, and
"play" forces "view"; the RUNVIEWER default in config.sh is not used in
this case.

C. Web browser downloading

The result list of a web-based search action contains interesting URLs
for all results. Clicking on one of those results contacts the gnut web
service again with an "echo" request. The web service then sends a file
of type "application/x-gnutsft" that contains that same URL. If you have
configured a viewer application for that file type, the browser will
save the contents of the transferred file (so in this case the URL) to a
temporary file and start the viewer application with the name of the
tempfile as parameter. We can use that feature in combination with
"sft helperget" to start downloading a certain file.
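For browsers and file tools that read a ~/.mailcap file instead of
offering a configuration dialog, a one-line entry along these lines
might do the same job (this is an assumption -- the instructions below
only describe the Netscape dialog -- and you may need the full path to
sft):

  # ~/.mailcap -- hand gnut-sFT "echo" replies to the helperget program
  application/x-gnutsft; sft helperget %s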
In Netscape, for example, select menu "Edit", "Preferences", then expand
the "Navigator" category and click on "Applications". The "New" button
brings up a form; enter "application/x-gnutsft" (without the quotes) in
the "MIMEType" field, select the "Application" radio button and enter
"sft helperget %s" (without the quotes) on the "Application" text line.
Leave the rest blank, "OK" twice to confirm. In some browsers, you may
have to enter extra quotes as in "sft helperget '%s'" or "sft helperget
"%s"", or something entirely different; look at similar entries to see
what's needed. If "sft" is not in your $PATH, you of course need to
supply its full path.

If your browser can only pass a URL to a viewer program (and not the
name of the file downloaded from that URL), that's okay too. The
"helperget" program can handle everything.

When used in this way, "helperget" does honour the RUNVIEWER default
setting from config.sh, but you can specify either "sft helperget view
%s" or "sft helperget noview %s" to override that.

If things are configured correctly, you should be able to click on one
of the search results and get a message "starting download in
background" a second later.

With both gnut-search and web browser downloading, it doesn't matter
whether a name->reference or a reference->IP search was performed; the
reference is mentioned in all returned search results and that's all the
helperget program needs.

Note that a downloaded file may sometimes have glitches or corrupt spots
partway through the file. This is caused by the FastTrack checksumming
method, which only looks at a limited number of intervals in the file
and does not care about the rest. So if someone shares a file which is
incorrect/corrupted in places where the checksum doesn't look, the
checksum will be equal to that of the original file and we will happily
download it. If this happens to you, just wait a few hours and try to
download the same reference again; you'll get either a fully correct
download or possibly corruption in other places. A different (and more
difficult) approach is to fake a stopped download with missing data only
in the corrupted parts and start the download again to fill it up; the
.sta file is the key here.


STEP 5 - SHARING FILES
----------------------

     sft sharefiles start [<directory>]
     sft sharefiles stop
     sft scan auto <your outside IP>

The shadowFT network was never intended for sharing files (only scanned
indexes), but if you insist, sharing files is possible too. This is
actually done in the same index-sharing way, but now with the index of
your own shared files.

The "sharefiles start" command starts a modified giFT process in the
background that does not try to connect to any network, but just sits
waiting for incoming requests like a web server. By default, your
download directory (~/shadowFT/downloads) will be shared, but you can
override this from the command line.

To make your content available for searching on the shadowFT network, it
has to be scanned. This is possible with manual scanning, but running a
dedicated autoscan process is more convenient. Be sure to specify your
"outside" IP address, since that address is the only one that can be
used to connect to you. When auto-scanning a single host, it will be
re-scanned every 10 minutes (in addition to the regular update
scanning), which means that your content will always be available for
searching.
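Putting the two halves together, a typical session might look like this
(a sketch only: the directory and the outside IP address are obviously
placeholders for your own):

  # serve my own files on port 1214, just like an "official" FT node
  sft sharefiles start ~/shadowFT/downloads

  # keep auto-scanning my own outside address so the index stays fresh
  sft scan auto 192.0.2.10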
Note, however, that the current sharing implementation in giFT is
apparently relatively unfeatureful, which means that an unlimited number
of simultaneous connections will be accepted and that everyone can
download at unlimited speeds.


KZA AIDED SCANNING[tm]
----------------------

     sft kza <wordlist> [<hosts-list>]
     sft scan manu [<hosts-list> <scanned-list>]

If you are searching for very new or relatively unpopular content, or
just want to scan really large numbers of hosts very quickly, then use
KZA Aided Scanning[tm]. This uses a should-be-patent-pending ingenious
method to automate the use of, and extract data from, any console
application -- in this case the i386 Linux version of the closed-source
"kza" program.

Download the i386 Linux version of kza from http://download.kazaa.com
and extract it somewhere (anywhere). Version 0.4 ("alpha release") is
currently the only available version and is therefore the only version
that the current automation implementation is designed for. First run
kza manually to make sure logging in and searching work correctly, and
especially that you don't have to log in explicitly any more after the
first run (since the automation won't do that for you). If everything is
okay, copy the kza binary to the X-kza/ directory, or point the KZA=
setting in config.sh to the correct location.

Then find or create a list of words to search for -- or actually lines
of (possibly multiple) words, since one complete line is considered one
search query. A list with fewer than 30 lines is considered "specific
searching": all lines will be entered completely and in sequence in one
kza run. More than 30 lines is considered "generic searching" (for
example /usr/share/dict/american-english): per kza run, 15 random words
are selected and the first 4 characters of each are entered as search
queries, which usually results in many files from many IP addresses. As
implied, kza is stopped after one round of searching, and a waiting
period of 15 minutes is observed to let the scanners catch up a bit.
After that, kza is started again, with either the same specific queries
or other random generic queries.

Finally start the process with "sft kza <wordlist>". This must be done
in the foreground in a (possibly virtual, possibly remote) terminal, not
in the background, but a minimized X terminal window is fine. Everything
will then run automatically, and the (default)
$BASETREE/pub/manuscan.hosts file will be filled with high-probability
IP addresses. Run one or more manuscan processes to scan them. If you
don't want to use the default manuscan lists, you can override them on
the command line of both actions. The scanned indexes will become
available on the shadowFT network after the next sharing-gnut restart
(about every 30 minutes).

During the kza searching, a few strange things may be shown on the
screen, most notably parts of the current search words. This is normal
behaviour. Do not press any key while the kza search is running, as that
will disturb the process (even Ctrl-C may do quite weird things). To
stop the kza process, wait until the "waiting" notice is displayed, then
press Ctrl-C. To stop the manual scanners, press & hold Ctrl-C (as
described above). You shouldn't let the kza stuff run for more than a
few hours, since it will usually generate so many IP addresses that the
manuscan process(es) need a day or more to properly check them all.
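In practice this boils down to two terminals (a sketch; the dictionary
file is just an example of a "generic" word list, as described above):

  # terminal 1: drive kza in the foreground, generating candidate IPs
  sft kza /usr/share/dict/american-english

  # terminal 2 (and maybe 3, 4, ...): chew through the generated list
  sft scan manu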
Note that, when the shadowFT network gets very big, it might happen that
you can't directly find your own kza-provided results, because your
gnut-share and gnut-search processes are connected at "opposite far
ends" of the network. To solve this, you can have your gnut-search
connect directly to your gnut-share with gnut's "open" command; see the
gnut documentation for details (and don't forget to specify the exact
port number). An easier alternative is to let a "keepsearching" process
run for some time, or simply grep through your own tree of saved
indexes.


FAQS
----

1. Where do all those files get stored, and why?

Scanned indexes go to $BASETREE/indexes/$HOSTNAME/
Downloads go to ~/shadowFT/downloads/

And there are (by default) three locations for "temporary"/"generated"
files:
  Public files:                                  $BASETREE/pub/
  Private files, either long-term or very big:   ~/shadowFT/tmp/
  Private files, short-term _and_ small:         /tmp/shadowFT-$USER/tmp/

Public files (like manuscan lists and the web interface config) need to
go to a place where everyone can find them (for multi-user and
multi-computer setups), and the "only" publicly known and available
place is under $BASETREE. Big tempfiles can get _very_ big (hundreds of
MBs, especially when running "sft stats") and that space may not be
available in /tmp. Long-term tempfiles should be kept safe from
automatic /tmp cleaners. Note that it is recommended to check your
~/shadowFT/tmp/ directory every once in a while and delete tempfiles
that definitely aren't used any more. But of course you can change
everything in config.sh and the other scripts.

2. Does it scale?

The disk space required for the scanned indexes scales linearly with
time and with the number of simultaneous scanning processes. But that's
probably not what you were asking.

The shadowFT network should scale well, but only if two conditions are
met:

1) There should be no parasites, i.e. every host connected to the
   shadowFT network for any purpose SHOULD SCAN AND SHARE. Even if it's
   just a little bit; one autoscan and one updatescan process are
   enough. If you don't do that and still run a gnut-search or gnut-web,
   you will only decrease the ttl of queries that are routed through
   you -- and with the ttl the chance of finding results -- without
   adding any content back to the network.

2) There should be no strange or altered clients. The shipped version of
   gnut is specially tuned to keep everything within limits. Every
   change (including to that "strangely low" ttl value, the number of
   min/max connections and the maximum number of returned results) will
   do bad things to bandwidth and other scalability aspects.

And of course some netiquette should be observed. Don't search more than
once per minute. Don't search with overbroad queries that won't return
useful results anyway (use a web search engine to get more information
and find better keywords). Stop the gnut-search and gnut-web processes
when you won't need them for a while.

3. I have a problem running <something> on <some other platform>. What
   to do?

This stuff currently runs fine on Debian GNU/Linux 2.2 "Potato" for
i386, and that's all I care about. Solve any problems yourself (also see
Q. Z).

4. When will the shadowFT network be obsoleted?

As soon as openFT is running and has a way to re-share the scanned
indexes there. OpenFT development is done mainly using the giFT
facilities (see Q. Z) and could probably use a few more talented people,
both those with *UX and those with M$ development skills.
When openFT is ready for your indexes, this will be announced on the
giFT forum and on the shadowFT home page at
http://www.dddi.nl/~costar/shadowFT, so check back there every once in a
while.

5. Why didn't you integrate this into the Gnutella network?

The first tests actually _were_ "integrated" into Gnutella (as far as
that was fundamentally possible), which is in fact the cause of the
particular format of the reference numbers. But running two Gnutella
clients on the same host exposed some interesting properties of the
Gnutella network, namely that queries take 15-30 minutes(!) to reach the
"other side" of the network, IF they reach it at all, and that query
replies never arrive back because one of the intermediate nodes has been
disconnected in the meantime.

The idea of the shadowFT network is that eventually many redundant
copies of the scanned indexes will be present, so that a "very" small
search horizon (ttl=6, a few thousand shadowFT nodes) will be sufficient
to find everything we want -- possibly combined with "keepsearching" to
try again every once in a while, not only because new indexes are loaded
regularly, but also because the network keeps slowly reconnecting
itself. (The download program already keeps searching automatically.)
Now you hopefully understand the answer to Q. 2 better, and also why
gnut-sFT is intentionally incompatible with the Gnutella network.

6. Shouldn't you explain the details of <whatever> in this README?

No. Go read the source. And the gnut & giFT docs if you're looking for
info on those things.

7. Do I have to shut everything down before a reboot? And: I can't
   restart things after my system crashed, what should I do?

You don't have to shut down anything manually before a reboot; no
important state information is kept in memory that needs to be written
to disk. But when you try to start things again after a reboot, some
programs may complain that they are still running and can't be started
again. This is because some pid files and other information did
(intentionally) not get removed automatically. The remedy is to simply
run the appropriate "stop" command first, which will neatly clean things
up.

8. Does shadowFT contain spyware?

Yes: in gnut-search, see the "hosts" and "monitor" commands. So don't
include personal information or credit card numbers in search queries.
And of course we spy on FT nodes, if that's what you mean.

Y. Why did you start this? How long did it take? What did it cost you?

Because ST:ENT won't be broadcast here until 2006 or so. About 4 weeks,
almost full-time. About US$200 in lost income (or actually, in extra
debts).

Z. Where can I post my other questions?

Unfortunately I don't have any time left to answer questions via private
e-mail. Please use the giFT forum on
http://sourceforge.net/projects/gift/ (also accessible from
http://www.giftproject.org) or the #gift channel on
irc.openprojects.net. Contact me by e-mail only if you want to donate me
something (preferably money, any amount in any currency as long as it's
not coins) at J.A.Bezemer@dddi.nl.


LEGALESE
--------

Unless indicated otherwise, everything in the shadowFT package is
Copyright (C) 2001 J.A. Bezemer

This program is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 2 of the License, or (at your
option) any later version.
This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
Public License for more details.

You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
59 Temple Place, Suite 330, Boston, MA 02111-1307 USA