The Internet – Indigestible data flow or all-embracing library

2. History of the Internet
2.1 The idea
2.2 The ARPANET
2.3 The Internet grows
2.4 The present situation and a further outlook
3. The technical structure of the Internet
3.1 The physical structure
3.2 The logical structure
3.3 The Domain-Name-System (DNS)
4. The Internet – Indigestible data flow or all-embracing library
4.1 Too much data but no information
4.2 Simple search strategy, based on search services
4.3 Case study: The status of the Queen in the Canadian constitution
4.4 Case study: Output of renewable energy sources in the USA
4.6 Advanced search strategy, without search services
5. An outlook into the future
A. Used literature and Internet sites
B. Increase of Internet hosts
Every important new technology has ushered in a new era: fire, the alphabet,
the discovery of gunpowder, the steam engine or electricity. With every new era
there was a change in society. The last important change was brought about by
the steam engine and electricity, which led humanity into the industrial
revolution. People moved into the cities to find jobs in the new, growing
industries. Their social behaviour changed, since they now lived in huge,
anonymous cities and no longer in small towns.
But the eighties and nineties of this century show that the industrial era is
coming to an end. A new technology, comparable in importance to
industrialisation, is taking control over mankind’s destiny. The information age
has just begun.
This change goes hand in hand with the digitisation of information.
Digitisation means that every kind of information (text, audio or video
signals) can be expressed in bits and bytes, so that a computer is able to
display, store or manipulate it. Nowadays, computers are already capable of
replacing TV, radio, video, newspapers or books.
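To make the idea of digitisation concrete, here is a small Python sketch (an illustration, not part of any particular system) that turns a short text into bytes and then into the bits a computer stores, and back again. It assumes the ASCII character encoding; audio and video are digitised with other encodings, but the result is likewise a sequence of bits. The function names are hypothetical.

```python
def to_bits(text: str) -> str:
    """Encode text as ASCII bytes and show each byte as 8 bits."""
    data = text.encode("ascii")  # one byte per character in ASCII
    return " ".join(format(byte, "08b") for byte in data)


def from_bits(bits: str) -> str:
    """Reverse the process: read the 8-bit groups back into characters."""
    data = bytes(int(group, 2) for group in bits.split())
    return data.decode("ascii")


bits = to_bits("Net")
print(bits)             # 01001110 01100101 01110100
print(from_bits(bits))  # Net
```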
Because of their computing power, computers are employed in nearly every office
to process the enormous amount of incoming data. Just imagine a stock of
thousands of tools and a craftsman who is looking for one particular tool: how
could he find it without a computer? Computers are used in science for exactly
the same reason: how many people would be needed to calculate the exact
trajectory of a spaceship by hand? But people also work with computers in their
leisure time. Games that used to exist only as board games are increasingly
being turned into interactive multimedia spectacles. The number of computer
players grows constantly and rapidly.
Scientists declare that the reason for human predominance in nature lies in
communication. Communication means the exchange of information. It enables the
creation of a social order, which is a condition for the peaceful coexistence of
all individuals. Since technical progress has made communication between
computers possible, their usefulness seems to have multiplied. LANs (Local Area
Networks) link many PCs (Personal Computers), so the employees of a company can
share documents, information sources and different programs. WANs (Wide Area
Networks) connect a company’s branch offices across a continent. But there is
one network which is the most important of all and which ties together millions
of computers all over the world: the INTERNET.
The Internet is a GAN (Global Area Network), which means that a computer
anywhere in the world can go online (can become part of the Internet). All this
computer needs is a telephone connection. Entering the Internet opens a door to
a world of its own: Cyberspace.
The Internet offers a lot of possibilities:
Searching and downloading information
Sending and receiving e-mails (electronic mail)
Participating in discussion groups
Remote-Computing (remote control of distant computers)
Chatting with other Cyberians (citizens in the Cyberspace)
In this document, I will take a closer look at the first point: searching and
downloading information.
First, I’ll sum up the history of the Internet. Then I’ll describe how the
Internet works. After that, I’ll try to develop a search strategy, test its
success with the help of two case studies and draw a conclusion. At the end,
I’ll dare to give a brief outlook into the future.
2. History of the Internet 
2.1 The idea
At the end of the fifties, the US Department of Defense sought a way to ensure
communication between military bases and cities after a nuclear attack. But
neither cables nor computers would be able to withstand the power of nuclear
bombs. And if a central authority controlled the network, this authority would
probably be one of the first targets. The RAND Corporation, a research
organisation working for the US military, published a solution in 1964: a
decentralised network. In such a network, information is not transmitted
directly from sender to recipient (as in a telephone connection). Since a
network consists of many computers, the whole network can be subdivided into
many nodes: each computer participating in the network forms such a node. In a
decentralised network, information is passed on from node to node until it
reaches its destination. If one node is destroyed, other nodes still remain to
transfer the information. This way of transmission is called “Dynamic
Rerouting”.
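The principle of dynamic rerouting can be sketched in a few lines of Python. The network below is a made-up example (the node names A to E are hypothetical); a breadth-first search looks for a path from node to node, and when a node is “destroyed” the search simply routes around it. Real routing works differently in detail, so this only illustrates the idea.

```python
from collections import deque

# Hypothetical decentralised network: each node lists its direct neighbours.
NETWORK = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}


def find_route(network, start, goal, destroyed=frozenset()):
    """Breadth-first search for a path that avoids destroyed nodes.

    Returns the list of nodes from start to goal, or None if no path is left.
    """
    if start in destroyed or goal in destroyed:
        return None
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in network[node]:
            if neighbour not in visited and neighbour not in destroyed:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None


print(find_route(NETWORK, "A", "E"))                   # ['A', 'B', 'D', 'E']
print(find_route(NETWORK, "A", "E", destroyed={"B"}))  # ['A', 'C', 'D', 'E']
print(find_route(NETWORK, "A", "E", destroyed={"D"}))  # None
```

The last call shows the limit of the idea: if the only node connecting E to the rest of the network is destroyed, no route is left, which is why a robust network needs several connections per node.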
In the sixties, this network design was tested by several American universities
(namely the Massachusetts Institute of Technology (MIT) and the University of
California, Los Angeles (UCLA)). In 1968, another institution of the Department
of Defense, the Advanced Research Projects Agency (ARPA), began to build the
first decentralised network and was in charge of it. High-speed computers formed
the nodes of that network. In the autumn of 1969, a node was installed at UCLA.
By the end of that year, a network called the ARPANET had come into existence,
consisting of four nodes. A node could be operated from another node via remote
control. This means that a user at any node was able to control a computer which
could be at the other end of the continent. This was of great value, since
computer time was precious and expensive in those days. In 1971, the network was
made up of 15 nodes; in 1972, 37 nodes already formed the ARPANET. Soon the
system was extended to transfer files and to send messages via e-mail
(electronic mail). At first, only military personnel or military scientists had
access to the network, but this restriction was soon given up. The first two
years had shown that the ARPANET was mainly used not for remote control but for
information exchange.
2.3 The Internet grows
The ARPANET grew very fast because of its decentralised architecture. Computers
with any OS (operating system, e.g. MacOS, MS-DOS, Windows or UNIX) were able to
join the network. A computer only had to use the “Network Control Protocol”
(NCP), which was replaced in 1982 by the current standard, TCP/IP (“Transmission
Control Protocol”/“Internet Protocol”). In 1973, the first international ARPANET
connections were established to Great Britain and Norway. In 1983, the MILNET
(Military Network) split off from the ARPANET. But communication between the two
networks was still possible, since the connection between them remained. This
interconnection was called the DARPA Internet or, more simply, the Internet. One
year later, the ARPANET consisted of 1000 nodes (which from then on were called
hosts, because these computers host the information). In 1986, the National
Science Foundation Network (NSFNET) was founded, which connected the different
networks in the USA via five high-performance computers (backbones). This new
network connected the ARPANET and several other networks (CSNET (Computer
Science Network), Usenet (Unix network), NASA, MILNET).
2.4 The present situation and a further outlook
The Internet is spreading faster than the telephone or the fax machine did. It
is the best example of a working anarchy (since there is no central authority in
control). Users gain access to many offers concerning business, recreation,
entertainment or hobbies.
Right now, about 60,000,000 people are using the Internet, and their number
grows every day. According to an estimate by “Network Wizards”, there will be
500,000,000 people online in the year 2000. This amount represents 8.37 per cent
of the world’s population. (For recent Internet growth, see appendix B.)
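The percentage quoted above implies a figure for the world’s population. A quick check in Python, using only the two numbers given in the text, shows that 500,000,000 people at 8.37 per cent correspond to roughly six billion people in total:

```python
users_2000 = 500_000_000  # estimated users in 2000 (Network Wizards, as quoted)
share = 8.37 / 100        # quoted share of the world's population

# If users = share * population, then population = users / share.
implied_world_population = users_2000 / share
print(f"{implied_world_population:,.0f} people")  # roughly 5.97 billion
```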
But recent developments have shown that the Internet is becoming more and more
commercialised. In 1993, 1.5 % of all web sites (or pages) served commercial
purposes. Three years later, 50 per cent of all web sites were commercial ones.
This development reached its peak in June 1996, when 68 % of all web sites were
offered by companies. The latest figures, from January 1997, show that the share
of .COM sites (whose names end in “.COM”) has fallen again to 62.6 per cent.
With regard to scientific and commercial use, governments and local Internet
providers (central nodes which are connected to many computers, like
“RZ-ONLINE”) have realised the importance of the Internet. Therefore the
Internet infrastructure is being improved and will continue to be improved with
great effort in the coming years. The US government has set itself the goal of
an “Information Superhighway”: every user should be able to look something up
in any American library or have access to all public information. In Europe,
the ISDN (Integrated Services Digital Network) standard was introduced in 1994.
ISDN offers two 64 kbit/s channels for data or voice and is nowadays widely
available at low cost.
Besides, the software has improved. In the beginning, browsers (software to
navigate through the Internet) only showed columns of characters and weren’t
really comfortable to use. These days, browsers can display text, music and
video animation. Now even a novice can “surf online” using their easy
interfaces.
3. The technical structure of the Internet
3.1 The physical structure
A complex hierarchical structure is inherent in the Internet, since this network
connects millions of computers. On the highest level, computer centres are
linked via satellites, leased telephone lines or (more modern) fibre-optic
cables. These nodal points are called “backbones”. They exchange information at
high speed (from 64 kbit/s to 622 Mbit/s), which is necessary given the amount
of information a single backbone has to transmit. The second level contains the
Internet providers: companies which offer their online services to clients and
handle their clients’ transmissions over the lines of the upper level. Clients
of these providers can be other providers, companies or consumers. Internet
providers are connected to their own providers (backbones or bigger providers)
via leased telephone lines (at a speed of 128 kbit/s to 2 Mbit/s). On the lowest
level, there is the end consumer, who pays his provider for being online. These
users establish a connection to their provider with ISDN (64 kbit/s per channel)
or with a modem over a normal telephone line.
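To get a feeling for these speeds, the following Python sketch computes how long it would take to transfer one megabyte over some of the link speeds named above. It assumes a megabyte of 1,000,000 bytes (8,000,000 bits) and ignores protocol overhead and congestion, so real transfers take longer; the labels in `LINKS` are only descriptive.

```python
FILE_SIZE_BITS = 1_000_000 * 8  # one megabyte (decimal) = 8,000,000 bits

# Link speeds in bits per second, taken from the figures in the text.
LINKS = {
    "ISDN channel (64 kbit/s)": 64_000,
    "provider line (128 kbit/s)": 128_000,
    "provider line (2 Mbit/s)": 2_000_000,
    "backbone (622 Mbit/s)": 622_000_000,
}


def transfer_time(size_bits: int, bits_per_second: int) -> float:
    """Ideal transfer time in seconds: amount of data divided by link speed."""
    return size_bits / bits_per_second


for name, speed in LINKS.items():
    print(f"{name:28s} {transfer_time(FILE_SIZE_BITS, speed):10.3f} s")
```

Over a single ISDN channel the megabyte takes about two minutes (125 s), while a 622 Mbit/s backbone moves it in roughly a hundredth of a second.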
3.2 The logical structure
The logical Internet structure has to fulfil several tasks:
· Handling communication and transmission
· Connecting numerous networks (even with different operating systems)
· Assigning an individual address to every computer which is online
There are many rules to regulate these tasks. These rules are called
“protocols”.
The basic protocol is the Internet Protocol (IP). It exchanges information via
packet-oriented, indirect and not guaranteed transmission. Packet-oriented means
that the whole information is divided into several pieces (packets). Each of
these packets gets a number according to its position in the message, so the
receiving computer can reassemble the separate packets into the original
information. Indirect means that there is no direct connection between sender
and recipient (as in a telephone connection). First, every packet is sent up to
the highest physical level. There it is passed from backbone to backbone until
it reaches the backbone of the recipient. This backbone sends the information to
the provider, and the provider forwards it to the recipient. This principle of
finding a path for the transmission is called “routing”. Since every packet is
routed anew, packets of the same transmission can take different routes. For
this reason, packets can arrive in a different order from the one in which they
were sent. Not guaranteed means that there is no check whether the information
arrives correctly.
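The packet principle described above can be simulated in Python. The sketch below is a simplified model, not real IP: it splits a message into numbered packets, shuffles them to mimic packets taking different routes, and lets the receiver put them back in order by their numbers. The packet size and the function names are chosen only for illustration.

```python
import random


def split_into_packets(message: str, size: int) -> list[tuple[int, str]]:
    """Cut the message into pieces of `size` characters, each tagged with its position."""
    return [(number, message[start:start + size])
            for number, start in enumerate(range(0, len(message), size))]


def reassemble(packets: list[tuple[int, str]]) -> str:
    """Sort the packets by their number and join the pieces again."""
    return "".join(piece for _, piece in sorted(packets))


packets = split_into_packets("Packets may travel different routes.", size=8)
random.Random(1).shuffle(packets)  # simulate arrival in a different order
print(packets)
print(reassemble(packets))         # Packets may travel different routes.
```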
Another very important protocol is the Transmission Control Protocol (TCP). As
its name suggests, its transmissions are controlled (guaranteed): TCP checks
whether the received information is correct. If packets are damaged or lost,
they are sent again until they arrive correctly.
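How a damaged packet can be detected and sent again can be sketched as follows. The checksum is the 16-bit one’s-complement sum used by TCP and IP (described in RFC 1071). The unreliable line (`make_channel`) and the resend loop (`transmit`) are a simplified, hypothetical model: real TCP works with acknowledgements and timeouts, and here the sender’s checksum is simply assumed to reach the receiver intact.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum as described in RFC 1071."""
    if len(data) % 2:                # pad odd-length data with a zero byte
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]     # add the next 16-bit word
        total = (total & 0xFFFF) + (total >> 16)  # fold any carry back in
    return ~total & 0xFFFF


def make_channel(corrupt_first: int):
    """Hypothetical unreliable line that damages the first `corrupt_first` packets."""
    attempts = 0

    def send(payload: bytes) -> bytes:
        nonlocal attempts
        attempts += 1
        if attempts <= corrupt_first:
            # Flip all bits of the first byte to simulate a transmission error.
            return bytes([payload[0] ^ 0xFF]) + payload[1:]
        return payload

    return send


def transmit(payload: bytes, channel, max_tries: int = 5) -> tuple[bytes, int]:
    """Resend the packet until the receiver's checksum matches the sender's."""
    expected = internet_checksum(payload)  # assumed to reach the receiver intact
    for attempt in range(1, max_tries + 1):
        received = channel(payload)
        if internet_checksum(received) == expected:
            return received, attempt
    raise RuntimeError("packet could not be delivered")


data, tries = transmit(b"Hello, ARPANET", make_channel(corrupt_first=2))
print(data, tries)  # b'Hello, ARPANET' 3
```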
The combination of these two protocols, called TCP/IP, is the standard protocol
of the Internet and works on every operating system (MacOS, Windows, Unix and
others).
3.3 The Domain-Name-System (DNS)
But one problem still remains: every computer on the Internet needs an
individual name so that it can be the unique destination of a message. In the
early days of the Internet, “hosts [..] were assigned names in a flat or global
name space of character strings” (like ‘USC-ISIF’). These names were stored in a
central list of [...]. In 1986, this central list of names contained about [...]
entries and couldn’t be extended any further. So a new hierarchical structure
for names was introduced: the Domain-Name-System.
A domain is a network which consists of subordinate computers or networks. If a
domain is subordinate to another domain, it is called a “sub-domain”. According
to the DNS, a name or address consists of a user ID and the domain. The domain
of the highest level (which is the last component of the address) is called the
top-level domain. If there are sub-domains, these are listed according to their
level between the user ID and the top-level domain. A name for a user ID or a
sub-domain must be unique within the next higher domain.
Example: firstname.lastname@example.org (our school’s E-mail address)
johannes - identification of the user
de - top-level-domain (de for Germany)
rhein-zeitung - first sub-domain
abo - second sub-domain
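The decomposition of an address into its parts can be expressed in a short Python sketch. The address used here (user@abo.example.org) is a made-up example and the function name is hypothetical; the function splits off the user ID at “@” and then reads the domain from right to left, starting with the top-level domain, so that the first sub-domain is the one directly in front of it.

```python
def split_address(address: str) -> dict:
    """Split 'user@sub2.sub1.tld' into user ID, top-level domain and sub-domains."""
    user_id, _, domain = address.partition("@")
    labels = domain.split(".")                 # e.g. ['abo', 'example', 'org']
    top_level = labels[-1]                     # the last component
    sub_domains = list(reversed(labels[:-1]))  # first sub-domain sits next to the TLD
    return {"user": user_id, "top_level": top_level, "sub_domains": sub_domains}


print(split_address("user@abo.example.org"))
# {'user': 'user', 'top_level': 'org', 'sub_domains': ['example', 'abo']}
```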
Addresses of homepages don’t contain a user ID, since many users may be
responsible for a web site. The top-level domain is an abbreviation which
indicates a certain kind of address (three letters):
com - commercial organisation    mil - military organisation
org - organisation               net - network provider
edu - educational institution    gov - government institution
The top-level domain can also provide geographical information (two letters):
us - USA              de - Germany
uk - United Kingdom   fr - France
jp - Japan            ca - Canada
4. The Internet – Indigestible data flow or all-embracing library
4.1 Too much data but no information
With regard to the incomparable growth of the Internet, it is becoming more and
more difficult to find the answer to a particular question. While searching for
relevant information, one experiences the difference between the present state
of the Internet and the intended information highway. Even recognised search
strategies often fail to find an accurate answer in an acceptable time. Every
search strategy first employs a search service. These free services are
programs which look up the search term in their databases, which contain
millions of homepages and are permanently updated. There are two types of search
services: search engines and directories. In a directory (like YAHOO – Yet
Another Hierarchical Officious Oracle), employees add homepages to the list and
sort them according to their content, whereas the database of a search engine
(like AltaVista) is maintained by computer programs called “spiders”. Most
people try to get their information by entering just one expression into such a
service. But as an imprecise question leads to an imprecise answer, they will
get thousands or even millions of supposedly relevant sites. After checking the
first pages without success, they soon give up. A saying sums up this common
experience: “The Internet provides data, but no information.”
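The database of a search engine can be pictured as an index that maps each word to the pages containing it. The following Python sketch is a toy model: the page names and texts are invented, and a real spider would fetch the pages over the network instead of reading them from a dictionary. It also shows the problem described above: the single keyword “queen” returns both a page on the constitution and a page on a rock band.

```python
import re
from collections import defaultdict

# Invented example pages standing in for what a spider has collected.
PAGES = {
    "page1": "The Queen and the Canadian constitution",
    "page2": "Queen, the rock band, on tour",
    "page3": "Renewable energy statistics for the USA",
}


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map every lower-cased word to the set of pages that contain it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(name)
    return dict(index)


INDEX = build_index(PAGES)
print(sorted(INDEX["queen"]))  # ['page1', 'page2']
```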
Is there any way to get the information one is looking for?
4.2 Simple search strategy, based on search services
The introductory example has shown the main problem with finding information
on the Internet: the formulation of the question. To find suitable sites, the
search service must be fed with striking, unambiguous keywords. A search engine
or a directory is not intelligent enough to understand the exact meaning of an
imprecise keyword. For such a keyword it will return thousands of useless pages
and only a few relevant ones, which get lost in the flood of data. First, the
information sought must be characterised with a couple of associated
expressions. The chosen keywords should be as clear as possible to avoid
differing connotations.
Most services offer operators that link several keywords to describe the
information sought more precisely:
· AND, +
The resulting sites must contain all keywords linked with “AND”.
· OR
The resulting sites must contain at least one of the keywords linked with “OR”.
· NEAR
The resulting sites must contain the keywords linked with “NEAR”. A maximum of
eight words may separate the two keywords.
· NOT, -
Irrelevant subjects can be excluded by negation (“NOT”).
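The operators AND, OR and NOT correspond to set operations on the pages that contain each keyword. The following Python sketch is a toy model with invented pages and hypothetical function names; real search services each had their own syntax, so it only illustrates the logic of the operators.

```python
import re

# Invented example pages (a toy stand-in for a search service's database).
PAGES = {
    "page1": "The Queen and the Canadian constitution",
    "page2": "Queen, the rock band, on tour in Canada",
    "page3": "The Canadian constitution act of 1867",
    "page4": "Renewable energy statistics for the USA",
}


def pages_with(word: str) -> set[str]:
    """All pages whose text contains the word (case-insensitive, whole words)."""
    return {name for name, text in PAGES.items()
            if word.lower() in re.findall(r"[a-z0-9]+", text.lower())}


def search_and(*words: str) -> set[str]:
    """AND / +: pages containing every keyword (set intersection)."""
    return set.intersection(*(pages_with(w) for w in words))


def search_or(*words: str) -> set[str]:
    """OR: pages containing at least one keyword (set union)."""
    return set.union(*(pages_with(w) for w in words))


def search_not(result: set[str], word: str) -> set[str]:
    """NOT / -: remove pages containing the excluded keyword (set difference)."""
    return result - pages_with(word)


print(sorted(search_and("queen", "constitution")))      # ['page1']
print(sorted(search_or("queen", "energy")))             # ['page1', 'page2', 'page4']
print(sorted(search_not(pages_with("queen"), "band")))  # ['page1']
```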
Apart from these operators, wildcards (such as “?” or “*”) are allowed. To find
only the exact word and no compounds containing it (e.g. only “net” and not
“Internet”), keywords can be entered in quotation marks. Another help might be
to search only in structured categories, e.g. for a question about the Internet
in “Computers and Internet: Internet” at Yahoo.
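NEAR, quotation marks and wildcards can be modelled in the same toy fashion. The sketch assumes the rule stated above (at most eight words between the two keywords), assumes that an unquoted keyword also matches inside compounds, and uses Python’s `fnmatch` module for “?” and “*”. Real search services each had their own exact rules, so this is only an approximation, and the function names are hypothetical.

```python
import re
from fnmatch import fnmatchcase


def tokens(text: str) -> list[str]:
    """Split text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def near(text: str, first: str, second: str, max_gap: int = 8) -> bool:
    """NEAR: both keywords occur with at most `max_gap` words between them."""
    words = tokens(text)
    pos_first = [i for i, w in enumerate(words) if w == first.lower()]
    pos_second = [i for i, w in enumerate(words) if w == second.lower()]
    return any(a != b and abs(a - b) - 1 <= max_gap
               for a in pos_first for b in pos_second)


def exact_word(text: str, word: str) -> bool:
    """Quoted keyword: only the whole word counts ("net" does not match "Internet")."""
    return word.lower() in tokens(text)


def substring(text: str, word: str) -> bool:
    """Unquoted keyword (assumed behaviour): compounds such as "Internet" match "net"."""
    return word.lower() in text.lower()


def wildcard(text: str, pattern: str) -> bool:
    """Wildcards: '?' stands for one character, '*' for any number of characters."""
    return any(fnmatchcase(w, pattern.lower()) for w in tokens(text))


sample = "The Internet is a global net of computers"
print(exact_word(sample, "net"), exact_word("The Internet", "net"))  # True False
print(substring("The Internet", "net"))                              # True
print(wildcard(sample, "comput*"), near(sample, "internet", "computers"))  # True True
```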
4.3 Case study: The status of the Queen in the Canadian constitution
To test this search strategy, I wanted to find out what the status of the Queen
is in the Canadian constitution.
First, I searched for ‘Queen’ at Yahoo. I found 9 sites, but none of them
provided information on the question. So I searched more precisely for
‘Queen +constitution’ and found 19936 sites. Since it would take many hours to
check this enormous number of pages, I narrowed the keywords down to
‘Queen +Canadian +constitution’, which led to 5459 hits. After I had added
‘+status’ to the search string, Yahoo found 2787 matching sites. So I had to
choose another way to find information on this topic. Another possibility was
to find the relevant passage in the Canadian constitution itself. So I searched
for ‘“Canadian constitution”’ (written in quotation marks) and found two sites.
I visited the first one, which was an index of constitutions from all around
the world, and found a link to the Canadian one. In the table of contents, I
searched for the keyword ‘Queen’ and found section 9 of the third chapter of the
“Canadian Constitution Act” from 1867, which says: “The Executive Government and
Authority of and over Canada is hereby declared to continue and be vested in the
Queen.”
The search took me 12 minutes.
4.4 Case study: Output of renewable energy sources in the USA
Since one test isn’t enough, I ran another experiment. I wanted to know the
total production of renewable energy sources in the USA in 1997.
I started by searching for ‘“electricity production”’ at Yahoo. I got three
sites, which didn’t supply any relevant data. I thought there must be some
statistics on this topic, so I searched for ‘electricity +statistic’ and found a
category called “Government: U.S. Government: Statistics”, containing all
statistics drawn up for or by the U.S. Government or other administrations.
Then I visited the “Center for Environmental Statistics”, where I found no
survey on this topic. But the ‘Energy Information Administration’ had quite an
interesting statistic on the “U.S. Energy Flow”, stating that renewable sources
produced 7.06 quadrillion Btu in 1996 (in U.S. usage, 7.06 quadrillion means
7,060,000,000,000,000; Btu is an abbreviation for “British thermal unit”). The
figure for 1996 was the latest one available.
This search took me 27 minutes.
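To put 7.06 quadrillion Btu into more familiar units, the following Python sketch converts it to joules and kilowatt-hours. It assumes the International Table value of the British thermal unit (1 Btu = 1055.05585262 J) and 1 kWh = 3,600,000 J; the energy figure itself is the one quoted from the Energy Information Administration.

```python
BTU_IN_JOULES = 1055.05585262  # International Table Btu
JOULES_PER_KWH = 3.6e6         # 1 kWh = 3,600,000 J
QUADRILLION = 10**15           # U.S. usage: 1,000,000,000,000,000

renewables_1996_btu = 7.06 * QUADRILLION

joules = renewables_1996_btu * BTU_IN_JOULES
kilowatt_hours = joules / JOULES_PER_KWH

print(f"{joules:.3e} J")            # about 7.449e+18 J
print(f"{kilowatt_hours:.3e} kWh")  # about 2.069e+12 kWh
```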
The case studies (4.3 and 4.4) have shown that even complicated questions can be
answered with the help of the Internet. This is remarkable given how little one
needs to access the “World Wide Web” (WWW, the totality of all sites): a
computer, a modem, a telephone connection and an Internet provider are enough to
get information at any time. A lot of answers and solutions are waiting to be
found on the Internet. The only problem is to locate them. Searching the WWW
with an imprecise, ambiguous keyword is like searching for a certain plant while
knowing only that it grows in the jungle. In order to find relevant information,
one must know exactly what to look for. Besides that, a suitable search
strategy should be used to find the information.
To sum up in a few words: with the help of an efficient search strategy, the
Internet provides information on nearly every topic. Without a search strategy,
one can spend one’s whole life searching for an answer without finding it.
The presented search strategy may not work for every question, but it works for
many. To improve it, I added an advanced search strategy (4.6). The two case
studies illustrate the problem, but they are not representative of all
searches. I think the Internet also leaves a lot of questions open. In my
opinion, the Internet is more a fast, large library than an all-embracing one or
an indigestible data flow.
4.6 Advanced search strategy, without search services
Nearly all search engines look for relevant information on the homepages of
the Internet. But there are also other sources full of answers:
Newsgroups are forums where people post their opinions on the newsgroup topic.
Their messages can be read by every participant of that forum. There are more
than ten thousand newsgroups discussing many different topics (“Liszt – the
mailing list directory” helps to find a newsgroup on a certain topic). There are
already services which search through all newsgroups for a keyword (like
“Dejanews”).
FAQ archives contain answers to Frequently Asked Questions (FAQ) on a certain
topic. One can search for a particular archive with the Usenet Hypertext FAQ
Archive Search.
Files containing information can be downloaded from FTP servers. To locate
relevant files, Archie systems (search services for FTP directories) can be
used (like ArchiePlexForm).
Internet users also have access to huge databases on nearly every topic. To
find these databases, search services should be used.
If a satisfactory answer has still not been found, the question should be asked
in a related newsgroup. The last possible step is to create one’s own homepage,
stating the question clearly, and to hope for an answer.
5. An outlook into the future
The future development of the Internet must be seen under three aspects. First,
the WWW grows constantly, as more and more companies and people go online. But
as the Internet contains more information, searching for a particular piece of
information becomes more difficult. Second, the technical infrastructure is
being improved to keep pace with the growth of the WWW (e.g. ATM – Asynchronous
Transfer Mode), so information is transmitted faster.
I’m convinced that the third aspect is the most important one. Software
companies and other programmers have realised that search services must be
improved to allow fast access to the information on the Internet. There are
already projects developing search services with “artificial intelligence”,
which would simplify the dialogue between man and computer. This dialogue is the
greatest problem of current search services.