LIS4361 Week 1 Notes

Internet Background

Internet Myths

1. Everything is on the Internet.
2. The Internet contains nothing but trash.
3. Anything on the Web is findable.
4. Search engines search the entire Web.
5. The Internet is the Web.

Why Some Things are not Findable

1. Web page just posted and spiders haven’t indexed it yet.
2. URL expired.
3. Web page inadequately indexed by search engines.
4. Bot blocker in effect.
5. Site not completely indexed.
6. Problems with alpha or numeric address.
7. Site is a dynamic site: permanent Web page does not exist, but is generated “on the fly.”
8. The URL is case sensitive.
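
Item 4 in the list above ("bot blocker") can be illustrated with a robots.txt file, the usual way a site asks spiders not to crawl certain pages. This is an assumption, since the notes do not name the mechanism. The sketch below uses Python's standard urllib.robotparser module. The rules and the example.com URLs are invented for illustration and do not come from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: ask every spider to stay out of /private/.
EXAMPLE_RULES = [
    "User-agent: *",
    "Disallow: /private/",
]


def spider_may_fetch(rules, url, user_agent="*"):
    """Return True if a well-behaved spider would be allowed to fetch url."""
    parser = RobotFileParser()
    parser.parse(rules)  # parse the rule lines directly, no network access
    return parser.can_fetch(user_agent, url)


blocked = spider_may_fetch(EXAMPLE_RULES, "http://example.com/private/page.html")
allowed = spider_may_fetch(EXAMPLE_RULES, "http://example.com/public/page.html")
print("private page crawlable:", blocked)
print("public page crawlable:", allowed)
```

A page under a disallowed path can still exist and load in a browser, but a search engine that honors the rule will not index it, so it will not turn up in search results.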

History of the Internet and the WWW

Hobbes' Internet Timeline: Study the history of the Internet.
A Little History of the World Wide Web
See also book in Penrose (and Prospector): History of the Internet : a chronology, 1843 to the present
Ted Nelson and Xanadu: http://jefferson.village.virginia.edu/elab/hfl0155.html
Project Xanadu: http://xanadu.com/

Internet Technologies

Internet addresses consist of many parts. The most important is the top-level domain. You are all familiar with "dot coms" (.com), but there are many more top-level domains, such as .edu for educational institutions (du.edu), .gov for US government sites, and .org for organizations. There are hundreds of top-level domains, each administered by an authoritative agency.

TCP/IP: http://searchnetworking.techtarget.com/sDefinition/0,,sid7_gci214173,00.html
Background: http://www.yale.edu/pclt/COMM/TCPIP.HTM
W3C: http://www.w3.org/

Address Assignment

Alpha

Numeric

http://ccbs.ntu.edu.tw/

Now flips to: http://buddhism.lib.ntu.edu.tw/

http://140.112.2.89/ (2006)
http://140.112.113.19/ (now)

http://pears2.lib.ohio-state.edu/

http://128.146.9.101 (2005)
http://128.146.173.254 (2008)
http://140.254.87.102 (2010-2012)

What is the relationship of the following to each other?

http://www.insidedenver.com/  |  http://www.rockymountainnews.com/

http://130.253.4.23 | http://bianca.penlib.du.edu/ | http://catalog.du.edu/ | http://peak.du.edu/

Note: the DU catalog was formerly at: http://130.253.32.83 - But this no longer exists

 

Anatomy of a URL

            secondary               file file
             domain   directory     name type
                |          |   sub   |    |
  transfer      |     top  |   dir-  |    |
  protocol      |    level | ectory  |    |
     |          |    domain|    |    |    |
     |          |      |   |    |    |    |
    http://www.census.gov/acsd/www/sub_p.htm 
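
The labels in the diagram above can also be pulled out of a URL in code. This is a minimal sketch using Python's standard urllib.parse module. The function name describe_url and the dictionary keys are hypothetical names chosen to match the diagram. The sketch assumes an alpha (non-numeric) address, and it simply takes the last two host labels as the top-level and secondary domains, which will not be right for every country-code address.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Split a URL into the parts labeled in the diagram (naive sketch)."""
    parts = urlsplit(url)
    host = parts.hostname or ""          # hostname is returned in lower case
    host_labels = host.split(".")        # e.g. ['www', 'census', 'gov']

    # Path segments between slashes, ignoring empty pieces.
    segments = [s for s in parts.path.split("/") if s]
    last = segments[-1] if segments else ""
    if "." in last:
        # The last segment looks like a file: split name from type.
        file_name, file_type = last.rsplit(".", 1)
        directories = segments[:-1]
    else:
        file_name, file_type = "", ""
        directories = segments

    return {
        "protocol": parts.scheme,
        "host": host,
        "top_level_domain": host_labels[-1],
        "secondary_domain": host_labels[-2] if len(host_labels) >= 2 else "",
        "directories": directories,
        "file_name": file_name,
        "file_type": file_type,
    }


for label, value in describe_url("http://www.census.gov/acsd/www/sub_p.htm").items():
    print(label, "=", value)
```

Note that the protocol and host come back in lower case however they were typed, because those parts of a URL are not case sensitive. The directory and file names are left exactly as written, because whether case matters there depends on the server.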

When does case matter in URLs?

Domain Lookup Tools:

Webmaster Toolkit: http://www.webmaster-toolkit.com/

Nslookup: Translate from alpha address to numeric IP, and vice versa.

http://www.webreference.com/cgi-bin/nslookup.cgi
http://www.kloth.net/services/nslookup.php
Can you find other nslookup sites? [see also DNSstuff below]
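
The same translation can be done without a Web form. This is a minimal sketch using Python's standard socket and ipaddress modules. The function names is_numeric_address and lookup are hypothetical. The lookup function needs a working network connection, and the answer depends on the DNS records at the time it runs, so no expected output is shown.

```python
import ipaddress
import socket


def is_numeric_address(text):
    """Return True if text is a numeric IP address such as 140.112.113.19."""
    try:
        ipaddress.ip_address(text)
        return True
    except ValueError:
        return False


def lookup(address):
    """Translate alpha to numeric, or numeric to alpha (requires network)."""
    if is_numeric_address(address):
        # Reverse lookup: numeric IP address to host name.
        name, _aliases, _addresses = socket.gethostbyaddr(address)
        return name
    # Forward lookup: host name to numeric IP address.
    return socket.gethostbyname(address)
```

Calling lookup("buddhism.lib.ntu.edu.tw") would return a numeric address, and calling lookup on that numeric address would attempt the reverse translation. Either call raises OSError if the lookup fails, for example when no reverse record exists.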

WhoIs: Find out who a domain is registered to and contact info.

http://www.networksolutions.com/cgi-bin/whois/whois (note that this URL flips to a different URL)
http://www.internic.net/whois.html
http://whois.educause.net/
Network Solutions

Top-level Domains (TLDs): Find out Internet domains for every country of the world.

http://en.wikipedia.org/wiki/Top_level_domains AND http://en.wikipedia.org/wiki/List_of_Internet_top-level_domains

http://www.norid.no/domenenavnbaser/domreg.en.html

http://www.uninett.no/navn/domreg.html [What is going on with this URL? How can you prove it?]

http://www.iana.org/domains/root/db/

Traceroute: Find the Internet route from one IP address to other addresses throughout the world. See how many "hops" are involved in connections.

http://www.traceroute.org/

http://visualroute.visualware.com/ Try this on your own: requires free registration.

DNSstuff: http://www.dnsstuff.com/ Provides a search form for WhoIs, Traceroute, and IP address lookup.

Speed Test: http://www.speedtest.net/

 

If you were looking for information on fishing from Norway, you could search Google this way: site:no fishing. If you wanted a Chinese perspective on the Summer Olympics, you could search like this: olympics site:cn

You can also search using secondary domain information. US government sites use .gov, and State Department sites use state.gov. Thus, to find out about treaties from the State Department, search: site:state.gov treaties.
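
The site: searches above can be assembled in code as well. This is a minimal sketch using Python's standard urllib.parse module. The function names site_query and google_search_url are hypothetical, and the sketch assumes Google's ordinary search address with the query passed in the q parameter.

```python
from urllib.parse import urlencode


def site_query(terms, domain):
    """Build a query limited to one domain, e.g. 'site:no fishing'."""
    return f"site:{domain} {terms}"


def google_search_url(terms, domain):
    """Build a search URL for the query (assumes the q parameter)."""
    return "https://www.google.com/search?" + urlencode({"q": site_query(terms, domain)})


for terms, domain in [("fishing", "no"), ("olympics", "cn"), ("treaties", "state.gov")]:
    print(site_query(terms, domain))
    print(google_search_url(terms, domain))
```

In the generated URL the colon is written as %3A and the space as +, which is how those characters are encoded inside a URL.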