DOCID: 4046925

A Guide To Internet Research

The opinions expressed in this article are those of the author(s) and do not represent the official opinion of NSA/CSS.

Approved for Release by NSA on 04-19-2013, FOIA Case # 70381

Untangling the Web: An Introduction to Internet Research

Content Last Updated: February 28, [...]
Cover Design by [...]

9800 Savage Road, Suite 6324
Fort Meade, MD 20755-6324

Table of Contents

Preface: The Clew to the Labyrinth
"Every Angle of the Universe"
    What Will I Learn?
    Why Do I Need Help?
    What's New This Year
Introduction to Searching
    Search Fundamentals
    The Past, Present, and Future of Search
    Understanding Search Engines
    Search Engine Basics
    A Word About Browsers: Internet Explorer and Mozilla Firefox
    The Great Internet Search-Offs
    Types of Search Tools
    Web Directories/Subject Guides/Portals
    Metasearch Sites
    Megasearch Sites
    Types of Searches and the Best Ways to Handle Them
Search Savvy--Mastering the Art of Search
    Google
    Google Hacks
    Yahoo Search
    Yahoo Hacks
    Windows Live Search
    Gigablast
    Exalead
    Ask
    More Help: Internet Guides and Tutorials
Specialized Search Tools & Techniques
    "Google Hacking"
    Custom Search Engines
    Fagan Finder
    Wikipedia
    Maps and Mapping
Uncovering the "Invisible" Internet
    A9 Search
    Book Search
    Answers.com
    OAIster
    The Internet Archive & the Wayback Machine
    Other Invisible Web Resources
Casting a Wider Net--International Search, Language Tools
    International Search
    Online Dictionaries and [...]
    You Gotta Know When to Fold 'Em
Beyond Search Engines--Specialized Research Tools
    Email Lookups
    Telephone and FAX Directories
    Online Videos and Video Search
    Online Audio, Podcasts, and Audio Search
Special Topics--News, Blogs, & Technology Search
    Newsgroups, Forums, & Mailing Lists
    Weblogs & RSS Feeds
    General News Sources
    News Sites & Search Engines
    Technology News Sources
    Telecommunications on the Web
Research How-Tos
    Finding People
    Using the Internet to Research Companies
    How to Research a Specific Country
    Finding Political Sites on the Web
    Research Round-up: The Best Research Tips & Techniques
Researching & Understanding the Internet
    A Plain English Guide to Internetworking
    Researching Internet Statistics
    Regional Registries and NICs
    Domain Name Registries
    Understanding Domain Name and Whois Lookup Tools
    World Network Whois Databases: APNIC, ARIN, LACNIC, RIPE
    Global Network Whois Search Tools
    Domain Name Whois Lookups
    Internet Toolkits
    How to Research a Domain Name or IP Address
    Traceroute
    Geolocating Internet Addresses
    Finding [...] & Internet Access Points
    Cybergeography, Topology, and Infrastructure
Internet Privacy and Security--Making Yourself Less Vulnerable in a Dangerous World
    Basics for Improving Your Internet Privacy and Security
    Increase Your Knowledge
    Browser Concerns
    Email Concerns
    Microsoft and Windows Concerns
    Handle with Care: More Privacy and Security Concerns
    General Security & Privacy Resources
Conclusion
Web Sites by Type

Preface: The Clew to the Labyrinth

One of the most famous stories about libraries tells of the tenth century Grand Vizier of Persia, Abdul Kassem Ismael who, "in order not to part with his collection of 117,000 volumes when traveling, had them carried by a caravan of 400 camels trained to walk in alphabetical order."1 However charming this tale may be, the actual event upon which it is based is subtly different. According to the original manuscript, now in the British Museum, the great scholar and literary patron Sahib Isma'il b. 'Abbad so loved his books that he excused himself from an invitation by King Nuh II to become his prime minister at least in part on the grounds that four hundred camels would be required for the transport of his library alone.2

A 21st Century version of the story might feature any number of portable electronic devices--a laptop, a PDA, or even a mobile phone--designed to overcome this difficulty. Today, 1000 years later, the Persian scholar/statesman would have to find a new excuse for declining the job offer. Abdul Kassem Ismael (aka Sahib Isma'il b. 'Abbad) would be hard pressed to explain why he couldn't just find what he needed on the Internet. The message seems to be that books are passé, replaced by ones and zeroes, the real world replaced by a virtual one, knowledge supplanted by information at best and chaotic data at worst. Have we shrunk the world or expanded it? Or have we in some way replaced it?

Untangling the Web for 2007 is the twelfth edition of a book that started as a small handout. After more than a decade of researching, reading about, using, and trying to understand the Internet, I have come to accept that it is indeed a Sisyphean task. Sometimes I feel that all I can do is to push the rock up to the top of that virtual hill, then stand back and watch as it rolls down again.
The Internet--in all its glory of information and misinformation--is for all practical purposes limitless, which of course means we can never know it all, see it all, understand it all, or even imagine all it is and will be. The more we know about the Internet, the more acute is our awareness of what we do not know. The Internet emphasizes the depth of our ignorance because "our knowledge can only be finite, while our ignorance must necessarily be infinite."3 My hope is that Untangling the Web will add to our knowledge of the Internet and the world while recognizing that the rock will always roll back down the hill at the end of the day.

I will end this beginning with another story and a word of warning. "Tlön, Uqbar, Orbis Tertius" describes the discovery of an encyclopedia of an unknown planet. This unreal world is the creation of a secret society of scientists, and gradually, the imaginary world of Tlön replaces and obliterates the real world. Substitute "the Internet" for Tlön and listen. Does this sound familiar?

"Almost immediately, reality yielded on more than one account. The truth is that it longed to yield. ... The contact and the habit of Tlön have disintegrated this world. Enchanted by its rigor, humanity forgets over and again that it is a rigor of chess masters, not of angels. ... A scattered dynasty of solitary men has changed the face of the world. Their task continues. If our forecasts are not in error, a hundred [or a thousand] years from now someone will discover the hundred volumes of the Second Encyclopedia of Tlön. Then English and French and mere Spanish will disappear from the globe. The world will be Tlön."4

As we enjoy, employ, and embrace the Internet, it is vital we not succumb to the chauvinism of novelty, that is, the belief that somehow whatever is new is inherently good, is better than what came before, and is the best way to go or best tool to use.

I am reminded of Freud's comment about the "added factor of disappointment" that has occurred despite mankind's extraordinary scientific and technical advances. Mankind, claims Freud, seems "to have observed that this newly-won power over space and time, this subjugation of the forces of nature, which is the fulfillment of a longing that goes back thousands of years, has not increased the amount of pleasurable satisfaction which they may expect from life and has not made them feel happier."5 Indeed, most of the satisfactions derived from technology are analogous to the "cheap enjoyment ... obtained by putting a bare leg from under the bedclothes on a cold winter night and drawing it in again."6

What good is all this technology and information if, instead of improving our lot, it only adds to our confusion and suffering? We are continually tempted to treat all technology as an end in itself instead of a means to some end. The Internet is no exception: it has in large measure become the thing itself instead of a means of discovery, understanding, and knowledge. Like Tlön, the Internet "is surely a labyrinth, but it is a labyrinth devised by men, a labyrinth destined to be deciphered by men." We must avoid getting lost in the labyrinth without a clew.

My hope is that Untangling the Web will be something akin to Ariadne's clew,7 so that as you unravel it, you can wind your way through the web while avoiding some of its dangers. Remember also that those who use the Internet to do harm, to spread fear, and to carry out crimes are like the mythical Minotaur who, as well as being the monster in the Minoan maze, was also its prisoner.

1 Alberto Manguel, A History of Reading, New York: Penguin, 1997, 19. Manguel cites as his source Edward G. Browne's A Literary History of Persia, 4 vols., London: T. Fisher Unwin, 1902-24. I found the specific reference to this story on pages 374-375 of Vol. 1, Book IV, "Decline of the Caliphate." There is, sadly, no mention of the alphabetical arrangement of the library. This entire masterpiece is available online at The Packard Humanities Institute, Persian Texts in Translation, 23 February 2006, (15 November 2006).
2 Edward G. Browne, Vol. 1, Book IV, "Decline of the Caliphate," A Literary History of Persia, 4 vols., London: T. Fisher Unwin, 1902-24, 374-375. Available online at The Packard Humanities Institute, Persian Texts in Translation, 23 February 2006, (15 November 2006).
3 Karl Popper, Conjectures and Refutations: The Growth of Scientific Knowledge, London & New York: Routledge, 2002, p. 38.
4 Jorge Luis Borges, "Tlön, Uqbar, Orbis Tertius," in Labyrinths, ed. Donald A. Yates and James E. Irby, New York: New Directions Books, 1962, 17-18.
5 Sigmund Freud, "Civilization and Its Discontents," tr. James Strachey, New York: Norton, 34-35.
6 Freud, 35.
7 Daedalus, the architect of the infamous labyrinth on Crete, purportedly gave King Minos' daughter Ariadne the clew, a ball of thread or yarn, to use to find a way out of the maze. Ariadne in turn gave the clew to Theseus, who slew the Minotaur and found his way out of the labyrinth. Theseus repaid Ariadne's kindness by leaving her on an island on their way back to Athens.
8 "Minotaurus," Wikimedia Commons, (6 February 2007). This image is in the public domain because its copyright has expired.

"Every Angle of the Universe"

One wag has suggested that the Internet is an "electronic Boswell," the chronicler of our age. It is that and more because the Internet chronicles not only a time and place but all times and all places, known and unknown, real and imaginary. The Internet is the closest thing to the fantastical "Aleph" imagined by the great Argentine storyteller Jorge Luis Borges, an object whose diameter is "little more than an inch" but which nonetheless contains all space, "actual and undiminished," and in which one can see "every angle of the universe." While the comparison with the mythical Aleph may strike you as a bit whimsical, it is in fact not an altogether unfair metaphor.
There has never been anything that approaches the Internet's reach (to almost every part of the globe in less than thirty years), its size (estimated at 532,897 terabytes way back in 2003)9, and its ability to link us together in a new kind of world community (words, pictures, sounds, ideas beyond imagining). But, as with all new technologies, it comes at a cost--many costs, in fact. We pay for the benefits of the Internet less in terms of money and more in terms of the currencies of our age: time, energy, and privacy. The goal of this book is to help you save some of each of these valuable resources: time, by making your searches more efficient; energy, by reducing the frustration using the Internet often entails; and privacy, by pointing out some simple measures to take to lower your cyber-profile and enhance your security.

I cannot emphasize strongly enough that this book was already out of date by the time it was published. Even though I have checked and rechecked every link in this book, some addresses are bound to have changed, some sites will have shut down, and some tips and techniques--such as search engine rules and syntax--will no longer be accurate. This is a testament to the changeable nature of the Internet and I must beg your forbearance for any such errors. Writing about the Internet is much like trying to catch Proteus10--as with the mythical prophet, it keeps changing and escaping our grasp.

9 School of Information Management and Systems, University of California at Berkeley, "How Much Information? 2003," 27 October 2003, (October 2005), Executive Summary.
10 "Proteus--i.e. full of shifts, aliases, disguises, etc. Proteus was Neptune's herdsman ... There was no way of catching him but by stealing upon him during sleep and binding him; if not so captured, he would elude anyone who came to consult him by changing his shape, for he had the power of changing it in an instant into any form he chose." "Proteus," Brewer's Dictionary of Phrase and Fable, 1898, (14 November 2006).

"The Internet has often been called the world's largest library with all of the books on the floor."
Curtin, M., Ellison, G., Monroe, D., "What's Related? Everything But Your Privacy," 7 October 1998, Revision: 1.5, (14 November 2006).

What Will I Learn?

To achieve these goals, this book will help you understand how to use the Internet more efficiently to find useful information and, in so doing, make clear why the Internet is an invaluable resource.

This year I have reorganized the book to make it more logical and easier to use. The first part of the book still focuses on the ins and outs of searching: how search engines work, types of search tools, and how to handle different types of searches. The next section has expanded to offer in-depth tutorials on six major search engines. Next, the book covers specialized search tools and techniques, including a new section devoted to Wikipedia. I have also moved the discussion of maps and mapping to this section. This is followed by "invisible" web research, including the changes to A9 and Amazon's Search Inside the Book option. Next is the international search and language tools section, followed by specialized research tools, including new sections on video, audio, and podcast searches. The next section covers specific topical research, such as news, telecommunications, blogs, and RSS feeds. This is followed by a series of "how to" guides, culminating with tips and techniques for more effective searching. The book then delves into using the Internet to research the Internet, with the final section still addressing crucial privacy and security issues.

Why Do I Need Help?

There are no Internet research experts.
There are people who make a living using the Internet for research and who know more than others about what is on the Internet, how to find what they want on the Internet, and how to do this with relative efficiency. But no one knows what is truly "out there" for two fundamental reasons:

1. The Internet changes constantly. By that I mean daily, hourly, minute-to-minute, incessantly.
2. It's too darned big! If we can't accurately size the Internet (which we can't), you can be sure we don't know what is available via this resource with any degree of accuracy or completeness.

This doesn't mean you can't ever hope to find anything on the Internet. You often can find what you're looking for (and usually a lot more) with comparative ease, but no one should be deluded into believing he has a good grasp of the entire world of information available on the Internet. Realistically, the best search engines index only a fraction of all webpages, and keyword searching is at best an art that routinely misses relevant sites while loading you down with dross.

Are you discouraged? Don't be. Novices often have more luck finding something arcane than seasoned researchers because of the power of creative thinking and serendipity. I've learned never to underrate luck and intuition when doing Internet research, but I think the two most important tools for successful Internet research are:

1. a good set of bookmarks
2. other people with experience searching the Web

Never assume others are already aware of some website, tool, or technique you find particularly useful. The sheer quantity of data, information, and knowledge associated with the Internet is so enormous that no one can know more than a fraction of what's on it.

While we're talking size, let me mention an important distinction. The Internet and the web are not one and the same, though the web is what most people think of when you say "Internet." In fact, as huge as it is, the Worldwide Web is actually a subset of the Internet.
The Internet is the network of networks, all the world's servers connected by routers, to put it in semi-technical terms. The web is that portion of the Internet that uses a browser (typically Netscape or Firefox--browsers built upon Mozilla--or Microsoft's Internet Explorer) and some type of hypertext language (usually HTML) to move around. This book focuses primarily on the web because tackling the web by itself is a big enough challenge.

As you have no doubt guessed by now, the Worldwide Web does not come with an instruction manual or user's guide, which means much if not most of what you learn about researching using the Internet will come from hard-won experience. On the up side, you probably will not be able to break anything on the Internet. More than likely, no matter how lost or hopelessly confused you become, you will only damage your own computer and/or perhaps your good humor and sanity.

However, because of the almost astronomical growth of malicious activity, the Internet has become a dangerous place, and users have discovered that they have inadvertently spread malicious software (malware) such as viruses, worms, and Trojan horses. That is why I have devoted the last section of the book to personal computer security and privacy. We are all at risk from the rising tide of bad and in some cases criminal behavior, so we must take responsibility for protecting ourselves and our computers from the ruses and attacks that grow in number and sophistication each year.

This book will expand on simple "rules" of Internet research, rules that are really more in the nature of friendly suggestions. These rules are the fruit of my own experiences as an Internet user and may prevent you from repeating all the mistakes I made that gave rise to the rules in the first place. Some of these suggestions may at first strike you as odd or inconsistent, but the rationale for each I hope will become clear as we go along.

The fact is that today we are drowning in information and starving for knowledge. The goal of Untangling the Web is to help rescue users from the ocean of information (and misinformation) by throwing them a virtual lifeline.

If you are using the hypertext version of this book online, the links in the paper may not load correctly. Try the refresh button, copy and paste the URL, or type in the URL directly.

What's New This Year

Most people probably have not thought about or been very much affected by the changing search landscape because, as is only natural, most people have one or two sites they routinely use for search and research, regardless of the nature of the inquiry. However, virtually all search professionals will agree that knowing where to look for information is the key to successful searching. Yet few venture beyond the comfortable confines of the familiar search engine. While the major search engines continue to improve each year, they are far from the be all and end all of search. The problem with general search tools is that they cannot provide targeted or tailored results, certainly not without a lot of work on the part of the user. For this reason, a large part of Untangling the Web is devoted to other ways to uncover information, be it subject guides, "deep web" resources, targeted search tools, or unusual tips and techniques for revealing what is hidden.

Again this year, I have included detailed information on how to use Google, Yahoo, Gigablast, and Live Search (formerly MSN Search) to find very specific data. I have also updated and expanded the section on Exalead and added to the major search engines. However, unless you spend a fair amount of time using each of these search tools, you will probably find their many options too complicated and cumbersome for everyday use. A different approach is to use specialized search tools, which raises the question of how to find these tools.
Untangling the Web maps a number of the Internet's less-traveled roads, excellent but unheralded specialty search tools such as Fagan Finder, Amazon's A9 multipurpose search, and ThomasGlobal's business search. Also, the section on international search is substantially larger than before.

In recognition of the growing importance and influence of collaborative websites, there are several new sections in this year's book. One is a separate section devoted to Wikipedia, contributed in part by my colleague Diane White. Video and audio search exploded during 2006, and this year's edition contains a new and extensive examination of video search sites as well as a new section on audio search and podcasting. Two other new sections are devoted to custom search engines and book search, neither of which is an entirely new technology but both of which spread in popularity and improved in quality in the past year. Custom search is fast becoming a replacement for web directories, which continue their slide into irrelevance.

The section on researching and understanding the Internet now begins with a new section on "internetworl