Black Hat SEO Crash Course V1.0
Introduction
If you have spent any significant amount of time online, you have likely come across the term Black Hat at one time or another. This term is usually associated with many negative comments. This book is here to address those comments and provide some insight into the real life of a Black Hat SEO professional. To give you some background, my name is Brian. I've been involved in internet marketing for close to 10 years now, the last 7 of which have been dedicated to Black Hat SEO. As we will discuss shortly, you can't be a great Black Hat without first becoming a great White Hat marketer. With the formalities out of the way, let's get into the meat of things, shall we?
What is Black Hat SEO?
The million dollar question that everyone has an opinion on. What exactly is Black Hat SEO? The answer here depends largely on who you ask. Ask most White Hats and they immediately quote the Google Webmaster Guidelines like a bunch of lemmings. Have you ever really stopped to think about it though? Google publishes those guidelines because they know as well as you and I that they have no way of detecting or preventing what they preach so loudly. They rely on droves of webmasters to blindly repeat everything they say because they are an internet powerhouse and they have everyone brainwashed into believing anything they tell them. This is actually a good thing though. It means that the vast majority of internet marketers and SEO professionals are completely blind to the vast array of tools at their disposal that not only increase traffic to their sites, but also make us all millions in revenue every year.
The second argument you are likely to hear is the age old "the search engines will ban your sites if you use Black Hat techniques". Sure, this is true if you have no understanding of the basic principles or practices. If you jump in with no knowledge you are going to fail. I'll give you the secret though. Ready? Don't use Black Hat techniques on your White Hat domains. Not directly at least. You aren't going to build doorway or cloaked pages on your money site, that would be idiotic. Instead you buy several throwaway domains, build your doorways on those, and cloak/redirect the traffic to your money sites. You lose a doorway domain, who cares? Build 10 to replace it. It isn't rocket science, just common sense. A search engine can't possibly penalize you for outside influences that are beyond your control. They can't penalize you for incoming links, nor can they penalize you for sending traffic to your domain from other doorway pages outside of that domain. If they could, I would simply point doorway pages and spam links at my competitors to knock them out of the SERPS. Common sense.
So again, what is Black Hat SEO? In my opinion, Black Hat SEO and White Hat SEO are almost no different. White Hat webmasters spend time carefully finding link partners to increase rankings for their keywords; Black Hats do the same thing, but we write automated scripts to do it while we sleep. White Hat SEOs spend months perfecting the on-page SEO of their sites for maximum rankings; Black Hat SEOs use content generators to spit out thousands of generated pages to see which version works best. Are you starting to see a pattern here? You should. Black Hat SEO and White Hat SEO are one and the same, with one key difference: Black Hats are lazy. We like things automated. Have you ever heard the phrase "Work smarter not harder?" We live by those words. Why spend weeks or months building pages only to have Google slap them down with some obscure penalty?
If you have spent any time on webmaster forums you have heard that story time and time again. A webmaster plays by the rules, does nothing outwardly wrong or evil, yet their site is completely gone from the SERPS (Search Engine Results Pages) one morning for no apparent reason. It's frustrating; we've all been there. Months of work gone and nothing to show for it. I got tired of it, as I am sure you are. That's when it came to me: who elected the search engines the "internet police"? I certainly didn't, so why play by their rules? In the following pages I'm going to show you why the search engines' rules make no sense, and further I'm going to discuss how you can use that information to your advantage.
Search Engine 101
As we discussed earlier, every good Black Hat must be a solid White Hat. So, let's start with the fundamentals. This section is going to get technical as we discuss how search engines work and delve into ways to exploit those inner workings. Let's get started, shall we?
Search engines match queries against an index that they create. The index consists of the words in each document, plus pointers to their locations within the documents. This is called an inverted file. A search engine or IR (Information Retrieval) system comprises four essential modules:
∗A document processor
∗A query processor
∗A search and matching function
∗A ranking capability
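Before walking through each module in detail, here is a minimal sketch in Python of the inverted file described above. The corpus, function name, and data structure are purely illustrative assumptions; real engines use far more elaborate structures.

from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

# Toy corpus: document IDs mapped to their text.
docs = {
    1: "black hat seo relies on automation",
    2: "white hat seo relies on manual link building",
}
index = build_inverted_index(docs)
print(index["seo"])         # {1, 2}
print(index["automation"])  # {1}

A query is then answered by looking terms up in this index rather than scanning every document.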
Document Processor
The document processor prepares, processes, and inputs the documents, pages, or sites that users search against. The document processor performs some or all of the following steps:
∗Normalizes the document stream to a predefined format.
∗Breaks the document stream into desired retrievable units.
∗Isolates and meta tags sub document pieces.
∗Identifies potential indexable elements in documents.
∗Deletes stop words.
∗Stems terms.
∗Extracts index entries.
∗Computes weights.
∗Creates and updates the main inverted file against which the search engine searches in order to match queries to documents.
While users focus on "search," the search and matching function is only one of the four modules. Each of these four modules may cause the expected or unexpected results that consumers get when they use a search engine.
Step 4: Identify elements to index. Identifying potential indexable elements in documents dramatically affects the nature and quality of the document representation that the engine will search against. In designing the system, we must define the word "term." Is it the alpha-numeric characters between blank spaces or punctuation? If so, what about non-compositional phrases (phrases in which the separate words do not convey the meaning of the phrase, like "skunk works" or "hot dog"), multi-word proper names, or inter-word symbols such as hyphens or apostrophes that can denote the difference between "small business men" versus "small-business men"? Each search engine depends on a set of rules that its document processor must execute to determine what action is to be taken by the "tokenizer," i.e., the software used to define a term suitable for indexing.
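To see how much the tokenizer's definition of a "term" matters, here is a rough sketch comparing two rule sets. The regular expressions are illustrative assumptions, not any engine's actual rules.

import re

def tokenize_simple(text):
    """Terms are runs of alphanumeric characters; hyphens split words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def tokenize_keep_hyphens(text):
    """Terms may keep internal hyphens and apostrophes."""
    return re.findall(r"[a-z0-9]+(?:[-'][a-z0-9]+)*", text.lower())

text = "Small-business men discussed the skunk works project."
print(tokenize_simple(text))
# ['small', 'business', 'men', 'discussed', 'the', 'skunk', 'works', 'project']
print(tokenize_keep_hyphens(text))
# ['small-business', 'men', 'discussed', 'the', 'skunk', 'works', 'project']

The same document produces different index entries depending on which rule set is in force, which is why "small business men" and "small-business men" can behave differently at query time.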
Step 5: Deleting stop words. This step helps save system resources by eliminating from further processing, as well as potential matching, those terms that have little value in finding useful documents in response to a customer's query. This step used to matter much more than it does now, when memory has become so much cheaper and systems so much faster, but since stop words may comprise up to 40 percent of text words in a document, it still has some significance. A stop word list typically consists of those word classes known to convey little substantive meaning, such as articles (a, the), conjunctions (and, but), interjections (oh, but), prepositions (in, over), pronouns (he, it), and forms of the "to be" verb (is, are). To delete stop words, an algorithm compares index term candidates in the documents against a stop word list and eliminates certain terms from inclusion in the index for searching.
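A stop-listed term simply never reaches the index. A minimal sketch, assuming a tiny illustrative stop word list:

STOP_WORDS = {"a", "the", "and", "but", "oh", "in", "over", "he", "it", "is", "are"}

def remove_stop_words(terms):
    """Drop any term that appears on the stop word list."""
    return [t for t in terms if t not in STOP_WORDS]

terms = ["the", "doorway", "page", "is", "cloaked", "and", "redirected"]
print(remove_stop_words(terms))
# ['doorway', 'page', 'cloaked', 'redirected']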
Step 6: Term Stemming. Stemming removes word suffixes, perhaps recursively in layer after layer of processing. The process has two goals. In terms of efficiency, stemming reduces the number of unique words in the index, which in turn reduces the storage space required for the index and speeds up the search process. In terms of effectiveness, stemming improves recall by reducing all forms of the word to a base or stemmed form. For example, if a user asks for analyze, they may also want documents which contain analysis, analyzing, analyzer, analyzes, and analyzed. Therefore, the document processor stems document terms to analy- so that documents which include various forms of analy- will have equal likelihood of being retrieved; this would not occur if the engine only indexed variant forms separately and required the user to enter all. Of course, stemming does have a downside. It may negatively affect precision in that all forms of a stem will match, when, in fact, a successful query for the user would have come from matching only the word form actually used in the query.
Systems may implement either a strong stemming algorithm or a weak stemming algorithm. A strong stemming algorithm will strip off both inflectional suffixes (-s, -es, -ed) and derivational suffixes (-able, -aciousness, -ability), while a weak stemming algorithm will strip off only the inflectional suffixes (-s, -es, -ed).
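Below is a crude sketch of the difference between weak and strong stemming using simple suffix stripping. Production stemmers (the Porter stemmer, for example) apply much more careful rules; the suffix lists and length checks here are assumptions chosen purely for illustration.

INFLECTIONAL_SUFFIXES = ["es", "ed", "s"]                   # stripped by both algorithms
DERIVATIONAL_SUFFIXES = ["ability", "aciousness", "able"]   # stripped only by strong stemming

def weak_stem(word):
    """Strip only inflectional suffixes (-s, -es, -ed)."""
    for suffix in INFLECTIONAL_SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def strong_stem(word):
    """Strip a derivational suffix first, then any inflectional suffix."""
    for suffix in DERIVATIONAL_SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            word = word[: -len(suffix)]
            break
    return weak_stem(word)

print(weak_stem("analyzed"))         # analyz
print(weak_stem("manageability"))    # manageability (unchanged)
print(strong_stem("manageability"))  # manage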
Step 7: Extract index entries. Having completed steps 1 through 6, the document processor extracts the remaining entries from the original document. For example, the following paragraph shows the full text sent to a search engine for processing:
Milosevic's comments, carried by the official news agency Tanjug, cast doubt over the
governments at the talks, which the international community has called to try to prevent an
all-out war in the Serbian province. "President Milosevic said it was well known that Serbia
and Yugoslavia were firmly committed to resolving problems in Kosovo, which is an
integral part of Serbia, peacefully in Serbia with the participation of the representatives of
all ethnic communities," Tanjug said. Milosevic was speaking during a meeting with British
Foreign Secretary Robin Cook, who delivered an ultimatum to attend negotiations in a
week's time on an autonomy proposal for Kosovo with ethnic Albanian leaders from the
province. Cook earlier told a conference that Milosevic had agreed to study the proposal.
Steps 1 to 6 reduce this text for searching to the following:
Milosevic comm carri offic new agen Tanjug cast doubt govern talk interna commun call try
prevent all-out war Serb province President Milosevic said well known Serbia Yugoslavia
firm commit resolv problem Kosovo integr part Serbia peace Serbia particip representa
ethnic commun Tanjug said Milosevic speak meeti British Foreign Secretary Robin Cook
deliver ultimat attend negoti week time autonomy propos Kosovo ethnic Alban lead
province Cook earl told conference Milosevic agree study propos.
The output of step 7 is then inserted and stored in an inverted file that lists the index entries and an indication of their position and frequency of occurrence. The specific nature of the index entries, however, will vary based on the decision in Step 4 concerning what constitutes an "indexable term." More sophisticated document processors will have phrase recognizers, as well as Named Entity recognizers and Categorizers, to ensure index entries such as Milosevic are tagged as a Person and entries such as Yugoslavia and Serbia as Countries.
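The sketch below extends the earlier toy index so each entry carries positions and an occurrence count, which is roughly the information an inverted file entry described above would hold. The structure is again an assumption for illustration.

from collections import defaultdict

def build_positional_index(docs):
    """For each term, record which documents it appears in and at which positions."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, terms in docs.items():
        for position, term in enumerate(terms):
            index[term][doc_id].append(position)
    return index

# Already-processed terms (after stop word removal and stemming).
docs = {1: ["milosevic", "comm", "carri", "offic", "milosevic"]}
index = build_positional_index(docs)
for doc_id, positions in index["milosevic"].items():
    print(doc_id, positions, len(positions))  # 1 [0, 4] 2  -> positions and frequency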
Step 8: Term weight assignment. Weights are assigned to terms in the index file. The simplest of search engines just assign a binary weight: 1 for presence and 0 for absence. The more sophisticated the search engine, the more complex the weighting scheme. Measuring the frequency of occurrence of a term in the document creates more sophisticated weighting, with length-normalization of frequencies still more sophisticated. Extensive experience in information retrieval research over many years has clearly demonstrated that the optimal weighting comes from use of "tf/idf." This algorithm measures the frequency of occurrence of each term within a document. Then it compares that frequency against the frequency of occurrence in the entire database.
Not all terms are good "discriminators" — that is, all terms do not single out one document from another very well. A simple example would be the word "the." This word appears in too many documents to help distinguish one from another. A less obvious example would be the word "antibiotic." In a sports database when we compare each document to the database as a whole, the term "antibiotic" would probably be a good discriminator among documents, and therefore would be assigned a high weight. Conversely, in a database devoted to health or medicine, "antibiotic" would probably be a poor discriminator, since it occurs very often. The TF/IDF weighting scheme assigns higher weights to those terms that really distinguish one document from the others.
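As a rough illustration of why tf/idf rewards good discriminators, here is a bare-bones version. Real engines add length normalization, smoothing, and other refinements; the corpus and the exact formula below are simplified assumptions.

import math

def tf_idf(term, doc_terms, corpus):
    """Term frequency in the document times log inverse document frequency."""
    tf = doc_terms.count(term) / len(doc_terms)
    docs_with_term = sum(1 for d in corpus if term in d)  # assumes the term occurs somewhere
    idf = math.log(len(corpus) / docs_with_term)
    return tf * idf

corpus = [
    ["the", "antibiotic", "cured", "the", "infection"],
    ["the", "team", "won", "the", "match"],
    ["the", "striker", "scored", "the", "goal"],
]
print(tf_idf("the", corpus[0], corpus))                   # 0.0  -- appears everywhere, poor discriminator
print(round(tf_idf("antibiotic", corpus[0], corpus), 3))  # 0.22 -- rare, good discriminator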
Query Processor
Query processing has seven possible steps, though a system can cut these steps short and proceed to match the query to the inverted file at any of a number of places during the processing. Document processing shares many steps with query processing. More steps and more documents make processing more expensive in terms of computational resources and responsiveness. However, the longer the wait for results, the higher the quality of results. Thus, search system designers must choose what is most important to their users: time or quality. Publicly available search engines usually choose time over very high quality, having too many documents to search against. The steps in query processing are as follows (with the option to stop processing and start matching indicated as "Matcher"):
∗Tokenize query terms.
∗Recognize query terms vs. special operators.
————————> Matcher
At this point, a search engine may take the list of query terms and search them against the inverted file. In fact, this is the point at which the majority of publicly available search engines perform the search.
∗Delete stop words.
∗Stem words.
∗Create query representation.
————————> Matcher
∗Expand query terms.
∗Compute weights.
————————> Matcher
Step 1: Tokenizing. As soon as a user inputs a query, the search engine, whether a keyword-based system or a full natural language processing (NLP) system, must tokenize the query stream, i.e., break it down into understandable segments. Usually a token is defined as an alpha-numeric string that occurs between white space and/or punctuation.
Step 2: Parsing. Since users may employ special operators in their query, including Boolean, adjacency, or proximity operators, the system needs to parse the query first into query terms and operators. These operators may occur in the form of reserved punctuation (e.g., quotation marks) or reserved terms in specialized format (e.g., AND, OR). In the case of an NLP system, the query processor will recognize the operators implicitly in the language used no matter how the operators might be expressed (e.g., prepositions, conjunctions, ordering).
Steps 3 and 4: Stop list and stemming. Some search engines will go further and stop-list and stem the query, similar to the processes described above in the Document Processor section. The stop list might also contain words from commonly occurring querying phrases, such as, "I'd like information about." However, since most publicly available search engines encourage very short queries, as evidenced in the size of query window provided, the engines may drop these two steps.
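A crude sketch of Step 2's split between plain query terms and reserved operators follows. The operator set, the quoting convention, and the regular expression are assumptions chosen for illustration.

import re

OPERATORS = {"AND", "OR", "NOT"}

def parse_query(query):
    """Separate quoted phrases and plain terms from reserved Boolean operators."""
    tokens = re.findall(r'"[^"]+"|\S+', query)
    terms, operators = [], []
    for token in tokens:
        if token in OPERATORS:
            operators.append(token)
        else:
            terms.append(token.strip('"').lower())
    return terms, operators

print(parse_query('"black hat" AND cloaking NOT doorway'))
# (['black hat', 'cloaking', 'doorway'], ['AND', 'NOT'])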
Step 5: Creating the query. How each particular search engine creates a query representation depends on how the system does its matching. If a statistically based matcher is used, then the query must match the statistical representations of the documents in the system. Good statistical queries should contain many synonyms and other terms in order to create a full representation. If a Boolean matcher is utilized, then the system must create logical sets of the terms connected by AND, OR, or NOT.
An NLP system will recognize single terms, phrases, and Named Entities. If it uses any Boolean logic, it will also recognize the logical operators from Step 2 and create a representation containing logical sets of the terms connected by those operators.
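To close the loop, a Boolean matcher can be sketched as plain set operations over an inverted file like the one built earlier. This is a toy illustration of the matching function, not how any particular engine implements it.

def boolean_match(index, required, excluded=()):
    """Return documents containing every required term and none of the excluded terms."""
    result = None
    for term in required:
        postings = index.get(term, set())
        result = postings if result is None else result & postings
    result = result if result else set()
    for term in excluded:
        result = result - index.get(term, set())
    return result

# Toy inverted file: term -> set of document IDs.
index = {
    "cloaking": {1, 2, 3},
    "doorway": {2, 3},
    "redirect": {3, 4},
}
print(boolean_match(index, ["cloaking", "doorway"]))              # {2, 3}
print(boolean_match(index, ["cloaking"], excluded=["redirect"]))  # {1, 2}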
