Thursday, October 24, 2013

Matt Cutts At Pubcon 2013: Moonshots, Machine Learning & The Future Of Google Search

This morning, the head of Google’s webspam team, Matt Cutts, gave a keynote speech at Pubcon in Las Vegas. The keynote comes on the heels of a scathing day-one keynote from Jason Calacanis, who said that Google rules everything and that they are essentially evil. On Twitter yesterday, Matt asked if Jason wanted the polite response or a thorough one. All of us here in attendance are hoping for “thorough.”


Matt starts with the state of the index, talking about where Google will go in the future.

He’s proud that Google has doubled down on ‘moonshot’ changes, specifically:

  • Knowledge Graph Google has been trying to understand entities — not just the search strings. Essentially, they are trying to learn about “things, not strings.”
  • Voice Search
  • Conversational Search
  • Google Now Matt is proud that today, sometimes you don’t even have to search to find information you need.
  • Deep Learning Google is looking more into the relationships between words, so it can read at a higher level and interpret meaning. This works well with voice search: when a user asks Google, “Who is the Prime Minister of Turkey?” and then searches again for “How old is he?”, Google can answer using the previous context.

Core Quality Changes

  • Hummingbird This change targets better natural language understanding. Search is more than just matching words — instead it looks at which words in a query are more meaningful for intelligent scoring. For instance, in a voice search for “What is the capital of Texas, my dear?”, the “my dear” isn’t that important — Hummingbird can detect this. While Hummingbird affected 90% of queries, it was a very subtle change that most users didn’t notice, but it will help users get more pertinent results.
  • Panda Softening This is something that Google has looked into to help bring some sites and content back.
  • Detecting/Boosting Authorities Not done by hand, but applied by topic area. Webmasters can keep deepening their content on a topic to further their authority in that content area.
  • Smartphone Ranking If your phone doesn’t support Flash, Google won’t show you sites that require Flash.

Webspam Changes

  • Penguin 2.0 & 2.1 Penguin 2.0 was released – not that intensive. Black hats said wasn’t big, so Google then released turned it up in 2.1. More changes will be continually coming, so buckle up.
  • Spammy Query Algorithms Items like porn and payday loans will be targeted for better results. Right now the SERPS aren’t great, but they will be working on it.
  • Advertorials/Native Advertising Google has cracked down on publishers selling ads that blend in as editorial with dofollow links. You shouldn’t be paying for links that pass PageRank.
  • Spam Networks They’ve got a pretty good list and are just working their way down it. Matt joked that he should take a poll to determine which network to axe next.

Communication

Google has done a great job of increasing communication with webmasters, especially:
  • New videos for malware/hacking
  • Concrete examples in guidelines
  • >100 speaking events, Hangouts on Air, webmaster office hours
  • How search works website

Future of Search

  • Machine Learning Google’s goal is to provide the world’s information. The phrase “search engine” isn’t anywhere in their mission statement. They want to be able to give answers to specific queries.
  • Mobile Mobile is coming faster than anyone expected. 40% of YouTube videos are now served to mobile devices. If you haven’t thought about mobile, it’s time to start thinking about it.
  • Social/Identity/Authorship Matt starts with “Facebook did a great job of social and knowing who people are.” He then notes that the signal is not just likes/+1s/tweets; in the long term, social signals are a sign of authority. If you are someone worth listening to, search engines will think you are worth listening to as well.

Webspam Trends

  • Hacking Over the next 6 months it may look like they aren’t working on much, but they are working on the next generation of hack detection. Queries like “buy viagra” still look bad because people are breaking the law.
  • Hot Topics Items like child porn, international issues and really nasty queries are being addressed.
  • No Toolbar PageRank update scheduled for the rest of the year The pipeline for updating Toolbar PageRank broke this year and PageRank stopped updating. Google realized that this wasn’t so bad, since people seem to pay too much attention to the metric, and left it alone. It’s something they will reassess at a later time.

Advice

  • Mobile Get ready, you need a mobile plan.
  • Request Autocomplete A new feature in Chrome that allows users to auto-fill forms. It saves users time by using the standard to pull in their information, and it increases the chance of conversion.
  • Ad-heavy pages above the fold Some tweaks are coming to “turn up” this algorithm. Users shouldn’t see a barrage of ads above the fold when they visit a site.
  • Tightening Authorship Matt mentions that a tightening of Authorship may provide better results. Google is looking for a 15% reduction to ensure that the quality of the authorship is still high and relevant.
  • Rich Snippets The ability to have and use rich snippets may be taken away from low-quality sites in the coming months.
  • Smarter on JavaScript Google is now fetching, rendering and indexing content called by JavaScript, and it is getting better at understanding common JavaScript libraries.
Now to the Q and A section:

Matt talks about +1s specifically: they are a short-term signal, but he is very bullish on the long-term signal of authorship. Next, Matt talks about negative SEO, which Google has worked on for years. With Penguin, it not only removes sites from results, it can actually have a negative effect on a site. The disavow tool, announced last year, should be used as a last resort: use Webmaster Tools to find bad links and disavow them at the link or domain level. Webmaster Tools now returns a better sample of backlinks, not just A-Z, and can show up to 100,000 links to help you identify problems.

In response to Jason Calacanis’ claims from yesterday, Matt polls the crowd on whether or not to go into the matter. The crowd wants to hear the response. Matt talks about the initial version of Panda and whether or not it should have been rolled out slowly. Matt says that this wouldn’t have been good and cites multiple articles showing the degrading quality of the search results. Google needed Panda. A Googler made a personal blocklist extension to block specific sites and nearly 200,000 users installed it — people did not want these content farms.

In response to Jason’s claims that Google wasn’t a good partner, Matt talked about the fact that no companies have partnerships with Google. There are YouTube partnerships, not Google search partnerships. In aggregate, Mahalo simply wasn’t a quality site and they came to an impasse at a personal meeting. This wasn’t even a webspam issue, it was a quality issue and nobody received special treatment.

With the Mahalo issue behind him, Matt talks about press releases. “If you are paying for PageRank, you probably aren’t doing something right.” Google has identified “a lot” of the top press release sites and ignores the links, but doesn’t penalize those who use them.

On infinite scrolling, Matt recommends offering some type of paginated version as a safeguard so that all content gets indexed. On the growing size of the Google bar, Matt mentions that they are aware of the screen space it takes up.

That’s a wrap folks.



Tuesday, October 22, 2013

Character Definitions for htaccess

#
the # instructs the server to ignore the line. Used for including comments; each line of comments requires its own #. When including comments, it is good practice to use only letters, numbers, dashes, and underscores. This practice will help eliminate/avoid potential server parsing errors.

[F]
Forbidden: instructs the server to return a 403 Forbidden to the client.

[L]
Last rule: instructs the server to stop processing the rewrite rule set after the current rule is applied.

[N]
Next: instructs Apache to restart the rewriting process from the first rule, using the result of the current rule.

[G]
Gone: instructs the server to deliver Gone (no longer exists) status message.

[P]
Proxy: instructs server to handle requests by mod_proxy

[C]
Chain: instructs server to chain the current rule with the previous rule.

[R]
Redirect: instructs Apache to issue a redirect, causing the browser to request the rewritten/modified URL.

[NC]
No Case: defines any associated argument as case-insensitive. i.e., "NC" = "No Case".

[PT]
Pass Through: instructs mod_rewrite to pass the rewritten URL back to Apache for further processing.

[OR]
Or: specifies a logical "or" that ties two expressions together such that either one proving true will cause the associated rule to be applied.

[NE]
No Escape: instructs the server to parse output without escaping characters.

[NS]
No Subrequest: instructs the server to skip the directive if the request is an internal sub-request.

[QSA]
Append Query String: directs the server to append the original query string to the rewritten URL rather than replacing it.

[S=x]
Skip: instructs the server to skip the next "x" number of rules if a match is detected.

[E=variable:value]
Environmental Variable: instructs the server to set the environmental variable "variable" to "value".

[T=MIME-type]
Mime Type: declares the mime type of the target resource.

[]
specifies a character class, in which any character within the brackets will be a match. e.g., [xyz] will match either an x, y, or z.

[]+
character class in which any combination of items within the brackets will be a match. e.g., [xyz]+ will match any number of x’s, y’s, z’s, or any combination of these characters.

[^]
specifies not within a character class. e.g., [^xyz] will match any character that is neither x, y, nor z.

[a-z]
a dash (-) between two characters within a character class ([]) denotes the range of characters between them. e.g., [a-zA-Z] matches all lowercase and uppercase letters from a to z.

a{n}
specifies an exact number, n, of the preceding character. e.g., x{3} matches exactly three x’s.

a{n,}
specifies n or more of the preceding character. e.g., x{3,} matches three or more x’s.

a{n,m}
specifies a range of numbers, between n and m, of the preceding character. e.g., x{3,7} matches three, four, five, six, or seven x’s.

()
used to group characters together, thereby considering them as a single unit. e.g., (perishable)?press will match press, with or without the perishable prefix.

^
denotes the beginning of a regex (regex = regular expression) test string, i.e., the tested string must begin with the characters that follow the ^.

$
denotes the end of a regex (regex = regular expression) test string, i.e., the tested string must end with the characters that precede the $.

?
declares as optional the preceding character. e.g., monzas? will match monza or monzas, while mon(za)? will match either mon or monza. i.e., x? matches zero or one of x.

!
declares negation. e.g., “!string” matches everything except “string”.

.
a dot (or period) indicates any single arbitrary character.

-
instructs “not to” rewrite the URL, as in “...domain.com.* - [F]”.

+
matches one or more of the preceding character. e.g., G+ matches one or more G’s, while "+" will match one or more characters of any kind.

*
matches zero or more of the preceding character. e.g., use “.*” as a wildcard.

|
declares a logical “or” operator. for example, (x|y) matches x or y.

\
escapes special characters ( ^ $ ! . * | ). e.g., use “\.” to indicate/escape a literal dot.

\.
indicates a literal dot (escaped).

/*
zero or more slashes.

.*
zero or more arbitrary characters.

^$
defines an empty string.

^.*$
the standard pattern for matching everything (any string of zero or more characters).

[^/.]
defines one character that is neither a slash nor a dot.

[^/.]+
defines any number of characters which contains neither slash nor dot.

http://
this is a literal statement — in this case, the literal character string, “http://”.

^domain.*
defines a string that begins with the term “domain”, which may then be followed by any number of any characters.

^domain\.com$
defines the exact string “domain.com”.

-d
tests if string is an existing directory

-f
tests if string is an existing file

-s
tests if the file in the test string exists and has non-zero size
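
To tie several of these definitions together, here is a small hedged .htaccess sketch (the referrer domain spam-example.com and the index.php?path= handler are made-up examples, not prescribed values) combining a character class, the [NC], [OR], [F], [L] and [QSA] flags, and the -f file test:

# block requests from a hypothetical bad referrer, case-insensitively
RewriteEngine On
RewriteCond %{HTTP_REFERER} ^https?://(www\.)?spam-example\.com [NC,OR]
RewriteCond %{QUERY_STRING} [<>] [NC]
RewriteRule .* - [F,L]

# if the requested file does not exist (-f negated with !),
# send the request to index.php and append the original query string
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ /index.php?path=$1 [QSA,L]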

Redirection Header Codes
  • 301 – Moved Permanently
  • 302 – Moved Temporarily
  • 403 – Forbidden
  • 404 – Not Found
  • 410 – Gone
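
A brief hedged sketch of how some of these status codes can be issued from .htaccess (the file and path names here are hypothetical):

# 301 Moved Permanently: send an old page to its new location (mod_alias)
Redirect 301 /old-page.html http://www.example.com/new-page.html

# 410 Gone: tell clients a removed page is not coming back
Redirect gone /retired-page.html

# 403 Forbidden: block a directory using mod_rewrite's [F] flag
RewriteEngine On
RewriteRule ^private/ - [F]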


Monday, October 21, 2013

How to do a 301 Redirect Properly on Apache or Windows Servers

In today’s blog post we will discuss how to implement a proper 301 redirect on Apache or Windows servers, from one domain to another. Redirects are technical, and we see a lot of sites where 301 redirects are not implemented properly. You might want to do a 301 redirect for a number of reasons: redirecting the non-www version to the www version (or vice versa), changing your domain, or moving a file within the same domain. This is also a great post on ways you can fix your 404 error pages.

Before we get into the technical details, it is important to understand why a 301 redirect from the non-www to the www version of your site (or vice versa) matters. First, having two versions of your site can create duplicate content, which may result in your website being penalized by search engines. Secondly, and most importantly, when you acquire links it is much better to have them all pointing at one version of the site rather than splitting them between two, which dilutes the authority passed to your domain.

A 301 redirect is the preferred way of handling duplicate content. Other ways include using the rel="canonical" tag (don’t use it cross-domain; Yahoo and Bing still don’t recognize it there), blocking files in robots.txt, and the meta noindex tag.

Let’s dive into the technical details:

How to do a 301 redirect for an Apache server:

Step 1 : To implement a 301 redirect the file we need to work with is the .htaccess file. To access the file you need to go into your FTP and look into the document root.

Step 2 : If you can’t see it, enable viewing of hidden files, since the .htaccess file is hidden. If there is still no .htaccess file present, create one with a simple text editor.

Step 3 : Insert this code in the file:

Code example from non www to www:

RewriteEngine On
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteRule .? http://www.example.com%{REQUEST_URI} [R=301,L]

Obviously you will need to replace ‘example’ with your own domain name.

Also make sure the rewrite engine is turned on; you only need to turn it on once.

Step 4 : Save and Test it!
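
As mentioned in the intro, you may also need to redirect an old domain to a new one, or a single moved file. Under the same assumptions (replace olddomain.com, newdomain.com and the page names with your own), a minimal sketch looks like this:

# redirect an entire domain to a new domain, preserving the requested path
RewriteEngine On
RewriteCond %{HTTP_HOST} ^(www\.)?olddomain\.com$ [NC]
RewriteRule ^(.*)$ http://www.newdomain.com/$1 [R=301,L]

# redirect a single file within the same domain (mod_alias)
Redirect 301 /old-page.html http://www.example.com/new-page.html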

How to do a 301 redirect for a Windows server:

When setting up a site in IIS, the normal process is to create one account for the site and add both the www and non-www versions of the domain name to the host headers for the account. This creates a canonicalization issue, and the site will then be available at both www and non-www URLs.

Step 1 : Get access to the Windows Server navigation Panel. Navigate your way to the Internet Services Manager (Programs — Administrative Tools — Internet Services Manager).

Step 2 : Create 2 accounts for the site within IIS: one with the www version of the domain in the host header and one with the non-www version of the domain. All of the site files can be placed in the preferred version and a single page in the other.

Step 3 : Right click on the single page you want to redirect FROM and choose Properties. The Properties box will now appear.

Step 4 : Change the redirect option to “A redirection to a URL” and type in the new URL in the box provided.

Step 5 : Be sure to check the box marked “A permanent redirection for this resource”. If you leave this box unchecked, you will create a 302 (temporary) redirect, which is not permanent or beneficial from an SEO standpoint in this situation.

Step 6 : Test it!

Doing a www redirect for Front Page

As I see some of the comments below pertaining to FrontPage, it was a matter of time before I had to do one for this God-forsaken MS product myself.  Here’s how I did it after some trial and error:

1.  First, you have to identify whether you are running Linux or Windows.  This works for Linux.  Apparently, there is an Apache option called FollowSymLinks which needs to be turned on, as well as mod_rewrite, so call your host provider for that one.

2.  FrontPage uses several .htaccess files – one in the main directory structure, and 3 other .htaccess files called “super files”.  You will find these other .htaccess files here:

/_vti_bin/.htaccess
/_vti_bin/_vti_aut/.htaccess
/_vti_bin/_vti_adm/.htaccess

3.  Make sure this is at the top of all 4 .htaccess files: “Options +FollowSymlinks” underneath “# -FrontPage-”

4.  Underneath this, add your 301 redirect command:

RewriteEngine On
RewriteCond %{HTTP_HOST} ^yoursite\.com$ [NC]
RewriteRule ^(.*)$ http://www.yoursite.com/$1 [R=301,L]

Here, I did a 301 from non-www to the www, because for SEO purposes, most people have more inbound links pointing to the www version.

That’s it – this should work!



Redirecting non-www to www with .htaccess

If you want to redirect all non-www requests to your site to the www version, all you need to do is add the following code to your .htaccess file:
RewriteEngine On
RewriteCond %{HTTP_HOST} !^www\.
RewriteRule ^(.*)$ http://www.%{HTTP_HOST}/$1 [R=301,L]
This will redirect any requests to http://my-domain.com to http://www.my-domain.com. There are several benefits from doing that:
  • It will avoid duplicate content in Google
  • It will avoid the possibility of split page rank and/or split link popularity (inbound links).
  • It's nicer, and more consistent.
Note that if your site has already been indexed by Google without the www, this might cause unwanted side effects, like a loss of PageRank. I don't think this would happen, and in any case it would be a temporary issue (we are doing a permanent 301 redirect, so Google should transfer all rankings to the www version). But anyway, use at your own risk!

Something nice about the code above is that you can use it for any website, since it doesn't include the actual domain name.

Redirecting www to non-www

If you want to do the opposite, the code is very similar:
RewriteEngine On
RewriteCond %{HTTP_HOST} !^my-domain\.com$ [NC]
RewriteRule ^(.*)$ http://my-domain.com/$1 [R=301,L]
In this case we are explicitly typing the domain name. I'm sure it's possible to do it in a generic way, but I haven't had the time to work one out and test it. So remember to change 'my-domain' with your domain name!

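For what it's worth, here is an untested sketch of that generic version: it strips a leading "www." from whatever host was requested, so the domain name doesn't need to be hard-coded. Verify it on your own server before relying on it.

RewriteEngine On
# %1 refers to the host captured by the RewriteCond below (everything after "www.")
RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
RewriteRule ^(.*)$ http://%1/$1 [R=301,L]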



How Google is Changing Long-Tail Search with Efforts Like Hummingbird

The Hummingbird update was different from the major algorithm updates like Penguin and Panda, revising core aspects of how Google understands what it finds on the pages it crawls. In today's Whiteboard Friday, Rand explains what effect that has on long-tail searches, and how those continue to evolve.


For reference, here's a still of this week's whiteboard!


Video Transcription

Howdy, Moz fans and welcome to another edition of Whiteboard Friday. This week I wanted to talk a little bit about Google Hummingbird slightly, but more broadly how Google has been making many efforts over the years to change how they deal with long-tail search.

Now long tail, if you're not familiar already, is those queries that are usually lengthier in terms of number of words in the phrase and refer to more specific kinds of queries than the sort of head of the demand curve, which would be shorter queries, many more people performing them, and, generally speaking, the ones that in our profession, especially in the SEO world, the ones that we tend to care about. So those are the shorter phrases, the head of the demand curve, or the chunky middle of the demand curve versus the long tail.

Long tail, as Google has often mentioned, makes up a very big proportion of Web search traffic. Anywhere from 20% to maybe 40% or even 50% of all the queries on the Web are in that long tail, sort of fewer than maybe 10 to 50 searches per month, in that bucket. Somewhere around 18% or 20% of all searches, Google says, are extremely long tail, meaning they've never seen them before, extremely unique kinds of searches.

I think Google struggles with this a little bit. They struggle from an advertising perspective because they'd like to be able to serve up great ads targeting those long-tail phrases, but inside of AdWords, Google's Keyword Tool, for self-service advertising, it's tough to choose those. Google doesn't often show volume around them. Google themselves might have a tough time figuring out, "hey, is this query relevant to these types of results," especially if it's in a long tail.

So we've seen them get more and more sophisticated with content, context, and textual analysis over the years, up to the release this August (according to Google) of Hummingbird, which was an infrastructure update more so than an algorithmic update. You can think of Penguin or Panda as being algorithmic-style updates, whereas Google Caffeine, which upgraded their speed, and Hummingbird, which they say upgrades their text processing and their content and context understanding mechanisms, are affecting things today.

I'll try and illustrate this with an example. Let's say Google gets two search queries, "best restaurants SEA," Seattle's airport, that's the airport code, the three-letter code, and "where to eat at Sea-Tac Airport in Terminal C." Let's say then that we've got a page here that's been produced by someone who has listed the best restaurants at Sea-Tac, and they've ordered them by terminals.

So if you're in Terminal A, Terminal B, Terminal C, it's actually easy to walk between most of them except for N and S. I hope you never have to go N. It's just a pain. S is even more of a pain. But in Terminal C, which I assume would be Beecher's Cheese, because that place is incredible. It just opened. It's super good. In Terminal C, they've got a Beecher's Cheese, so they've got a listing for this.

A smart Google, an intelligent engineer at Google would go, "Man, you know, I'd really like to be able to serve up this page for this result. But it doesn't target the words 'where to eat' or 'Terminal C' specifically, especially not in the title or the headline, the page title. How am I going to figure that out?" Well, with upgrades like what we've seen with Hummingbird, Google may be able to do more of this. So they essentially say, "I want to understand that this page can satisfy both of these kinds of results."

This has some implications for the SEO world. On top of this, we're also getting kind of biased away from long-tail search, because keyword (not provided) means it's harder for an individual marketer to say: "Oh, are people searching for this? Are people searching for that? Is this bringing me traffic? Maybe I can optimize my page more towards it, optimize my content for it."

So this kind of combination and this direction that we're feeling from Google has a few impacts. Those include more traffic opportunities, opportunities for great content that isn't necessarily doing a fantastic job at specific keyword targeting.

So this is kind of interesting from an SEO perspective, because we're not saying, and I'm definitely not saying, stop doing keyword targeting, stop putting good keywords in your titles and making your pages contextually relevant to search queries. But I am saying if you do a good job of targeting this, best restaurants at SEA or best restaurants Sea-Tac, you might find yourself getting a lot more traffic for things like this. So there's almost an increased benefit to producing that great content around this and serving, satisfying a number of needs that a search query's intent might have.

Unfortunately, for some of us in the SEO world, it could get rougher for sites that are targeting a lot of mid and long-tail queries through keyword targeting that aren't necessarily doing a fantastic job from a content perspective or from other algorithmic inputs. So if it's the case that I just have to be ranking for a lot of long-tail phrases like this, but I don't have a lot of the brand signals, link signals, social signals, user usage signals, I just have strong keyword signals, well, Google might be trying to say, "Hey, strong keyword signals doesn't mean as much to us anymore because now we can take pages that we previously couldn't connect to that query and connect them up."

In general, what we're talking about is Google rewarding better content over more content, and that's kind of the way that things are trending in the SEO world today.

So I'm sure there's going to be some great discussion. I really appreciate the input of people who have done extensive analysis on top of Hummingbird. Those folks include folks like Dr. Pete, of course, from Moz, Bill Slawski from SEO by the Sea, Ammon Johns, who wrote a great post about this. I think there'll be more great discussion in the comments. I look forward to joining you there. Take care.