Every SEO has pushed Excel beyond its limits at some point. Pandas (‘Python Data Analysis Library’) is a widely used Python library that can handle far more data than Excel or Google Sheets.
As an example, here is a Jupyter Notebook with Python / Pandas code that:
Uploads a .CSV export from Search Console > Performance > Queries
The notable insight from the recruitin.net data is that, due to very strong rankings and a relatively unknown brand, generic terms drive more clicks than brand terms. In this particular niche the specificity of a query (its length and number of tokens) has a very weak impact on clicks.
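As a rough illustration of that kind of analysis, here is a minimal sketch that splits clicks into brand and generic queries and counts the tokens in each query. It uses only the standard library so it runs anywhere; the same steps map onto pandas. The column names, the brand vocabulary and every figure in the sample are assumptions made up for the example, not the real recruitin.net data.

```python
import csv
import io

# Hypothetical sample in the shape of a Search Console "Queries" export.
# Column names and all figures are assumptions for illustration only.
SAMPLE_CSV = """Top queries,Clicks
recruitin,120
linkedin xray search,300
boolean search tool,250
recruitin net,40
find candidates on linkedin free,90
"""

BRAND_TOKENS = {"recruitin"}  # assumed brand vocabulary


def summarise(csv_text, brand_tokens):
    """Split clicks into brand vs. generic and record tokens per query."""
    totals = {"brand": 0, "generic": 0}
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        tokens = row["Top queries"].lower().split()
        clicks = int(row["Clicks"])
        kind = "brand" if brand_tokens & set(tokens) else "generic"
        totals[kind] += clicks
        rows.append((row["Top queries"], len(tokens), clicks))
    return totals, rows


totals, rows = summarise(SAMPLE_CSV, BRAND_TOKENS)
print(totals)  # {'brand': 160, 'generic': 640}
```

With real data you would swap the sample string for the exported file and extend the brand vocabulary to cover misspellings.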
If you manage a site with millions or even billions of URLs, it’s important to consider that Google and Bing have a crawl budget: a limit on the number of URLs they are prepared to crawl for every domain, determined by its authority.
If your domain is less authoritative and has billions of URLs, Google may not crawl potentially important sections of your site, losing you traffic.
So one of the biggest SEO challenges for large ecommerce sites is balancing:
Not missing traffic by excluding product and aspect (aka attribute) combinations that have search demand
vs.
Not spamming the search engines with many combinations for which there is no demand
An example
For example, Size and Colour aspects for sites that sell both televisions and shoes:
| | Size | Colour |
| --- | --- | --- |
| Televisions | High search volume, important filter aspect | Low search volume, unimportant filter |
| Shoes | Low search volume, important filter | High search volume, important filter aspect |
Why not open up every combination?
Take one category e.g. Men’s sports shoes, with 6 aspects in your catalogue:
| Aspect | Example | Number of aspect values |
| --- | --- | --- |
| Brand | Nike | 150 |
| Shoe size | 10 | 30 |
| Colour | Blue | 15 |
| Style | Basketball shoes | 5 |
| Material | Leather | 4 |
| Line | Air Jordan | 200 |
Every combination of every aspect value multiplies up very quickly:
150 * 30 * 15 * 5 * 4 * 200 = 270,000,000
That is, 270 million possible URLs for this category alone!
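The multiplication is easy to reproduce and to re-run for your own catalogue. A minimal sketch, using the aspect counts from the table above (the dictionary name is my own):

```python
from math import prod

# Number of values per aspect, taken from the table above.
aspect_values = {
    "Brand": 150,
    "Shoe size": 30,
    "Colour": 15,
    "Style": 5,
    "Material": 4,
    "Line": 200,
}

total_urls = prod(aspect_values.values())
print(f"{total_urls:,} possible URLs")  # 270,000,000 possible URLs
```

Dropping an entry from the dictionary shows straight away how much each aspect contributes to the total.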
The solution
Understand which aspects have values which are primarily searched in Google and only open those aspects for crawling.
You’ll need a large and representative keyword set, potentially millions of keywords depending on the scope of your site, but here’s a rough example on a limited keyword set:
| Category: Mens Trainers | Matching Aspect 1 | Matching Aspect 2 | Avg. Monthly UK Google Searches |
| --- | --- | --- | --- |
| mens white trainers | Colour | | 2900 |
| mens running trainers | Style | | 2900 |
| mens black trainers | Colour | | 2400 |
| nike mens trainers | Brand | | 1600 |
| white trainers mens | Colour | | 1600 |
| mens trainers uk | None | | 2400 |
| black trainers mens | Colour | | 1900 |
| mens gym trainers white | Style | Colour | 1300 |
| all black trainers mens | Colour | | 1000 |
| mens red trainers | Colour | | 880 |
Full list here, note the data has been tweaked to better illustrate the concept. Also, for American readers, trainers = sneakers.
The search volume for the keywords can be clustered into the category’s aspects e.g.
| Aspect | Searches containing a value for this aspect |
| --- | --- |
| Colour | 19,920 |
| Style | 13,420 |
| None | 6,000 |
| Brand | 3,750 |
| Material | 3,150 |
| Size | 2,720 |
From this we can see that Colour and Style are important to open up to crawl, Material and Size less so.
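The clustering step itself can be sketched in a few lines. This example uses only the ten sample keywords listed earlier, so its totals are smaller than the figures above, which come from the full list. Counting a keyword that matches two aspects towards both of them is my assumption about how the totals were built.

```python
from collections import defaultdict

# (keyword, matching aspects, avg. monthly searches) from the sample keyword table.
keywords = [
    ("mens white trainers", ["Colour"], 2900),
    ("mens running trainers", ["Style"], 2900),
    ("mens black trainers", ["Colour"], 2400),
    ("nike mens trainers", ["Brand"], 1600),
    ("white trainers mens", ["Colour"], 1600),
    ("mens trainers uk", [], 2400),
    ("black trainers mens", ["Colour"], 1900),
    ("mens gym trainers white", ["Style", "Colour"], 1300),
    ("all black trainers mens", ["Colour"], 1000),
    ("mens red trainers", ["Colour"], 880),
]

searches_by_aspect = defaultdict(int)
for _keyword, aspects, searches in keywords:
    # A keyword matching two aspects counts towards both; no match goes to "None".
    for aspect in aspects or ["None"]:
        searches_by_aspect[aspect] += searches

ranked = sorted(searches_by_aspect.items(), key=lambda item: item[1], reverse=True)
for aspect, searches in ranked:
    print(f"{aspect}: {searches:,}")
# Colour: 11,980
# Style: 4,200
# None: 2,400
# Brand: 1,600
```

Even on this small sample the ordering matches the full data: Colour first, then Style.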
Good:
https://site.com/trainers/blue
https://site.com/trainers/running
https://site.com/trainers/blue-running
Potentially a waste:
https://site.com/trainers/size-10
https://site.com/trainers/leather/size-10
Just removing Material and Size dramatically reduces the number of aspect combinations:
150 × 15 × 5 × 200 = 2,250,000
Saving us 267,750,000 URLs required for crawling. Not bad!
As aspects can be common across categories (e.g. size and colour), to exclude categories selectively, append a string which you have excluded in robots.txt to those categories you choose to not have indexed e.g.
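Purely as an assumed illustration: suppose the marker string is `-nocrawl` and robots.txt contains the rule `Disallow: /*-nocrawl`. Both the marker and the rule are hypothetical. The sketch below checks which paths such a rule would block, using the `*` wildcard and `$` end anchor that Google and Bing support in robots.txt patterns. It is a simplified matcher written for this example, not a full robots.txt parser.

```python
import re


def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern ('*' wildcard, '$' end anchor) to a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(path, disallow_patterns):
    """True if any Disallow pattern matches the start of the path."""
    return any(robots_pattern_to_regex(p).match(path) for p in disallow_patterns)


DISALLOW = ["/*-nocrawl"]  # hypothetical marker string

paths = [
    "/trainers/blue",
    "/trainers/size-10-nocrawl",
    "/trainers/leather/size-10-nocrawl",
]
for path in paths:
    print(path, "blocked" if is_blocked(path, DISALLOW) else "crawlable")
```

The point of the marker is that one robots.txt rule covers every category, while you decide per category whether to append it.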
Sites with broad inventories, whether products, jobs, holiday destinations or anything else, should be careful to only open for crawl aspect combinations where there is real external demand.
Also important, sign up to my totally unrelated side project, Mustard Threads.
Practically everyone has considered, is considering, or will consider running a side-project at some point. It can be a hobby, a way to learn, a boost to your portfolio, a source of extra cash or a lottery ticket out of the day job. Either way it’s tough to beat the thrill of seeing other people using your creation.
Most people, though, won’t plough much money into marketing their idea, at least until the point it starts to show real promise or generate its own income.
This therefore is a guide to promoting your idea with little or no money.
1.0 Set objectives
Before you start, decide what you actually want to achieve with the project e.g.
Test whether an idea has commercial potential before quitting your job
Show employers what you can do
Generate traffic and page-views to earn affiliate or ad revenue
Deciding this gives you a clear direction for your marketing.
This is the point where you normally find the three other tools doing your totally original idea. It’s annoying but don’t let it deter you; it’s all about marketing and execution, just ask Tom:
MySpace Tom
2.2 Set up a holder
Well before you launch, set up a holder page with:
A line to describe your project and its benefits e.g. An app that helps accountants file taxes faster
A more detailed (but still concise) explanation
A way to register interest
Social share prompts
If a representative user can’t tell you in a couple of seconds what the project is about, you need to tweak some more.
Having a good holder site will ensure:
Your SEO will hit the ground running
You can direct anyone interested there for more info
You gain a bit more credibility
You can capture the details of anyone who shows up in the meantime
Launch Rock popularised and systematised this approach, but a basic HTML template + Mail Chimp is also free and can be a bit less fiddly in practice.
As an example, here’s the pre-launch screenshot of the holder page for Mustard Threads.
2.3 Make friends
If there are any authorities in your niche, now is the time to start making friends with them on Twitter, LinkedIn or wherever they interact mainly. Building relationships with bloggers, popular Twitter users and journalists now will pay dividends later.
Search the keywords that you have identified in your keyword research and see who shows up in Google. Identify Twitter hashtags used by your audience, e.g. for men’s fashion startup Mustard Threads, #GentsChat attracts an audience likely to be interested. Be genuine and gracious and it will work out. You can often also get new insights into your project’s functionality.
LinkedIn can be a great way to reach people. Use RecruitEm to find journalists, industry influencers etc. and connect with a personalised message.
Register all your social accounts (Twitter handle, Facebook Page name etc) at this point.
3.0 Launch!
So your creation is ready and you’ve replaced your holder with the real thing. Many people like to throw everything at a big day-one launch. That can sometimes be a good idea, like when you’re gunning for an app store ‘Most Popular’ or ‘Top Rated’, but in my experience, both corporate and personal, a gradual escalation of marketing is almost always better.
A ‘soft’ / beta launch allows you to:
Identify functional and UX bugs without alienating your most valuable users
Test and improve your messaging
Spend whatever money you do allocate wisely
It will differ by project but here’s my suggested running order:
3.1 You and the rest of the team
If you’re even remotely in the target market, you should definitely be using (or ‘dogfooding’) what you’ve created. This will prevent user-generated content projects from being a ghost town for early users and flag up any UX / functional issues.
Yum!
3.2 Your long-suffering friends and family
Your friends and family (presumably) like you, hopefully enough to give your project a go. Obviously it’s tricky if you’ve created something that’s a real niche (your .htaccess generator might be a bit lost on your grandma), but chances are you’ll have some people in your close circle who would enjoy or benefit from it.
At this point you’ll want to stay close to your new users to get any feedback as to how you can improve the UX and functionality.
Tools like Doorbell and Podio offer free feedback functionality you can embed in your site or app.
3.3 Use your wider network
Find a way to promote your project on every one of your social profiles without spamming everyone to death. A nice Facebook share with a request for feedback seems to work well.
Most people, inevitably, will be uninterested, but you might be surprised at the people who become users and even advocates for you.
On Twitter, unless you’re a terrible pop star with millions of followers, make sure you do a few tweets at different times of the day, with hashtags relevant to your target audience.
If you have a personal blog, write an announcement post. Later on you can reach a wider audience on Medium.com, which is the de rigueur place for startups to communicate these days.
Obviously also consider Google+, Pinterest etc. as suits your project and goals.
3.4 Send those emails
Hopefully during the course of building your project, some judicious social sharing, personal networking and random SEO will have generated a few sign-ups on your holder page.
Now is the time to cash this in: email them and let them know you’re good to go, tell them they’re among the first to use it and ask for as much feedback as they are prepared to give.
3.5 Accelerate SEO
SEO is a whole topic in itself, and generally a slow burn, but at a minimum you should have:
Identified the keyword to rank for
Included optimised metadata (including meta description tags)
Built links from wherever you can e.g.
Other personal projects and personal blogs
Other blogs and news sites writing about your project
3.6 Test paid search
Google throws vouchers around like confetti to encourage new advertisers onto its AdWords programme. Voucher values are usually up to around £120, so if, for example, you’re paying £0.30 per click, that will get you around 400 clicks from possible new users.
To get a voucher you can either join their ‘partner program’, in which case they will start mailing you vouchers periodically, or, if you’re desperate, go and flip through the web development magazines in your newsagent.
You can use the results of the keyword research to build out your campaign. If it turns out all the AdWords traffic bounces, you probably picked the wrong keywords.
3.7 In person networking
Check Meetup for events related to your project. Practise a little description of what you do before you go so it sounds slick when you’re mingling.
If there is nothing specific to what you do, there will usually be a generic ‘startup’ event you can attend. They do tend to be full of people too objectionable to hold down a job, but sometimes you’ll strike gold.
3.8 Press & blogger outreach
If your service is genuinely interesting, new, or a timesaver, or at least there’s an interesting angle on it (it uses a trendy gadget, a cult celebrity uses it, you built it while in prison etc.), you can usually get someone to write about it.
In my experience it’s nearly impossible to get the mainstream media (newspapers etc.) to write about you, but if you fancy a go, try Muckrack. However, blogs within a niche, e.g. recruitment or SEO, will often be happy to write about you, sending you quality links (SEO win) and traffic.
3.9 Staying in touch
Encourage users to sign up to your Twitter feed, like your Facebook page and/or sign up to email alerts to encourage repeat visits.
3.10 Social Sharing
Make sure users can easily share on social. Consider what usually makes people share:
Ego – something about the user that flatters their ego
Inherent reward – e.g. get 10 extra points on your gamification system
Humour – users share something funny so people think they are funny and like them more
Controversy – tricky to pull off, but people do share causes etc.
4.0 What not to do
This isn’t a blog post for well-funded startups working full time on their next ‘unicorn’; it’s for those creating projects in their spare time. Don’t lean on work contacts or resources to help; it’s probably your day job that pays the rent so don’t jeopardise that.
Moreover, though laws differ from country to country, if you’re using work time, computers, contacts etc. to work on your project, then should it actually become commercially valuable, your employer will have a strong case to assert ownership.
5.0 Next steps
After you’ve got a solid base of users for your idea, keep soliciting feedback, checking analytics (Google Analytics, Piwik etc.), doing guerrilla UX tests and improving it.
Hopefully your service should see a steady stream of new users, retain its existing users, and hit all the objectives you’ve set for the project.
When Google regularly swaps the organic sitelinks under your brand, it can be a pain checking multiple sites / markets to make sure they’re the way you want them.
An example of sitelinks, for eBay on Google UK.
To solve this problem (for me), I’ve written a simple PHP script which, when run daily, will check the sitelinks and email you if there are any changes.
Keyword research is not only crucial for SEO and a powerful methodology for understanding the intentions and language of your market; by clustering the results you can also plan a website’s structure. This ensures:
Optimal search engine visibility
A taxonomy aligned to your market’s mental model
Selection of terminology understood by your market
In this practical step-by-step guide, I’ve used the example of planning a new job board, however the methodology is valid for all industries.
1. Getting started
Assuming we’ve already used Google Trends or market knowledge to identify it as the correct seed term, here are the downloaded results for a query of ‘jobs’ in the Google Keyword Planner. It’s important to select the correct market (in this case the UK) and turn ‘Only show ideas closely related to my search terms’ on, otherwise you’ll spend much longer sorting through irrelevant keywords.
The CSV download of a query for: Searches similar to ‘jobs’ in the UK
Delete the additional columns created by default (Competition, Suggested Bids etc.), leaving only the Keyword and Average Monthly Search Volume columns, sorted high to low.
2. Cleaning the list
As with any keyword research it’s important to check each term against these three criteria, ranked in order of importance;
Is it relevant? Do I have content on my site which relates to this?
Is there sufficient search volume? Do enough people search for it to make it worthwhile?
Is it achievable? Will my site, now or in the future realistically have enough authority to rank for this?
First remove the irrelevant terms e.g. imagine your hypothetical job board doesn’t;
Recruit for specific employers e.g. ‘tesco jobs‘ or ‘mcdonalds jobs‘
Wish to compete for big competitor brand terms like ‘guardian jobs‘
Recruit for jobs overseas
Want any vague or irrelevant terms like ‘good jobs‘ or ‘boob jobs‘
Afterwards you will be left with a reduced list containing only those terms relevant to your business. In this example that is 85% of the terms we started with; the proportion will vary for you based on how focussed on a specific niche your business is.
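If the list is too long to clean by hand, the relevance check can be roughed out in code first and reviewed afterwards. A minimal sketch, assuming a hand-maintained set of excluded words; the keywords are ones quoted in this guide, and the exclusion set is my own illustration of the rules above:

```python
# Keywords quoted in this guide; the exclusion rules are illustrative assumptions.
keywords = [
    "jobs in glasgow", "tesco jobs", "driving jobs", "mcdonalds jobs",
    "guardian jobs", "graduate jobs", "good jobs", "marketing jobs",
]

# Employers, competitor brands and vague terms this hypothetical job board ignores.
EXCLUDED_TOKENS = {"tesco", "mcdonalds", "guardian", "good"}


def is_relevant(keyword):
    """Keep a keyword only if none of its words is on the exclusion list."""
    return not (set(keyword.lower().split()) & EXCLUDED_TOKENS)


relevant = [kw for kw in keywords if is_relevant(kw)]
print(relevant)
# ['jobs in glasgow', 'driving jobs', 'graduate jobs', 'marketing jobs']
```

A word list like this only catches the obvious cases, so a manual pass over what remains is still worthwhile.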
3. Identifying user intent
Next you need to understand exactly what solutions people are looking for. This is as much an art as a science. Our example refined list has every possible segmentation of the jobs market e.g.
Salary e.g. ‘100k jobs’
Educational level e.g. ‘graduate jobs‘
Industry e.g. ‘jobs in sport‘
Location e.g. ‘jobs in kent‘
This market is most obviously divided between those looking for specific skills and industries and those looking for jobs in locations, particularly the latter. Approximately 25% of the terms, with a cumulative 1.6m search volume, relate to finding a job in a specific location.
201 of the terms, with a cumulative 1.6m search volume, relate to a specific location.
The remainder are largely searches for function e.g. ‘marketing jobs’ or industry e.g. ‘music jobs’.
4. Clustering the keywords
Post Google’s Hummingbird update there is much more focus on clustering of keywords, however it has always been the case that the same user intent has been represented by multiple keywords and that these should be grouped during the planning phase of a new site. The only real difference is that now we can rely on Google being somewhat better at identifying user intents so our groups can be broader.
In the location segment, we can clearly see many keywords with the same intent and similar strings e.g.
jobs in glasgow (27100)
glasgow jobs (8100)
jobs glasgow (8100)
Which should be clustered together in Excel e.g.
Grouping keywords by user intent – jobs in London and in Glasgow.
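The guide does this grouping by hand in Excel. For string variants like the Glasgow example, a rough automated first pass is also possible. This sketch builds an order-independent key per keyword; the stop-word list and function names are my own assumptions.

```python
from collections import defaultdict

# Keywords and volumes from the Glasgow example above.
keywords = {
    "jobs in glasgow": 27100,
    "glasgow jobs": 8100,
    "jobs glasgow": 8100,
}

STOP_WORDS = {"in"}  # assumed filler words that do not change the intent


def intent_key(keyword):
    """Order-independent key: drop filler words, then sort the remaining tokens."""
    tokens = [t for t in keyword.lower().split() if t not in STOP_WORDS]
    return " ".join(sorted(tokens))


clusters = defaultdict(lambda: {"keywords": [], "searches": 0})
for keyword, searches in keywords.items():
    cluster = clusters[intent_key(keyword)]
    cluster["keywords"].append(keyword)
    cluster["searches"] += searches

for key, cluster in clusters.items():
    print(key, cluster["searches"], cluster["keywords"])
# glasgow jobs 43300 ['jobs in glasgow', 'glasgow jobs', 'jobs glasgow']
```

This only catches variants that share the same words; semantically linked keywords with different strings still need a human decision.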
It gets more interesting when we look at the professions and industries segment. As Google has improved at understanding concepts, we can now legitimately group together keywords that are semantically linked but with dissimilar literal strings, for example;
driving jobs (14800)
hgv jobs (12100)
delivery driver jobs (5400)
delivery jobs (5400)
chauffeur jobs (5400)
bus driver jobs (4400)
hgv driving jobs (3600)
van driving jobs (2900)
All these terms show a similar user intent. Whether you choose to break out a term into its own page is a judgment call you should make based on its importance to your business. In this example it’s arguable that ‘hgv jobs’ is sufficiently distinct and popular to deserve its own page.
This needs to be completed for all the major segments you identified, which will probably take around a day, depending on the size of your niche and your mastery of Excel shortcuts. As you progress you will see patterns emerge and get a sense of the language and requirements of your market.
5. Building the sitemap
As you group the keywords in Excel you will see the sitemap emerge, with each page optimised for its most popular keyword but referencing the other keywords in the group.
A simplified example of a sitemap made by clustering keywords
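To make the rule concrete, here is a small sketch that takes one hand-made cluster and picks the page’s primary keyword by search volume, keeping the rest as supporting terms. The data is the driving example from earlier; the cluster name and function are hypothetical.

```python
# Hand-made cluster from the driving example; the cluster name is an assumption.
clusters = {
    "driving": {
        "driving jobs": 14800,
        "hgv jobs": 12100,
        "delivery driver jobs": 5400,
        "delivery jobs": 5400,
        "chauffeur jobs": 5400,
        "bus driver jobs": 4400,
        "hgv driving jobs": 3600,
        "van driving jobs": 2900,
    },
}


def plan_page(cluster):
    """Pick the highest-volume keyword as primary; the rest support the copy."""
    ranked = sorted(cluster.items(), key=lambda kv: kv[1], reverse=True)
    return {
        "primary": ranked[0][0],
        "supporting": [keyword for keyword, _ in ranked[1:]],
        "searches": sum(cluster.values()),
    }


sitemap = {name: plan_page(cluster) for name, cluster in clusters.items()}
print(sitemap["driving"]["primary"], sitemap["driving"]["searches"])
# driving jobs 54000
```

If you decide ‘hgv jobs’ deserves its own page, move it and ‘hgv driving jobs’ into a separate cluster and run the same function.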
When producing copy for these pages it’s ideal if you can, while keeping the user first in mind, use all or most of the keywords in the cluster.
6. Conclusion and more reading
By following this methodology you will produce an intuitive and search optimised sitemap for your site. For more information on clustering keywords, watch this video on modern keyword research from Moz’s Rand Fishkin.
Note: As of March 2016 Google is no longer passing this information in the referral string.
As of September 2013 Google prevented site owners from seeing all organic referring keyword data in the referral string.
However there is still plenty of data to be gleaned from the string. For quick testing the HttpFox Firefox plugin is excellent. Systematically capturing the data is easily done in any web analytics tool or server log parser using simple Regex.
It’s important to note that this data appears not to be passed from mobile searches which may somewhat skew any conclusions.
1) The rank of the link that the user clicked
To understand the rank of the result the user clicked to arrive at your site, look at the ‘cd’ key / value pair, e.g.
cd=1 indicates the clicked listing was in first place, cd=3 third place etc.
It does however get more complex when authority links and universal search are included on the Search Engine Result Page (‘SERP’), which will happen in most cases.
In this case the universal search results are counted in the SERP and must be considered, e.g. in this example it’s possible to have a cd value of up to 16 on page 1.
Orange numbers represent the ‘cd’ value
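Pulling the ‘cd’ value out of a referrer is straightforward. The post suggests regex; this sketch uses Python’s standard URL parser instead, which avoids edge cases around parameter order. The referrer below is a made-up string in the general shape described here, not a captured value.

```python
from urllib.parse import urlsplit, parse_qs

# Made-up referrer for illustration; not a captured value.
referrer = "https://www.google.co.uk/url?cd=3&ved=0CC8QFjAA"


def clicked_rank(referrer_url):
    """Return the 'cd' value as an int, or None when it is absent or not numeric."""
    values = parse_qs(urlsplit(referrer_url).query).get("cd", [])
    return int(values[0]) if values and values[0].isdigit() else None


print(clicked_rank(referrer))  # 3
print(clicked_rank("https://www.google.co.uk/"))  # None
```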
2) The type of link clicked (search, news, image etc)
The ‘ved’ parameter indicates what type of result has referred a visitor to your site.
Here’s a marginally more verbose version of Tim’s table. Note these are substrings of the total value:
| VED Value | This means |
| --- | --- |
| QFj | A normal organic search result |
| QqQIw | A news OneBox link (e.g. 11, 12 & 13 above) |
| QpwI | A news OneBox image (e.g. 11 above) |
| Q9QEw | Video OneBox link |
| Qtw1w | Video OneBox image |
| QjB | An authority link (e.g. #2–4 on the screenshot) |
| BEPwd | Knowledge graph image |
| BEP4d | A secondary Knowledge Graph image |
3) The local version of Google searched by the user
This is straightforward, you can clearly see the Top Level Domain (TLD) of the Google search that referred the visitor. In this example you can see Google UK;
Note the address itself is character encoded, hence http%3A%2F%2F represents http://.
5) Is the user logged in to Google?
Finally, the ‘sig2’ parameter only appears when a user is logged in to Google, therefore you can determine the proportion of users arriving at your site authenticated with Google.
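The remaining signals can be read from the same referrer in one pass. A minimal sketch under a few assumptions: the ‘ved’ substrings are the ones in the table above, checked in table order; the local version is taken as everything after ‘google.’ in the hostname; and the sample referrer is invented for the example.

```python
from urllib.parse import urlsplit, parse_qs

# Substrings from the VED table, checked in the order listed there.
VED_TYPES = [
    ("QFj", "organic result"),
    ("QqQIw", "news OneBox link"),
    ("QpwI", "news OneBox image"),
    ("Q9QEw", "video OneBox link"),
    ("Qtw1w", "video OneBox image"),
    ("QjB", "authority link"),
    ("BEPwd", "Knowledge Graph image"),
    ("BEP4d", "secondary Knowledge Graph image"),
]


def describe_referrer(referrer_url):
    """Summarise result type, local Google version and login state."""
    parts = urlsplit(referrer_url)
    params = parse_qs(parts.query)
    ved = params.get("ved", [""])[0]
    result_type = next(
        (label for marker, label in VED_TYPES if marker in ved), "unknown"
    )
    host = parts.hostname or ""
    # Everything after "google." is treated as the local version, e.g. "co.uk".
    local_version = host.split("google.", 1)[1] if "google." in host else None
    return {
        "result_type": result_type,
        "local_version": local_version,
        "logged_in": "sig2" in params,
    }


# Invented referrer for illustration only.
sample = "https://www.google.co.uk/url?cd=1&ved=0CC8QFjAA&sig2=abc123"
print(describe_referrer(sample))
# {'result_type': 'organic result', 'local_version': 'co.uk', 'logged_in': True}
```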
Obviously the loss of the referring keyword is a blow to the accuracy of any SEO reporting. But the above will at least allow site owners to answer questions like;
Does traffic from different ranks convert at different rates?
Does traffic from different types of search result behave differently?
What proportion of visitors arrive at your site from different local versions of Google?
This blog is about the light that the cumulative searches of hundreds of millions of individuals can shine on the world in a way that traditional sources of insight cannot.
So what makes keyword research better than other research methodologies? Its primary strength lies in its lack of bias. This impartiality is born of the intimacy that exists between a searcher and their search box that simply can’t be replicated at scale any other way.
For example, it’s unlikely that, if asked in a survey, many of the 165,000 global searchers using Google to find information about ‘flatulence’ in July 2012 would admit that it was their primary concern. Perhaps they might instead choose to align themselves with the more socially concerned (and fragrant) 60,500 people searching for ‘cure for cancer’ in the same month.
When a user enters their search they are speaking to a machine: they have a need and, as best they are able, they clearly and explicitly state that need.
These searches range from the mundane (‘where can I buy Nespresso capsules’) to the hilarious (‘why does my mom smell’) to the potentially tragic (‘test for aids’).
Whatever a searcher’s intention, every time a search is made it is added to aggregate statistics for the informed researcher to mine.
The strength of this new source of understanding is not only in its candour; it is also unprecedented in terms of its scale. Google, with around 66% of the search engine market, is queried 400 million times per day. Extrapolated to the whole search market that’s around 600 million searches a day, a sample size that few other research methodologies can hope to match.
Search data versus social data (Part 1)
Social networks such as (in Anglo-Saxon countries) Twitter, Facebook and LinkedIn are often portrayed as the modern mirror of the people.
The immense data held, particularly by Facebook, is often quoted as having the key to understanding people on a macro and individual level.
For example, Facebook knows where you live, who your friends, colleagues, family are, where you go for fun, where you go on holiday and your favourite TV shows. Surely this is the ultimate data set for understanding humanity on a grand scale?
Well, no, and here’s why.
Perception versus reality
When an individual creates content on a social network, particularly those where real names are encouraged such as Facebook or Google+, they are typically at least as conscious of the impact this will have on others’ perception of them as they would be when talking in person with people they know.
The reason for this is that the average Facebook user has around 130 friends, but 7 close ‘real life’ friends, therefore any statement on Facebook is likely to reach a much wider and more diverse audience than one made in person.
As such, most social media users will screen themselves, conscious that their content may reach the eyes of family, co-workers, less close acquaintances and quite probably strangers and that each group of people may react in different ways.
For example, a political opinion expressed to 4 or 5 close friends is less likely to be challenged than one made to a diverse group of more than a hundred people from separate parts of one’s life.
Beyond the user’s direct connections, the ability for a particular piece of content to be shared is virtually limitless as a number of individuals writing indiscreet Twitter updates or posting Facebook photos have found.
Instead, individuals conduct themselves on social networks in the way they wish to be perceived by this broad community of people, rather than as they truly are.
In social, users update with socially acceptable facets of their life.
‘I’m on the train’
‘I’m looking forward to my holiday’
‘My cat is adorable’
It would be an unusual breach of convention for users of social media to ask where they can find some good pornography, and yet that demand clearly exists: there are 277 million porn-related searches from the comfortable anonymity of the search box every month.
And it’s not only sexual interests and personal hygiene problems that are directed at search and not social. If you are in need of information regarding a specific purchase, let’s say a lamp or refrigerator, you are much more likely to start with a Google search than to ask your friends, who are relatively unlikely to have specialist knowledge about specific products.
If the average Facebook user has 160 friends, compared to tens of billions of indexed pages in Google and Bing, many of them written by niche experts and specialist retailers, it’s clear that online search is a more effective way of researching your needs.
This post is about the number of searches about the iPhone, not the number of searches from an iPhone, which is a wholly different (and even more interesting) subject.
The annual release of a new iPhone model has become a significant media event, and speculation over new features a favourite topic of technology journalists, enthusiasts and consumers.
Apple follow a relatively consistent annual pattern of releases and it is therefore possible to compare over the five years since the launch of the original iPhone in 2007 just how excited the public are about each subsequent iPhone release by looking at search data.
To do this we can look first at the number of searches including ‘iphone’ since 2007.
This graph shows the number of Google searches containing the term ‘iphone’ made within the United States as reported by the Google Insights for Search tool. You can view the original data here.
We can see unmistakable peaks in interest around the time that each new iPhone is introduced.
However to truly understand the level of consumer interest we must factor in that the user base of iPhones has grown significantly, Apple have sold over 243 million units worldwide as of September 2012. These growing absolute numbers will result in cumulatively more searches that relate to care and maintenance rather than the interest in the coming model that we are trying to measure.
Therefore we can compare the variance in searches for the two weeks prior to the launch, with the two weeks of the launch to isolate consumer interest from the background noise.
The ‘Week of launch’ figures show the number of Google searches including the term ‘iphone’ made within the U.S. for each of the launch dates, where the greatest number (approximately 37 million) is represented by 100.
| Model | Announcement date | Preceding two week average | Week of launch | % Change |
| --- | --- | --- | --- | --- |
| iPhone 1st Generation | 09/01/2007 | 0 | 20 | |
| iPhone 2nd Generation | 03/03/2008 | 9.5 | 11 | 16% |
| iPhone 3G | 09/06/2008 | 11.5 | 25 | 117% |
| iPhone 3Gs | 08/06/2009 | 15 | 29 | 93% |
| iPhone 4 | 07/07/2010 | 42.5 | 37 | -13% |
| iPhone 4s | 04/08/2011 | 50 | 82 | 64% |
| iPhone 5 | 12/09/2012 | 42 | 100 | 138% |
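The % Change column is simply the launch-week figure compared with the preceding two-week average. A short sketch that reproduces it from the table’s own numbers; the 1st generation is left out because its baseline of 0 makes a percentage change undefined.

```python
# (model, preceding two-week average, week of launch) from the table above.
launches = [
    ("iPhone 2nd Generation", 9.5, 11),
    ("iPhone 3G", 11.5, 25),
    ("iPhone 3Gs", 15, 29),
    ("iPhone 4", 42.5, 37),
    ("iPhone 4s", 50, 82),
    ("iPhone 5", 42, 100),
]


def pct_change(before, after):
    """Percentage change from the baseline, rounded to a whole percent."""
    return round((after - before) / before * 100)


changes = {model: pct_change(before, after) for model, before, after in launches}
for model, change in changes.items():
    print(f"{model}: {change}%")
```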
The figures reveal a very mixed picture for each of the models.
The 1st generation model was quite popular when it was announced to Apple’s already loyal user base, whereas the very similar 2nd Generation went largely unnoticed.
The iPhone 3G was the first device to generate significant consumer interest, with the 3GS following close behind.
From the figures above it would seem that the iPhone 4 was received indifferently by consumers, however this is actually misleading. Search volume for ‘iphone’ actually doubled on the underlying base around one month prior to the announcement (difficult but possible to see if you look closely at the graph above), hence the lack of change or even slight drop in % versus the preceding two weeks. This nuance is what led to the wrong conclusion in this otherwise clever piece of investment analysis.
The iPhone 4s was a new record in absolute numbers, more akin to the sharp peak and fall for the 3GS, with the strong search volume corresponding to record numbers of pre-orders for this model.
The clear winner here, however, is the iPhone 5, which despite arguably being a simple iteration on previous iPhones is in fact by far the phone that consumers are most excited about. Early sales data backs this up with search volume being more than double the 4s.
What we can therefore conclude;
That pre-announcement search volume correlates with sales in the days immediately following the launch
Even as the models become arguably more generic, consumers are not tiring of the iPhone range
Although absolute numbers of searches are increasing over time the level of consumer interest in each new model varies significantly
It will be interesting to see how long Apple can continue to generate the kind of fevered anticipation for its mobile releases that other consumer brands can only envy.
This blog is about Internet search and how billions of searches by hundreds of millions of people can help us understand the world in interesting and useful new ways.
Although it has hitherto been used almost exclusively by online marketers (my own background), keyword research is equally useful for entrepreneurs, politicians, policy makers, academics, market researchers and product managers.
This blog will therefore attempt to;
Demonstrate exactly how keyword research can be used for more than just online marketing
Explain how anyone can conduct keyword research
Share interesting examples of the insights keyword research offers on society, business, celebrity, technology and politics.
My name is Chris Reynolds and I have been conducting keyword research professionally since 2003. I currently work as Global Digital Strategy Manager for a large international corporation based in Zürich, Switzerland. I’m also co-founder of UK-based Clever Biscuit Ltd. Any views in this blog are my own and not those of my employer or company.
I hope you find this blog interesting and useful. Thanks for reading!