How we collect and analyze data to create the most accurate hosting market share report.
HostAdvice Market Share is the result of 4 years of extensive research and development. It employs several key technologies to generate comprehensive reports of unprecedented accuracy and quality by processing data collected using public domain tools and utilities. We have developed several web crawlers to gather our statistics. They are:
- Directory crawler – We have crawled dozens of business directories from all over the world and in various languages. We deliberately crawled directories from different regions so that our coverage of websites is even across the world and our statistics are as accurate as possible. Our advanced crawlers process between 5-7 directory pages every second. Drawing data from business directories lets us notice shifting trends: new website owners tend to try different hosting solutions, while older website owners tend to stick with their existing provider. To keep our statistics unbiased, we have removed sites hosted for free on platforms like Tumblr, Blogspot, Livejournal, etc., since these sites are not hosted with web hosting companies, are often free of charge, and are not equivalent to a paid service. This means, for example, that we count a site that pays Wix and is hosted on its own domain, but not a free site created on the Wix platform as a subdomain (which may never even have been completed).
- URL crawler – For every site we add to our database, we check its DNS records, geolocation, Whois information and other important data. This crawler checks between 3-5 websites every second. For this crawler we developed our own Whois service, which also powers the HostAdvice “Who is Hosting” browser extension and gathers the most accurate Whois information about any domain or IP. We crawl all TLDs without exception, including country-code TLDs like .es or .ru and generic TLDs like .edu, .net, .org, etc. We do not exclude .gov domains, even though we have noticed that they are typically hosted on the government's internal servers rather than on a publicly available platform. To remove biased data from our system, we exclude parked and forwarded domains from our market share figures: only live, hosted websites make the cut.
- Linking script – By taking dozens of factors into consideration, we link each site to the web hosting company that hosts it. This script uses HostAdvice’s algorithm and comprehensive database of web hosts to make the link. At the moment, we examine 25-50 million new websites monthly. We constantly make sure that every website in our database is a “standalone” website, which is why we exclude social networks (like Facebook) and subdomains. For example, we do not parse website.example.com, but we do parse example.com.
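The per-site exclusions described above (free-platform sites, subdomains, and parked or forwarded domains) can be sketched as a single filter. This is a minimal Python illustration under stated assumptions, not HostAdvice's actual implementation: `Site`, `registrable_domain`, `exclusion_reason`, the sample platform list, and the tiny stand-in for the Public Suffix List are all hypothetical, and parked-domain detection is assumed to have happened earlier in the pipeline.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for the Public Suffix List: only a few multi-label
# suffixes are listed here; a real system would use the full list.
MULTI_LABEL_SUFFIXES = {"co.uk", "com.au", "co.jp"}

# Hypothetical sample of free hosting platforms whose subdomain sites are excluded.
FREE_PLATFORMS = {"tumblr.com", "blogspot.com", "livejournal.com", "wixsite.com"}


@dataclass
class Site:
    hostname: str
    is_parked: bool = False  # parked or forward-only domain (assumed already detected)


def registrable_domain(hostname: str) -> str:
    """Return the registrable domain, e.g. 'shop.example.co.uk' -> 'example.co.uk'."""
    labels = hostname.split(".")
    if len(labels) >= 3 and ".".join(labels[-2:]) in MULTI_LABEL_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])


def exclusion_reason(site: Site) -> Optional[str]:
    """Return why a site is excluded from market share, or None if it counts."""
    host = site.hostname.lower().rstrip(".")
    if host.startswith("www."):
        host = host[4:]  # treat www.example.com as example.com
    domain = registrable_domain(host)
    if host != domain:
        # Any subdomain is excluded; free-platform subdomains are flagged separately.
        return "free_platform" if domain in FREE_PLATFORMS else "subdomain"
    if site.is_parked:
        return "parked_or_forwarded"
    return None
```

With these assumptions, `exclusion_reason(Site("www.example.co.uk"))` returns `None` (the site counts), `myblog.tumblr.com` is flagged as `"free_platform"`, and `website.example.com` as `"subdomain"`. A paid Wix site on its own domain passes the filter, while a free `*.wixsite.com` site does not. Rejecting every hostname that is not itself a registrable domain is what enforces the "standalone website" rule; a production version would rely on the full Public Suffix List rather than a hard-coded set.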
Not every site is hosted on a hosting company's servers. Many companies (Google, for example) host their own sites on internal servers. To filter out these cases, our market share counts only sites hosted on a service that hosts at least 10 domains, and on a daily basis we manually exclude services we identify as self-hosted. Wherever we mention Country in the report, we mean the country of the hosting server, not the country of the domain name or of the hosting company. For instance, example.co.uk may be hosted with a hosting company in the US despite being a UK site. In some cases, a hosting company has servers in several countries. Since the client always picks the server location, we consider this parameter important enough to be the one we show in our research.
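The 10-domain threshold and the server-country convention can be illustrated with a small aggregation. This is a hypothetical Python sketch, not HostAdvice's code: `HostedSite`, `eligible_sites`, `share_by_country`, and the `self_hosted` set (standing in for the manual daily exclusions) are names assumed for illustration, and each site is assumed to have already been linked to a provider and geolocated to a server country.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, NamedTuple, Set

# Threshold from the methodology: a service must host at least 10 domains.
MIN_DOMAINS_PER_SERVICE = 10


class HostedSite(NamedTuple):
    domain: str
    provider: str        # hosting service the site was linked to
    server_country: str  # country of the hosting server (not of the domain or company)


def eligible_sites(sites: Iterable[HostedSite], self_hosted: Set[str]) -> List[HostedSite]:
    """Keep sites whose provider hosts >= 10 distinct domains and is not self-hosted."""
    sites = list(sites)
    domains_per_provider: Dict[str, Set[str]] = defaultdict(set)
    for site in sites:
        domains_per_provider[site.provider].add(site.domain)
    return [
        site
        for site in sites
        if len(domains_per_provider[site.provider]) >= MIN_DOMAINS_PER_SERVICE
        and site.provider not in self_hosted
    ]


def share_by_country(sites: Iterable[HostedSite]) -> Dict[str, float]:
    """Fraction of eligible sites per hosting-server country."""
    counts = Counter(site.server_country for site in sites)
    total = sum(counts.values())
    return {country: n / total for country, n in counts.items()} if total else {}
```

For example, with 12 sites on one provider's US servers, 10 sites on another provider split 6/4 between GB and DE servers, 3 sites on a provider below the threshold, and 15 sites on a service listed as self-hosted, only the first 22 sites are counted, giving the US a 12/22 share (about 54.5%). Keying the share on `server_country` rather than the TLD is what keeps a US-hosted example.co.uk in the US column.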
HostAdvice Market Share Research is the only up-to-date, unbiased market share analysis and is considered the industry standard among hosting companies and experts. You are welcome to share it on your site.
Latest market share data update: July 10, 2015