Baidu Spider: Learn All the Secrets of the Baidu Crawler

With over 70% of the Chinese market, Baidu is the undisputed search leader in China. If you want to build a digital asset in this market, you need to learn all the secrets of the Baidu Spider.

What is the Baidu Spider?

Baidu Spider is the official name of the web crawler for the Baidu search engine. It indexes pages on the internet to provide search results for the Baidu search engine.

Technical Details about Baidu Spider

In the following section we will cover the technical specifications related to the Baidu Spider.

User Agent

The first important aspect to note about Baidu is that it has multiple user agents. Each user agent will crawl the internet for different features of the Baidu Search Engine. 

| Service | Baidu User Agent |
| --- | --- |
| Mobile Search | Baiduspider |
| Desktop Search | Baiduspider |
| Business Search (Advertisements) | Baiduspider-ads |
| Baidu Union | Baiduspider-cpro |
| Baidu Favorites | Baiduspider-favo |
| Image Search | Baiduspider-image |
| News Search | Baiduspider-news |
| Video Search | Baiduspider-video |

The advantage of such an approach is that you can write rules in robots.txt that target individual agents.

Robots.txt Rules and Meta Tags

You can use standard rules and meta-tags to control the behavior of the different Baidu spiders.
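As a rough illustration of per-agent rules, the sketch below parses a made-up robots.txt with Python's standard `urllib.robotparser` and asks which paths each agent may fetch. The paths and the `build_parser` helper are invented for this example, and the result shows how Python's parser reads the rules, which is not necessarily identical to how Baidu evaluates them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the paths are invented for illustration.
# The more specific group is listed first because Python's parser applies
# the first group whose name is contained in the crawler's user agent.
ROBOTS_TXT = """\
User-agent: Baiduspider-image
Disallow: /

User-agent: Baiduspider
Disallow: /private/

User-agent: *
Disallow:
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

CHECKS = [
    ("Baiduspider-image", "/photos/cat.jpg"),
    ("Baiduspider", "/private/report.html"),
    ("Baiduspider", "/blog/post.html"),
]

for agent, path in CHECKS:
    allowed = parser.can_fetch(agent, path)
    print(f"{agent:18} {path:22} allowed={allowed}")
```

Here the image crawler is kept out of the whole site, while the main crawler is only kept out of `/private/`.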

Baidu Spider DNS

You can validate the origin of the spider by running a reverse DNS lookup on the requesting IP address and checking that the returned hostname matches *.baidu.com.
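The reverse DNS check can be sketched in Python with the standard `socket` module. The function names `hostname_is_baidu` and `ip_is_baiduspider` are made up for this example, and the only rule applied is the *.baidu.com pattern mentioned above. The `reverse_lookup` parameter is an assumption added so that the check can be exercised without network access.

```python
import socket


def hostname_is_baidu(hostname: str) -> bool:
    """Return True if the hostname is a subdomain of baidu.com."""
    host = hostname.rstrip(".").lower()
    # Require the leading dot so that "evil-baidu.com" does not match.
    return host.endswith(".baidu.com")


def ip_is_baiduspider(ip: str, reverse_lookup=socket.gethostbyaddr) -> bool:
    """Reverse-resolve an IP address and check the resulting hostname."""
    try:
        hostname, _aliases, _addresses = reverse_lookup(ip)
    except OSError:
        # No PTR record or the lookup failed: treat the origin as unverified.
        return False
    return hostname_is_baidu(hostname)
```

A typical use would be to call `ip_is_baiduspider(request_ip)` only for requests whose user agent claims to be a Baidu spider, since a DNS lookup on every request would be slow.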

JavaScript

Baidu is not very good at interpreting JavaScript. Keep in mind that content that needs JavaScript rendering is generally not processed by the Baidu crawler. Make sure that your site degrades gracefully if JavaScript is not loaded.
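One way to check this is to look at what a page contains before any script runs. The sketch below uses Python's standard `html.parser` to collect the text present in the raw HTML, skipping `<script>` and `<style>` blocks. The class name, the `static_text` helper, and the sample page are invented for illustration, and this is only an approximation of what a crawler without JavaScript support would see.

```python
from html.parser import HTMLParser


class StaticTextExtractor(HTMLParser):
    """Collect text that exists in the HTML without running any script."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def static_text(html: str) -> str:
    """Return the text a non-JavaScript client could read from the page."""
    extractor = StaticTextExtractor()
    extractor.feed(html)
    extractor.close()
    return " ".join(extractor.chunks)


# Hypothetical page: the price is only injected by JavaScript.
SAMPLE_PAGE = (
    "<html><body><h1>Green tea</h1><div id='app'></div>"
    "<script>document.getElementById('app').textContent = 'Price: 10 yuan';"
    "</script></body></html>"
)

print(static_text(SAMPLE_PAGE))
```

If a phrase that matters for ranking is missing from the output, it is probably being injected by JavaScript and should be rendered on the server instead.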

Optimizations for Baidu Spider

In this section, we will cover the SEO optimizations required to rank in the Baidu search results.

Domain Extension (.CN)

To rank well, it is very important to have a domain with the .cn extension. You can buy one from a domain registrar such as GoDaddy. As part of the domain registration process, you will be required to register your personal or company information with the CNNIC.

Hosting

Hosting is also a very important aspect when it comes to the Baidu search engine. You must host in mainland China. The main reason is the Great Firewall of China (GFW), China’s internet censorship system. A site hosted outside mainland China can only be reached through the firewall, and anything that has to pass through it is at risk of being blocked by the Chinese Government.

The other important factor when it comes to hosting is to avoid shared hosting options. If you share your IP address with other sites, there is a high risk that one of the neighbours sharing the resource steps out of line. You may end up with the IP address being blocked and be forced to start from scratch. Make sure you opt for dedicated hosting.

Using a Chinese IP address is also very important for ranking, because Baidu favors addresses in the Chinese IP range. Performance is a ranking factor as well, and hosting the site in China provides a performance boost: content will load faster for users residing in China.

Content Publishing License (CPL)

Before you can operate a site in China, you have to apply for a license with the Chinese Government. If you are not a Chinese company, there is little chance of obtaining this license. Your best option is to partner with a Chinese company that will apply for the license under its own company name.

Crawl Budget

Unlike Googlebot, the Baidu spider has a very small crawl budget. It is advisable to implement a very flat structure where the homepage links to the main category pages. Sub-categories and other pages should be linked from the main category pages.

The homepage is the most powerful page on the site. The distance of a page from the homepage is a ranking factor for the Baidu search engine. The deeper a page sits in the site structure, the less important it is in the eyes of the Baidu search engine.
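To see how flat a site really is, you can compute the click depth of each page, meaning the minimum number of links a crawler has to follow from the homepage to reach it. The sketch below does this with a breadth-first search over a small, made-up link map. The `click_depths` function and the `SITE_LINKS` data are hypothetical, and on a real site the link map would come from a crawl of your own pages.

```python
from collections import deque


def click_depths(links: dict, home: str) -> dict:
    """Return the minimum number of clicks from the homepage to each page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest path in BFS
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical site: each page maps to the pages it links to.
SITE_LINKS = {
    "/": ["/tea", "/coffee"],
    "/tea": ["/", "/tea/green"],
    "/tea/green": ["/tea/green/sencha"],
    "/coffee": [],
    "/orphan": [],  # no page links here, so it is unreachable
}

for page, depth in sorted(click_depths(SITE_LINKS, "/").items(), key=lambda item: item[1]):
    print(depth, page)
```

Pages with a high depth, or pages that never appear in the result because nothing links to them, are candidates for being linked from a category page closer to the homepage.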

The Baidu Webmaster documentation explicitly states that a page will rank better if you keep the URL structure short.

Page Size

You must limit the size of your pages to 128 KB. Due to its limited crawl budget, the Baidu crawler limits the amount of content pulled per page.
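A quick way to check a page against this limit is to measure its size in bytes rather than in characters, since Chinese characters take three bytes each in UTF-8. The sketch below assumes that the limit applies to the uncompressed HTML and that 128 KB means 128 × 1024 bytes; both are assumptions, and the function names are invented for this example.

```python
# Assumption: 128 KB is read as 128 * 1024 bytes of uncompressed HTML.
MAX_PAGE_BYTES = 128 * 1024


def page_size_bytes(html: str, encoding: str = "utf-8") -> int:
    """Size of the page as served, in bytes."""
    return len(html.encode(encoding))


def fits_page_budget(html: str, limit: int = MAX_PAGE_BYTES) -> bool:
    """Return True if the encoded page is within the byte limit."""
    return page_size_bytes(html) <= limit


# 50,000 Chinese characters are 150,000 bytes in UTF-8, which is over the limit.
long_page = "中" * 50_000
print(page_size_bytes(long_page), fits_page_budget(long_page))
```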

Mobile First

The number of mobile internet users reached 1.16 bn in April 2020, and the average monthly time spent on mobile internet was 144.8 hours. In 2018, 61% of Chinese internet traffic came from mobile devices. If you build a site for the Chinese market, make sure that the design is mobile first and that the user experience is optimized for mobile devices.

Culture

It goes without saying that you should tailor the content of the website to Chinese customers. Do proper market research to understand the cultural and social references. When it comes to language, make sure that it suits the region you are trying to target. China is a vast country, and different parts of the country use different languages and scripts. Simplified Chinese is used in Mainland China, while Traditional Chinese is used in Hong Kong, Taiwan, and Macao. Make sure that you employ native copywriters.

Final Thoughts and Considerations

The Chinese market is very hard to conquer. You have to get all your ducks in a row to stand a chance in this market and rank on page one of Baidu. The reward is access to the 800 million people in China with internet access.