What is technical SEO?
Technical SEO is the area of SEO focused on identifying and fixing technical issues on websites and servers that prevent search engine spiders from crawling and indexing sites efficiently and effectively. Technical SEOs help search engines access, crawl, interpret and index websites correctly.
What is the difference between Crawling and Indexing?
Google follows different steps to find your pages and show them in Search results.
- The first step is crawling, which can be defined as the process where Googlebot discovers new or updated pages to be added to Google’s index. Googlebot uses an algorithmic process to determine which sites to crawl, how often, and how many pages to fetch from each site. Google’s crawl process begins with a list of web page URLs, generated from previous crawl processes and augmented by sitemaps provided by webmasters in Google Search Console. As Googlebot visits each of these pages, it detects links on the page and adds them to the queue of pages it needs to crawl.
- The second step is indexing, which is when Google tries to understand what the page is about. When Google finds a page, Google’s systems render its content just as a browser does. Google processes each page it crawls in order to compile a massive index of all the words it sees, along with content freshness, location and other signals that help it understand the page. The information is stored in the Google Index, a huge database spread across many servers.
- The third step is serving and ranking, which is when a user does a search and Google tries to find the most relevant answer from its index based on many factors. In a fraction of a second, Google sorts through billions of websites in its index and determines the best results for a search by considering things such as the user’s location, language and device.
What is a robots.txt file?
The robots.txt file tells search engine robots which pages on the site they shouldn’t crawl. If the robots.txt file includes a link to the sitemap, it helps robots identify the pages they need to crawl.
How to access robots.txt?
The robots.txt file should always be located at the root of your domain, for example: https://www.example.com/robots.txt
How to check whether your robots.txt is working or not?
In order to test your robots.txt, you can use Google’s robots.txt tester tool. You need to have your website verified in Google Search Console in order to use this tool. The tool will detect the robots.txt file on your site, and you can then use the form to test whether the URL you want to block is actually blocked.
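You can also test robots.txt rules locally. Here is a minimal sketch using Python’s standard-library `urllib.robotparser`; the robots.txt contents and URLs are hypothetical examples:

```python
from urllib import robotparser

# Hypothetical robots.txt contents for example.com, used only for illustration.
robots_txt = """\
User-agent: *
Disallow: /admin/
"""

rp = robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

# Check whether a given user agent is allowed to fetch a URL.
print(rp.can_fetch("*", "https://www.example.com/admin/login"))  # False
print(rp.can_fetch("*", "https://www.example.com/page1.html"))   # True
```

In practice you would point `RobotFileParser` at your live file with `rp.set_url("https://www.example.com/robots.txt")` followed by `rp.read()`.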
What should be in a robots.txt file?
You should include any directory or page that you do not want search engines to crawl. For example:
- Admin pages
- Staging pages / website
- Internal search result pages
- Paid advertisement
There is a limit to the robots.txt file, it should not be larger than 500 KB.
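Putting the rules above together, a minimal robots.txt might look like this (the directory paths are hypothetical examples):

```txt
User-agent: *
Disallow: /admin/
Disallow: /staging/
Disallow: /search/

Sitemap: https://www.example.com/sitemap.xml
```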
What is an XML sitemap?
A good XML sitemap lists all of the canonical URLs on your website, helping search engine bots find the most important pages on your site.
Sitemap requirements and best practices:
- It should include a maximum of 50,000 URLs or 50 MB (uncompressed). If you have more, you can use multiple sitemaps to list all of your URLs.
- You can include alternate language versions of URLs by using hreflang annotations.
- Must be UTF-8 encoded.
- Do not include noindexed pages.
- Do not include pages that redirect or respond with 404.
- Do not include canonicalized pages: these are pages that include a canonical tag pointing to a different page.
- Don’t include session IDs in URLs.
- Use consistent, fully qualified URLs. This means that if you are using https and www in your site, every URL should follow that format. For example: https://www.example.com, https://www.example.com/page1.html
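Following those rules, a minimal XML sitemap might look like this (the URLs and date are hypothetical examples):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2023-01-01</lastmod>
  </url>
  <url>
    <loc>https://www.example.com/page1.html</loc>
  </url>
</urlset>
```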
How to create a sitemap (any site vs WordPress)?
There are many tools that can help you create your sitemap, see full list on this article. If you are using a CMS such as WordPress, there are several plugins that can create your sitemaps and keep them updated as you publish new content.
How to submit a sitemap to Google?
- Sign in to Google Search Console.
- If you have not added your site, you should add your site and verify it following Google’s instructions.
- Select your site from the dropdown menu on the top left.
- Click on Sitemaps in the left menu.
- Add your sitemap and click on submit.
How to find the sitemap.xml on the site?
- Check your robots.txt file to see if your sitemap’s URL has been included. It is best practice to link to the sitemap from the robots.txt file.
- You can also check Google Search Console to see if a sitemap has been submitted.
- Perform this search in Google “site:yourdomain.com filetype:xml”
What is first contentful paint (FCP)?
First Contentful Paint or FCP is a metric for measuring load speed, which measures the time from navigation to the time when the browser renders the first bit of content from the DOM.
What is first meaningful paint (FMP)?
First Meaningful Paint or FMP is a metric for measuring load speed, which measures the time from navigation to the time when the page’s primary content appears on the screen.
What is time to interactive ?
Time to Interactive or TTI is a load-speed metric that measures how long it takes a page to become ready for users to interact with it.
What is time to first byte (TTFB)?
Time to first byte is a measure of how long it takes a browser to receive the first byte of the initial response from your server.
When a user visits your site, that user’s browser sends a request to the server where your website is hosted. The amount of time the browser waits for the server to respond is the TTFB.
What is cache?
A cache is a small amount of memory used to temporarily store data that will be reused. There are different types of cache:
- Browser Cache – helps users quickly navigate through websites they have recently visited. It will store files such as images of a site so that they do not need to be loaded each time you visit the website.
- Data Caching – usually used by CMS solutions, helps websites or applications load faster, giving users a better experience. It does this by storing data sets that do not change frequently in local memory on the server, avoiding extra trips to the website’s database.
- Output Caching – can store rendered pages. This eliminates the need to generate the same page again and again for each request.
- Distributed Caching – stores data across multiple nodes or servers within a network. It makes resources available quicker and globally, allowing scalability and steady performance.
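The data-caching idea above can be sketched in a few lines of Python using the standard library’s `functools.lru_cache`. The function and its return value are hypothetical stand-ins for an expensive database query:

```python
from functools import lru_cache

@lru_cache(maxsize=128)
def fetch_site_settings(site_id):
    # Stand-in for an expensive database query (hypothetical example).
    # The body only runs on a cache miss.
    return {"site_id": site_id, "theme": "default"}

fetch_site_settings(1)  # cache miss: the "query" runs
fetch_site_settings(1)  # cache hit: result served from memory
print(fetch_site_settings.cache_info())
```

`cache_info()` reports the hits and misses, so you can verify the second call never touched the "database".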
What is a CDN?
CDN stands for content delivery network. As the name suggests, it is a network of servers that are geographically distributed and work together to deliver pages and other web content to users based on each user’s location. This means that the server closest to the user’s location responds and serves the content, making the website load faster.
How does a CDN work?
CDNs reduce the time between the user’s browser submitting a request for a web page to a server and the web page fully loading, by reducing the physical distance that the request has to travel.
What is a link (hyperlink)?
A link or hyperlink is a reference that connects pages together. Users and search engine bots can follow hyperlinks to navigate from one page to another. Users can click or tap on links that can be highlighted words or images on the screen and usually point to a whole page or to a specific element in a page.
What is deep linking?
A deep link is a hyperlink that links to a specific piece of content on a website rather than the website’s home page. This SEO strategy can help with user engagement and conversions as well as improve the crawl efficiency of the website.
What is a nofollow link?
Nofollow links have a rel="nofollow" HTML attribute applied to them. This attribute tells search engines to ignore the link.
What is an affiliate link?
Affiliates promote a product or service and earn a commission whenever a sale is made. Affiliate links use a specific URL that contains the affiliate’s ID so that any purchases done through those links are attributed to the affiliate.
What is anchor text?
Anchor text is the clickable text in a hyperlink.
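The link concepts above can be illustrated in HTML; the URLs and link text are hypothetical examples:

```html
<!-- "best running shoes" is the anchor text of this link -->
<a href="https://www.example.com/shoes">best running shoes</a>

<!-- a nofollow link, e.g. for a paid placement -->
<a href="https://www.example.com/sponsored" rel="nofollow">sponsored link</a>
```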
What does URL stand for?
URL stands for Uniform Resource Locator. It is the address of a webpage, which specifies the location of the content.
What is a canonical URL?
A canonical URL is the URL that has been selected as the preferred version from a group of similar pages. Adding canonical tags on URLs helps webmasters prevent duplicate content issues in SEO by specifying which is the preferred version of a web page. Search engines will understand that certain similar URLs are actually the same. The canonical URL can be specified with a link rel="canonical" element in the head of the HTML page.
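For example, every variant of a page would carry the same canonical element in its head (the URL is a hypothetical example):

```html
<head>
  <link rel="canonical" href="https://www.example.com/page1.html" />
</head>
```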
What do the different server response codes mean?
Below we’ve included the most common server response codes with their definitions:
Any code starting with a 2 is a successful server response.
- 200 (OK) – Standard response for a successful HTTP request.
Any code starting with a 3 is a redirection message.
- 301 (Moved Permanently) – The URL has changed permanently and the new URL is given in the response.
- 302 (Found) – This response code means that the URI requested has been changed temporarily.
- 307 (Temporary Redirect) – The server sends this response to direct clients to get the requested resource at another URI. This has the same semantics as the 302 response code, except that the client must not change the HTTP method when making the redirected request.
Any code starting with a 4 is a client error response.
- 403 (Forbidden) – The client does not have access rights to the content so the server refuses to give the requested resource.
- 404 (Not Found) – The server cannot find the requested resource.
- 410 (Gone) – This response is sent when the requested content has been permanently deleted from the server, with no forwarding address.
Any code starting with a 5 is a server error.
- 500 (Internal Server Error) – The server has encountered a situation it doesn’t know how to handle.
- 503 (Service Unavailable) – The server is not ready to handle the request. Common causes are a server that is down for maintenance or that is overloaded.
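Python’s standard library ships these codes and their reason phrases in `http.HTTPStatus`, which is a quick way to look one up:

```python
from http import HTTPStatus

# Print the standard reason phrase for each code discussed above.
for code in (200, 301, 302, 307, 403, 404, 410, 500, 503):
    print(code, HTTPStatus(code).phrase)
```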
What is the .htaccess file?
.htaccess is a configuration file used by Apache-based servers. Its main function is to configure website-access rules such as redirects, access control and more.
Where is the .htaccess file?
The .htaccess file is usually located in the public_html folder (your site’s root directory). You can access your .htaccess file in a few different ways:
- FTP client
- Hosting account’s file management (such as cPanel)
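As an illustration of the redirect rules mentioned above, a .htaccess file might contain directives like these (the paths are hypothetical examples, and the rewrite rules assume mod_rewrite is enabled):

```apache
# Permanently redirect an old URL to a new one (301)
Redirect 301 /old-page.html https://www.example.com/new-page.html

# Force HTTPS for all requests
RewriteEngine On
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
```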