Technical SEO - Part 1: Site Structure
Search Engine Optimization (SEO) is an expansive subject that covers a multitude of disciplines and facets of your website. A key part of high-quality SEO is ensuring you follow technical SEO best practices. In this post, we will be doing a deep dive on an important component of technical SEO: structure.
Structure, in this case, refers to the HTML rendered by your web browser and web crawlers. The majority—but not all—of technical SEO that involves structure is invisible to your site visitors. The primary focus of technical SEO is on providing search engines information and context about the content of your website. Additionally, technical SEO can reduce potential ambiguity that requires more effort from search engine crawlers (e.g. adding a 301 redirect to a page you removed clearly tells search engine crawlers that you intentionally removed that page).
We will be outlining best practices for structure-related technical SEO that we believe are important for all websites. While it may not be possible to implement every one of these enhancements on your site, we encourage you to tackle as many of them as possible.
Hierarchical headings
Hierarchical headings are not only a technical SEO benefit, but also an accessibility improvement (double bonus!). Headings are semantic HTML elements broken up into levels 1-6 (h1-h6) which should be used in descending order based on the sections of the webpage. Ensuring that your site properly uses hierarchical headings allows search engine crawlers and accessibility tools, like screen readers, to easily differentiate the sections of your page.
Let's say you have a page selling dog products; your markup might look something like this:
<h1>Our Awesome Dog Products</h1>
<h2>Toys</h2>
<p>Info on our excellent toys</p>
<h3>Ropes</h3>
<p>A list of our rope toys</p>
<h3>Balls</h3>
<p>A list of our ball toys</p>
<h2>Beds</h2>
<p>Info on our luxurious dog beds</p>
<h3>Small beds</h3>
<p>A list of our small beds</p>
Structuring your headings this way makes it clear where one section of content stops and another starts (Toys and Beds), while also providing the context that the level 3 heading Ropes is associated with the Toys level 2 heading.
It's important to decouple the style of your headings from their hierarchical structure. If you want an h2 to look like an h1, you should apply a helper CSS class to get the desired style without affecting the proper heading structure. For all of our sites, we add simple helper classes like .h1, .h2, .h3, etc. to easily apply the heading level style we want without affecting the use of the appropriate heading level.
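As a rough sketch (the class names and sizes here are just our convention, not a standard), the helper classes can live alongside your headings like this:

<style>
  /* Hypothetical helper classes: apply a heading's look without changing its level */
  .h1 { font-size: 2.5rem; font-weight: 700; }
  .h2 { font-size: 2rem; font-weight: 700; }
  .h3 { font-size: 1.5rem; font-weight: 600; }
</style>

<!-- Reads as a level 2 heading to crawlers, looks like an h1 to visitors -->
<h2 class="h1">Beds</h2>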
As mentioned at the start of this section, HTML provides a maximum of six heading levels. When implementing hierarchical headings, you will rarely need to go deeper than an h4; if you find yourself reaching this level or beyond, it's a good opportunity to evaluate whether you should break your content up into larger sections that don't require nesting so many heading levels.
Bonus Tip
You should only use one h1 per page. While search engines can handle multiple h1s per page, it's best to keep it limited to one because the h1 should indicate what the page is about. If you find yourself semantically needing multiple h1s, that is an indicator that you may want to break the content up into multiple pages.
Canonicalization and Duplicate Content
Besides being a gigantic and awkward word, canonicalization is the process of letting search engines know which URL is the authoritative source for a piece of content. This concept is important for eliminating potential duplicate content issues, which can dilute the ranking of a specific page when the same content is present at multiple URLs.
A good example of when this could be an issue is if you have a staging site that isn't password protected and a search engine crawler finds it. In this instance, the sites would have nearly identical content, and search engines would have to determine which is the correct page to display in search results. By applying a canonical link annotation, you explicitly tell search engines which page should be the authority for that content.
Implementation
Adding a canonical link to your pages is fairly straightforward and is achieved with a simple tag in the <head> section of your website:
<link rel="canonical" href="https://www.domainname.com/pagepath/">
When implementing a canonical tag, always use the absolute URL of the page (that means having the domain name included) in case your site has multiple domains that don't redirect to your canonical page.
Examples of when to use canonical tags
- Parameters added to the URL for tracking or setting options like filtering (e.g. https://www.domainname.com/pagepath/?gclid=1234 should have a canonical value of https://www.domainname.com/pagepath/; see the snippet after this list)
- Staging sites that aren't password protected or hidden from search engines
- Ecommerce sites with product pages appearing in multiple catalogs that result in multiple URLs for the same product
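To make the first case above concrete: whether the page loads as /pagepath/ or /pagepath/?gclid=1234, both versions should serve the same canonical tag pointing at the clean URL (using the same placeholder domain as above):

<!-- Served on /pagepath/ and /pagepath/?gclid=1234 alike -->
<link rel="canonical" href="https://www.domainname.com/pagepath/">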
Redirects
Sometimes you need to change the URL or remove a page altogether. On a healthy website, you want to ensure that your old URLs redirect to keep your site visitors from experiencing the dreaded "404 not found" error page. The appropriate type of redirect is determined by your intentions when changing the URL or removing the page. Redirects will generally fall under one of the following status codes: 301 - Permanent Redirect, 302 - Temporary Redirect, and 410 - Gone.
301 - Permanent Redirect
A 301 Permanent Redirect is the most commonly used redirect; it informs the browser (and web crawlers) that the old URL is no longer in use and the new destination will be taking over for that page. In situations where the new URL covers a similar topic as the old URL (such as changing the URL of an existing page), your existing SEO benefits will carry through. In situations where the destination page is not on the same topic (such as removing a product you aren't selling anymore), the SEO benefits you previously had from that page won't carry over, but your visitors will still be redirected to a valid page of your choosing.
302 - Temporary Redirect
A 302 Temporary Redirect is similar to a 301 in that it redirects your users to a new URL, with the difference being that it tells web crawlers the redirect is temporary and will be going away in the near future. Some common use cases are A/B testing, sections of your site that are temporarily removed while being revamped, seasonal pages, etc.
410 - Gone
A 410 - Gone status code isn't technically a redirect but can be configured like one on many hosting providers (we use Netlify, which supports this pattern). A 410 status code is similar to a 404 in that it tells web crawlers the page no longer exists. Using a 410 instead of a 404 sends an unambiguous signal to crawlers that you intentionally removed the page and they should remove it from their index. Using a 404 will achieve the same result, but it's a longer process, since web crawlers want to ensure you didn't accidentally delete the page before removing it from their index.
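To tie the three status codes together, here is a rough sketch of how they might look in a Netlify _redirects file (the paths here are hypothetical, and other hosts use their own redirect syntax):

# 301: the page moved permanently; visitors and SEO signals pass to the new URL
/old-dog-products/    /dog-products/    301

# 302: a temporary detour while a section is being revamped
/toys/                /coming-soon/     302

# 410: the page was intentionally removed; serve a hypothetical "gone" page
/discontinued-bed/    /410.html         410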
Internal Linking
Internal links are links on your website that direct visitors to other pages on your website. You can write these links as either absolute URLs with the domain + pagepath (https://www.valitics.com/blog/) or as relative URLs (/blog/) that use only the pagepath. We highly recommend using relative URLs for internal links: when the browser sees a relative link, it resolves the pagepath against the current domain name. That way, if you change your domain name in the future, you won't have to worry about individual page redirects because the links will still work correctly.
HEADS UP
This only works for internal links on your own site. For external links (aka links to pages on other websites), you must use the absolute URL because the browser otherwise has no way of knowing which domain that pagepath lives on.
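For example (both links here are hypothetical):

<!-- Relative internal link: resolves against whatever domain the page is served from -->
<a href="/blog/">Read our blog</a>

<!-- External link: must be absolute, since the browser can't infer another site's domain -->
<a href="https://www.example.com/resources/">An external resource</a>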
Sitemap
A sitemap is a file that web crawlers can use to efficiently discover all the pages on your site that you want displayed in search results. Since a sitemap is an XML file, it isn't something you will want to create manually. Most CMS platforms and static site generators provide easy ways to automatically generate a sitemap for your site (ours is generated via 11ty and automatically excludes pages we mark as noindex).
One important consideration for generating your sitemap is to ensure only pages that you want indexed by search engines are present on it. You probably don't want the thanks page your users see after subscribing to your newsletter to show up in search results.
While most web crawlers will automatically look for a sitemap.xml file, you can also submit your sitemap via Google Search Console or Bing Webmaster Tools to speed up the process of discovering your pages.
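For reference, a minimal sitemap looks something like this (the URLs and dates are hypothetical, using the same placeholder domain as earlier):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.domainname.com/</loc>
    <lastmod>2025-01-15</lastmod>
  </url>
  <url>
    <loc>https://www.domainname.com/blog/</loc>
    <lastmod>2025-01-10</lastmod>
  </url>
</urlset>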
Bonus Tip
Ensure the last modified dates in your sitemap change only when there are substantial updates to a page. If your sitemap's lastmod dates update whenever you make typo fixes or template-level changes, crawlers will eventually start ignoring the dates.
Robots.txt
A robots.txt file contains instructions specifically intended for robots (i.e. web crawlers), hence the very clever name. The most common use case is telling web crawlers what not to crawl. You can see in our robots.txt file that we tell crawlers to ignore all pages in the /thanks/ directory because this directory contains pages that visitors should only see after filling out forms on our site. You can also include a link to your sitemap file in the robots.txt file to tell web crawlers that you have a sitemap available for them to use. If you don't have access to your robots.txt file, you can achieve a similar result by adding the noindex meta tag to the <head> section of the pages you want excluded.
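A minimal version of the setup described above might look like this (the sitemap URL uses the placeholder domain from earlier):

# Apply these rules to all crawlers
User-agent: *
# Keep post-form "thanks" pages out of the crawl
Disallow: /thanks/

# Let crawlers know where the sitemap lives
Sitemap: https://www.domainname.com/sitemap.xml

The meta tag alternative mentioned above is <meta name="robots" content="noindex"> placed in the <head> of each page you want excluded.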
HEADS UP
The robots.txt file is not a way to hide content on your site. It is simply a suggestion to web crawlers not to crawl that content, which they may ignore. If you have content that you don't want crawlable or discoverable, you should put it in a password-protected area of your site.
Social Sharing with Open Graph (OG) tags
While not strictly technical SEO, Open Graph (OG) tags are part of the structure of your site. OG tags are meta tags used to control how URLs shared from your site are displayed on social media sites like Facebook, Twitter (X), and Pinterest. Using OG tags allows you to customize the preview image, title, description, and URL displayed on social media sites that support them. We will have a future blog post doing a deep dive on OG setup, so stay tuned!
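As a quick preview before that post (the values here are hypothetical), OG tags are just meta tags in your page's <head>:

<meta property="og:title" content="Our Awesome Dog Products">
<meta property="og:description" content="Toys, beds, and more for your pup.">
<meta property="og:image" content="https://www.domainname.com/images/dog-products-preview.png">
<meta property="og:url" content="https://www.domainname.com/dog-products/">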
Final Thoughts
The proper implementation of the structure elements outlined in this post can help your technical SEO initiatives and keep your site in good shape for both robots and your visitors.