What is the difference between crawling and indexing?

Crawling describes a search engine bot fetching a page. Indexing describes the subsequent inclusion in the search index. A page can be crawled but not indexed, for example because of a noindex tag or a canonical that points to a different address.

Do small websites even need a sitemap.xml?

A sitemap is useful for small sites too, because it gives search engines a complete list of the important addresses. It is not mandatory, since bots also follow internal links. With good internal linking the effect on small sites is minor, but so is the effort.

Is load time a direct ranking factor?

Google confirms the Core Web Vitals as a ranking signal within page experience. They are one factor among many, not the main one. Among comparably relevant pages the faster one may earn the better position, yet relevant content remains the basis.

Technical SEO, the Foundations of Findable Websites

Crawlability, indexing, sitemap, robots.txt, structured data, canonicals, hreflang and load time. What technical SEO covers and how to implement it cleanly.

Published on November 25, 2025 · 7 min read

What Technical SEO Does and Does Not Do

Search engine optimization is often reduced to content and keywords. Before content can rank at all, a search engine first has to find it, fetch it and understand it. That is exactly the job of technical SEO. It creates the conditions for good content to become visible.

Three steps run in sequence. A search engine discovers an address, fetches the content (crawling) and adds it to the index (indexing). Only then can the page appear for a query. If one step breaks, even the best text is wasted.

Technical SEO does not replace good content. It gives good content a chance.

We treat these foundations as part of "thinking further". Clarifying early how a page is found and read means building on a foundation rather than on hope. How this approach connects across the four movements is described on our Mission page.

Crawlability, the Access for Search Engines

Crawlability means that a bot can reach a page and read its content. The most common obstacles are self-inflicted: content that appears only through JavaScript after load, without server-side rendering; important pages that no internal link points to; and endless filter URLs that consume crawl budget without delivering new content.

A flat, clear site structure helps the most. As a rough rule, every relevant page should be reachable from the homepage in a few clicks. Internal links are not only navigation but also the path along which a bot discovers a page.
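The "few clicks" rule can be checked mechanically. The following is a minimal sketch, assuming the internal links are already available as a mapping from each page to the pages it links to. The function name click_depths and the sample paths are hypothetical and serve only as illustration.

```python
from collections import deque

def click_depths(links: dict[str, list[str]], start: str) -> dict[str, int]:
    """Breadth-first search over internal links.

    Returns the minimum number of clicks needed to reach each page from start.
    Pages that never appear in the result are unreachable via internal links.
    """
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

# Hypothetical link graph: page -> pages it links to.
site = {
    "/": ["/services", "/blog"],
    "/services": ["/services/seo"],
    "/blog": ["/blog/technical-seo"],
    "/orphan": ["/"],  # links out, but no page links to it
}

depths = click_depths(site, "/")
orphans = sorted(set(site) - set(depths))
print(depths)
print("Not reachable from the homepage:", orphans)
```

Pages with a high depth are candidates for additional internal links, and pages missing from the result are exactly the ones a bot cannot discover by following links.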

robots.txt Controls Access

The robots.txt file sits in the root directory of a domain and tells bots which areas they should not fetch. It is a directive to cooperating crawlers, not a security mechanism and not an indexing command.

A common and costly mistake is confusing robots.txt with noindex. If a page is blocked via robots.txt, the bot cannot fetch it and therefore cannot read a noindex on the page either. A blocked page can then still surface in the results, for example because other pages link to it, only without a useful description. To remove a page from the index, the noindex belongs on the page, and the bot must be allowed to fetch it.
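Whether a given address is blocked can be tested before deployment. The sketch below uses Python's standard urllib.robotparser module. The robots.txt content, the paths and the bot name ExampleBot are assumptions for illustration, and the parser in the standard library may treat edge cases differently than a particular search engine does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def is_fetchable(path: str, agent: str = "ExampleBot") -> bool:
    """True if the rules allow the given agent to fetch the path."""
    return parser.can_fetch(agent, path)

for path in ["/blog/technical-seo", "/admin/login"]:
    if is_fetchable(path):
        print(path, "can be fetched, so a noindex on the page can be read")
    else:
        print(path, "is blocked, so a noindex on the page would never be seen")
```

A check like this helps with the case described above: any page that is supposed to drop out of the index via noindex has to pass the fetch test first.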

Indexing, the Inclusion in the Search Index

Only indexed pages can appear in the results. Whether a page gets indexed depends on several signals that need to agree. The table below maps the most important tools to their effect.

Tool	Effect	Typical use
robots.txt Disallow	Prevents fetching, not indexing	Admin areas, internal search results
Meta noindex	Prevents inclusion in the index	Thank-you pages, thin filter pages
Canonical	Consolidates duplicates onto one main address	Sorting and filter variants
sitemap.xml	Suggests important addresses for crawling	All indexable pages

A contradiction between these signals leads to unpredictable results. A page that is listed in the sitemap but carries a noindex sends two opposing statements. Clean technical SEO means that all signals tell the same story.
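Such contradictions can be found with a few simple rules. The following is a sketch, assuming the signals of each page have already been collected by a crawl. The PageSignals structure and its field names are hypothetical, and the rule set is deliberately small rather than complete.

```python
from dataclasses import dataclass

@dataclass
class PageSignals:
    url: str
    in_sitemap: bool
    noindex: bool
    blocked_by_robots: bool
    canonical: str  # target of rel="canonical"; equals url when self-referencing

def find_contradictions(page: PageSignals) -> list[str]:
    """Return a description for every pair of signals that disagree."""
    issues = []
    if page.in_sitemap and page.noindex:
        issues.append("listed in sitemap but carries noindex")
    if page.in_sitemap and page.canonical != page.url:
        issues.append("listed in sitemap but canonical points elsewhere")
    if page.blocked_by_robots and page.noindex:
        issues.append("noindex cannot be read because robots.txt blocks fetching")
    if page.noindex and page.canonical != page.url:
        issues.append("noindex combined with a canonical to another address")
    return issues

# Hypothetical thank-you page that was left in the sitemap by mistake.
thanks = PageSignals(
    url="https://example.com/thanks",
    in_sitemap=True,
    noindex=True,
    blocked_by_robots=False,
    canonical="https://example.com/thanks",
)
print(find_contradictions(thanks))
```

An empty list means the checked signals tell the same story for that page.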

sitemap.xml and How to Maintain It

An XML sitemap is a machine-readable list of the addresses that should go into the index. It does not replace good internal linking, but it complements it, especially on large or new sites whose content would otherwise be discovered late.

What matters is that a sitemap stays consistent. It should contain only addresses that are actually indexable. A few rules keep it clean.

Include only pages that return status 200 and are indexable.
Do not list addresses excluded by noindex or canonical.
At most 50,000 addresses or 50 MB uncompressed per sitemap, otherwise split into several files with an index sitemap.
Reference the sitemap in robots.txt and submit it in Search Console.
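The first three rules can be expressed in a small generator. This is a sketch using Python's standard xml.etree.ElementTree, assuming each page is described by a dictionary with the hypothetical keys url, status, noindex and canonical. It splits by address count only and does not check the 50 MB size limit.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000

def indexable_urls(pages: list[dict]) -> list[str]:
    """Keep pages that return 200, carry no noindex and are their own canonical."""
    return [
        p["url"]
        for p in pages
        if p["status"] == 200 and not p["noindex"] and p["canonical"] == p["url"]
    ]

def build_sitemaps(urls: list[str], max_urls: int = MAX_URLS_PER_FILE) -> list[str]:
    """Return one XML string per sitemap file, each with at most max_urls entries."""
    files = []
    for start in range(0, len(urls), max_urls):
        urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
        for url in urls[start:start + max_urls]:
            ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
        files.append(ET.tostring(urlset, encoding="unicode"))
    return files

def build_index(sitemap_urls: list[str]) -> str:
    """Return an index sitemap that references the individual sitemap files."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        ET.SubElement(ET.SubElement(index, "sitemap"), "loc").text = url
    return ET.tostring(index, encoding="unicode")

# Small limit only to make the splitting visible in a demo.
demo = build_sitemaps([f"https://example.com/p{i}" for i in range(5)], max_urls=2)
print(len(demo), "sitemap files")
```

The index sitemap is only needed once build_sitemaps returns more than one file. Its address is then the one to reference in robots.txt and submit in Search Console.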

The lastmod field is a useful signal only if it is maintained honestly. A date that is reset to the current day on every build loses its meaning, because it no longer indicates a real change in content.
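One way to keep lastmod honest is to tie it to a hash of the content instead of the build time. The following is a sketch of that idea, assuming a small record per page is stored between builds. The record layout and the function name update_lastmod are hypothetical.

```python
import hashlib
from datetime import date

def update_lastmod(record: dict, content: str, today: date) -> dict:
    """Return the stored record unchanged unless the content hash differs."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    if record.get("hash") == digest:
        return record  # same content, so lastmod stays as it was
    return {"hash": digest, "lastmod": today.isoformat()}

first = update_lastmod({}, "Original text", date(2025, 11, 25))
rebuilt = update_lastmod(first, "Original text", date(2025, 12, 1))
edited = update_lastmod(rebuilt, "Revised text", date(2025, 12, 1))
print(first["lastmod"], rebuilt["lastmod"], edited["lastmod"])
```

What counts as "content" is a design decision. Hashing the main text rather than the full HTML avoids a new date for every change in the footer or a script tag.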

Structured Data with Schema.org

Structured data describes the content of a page in a machine-readable format. The widely used vocabulary for this is Schema.org, and the recommended notation is JSON-LD in a script block in the source. With it a section can be marked up as an article, an organization, a product or an FAQ.

The benefit is twofold. Search engines understand the context more reliably, and suitable markup can enable rich results, such as rating stars or expandable questions. There is no guarantee of such displays, since the search engine decides on its own whether and how to show them.

Two principles are binding here. Only mark up what is actually visible on the page. And the markup must match the real content. Structured data that claims something other than the visible page counts as a guideline violation and can lead to manual actions.
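A JSON-LD block for an article could be generated as in the sketch below. The properties shown are a small subset of the Schema.org Article type. The author name "Example Agency" is a placeholder, and the function name article_jsonld is hypothetical. All values should come from what is visible on the page.

```python
import json

def article_jsonld(headline: str, date_published: str, author_name: str) -> str:
    """Build a script block with Schema.org Article markup in JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "datePublished": date_published,
        "author": {"@type": "Organization", "name": author_name},
    }
    body = json.dumps(data, ensure_ascii=False, indent=2)
    # Escape "<" so a value can never close the script element early.
    body = body.replace("<", "\\u003c")
    return f'<script type="application/ld+json">\n{body}\n</script>'

print(article_jsonld(
    "Technical SEO, the Foundations of Findable Websites",
    "2025-11-25",
    "Example Agency",  # placeholder, not taken from the article
))
```

Generating the markup from the same data that renders the visible page is the simplest way to keep both in agreement.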

Canonicals and hreflang, Order for Duplicates and Languages

The same or very similar content under several addresses dilutes ranking signals, because they scatter across the variants. The canonical tag (rel="canonical") names the preferred address and consolidates the signals there. Typical cases are sorting and filter parameters, print views or the same page with and without a trailing slash.

A canonical is a hint, not a command. It works reliably only when the remaining signals agree with it, meaning internal links point to the canonical address and that address is in the sitemap. A page that points to another via canonical should not additionally set itself to noindex, otherwise the statements contradict each other.
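The typical cases above can be mapped to one preferred address with a small normalization step. This is a sketch under assumptions: the parameter names in IGNORED_PARAMS are hypothetical, and choosing the variant without a trailing slash as the preferred form is a convention of this example, not a requirement.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that only change the presentation, not the content.
IGNORED_PARAMS = {"sort", "filter", "print"}

def canonical_url(url: str) -> str:
    """Drop presentation parameters, the fragment and a trailing slash."""
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in IGNORED_PARAMS
    ]
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme, parts.netloc, path, urlencode(query), ""))

def canonical_tag(url: str) -> str:
    """Render the link element for the head of the page."""
    return f'<link rel="canonical" href="{escape(canonical_url(url))}">'

print(canonical_tag("https://example.com/shoes/?sort=price&page=2"))
```

Using the same function for internal links and for the sitemap keeps the remaining signals aligned with the canonical address.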

hreflang for Multilingual Sites

With several language or country versions, hreflang tells the search engine which version belongs to which language and region. That way a user in a German-speaking region gets the German variant and a user in an English-speaking region gets the English variant, instead of both versions competing as duplicates. Three points decide the effect.

The values follow the language code per ISO 639-1, optionally with a country code per ISO 3166-1, for example de or en-US.
The references are reciprocal. If the German page points to the English one, the English page must point back to the German one.
An entry with x-default names the default version for languages not covered.

A bilingual site with a German and an English version, like the ones we build, needs a complete, mutual hreflang set for every indexable page. If the return reference is missing, search engines may ignore the annotation.
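Missing return references are easy to detect once the annotations of all pages are collected. The following is a sketch, assuming the data is available as a mapping from each page address to its hreflang entries. The function name missing_return_links and the example addresses are hypothetical.

```python
def missing_return_links(annotations: dict[str, dict[str, str]]) -> list[tuple[str, str]]:
    """Find one-way hreflang references.

    annotations maps a page URL to {hreflang value: target URL}.
    Returns (page, target) pairs where target does not point back to page.
    """
    missing = []
    for page, alternates in annotations.items():
        # Self-references need no return link, so they are skipped.
        for target in sorted(set(alternates.values()) - {page}):
            if page not in annotations.get(target, {}).values():
                missing.append((page, target))
    return missing

DE = "https://example.com/de/"
EN = "https://example.com/en/"

pages = {
    DE: {"de": DE, "en": EN, "x-default": EN},
    EN: {"en": EN},  # the reference back to the German page is missing
}
print(missing_return_links(pages))
```

An empty result means every reference in the set is reciprocal. The check does not validate the language and country codes themselves.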

Load Time as a Ranking Factor

Speed is a confirmed ranking signal, measured through the Core Web Vitals. Three metrics capture the user experience while loading and interacting. The thresholds below mark the "good" range according to Google.

Metric	Measures	Good threshold
LCP (Largest Contentful Paint)	Load time of the largest visible element	2.5 seconds or less
INP (Interaction to Next Paint)	Response time to input	200 milliseconds or less
CLS (Cumulative Layout Shift)	Visual stability during load	0.1 or less

In March 2024 INP replaced the older metric FID as a Core Web Vital. It measures responsiveness across the whole page visit, not only on the first interaction. The biggest levers are usually unspectacular: images at the right size and in a modern format, less and smaller JavaScript, server-side or edge rendering instead of heavy client logic, and reserved space for elements that load later, which prevents layout shifts.

The data basis matters. Lab values from a test show the potential under controlled conditions, field data from real visits show the actual experience. For an assessment the field data count, because they reflect real devices and networks.
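Field data can be compared against the thresholds from the table with a few lines of code. The sketch below assumes raw measurements per metric are available, which is not the case with every data source. Google assesses the 75th percentile of page loads. The nearest-rank method used here to compute it is an assumption of this example, as is treating a value exactly on the threshold as good.

```python
import math

# "Good" thresholds: LCP in seconds, INP in milliseconds, CLS unitless.
GOOD = {"LCP": 2.5, "INP": 200.0, "CLS": 0.1}

def p75(samples: list[float]) -> float:
    """75th percentile by the nearest-rank method."""
    if not samples:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples)
    rank = math.ceil(0.75 * len(ordered))
    return ordered[rank - 1]

def assess(field_data: dict[str, list[float]]) -> dict[str, bool]:
    """True per metric if its 75th percentile lies within the good range."""
    return {metric: p75(values) <= GOOD[metric] for metric, values in field_data.items()}

# Hypothetical measurements from real visits.
visits = {
    "LCP": [1.8, 2.1, 2.4, 3.9],
    "INP": [90, 120, 180, 260, 310],
    "CLS": [0.02, 0.05, 0.08, 0.3],
}
print(assess(visits))
```

In this sample a single slow visit does not fail LCP, while INP fails because more than a quarter of the visits respond too slowly.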

How These Building Blocks Work Together

Technical SEO is not a one-off project but a state that is maintained. Crawlability opens the access, indexing signals determine inclusion, sitemap and internal links steer discovery, structured data sharpens understanding, canonical and hreflang bring order to duplicates and languages, and load time improves the position when content is otherwise equal.

The most common mistake is not a missing detail but a contradiction between signals. A page where robots.txt, noindex, canonical, sitemap and internal links make the same statement is unambiguous to a search engine. Unambiguity is the real ranking advantage here.

For a concrete inventory, an existing site can be checked against exactly these points. A technical SEO audit surfaces contradictions and blocked areas before they cost visibility. We clarify the starting point in a conversation about the project.

How cleanly is a website set up on the technical side? We check crawlability, indexing and load time and point out concrete starting points.