---
title: "What is Markdown and why LLMs prefer it over HTML"
description: "What markdown is, why AI systems prefer it over HTML, and how serving your content as clean markdown makes your website visible to ChatGPT, Claude, and Perplexity."
image: https://www.mo.agency/hubfs/02%20-%20MO%20-%20Blogs%202026%20-%20What%20is%20markdown%20and%20why%20LLMs%20prefer%20it%20over%20HTML%20-%20V2.png
canonical: https://www.mo.agency/blog/what-is-markdown-why-llms-prefer-it
url: https://ai.mo.agency/blog/what-is-markdown-why-llms-prefer-it.md
last_converted: 2026-05-25T21:14:17.492Z
---

```json
{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "BlogPosting",
      "headline": "What is Markdown and why LLMs prefer it over HTML",
      "description": "What markdown is, why AI systems prefer it over HTML, and how serving your content as clean markdown makes your website visible to ChatGPT, Claude, and Perplexity.",
      "url": "https://www.mo.agency/blog/what-is-markdown-why-llms-prefer-it",
      "datePublished": "2026-04-15T13:01:01+02:00",
      "dateModified": "2026-04-20T10:09:21+02:00",
      "image": "https://2697939.fs1.hubspotusercontent-na1.net/hubfs/2697939/02%20-%20MO%20-%20Blogs%202026%20-%20What%20is%20markdown%20and%20why%20LLMs%20prefer%20it%20over%20HTML%20-%20V2.png",
      "author": {
        "@type": "Person",
        "name": "Luke Marthinusen"
      },
      "publisher": {
        "@type": "Organization",
        "name": "MO Agency",
        "logo": {
          "@type": "ImageObject",
          "url": "https://www.mo.agency/hubfs/MO%20-%20Logo%20-%20Dark%20Blue.svg"
        }
      },
      "mainEntityOfPage": {
        "@type": "WebPage",
        "@id": "https://www.mo.agency/blog/what-is-markdown-why-llms-prefer-it"
      }
    },
    {
      "@type": "BreadcrumbList",
      "itemListElement": [
        {
          "@type": "ListItem",
          "position": 1,
          "name": "Home",
          "item": "https://www.mo.agency/"
        },
        {
          "@type": "ListItem",
          "position": 2,
          "name": "Blog",
          "item": "https://www.mo.agency/blog"
        },
        {
          "@type": "ListItem",
          "position": 3,
          "name": "What Is Markdown Why Llms Prefer It"
        }
      ]
    }
  ]
}
```

# What is Markdown and why LLMs prefer it over HTML

Apr 15, 2026

Luke Marthinusen

![what is markdown or .md](https://www.mo.agency/hs-fs/hubfs/02%20-%20MO%20-%20Blogs%202026%20-%20What%20is%20markdown%20and%20why%20LLMs%20prefer%20it%20over%20HTML%20-%20V2.png?width=1200&height=600&name=02%20-%20MO%20-%20Blogs%202026%20-%20What%20is%20markdown%20and%20why%20LLMs%20prefer%20it%20over%20HTML%20-%20V2.png)

If you work in marketing, content, or business, you may have heard that AI systems prefer markdown. But what does that actually mean? What is markdown, why do large language models prefer it, and what does it look like in practice?

This article explains markdown from the ground up - no developer background required - and makes the case for why it's the most important format shift happening in web content today.

## What markdown actually is

Markdown is a lightweight text formatting language created by John Gruber in 2004. It uses simple characters to indicate structure:

- `#` for a heading

- `##` for a subheading

- `**text**` for bold

- `-` for a bullet point

- `[link text](url)` for a hyperlink

That's essentially it. Markdown is designed to be readable as plain text *and* parseable by machines. You've probably already used it without knowing - it's the formatting system behind GitHub, Notion, Slack messages, Reddit posts, and most developer documentation.

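Here's a heading and paragraph in HTML, roughly as a CMS might output it (the class names are illustrative):

```
<div class="section-wrapper">
  <div class="content-block">
    <h2 class="section-heading">Our services</h2>
    <p class="body-text">We provide end-to-end
    HubSpot implementation for growth companies.</p>
  </div>
</div>
```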

Here's the same content in markdown:

```

## Our services

We provide end-to-end HubSpot implementation for growth companies.
```

Same information. A fraction of the characters. Zero ambiguity about structure.

## Why LLMs prefer markdown: the economics

Large language models process text in units called tokens. Every token costs compute - processing power, memory, electricity. When a model like ChatGPT or Claude evaluates a web page to decide whether to cite it in an answer, every wasted token is money spent on noise instead of signal.

A typical CMS page - whether it's HubSpot, WordPress, Webflow, or Shopify - wraps your actual content in layers of HTML that serve the visual layout:

| Component | Typical token cost |
| --- | --- |
| Navigation menus | 800 – 2,000 tokens |
| Footer with links and scripts | 500 – 1,500 tokens |
| CSS classes and data attributes | 2,000 – 4,000 tokens |
| SVG icons and inline styles | 500 – 2,000 tokens |
| Nested div structures | 1,000 – 3,000 tokens |
| Your actual content | 2,000 – 4,000 tokens |
| Total HTML page (heavy end of the ranges) | ~16,000 tokens |

The same content in clean markdown: roughly 3,000 tokens. That's a reduction of about 80%.
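The arithmetic behind those figures can be checked in a few lines. The sketch below is illustrative only: the ranges are copied from the table, the function names are our own, and token counts for a real page will vary with the tokenizer and the site template.

```python
# Token ranges copied from the table above: (low, high) per component.
COMPONENTS = {
    "Navigation menus": (800, 2000),
    "Footer with links and scripts": (500, 1500),
    "CSS classes and data attributes": (2000, 4000),
    "SVG icons and inline styles": (500, 2000),
    "Nested div structures": (1000, 3000),
    "Your actual content": (2000, 4000),
}


def total_range(components):
    """Sum the low ends and the high ends of every component range."""
    low = sum(lo for lo, _ in components.values())
    high = sum(hi for _, hi in components.values())
    return low, high


def reduction(html_tokens, markdown_tokens):
    """Fraction of tokens saved by serving markdown instead of HTML."""
    return 1 - markdown_tokens / html_tokens


low, high = total_range(COMPONENTS)
print(f"HTML page: {low:,} to {high:,} tokens")
print(f"Saving at 16,000 vs 3,000 tokens: {reduction(16_000, 3_000):.0%}")
```

Summing the table's ranges gives 6,800 to 16,500 tokens, so the ~16,000 figure describes a page at the heavy end, and 3,000 tokens against 16,000 works out to a saving of about 81%.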

For an AI system evaluating millions of pages to answer a query, this difference is enormous. A page that delivers pure content in 3,000 tokens wins over one that buries the same content in 16,000 tokens of layout chrome. It's faster to process, cheaper to consume, and clearer to understand.

## Why LLMs prefer markdown: signal clarity

Beyond raw efficiency, markdown gives AI systems clearer structural signals.

In HTML, a heading might be any of these:

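```
<h2>Our services</h2>
<div class="heading-large">Our services</div>
<span style="font-size: 24px; font-weight: bold;">Our services</span>
<p class="section-title"><strong>Our services</strong></p>
```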

An AI system has to *infer* that all four of these are headings - using CSS class names, inline styles, or surrounding context as clues.

In markdown, a heading is always explicit:

```

## Our services
```

There's no ambiguity. The `##` means "this is a second-level heading." Period. Lists are always `-` or `1.`. Links are always `[text](url)`. Bold is always `**text**`. The structure is semantic by default.

This matters because AI systems extract information more confidently from clearly structured content. The easier your page is to parse, the more likely it appears in AI summaries and the more accurately it's represented.
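To illustrate how little work explicit markers leave for a parser, here is a minimal sketch that pulls a page outline out of markdown with one regular expression. The function name `outline` and the sample text are our own, and the sketch only handles `#`-style headings, not every heading form markdown allows.

```python
import re

# '#'-style markdown heading: 1 to 6 '#' characters, whitespace, then the title.
HEADING = re.compile(r"^(#{1,6})\s+(.+?)\s*$")
FENCE = "`" * 3  # code-fence marker, built this way to keep the example readable


def outline(markdown_text):
    """Return (level, title) pairs for every heading outside code fences."""
    headings = []
    in_fence = False
    for line in markdown_text.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # toggle on an opening or closing fence
            continue
        if in_fence:
            continue  # a '#' inside a code block is not a heading
        match = HEADING.match(line)
        if match:
            headings.append((len(match.group(1)), match.group(2)))
    return headings


sample = """# HubSpot Solutions

## Our services

We provide end-to-end HubSpot implementation for growth companies.
"""
print(outline(sample))
```

Doing the same for the HTML variants above would need class-name and style heuristics, which is exactly the inference step that markdown removes.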

## What a .md file actually looks like

When your website serves a page as a `.md` file, it looks like this:

```
---
title: "HubSpot Solutions"
description: "Migration, implementation, integrations, support,
and rescue & rehab — everything you need to maximise HubSpot."
canonical: https://www.mo.agency/solutions/hubspot
url: https://ai.mo.agency/solutions/hubspot.md
last_converted: 2026-04-03T09:05:42.573Z
---
```

That block at the top is called YAML frontmatter. It's metadata that the AI system reads *before* the content itself:

- **title** tells the AI what the page is about before reading a single paragraph

- **description** provides context

- **canonical** points back to the original HTML page - this is not cloaking; the relationship is explicit

- **url** is the markdown endpoint itself

- **last_converted** is a freshness signal showing when the content was last updated

Below the frontmatter is the actual content - clean markdown with headings, paragraphs, links, and lists. No navigation, no scripts, no tracking pixels, no nested divs. Just content.
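As a rough sketch of how a consumer might separate the metadata from the content, the snippet below splits a `.md` document at the `---` delimiters. It is an assumption-laden simplification: `split_frontmatter` is a hypothetical helper, it only understands flat, single-line `key: value` pairs, and a production pipeline would more likely pass the block to a full YAML parser.

```python
def split_frontmatter(md_text):
    """Split a .md document into (metadata dict, body string).

    Handles only flat, single-line `key: value` pairs; multi-line
    values and nested YAML are out of scope for this sketch.
    """
    lines = md_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, md_text  # no frontmatter present
    try:
        end = lines.index("---", 1)  # position of the closing delimiter
    except ValueError:
        return {}, md_text  # unterminated block: treat everything as content
    meta = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")  # split at the first colon only
        if sep:
            meta[key.strip()] = value.strip().strip('"')
    body = "\n".join(lines[end + 1:]).lstrip("\n")
    return meta, body


page = """---
title: "HubSpot Solutions"
canonical: https://www.mo.agency/solutions/hubspot
url: https://ai.mo.agency/solutions/hubspot.md
last_converted: 2026-04-03T09:05:42.573Z
---

## Our services
"""
meta, body = split_frontmatter(page)
print(meta["title"], "->", meta["canonical"])
print(body)
```

Splitting at the first colon keeps URLs and timestamps intact, since both contain colons of their own.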

## How the discovery mechanism works

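Your HTML pages can tell AI agents that a markdown version exists using a standard HTML `<link>` tag, often called a discovery tag:

```
<link rel="alternate" type="text/markdown" href="https://ai.mo.agency/solutions/hubspot.md">
```

This goes in the `<head>` section of every page on your site, with the `href` pointing at that page's own markdown URL. It tells AI agents: "there's a clean markdown version of this page available at this URL."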

This isn't new technology. It's the same mechanism that's been used for RSS feed discovery since 2003. It's a [proven web standard](https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/Attributes/rel#alternate) applied to a new use case. When an AI agent reads your HTML page, sees the alternate link, and fetches the markdown version instead, it gets clean content at a fraction of the token cost.
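From the agent's side, discovery could look something like the sketch below, which uses only Python's standard library. It assumes the tag is a `<link rel="alternate">` carrying `type="text/markdown"`; that attribute value, the class name `AlternateLinkFinder`, and the function `find_markdown_alternate` are our assumptions for illustration, not a description of how any particular AI crawler works.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AlternateLinkFinder(HTMLParser):
    """Collect href values from <link rel="alternate" type="text/markdown">."""

    def __init__(self):
        super().__init__()
        self.markdown_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = {name: (value or "") for name, value in attrs}
        rels = attr.get("rel", "").lower().split()
        is_markdown = attr.get("type", "").lower() == "text/markdown"
        if "alternate" in rels and is_markdown and attr.get("href"):
            self.markdown_urls.append(attr["href"])


def find_markdown_alternate(html, page_url):
    """Return the absolute markdown URL advertised by the page, or None."""
    finder = AlternateLinkFinder()
    finder.feed(html)
    if not finder.markdown_urls:
        return None
    return urljoin(page_url, finder.markdown_urls[0])  # resolve relative hrefs


html = """<html><head>
<title>HubSpot Solutions</title>
<link rel="alternate" type="text/markdown" href="https://ai.mo.agency/solutions/hubspot.md">
</head><body><h1>HubSpot Solutions</h1></body></html>"""
print(find_markdown_alternate(html, "https://www.mo.agency/solutions/hubspot"))
```

An RSS reader looking for a feed would run the same lookup with `type="application/rss+xml"`.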

The discovery tag works alongside your [llms.txt file](https://www.mo.agency/blog/what-is-llms-txt), which provides a site-level index. Together, they create a complete discovery layer: llms.txt tells the AI what your site contains, and discovery tags on each page tell it where to find the markdown version.

## The per-page advantage

A single llms.txt file at your site root is a good start - it gives AI agents a curated overview. But the real depth comes from [per-page .md files](https://www.mo.agency/blog/per-page-markdown-files-gold-standard-ai-readability) that deliver the full content of every page in clean markdown.

Each page on your site gets its own `.md` endpoint. Your homepage becomes `/index.md`. Your about page becomes `/about.md`. Your blog post about HubSpot terminology becomes `/blog/hubspot-terminology.md`. Every page, always available, always current.
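The mapping from page URL to `.md` endpoint is simple enough to sketch. The function below is a hypothetical illustration of the naming pattern just described, using `www.example.com` as a placeholder host; a real setup may also move the files to a separate subdomain or handle file extensions differently.

```python
from urllib.parse import urlsplit


def markdown_path(page_url):
    """Map a page URL to its .md path, e.g. '/' -> '/index.md'."""
    path = urlsplit(page_url).path.rstrip("/")  # ignore trailing slashes
    if not path:
        return "/index.md"  # the homepage has no path segment of its own
    if path.endswith(".md"):
        return path  # already a markdown endpoint
    return path + ".md"


for url in (
    "https://www.example.com/",
    "https://www.example.com/about",
    "https://www.example.com/blog/hubspot-terminology",
):
    print(url, "->", markdown_path(url))
```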

This is the gold standard for AI readability. And it goes far deeper than a static llms.txt file ever can.

## What this means for your website

The shift to AI-readable content isn't theoretical. It's happening now. AI systems already make retrieval decisions millions of times per day, and the pages that are easiest to consume - clean, structured, token-efficient - get an advantage.

Making your website AI-readable starts with understanding that markdown is the format AI systems want. Everything else - [llms.txt](https://www.mo.agency/blog/what-is-llms-txt), [per-page .md files](https://www.mo.agency/blog/per-page-markdown-files-gold-standard-ai-readability), [discovery tags](https://www.mo.agency/blog/per-page-markdown-files-gold-standard-ai-readability), [unblocked crawlers](https://www.mo.agency/blog/robots-txt-ai-audit) - is about delivering your content in that format and making sure AI agents can find it.

The good news: you don't have to rebuild your website. Your HTML site continues serving human visitors as it always has. The markdown layer sits alongside it - a parallel version of your content optimised for AI consumption. Tools like [GetMD.ai](https://www.getmd.ai) create this layer automatically, converting your pages on the fly and serving them from a dedicated subdomain.

But whether you use a tool or build your own pipeline, the principle is the same: give AI systems the clean, structured, token-efficient content they need to understand, process, and cite your work.

---

*This article is part of our series on [making your website AI-readable](https://www.mo.agency/blog/making-your-website-ai-readable). Next: [What is llms.txt?](https://www.mo.agency/blog/what-is-llms-txt) · [Per-page .md files](https://www.mo.agency/blog/per-page-markdown-files-gold-standard-ai-readability) · [The robots.txt audit](https://www.mo.agency/blog/robots-txt-ai-audit) · [Content structure for AI](https://www.mo.agency/blog/how-to-structure-content-for-ai-citation) · [Content Signals](https://www.mo.agency/blog/content-signals-ai-content-governance) · [How to track LLM indexing](https://www.mo.agency/blog/how-to-know-pages-llm-indexed)*
