Pandoc Rendering and Recompilation of Typora LaTeX

 2025/11/25 

Rendering environment

Format: the Markdown follows Typora's conventions, and the .md files under source are kept valid for Typora rendering first.
Plugin versions
- hexo: 8.1.1
- hexo-cli: 4.3.2
- hexo-renderer-pandoc@0.5.0

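## Inline formulas

First, be aware that Pandoc's math rendering is painfully inconsistent: some formulas render perfectly even when they do not strictly follow the conditions below. What we summarize here is simply a pattern that renders successfully most of the time in practice, so that anything Typora can render can be rewritten into a form Pandoc is very likely to render.

Rendering format

Pandoc renders Typora-style inline formulas correctly, provided there is no space between the dollar signs $ $ and the code inside them. (In our tests, adding spaces gives decent results for simple formulas without any style changes, but complex formulas then fail to render.) For example: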

$ u_t $
$u_t $
$ u_t$
$u_t$

All four forms render simple formulas correctly, but the first three do not support complex formulas. The current rendering results are:

$ u_t $

$u_t $

$ u_t$

u_t

Rendering results for a slightly more complex formula:

$ p(u_t|_t) $

$p(u_t|_t) $

$ p(u_t|_t)$

p(u_t|x_t)

We therefore use the fourth form for rendering.
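
As a minimal sketch of this normalization, the padding can be stripped from inside each inline formula only, which leaves the spacing of the surrounding prose alone. The helper name tighten_inline_math and the pattern are illustrative assumptions, not part of the original script. The pattern treats any pair of single dollar signs on one line as a formula, so literal dollar amounts in the text would confuse it.

```python
import re

# One inline formula: a single $ (not part of $$), the shortest body that
# contains no $ and no newline, then a closing single $.
INLINE_MATH = re.compile(r"(?<!\$)\$(?!\$)([^$\n]+?)(?<!\$)\$(?!\$)")


def tighten_inline_math(text):
    """Rewrite `$ u_t $`, `$u_t $` and `$ u_t$` as `$u_t$`."""
    def fix(match):
        body = match.group(1).strip()
        # A whitespace-only body is not a formula, so leave it untouched.
        return f"${body}$" if body else match.group(0)
    return INLINE_MATH.sub(fix, text)


if __name__ == "__main__":
    print(tighten_inline_math("where $ u_t $ follows $p(u_t|x_t) $"))
```

Display blocks delimited by $$ are skipped by the lookarounds, so only inline formulas are rewritten.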

Rendering style

With the fourth form, after Pandoc rendering Hexo by default sets inline formulas in the same font as the body text. For example:

$u_t$

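This renders as: $ u_t $

This font does not look good. To get the italics we want, we have to italicize the formula by hand, and the same goes for bold. In our tests, anything bolded with \boldsymbol needs no additional italics. For example: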

$\\\textit{p}(\\\textit{u}_t\|\\\boldsymbol{x}_t)$

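Note that we escape every style command and every special symbol, in short everything that starts with \. It is tedious, but it never goes wrong. Our tests also show that for content with subscripts or superscripts, only the part before the _ needs italic, bold, or other styling. The above renders as: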

$\\\textit{p}(\\\textit{u}_t|\\\boldsymbol{x}_t,\\\boldsymbol{y}^*)$

More complex inline formulas are supported as well:

$\\\left(\\\dot{\\\textit{f}}(t)-\\\frac{\\\textit{f}(\\\textit{t})\\\dot{\\\textit{g}}(\\\textit{t})}{\\\textit{g}(\\\textit{t})}\\\right) \\\underbrace{\\\int \\\textit{x}_1 \\\textit{p}_t(\\\textit{x}_1|\\\textit{x})\\\textit{dx}_1}_{\\\mathbb{E}[\\\textit{x}_1|\\\textit{x}]} + \\\frac{\\\dot{\\\textit{g}}(\\\textit{t})}{\\\textit{g}(\\\textit{t})}\\\textit{x} \\\underbrace{\\\int \\\textit{p}_t(\\\textit{x}_1|\\\textit{x})\\\textit{dx}_1}_{1}$

Here is the formula:

$\\\left(\\\dot{\\\textit{f}}(t)-\\\frac{\\\textit{f}(\\\textit{t})\\\dot{\\\textit{g}}(\\\textit{t})}{\\\textit{g}(\\\textit{t})}\\\right) \\\underbrace{\\\int \\\textit{x}_1 \\\textit{p}_t(\\\textit{x}_1|\\\textit{x})\\\textit{dx}_1}_{\\\mathbb{E}[\\\textit{x}_1|\\\textit{x}]} + \\\frac{\\\dot{\\\textit{g}}(\\\textit{t})}{\\\textit{g}(\\\textit{t})}\\\textit{x} \\\underbrace{\\\int \\\textit{p}_t(\\\textit{x}_1|\\\textit{x})\\\textit{dx}_1}_{1}$

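The most annoying part is that italics cannot be applied once at the outermost level, otherwise the formula does not compile. They have to be applied letter by letter.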

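Regular expressions

From the rules above, we can design a general regex-based compilation procedure for inline formulas:

Format check

Remove all spaces before and after dollar signs, i.e. replace every "$ " and " $" with "$".
Letter extraction
- In this step we extract all the letters in a formula (the parts that are not special symbols) so that italics can be added to them. First we skip everything of the form \\[a-z]+, a backslash followed by any number of letters, and along with it everything of the form \\boldsymbol\{[a-z]+\}. What remains in the formula is letters, digits, and symbols. We then skip the digits and symbols, and also anything of the form \_[a-zA-Z]+, because we do not want to italicize what follows an underscore (that raises an error, since braces would be required). Finally, we extract the letters directly with [a-z]+.
Adding italics to letters
- Once the letters other than the bold ones have been extracted, we italicize them by replacing each with \\\\\\textit{·}.
Escaping
- This one is simple: replace \\ with \\\\\\.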
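
The extraction, italic, and escaping steps above can also be sketched as a single pass over the formula, with one alternation that classifies each token. This is an illustrative alternative under our own assumptions (the names TOKEN, ESCAPED_BACKSLASH, and compile_inline are hypothetical), not the script used for this blog:

```python
import re

# Alternatives are tried left to right at each position:
#   bold    - \boldsymbol{...}: escape backslashes, leave the argument unwrapped
#   command - any other \name: escape the backslash
#   sub     - _letters: keep as is, since an italic wrapper there would need braces
#   letters - a bare run of letters: wrap in \textit{...}
TOKEN = re.compile(
    r"(?P<bold>\\boldsymbol\{[^}]*\})"
    r"|(?P<command>\\[a-zA-Z]+)"
    r"|(?P<sub>_[a-zA-Z]+)"
    r"|(?P<letters>[a-zA-Z]+)"
)

ESCAPED_BACKSLASH = "\\\\\\"  # three literal backslashes


def compile_inline(expression):
    """Escape commands and italicize bare letters in one inline formula."""
    def rewrite(match):
        text = match.group(0)
        if match.lastgroup in ("bold", "command"):
            return text.replace("\\", ESCAPED_BACKSLASH)
        if match.lastgroup == "sub":
            return text
        return ESCAPED_BACKSLASH + "textit{" + text + "}"
    # Digits, brackets, and operators match no alternative and pass through.
    return TOKEN.sub(rewrite, expression)


if __name__ == "__main__":
    print(compile_inline(r"p(u_t|\boldsymbol{x}_t)"))
```

Because every character belongs to at most one token, there are no overlapping spans to reconcile afterwards. Applied to p(u_t|\boldsymbol{x}_t), the sketch produces \\\textit{p}(\\\textit{u}_t|\\\boldsymbol{x}_t).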

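Display formulas

In our tests, display formulas only run into trouble when \label, \notag, or \quad appears. These need to be escaped, and a global replacement is enough.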
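
A hedged sketch of a narrower variant: instead of replacing across the whole file, the escape can be limited to $$ blocks so that inline formulas that were already escaped are not escaped a second time. The names DISPLAY_MATH, FRAGILE, and escape_display_math are hypothetical, and the sketch assumes display blocks are always delimited by $$:

```python
import re

DISPLAY_MATH = re.compile(r"\$\$.*?\$\$", flags=re.DOTALL)
# Only the three commands named above. (?<!\\) skips a command whose backslash
# already follows another backslash, and (?![a-zA-Z]) avoids touching longer
# command names that merely start with one of the three.
FRAGILE = re.compile(r"(?<!\\)\\(label|notag|quad)(?![a-zA-Z])")


def escape_display_math(text):
    """Turn \\label, \\notag and \\quad into their triple-backslash form inside $$ blocks."""
    def fix_block(match):
        return FRAGILE.sub(lambda m: "\\\\\\" + m.group(1), match.group(0))
    return DISPLAY_MATH.sub(fix_block, text)


if __name__ == "__main__":
    sample = "$a \\quad b$\n$$\nx \\quad y \\notag\n\\label{eq}\n$$\n"
    print(escape_display_math(sample))
```

Running it twice gives the same result as running it once. One caveat: a line break written directly in front of a command, such as \\ followed by \quad with no space, looks like an already escaped command to the lookbehind and is left alone.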

Our replacement code is given below. Why Python rather than C++? The lab machine apparently has no C++ environment set up, so I used Python as a stopgap.

import os
import re

class Compiler:
    def __init__(self, path):
        """
        :str path: path of markdowns to be compiled
        """
        self.path = path

    @property
    def md_file_paths(self):
        return [
            os.path.join(root, file)
            for root, _, files in os.walk(self.path)
            for file in files
            if file.lower().endswith(".md")
        ]

    def rectify(self, content):
        """Remove spaces around $ unless inside backticks"""
        return re.sub(r'(?<!`)\$ | \$ (?!`)', '$', content)

    def readMarkdown(self, md_file_path):
        with open(md_file_path, "r", encoding="utf-8") as f:
            content = f.read()
        return self.rectify(content)

    def getSingleEquations(self, markdown_content):
        """Return list of inline equations"""
        cleaned = re.sub(r'`.*?`', '', markdown_content, flags=re.DOTALL)
        return re.findall(r'(?<!\$)\$(?!\$)(.*?)(?<!\$)\$(?!\$)', cleaned, flags=re.DOTALL)

    def innerExpressionCompile(self, expression, backslash_replacement="\\\\\\"):
        """Compile inline math expression"""
        regexEscape = r"(\\boldsymbol\{[^\}]+\}|\\[a-zA-Z]+)"
        escape_iter = list(re.finditer(regexEscape, expression))
        escape_spans = [(m.start(), m.end(), m.group()) for m in escape_iter]

        regexSub = r"_[a-zA-Z]+"
        sub_spans = []
        for m in re.finditer(regexSub, expression):
            s, e = m.start(), m.end()
            if not any(es <= s < ee for es, ee, _ in escape_spans):
                sub_spans.append((s, e, m.group()))

        regexOther = r"[^\\\sA-Za-z_]"
        other_spans = []
        for m in re.finditer(regexOther, expression):
            s, e = m.start(), m.end()
            if not any(es <= s < ee for es, ee, _ in escape_spans) and \
               not any(ss <= s < se for ss, se, _ in sub_spans):
                other_spans.append((s, e, m.group()))

        regexLetters = r"[a-zA-Z]+"
        taken = set()
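        # Reserve subscript spans too; otherwise the letters after "_" are matched
        # again, and that overlapping span stalls the merge loop below.
        for s, e, _ in escape_spans + sub_spans + other_spans: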
            taken.update(range(s, e))
        letters_spans = []
        for m in re.finditer(regexLetters, expression):
            s, e = m.start(), m.end()
            if all(i not in taken for i in range(s, e)):
                letters_spans.append((s, e, m.group()))
                taken.update(range(s, e))

        all_spans = []
        for s, e, t in escape_spans:
            all_spans.append((s, e, 'escape', t))
        for s, e, t in sub_spans:
            all_spans.append((s, e, 'sub', t))
        for s, e, t in other_spans:
            all_spans.append((s, e, 'other', t))
        for s, e, t in letters_spans:
            all_spans.append((s, e, 'letters', t))
        all_spans.sort(key=lambda x: x[0])

        out_parts = []
        idx = 0
        span_idx = 0
        n = len(expression)
        while idx < n:
            if span_idx < len(all_spans) and all_spans[span_idx][0] == idx:
                s, e, typ, text = all_spans[span_idx]
                if typ == 'escape':
                    out_parts.append(text.replace("\\", backslash_replacement))
                elif typ == 'sub' or typ == 'other':
                    out_parts.append(text)
                else:
                    out_parts.append("\\\\\\textit{" + text + "}")
                idx = e
                span_idx += 1
            else:
                out_parts.append(expression[idx])
                idx += 1

        return "".join(out_parts)

    def compile(self):
        md_paths = self.md_file_paths
        output_dir = os.path.join(os.path.dirname(self.path), "output")
        os.makedirs(output_dir, exist_ok=True)

        for md_path in md_paths:
            content = self.readMarkdown(md_path)

            
            # Inline equations
            inline_eqs = self.getSingleEquations(content)
            for eq in inline_eqs:
                compiled = self.innerExpressionCompile(eq)
                content = content.replace(f"${eq}$", f"${compiled}$")
            
            content = content.replace(r'\label', r'\\\label')
            content = content.replace(r'\quad', r'\\\quad')
            content = content.replace(r'\notag', r'\\\notag')
            
            
            # Write output
            out_path = os.path.join(output_dir, os.path.basename(md_path))
            with open(out_path, "w", encoding="utf-8") as f:
                f.write(content)

        return output_dir


# -------------------------------
# Example usage
# -------------------------------
if __name__ == "__main__":
    compiler = Compiler("source")
    out_dir = compiler.compile()
    print("Compiled markdown output to:", out_dir)

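Syntax rules

- Anything Typora can compile generally causes no problems.
- In inline formulas, avoid \quad where possible and type the spacing by hand; if you do use it, you may need to change the escape characters back.
- Put label and tag between the equation and aligned environments where possible.
- Use aligned rather than align.
- Always write equation; if aligned is present, wrap it in equation as well.
- Avoid blank lines and keep the block compact.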
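
Some of these rules can be checked mechanically before compiling. The following is a minimal sketch assuming display blocks are delimited by $$; the function name lint_display_math and the messages are hypothetical, and only the align, equation, and blank-line rules are covered:

```python
import re

DISPLAY_MATH = re.compile(r"\$\$(.*?)\$\$", flags=re.DOTALL)


def lint_display_math(text):
    """Return (block_index, message) pairs for display blocks that break a rule."""
    problems = []
    for index, match in enumerate(DISPLAY_MATH.finditer(text)):
        body = match.group(1)
        # Matches \begin{align} and \begin{align*}, but not \begin{aligned}.
        if re.search(r"\\begin\{align\*?\}", body):
            problems.append((index, "use aligned instead of align"))
        if "\\begin{equation}" not in body:
            problems.append((index, "wrap the formula in equation"))
        # Two newlines separated only by spaces or tabs form a blank line.
        if re.search(r"\n[ \t]*\n", body):
            problems.append((index, "remove blank lines inside the block"))
    return problems


if __name__ == "__main__":
    sample = "$$\n\\begin{align}\na &= b\n\n\\end{align}\n$$"
    for index, message in lint_display_math(sample):
        print(f"block {index}: {message}")
```

An empty list means that none of the three checked rules is violated; the other rules still need a manual look.
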
Below is our test input; the compiled result is shown in the next blog post.

Thus we can provide a posterior distribution that depends only on the joint distribution $p(\boldsymbol{u}_t,\boldsymbol{v}_t)$, which can be sampled easily and precisely. Next, we consider how to formulate the mapping from the joint distribution to the posterior distribution. Commonly used methods are Bayesian estimation and classifier-free guidance. However, the maximum a posteriori estimation required by Bayesian estimation depends directly on the prior knowledge of each task, while classifier-free guidance depends on the quality of the representation extracted from the joint distribution, which fails to extract features from the source images accurately because of its high correlation with the noise level in the diffusion process. Fortunately, thanks to the analytical nature of our $p(\boldsymbol{u}_t, \boldsymbol{v}_t)$, it is possible to provide a closed form of the posterior distribution. As before, we employ the previously introduced neighborhood condition in this step. First, according to Bayes' theorem,

$$
\begin{equation}
\begin{aligned}
&\nabla_{\boldsymbol{x}_t} \log p(\boldsymbol{x}_t|\hat{\boldsymbol{u}}_t,\hat{\boldsymbol{v}}_t) \notag\\
&= \nabla_{\boldsymbol{x}_t} \log p(\boldsymbol{x}_t) + \nabla_{\boldsymbol{x}_t} \log p(\hat{\boldsymbol{u}}_t,\hat{\boldsymbol{v}}_t|\boldsymbol{x}_t),
\end{aligned}
\end{equation}
$$

it is worth noting that in the neighborhood $\Theta$, $x_t, \hat{\boldsymbol{u}}_t, \hat{\boldsymbol{v}}_t$ can be approximately expressed as

$$
\begin{equation}
\begin{aligned}
&\nabla_{\boldsymbol{x}_t} \log p(\hat{\boldsymbol{u}}_t, \hat{\boldsymbol{v}}_t|\boldsymbol{x}_t) \\
&=\nabla_{\boldsymbol{x}_t + \boldsymbol{\delta}} \log p(\hat{\boldsymbol{u}}_t, \hat{\boldsymbol{v}}_t|\boldsymbol{x}_t + \boldsymbol{\delta})\notag\\
&= \nabla_{\hat{\boldsymbol{u}}_t} \log p(\hat{\boldsymbol{u}}_t|\hat{\boldsymbol{u}}_t) + \nabla_{\hat{\boldsymbol{v}}_t} \log p(\hat{\boldsymbol{v}}_t|\hat{\boldsymbol{v}}_t)\notag\\
&= \nabla_{\hat{\boldsymbol{u}}_t} \log p(\hat{\boldsymbol{u}}_t) + \nabla_{\hat{\boldsymbol{v}}_t} \log p(\hat{\boldsymbol{v}}_t),
\end{aligned}
\label{neighbor}
\end{equation}
$$

then

$$
\begin{equation}
\begin{aligned}
&\nabla_{\boldsymbol{x}_t} \log p(\boldsymbol{x}_t|\hat{\boldsymbol{u}}_t,\hat{\boldsymbol{v}}_t) \notag\\
&= \nabla_{\boldsymbol{x}_t} \log p(\boldsymbol{x}_t) + \nabla_{\hat{\boldsymbol{u}}_t} \log p(\hat{\boldsymbol{u}}_t) + \nabla_{\hat{\boldsymbol{v}}_t} \log p(\hat{\boldsymbol{v}}_t).
\end{aligned}
\label{fsrvs}
\end{equation}
$$

Consequently, a closed-form posterior distribution can be derived, relying only on the unconditional generative distribution and the source image distributions along the forward diffusion process. The unconditional distribution is obtained via a pre-trained DDPM, whereas the source image distributions are computed directly from the forward process.