How to determine whether a text string appears as a child of a named HTML tag

2022-03-23 00:00:00 php preg-replace

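In the doReplace function below, how can I determine that an instance of $keyword is not a child of any tag in a named list of HTML tags (h1, h2, h3, h4, h5, h6, b, u, i, etc.), working outward from the point in the content where the keyword is about to be replaced? I don't want to check for nested tags at this point.

I assume some recursion will be involved inside the doReplace function.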

function doReplace($matches)
{
    // preg_replace_callback() passes an array of matches; $matches[0] is the full match.
    // if (!is_keyword_in_named_tag())
    return '<b>' . trim($matches[0]) . '</b>';
}

function init()
{
    $content = "This will be some xhtml formatted 
    content that will be resident on the page in memory";
    // No quotes inside the pattern: they would be matched literally.
    $theContent = 
      preg_replace_callback("/(my test string)/i", "doReplace", $content);
    return $theContent;
}

So if the $content variable contains:

<h1>This is my test string</h1>

then the string "my test string" will not be replaced.

But if the $content variable contains:

<h1>This is my test string</h1>
<div>This is my test string too <b>my test string 3</b></div>

then the content after replacement will be:

<h1>This is my test string</h1>
<div>This is <b>my test string</b> too <b>my test string 3</b></div>

Solution

Try using DOMDocument and DOMXPath:

<?php

function doReplace($html)
{
    $dom = new DOMDocument();
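    // loadHTML() does not assume UTF-8 input, so convert non-ASCII characters to HTML entities first.
    // Note: the 'HTML-ENTITIES' conversion target is deprecated as of PHP 8.2.
    $dom->loadHTML(mb_convert_encoding($html, 'HTML-ENTITIES', "UTF-8"));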

    $xpath = new DOMXPath($dom);

    foreach ($xpath->query('//text()[
        not(ancestor::h1) and
        not(ancestor::h2) and
        not(ancestor::h3) and
        not(ancestor::h4) and
        not(ancestor::h5) and
        not(ancestor::h6) and
        not(ancestor::b) and
        not(ancestor::u) and
        not(ancestor::i)
        ]') as $node)
    {
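        // Escape &, < and > first, because appendXML() below parses its argument as XML.
        $text = htmlspecialchars($node->wholeText, ENT_NOQUOTES, 'UTF-8');
        $replaced = str_ireplace('my test string', '<b>my test string</b>', $text);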
        $newNode = $dom->createDocumentFragment();
        $newNode->appendXML($replaced);
        $node->parentNode->replaceChild($newNode, $node);
    }

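    // Serialize the <body> element, then strip the "<body>" (6 chars) and "</body>" (7 chars)
    // wrappers so that only the original content remains.
    // Return the result (rather than echoing it) so that the caller's echo prints it.
    return mb_substr($dom->saveXML($xpath->query('//body')->item(0)), 6, -7, "UTF-8");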
}

$html = '<h1>This is my test string</h1>
<h2><span>Nested my test string</span></h2>
<div>This is my test string too <b>my test string 3</b></div>';

echo doReplace($html);
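
The function above hard-codes both the keyword and the tag list, and str_ireplace() rewrites every match in lowercase. As a rough sketch only, the same DOMXPath idea can be parameterised. Everything named here (wrapKeyword, $keyword, $excludedTags) is hypothetical and not part of the answer above. It assumes the tag names are trusted, since they are pasted straight into the XPath expression. It also swaps in mb_encode_numericentity() for the 'HTML-ENTITIES' conversion and builds the <b> elements through the DOM instead of appendXML(), so the matched text keeps its original case and needs no XML escaping.

```php
<?php

// Hypothetical helper: wraps each case-insensitive occurrence of $keyword
// in <b>, skipping text that has any of $excludedTags as an ancestor.
function wrapKeyword(string $html, string $keyword, array $excludedTags): string
{
    $dom = new DOMDocument();
    // Turn non-ASCII characters into numeric entities so loadHTML() reads them correctly.
    $encoded = mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');
    // @ silences parser warnings about markup libxml does not recognise.
    @$dom->loadHTML($encoded);

    $xpath = new DOMXPath($dom);

    // Build "not(ancestor::h1) and not(ancestor::h2) ..." from the tag list.
    // Assumption: $excludedTags contains trusted, valid element names.
    $conditions = array_map(fn($tag) => "not(ancestor::$tag)", $excludedTags);
    $predicate = $conditions ? implode(' and ', $conditions) : 'true()';

    $pattern = '/(' . preg_quote($keyword, '/') . ')/i';

    foreach ($xpath->query("//body//text()[$predicate]") as $textNode) {
        // With PREG_SPLIT_DELIM_CAPTURE, odd indexes hold the matched keyword.
        $parts = preg_split($pattern, $textNode->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($parts) < 2) {
            continue; // keyword not present in this text node
        }

        $fragment = $dom->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($i % 2 === 1) {
                $bold = $dom->createElement('b');
                $bold->appendChild($dom->createTextNode($part));
                $fragment->appendChild($bold);
            } elseif ($part !== '') {
                $fragment->appendChild($dom->createTextNode($part));
            }
        }
        $textNode->parentNode->replaceChild($fragment, $textNode);
    }

    // Serialise only the children of <body> to get the original fragment back.
    $out = '';
    foreach ($xpath->query('//body')->item(0)->childNodes as $child) {
        $out .= $dom->saveXML($child);
    }
    return $out;
}
```

A call such as wrapKeyword($html, 'my test string', ['h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'b', 'u', 'i']) should then behave like the fixed version above. Because ancestor:: looks at every enclosing element, text nested deeper inside an excluded tag (for example a span inside an h2) is skipped as well.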
