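How to determine whether a text string appears as a child of a named HTML tag
In the doReplace function below, how can I determine that an instance of $keyword is not a child of any of an array of named HTML tags (h1, h2, h3, h4, h5, h6, b, u, i, etc.), looking outward from the replacement point where the keyword appears in the content? I don't need to check for nested tags at this point.
I suspect some recursion will be involved inside the doReplace function.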
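function doReplace($matches)
{
// preg_replace_callback() passes an array of matches; index 1 is the first capture group
$keyword = $matches[1];
//if(!is_keyword_in_named_tag())
return ' <b>'.trim($keyword).'</b>';
}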
function init()
{
$content = "This will be some xhtml formatted
content that will be resident on the page in memory";
$theContent =
preg_replace_callback("/(my test string)/i","doReplace", $content);
return $theContent;
}
So, if the $content variable contains:
<h1>This is my test string</h1>
then the string "my test string" will not be replaced.
But if the $content variable contains:
<h1>This is my test string</h1>
<div>This is my test string too <b>my test string 3</b></div>
then the replaced content will be:
<h1>This is my test string</h1>
<div>This is <b>my test string</b> too <b>my test string 3</b></div>
Solution
Try using DOMDocument and DOMXPath:
<?php
function doReplace($html)
{
$dom = new DOMDocument();
// loadHtml() needs mb_convert_encoding() to work well with UTF-8 encoding
$dom->loadHtml(mb_convert_encoding($html, 'HTML-ENTITIES', "UTF-8"));
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//text()[
not(ancestor::h1) and
not(ancestor::h2) and
not(ancestor::h3) and
not(ancestor::h4) and
not(ancestor::h5) and
not(ancestor::h6) and
not(ancestor::b) and
not(ancestor::u) and
not(ancestor::i)
]') as $node)
{
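// Escape the text first so appendXML() receives well-formed XML, then wrap each match while keeping its original case
$replaced = preg_replace('/my test string/i', '<b>$0</b>', htmlspecialchars($node->wholeText));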
$newNode = $dom->createDocumentFragment();
$newNode->appendXML($replaced);
$node->parentNode->replaceChild($newNode, $node);
}
// Serialize only the <body> element, then strip the surrounding <body> and </body> tags to recover the original content.
// Return the result instead of echoing it, so that the caller's echo prints it.
return mb_substr($dom->saveXML($xpath->query('//body')->item(0)), 6, -7, "UTF-8");
}
$html = '<h1>This is my test string</h1>
<h2><span>Nested my test string</span></h2>
<div>This is my test string too <b>my test string 3</b></div>';
echo doReplace($html);
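A more general variant of the same idea, in case the keyword and the tag list need to be parameters rather than hard-coded. This is only a sketch under a few assumptions: the name highlightKeyword and its three parameters are hypothetical (they do not appear in the question or the answer above), the input is assumed to be a UTF-8 HTML fragment, and the mbstring extension is assumed to be available. It builds the same `not(ancestor::…)` XPath predicate from an array, and it splits each text node with DOM calls instead of appendXML(), so text containing `&` or `<` does not need separate escaping.

```php
<?php
// Hypothetical helper, not part of the original answer.
// Wraps every case-insensitive occurrence of $keyword in <b>, except inside $skipTags.
function highlightKeyword(string $html, string $keyword, array $skipTags): string
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    // Encode non-ASCII characters as numeric entities so loadHTML() handles UTF-8 input.
    $encoded = mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');
    // The wrapper <div> gives the fragment a single known root inside <body>.
    $dom->loadHTML('<div>' . $encoded . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Build "//text()[not(ancestor::h1) and not(ancestor::h2) ...]" from the tag list.
    $query = '//text()';
    if ($skipTags) {
        $conditions = array_map(fn($tag) => "not(ancestor::$tag)", $skipTags);
        $query .= '[' . implode(' and ', $conditions) . ']';
    }

    $xpath = new DOMXPath($dom);
    $pattern = '/(' . preg_quote($keyword, '/') . ')/i';
    // Copy the node list into an array before modifying the tree.
    foreach (iterator_to_array($xpath->query($query)) as $node) {
        // With one capture group, odd indexes hold the matched keyword text.
        $parts = preg_split($pattern, $node->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($parts) < 2) {
            continue; // no match in this text node
        }
        $fragment = $dom->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($part === '') {
                continue;
            }
            if ($i % 2 === 1) {
                $bold = $dom->createElement('b');
                $bold->appendChild($dom->createTextNode($part));
                $fragment->appendChild($bold);
            } else {
                $fragment->appendChild($dom->createTextNode($part));
            }
        }
        $node->parentNode->replaceChild($fragment, $node);
    }

    // Serialize only the children of the wrapper <div>, i.e. the original fragment.
    $wrapper = $dom->getElementsByTagName('body')->item(0)->firstChild;
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$tags = ['h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'b', 'u', 'i'];
$html = '<h1>This is my test string</h1>'
      . '<div>This is my test string too <b>my test string 3</b></div>';
echo highlightKeyword($html, 'my test string', $tags);
```

Because `ancestor::` looks at every enclosing element, a keyword nested deeper inside an excluded tag (for example in a span inside an h2) is skipped as well, which goes slightly beyond what the question asked for.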