
Using PHP's DOM methods to traverse and retrieve table elements in HTML

Author: unknown; Source: web; Views: 102; Published: 2025-04-11

Method 1: Use `getElementsByTagName` to get every `<table>`

$html = <<<HTML
<div id="out">
    <span id="oddsTable"></span>
    <table cellpadding="0" cellspacing="0" border="0" width="900" align="center">First table...</table>
    <table cellpadding="0" cellspacing="0" border="0" width="900" align="center">Second table...</table>
</div>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true); // Buffer HTML parse errors instead of emitting warnings
$dom->loadHTML($html);
libxml_clear_errors();            // Discard the buffered errors

// Get every table element in the document
$tables = $dom->getElementsByTagName('table');

foreach ($tables as $table) {
    // saveHTML($node) serializes only the given node
    echo $dom->saveHTML($table) . "\n";
}

Output:

<table cellpadding="0" cellspacing="0" border="0" width="900" align="center">First table...</table>
<table cellpadding="0" cellspacing="0" border="0" width="900" align="center">Second table...</table>
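Besides serializing each table, the `DOMNodeList` returned by `getElementsByTagName` can be walked by index to read attributes and text. The following is a minimal sketch under stated assumptions: the markup is a hypothetical sample modelled on the one above (with `tr`/`td` cells added), and the `$summary` array is an illustrative name, not something from the original article.

```php
<?php
// Hypothetical sample markup, modelled on the article's example.
$html = '<div id="out">'
      . '<table width="900" align="center"><tr><td>First table...</td></tr></table>'
      . '<table width="900" align="center"><tr><td>Second table...</td></tr></table>'
      . '</div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // Buffer parse errors instead of emitting warnings
$dom->loadHTML($html);
libxml_clear_errors();

$tables = $dom->getElementsByTagName('table');

// Walk the node list by index and collect a few facts about each table.
$summary = [];
for ($i = 0; $i < $tables->length; $i++) {
    $table = $tables->item($i);
    $summary[] = [
        'index' => $i,
        'width' => $table->getAttribute('width'), // attribute value as a string
        'text'  => trim($table->textContent),     // concatenated text of all descendants
    ];
}

print_r($summary);
```

Index-based access is mainly useful when the position of a table matters; for plain iteration the `foreach` form shown above is simpler.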

Method 2: Get every `<table>` under `div#out`

To get only the tables inside div#out, look up the container first and search within it:

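// $html is the sample markup defined in Method 1
$dom = new DOMDocument();
$dom->loadHTML($html);

$outDiv = $dom->getElementById('out'); // Get the div with id="out"

if ($outDiv) {
    // Searches all descendants of the div, not only its direct children
    $tables = $outDiv->getElementsByTagName('table');
    foreach ($tables as $table) {
        echo $dom->saveHTML($table) . "\n";
    }
}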
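Because `getElementsByTagName` on an element returns every descendant, tables nested deeper inside the container are included too. If only direct children are wanted, one option is to filter on `parentNode`. This is a sketch, not the article's method: the markup and the `direct`/`nested` ids are hypothetical and exist only to show the difference.

```php
<?php
// Hypothetical markup: one table directly under div#out, one nested deeper.
$html = '<div id="out">'
      . '<table id="direct"><tr><td>direct</td></tr></table>'
      . '<div><table id="nested"><tr><td>nested</td></tr></table></div>'
      . '</div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$outDiv = $dom->getElementById('out');

$allIds    = []; // every table below div#out
$directIds = []; // only tables whose parent is div#out itself

if ($outDiv !== null) {
    foreach ($outDiv->getElementsByTagName('table') as $table) {
        $allIds[] = $table->getAttribute('id');
        if ($table->parentNode !== null && $table->parentNode->isSameNode($outDiv)) {
            $directIds[] = $table->getAttribute('id');
        }
    }
}

echo 'all: ' . implode(', ', $allIds) . "\n";
echo 'direct: ' . implode(', ', $directIds) . "\n";
```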

Method 3: Walk the child nodes to find `<table>`

If the DOM structure is more complex, you can traverse the child nodes recursively:

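// Returns every <table> element below $node, in document order.
function findTables(DOMNode $node): array {
    $tables = [];
    foreach ($node->childNodes as $child) {
        // Skip text nodes, comments, and other non-element nodes
        if ($child instanceof DOMElement) {
            if ($child->tagName === 'table') {
                $tables[] = $child;
            }
            // Recurse into the child's own children
            $tables = array_merge($tables, findTables($child));
        }
    }
    return $tables;
}

// $html is the sample markup defined in Method 1
$dom = new DOMDocument();
$dom->loadHTML($html);

$tables = findTables($dom);
foreach ($tables as $table) {
    echo $dom->saveHTML($table) . "\n";
}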
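The same recursive walk can also be written as a generator, which avoids building and merging intermediate arrays. This is an alternative sketch rather than the article's function: `iterateTables`, the markup, and the `top`/`deep` ids are hypothetical names chosen for illustration.

```php
<?php
// Yields every <table> element below $node, in document order.
function iterateTables(DOMNode $node): Generator
{
    foreach ($node->childNodes as $child) {
        if ($child instanceof DOMElement) {
            if ($child->tagName === 'table') {
                yield $child;
            }
            // Delegate to the recursive call for deeper levels
            yield from iterateTables($child);
        }
    }
}

// Hypothetical markup: one table near the top, one buried in nested divs.
$html = '<div id="out">'
      . '<table id="top"><tr><td>top</td></tr></table>'
      . '<div><div><table id="deep"><tr><td>deep</td></tr></table></div></div>'
      . '</div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$ids = [];
foreach (iterateTables($dom) as $table) {
    $ids[] = $table->getAttribute('id');
}

echo implode(', ', $ids) . "\n";
```

Consuming the generator with `foreach` sidesteps the key collisions that `yield from` can cause when the result is passed to `iterator_to_array` with keys preserved.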

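Summary

Method	Use case	Advantage	Drawback
`getElementsByTagName`	Get every `<table>` in the document	Simple and direct	Not scoped to a parent when called on the document
`getElementById + getElementsByTagName`	Get the `<table>` elements inside a specific container	More precise	Requires knowing the parent's ID
Recursive traversal	Complex DOM structures	Flexible and controllable	Slightly more code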

Using PHP's DOM and XPath to get table elements from a given HTML string

// The HTML content to load
$html = '<div id="out">
        <span id="oddsTable"></span>
        <table cellpadding="0" cellspacing="0" border="0" width="900" align="center">First table content...</table>
        <table cellpadding="0" cellspacing="0" border="0" width="900" align="center">Second table content...</table>
</div>';

// Create the DOMDocument object
$dom = new DOMDocument();
libxml_use_internal_errors(true); // Suppress warnings caused by malformed HTML
$dom->loadHTML($html);
libxml_clear_errors();

// Create the XPath object
$xpath = new DOMXPath($dom);

// Method 1: get every table element
$tables = $xpath->query('//table');
foreach ($tables as $table) {
    echo $dom->saveHTML($table) . "\n";
}

// Method 2: get the table elements that are direct children of div#out
$tables = $xpath->query('//div[@id="out"]/table');
foreach ($tables as $table) {
    echo $dom->saveHTML($table) . "\n";
}

// Method 3: get table elements with specific attributes
$tables = $xpath->query('//table[@width="900" and @align="center"]');
foreach ($tables as $table) {
    echo $dom->saveHTML($table) . "\n";
}

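Explanation

Load the HTML: first load the HTML content into a DOMDocument object.
Create the XPath object: use DOMXPath to query the DOM document.
Queries:
- //table: selects every table element in the document
- //div[@id="out"]/table: selects the table elements that are direct children of the div whose id is "out"
- //table[@width="900" and @align="center"]: selects the table elements that have both attributes (width="900" and align="center")
Process the results: iterate over the query result with foreach and print each table's HTML with saveHTML.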
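The difference between a child step (`/table`) and a descendant step (`//table`) also applies to queries run relative to a context node, which `DOMXPath::query` accepts as its second argument. A minimal sketch, assuming hypothetical markup with one nested table; the variable names are illustrative only.

```php
<?php
// Hypothetical markup: one table directly under div#out, one nested deeper.
$html = '<div id="out">'
      . '<table id="direct"><tr><td>direct</td></tr></table>'
      . '<div><table id="nested"><tr><td>nested</td></tr></table></div>'
      . '</div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath  = new DOMXPath($dom);
$outDiv = $xpath->query('//div[@id="out"]')->item(0);

$childCount      = 0;
$descendantCount = 0;

if ($outDiv !== null) {
    // "./table" matches only direct children of the context node
    $childCount = $xpath->query('./table', $outDiv)->length;
    // ".//table" matches tables at any depth below the context node
    $descendantCount = $xpath->query('.//table', $outDiv)->length;
}

echo "direct children: $childCount, all descendants: $descendantCount\n";
```

The leading dot matters: without it, `//table` searches from the document root even when a context node is passed.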

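Notes

If the HTML is malformed, call libxml_use_internal_errors(true) before loading so that parse warnings are buffered instead of being emitted.
After parsing, call libxml_clear_errors() to empty the error buffer.
Adjust the XPath expression as needed to select exactly the table elements you want.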
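When the parse problems themselves are of interest, the buffered errors can be read with `libxml_get_errors()` before they are cleared. The sketch below is an assumption-laden illustration: the malformed markup is made up, and which errors (if any) are reported depends on the libxml2 version in use.

```php
<?php
// Deliberately sloppy, hypothetical markup (unclosed td and tr).
$html = '<div id="out"><table><tr><td>cell</table></div>';

// Remember the previous setting so it can be restored afterwards.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

$errors = libxml_get_errors(); // array of LibXMLError objects, possibly empty
libxml_clear_errors();         // empty the buffer once the errors are copied
libxml_use_internal_errors($previous);

foreach ($errors as $error) {
    printf("line %d: %s\n", $error->line, trim($error->message));
}

echo 'tables found: ' . $dom->getElementsByTagName('table')->length . "\n";
```

Restoring the previous setting keeps the change local, so other code in the same request still sees warnings the way it expects.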

Tags: DOM traversal
