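Introduction
XML (Extensible Markup Language) is a general-purpose data interchange format that is widely used in software development. Whether for configuration files, data transfer, or document storage, XML offers a structured, self-describing way to represent information. The XML DOM (Document Object Model) is the standard interface for working with XML documents: it represents a document as a tree and lets developers read and modify its content and structure programmatically.
Knowing how to read attributes through the XML DOM lets you work with document nodes efficiently and handle more complex data-access problems. This article walks through XML DOM attribute access, from basic syntax to advanced usage.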
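XML DOM Basics
What Is the XML DOM
The XML DOM (Document Object Model) is a platform- and language-neutral interface that allows programs and scripts to dynamically access and update the content, structure, and style of XML documents. The DOM represents an XML document as a tree in which each node stands for one part of the document (an element, an attribute, a piece of text, and so on).
DOM Tree Structure
In the DOM, an XML document is represented as a hierarchical tree made up of the following main node types:
1. Document node (Document): the root of the entire XML document
2. Element node (Element): represents an XML element
3. Attribute node (Attr): represents an attribute of an element
4. Text node (Text): represents the text content of an element or attribute
5. Comment node (Comment): represents an XML comment
6. Processing instruction node (ProcessingInstruction): represents an XML processing instruction
For example, take the following XML document: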
- <?xml version="1.0" encoding="UTF-8"?>
- <bookstore>
- <book category="fiction">
- <title lang="en">Harry Potter</title>
- <author>J.K. Rowling</author>
- <year>2005</year>
- <price>29.99</price>
- </book>
- <book category="children">
- <title lang="en">The Wonderful Wizard of Oz</title>
- <author>L. Frank Baum</author>
- <year>1900</year>
- <price>15.99</price>
- </book>
- </bookstore>
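The corresponding DOM tree looks like this (attributes are shown next to their element, and the whitespace-only text nodes that a parser creates between indented elements are omitted):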
- Document
- └── Element: bookstore
- ├── Element: book (attribute: category="fiction")
- │ ├── Element: title (attribute: lang="en")
- │ │ └── Text: Harry Potter
- │ ├── Element: author
- │ │ └── Text: J.K. Rowling
- │ ├── Element: year
- │ │ └── Text: 2005
- │ └── Element: price
- │ └── Text: 29.99
- └── Element: book (attribute: category="children")
- ├── Element: title (attribute: lang="en")
- │ └── Text: The Wonderful Wizard of Oz
- ├── Element: author
- │ └── Text: L. Frank Baum
- ├── Element: year
- │ └── Text: 1900
- └── Element: price
- └── Text: 15.99
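A tree like the one above can be printed from a parsed document. The following is a minimal sketch using Python's standard xml.dom.minidom; the helper name dump_tree and the shortened sample XML are assumptions made for illustration, and whitespace-only text nodes are skipped so the output resembles the diagram.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

XML = """<bookstore>
  <book category="fiction">
    <title lang="en">Harry Potter</title>
  </book>
</bookstore>"""

def dump_tree(node, depth=0, lines=None):
    """Collect one line per node, skipping whitespace-only text nodes."""
    if lines is None:
        lines = []
    indent = "  " * depth
    if node.nodeType == Node.DOCUMENT_NODE:
        lines.append(f"{indent}Document")
    elif node.nodeType == Node.ELEMENT_NODE:
        attrs = ", ".join(f'{name}="{value}"' for name, value in node.attributes.items())
        suffix = f" ({attrs})" if attrs else ""
        lines.append(f"{indent}Element: {node.tagName}{suffix}")
    elif node.nodeType == Node.TEXT_NODE:
        if not node.data.strip():
            return lines  # whitespace between elements
        lines.append(f"{indent}Text: {node.data.strip()}")
    for child in node.childNodes:
        dump_tree(child, depth + 1, lines)
    return lines

tree_lines = dump_tree(parseString(XML))
print("\n".join(tree_lines))
```

Attributes are read from the element's attributes collection rather than from childNodes, which reflects the fact that attribute nodes are not children of their element in the DOM.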
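Parts of the DOM Specification
The DOM specification is commonly described as having three parts:
1. Core DOM: the basic interfaces shared by all document types
2. XML DOM: interfaces specific to XML documents
3. HTML DOM: interfaces specific to HTML documents
This article focuses on the XML DOM, which provides the methods and properties used to work with XML documents.
Basic Syntax for Reading DOM Attributes
Obtaining a DOM Object
Before you can work with the XML DOM, you need a DOM object. Each programming language has its own way of loading an XML document and building the DOM. The examples below cover JavaScript, Java, Python, and C#, in that order: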
- // JavaScript (browser)
- let parser = new DOMParser();
- let xmlDoc = parser.parseFromString(xmlString, "text/xml");
- // Or load an XML file over HTTP
- let xhttp = new XMLHttpRequest();
- xhttp.onreadystatechange = function() {
-     if (this.readyState == 4 && this.status == 200) {
-         let xmlDoc = this.responseXML;
-         // Work with the DOM here
-     }
- };
- xhttp.open("GET", "books.xml", true);
- xhttp.send();
- // Java
- import javax.xml.parsers.DocumentBuilder;
- import javax.xml.parsers.DocumentBuilderFactory;
- import org.w3c.dom.Document;
- import java.io.File;
- DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
- DocumentBuilder builder = factory.newDocumentBuilder();
- Document document = builder.parse(new File("books.xml"));
- # Python
- from xml.dom.minidom import parse
- # Parse an XML file
- dom = parse("books.xml")
- # Or parse from a string
- from xml.dom.minidom import parseString
- dom = parseString(xmlString)
- // C#
- using System.Xml;
- // Load an XML file
- XmlDocument xmlDoc = new XmlDocument();
- xmlDoc.Load("books.xml");
- // Or load from a string
- xmlDoc.LoadXml(xmlString);
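Loading can fail when the input is not well-formed, so it is usually worth handling that case before touching any attributes. The sketch below assumes Python's minidom, which raises ExpatError from the underlying expat parser; the helper name parse_xml is hypothetical.

```python
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError

def parse_xml(text):
    """Return a minidom Document, or None if the text is not well-formed XML."""
    try:
        return parseString(text)
    except ExpatError as exc:
        print(f"XML parse error: {exc}")
        return None

good = parse_xml('<bookstore><book category="fiction"/></bookstore>')
bad = parse_xml('<bookstore><book></bookstore>')  # mismatched tags
print(good.documentElement.tagName)
print(bad)
```

Other environments report errors differently. A browser's DOMParser, for instance, does not throw on malformed XML but returns a document containing a parsererror element, so the check there looks different.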
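Basic Attribute Access
Once you have a DOM object, you can start reading the document. The snippets below show three basic operations in each language: getting the root element, reading an attribute value with getAttribute, and checking whether an attribute exists with hasAttribute.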
- // JavaScript: get the root element
- let rootElement = xmlDoc.documentElement;
- // Java: get the root element
- Element rootElement = document.getDocumentElement();
- # Python: get the root element
- root_element = dom.documentElement
- // C#: get the root element
- XmlElement rootElement = xmlDoc.DocumentElement;
- // JavaScript: read an attribute value
- let bookElement = xmlDoc.getElementsByTagName("book")[0];
- let category = bookElement.getAttribute("category");
- // Java: read an attribute value
- NodeList bookList = document.getElementsByTagName("book");
- Element bookElement = (Element) bookList.item(0);
- String category = bookElement.getAttribute("category");
- # Python: read an attribute value
- book_elements = dom.getElementsByTagName("book")
- book_element = book_elements[0]
- category = book_element.getAttribute("category")
- // C#: read an attribute value
- XmlNodeList bookList = xmlDoc.GetElementsByTagName("book");
- XmlElement bookElement = (XmlElement)bookList[0];
- string category = bookElement.GetAttribute("category");
- // JavaScript: check whether an attribute exists
- let bookElement = xmlDoc.getElementsByTagName("book")[0];
- if (bookElement.hasAttribute("category")) {
-     console.log("Category attribute exists");
- }
- // Java: check whether an attribute exists
- Element bookElement = (Element) document.getElementsByTagName("book").item(0);
- if (bookElement.hasAttribute("category")) {
-     System.out.println("Category attribute exists");
- }
- # Python: check whether an attribute exists
- book_element = dom.getElementsByTagName("book")[0]
- if book_element.hasAttribute("category"):
-     print("Category attribute exists")
- // C#: check whether an attribute exists
- XmlElement bookElement = (XmlElement)xmlDoc.GetElementsByTagName("book")[0];
- if (bookElement.HasAttribute("category"))
- {
-     Console.WriteLine("Category attribute exists");
- }
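One detail that is easy to miss is what getAttribute returns when the attribute is absent. In Python's minidom (as in Java's DOM) the result is an empty string, while browsers return null, so an empty value and a missing attribute can look the same. The sketch below combines hasAttribute and getAttribute to tell them apart; the helper name get_attribute_or_none is an assumption for illustration.

```python
from xml.dom.minidom import parseString

doc = parseString('<book category="fiction"><title>Harry Potter</title></book>')
book = doc.documentElement

def get_attribute_or_none(element, name):
    """Return the attribute value, or None when the attribute is absent."""
    return element.getAttribute(name) if element.hasAttribute(name) else None

print(repr(book.getAttribute("category")))
print(repr(book.getAttribute("isbn")))           # minidom gives an empty string here
print(repr(get_attribute_or_none(book, "isbn")))
```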
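Common DOM Properties and Methods
Reading Attribute Values
getAttribute is the most commonly used method for reading an attribute. It takes the attribute name and returns the attribute's value as a string.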
- // JavaScript
- let titleElement = xmlDoc.getElementsByTagName("title")[0];
- let lang = titleElement.getAttribute("lang");
- console.log(lang); // Output: en
- // Java
- Element titleElement = (Element) document.getElementsByTagName("title").item(0);
- String lang = titleElement.getAttribute("lang");
- System.out.println(lang); // Output: en
- # Python
- title_element = dom.getElementsByTagName("title")[0]
- lang = title_element.getAttribute("lang")
- print(lang) # Output: en
- // C#
- XmlElement titleElement = (XmlElement)xmlDoc.GetElementsByTagName("title")[0];
- string lang = titleElement.GetAttribute("lang");
- Console.WriteLine(lang); // Output: en
getAttributeNode returns the attribute node itself rather than its value, which is useful when you need to work with the node further. It returns null (None in Python) when the element has no attribute with that name.
- // JavaScript
- let titleElement = xmlDoc.getElementsByTagName("title")[0];
- let langAttr = titleElement.getAttributeNode("lang");
- console.log(langAttr.value); // Output: en
- console.log(langAttr.name); // Output: lang
- // Java
- Element titleElement = (Element) document.getElementsByTagName("title").item(0);
- Attr langAttr = titleElement.getAttributeNode("lang");
- System.out.println(langAttr.getValue()); // Output: en
- System.out.println(langAttr.getName()); // Output: lang
- # Python
- title_element = dom.getElementsByTagName("title")[0]
- lang_attr = title_element.getAttributeNode("lang")
- print(lang_attr.value) # Output: en
- print(lang_attr.name) # Output: lang
- // C#
- XmlElement titleElement = (XmlElement)xmlDoc.GetElementsByTagName("title")[0];
- XmlAttribute langAttr = titleElement.GetAttributeNode("lang");
- Console.WriteLine(langAttr.Value); // Output: en
- Console.WriteLine(langAttr.Name); // Output: lang
The attributes property returns a NamedNodeMap (or a similar collection) holding all of an element's attributes, which can be accessed by name or by index.
- // JavaScript
- let titleElement = xmlDoc.getElementsByTagName("title")[0];
- let attributes = titleElement.attributes;
- // Access by attribute name
- let lang = attributes.getNamedItem("lang").value;
- console.log(lang); // Output: en
- // Iterate over all attributes
- for (let i = 0; i < attributes.length; i++) {
-     let attr = attributes[i];
-     console.log(attr.name + ": " + attr.value);
- }
- // Java
- Element titleElement = (Element) document.getElementsByTagName("title").item(0);
- NamedNodeMap attributes = titleElement.getAttributes();
- // Access by attribute name
- Node langAttr = attributes.getNamedItem("lang");
- System.out.println(langAttr.getNodeValue()); // Output: en
- // Iterate over all attributes
- for (int i = 0; i < attributes.getLength(); i++) {
-     Node attr = attributes.item(i);
-     System.out.println(attr.getNodeName() + ": " + attr.getNodeValue());
- }
- # Python
- title_element = dom.getElementsByTagName("title")[0]
- attributes = title_element.attributes
- # Access by attribute name
- lang_attr = attributes.getNamedItem("lang")
- print(lang_attr.value) # Output: en
- # Iterate over all attributes
- for i in range(attributes.length):
-     attr = attributes.item(i)
-     print(f"{attr.name}: {attr.value}")
- // C#
- XmlElement titleElement = (XmlElement)xmlDoc.GetElementsByTagName("title")[0];
- XmlAttributeCollection attributes = titleElement.Attributes;
- // Access by attribute name
- XmlAttribute langAttr = attributes["lang"];
- Console.WriteLine(langAttr.Value); // Output: en
- // Iterate over all attributes
- foreach (XmlAttribute attr in attributes)
- {
-     Console.WriteLine(attr.Name + ": " + attr.Value);
- }
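When all attributes of an element are needed at once, it can be convenient to copy them into an ordinary mapping. The sketch below relies on the items() method of minidom's NamedNodeMap; the edition attribute in the sample is made up for illustration.

```python
from xml.dom.minidom import parseString

doc = parseString('<title lang="en" edition="2">Harry Potter</title>')
title = doc.documentElement

# NamedNodeMap.items() yields (name, value) pairs, so dict() gives a plain mapping
attrs = dict(title.attributes.items())
print(attrs)

# A missing key can then be handled with the usual dict tools
print(attrs.get("translator", "unknown"))
```

The resulting dictionary is a snapshot: later changes to the element are not reflected in it.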
Setting Attribute Values
setAttribute sets the value of an attribute on an element, creating the attribute if it does not already exist.
- // JavaScript
- let bookElement = xmlDoc.getElementsByTagName("book")[0];
- bookElement.setAttribute("category", "fantasy");
- // Java
- Element bookElement = (Element) document.getElementsByTagName("book").item(0);
- bookElement.setAttribute("category", "fantasy");
- # Python
- book_element = dom.getElementsByTagName("book")[0]
- book_element.setAttribute("category", "fantasy")
- // C#
- XmlElement bookElement = (XmlElement)xmlDoc.GetElementsByTagName("book")[0];
- bookElement.SetAttribute("category", "fantasy");
setAttributeNode attaches an attribute node to an element; if the element already has an attribute with the same name, that attribute is replaced.
- // JavaScript
- let bookElement = xmlDoc.getElementsByTagName("book")[0];
- let newAttr = xmlDoc.createAttribute("id");
- newAttr.value = "b001";
- bookElement.setAttributeNode(newAttr);
- // Java
- Element bookElement = (Element) document.getElementsByTagName("book").item(0);
- Attr newAttr = document.createAttribute("id");
- newAttr.setValue("b001");
- bookElement.setAttributeNode(newAttr);
- # Python
- book_element = dom.getElementsByTagName("book")[0]
- new_attr = dom.createAttribute("id")
- new_attr.value = "b001"
- book_element.setAttributeNode(new_attr)
- // C#
- XmlElement bookElement = (XmlElement)xmlDoc.GetElementsByTagName("book")[0];
- XmlAttribute newAttr = xmlDoc.CreateAttribute("id");
- newAttr.Value = "b001";
- bookElement.SetAttributeNode(newAttr);
Removing Attributes
removeAttribute removes the named attribute from an element.
- // JavaScript
- let bookElement = xmlDoc.getElementsByTagName("book")[0];
- bookElement.removeAttribute("category");
- // Java
- Element bookElement = (Element) document.getElementsByTagName("book").item(0);
- bookElement.removeAttribute("category");
- # Python
- book_element = dom.getElementsByTagName("book")[0]
- book_element.removeAttribute("category")
- // C#
- XmlElement bookElement = (XmlElement)xmlDoc.GetElementsByTagName("book")[0];
- bookElement.RemoveAttribute("category");
removeAttributeNode removes the given attribute node and returns the node that was removed.
- // JavaScript
- let bookElement = xmlDoc.getElementsByTagName("book")[0];
- let categoryAttr = bookElement.getAttributeNode("category");
- let removedAttr = bookElement.removeAttributeNode(categoryAttr);
- console.log("Removed attribute: " + removedAttr.name);
- // Java
- Element bookElement = (Element) document.getElementsByTagName("book").item(0);
- Attr categoryAttr = bookElement.getAttributeNode("category");
- Attr removedAttr = bookElement.removeAttributeNode(categoryAttr);
- System.out.println("Removed attribute: " + removedAttr.getName());
- # Python
- book_element = dom.getElementsByTagName("book")[0]
- category_attr = book_element.getAttributeNode("category")
- removed_attr = book_element.removeAttributeNode(category_attr)
- print(f"Removed attribute: {removed_attr.name}")
- // C#
- XmlElement bookElement = (XmlElement)xmlDoc.GetElementsByTagName("book")[0];
- XmlAttribute categoryAttr = bookElement.GetAttributeNode("category");
- XmlAttribute removedAttr = bookElement.RemoveAttributeNode(categoryAttr);
- Console.WriteLine("Removed attribute: " + removedAttr.Name);
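Setting and removing can be combined into a small round trip. The sketch below uses minidom; because some implementations (minidom among them) raise an error when asked to remove an attribute that is not there, it guards the removal with hasAttribute. The helper name remove_if_present is hypothetical.

```python
from xml.dom.minidom import parseString

doc = parseString('<book category="fiction"/>')
book = doc.documentElement

book.setAttribute("category", "fantasy")   # overwrite an existing attribute
book.setAttribute("id", "b001")            # create a new attribute

def remove_if_present(element, name):
    """Remove an attribute only if it exists; report whether anything was removed."""
    if element.hasAttribute(name):
        element.removeAttribute(name)
        return True
    return False

removed_first = remove_if_present(book, "category")
removed_again = remove_if_present(book, "category")
print(removed_first, removed_again)
print(book.toxml())
```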
Node Traversal
Parent and Child Nodes
- // JavaScript: parent node
- let titleElement = xmlDoc.getElementsByTagName("title")[0];
- let parentElement = titleElement.parentNode;
- console.log(parentElement.tagName); // Output: book
- // Java: parent node
- Element titleElement = (Element) document.getElementsByTagName("title").item(0);
- Node parentElement = titleElement.getParentNode();
- System.out.println(parentElement.getNodeName()); // Output: book
- # Python: parent node
- title_element = dom.getElementsByTagName("title")[0]
- parent_element = title_element.parentNode
- print(parent_element.tagName) # Output: book
- // C#: parent node
- XmlElement titleElement = (XmlElement)xmlDoc.GetElementsByTagName("title")[0];
- XmlNode parentElement = titleElement.ParentNode;
- Console.WriteLine(parentElement.Name); // Output: book
- // JavaScript: child nodes
- let bookElement = xmlDoc.getElementsByTagName("book")[0];
- let childNodes = bookElement.childNodes;
- // Iterate over the child nodes, keeping only elements
- for (let i = 0; i < childNodes.length; i++) {
-     let node = childNodes[i];
-     if (node.nodeType === Node.ELEMENT_NODE) {
-         console.log(node.tagName + ": " + node.textContent);
-     }
- }
- // Java: child nodes
- Element bookElement = (Element) document.getElementsByTagName("book").item(0);
- NodeList childNodes = bookElement.getChildNodes();
- // Iterate over the child nodes, keeping only elements
- for (int i = 0; i < childNodes.getLength(); i++) {
-     Node node = childNodes.item(i);
-     if (node.getNodeType() == Node.ELEMENT_NODE) {
-         System.out.println(node.getNodeName() + ": " + node.getTextContent());
-     }
- }
- # Python: child nodes
- book_element = dom.getElementsByTagName("book")[0]
- child_nodes = book_element.childNodes
- # Iterate over the child nodes, keeping only elements
- for i in range(child_nodes.length):
-     node = child_nodes.item(i)
-     if node.nodeType == node.ELEMENT_NODE:
-         print(f"{node.tagName}: {node.firstChild.data}")
- // C#: child nodes
- XmlElement bookElement = (XmlElement)xmlDoc.GetElementsByTagName("book")[0];
- XmlNodeList childNodes = bookElement.ChildNodes;
- // Iterate over the child nodes, keeping only elements
- foreach (XmlNode node in childNodes)
- {
-     if (node.NodeType == XmlNodeType.Element)
-     {
-         Console.WriteLine(node.Name + ": " + node.InnerText);
-     }
- }
- // JavaScript: first and last child
- let bookElement = xmlDoc.getElementsByTagName("book")[0];
- let firstChild = bookElement.firstChild;
- let lastChild = bookElement.lastChild;
- // Note: firstChild and lastChild may be text nodes, comment nodes, and so on
- // To get element nodes only, use firstElementChild and lastElementChild (where supported)
- let firstElementChild = bookElement.firstElementChild || null;
- let lastElementChild = bookElement.lastElementChild || null;
- // Java: first and last child
- Element bookElement = (Element) document.getElementsByTagName("book").item(0);
- Node firstChild = bookElement.getFirstChild();
- Node lastChild = bookElement.getLastChild();
- // Find the first child that is an element
- Node firstElementChild = null;
- NodeList children = bookElement.getChildNodes();
- for (int i = 0; i < children.getLength(); i++) {
-     if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
-         firstElementChild = children.item(i);
-         break;
-     }
- }
- # Python: first and last child
- book_element = dom.getElementsByTagName("book")[0]
- first_child = book_element.firstChild
- last_child = book_element.lastChild
- # Find the first child that is an element
- first_element_child = None
- for child in book_element.childNodes:
-     if child.nodeType == child.ELEMENT_NODE:
-         first_element_child = child
-         break
- // C#: first and last child
- XmlElement bookElement = (XmlElement)xmlDoc.GetElementsByTagName("book")[0];
- XmlNode firstChild = bookElement.FirstChild;
- XmlNode lastChild = bookElement.LastChild;
- // Find the first child that is an element
- XmlNode firstElementChild = null;
- foreach (XmlNode node in bookElement.ChildNodes)
- {
-     if (node.NodeType == XmlNodeType.Element)
-     {
-         firstElementChild = node;
-         break;
-     }
- }
Sibling Nodes
- // JavaScript
- let titleElement = xmlDoc.getElementsByTagName("title")[0];
- let previousSibling = titleElement.previousSibling;
- let nextSibling = titleElement.nextSibling;
- // Note: previousSibling and nextSibling may be text nodes, comment nodes, and so on
- // To get element nodes only, use previousElementSibling and nextElementSibling (where supported)
- let previousElementSibling = titleElement.previousElementSibling || null;
- let nextElementSibling = titleElement.nextElementSibling || null;
- // Java
- Element titleElement = (Element) document.getElementsByTagName("title").item(0);
- Node previousSibling = titleElement.getPreviousSibling();
- Node nextSibling = titleElement.getNextSibling();
- // Find the previous sibling that is an element
- Node previousElementSibling = null;
- Node node = titleElement.getPreviousSibling();
- while (node != null) {
-     if (node.getNodeType() == Node.ELEMENT_NODE) {
-         previousElementSibling = node;
-         break;
-     }
-     node = node.getPreviousSibling();
- }
- # Python
- title_element = dom.getElementsByTagName("title")[0]
- previous_sibling = title_element.previousSibling
- next_sibling = title_element.nextSibling
- # Find the previous sibling that is an element
- previous_element_sibling = None
- node = title_element.previousSibling
- while node:
-     if node.nodeType == node.ELEMENT_NODE:
-         previous_element_sibling = node
-         break
-     node = node.previousSibling
- // C#
- XmlElement titleElement = (XmlElement)xmlDoc.GetElementsByTagName("title")[0];
- XmlNode previousSibling = titleElement.PreviousSibling;
- XmlNode nextSibling = titleElement.NextSibling;
- // Find the previous sibling that is an element
- XmlNode previousElementSibling = null;
- XmlNode node = titleElement.PreviousSibling;
- while (node != null)
- {
-     if (node.NodeType == XmlNodeType.Element)
-     {
-         previousElementSibling = node;
-         break;
-     }
-     node = node.PreviousSibling;
- }
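The element-only filtering shown in the traversal snippets can be wrapped in reusable helpers. The following is a minimal minidom sketch; the helper names element_children and next_element_sibling are assumptions modeled on the browser properties of similar names.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

XML = """<book category="fiction">
  <title lang="en">Harry Potter</title>
  <author>J.K. Rowling</author>
</book>"""
book = parseString(XML).documentElement

def element_children(node):
    """Yield only the element children, skipping text and comment nodes."""
    for child in node.childNodes:
        if child.nodeType == Node.ELEMENT_NODE:
            yield child

def next_element_sibling(node):
    """Walk nextSibling until an element is found; return None at the end."""
    sibling = node.nextSibling
    while sibling is not None and sibling.nodeType != Node.ELEMENT_NODE:
        sibling = sibling.nextSibling
    return sibling

print(len(book.childNodes))   # counts the whitespace text nodes as well
names = [child.tagName for child in element_children(book)]
title = book.getElementsByTagName("title")[0]
print(names, next_element_sibling(title).tagName)
```

The raw childNodes count is larger than the number of elements because the indentation between tags is parsed into text nodes, which is why the filtering matters.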
Finding Nodes
getElementsByTagName returns a list of all elements with the given tag name.
- // JavaScript
- let titles = xmlDoc.getElementsByTagName("title");
- for (let i = 0; i < titles.length; i++) {
-     console.log(titles[i].textContent);
- }
- // Java
- NodeList titles = document.getElementsByTagName("title");
- for (int i = 0; i < titles.getLength(); i++) {
-     Element title = (Element) titles.item(i);
-     System.out.println(title.getTextContent());
- }
- # Python
- titles = dom.getElementsByTagName("title")
- for i in range(titles.length):
-     title = titles.item(i)
-     print(title.firstChild.data)
- // C#
- XmlNodeList titles = xmlDoc.GetElementsByTagName("title");
- foreach (XmlNode title in titles)
- {
-     Console.WriteLine(title.InnerText);
- }
getElementById returns the element with the given ID. Note that XML parsers generally treat an attribute as an ID only when a DTD or schema declares it as one, or when the code marks it explicitly (for example with setIdAttribute in Java or in Python's minidom). Modern browsers are typically more lenient and match an attribute named id.
- // JavaScript
- // Assumes the XML document contains an element with id="b001"
- let book = xmlDoc.getElementById("b001");
- if (book) {
-     console.log(book.getAttribute("category"));
- }
- // Java
- // Assumes the XML document contains an element with id="b001"
- Element book = document.getElementById("b001");
- if (book != null) {
-     System.out.println(book.getAttribute("category"));
- }
- # Python
- # Assumes the XML document contains an element with id="b001"
- book = dom.getElementById("b001")
- if book:
-     print(book.getAttribute("category"))
- // C#
- // Assumes the XML document contains an element with id="b001"
- XmlElement book = xmlDoc.GetElementById("b001");
- if (book != null)
- {
-     Console.WriteLine(book.GetAttribute("category"));
- }
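The ID requirement can be seen directly in Python's minidom. In the sketch below, which assumes a document without a DTD, the lookup finds nothing until the id attribute has been marked as an ID with setIdAttribute.

```python
from xml.dom.minidom import parseString

doc = parseString('<bookstore><book id="b001" category="fiction"/></bookstore>')

before = doc.getElementById("b001")   # no attribute has been declared as an ID yet

# Declare the "id" attribute of every book element as an ID attribute
for book in doc.getElementsByTagName("book"):
    if book.hasAttribute("id"):
        book.setIdAttribute("id")

after = doc.getElementById("b001")
print(before, after.getAttribute("category"))
```

When no ID declaration is available and marking attributes is impractical, filtering getElementsByTagName results by attribute value is a simple fallback.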
Note: getElementsByClassName may not be supported by every XML DOM implementation; it is used far more often with the HTML DOM.
- // JavaScript
- // Assumes the XML document contains elements with class="fiction"
- let fictionBooks = xmlDoc.getElementsByClassName("fiction");
- for (let i = 0; i < fictionBooks.length; i++) {
-     console.log(fictionBooks[i].textContent);
- }
querySelector and querySelectorAll find elements using CSS selector syntax, though support in XML DOM implementations may be limited.
- // JavaScript
- // Find all book elements whose category attribute is "fiction"
- let fictionBooks = xmlDoc.querySelectorAll('book[category="fiction"]');
- for (let i = 0; i < fictionBooks.length; i++) {
-     console.log(fictionBooks[i].textContent);
- }
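Advanced Attribute Access
XPath Queries
XPath is a language for locating information in XML documents. It offers query capabilities well beyond the basic DOM methods.
XPath uses path expressions to select nodes or node sets in an XML document. Some basic XPath expressions:
• /bookstore/book: selects all book elements that are children of the root element bookstore
• //book: selects all book elements, wherever they appear in the document
• //@lang: selects all attributes named lang
• /bookstore/book[1]: selects the first book element that is a child of bookstore
• /bookstore/book[last()]: selects the last book element that is a child of bookstore
• /bookstore/book[price>35.00]: selects the book children of bookstore whose price element has a value greater than 35.00
• //title[@lang]: selects all title elements that have an attribute named lang
• //title[@lang='en']: selects all title elements whose lang attribute has the value en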
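Several of these expressions can be tried without third-party libraries. The sketch below uses Python's standard xml.etree.ElementTree, which supports only a subset of XPath (attribute and position predicates, but not numeric comparisons), so paths are written relative to the root element and the price filter is done in Python code. The shortened sample document is an assumption for illustration.

```python
import xml.etree.ElementTree as ET

XML = """<bookstore>
  <book category="fiction">
    <title lang="en">Harry Potter</title>
    <price>29.99</price>
  </book>
  <book category="children">
    <title lang="en">The Wonderful Wizard of Oz</title>
    <price>15.99</price>
  </book>
</bookstore>"""
root = ET.fromstring(XML)   # root is the <bookstore> element

fiction = root.findall(".//book[@category='fiction']")
with_lang = root.findall(".//title[@lang]")
first_book = root.find("./book[1]")
last_book = root.find("./book[last()]")

# Numeric comparisons such as price>35.00 are outside ElementTree's subset,
# so this filter is done in Python instead
over_20 = [b for b in root.findall("./book") if float(b.findtext("price")) > 20]

print(fiction[0].findtext("title"))
print(len(with_lang), first_book.get("category"), last_book.get("category"))
print([b.findtext("title") for b in over_20])
```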
In JavaScript, XPath queries are run with the document's evaluate method:
- // Evaluate an XPath expression against the document
- let xpathResult = xmlDoc.evaluate('//book[@category="fiction"]', xmlDoc, null, XPathResult.ANY_TYPE, null);
- let nodes = [];
- let node = xpathResult.iterateNext();
- while (node) {
-     nodes.push(node);
-     node = xpathResult.iterateNext();
- }
- // Print the results
- nodes.forEach(function(node) {
-     console.log(node.getElementsByTagName("title")[0].textContent);
- });
In Java, use the XPath API:
- import javax.xml.xpath.*;
- // Create the XPath factory
- XPathFactory xpathFactory = XPathFactory.newInstance();
- XPath xpath = xpathFactory.newXPath();
- try {
-     // Compile the XPath expression
-     XPathExpression expr = xpath.compile("//book[@category='fiction']");
-
-     // Run the query
-     NodeList nodes = (NodeList) expr.evaluate(document, XPathConstants.NODESET);
-
-     // Print the results
-     for (int i = 0; i < nodes.getLength(); i++) {
-         Element book = (Element) nodes.item(i);
-         Element title = (Element) book.getElementsByTagName("title").item(0);
-         System.out.println(title.getTextContent());
-     }
- } catch (XPathExpressionException e) {
-     e.printStackTrace();
- }
In Python, the lxml library provides an xpath method (the standard library's ElementTree offers findall with a limited XPath subset):
- # Using lxml
- from lxml import etree
- # Parse the XML document
- tree = etree.parse("books.xml")
- root = tree.getroot()
- # Run the XPath query
- books = root.xpath("//book[@category='fiction']")
- # Print the results
- for book in books:
-     title = book.find("title")
-     print(title.text)
In C#, use the SelectNodes or SelectSingleNode methods:
- using System.Xml.XPath;
- // Run the XPath query
- XmlNodeList nodes = xmlDoc.SelectNodes("//book[@category='fiction']");
- // Print the results
- foreach (XmlNode node in nodes)
- {
-     XmlNode titleNode = node.SelectSingleNode("title");
-     Console.WriteLine(titleNode.InnerText);
- }
The real strength of XPath is its ability to express complex queries. Some more involved examples:
- // JavaScript examples
- // Books priced above 20
- let expensiveBooks = xmlDoc.evaluate('//book[price>20]', xmlDoc, null, XPathResult.ANY_TYPE, null);
- // Books whose author is "J.K. Rowling"
- let rowlingBooks = xmlDoc.evaluate('//book[author="J.K. Rowling"]', xmlDoc, null, XPathResult.ANY_TYPE, null);
- // Books whose title contains "Potter"
- let potterBooks = xmlDoc.evaluate('//book[contains(title, "Potter")]', xmlDoc, null, XPathResult.ANY_TYPE, null);
- // Books with category "fiction" whose title has lang "en"
- let specificBooks = xmlDoc.evaluate('//book[@category="fiction" and title/@lang="en"]', xmlDoc, null, XPathResult.ANY_TYPE, null);
- // Most expensive book. XPath 1.0, which evaluate() implements, has no max() function,
- // so select the book for which no other book has a higher price
- let mostExpensiveBook = xmlDoc.evaluate('//book[not(price < //book/price)]', xmlDoc, null, XPathResult.ANY_TYPE, null);
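Aggregate questions such as "which book costs the most" can also be answered in host-language code, which avoids depending on which XPath version a given engine implements. The following is a minimal minidom sketch; the helper name text_of and the shortened sample data are assumptions for illustration.

```python
from xml.dom.minidom import parseString

XML = """<bookstore>
  <book><title>Harry Potter</title><price>29.99</price></book>
  <book><title>The Wonderful Wizard of Oz</title><price>15.99</price></book>
</bookstore>"""
doc = parseString(XML)

def text_of(element, tag):
    """Return the text of the first <tag> descendant (assumes it holds one text node)."""
    return element.getElementsByTagName(tag)[0].firstChild.data

books = doc.getElementsByTagName("book")
most_expensive = max(books, key=lambda book: float(text_of(book, "price")))
print(text_of(most_expensive, "title"))
```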
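Working with Namespaces
When an XML document uses namespaces, reading attributes becomes slightly more involved. The examples below work with the following namespaced document: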
- <?xml version="1.0" encoding="UTF-8"?>
- <bookstore xmlns:bs="http://www.example.com/bookstore"
- xmlns:b="http://www.example.com/book">
- <b:book bs:category="fiction">
- <b:title b:lang="en">Harry Potter</b:title>
- <b:author>J.K. Rowling</b:author>
- <b:year>2005</b:year>
- <b:price>29.99</b:price>
- </b:book>
- <b:book bs:category="children">
- <b:title b:lang="en">The Wonderful Wizard of Oz</b:title>
- <b:author>L. Frank Baum</b:author>
- <b:year>1900</b:year>
- <b:price>15.99</b:price>
- </b:book>
- </bookstore>
- // JavaScript: namespace resolver that maps prefixes to namespace URIs
- function nsResolver(prefix) {
-     var ns = {
-         'bs': 'http://www.example.com/bookstore',
-         'b': 'http://www.example.com/book'
-     };
-     return ns[prefix] || null;
- }
- // Run the XPath query with the namespace resolver
- let xpathResult = xmlDoc.evaluate('//b:book[@bs:category="fiction"]', xmlDoc, nsResolver, XPathResult.ANY_TYPE, null);
- let nodes = [];
- let node = xpathResult.iterateNext();
- while (node) {
-     nodes.push(node);
-     node = xpathResult.iterateNext();
- }
- // Print the results
- nodes.forEach(function(node) {
-     console.log(node.getElementsByTagNameNS("http://www.example.com/book", "title")[0].textContent);
- });
- // Java
- import javax.xml.namespace.NamespaceContext;
- import java.util.Iterator;
- // Create the namespace context that maps prefixes to namespace URIs
- NamespaceContext nsContext = new NamespaceContext() {
- @Override
- public String getNamespaceURI(String prefix) {
- if (prefix.equals("bs")) {
- return "http://www.example.com/bookstore";
- } else if (prefix.equals("b")) {
- return "http://www.example.com/book";
- }
- return null;
- }
- @Override
- public String getPrefix(String namespaceURI) {
- return null;
- }
- @Override
- public Iterator<String> getPrefixes(String namespaceURI) {
- return null;
- }
- };
- // Create the XPath factory
- XPathFactory xpathFactory = XPathFactory.newInstance();
- XPath xpath = xpathFactory.newXPath();
- xpath.setNamespaceContext(nsContext);
- try {
- // Compile the XPath expression
- XPathExpression expr = xpath.compile("//b:book[@bs:category='fiction']");
-
- // Run the query
- NodeList nodes = (NodeList) expr.evaluate(document, XPathConstants.NODESET);
-
- // Print the results
- for (int i = 0; i < nodes.getLength(); i++) {
- Element book = (Element) nodes.item(i);
- NodeList titles = book.getElementsByTagNameNS("http://www.example.com/book", "title");
- Element title = (Element) titles.item(0);
- System.out.println(title.getTextContent());
- }
- } catch (XPathExpressionException e) {
- e.printStackTrace();
- }
- # Python: handling namespaces with lxml
- from lxml import etree
- # Define the prefix-to-URI mapping
- namespaces = {
-     'bs': 'http://www.example.com/bookstore',
-     'b': 'http://www.example.com/book'
- }
- # Parse the XML document
- tree = etree.parse("books_ns.xml")
- root = tree.getroot()
- # Run the XPath query with the namespace mapping
- books = root.xpath('//b:book[@bs:category="fiction"]', namespaces=namespaces)
- # Print the results
- for book in books:
-     title = book.find('b:title', namespaces=namespaces)
-     print(title.text)
- // C#: create an XmlNamespaceManager
- XmlNamespaceManager nsManager = new XmlNamespaceManager(xmlDoc.NameTable);
- nsManager.AddNamespace("bs", "http://www.example.com/bookstore");
- nsManager.AddNamespace("b", "http://www.example.com/book");
- // Run the XPath query with the namespace manager
- XmlNodeList nodes = xmlDoc.SelectNodes("//b:book[@bs:category='fiction']", nsManager);
- // Print the results
- foreach (XmlNode node in nodes)
- {
-     XmlNode titleNode = node.SelectSingleNode("b:title", nsManager);
-     Console.WriteLine(titleNode.InnerText);
- }
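Namespaced attributes can also be read without XPath, through the namespace-aware DOM methods. The sketch below uses minidom's getElementsByTagNameNS and getAttributeNS with a shortened copy of the sample document; both methods take the namespace URI and the local name rather than the prefix.

```python
from xml.dom.minidom import parseString

BS_NS = "http://www.example.com/bookstore"
B_NS = "http://www.example.com/book"

XML = f"""<bookstore xmlns:bs="{BS_NS}" xmlns:b="{B_NS}">
  <b:book bs:category="fiction">
    <b:title b:lang="en">Harry Potter</b:title>
  </b:book>
</bookstore>"""
doc = parseString(XML)

book = doc.getElementsByTagNameNS(B_NS, "book")[0]
title = book.getElementsByTagNameNS(B_NS, "title")[0]

category = book.getAttributeNS(BS_NS, "category")   # namespace URI + local name
lang = title.getAttributeNS(B_NS, "lang")
unqualified = book.getAttribute("category")          # no unprefixed attribute exists
print(category, lang, repr(unqualified))
```

Looking up by URI rather than prefix keeps the code working even if a document author chooses different prefixes for the same namespaces.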
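Converting and Validating Values
Values read from the DOM are strings, so in practice they often need to be converted to a specific data type and checked for validity. The examples below convert the text of the price and year elements; the same approach applies to values returned by getAttribute.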
- // JavaScript: read the price element's text and convert it to a number
- let priceElement = xmlDoc.getElementsByTagName("price")[0];
- let priceText = priceElement.textContent;
- let price = parseFloat(priceText);
- if (!isNaN(price)) {
- console.log("Price is: " + price);
- } else {
- console.log("Invalid price format");
- }
- // Read the year element's text and convert it to an integer
- let yearElement = xmlDoc.getElementsByTagName("year")[0];
- let yearText = yearElement.textContent;
- let year = parseInt(yearText, 10);
- if (!isNaN(year)) {
- console.log("Year is: " + year);
- } else {
- console.log("Invalid year format");
- }
- // Java: read the price element and convert it to a double
- Element priceElement = (Element) document.getElementsByTagName("price").item(0);
- String priceText = priceElement.getTextContent();
- double price;
- try {
- price = Double.parseDouble(priceText);
- System.out.println("Price is: " + price);
- } catch (NumberFormatException e) {
- System.out.println("Invalid price format");
- }
- // Read the year element and convert it to an integer
- Element yearElement = (Element) document.getElementsByTagName("year").item(0);
- String yearText = yearElement.getTextContent();
- int year;
- try {
- year = Integer.parseInt(yearText);
- System.out.println("Year is: " + year);
- } catch (NumberFormatException e) {
- System.out.println("Invalid year format");
- }
- # Python: read the price element and convert it to a float
- price_element = dom.getElementsByTagName("price")[0]
- price_text = price_element.firstChild.data
- try:
-     price = float(price_text)
-     print(f"Price is: {price}")
- except ValueError:
-     print("Invalid price format")
- # Read the year element and convert it to an integer
- year_element = dom.getElementsByTagName("year")[0]
- year_text = year_element.firstChild.data
- try:
-     year = int(year_text)
-     print(f"Year is: {year}")
- except ValueError:
-     print("Invalid year format")
- // C#: read the price element and convert it to a double
- XmlNode priceNode = xmlDoc.GetElementsByTagName("price")[0];
- string priceText = priceNode.InnerText;
- double price;
- if (double.TryParse(priceText, out price))
- {
- Console.WriteLine("Price is: " + price);
- }
- else
- {
- Console.WriteLine("Invalid price format");
- }
- // Read the year element and convert it to an integer
- XmlNode yearNode = xmlDoc.GetElementsByTagName("year")[0];
- string yearText = yearNode.InnerText;
- int year;
- if (int.TryParse(yearText, out year))
- {
- Console.WriteLine("Year is: " + year);
- }
- else
- {
- Console.WriteLine("Invalid year format");
- }
- // JavaScript: check that the price is a positive number
- function validatePrice(price) {
- return !isNaN(price) && price > 0;
- }
- // Check that the year falls within a plausible range
- function validateYear(year) {
- const currentYear = new Date().getFullYear();
- return !isNaN(year) && year >= 1800 && year <= currentYear + 1;
- }
- // Use the validation functions
- let priceElement = xmlDoc.getElementsByTagName("price")[0];
- let price = parseFloat(priceElement.textContent);
- if (validatePrice(price)) {
- console.log("Valid price: " + price);
- } else {
- console.log("Invalid price: " + price);
- }
- let yearElement = xmlDoc.getElementsByTagName("year")[0];
- let year = parseInt(yearElement.textContent, 10);
- if (validateYear(year)) {
- console.log("Valid year: " + year);
- } else {
- console.log("Invalid year: " + year);
- }
- // Java: check that the price is a positive number
- public static boolean validatePrice(double price) {
- return !Double.isNaN(price) && price > 0;
- }
- // Check that the year falls within a plausible range
- public static boolean validateYear(int year) {
- int currentYear = java.time.Year.now().getValue();
- return year >= 1800 && year <= currentYear + 1;
- }
- // Use the validation methods
- Element priceElement = (Element) document.getElementsByTagName("price").item(0);
- double price = Double.parseDouble(priceElement.getTextContent());
- if (validatePrice(price)) {
- System.out.println("Valid price: " + price);
- } else {
- System.out.println("Invalid price: " + price);
- }
- Element yearElement = (Element) document.getElementsByTagName("year").item(0);
- int year = Integer.parseInt(yearElement.getTextContent());
- if (validateYear(year)) {
- System.out.println("Valid year: " + year);
- } else {
- System.out.println("Invalid year: " + year);
- }
- # Python
- import datetime
- # Check that the price is positive
- def validate_price(price):
-     return price > 0
- # Check that the year falls within a plausible range
- def validate_year(year):
-     current_year = datetime.datetime.now().year
-     return 1800 <= year <= current_year + 1
- # Use the validation functions
- price_element = dom.getElementsByTagName("price")[0]
- price = float(price_element.firstChild.data)
- if validate_price(price):
-     print(f"Valid price: {price}")
- else:
-     print(f"Invalid price: {price}")
- year_element = dom.getElementsByTagName("year")[0]
- year = int(year_element.firstChild.data)
- if validate_year(year):
-     print(f"Valid year: {year}")
- else:
-     print(f"Invalid year: {year}")
- // C#: check that the price is a positive number
- public static bool ValidatePrice(double price)
- {
- return !double.IsNaN(price) && price > 0;
- }
- // Check that the year falls within a plausible range
- public static bool ValidateYear(int year)
- {
- int currentYear = DateTime.Now.Year;
- return year >= 1800 && year <= currentYear + 1;
- }
- // Use the validation methods
- XmlNode priceNode = xmlDoc.GetElementsByTagName("price")[0];
- double price = double.Parse(priceNode.InnerText);
- if (ValidatePrice(price))
- {
- Console.WriteLine("Valid price: " + price);
- }
- else
- {
- Console.WriteLine("Invalid price: " + price);
- }
- XmlNode yearNode = xmlDoc.GetElementsByTagName("year")[0];
- int year = int.Parse(yearNode.InnerText);
- if (ValidateYear(year))
- {
- Console.WriteLine("Valid year: " + year);
- }
- else
- {
- Console.WriteLine("Invalid year: " + year);
- }
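The same convert-then-validate pattern can be applied to attribute values directly. The sketch below is one possible helper, not a standard API: the name typed_attribute and the year, price, and pages attributes on the sample element are made up for illustration. It returns a default when the attribute is missing or cannot be converted.

```python
from xml.dom.minidom import parseString

doc = parseString('<book year="2005" price="29.99" pages="n/a"/>')
book = doc.documentElement

def typed_attribute(element, name, cast, default=None):
    """Read an attribute and convert it with cast; fall back to default
    when the attribute is missing or cannot be converted."""
    if not element.hasAttribute(name):
        return default
    try:
        return cast(element.getAttribute(name))
    except ValueError:
        return default

year = typed_attribute(book, "year", int)
price = typed_attribute(book, "price", float)
pages = typed_attribute(book, "pages", int, default=0)      # "n/a" is not an integer
weight = typed_attribute(book, "weight", float, default=0.0)  # attribute absent
print(year, price, pages, weight)
```

Returning a default keeps calling code short, at the cost of hiding malformed input; raising an error instead may be preferable when bad data should stop processing.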
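Practical Examples
Example 1: Parsing a Configuration File
Suppose an application has a configuration file whose settings need to be read and parsed: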
- <?xml version="1.0" encoding="UTF-8"?>
- <config>
- <database>
- <host type="string" required="true">localhost</host>
- <port type="integer" required="true" min="1" max="65535">3306</port>
- <username type="string" required="true">admin</username>
- <password type="string" required="true" encrypted="true">s3cr3t</password>
- <connectionTimeout type="integer" required="false" default="30">30</connectionTimeout>
- </database>
- <logging>
- <level type="string" required="true" values="debug,info,warn,error">info</level>
- <file type="string" required="false">app.log</file>
- <maxSize type="integer" required="false" default="10485760">10485760</maxSize>
- </logging>
- </config>
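The attributes in this file (required, default, min, max) describe constraints on each setting, so a reader can enforce them while parsing. The following is a minimal Python sketch for integer settings only. The interpretation of the attributes is an assumption, since the file format does not define it: an empty element falls back to default unless required is "true", and min and max are inclusive bounds. The helper name read_integer_setting is hypothetical.

```python
from xml.dom.minidom import parseString

CONFIG = """<config>
  <database>
    <port type="integer" required="true" min="1" max="65535">3306</port>
    <connectionTimeout type="integer" required="false" default="30"></connectionTimeout>
  </database>
</config>"""
doc = parseString(CONFIG)

def read_integer_setting(element):
    """Return the element's integer value, using its default attribute when the
    element is empty and checking the min/max attributes when present."""
    text = element.firstChild.data.strip() if element.firstChild else ""
    if not text:
        if element.getAttribute("required") == "true":
            raise ValueError(f"{element.tagName} is required")
        text = element.getAttribute("default")
    value = int(text)
    if element.hasAttribute("min") and value < int(element.getAttribute("min")):
        raise ValueError(f"{element.tagName} is below min")
    if element.hasAttribute("max") and value > int(element.getAttribute("max")):
        raise ValueError(f"{element.tagName} is above max")
    return value

port = read_integer_setting(doc.getElementsByTagName("port")[0])
timeout = read_integer_setting(doc.getElementsByTagName("connectionTimeout")[0])
print(port, timeout)
```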
The following code parses this configuration file:
- // Parse the configuration file
- function parseConfig(xmlString) {
- let parser = new DOMParser();
- let xmlDoc = parser.parseFromString(xmlString, "text/xml");
- let config = {};
-
- // Parse the database settings
- let databaseConfig = {};
- let databaseNode = xmlDoc.getElementsByTagName("database")[0];
-
- let hostNode = databaseNode.getElementsByTagName("host")[0];
- databaseConfig.host = {
- value: hostNode.textContent,
- type: hostNode.getAttribute("type"),
- required: hostNode.getAttribute("required") === "true"
- };
-
- let portNode = databaseNode.getElementsByTagName("port")[0];
- databaseConfig.port = {
- value: parseInt(portNode.textContent, 10),
- type: portNode.getAttribute("type"),
- required: portNode.getAttribute("required") === "true",
- min: parseInt(portNode.getAttribute("min"), 10),
- max: parseInt(portNode.getAttribute("max"), 10)
- };
-
- let usernameNode = databaseNode.getElementsByTagName("username")[0];
- databaseConfig.username = {
- value: usernameNode.textContent,
- type: usernameNode.getAttribute("type"),
- required: usernameNode.getAttribute("required") === "true"
- };
-
- let passwordNode = databaseNode.getElementsByTagName("password")[0];
- databaseConfig.password = {
- value: passwordNode.textContent,
- type: passwordNode.getAttribute("type"),
- required: passwordNode.getAttribute("required") === "true",
- encrypted: passwordNode.getAttribute("encrypted") === "true"
- };
-
- let connectionTimeoutNode = databaseNode.getElementsByTagName("connectionTimeout")[0];
- databaseConfig.connectionTimeout = {
- value: parseInt(connectionTimeoutNode.textContent, 10),
- type: connectionTimeoutNode.getAttribute("type"),
- required: connectionTimeoutNode.getAttribute("required") === "true",
- default: parseInt(connectionTimeoutNode.getAttribute("default"), 10)
- };
-
- config.database = databaseConfig;
-
- // Parse the logging settings
- let loggingConfig = {};
- let loggingNode = xmlDoc.getElementsByTagName("logging")[0];
-
- let levelNode = loggingNode.getElementsByTagName("level")[0];
- loggingConfig.level = {
- value: levelNode.textContent,
- type: levelNode.getAttribute("type"),
- required: levelNode.getAttribute("required") === "true",
- values: levelNode.getAttribute("values").split(",")
- };
-
- let fileNode = loggingNode.getElementsByTagName("file")[0];
- loggingConfig.file = {
- value: fileNode.textContent,
- type: fileNode.getAttribute("type"),
- required: fileNode.getAttribute("required") === "true"
- };
-
- let maxSizeNode = loggingNode.getElementsByTagName("maxSize")[0];
- loggingConfig.maxSize = {
- value: parseInt(maxSizeNode.textContent, 10),
- type: maxSizeNode.getAttribute("type"),
- required: maxSizeNode.getAttribute("required") === "true",
- default: parseInt(maxSizeNode.getAttribute("default"), 10)
- };
-
- config.logging = loggingConfig;
-
- return config;
- }
- // Usage example
- let configXml = `<?xml version="1.0" encoding="UTF-8"?>
- <config>
- <database>
- <host type="string" required="true">localhost</host>
- <port type="integer" required="true" min="1" max="65535">3306</port>
- <username type="string" required="true">admin</username>
- <password type="string" required="true" encrypted="true">s3cr3t</password>
- <connectionTimeout type="integer" required="false" default="30">30</connectionTimeout>
- </database>
- <logging>
- <level type="string" required="true" values="debug,info,warn,error">info</level>
- <file type="string" required="false">app.log</file>
- <maxSize type="integer" required="false" default="10485760">10485760</maxSize>
- </logging>
- </config>`;
- let config = parseConfig(configXml);
- console.log(JSON.stringify(config, null, 2));
Java:
- import org.w3c.dom.*;
- import javax.xml.parsers.*;
- import java.io.*;
- public class ConfigParser {
- public static Config parseConfig(String xmlString) throws Exception {
- DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
- DocumentBuilder builder = factory.newDocumentBuilder();
- Document document = builder.parse(new ByteArrayInputStream(xmlString.getBytes(java.nio.charset.StandardCharsets.UTF_8)));
-
- Config config = new Config();
-
- // Parse the database settings
- DatabaseConfig databaseConfig = new DatabaseConfig();
- Element databaseNode = (Element) document.getElementsByTagName("database").item(0);
-
- Element hostNode = (Element) databaseNode.getElementsByTagName("host").item(0);
- databaseConfig.setHost(new ConfigValue(
- hostNode.getTextContent(),
- hostNode.getAttribute("type"),
- Boolean.parseBoolean(hostNode.getAttribute("required"))
- ));
-
- Element portNode = (Element) databaseNode.getElementsByTagName("port").item(0);
- databaseConfig.setPort(new ConfigValue(
- Integer.parseInt(portNode.getTextContent()),
- portNode.getAttribute("type"),
- Boolean.parseBoolean(portNode.getAttribute("required")),
- Integer.parseInt(portNode.getAttribute("min")),
- Integer.parseInt(portNode.getAttribute("max"))
- ));
-
- Element usernameNode = (Element) databaseNode.getElementsByTagName("username").item(0);
- databaseConfig.setUsername(new ConfigValue(
- usernameNode.getTextContent(),
- usernameNode.getAttribute("type"),
- Boolean.parseBoolean(usernameNode.getAttribute("required"))
- ));
-
- Element passwordNode = (Element) databaseNode.getElementsByTagName("password").item(0);
- databaseConfig.setPassword(new ConfigValue(
- passwordNode.getTextContent(),
- passwordNode.getAttribute("type"),
- Boolean.parseBoolean(passwordNode.getAttribute("required")),
- Boolean.parseBoolean(passwordNode.getAttribute("encrypted"))
- ));
-
- Element connectionTimeoutNode = (Element) databaseNode.getElementsByTagName("connectionTimeout").item(0);
- databaseConfig.setConnectionTimeout(new ConfigValue(
- Integer.parseInt(connectionTimeoutNode.getTextContent()),
- connectionTimeoutNode.getAttribute("type"),
- Boolean.parseBoolean(connectionTimeoutNode.getAttribute("required")),
- Integer.parseInt(connectionTimeoutNode.getAttribute("default"))
- ));
-
- config.setDatabase(databaseConfig);
-
- // Parse the logging settings
- LoggingConfig loggingConfig = new LoggingConfig();
- Element loggingNode = (Element) document.getElementsByTagName("logging").item(0);
-
- Element levelNode = (Element) loggingNode.getElementsByTagName("level").item(0);
- loggingConfig.setLevel(new ConfigValue(
- levelNode.getTextContent(),
- levelNode.getAttribute("type"),
- Boolean.parseBoolean(levelNode.getAttribute("required")),
- levelNode.getAttribute("values").split(",")
- ));
-
- Element fileNode = (Element) loggingNode.getElementsByTagName("file").item(0);
- loggingConfig.setFile(new ConfigValue(
- fileNode.getTextContent(),
- fileNode.getAttribute("type"),
- Boolean.parseBoolean(fileNode.getAttribute("required"))
- ));
-
- Element maxSizeNode = (Element) loggingNode.getElementsByTagName("maxSize").item(0);
- loggingConfig.setMaxSize(new ConfigValue(
- Integer.parseInt(maxSizeNode.getTextContent()),
- maxSizeNode.getAttribute("type"),
- Boolean.parseBoolean(maxSizeNode.getAttribute("required")),
- Integer.parseInt(maxSizeNode.getAttribute("default"))
- ));
-
- config.setLogging(loggingConfig);
-
- return config;
- }
-
- // Configuration classes
- public static class Config {
- private DatabaseConfig database;
- private LoggingConfig logging;
-
- // getters and setters
- }
-
- public static class DatabaseConfig {
- private ConfigValue host;
- private ConfigValue port;
- private ConfigValue username;
- private ConfigValue password;
- private ConfigValue connectionTimeout;
-
- // getters and setters
- }
-
- public static class LoggingConfig {
- private ConfigValue level;
- private ConfigValue file;
- private ConfigValue maxSize;
-
- // getters and setters
- }
-
- public static class ConfigValue {
- private Object value;
- private String type;
- private boolean required;
- private Object[] constraints;
-
- public ConfigValue(Object value, String type, boolean required, Object... constraints) {
- this.value = value;
- this.type = type;
- this.required = required;
- this.constraints = constraints;
- }
-
- // getters
- }
-
- public static void main(String[] args) {
- try {
- String configXml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" +
- "<config>\n" +
- " <database>\n" +
- " <host type=\"string\" required=\"true\">localhost</host>\n" +
- " <port type=\"integer\" required=\"true\" min=\"1\" max=\"65535\">3306</port>\n" +
- " <username type=\"string\" required=\"true\">admin</username>\n" +
- " <password type=\"string\" required=\"true\" encrypted=\"true\">s3cr3t</password>\n" +
- " <connectionTimeout type=\"integer\" required=\"false\" default=\"30\">30</connectionTimeout>\n" +
- " </database>\n" +
- " <logging>\n" +
- " <level type=\"string\" required=\"true\" values=\"debug,info,warn,error\">info</level>\n" +
- " <file type=\"string\" required=\"false\">app.log</file>\n" +
- " <maxSize type=\"integer\" required=\"false\" default=\"10485760\">10485760</maxSize>\n" +
- " </logging>\n" +
- "</config>";
-
- Config config = parseConfig(configXml);
- System.out.println(config.toString());
- } catch (Exception e) {
- e.printStackTrace();
- }
- }
- }
Python:
- import json
- from xml.dom.minidom import parseString
-
- def parse_config(xml_string):
-     dom = parseString(xml_string)
-     config = {}
-
-     # Parse the database settings
-     database_config = {}
-     database_node = dom.getElementsByTagName("database")[0]
-
-     host_node = database_node.getElementsByTagName("host")[0]
-     database_config['host'] = {
-         'value': host_node.firstChild.data,
-         'type': host_node.getAttribute("type"),
-         'required': host_node.getAttribute("required") == "true"
-     }
-
-     port_node = database_node.getElementsByTagName("port")[0]
-     database_config['port'] = {
-         'value': int(port_node.firstChild.data),
-         'type': port_node.getAttribute("type"),
-         'required': port_node.getAttribute("required") == "true",
-         'min': int(port_node.getAttribute("min")),
-         'max': int(port_node.getAttribute("max"))
-     }
-
-     username_node = database_node.getElementsByTagName("username")[0]
-     database_config['username'] = {
-         'value': username_node.firstChild.data,
-         'type': username_node.getAttribute("type"),
-         'required': username_node.getAttribute("required") == "true"
-     }
-
-     password_node = database_node.getElementsByTagName("password")[0]
-     database_config['password'] = {
-         'value': password_node.firstChild.data,
-         'type': password_node.getAttribute("type"),
-         'required': password_node.getAttribute("required") == "true",
-         'encrypted': password_node.getAttribute("encrypted") == "true"
-     }
-
-     connection_timeout_node = database_node.getElementsByTagName("connectionTimeout")[0]
-     database_config['connectionTimeout'] = {
-         'value': int(connection_timeout_node.firstChild.data),
-         'type': connection_timeout_node.getAttribute("type"),
-         'required': connection_timeout_node.getAttribute("required") == "true",
-         'default': int(connection_timeout_node.getAttribute("default"))
-     }
-
-     config['database'] = database_config
-
-     # Parse the logging settings
-     logging_config = {}
-     logging_node = dom.getElementsByTagName("logging")[0]
-
-     level_node = logging_node.getElementsByTagName("level")[0]
-     logging_config['level'] = {
-         'value': level_node.firstChild.data,
-         'type': level_node.getAttribute("type"),
-         'required': level_node.getAttribute("required") == "true",
-         'values': level_node.getAttribute("values").split(",")
-     }
-
-     file_node = logging_node.getElementsByTagName("file")[0]
-     logging_config['file'] = {
-         'value': file_node.firstChild.data,
-         'type': file_node.getAttribute("type"),
-         'required': file_node.getAttribute("required") == "true"
-     }
-
-     max_size_node = logging_node.getElementsByTagName("maxSize")[0]
-     logging_config['maxSize'] = {
-         'value': int(max_size_node.firstChild.data),
-         'type': max_size_node.getAttribute("type"),
-         'required': max_size_node.getAttribute("required") == "true",
-         'default': int(max_size_node.getAttribute("default"))
-     }
-
-     config['logging'] = logging_config
-
-     return config
-
- # Usage example
- config_xml = """<?xml version="1.0" encoding="UTF-8"?>
- <config>
-   <database>
-     <host type="string" required="true">localhost</host>
-     <port type="integer" required="true" min="1" max="65535">3306</port>
-     <username type="string" required="true">admin</username>
-     <password type="string" required="true" encrypted="true">s3cr3t</password>
-     <connectionTimeout type="integer" required="false" default="30">30</connectionTimeout>
-   </database>
-   <logging>
-     <level type="string" required="true" values="debug,info,warn,error">info</level>
-     <file type="string" required="false">app.log</file>
-     <maxSize type="integer" required="false" default="10485760">10485760</maxSize>
-   </logging>
- </config>"""
- config = parse_config(config_xml)
- print(json.dumps(config, indent=2))
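The parsers above repeat the same few lines for every setting. Because each element carries its own `type` attribute, a similar result can be produced by walking the child elements and reading every attribute from the element's `attributes` map. The following is only a minimal sketch under that assumption: the names `CASTS`, `element_to_entry`, and `parse_config_generic` are illustrative and do not appear in the original code, and attribute values are left as strings rather than converted.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

# Hypothetical mapping from the file's type names to Python converters.
CASTS = {"string": str, "integer": int}

def element_to_entry(element):
    """Collect an element's text and all of its attributes into a dict."""
    cast = CASTS.get(element.getAttribute("type"), str)
    text = "".join(child.data for child in element.childNodes
                   if child.nodeType == Node.TEXT_NODE)
    entry = {"value": cast(text.strip())}
    attrs = element.attributes  # NamedNodeMap of Attr nodes
    for i in range(attrs.length):
        attr = attrs.item(i)
        entry[attr.name] = attr.value  # attribute values stay as strings
    return entry

def parse_config_generic(xml_string):
    """Build {section: {setting: entry}} for every section under the root."""
    root = parseString(xml_string).documentElement
    config = {}
    for section in root.childNodes:
        if section.nodeType != Node.ELEMENT_NODE:
            continue  # skip whitespace text nodes between elements
        config[section.tagName] = {
            setting.tagName: element_to_entry(setting)
            for setting in section.childNodes
            if setting.nodeType == Node.ELEMENT_NODE
        }
    return config

# Shortened sample in the same shape as the configuration file above.
SAMPLE = """<config>
  <database>
    <host type="string" required="true">localhost</host>
    <port type="integer" required="true" min="1" max="65535">3306</port>
  </database>
</config>"""

if __name__ == "__main__":
    print(parse_config_generic(SAMPLE))
```

The trade-off is that new settings need no new code, but the result no longer distinguishes booleans or numeric bounds unless a further conversion step is added.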
C#:
- using System;
- using System.Xml;
- using System.Collections.Generic;
- public class ConfigParser
- {
- public static Config ParseConfig(string xmlString)
- {
- XmlDocument xmlDoc = new XmlDocument();
- xmlDoc.LoadXml(xmlString);
-
- Config config = new Config();
-
- // Parse the database settings
- DatabaseConfig databaseConfig = new DatabaseConfig();
- XmlNode databaseNode = xmlDoc.SelectSingleNode("//database");
-
- XmlNode hostNode = databaseNode.SelectSingleNode("host");
- databaseConfig.Host = new ConfigValue(
- hostNode.InnerText,
- hostNode.Attributes["type"].Value,
- bool.Parse(hostNode.Attributes["required"].Value)
- );
-
- XmlNode portNode = databaseNode.SelectSingleNode("port");
- databaseConfig.Port = new ConfigValue(
- int.Parse(portNode.InnerText),
- portNode.Attributes["type"].Value,
- bool.Parse(portNode.Attributes["required"].Value),
- int.Parse(portNode.Attributes["min"].Value),
- int.Parse(portNode.Attributes["max"].Value)
- );
-
- XmlNode usernameNode = databaseNode.SelectSingleNode("username");
- databaseConfig.Username = new ConfigValue(
- usernameNode.InnerText,
- usernameNode.Attributes["type"].Value,
- bool.Parse(usernameNode.Attributes["required"].Value)
- );
-
- XmlNode passwordNode = databaseNode.SelectSingleNode("password");
- databaseConfig.Password = new ConfigValue(
- passwordNode.InnerText,
- passwordNode.Attributes["type"].Value,
- bool.Parse(passwordNode.Attributes["required"].Value),
- bool.Parse(passwordNode.Attributes["encrypted"].Value)
- );
-
- XmlNode connectionTimeoutNode = databaseNode.SelectSingleNode("connectionTimeout");
- databaseConfig.ConnectionTimeout = new ConfigValue(
- int.Parse(connectionTimeoutNode.InnerText),
- connectionTimeoutNode.Attributes["type"].Value,
- bool.Parse(connectionTimeoutNode.Attributes["required"].Value),
- int.Parse(connectionTimeoutNode.Attributes["default"].Value)
- );
-
- config.Database = databaseConfig;
-
- // Parse the logging settings
- LoggingConfig loggingConfig = new LoggingConfig();
- XmlNode loggingNode = xmlDoc.SelectSingleNode("//logging");
-
- XmlNode levelNode = loggingNode.SelectSingleNode("level");
- loggingConfig.Level = new ConfigValue(
- levelNode.InnerText,
- levelNode.Attributes["type"].Value,
- bool.Parse(levelNode.Attributes["required"].Value),
- levelNode.Attributes["values"].Value.Split(',')
- );
-
- XmlNode fileNode = loggingNode.SelectSingleNode("file");
- loggingConfig.File = new ConfigValue(
- fileNode.InnerText,
- fileNode.Attributes["type"].Value,
- bool.Parse(fileNode.Attributes["required"].Value)
- );
-
- XmlNode maxSizeNode = loggingNode.SelectSingleNode("maxSize");
- loggingConfig.MaxSize = new ConfigValue(
- int.Parse(maxSizeNode.InnerText),
- maxSizeNode.Attributes["type"].Value,
- bool.Parse(maxSizeNode.Attributes["required"].Value),
- int.Parse(maxSizeNode.Attributes["default"].Value)
- );
-
- config.Logging = loggingConfig;
-
- return config;
- }
-
- // Configuration classes
- public class Config
- {
- public DatabaseConfig Database { get; set; }
- public LoggingConfig Logging { get; set; }
- }
-
- public class DatabaseConfig
- {
- public ConfigValue Host { get; set; }
- public ConfigValue Port { get; set; }
- public ConfigValue Username { get; set; }
- public ConfigValue Password { get; set; }
- public ConfigValue ConnectionTimeout { get; set; }
- }
-
- public class LoggingConfig
- {
- public ConfigValue Level { get; set; }
- public ConfigValue File { get; set; }
- public ConfigValue MaxSize { get; set; }
- }
-
- public class ConfigValue
- {
- public object Value { get; set; }
- public string Type { get; set; }
- public bool Required { get; set; }
- public object[] Constraints { get; set; }
-
- public ConfigValue(object value, string type, bool required, params object[] constraints)
- {
- Value = value;
- Type = type;
- Required = required;
- Constraints = constraints;
- }
- }
-
- public static void Main(string[] args)
- {
- string configXml = @"<?xml version=""1.0"" encoding=""UTF-8""?>
- <config>
- <database>
- <host type=""string"" required=""true"">localhost</host>
- <port type=""integer"" required=""true"" min=""1"" max=""65535"">3306</port>
- <username type=""string"" required=""true"">admin</username>
- <password type=""string"" required=""true"" encrypted=""true"">s3cr3t</password>
- <connectionTimeout type=""integer"" required=""false"" default=""30"">30</connectionTimeout>
- </database>
- <logging>
- <level type=""string"" required=""true"" values=""debug,info,warn,error"">info</level>
- <file type=""string"" required=""false"">app.log</file>
- <maxSize type=""integer"" required=""false"" default=""10485760"">10485760</maxSize>
- </logging>
- </config>";
-
- Config config = ParseConfig(configXml);
- Console.WriteLine(Newtonsoft.Json.JsonConvert.SerializeObject(config, Newtonsoft.Json.Formatting.Indented));
- }
- }
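The `required`, `default`, `min`, `max`, and `values` attributes in the configuration file are read by the parsers above but never enforced. One possible next step is to check each value against the attributes on its own element. The sketch below assumes the semantics suggested by the attribute names (inclusive numeric bounds, a comma-separated list of allowed values, `default` used when the element is empty). `validate_setting` and `validate_all` are hypothetical helpers, and the sample is flattened and deliberately contains invalid values.

```python
from xml.dom.minidom import parseString

def validate_setting(element):
    """Return a list of problems for one setting element (empty if valid)."""
    problems = []
    name = element.tagName
    text = element.firstChild.data.strip() if element.firstChild else ""

    # Fall back to the declared default when the element has no text.
    if not text and element.hasAttribute("default"):
        text = element.getAttribute("default")
    if not text:
        if element.getAttribute("required") == "true":
            problems.append(f"{name}: required value is missing")
        return problems

    if element.getAttribute("type") == "integer":
        try:
            number = int(text)
        except ValueError:
            return [f"{name}: '{text}' is not an integer"]
        # hasAttribute tells an absent bound apart from an empty string.
        if element.hasAttribute("min") and number < int(element.getAttribute("min")):
            problems.append(f"{name}: {number} is below min")
        if element.hasAttribute("max") and number > int(element.getAttribute("max")):
            problems.append(f"{name}: {number} is above max")

    if element.hasAttribute("values"):
        allowed = element.getAttribute("values").split(",")
        if text not in allowed:
            problems.append(f"{name}: '{text}' is not one of {allowed}")
    return problems

def validate_all(xml_string):
    """Validate every element directly under the document root."""
    root = parseString(xml_string).documentElement
    problems = []
    for child in root.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            problems.extend(validate_setting(child))
    return problems

# Flattened sample with intentionally invalid values.
SAMPLE = """<config>
  <port type="integer" required="true" min="1" max="65535">70000</port>
  <level type="string" required="true" values="debug,info,warn,error">trace</level>
  <connectionTimeout type="integer" required="false" default="30"></connectionTimeout>
  <username type="string" required="true"></username>
</config>"""

if __name__ == "__main__":
    for problem in validate_all(SAMPLE):
        print(problem)
```

Using `hasAttribute` before `getAttribute` matters here because `getAttribute` returns an empty string for a missing attribute, which would otherwise be indistinguishable from an attribute that is present but empty.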
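Case 2: Converting XML Data
Suppose we need to convert an XML product catalog from one format to another, for example from an internal format to an external interchange format:
Source XML format: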
- <?xml version="1.0" encoding="UTF-8"?>
- <catalog>
- <product id="p001" category="electronics">
- <name>Smartphone</name>
- <description>A high-end smartphone with advanced features</description>
- <price currency="USD">699.99</price>
- <stock>50</stock>
- <specifications>
- <spec name="display">6.1 inch OLED</spec>
- <spec name="storage">128GB</spec>
- <spec name="ram">6GB</spec>
- <spec name="camera">12MP dual camera</spec>
- </specifications>
- </product>
- <product id="p002" category="electronics">
- <name>Laptop</name>
- <description>Powerful laptop for professionals</description>
- <price currency="USD">1299.99</price>
- <stock>25</stock>
- <specifications>
- <spec name="display">15.6 inch Full HD</spec>
- <spec name="storage">512GB SSD</spec>
- <spec name="ram">16GB</spec>
- <spec name="processor">Intel Core i7</spec>
- </specifications>
- </product>
- </catalog>
Target XML format:
- <?xml version="1.0" encoding="UTF-8"?>
- <productList xmlns="http://www.example.com/products">
- <product sku="p001" type="electronics">
- <title>Smartphone</title>
- <details>A high-end smartphone with advanced features</details>
- <cost currency="USD">699.99</cost>
- <inventory>50</inventory>
- <features>
- <feature key="display">6.1 inch OLED</feature>
- <feature key="storage">128GB</feature>
- <feature key="ram">6GB</feature>
- <feature key="camera">12MP dual camera</feature>
- </features>
- </product>
- <product sku="p002" type="electronics">
- <title>Laptop</title>
- <details>Powerful laptop for professionals</details>
- <cost currency="USD">1299.99</cost>
- <inventory>25</inventory>
- <features>
- <feature key="display">15.6 inch Full HD</feature>
- <feature key="storage">512GB SSD</feature>
- <feature key="ram">16GB</feature>
- <feature key="processor">Intel Core i7</feature>
- </features>
- </product>
- </productList>
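One detail of the target format affects attribute access: the `xmlns` declaration places every element in the `http://www.example.com/products` namespace, while unprefixed attributes such as `sku` and `key` stay in no namespace. The sketch below assumes Python's `xml.dom.minidom` and a shortened copy of the target document. It looks elements up with `getElementsByTagNameNS` but reads attributes with plain `getAttribute`. `read_products` is an illustrative name.

```python
from xml.dom.minidom import parseString

NS = "http://www.example.com/products"

# Shortened copy of the target format shown above.
TARGET = """<productList xmlns="http://www.example.com/products">
  <product sku="p001" type="electronics">
    <title>Smartphone</title>
    <cost currency="USD">699.99</cost>
    <features>
      <feature key="display">6.1 inch OLED</feature>
      <feature key="ram">6GB</feature>
    </features>
  </product>
</productList>"""

def read_products(xml_string):
    """Read products back using namespace-aware element lookups."""
    doc = parseString(xml_string)
    result = []
    for product in doc.getElementsByTagNameNS(NS, "product"):
        cost = product.getElementsByTagNameNS(NS, "cost")[0]
        features = {
            feature.getAttribute("key"): feature.firstChild.data
            for feature in product.getElementsByTagNameNS(NS, "feature")
        }
        result.append({
            "sku": product.getAttribute("sku"),
            "currency": cost.getAttribute("currency"),
            # The element inherits the default namespace ...
            "element_namespace": product.namespaceURI,
            # ... but the unprefixed attribute does not.
            "attribute_namespace": product.getAttributeNode("sku").namespaceURI,
            "features": features,
        })
    return result

if __name__ == "__main__":
    for item in read_products(TARGET):
        print(item)
```

This is why the conversion code can keep calling `setAttribute` without a namespace even though every element is created with `createElementNS`.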
The following code examples implement this conversion:
JavaScript:
- // Convert the XML format
- function transformProductCatalog(inputXmlString) {
- // Parse the input XML
- let parser = new DOMParser();
- let inputDoc = parser.parseFromString(inputXmlString, "text/xml");
-
- // Create the output XML document
- let outputDoc = document.implementation.createDocument("http://www.example.com/products", "productList", null);
- let productList = outputDoc.documentElement;
-
- // Get all product nodes
- let products = inputDoc.getElementsByTagName("product");
-
- // Iterate over each product
- for (let i = 0; i < products.length; i++) {
- let inputProduct = products[i];
-
- // Create the product node
- let outputProduct = outputDoc.createElementNS("http://www.example.com/products", "product");
-
- // Copy the attributes under their new names
- outputProduct.setAttribute("sku", inputProduct.getAttribute("id"));
- outputProduct.setAttribute("type", inputProduct.getAttribute("category"));
-
- // Add the child elements
- let name = inputProduct.getElementsByTagName("name")[0];
- let title = outputDoc.createElementNS("http://www.example.com/products", "title");
- title.appendChild(outputDoc.createTextNode(name.textContent));
- outputProduct.appendChild(title);
-
- let description = inputProduct.getElementsByTagName("description")[0];
- let details = outputDoc.createElementNS("http://www.example.com/products", "details");
- details.appendChild(outputDoc.createTextNode(description.textContent));
- outputProduct.appendChild(details);
-
- let price = inputProduct.getElementsByTagName("price")[0];
- let cost = outputDoc.createElementNS("http://www.example.com/products", "cost");
- cost.setAttribute("currency", price.getAttribute("currency"));
- cost.appendChild(outputDoc.createTextNode(price.textContent));
- outputProduct.appendChild(cost);
-
- let stock = inputProduct.getElementsByTagName("stock")[0];
- let inventory = outputDoc.createElementNS("http://www.example.com/products", "inventory");
- inventory.appendChild(outputDoc.createTextNode(stock.textContent));
- outputProduct.appendChild(inventory);
-
- // Convert specifications to features
- let specifications = inputProduct.getElementsByTagName("specifications")[0];
- let specs = specifications.getElementsByTagName("spec");
- let features = outputDoc.createElementNS("http://www.example.com/products", "features");
-
- for (let j = 0; j < specs.length; j++) {
- let spec = specs[j];
- let feature = outputDoc.createElementNS("http://www.example.com/products", "feature");
- feature.setAttribute("key", spec.getAttribute("name"));
- feature.appendChild(outputDoc.createTextNode(spec.textContent));
- features.appendChild(feature);
- }
-
- outputProduct.appendChild(features);
-
- // Append the product to the product list
- productList.appendChild(outputProduct);
- }
-
- // Serialize the output XML
- let serializer = new XMLSerializer();
- return serializer.serializeToString(outputDoc);
- }
- // Usage example
- let inputXml = `<?xml version="1.0" encoding="UTF-8"?>
- <catalog>
- <product id="p001" category="electronics">
- <name>Smartphone</name>
- <description>A high-end smartphone with advanced features</description>
- <price currency="USD">699.99</price>
- <stock>50</stock>
- <specifications>
- <spec name="display">6.1 inch OLED</spec>
- <spec name="storage">128GB</spec>
- <spec name="ram">6GB</spec>
- <spec name="camera">12MP dual camera</spec>
- </specifications>
- </product>
- <product id="p002" category="electronics">
- <name>Laptop</name>
- <description>Powerful laptop for professionals</description>
- <price currency="USD">1299.99</price>
- <stock>25</stock>
- <specifications>
- <spec name="display">15.6 inch Full HD</spec>
- <spec name="storage">512GB SSD</spec>
- <spec name="ram">16GB</spec>
- <spec name="processor">Intel Core i7</spec>
- </specifications>
- </product>
- </catalog>`;
- let outputXml = transformProductCatalog(inputXml);
- console.log(outputXml);
Java:
- import org.w3c.dom.*;
- import javax.xml.parsers.*;
- import javax.xml.transform.*;
- import javax.xml.transform.dom.DOMSource;
- import javax.xml.transform.stream.StreamResult;
- import java.io.*;
- public class ProductCatalogTransformer {
- public static String transformProductCatalog(String inputXmlString) throws Exception {
- // Parse the input XML
- DocumentBuilderFactory inputFactory = DocumentBuilderFactory.newInstance();
- DocumentBuilder inputBuilder = inputFactory.newDocumentBuilder();
- Document inputDoc = inputBuilder.parse(new ByteArrayInputStream(inputXmlString.getBytes(java.nio.charset.StandardCharsets.UTF_8)));
-
- // Create the output XML document
- DocumentBuilderFactory outputFactory = DocumentBuilderFactory.newInstance();
- outputFactory.setNamespaceAware(true);
- DocumentBuilder outputBuilder = outputFactory.newDocumentBuilder();
- Document outputDoc = outputBuilder.newDocument();
-
- // Create the root element
- Element productList = outputDoc.createElementNS("http://www.example.com/products", "productList");
- outputDoc.appendChild(productList);
-
- // Get all product nodes
- NodeList products = inputDoc.getElementsByTagName("product");
-
- // Iterate over each product
- for (int i = 0; i < products.getLength(); i++) {
- Element inputProduct = (Element) products.item(i);
-
- // Create the product node
- Element outputProduct = outputDoc.createElementNS("http://www.example.com/products", "product");
-
- // Copy the attributes under their new names
- outputProduct.setAttribute("sku", inputProduct.getAttribute("id"));
- outputProduct.setAttribute("type", inputProduct.getAttribute("category"));
-
- // Add the child elements
- Element name = (Element) inputProduct.getElementsByTagName("name").item(0);
- Element title = outputDoc.createElementNS("http://www.example.com/products", "title");
- title.appendChild(outputDoc.createTextNode(name.getTextContent()));
- outputProduct.appendChild(title);
-
- Element description = (Element) inputProduct.getElementsByTagName("description").item(0);
- Element details = outputDoc.createElementNS("http://www.example.com/products", "details");
- details.appendChild(outputDoc.createTextNode(description.getTextContent()));
- outputProduct.appendChild(details);
-
- Element price = (Element) inputProduct.getElementsByTagName("price").item(0);
- Element cost = outputDoc.createElementNS("http://www.example.com/products", "cost");
- cost.setAttribute("currency", price.getAttribute("currency"));
- cost.appendChild(outputDoc.createTextNode(price.getTextContent()));
- outputProduct.appendChild(cost);
-
- Element stock = (Element) inputProduct.getElementsByTagName("stock").item(0);
- Element inventory = outputDoc.createElementNS("http://www.example.com/products", "inventory");
- inventory.appendChild(outputDoc.createTextNode(stock.getTextContent()));
- outputProduct.appendChild(inventory);
-
- // Convert specifications to features
- Element specifications = (Element) inputProduct.getElementsByTagName("specifications").item(0);
- NodeList specs = specifications.getElementsByTagName("spec");
- Element features = outputDoc.createElementNS("http://www.example.com/products", "features");
-
- for (int j = 0; j < specs.getLength(); j++) {
- Element spec = (Element) specs.item(j);
- Element feature = outputDoc.createElementNS("http://www.example.com/products", "feature");
- feature.setAttribute("key", spec.getAttribute("name"));
- feature.appendChild(outputDoc.createTextNode(spec.getTextContent()));
- features.appendChild(feature);
- }
-
- outputProduct.appendChild(features);
-
- // Append the product to the product list
- productList.appendChild(outputProduct);
- }
-
- // Serialize the output XML
- TransformerFactory transformerFactory = TransformerFactory.newInstance();
- Transformer transformer = transformerFactory.newTransformer();
- transformer.setOutputProperty(OutputKeys.INDENT, "yes");
- transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
-
- StringWriter writer = new StringWriter();
- transformer.transform(new DOMSource(outputDoc), new StreamResult(writer));
-
- return writer.toString();
- }
-
- public static void main(String[] args) {
- try {
- String inputXml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" +
- "<catalog>\n" +
- " <product id=\"p001\" category=\"electronics\">\n" +
- " <name>Smartphone</name>\n" +
- " <description>A high-end smartphone with advanced features</description>\n" +
- " <price currency=\"USD\">699.99</price>\n" +
- " <stock>50</stock>\n" +
- " <specifications>\n" +
- " <spec name=\"display\">6.1 inch OLED</spec>\n" +
- " <spec name=\"storage\">128GB</spec>\n" +
- " <spec name=\"ram\">6GB</spec>\n" +
- " <spec name=\"camera\">12MP dual camera</spec>\n" +
- " </specifications>\n" +
- " </product>\n" +
- " <product id=\"p002\" category=\"electronics\">\n" +
- " <name>Laptop</name>\n" +
- " <description>Powerful laptop for professionals</description>\n" +
- " <price currency=\"USD\">1299.99</price>\n" +
- " <stock>25</stock>\n" +
- " <specifications>\n" +
- " <spec name=\"display\">15.6 inch Full HD</spec>\n" +
- " <spec name=\"storage\">512GB SSD</spec>\n" +
- " <spec name=\"ram\">16GB</spec>\n" +
- " <spec name=\"processor\">Intel Core i7</spec>\n" +
- " </specifications>\n" +
- " </product>\n" +
- "</catalog>";
-
- String outputXml = transformProductCatalog(inputXml);
- System.out.println(outputXml);
- } catch (Exception e) {
- e.printStackTrace();
- }
- }
- }
Python:
- from xml.dom.minidom import parseString, Document
-
- NS = "http://www.example.com/products"
-
- def transform_product_catalog(input_xml_string):
-     # Parse the input XML
-     input_doc = parseString(input_xml_string)
-
-     # Create the output XML document
-     output_doc = Document()
-
-     # Create the root element in the target namespace. minidom does not write
-     # namespace declarations on its own, so the xmlns attribute is set explicitly.
-     product_list = output_doc.createElementNS(NS, "productList")
-     product_list.setAttribute("xmlns", NS)
-     output_doc.appendChild(product_list)
-
-     # Iterate over all product nodes
-     for input_product in input_doc.getElementsByTagName("product"):
-         # Create the product node
-         output_product = output_doc.createElementNS(NS, "product")
-
-         # Copy the attributes under their new names
-         output_product.setAttribute("sku", input_product.getAttribute("id"))
-         output_product.setAttribute("type", input_product.getAttribute("category"))
-
-         # Add the child elements
-         name = input_product.getElementsByTagName("name")[0]
-         title = output_doc.createElementNS(NS, "title")
-         title.appendChild(output_doc.createTextNode(name.firstChild.data))
-         output_product.appendChild(title)
-
-         description = input_product.getElementsByTagName("description")[0]
-         details = output_doc.createElementNS(NS, "details")
-         details.appendChild(output_doc.createTextNode(description.firstChild.data))
-         output_product.appendChild(details)
-
-         price = input_product.getElementsByTagName("price")[0]
-         cost = output_doc.createElementNS(NS, "cost")
-         cost.setAttribute("currency", price.getAttribute("currency"))
-         cost.appendChild(output_doc.createTextNode(price.firstChild.data))
-         output_product.appendChild(cost)
-
-         stock = input_product.getElementsByTagName("stock")[0]
-         inventory = output_doc.createElementNS(NS, "inventory")
-         inventory.appendChild(output_doc.createTextNode(stock.firstChild.data))
-         output_product.appendChild(inventory)
-
-         # Convert specifications to features
-         specifications = input_product.getElementsByTagName("specifications")[0]
-         features = output_doc.createElementNS(NS, "features")
-
-         for spec in specifications.getElementsByTagName("spec"):
-             feature = output_doc.createElementNS(NS, "feature")
-             feature.setAttribute("key", spec.getAttribute("name"))
-             feature.appendChild(output_doc.createTextNode(spec.firstChild.data))
-             features.appendChild(feature)
-
-         output_product.appendChild(features)
-
-         # Append the product to the product list
-         product_list.appendChild(output_product)
-
-     # Return the formatted XML string
-     return output_doc.toprettyxml(indent="  ")
-
- # Usage example
- input_xml = """<?xml version="1.0" encoding="UTF-8"?>
- <catalog>
-   <product id="p001" category="electronics">
-     <name>Smartphone</name>
-     <description>A high-end smartphone with advanced features</description>
-     <price currency="USD">699.99</price>
-     <stock>50</stock>
-     <specifications>
-       <spec name="display">6.1 inch OLED</spec>
-       <spec name="storage">128GB</spec>
-       <spec name="ram">6GB</spec>
-       <spec name="camera">12MP dual camera</spec>
-     </specifications>
-   </product>
-   <product id="p002" category="electronics">
-     <name>Laptop</name>
-     <description>Powerful laptop for professionals</description>
-     <price currency="USD">1299.99</price>
-     <stock>25</stock>
-     <specifications>
-       <spec name="display">15.6 inch Full HD</spec>
-       <spec name="storage">512GB SSD</spec>
-       <spec name="ram">16GB</spec>
-       <spec name="processor">Intel Core i7</spec>
-     </specifications>
-   </product>
- </catalog>"""
- output_xml = transform_product_catalog(input_xml)
- print(output_xml)
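Since the conversion is a one-to-one renaming of elements and attributes, the hand-written steps can also be expressed as data. The following sketch assumes the mapping obtained by comparing the two formats and walks the source tree recursively. `ELEMENT_MAP`, `ATTRIBUTE_MAP`, `convert`, and `transform` are illustrative names, and the sample is a shortened catalog. Because minidom does not emit namespace declarations by itself, the `xmlns` attribute is set explicitly on the root.

```python
from xml.dom import Node
from xml.dom.minidom import Document, parseString

NS = "http://www.example.com/products"

# Source element name -> target element name.
ELEMENT_MAP = {
    "catalog": "productList", "product": "product", "name": "title",
    "description": "details", "price": "cost", "stock": "inventory",
    "specifications": "features", "spec": "feature",
}
# (source element, source attribute) -> target attribute.
ATTRIBUTE_MAP = {
    ("product", "id"): "sku", ("product", "category"): "type",
    ("price", "currency"): "currency", ("spec", "name"): "key",
}

def convert(source, out_doc):
    """Recursively copy `source` into `out_doc`, renaming along the way."""
    target = out_doc.createElementNS(NS, ELEMENT_MAP[source.tagName])
    attrs = source.attributes
    for i in range(attrs.length):
        attr = attrs.item(i)
        new_name = ATTRIBUTE_MAP.get((source.tagName, attr.name))
        if new_name:  # attributes without a mapping are dropped
            target.setAttribute(new_name, attr.value)
    for child in source.childNodes:
        if child.nodeType == Node.ELEMENT_NODE:
            target.appendChild(convert(child, out_doc))
        elif child.nodeType == Node.TEXT_NODE and child.data.strip():
            target.appendChild(out_doc.createTextNode(child.data))
    return target

def transform(xml_string):
    out_doc = Document()
    root = convert(parseString(xml_string).documentElement, out_doc)
    root.setAttribute("xmlns", NS)  # minidom will not add this automatically
    out_doc.appendChild(root)
    return out_doc.toxml()

# Shortened sample catalog.
SAMPLE = ('<catalog><product id="p001" category="electronics">'
          '<name>Smartphone</name><price currency="USD">699.99</price>'
          '<specifications><spec name="ram">6GB</spec></specifications>'
          '</product></catalog>')

if __name__ == "__main__":
    print(transform(SAMPLE))
```

A source element that is missing from `ELEMENT_MAP` raises `KeyError` in this sketch, which may or may not be the desired behavior for a real catalog.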
C#:
- using System;
- using System.Xml;
- public class ProductCatalogTransformer
- {
- public static string TransformProductCatalog(string inputXmlString)
- {
- // Parse the input XML
- XmlDocument inputDoc = new XmlDocument();
- inputDoc.LoadXml(inputXmlString);
-
- // Create the output XML document
- XmlDocument outputDoc = new XmlDocument();
-
- // Create the root element in the target namespace; omitting the prefix makes it the default namespace
- XmlElement productList = outputDoc.CreateElement("productList", "http://www.example.com/products");
- outputDoc.AppendChild(productList);
-
- // Get all product nodes
- XmlNodeList products = inputDoc.SelectNodes("//product");
-
- // Iterate over each product
- foreach (XmlNode inputProductNode in products)
- {
- XmlElement inputProduct = (XmlElement)inputProductNode;
-
- // Create the product node
- XmlElement outputProduct = outputDoc.CreateElement("product", "http://www.example.com/products");
-
- // Copy the attributes under their new names
- outputProduct.SetAttribute("sku", inputProduct.GetAttribute("id"));
- outputProduct.SetAttribute("type", inputProduct.GetAttribute("category"));
-
- // Add the child elements
- XmlNode nameNode = inputProduct.SelectSingleNode("name");
- XmlElement title = outputDoc.CreateElement("title", "http://www.example.com/products");
- title.InnerText = nameNode.InnerText;
- outputProduct.AppendChild(title);
-
- XmlNode descriptionNode = inputProduct.SelectSingleNode("description");
- XmlElement details = outputDoc.CreateElement("details", "http://www.example.com/products");
- details.InnerText = descriptionNode.InnerText;
- outputProduct.AppendChild(details);
-
- XmlNode priceNode = inputProduct.SelectSingleNode("price");
- XmlElement cost = outputDoc.CreateElement("ns", "cost", "http://www.example.com/products");
- cost.SetAttribute("currency", priceNode.Attributes["currency"].Value);
- cost.InnerText = priceNode.InnerText;
- outputProduct.AppendChild(cost);
-
- XmlNode stockNode = inputProduct.SelectSingleNode("stock");
- XmlElement inventory = outputDoc.CreateElement("ns", "inventory", "http://www.example.com/products");
- inventory.InnerText = stockNode.InnerText;
- outputProduct.AppendChild(inventory);
-
- // Convert specifications into features
- XmlNode specificationsNode = inputProduct.SelectSingleNode("specifications");
- XmlNodeList specs = specificationsNode.SelectNodes("spec");
- XmlElement features = outputDoc.CreateElement("ns", "features", "http://www.example.com/products");
-
- foreach (XmlNode specNode in specs)
- {
- XmlElement spec = (XmlElement)specNode;
- XmlElement feature = outputDoc.CreateElement("ns", "feature", "http://www.example.com/products");
- feature.SetAttribute("key", spec.GetAttribute("name"));
- feature.InnerText = spec.InnerText;
- features.AppendChild(feature);
- }
-
- outputProduct.AppendChild(features);
-
- // Append the product to the product list
- productList.AppendChild(outputProduct);
- }
-
- // Format the output XML
- outputDoc.PreserveWhitespace = true;
-
- // Configure the XML writer settings
- XmlWriterSettings settings = new XmlWriterSettings();
- settings.Indent = true;
- settings.IndentChars = " ";
- settings.NewLineOnAttributes = false;
- settings.OmitXmlDeclaration = false;
-
- // Serialize the document with indentation via StringWriter and XmlWriter
- using (System.IO.StringWriter stringWriter = new System.IO.StringWriter())
- {
- using (XmlWriter xmlWriter = XmlWriter.Create(stringWriter, settings))
- {
- outputDoc.Save(xmlWriter);
- }
- return stringWriter.ToString();
- }
- }
-
- public static void Main(string[] args)
- {
- string inputXml = @"<?xml version=""1.0"" encoding=""UTF-8""?>
- <catalog>
- <product id=""p001"" category=""electronics"">
- <name>Smartphone</name>
- <description>A high-end smartphone with advanced features</description>
- <price currency=""USD"">699.99</price>
- <stock>50</stock>
- <specifications>
- <spec name=""display"">6.1 inch OLED</spec>
- <spec name=""storage"">128GB</spec>
- <spec name=""ram"">6GB</spec>
- <spec name=""camera"">12MP dual camera</spec>
- </specifications>
- </product>
- <product id=""p002"" category=""electronics"">
- <name>Laptop</name>
- <description>Powerful laptop for professionals</description>
- <price currency=""USD"">1299.99</price>
- <stock>25</stock>
- <specifications>
- <spec name=""display"">15.6 inch Full HD</spec>
- <spec name=""storage"">512GB SSD</spec>
- <spec name=""ram"">16GB</spec>
- <spec name=""processor"">Intel Core i7</spec>
- </specifications>
- </product>
- </catalog>";
-
- string outputXml = TransformProductCatalog(inputXml);
- Console.WriteLine(outputXml);
- }
- }
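Performance Optimization and Best Practices
Performance Optimization Tips
DOM operations are relatively expensive, especially on large XML documents, so reducing the number of DOM queries can noticeably improve performance. Query once, keep the result in a variable, and reuse it.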
- // Bad: re-querying the DOM on every iteration
- for (let i = 0; i < xmlDoc.getElementsByTagName("title").length; i++) {
-     let title = xmlDoc.getElementsByTagName("title")[i];
-     console.log(title.getAttribute("lang"));
- }
- // Good: query once and cache the results
- let titles = xmlDoc.getElementsByTagName("title");
- let titleLangs = [];
- for (let i = 0; i < titles.length; i++) {
-     titleLangs.push(titles[i].getAttribute("lang"));
- }
- console.log(titleLangs);
For complex queries, a single XPath expression is usually more concise, and often faster, than walking the DOM by hand.
- // Bad: several manual DOM traversals
- let books = xmlDoc.getElementsByTagName("book");
- let expensiveBooks = [];
- for (let i = 0; i < books.length; i++) {
-     let priceNode = books[i].getElementsByTagName("price")[0];
-     let price = parseFloat(priceNode.textContent);
-     if (price > 30) {
-         expensiveBooks.push(books[i]);
-     }
- }
- // Good: a single XPath query
- let xpathResult = xmlDoc.evaluate('//book[price>30]', xmlDoc, null, XPathResult.ANY_TYPE, null);
- let nodes = [];
- let node = xpathResult.iterateNext();
- while (node) {
-     nodes.push(node);
-     node = xpathResult.iterateNext();
- }
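For comparison, here is a minimal Python sketch of the same idea using the standard library's ElementTree. It assumes a small sample catalog (the third book, "Illustrated Edition", is made-up data for illustration) and a hypothetical helper named expensive_titles. ElementTree implements only a subset of XPath and has no numeric comparisons, so the path expression selects books by attribute and the price filter runs in plain Python.

```python
import xml.etree.ElementTree as ET

# Sample data for illustration only; the third book is hypothetical.
SAMPLE_XML = """<bookstore>
  <book category="fiction"><title lang="en">Harry Potter</title><price>29.99</price></book>
  <book category="children"><title lang="en">The Wonderful Wizard of Oz</title><price>15.99</price></book>
  <book category="fiction"><title lang="en">Illustrated Edition</title><price>45.00</price></book>
</bookstore>"""


def expensive_titles(xml_text, category, min_price):
    """Return titles of books in `category` whose price exceeds `min_price`."""
    root = ET.fromstring(xml_text)
    # One path expression selects the candidate books by attribute value.
    candidates = root.findall(f".//book[@category='{category}']")
    # ElementTree's XPath subset has no numeric comparison, so filter in Python.
    return [
        book.findtext("title")
        for book in candidates
        if float(book.findtext("price", default="0")) > min_price
    ]


print(expensive_titles(SAMPLE_XML, "fiction", 30))
```

A full XPath engine (such as the browser's evaluate method shown above) can express the price test inside the query itself; the split used here is a limitation of the standard-library module, not a recommendation.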
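When adding many nodes, build them in a DocumentFragment and attach the fragment in a single call. In a rendered HTML page this reduces repaints and reflows; for a standalone XML document the benefit is mainly fewer insertions into the live tree.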
- let container = xmlDoc.createElement("container");
- // Bad: append each node to the container one at a time
- for (let i = 0; i < 100; i++) {
-     let item = xmlDoc.createElement("item");
-     item.setAttribute("id", "item" + i);
-     container.appendChild(item);
- }
- // Good: collect the nodes in a DocumentFragment, then attach them once
- let fragment = xmlDoc.createDocumentFragment();
- for (let i = 0; i < 100; i++) {
-     let item = xmlDoc.createElement("item");
-     item.setAttribute("id", "item" + i);
-     fragment.appendChild(item);
- }
- container.appendChild(fragment);
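The fragment pattern is not specific to browsers. As a rough sketch, Python's xml.dom.minidom offers the same createDocumentFragment call; the helper name append_items and the three-item count are assumptions chosen for illustration.

```python
from xml.dom.minidom import parseString


def append_items(doc, parent, count):
    """Build `count` <item> elements in a fragment, then attach them in one call."""
    fragment = doc.createDocumentFragment()
    for i in range(count):
        item = doc.createElement("item")
        item.setAttribute("id", f"item{i}")
        fragment.appendChild(item)
    # Appending the fragment moves its children into `parent`;
    # the fragment node itself is not inserted.
    parent.appendChild(fragment)
    return parent


doc = parseString("<container/>")
container = append_items(doc, doc.documentElement, 3)
print(container.toxml())
```

After the call the fragment is empty and the container holds the three items directly, which is the behavior the DOM specification defines for fragments.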
Repeatedly reading the same attribute or node adds overhead; cache the value instead.
- let book = xmlDoc.getElementsByTagName("book")[0];
- // Bad: reading the same attribute repeatedly
- if (book.getAttribute("category") === "fiction") {
-     console.log("Fiction book: " + book.getAttribute("category"));
- }
- // Good: read the attribute once and reuse the cached value
- let category = book.getAttribute("category");
- if (category === "fiction") {
-     console.log("Fiction book: " + category);
- }
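Pick the query method that fits the need. If only one element is required, querySelector (or getElementById, when the element has an ID) returns it directly instead of building a collection with getElementsByTagName.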
- // Less direct: fetch the whole collection just to read one element
- let books = xmlDoc.getElementsByTagName("book");
- let firstBook = books[0];
- // More direct: querySelector returns the first match (or null)
- let firstMatch = xmlDoc.querySelector("book");
Best Practices
Always handle errors around DOM operations, especially when the XML comes from an external source.
- try {
- let xmlDoc = parser.parseFromString(xmlString, "text/xml");
-
- // Check for a parse error
- let parserError = xmlDoc.getElementsByTagName("parsererror")[0];
- if (parserError) {
- throw new Error("XML parsing error: " + parserError.textContent);
- }
-
- // Continue processing the XML
- } catch (e) {
- console.error("Error processing XML:", e.message);
- }
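Other DOM implementations report parse failures differently. As a minimal sketch, Python's xml.dom.minidom raises an exception instead of returning a document that contains a parsererror element; the wrapper name load_xml is an assumption for illustration.

```python
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError


def load_xml(xml_text):
    """Return a DOM document, or None if the text is not well-formed XML."""
    try:
        return parseString(xml_text)
    except ExpatError as err:
        # ExpatError carries the position of the first well-formedness error.
        print(f"XML parsing error at line {err.lineno}, column {err.offset}: {err}")
        return None


good = load_xml('<bookstore><book category="fiction"/></bookstore>')
bad = load_xml('<bookstore><book category="fiction"></bookstore>')  # unclosed <book>
```

Returning None keeps the sketch short; real code may prefer to re-raise or fall back to a default document, depending on how the caller should react.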
When the XML uses namespaces, always use the namespace-aware methods.
- // Bad: ignores the namespace
- let titles = xmlDoc.getElementsByTagName("title");
- // Good: matches on namespace URI and local name
- let nsTitles = xmlDoc.getElementsByTagNameNS("http://www.example.com/books", "title");
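The same rule applies to attributes. The following Python sketch uses xml.dom.minidom with the namespace URI from the example above; the bk prefix, the sample document, and the unqualified second book are assumptions made up for illustration.

```python
from xml.dom.minidom import parseString

BOOKS_NS = "http://www.example.com/books"  # namespace URI from the example above

# Illustrative document: one namespaced book and one book in no namespace.
SAMPLE_XML = f"""<bk:bookstore xmlns:bk="{BOOKS_NS}">
  <bk:book><bk:title bk:lang="en">Harry Potter</bk:title></bk:book>
  <book><title lang="fr">Unqualified</title></book>
</bk:bookstore>"""

doc = parseString(SAMPLE_XML)

# Match on (namespace URI, local name); the prefix used in the document is irrelevant.
titles = doc.getElementsByTagNameNS(BOOKS_NS, "title")
for title in titles:
    # Namespaced attributes need the NS-aware getter as well.
    print(title.firstChild.data, title.getAttributeNS(BOOKS_NS, "lang"))
```

Only the namespaced title is returned; the unqualified one lives in no namespace and is skipped, which is exactly the distinction that name-only lookups blur.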
Before processing XML data, verify that its structure matches what the code expects.
- function validateXmlStructure(xmlDoc) {
- // Check that the required element exists
- if (!xmlDoc.getElementsByTagName("bookstore").length) {
- throw new Error("Missing required element: bookstore");
- }
-
- // Check that the required attribute exists
- let books = xmlDoc.getElementsByTagName("book");
- for (let i = 0; i < books.length; i++) {
- if (!books[i].hasAttribute("category")) {
- throw new Error("Missing required attribute: category");
- }
- }
-
- return true;
- }
- try {
- let xmlDoc = parser.parseFromString(xmlString, "text/xml");
- validateXmlStructure(xmlDoc);
- // Continue processing the XML
- } catch (e) {
- console.error("XML validation error:", e.message);
- }
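A comparable check can be sketched in Python with xml.dom.minidom. The function name validate_structure and the choice of ValueError are assumptions; the sketch additionally reports which book is missing the attribute, which tends to help when the input is large.

```python
from xml.dom.minidom import parseString


def validate_structure(doc):
    """Raise ValueError if required elements or attributes are missing."""
    if doc.documentElement.tagName != "bookstore":
        raise ValueError("Missing required root element: bookstore")
    # Number the books from 1 so the message points at the offending entry.
    for index, book in enumerate(doc.getElementsByTagName("book"), start=1):
        if not book.hasAttribute("category"):
            raise ValueError(f"book #{index} is missing required attribute: category")
    return True


doc = parseString('<bookstore><book category="fiction"/><book/></bookstore>')
try:
    validate_structure(doc)
except ValueError as err:
    print("XML validation error:", err)
```

Hand-written checks like this suit a handful of rules; for anything larger, a schema is usually easier to maintain.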
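Choose the parser to fit the workload. For large files, consider a streaming parser such as SAX or StAX instead of DOM, which loads the whole document into memory.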
- // Java example: process a large XML file with a SAX parser
- import org.xml.sax.helpers.DefaultHandler;
- import org.xml.sax.Attributes;
- import javax.xml.parsers.SAXParser;
- import javax.xml.parsers.SAXParserFactory;
- import java.io.ByteArrayInputStream;
- public class LargeXmlProcessor extends DefaultHandler {
- private StringBuilder currentValue = new StringBuilder();
-
- @Override
- public void startElement(String uri, String localName, String qName, Attributes attributes) {
- currentValue.setLength(0);
-
- if (qName.equals("book")) {
- String category = attributes.getValue("category");
- System.out.println("Processing book with category: " + category);
- }
- }
-
- @Override
- public void characters(char[] ch, int start, int length) {
- currentValue.append(ch, start, length);
- }
-
- @Override
- public void endElement(String uri, String localName, String qName) {
- if (qName.equals("title")) {
- System.out.println("Title: " + currentValue.toString());
- } else if (qName.equals("price")) {
- System.out.println("Price: " + currentValue.toString());
- }
- }
-
- public static void main(String[] args) {
- try {
- SAXParserFactory factory = SAXParserFactory.newInstance();
- SAXParser saxParser = factory.newSAXParser();
-
- LargeXmlProcessor handler = new LargeXmlProcessor();
-
- // Parse straight from the file named by the first argument, so the document is never held in memory as a string
- saxParser.parse(new java.io.File(args[0]), handler);
- } catch (Exception e) {
- e.printStackTrace();
- }
- }
- }
For business-critical data, validating against an XML Schema helps ensure the data is complete and correct.
- // Java example: validate an XML document against an XML Schema
- import javax.xml.XMLConstants;
- import javax.xml.transform.Source;
- import javax.xml.transform.stream.StreamSource;
- import javax.xml.validation.*;
- import org.xml.sax.SAXException;
- import java.io.*;
- public class XmlSchemaValidator {
- public static boolean validate(String xmlString, String schemaString) {
- try {
- // Create the SchemaFactory
- SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
-
- // Create the Schema
- Source schemaSource = new StreamSource(new StringReader(schemaString));
- Schema schema = factory.newSchema(schemaSource);
-
- // Create the Validator
- Validator validator = schema.newValidator();
-
- // Validate the XML document
- Source xmlSource = new StreamSource(new StringReader(xmlString));
- validator.validate(xmlSource);
-
- return true;
- } catch (SAXException e) {
- System.out.println("Validation error: " + e.getMessage());
- return false;
- } catch (IOException e) {
- System.out.println("IO error: " + e.getMessage());
- return false;
- }
- }
-
- public static void main(String[] args) {
- String xmlString = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" +
- "<bookstore>\n" +
- " <book category=\"fiction\">\n" +
- " <title>Harry Potter</title>\n" +
- " <author>J.K. Rowling</author>\n" +
- " <year>2005</year>\n" +
- " <price>29.99</price>\n" +
- " </book>\n" +
- "</bookstore>";
-
- String schemaString = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" +
- "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">\n" +
- " <xs:element name=\"bookstore\">\n" +
- " <xs:complexType>\n" +
- " <xs:sequence>\n" +
- " <xs:element name=\"book\" maxOccurs=\"unbounded\">\n" +
- " <xs:complexType>\n" +
- " <xs:sequence>\n" +
- " <xs:element name=\"title\" type=\"xs:string\"/>\n" +
- " <xs:element name=\"author\" type=\"xs:string\"/>\n" +
- " <xs:element name=\"year\" type=\"xs:gYear\"/>\n" +
- " <xs:element name=\"price\" type=\"xs:decimal\"/>\n" +
- " </xs:sequence>\n" +
- " <xs:attribute name=\"category\" type=\"xs:string\" use=\"required\"/>\n" +
- " </xs:complexType>\n" +
- " </xs:element>\n" +
- " </xs:sequence>\n" +
- " </xs:complexType>\n" +
- " </xs:element>\n" +
- "</xs:schema>";
-
- boolean isValid = validate(xmlString, schemaString);
- System.out.println("XML is " + (isValid ? "valid" : "invalid"));
- }
- }
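Common Problems and Solutions
Problem 1: XML Parsing Errors
Description: Parsing an XML document fails because of, for example, malformed markup or encoding problems.
Solution: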
- // JavaScript
- try {
- let parser = new DOMParser();
- let xmlDoc = parser.parseFromString(xmlString, "text/xml");
-
- // Check for a parse error
- let parserError = xmlDoc.getElementsByTagName("parsererror")[0];
- if (parserError) {
- throw new Error("XML parsing error: " + parserError.textContent);
- }
-
- // Continue processing the XML
- } catch (e) {
- console.error("Error parsing XML:", e.message);
- // Handle the error or fall back to a default value
- }
Java:
- // Java
- try {
- DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
- DocumentBuilder builder = factory.newDocumentBuilder();
- Document document = builder.parse(new InputSource(new StringReader(xmlString)));
-
- // Continue processing the XML
- } catch (SAXException e) {
- System.err.println("XML parsing error: " + e.getMessage());
- // Handle the error or fall back to a default value
- } catch (IOException e) {
- System.err.println("IO error: " + e.getMessage());
- // Handle the error or fall back to a default value
- } catch (ParserConfigurationException e) {
- System.err.println("Parser configuration error: " + e.getMessage());
- // Handle the error or fall back to a default value
- }
Problem 2: Namespace Handling
Description: Elements or attributes cannot be retrieved correctly from an XML document that uses namespaces.
Solution:
- // JavaScript
- // Create a namespace resolver
- function nsResolver(prefix) {
- var ns = {
- 'ns': 'http://www.example.com/namespace'
- };
- return ns[prefix] || null;
- }
- // Run an XPath query that uses the namespace prefix
- let xpathResult = xmlDoc.evaluate('//ns:book', xmlDoc, nsResolver, XPathResult.ANY_TYPE, null);
- // Or use getElementsByTagNameNS
- let books = xmlDoc.getElementsByTagNameNS('http://www.example.com/namespace', 'book');
Java:
- // Java
- // Note: the document must be parsed with DocumentBuilderFactory.setNamespaceAware(true)
- // Create the namespace context
- NamespaceContext nsContext = new NamespaceContext() {
-     @Override
-     public String getNamespaceURI(String prefix) {
-         if ("ns".equals(prefix)) {
-             return "http://www.example.com/namespace";
-         }
-         return null;
-     }
-     @Override
-     public String getPrefix(String namespaceURI) {
-         return null;
-     }
-     @Override
-     public Iterator<String> getPrefixes(String namespaceURI) {
-         return null;
-     }
- };
- // Create the XPath object and set the namespace context
- XPathFactory xpathFactory = XPathFactory.newInstance();
- XPath xpath = xpathFactory.newXPath();
- xpath.setNamespaceContext(nsContext);
- // Run an XPath query that uses the namespace prefix
- XPathExpression expr = xpath.compile("//ns:book");
- NodeList nodes = (NodeList) expr.evaluate(document, XPathConstants.NODESET);
- // Or use getElementsByTagNameNS
- NodeList booksByName = document.getElementsByTagNameNS("http://www.example.com/namespace", "book");
Problem 3: Performance
Description: Processing large XML documents is slow and uses a lot of memory.
Solution:
- // Java: use a SAX parser instead of a DOM parser
- import org.xml.sax.helpers.DefaultHandler;
- import org.xml.sax.Attributes;
- import javax.xml.parsers.SAXParser;
- import javax.xml.parsers.SAXParserFactory;
- import java.io.ByteArrayInputStream;
- public class LargeXmlProcessor extends DefaultHandler {
- @Override
- public void startElement(String uri, String localName, String qName, Attributes attributes) {
- // Handle the start of an element
- }
-
- @Override
- public void characters(char[] ch, int start, int length) {
- // Handle element content
- }
-
- @Override
- public void endElement(String uri, String localName, String qName) {
- // Handle the end of an element
- }
-
- public static void main(String[] args) {
- try {
- SAXParserFactory factory = SAXParserFactory.newInstance();
- SAXParser saxParser = factory.newSAXParser();
-
- LargeXmlProcessor handler = new LargeXmlProcessor();
-
- // Process the large XML file named by the first argument, streaming it from disk
- saxParser.parse(new java.io.File(args[0]), handler);
- } catch (Exception e) {
- e.printStackTrace();
- }
- }
- }
Python:
- # Python: process a large XML file with iterparse
- from xml.etree.ElementTree import iterparse
- def process_large_xml(file_path):
-     # Create the event iterator
-     context = iterparse(file_path, events=("start", "end"))
-
-     # The first event is the start of the root element
-     event, root = next(context)
-
-     for event, elem in context:
-         if event == "end" and elem.tag == "book":
-             # Handle the book element
-             category = elem.get("category")
-             title = elem.find("title").text
-             print(f"Processing book: {title}, Category: {category}")
-
-             # Drop processed elements from the root to save memory
-             root.clear()
-
-     # Close the underlying file where the iterator supports it
-     if hasattr(context, 'close'):
-         context.close()
- # Usage example
- process_large_xml("large_books.xml")
Problem 4: Special Characters
Description: Special characters in XML (such as <, >, &, ", and ') cause parsing errors.
Solution:
- // JavaScript: escape XML special characters
- function escapeXml(unsafe) {
-     return unsafe.replace(/[<>&'"]/g, function(c) {
-         switch (c) {
-             case '<': return '&lt;';
-             case '>': return '&gt;';
-             case '&': return '&amp;';
-             case '\'': return '&apos;';
-             case '"': return '&quot;';
-         }
-     });
- }
- // Usage example
- let unsafeText = "This is a <test> & 'example'";
- let safeText = escapeXml(unsafeText);
- console.log(safeText); // Output: This is a &lt;test&gt; &amp; &apos;example&apos;
Java:
- // Java: escape XML special characters
- import org.apache.commons.text.StringEscapeUtils;
- public class XmlUtils {
- public static String escapeXml(String unsafe) {
- return StringEscapeUtils.escapeXml11(unsafe);
- }
-
- public static void main(String[] args) {
- String unsafeText = "This is a <test> & 'example'";
- String safeText = escapeXml(unsafeText);
- System.out.println(safeText); // Output: This is a &lt;test&gt; &amp; &apos;example&apos;
- }
- }
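Python's standard library covers the same ground without a third-party dependency. The sketch below wraps xml.sax.saxutils.escape in a hypothetical escape_xml helper that mirrors the functions above, and also shows quoteattr for attribute values; the sample strings are illustrative.

```python
from xml.sax.saxutils import escape, quoteattr


def escape_xml(unsafe):
    """Escape the five XML special characters in text content."""
    # escape() handles &, < and > itself; the mapping adds the two quote characters.
    return escape(unsafe, {"'": "&apos;", '"': "&quot;"})


print(escape_xml("This is a <test> & 'example'"))
# This is a &lt;test&gt; &amp; &apos;example&apos;

# quoteattr() escapes a value and wraps it in quotes for use as an attribute value.
print("<book title=" + quoteattr('Tom & "Jerry"') + "/>")
```

When the document is built through a DOM API (setAttribute, createTextNode), escaping happens during serialization, so manual escaping is only needed when XML is assembled as strings.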
Problem 5: XPath Query Failures
Description: An XPath query does not find the expected elements or attributes.
Solution:
- // JavaScript: debug an XPath query
- function debugXPath(xmlDoc, xpath) {
- try {
- let result = xmlDoc.evaluate(xpath, xmlDoc, null, XPathResult.ANY_TYPE, null);
- let nodes = [];
- let node = result.iterateNext();
- while (node) {
- nodes.push(node);
- node = result.iterateNext();
- }
-
- console.log(`XPath "${xpath}" found ${nodes.length} nodes:`);
- nodes.forEach((node, index) => {
- console.log(`Node ${index + 1}: ${node.nodeName} (type: ${node.nodeType})`);
- if (node.nodeType === Node.ELEMENT_NODE) {
- console.log(` Attributes: ${node.attributes.length}`);
- for (let i = 0; i < node.attributes.length; i++) {
- console.log(` ${node.attributes[i].name}: ${node.attributes[i].value}`);
- }
- } else if (node.nodeType === Node.ATTRIBUTE_NODE) {
- console.log(` Value: ${node.value}`);
- } else if (node.nodeType === Node.TEXT_NODE) {
- console.log(` Content: ${node.textContent.trim()}`);
- }
- });
-
- return nodes;
- } catch (e) {
- console.error(`Error evaluating XPath "${xpath}": ${e.message}`);
- return [];
- }
- }
- // Usage example
- let xmlDoc = parser.parseFromString(xmlString, "text/xml");
- debugXPath(xmlDoc, '//book[@category="fiction"]/title');
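Java:
- // Java: debug an XPath query
- import java.io.StringReader;
- import javax.xml.parsers.*;
- import javax.xml.xpath.*;
- import org.w3c.dom.*;
- import org.xml.sax.InputSource;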
- public class XPathDebugger {
- public static NodeList debugXPath(Document document, String xpath) {
- try {
- XPathFactory xpathFactory = XPathFactory.newInstance();
- XPath xpathObj = xpathFactory.newXPath();
-
- XPathExpression expr = xpathObj.compile(xpath);
- NodeList nodes = (NodeList) expr.evaluate(document, XPathConstants.NODESET);
-
- System.out.println("XPath \"" + xpath + "\" found " + nodes.getLength() + " nodes:");
- for (int i = 0; i < nodes.getLength(); i++) {
- Node node = nodes.item(i);
- System.out.println("Node " + (i + 1) + ": " + node.getNodeName() + " (type: " + node.getNodeType() + ")");
-
- if (node.getNodeType() == Node.ELEMENT_NODE) {
- Element element = (Element) node;
- NamedNodeMap attributes = element.getAttributes();
- System.out.println(" Attributes: " + attributes.getLength());
- for (int j = 0; j < attributes.getLength(); j++) {
- Node attr = attributes.item(j);
- System.out.println(" " + attr.getNodeName() + ": " + attr.getNodeValue());
- }
- } else if (node.getNodeType() == Node.ATTRIBUTE_NODE) {
- System.out.println(" Value: " + node.getNodeValue());
- } else if (node.getNodeType() == Node.TEXT_NODE) {
- System.out.println(" Content: " + node.getTextContent().trim());
- }
- }
-
- return nodes;
- } catch (XPathExpressionException e) {
- System.err.println("Error evaluating XPath \"" + xpath + "\": " + e.getMessage());
- return null;
- }
- }
-
- public static void main(String[] args) {
- try {
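- // Sample input for illustration; replace with the document being debugged
- String xmlString = "<bookstore><book category=\"fiction\"><title lang=\"en\">Harry Potter</title></book></bookstore>";
- DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();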
- DocumentBuilder builder = factory.newDocumentBuilder();
- Document document = builder.parse(new InputSource(new StringReader(xmlString)));
-
- debugXPath(document, "//book[@category='fiction']/title");
- } catch (Exception e) {
- e.printStackTrace();
- }
- }
- }
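The same debugging approach can be sketched in Python with ElementTree. The helper name debug_path and the sample document are assumptions for illustration, and ElementTree accepts only a subset of XPath, so an expression that works in a browser may be rejected here.

```python
import xml.etree.ElementTree as ET


def debug_path(root, path):
    """Print every element matched by `path` and return the matches."""
    try:
        matches = root.findall(path)
    except SyntaxError as err:
        # ElementTree raises SyntaxError for path expressions it cannot handle.
        print(f"Invalid path {path!r}: {err}")
        return []
    print(f"Path {path!r} matched {len(matches)} element(s)")
    for elem in matches:
        text = (elem.text or "").strip()
        print(f"  <{elem.tag}> attributes={elem.attrib} text={text!r}")
    return matches


root = ET.fromstring(
    '<bookstore><book category="fiction">'
    '<title lang="en">Harry Potter</title></book></bookstore>'
)
debug_path(root, ".//book[@category='fiction']/title")
debug_path(root, "/book")  # absolute paths are not allowed on an element
```

Printing the match count first makes the most common failure visible at once: a query that is syntactically valid but matches nothing, typically because of a namespace or a misspelled name.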
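Summary
XML DOM attribute access is a core skill for working with XML documents; it gives developers the tools to access, manipulate, and transform XML data. This article covered the technique from basic syntax through advanced applications.
We began with the fundamentals of the XML DOM, including the DOM tree structure and node types, and then covered the basic syntax and common methods for reading attributes. The code examples showed how to get, set, and remove attributes of XML elements in several programming languages.
Node traversal is an important part of DOM work: we discussed how to reach parent, child, and sibling nodes and how to find specific nodes efficiently. The advanced section examined XPath queries, namespace handling, and attribute value conversion, which help with complex data-access problems.
The practical case studies applied these techniques to configuration file parsing and data transformation. We also covered performance tips and best practices for writing more efficient and reliable XML-processing code.
Finally, we looked at common problems developers run into with the XML DOM and their solutions: XML parsing errors, namespace handling, performance, special characters, and failing XPath queries.
With these techniques, developers should understand how the XML DOM works and be able to apply it flexibly when processing XML documents.
As data exchange and integration needs keep growing, XML will continue to play an important role as a general-purpose data format, so a solid command of XML DOM attribute access remains valuable in data processing and system integration. |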
|