Python: Getting an Element's Value from XPath (Getting Attributes with XPath)

25-04-24 3

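This article covers how to get an element's value from an XPath in Python and discusses related questions about getting attributes with XPath. It also touches on: C# and XPath: returning node-set and string results from a single XPath query; Cypress series (98): the cypress-xpath plugin and the xpath() command in detail; Java: Gson: fromJson: getting an element's value; and the lxml xpath() function not working for a correct XPath query.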

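Contents:

Python: Getting an Element's Value from XPath (Getting Attributes with XPath)
C# and XPath: Returning Node-Set and String Results from a Single XPath Query
Cypress Series (98): The cypress-xpath Plugin and the xpath() Command in Detail
Java: Gson: fromJson: Getting an Element's Value
lxml xpath() Function Does Not Work for a Correct XPath Query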

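Python: Getting an Element's Value from XPath (Getting Attributes with XPath)

How to solve: Python: getting an element's value from XPath

I am very new to XPath and web scraping, so I apologize if this is a fairly trivial question. I am trying to scrape multiple websites to make sure the data in my database is up to date. I can get the XPath for a partial string, but I am not sure how to use that XPath to get the full value.

Code: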

import itertools
import re

import requests
from bs4 import BeautifulSoup


def xpath_soup(element):
    """Build an absolute XPath for a BeautifulSoup tag or text node."""
    components = []
    # A text node has no tag name, so start from the tag that contains it.
    child = element if element.name else element.parent
    for parent in child.parents:
        # Siblings that come before `child` under the same parent.
        previous = itertools.islice(parent.children, parent.contents.index(child))
        xpath_tag = child.name
        # 1-based position of `child` among same-named siblings.
        xpath_index = sum(1 for i in previous if i.name == xpath_tag) + 1
        components.append(xpath_tag if xpath_index == 1 else '%s[%d]' % (xpath_tag, xpath_index))
        child = parent
    components.reverse()
    return '/%s' % '/'.join(components)


page = requests.get("https://www.gaumard.com/obstetricmr")
html = str(BeautifulSoup(page.content, 'html.parser'))
soup = BeautifulSoup(html, 'lxml')
elem = soup.find(string=re.compile('xt-generation mixed reality training solution for VICTORIA® S2200 designed to help learners bridge the gap between theory and practice'))
xPathValue = xpath_soup(elem)
print(xPathValue)

I am trying to use xPathValue to get the element's full value.

The expected result is the full text containing the fragment "xt-generation mixed reality training solution for VICTORIA® S2200 designed to help learners bridge the gap between theory and practice", that is:

Obstetric MR™ is a next-generation mixed reality training solution for VICTORIA® S2200 designed to help learners bridge the gap between theory and practice faster than ever before. Using the latest technology in holographic visualization, Obstetric MR brings digital learning content into the physical simulation exercise, allowing participants to link knowledge and skill through an entirely new hands-on training experience. The future of labor and delivery simulation is here.

I want to obtain this full value by using xPathValue.

Solution

Here is how to get the full text with an XPath.

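import requests
from lxml import html

page = requests.get("https://www.gaumard.com/obstetricmr").text
# NOTE: the attribute test inside [@] did not survive in this listing;
# supply the page-specific one before running.
text = html.fromstring(page).xpath('//*[@][2]/div/text()')
print(text[0].strip())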

Output:

Obstetric MR™ is a next-generation mixed reality training solution for VICTORIA® S2200 designed to help learners bridge the gap between theory and practice faster than ever before. Using the latest technology in holographic visualization, Obstetric MR brings digital learning content into the physical simulation exercise, allowing participants to link knowledge and skill through an entirely new hands-on training experience. The future of labor and delivery simulation is here.

A page-specific XPath will not help much because, as mentioned above, the pages may differ. A generic XPath that searches text nodes and returns an array or list of the nodes containing the string is more useful, followed by some post-processing.

Try it in the Firefox console:

nodes = $x('//*[contains(text(),"next-generation mixed reality")]', window.document, "nodes");
<- Array [ div ]

nodes[0].textContent;
<- "Obstetric MR™ is a next-generation...(redacted)"

This XPath can run on other pages:
'//*[contains(text(),"next-generation mixed reality")]'
provided that they contain the string next-generation mixed reality.

The same in Python:

import requests
from lxml import html

url = 'https://www.gaumard.com/obstetricmr'
response = requests.get(url)
html_doc = response.content
xpath0 = '//*[contains(text(),"next-generation mixed reality")]'
result_arr = html.fromstring(html_doc).xpath(xpath0)
result_arr[0].text  # evaluated in an interactive session

Output:

'Obstetric MR™ is a next-generation mixed...'
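
One caveat with result_arr[0].text: in lxml, .text holds only the text that precedes the element's first child, so inline markup inside the matched node can cut the value short. The sketch below compares .text with text_content(). It assumes a small hypothetical HTML snippet (SAMPLE, with made-up class names) rather than the live page.

```python
from lxml import html

# Hypothetical stand-in for the real page: the target <div> has its text
# split by a child <b> element.
SAMPLE = """
<html><body>
  <div class="intro">Obstetric MR is a next-generation mixed reality <b>training</b> solution.</div>
  <div class="other">Unrelated text.</div>
</body></html>
"""

doc = html.fromstring(SAMPLE)
nodes = doc.xpath('//*[contains(text(), "next-generation mixed reality")]')

first = nodes[0]
leading_text = first.text          # only the text before the first child element
full_text = first.text_content()   # all descendant text, concatenated

print(len(nodes))
print(repr(leading_text))
print(repr(full_text))
```

If the matched nodes on the real pages never contain child elements, .text and text_content() return the same string and either one is fine.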

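C# and XPath: Returning Node-Set and String Results from a Single XPath Query

How to solve: C# and XPath: returning node-set and string results from a single XPath query

In my project I use XPath to scrape prices. In this case there are two possible places to get the price from, specified by the following query: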

var xpath = @"substring-after(//div[@price''],":")|//span[@pln">oldPrice"]";

In the C# code:

HtmlDocument htmlDocument = new HtmlDocument();
htmlDocument.LoadHtml(html);
// Create the navigator from the document that was just loaded.
XPathNavigator navigator = htmlDocument.DocumentNode.CreateNavigator();
var eval = navigator.Evaluate(xpath); // Error here: Expression must evaluate to a node-set.
var expression = navigator.Compile(xpath); // Same error here: Expression must evaluate to a node-set.

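I know that substring-after(//div[@price''],":") has a string return type, and that //span[@] has a node-set return type.

Do you have any suggestions on how I should handle this case?
a) Should I split the XPath and evaluate each part separated by "|"?
b) Or is there another way to get a result from the combined XPath query above without splitting the string and checking each part?
c) Have I missed some other class that fits my requirements?

I hope that is enough context.

Solution

The union operator applies only to node-sets, so your expression is expected to fail: the left-hand side is a string (it is a string even when //div selects nothing).

Do you expect //span[@] to return a single node? If so, you can convert it to a string with the string() function and then join the two strings with the concat() function.

Alternatively, consider moving to a later version of XPath, which allows you to return a sequence of strings.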
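
The same XPath 1.0 rule can be observed outside .NET. The sketch below uses Python's lxml in place of the C# navigator. The markup and the class="price" and class="oldPrice" attribute tests are hypothetical, since the attribute tests in the question are not legible, and the "|" inside concat() is just a separator character chosen for the example.

```python
from lxml import etree

# Hypothetical markup; element and attribute names are assumptions.
doc = etree.fromstring(
    '<root><div class="price">Price:19.99</div>'
    '<span class="oldPrice">24.99</span></root>'
)

# XPath 1.0 allows "|" only between node-sets, so a string operand is rejected.
try:
    doc.xpath('substring-after(//div[@class="price"], ":") | //span[@class="oldPrice"]')
    union_failed = False
except etree.XPathError:
    union_failed = True

# string() turns the node-set into a string, and concat() joins both values.
combined = doc.xpath(
    'concat(substring-after(//div[@class="price"], ":"), "|", '
    'string(//span[@class="oldPrice"]))'
)
print(union_failed, combined)
```

The caller then splits the combined string on the separator, which only works if the separator cannot occur in either value.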

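Cypress Series (98): The cypress-xpath Plugin and the xpath() Command in Detail

If you want to learn Cypress from scratch, see the series of articles below

https://www.cnblogs.com/poloyy/category/1768839.html

Prerequisites

First, you need to be familiar with XPath syntax; this link is a good place to learn it

https://www.cnblogs.com/poloyy/p/12626196.html

Official repository

https://github.com/cypress-io/cypress-xpath

Installation

npm

npm install -D cypress-xpath

Yarn

yarn add cypress-xpath --dev

Importing the plugin into the project

Add the following statement to cypress/support/index.js

require('cypress-xpath')

Personal summary

Two ways to call the xpath() command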

// Directly on cy
cy.xpath()

// After obtaining an element
cy.get('ul').xpath()
cy.xpath().xpath()
cy.get('div').first().xpath()

Return value of the xpath() command

A single element, or an array of multiple elements

A getting-started example

it('A simple example', function () {
    cy.xpath('//ul/li')
        .should('have.length', 6)
});

Calling an xpath command after a Cypress command

it('Calling an xpath command after a Cypress command', function () {
    cy.xpath('//ul')
        .first()
        .xpath('./li')
});

Calling xpath again after an xpath command

it('Calling xpath again after an xpath command', function () {
    cy.xpath('//body/ul')
        .xpath('./li')
});

Locating elements by attribute

it('Locating elements by attribute', function () {
    cy.xpath('//*[@id="form-wrapper"]')
    cy.xpath('//*[@class]')
});

Selecting the parent of the current node and then finding elements

it('Selecting the parent of the current node', function () {
    cy.xpath('//*[@id="form-wrapper"]/../h2')
});

Locating by index

it('Locating by index', function () {
    cy.xpath('//body/ul[1]/li[3]')
});

Conditional expressions

it('Conditional expressions', function () {
    cy.xpath('//*[@name="password" or @id="form-wrapper"]')
});

Fuzzy-matching functions

it('Fuzzy-matching functions', function () {
    cy.xpath('//*[starts-with(@class,"e")]')
    cy.xpath('//*[contains(text(),"Show")]')
});

Position functions

it('Position functions', function () {
    cy.xpath('//input[position()=1]')
});

Other ways of locating

it('Other ways of locating', function () {
    cy.xpath('//li[position()=2]/preceding-sibling::li')
    // Equivalent form
    cy.xpath('//li[position()=2]/../li[position()<2]')
});
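
The equivalence of the last two expressions does not depend on Cypress and can be checked with any XPath 1.0 engine. The sketch below uses Python's lxml as a stand-in for the browser, with a hypothetical three-item list as the markup.

```python
from lxml import html

# Hypothetical list markup used only for this sketch.
doc = html.fromstring("<ul><li>one</li><li>two</li><li>three</li></ul>")

# Siblings before the second <li>, reached through the sibling axis.
via_axis = doc.xpath('//li[position()=2]/preceding-sibling::li')
# The same nodes, reached by stepping up to the parent and filtering by position.
via_parent = doc.xpath('//li[position()=2]/../li[position()<2]')

axis_texts = [li.text for li in via_axis]
parent_texts = [li.text for li in via_parent]
print(axis_texts, parent_texts)
```

Both queries select only the first item here, because it is the only li that comes before the second one under the same parent.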

Java: Gson: fromJson: Getting an Element's Value

How to solve: Java: Gson: fromJson: getting an element's value

I have a JSON string:

String jsonString = "{\"Header\":{\"ID\": \"103\",\"DateTime\": \"2020-07-29 09:14:23.802-4:00 1\",\"PlazaID\": \"01\",\"Lane\": \"Lane 20\",\"IPAddr\": \"192.9.0.123\"},\"Body\": {\"EventMsg\": \"Status: Online\",\"EventNum\": \"99999\"}}";

I am trying to use Gson to get the value of ID from the JSON above, but it gives me a NullPointerException. My code:

JsonObject jsonObject = new Gson().fromJson(jsonString, JsonObject.class);
//System.out.println("jsonObject: " + jsonObject.toString());
String _ID = jsonObject.get("ID").getAsString();

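I am not sure where the error in my code is. Any help is appreciated.

Edit: following @Arvind's suggestion, I tried his code and got this error:

[screenshot of the error]

Following @Arvind's suggestion, this works:

String _ID = jsonObject.get("Header").getAsJsonObject().get("ID").getAsString();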

Solution

For clarity, let's first prettify your jsonString:

{
  "Header": {
    "ID": "103",
    "DateTime": "2020-07-29 09:14:23.802-4:00 1",
    "PlazaID": "01",
    "Lane": "Lane 20",
    "IPAddr": "192.9.0.123"
  },
  "Body": {
    "EventMsg": "Status: Online",
    "EventNum": "99999"
  }
}

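Note that "ID" is inside "Header", so you have to reach it through the nested object:

String _ID = jsonObject.getAsJsonObject("Header").get("ID").getAsString();

You can also avoid the generic get(), because Gson's JsonObject has typed convenience accessors:

String _ID = jsonObject.getAsJsonObject("Header").getAsJsonPrimitive("ID").getAsString();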
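
The nesting issue is independent of Gson. As a rough illustration, the sketch below parses the same JSON with Python's standard json module, which is swapped in here for the Java code. A top-level lookup for "ID" finds nothing, which is the counterpart of the null that caused the NullPointerException, while the lookup through "Header" succeeds.

```python
import json

json_string = (
    '{"Header": {"ID": "103", "DateTime": "2020-07-29 09:14:23.802-4:00 1", '
    '"PlazaID": "01", "Lane": "Lane 20", "IPAddr": "192.9.0.123"}, '
    '"Body": {"EventMsg": "Status: Online", "EventNum": "99999"}}'
)
data = json.loads(json_string)

top_level_id = data.get("ID")      # None: there is no "ID" key at the top level
nested_id = data["Header"]["ID"]   # the key lives inside "Header"
print(top_level_id, nested_id)
```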

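lxml xpath() Function Does Not Work for a Correct XPath Query

How to solve: lxml xpath() function does not work for a correct XPath query

I am trying to evaluate some XPath queries with the lxml library, but for some reason it does not seem to work. Here is the code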

from lxml import etree

if __name__ == '__main__':
    xml = r'''<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<unit xmlns="http://www.srcML.org/srcML/src" revision="0.9.5" language="Java" filename="File.java"><package>package <name><name>com</name><operator>.</operator><name>samples</name><operator>.</operator><name>e978092668</name></name>;</package>
<class><annotation>@<name>Path</name></annotation>
<specifier>public</specifier> class <name>Correct</name> <block>{
    <decl_stmt><decl><annotation>@<name>Inject</name></annotation>
    <specifier>private</specifier> <type><name>JsonWebToken</name></type> <name>field</name></decl>;</decl_stmt>
}</block></class>
</unit>'''.encode("UTF-8")

    xpath = '''unit/class[((descendant-or-self::decl_stmt/decl[(type[name[text()='JsonWebToken']] and annotation[name[text()='Inject']])]) and (annotation[name[text()='Path']]))]'''
    tree = etree.fromstring(xml)
    a = tree.xpath(xpath)
    print(len(a))  # returns 0 (matches)

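I tried exactly the same XPath query with exactly the same XML string on freeformatter.com, and it works and shows a match. I do not know what is wrong with my own code, since for the most part I followed the official tutorial on the website.

Edit 1:

Tried using a namespace.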

    xpath = '''src:unit/src:class[((descendant-or-self::src:decl_stmt/src:decl[(src:type[src:name[text()='JsonWebToken']] and src:annotation[src:name[text()='Inject']])]) and (src:annotation[src:name[text()='Path']]))]'''
    tree = etree.fromstring(xml)
    a = tree.xpath(xpath, namespaces={
        "src": "http://www.srcML.org/srcML/src"
    })
    print(len(a))  # returns 0 (matches)

Thanks!

Solution

The problem is that when you do this:

tree = etree.fromstring(xml)

tree has src:unit as its context, so your XPath is looking for a child src:unit inside src:unit. (If you print(tree.tag), you will see {http://www.srcML.org/srcML/src}unit.)

Try starting the XPath at src:class...

xpath = '''src:class[((descendant-or-self::src:decl_stmt/src:decl[(src:type[src:name[text()='JsonWebToken']] and src:annotation[src:name[text()='Inject']])]) and (src:annotation[src:name[text()='Path']]))]'''
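
The effect of the root context can be reproduced with a much smaller document. The sketch below uses the standard library's xml.etree.ElementTree instead of lxml, so that it has no third-party dependency; like lxml, its fromstring() returns the root element and evaluates relative paths from it. The trimmed XML is a hypothetical reduction of the document in the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down version of the srcML document.
XML = (
    '<unit xmlns="http://www.srcML.org/srcML/src">'
    '<class><name>Correct</name></class>'
    '</unit>'
)
NS = {"src": "http://www.srcML.org/srcML/src"}

root = ET.fromstring(XML)
print(root.tag)  # the root already is the namespaced unit element

# Relative to the root, "src:unit/..." looks for a unit nested inside unit.
from_unit = root.findall("src:unit/src:class", NS)
# Starting at the child step matches the class element.
from_class = root.findall("src:class", NS)
print(len(from_unit), len(from_class))
```

With lxml, an absolute path such as /src:unit/src:class is another option, because a leading slash starts from the document root rather than from the context element.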

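That concludes today's material on getting an element's value from XPath in Python and on getting attributes with XPath. We hope you found it useful. To learn more about C# and XPath: returning node-set and string results from a single XPath query; Cypress series (98): the cypress-xpath plugin and the xpath() command in detail; Java: Gson: fromJson: getting an element's value; and the lxml xpath() function not working for a correct XPath query, you can search this site.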
