Python Selenium: get all "href" attributes (get a tags with Selenium)

25-03-29 8

This article covers getting all "href" attributes with Python Selenium, including how to get a tags with Selenium. It also touches on related topics: getting a CAPTCHA with python + selenium, using Firefox with python selenium, python selenium right click on an href and choose Save link as... on Chrome, and getting href values with Python Selenium.

Contents:

Python Selenium: get all "href" attributes (get a tags with Selenium)
python + selenium: getting a CAPTCHA
Using Firefox with python selenium
python selenium right click on an href and choose Save link as... on Chrome.
Python Selenium: getting href values

Python Selenium: get all "href" attributes (get a tags with Selenium)

How can I get all the "href" attributes of the "h2" headings like this one on the page?

<h2><a href="http://www.allitebooks.com/deep-learning-with-python-2/" rel="bookmark">Deep Learning with Python</a></h2>

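What I tried, which does not get the href:

title = driver.find_elements_by_class_name('entry-title')
title[0].get_attribute('href')

This does not return the link of the "a" tag. If I find all elements by the "a" tag, it returns every href on the page, which is not what I want. I only want the headings like the one above, and to be able to get their URL from the "href" attribute.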

Answer 1

This code gets all books from all pages:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
baseUrl = "http://www.allitebooks.com/page/1/?s=python"
driver.get(baseUrl)

# wait = WebDriverWait(driver, 5)
# wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".search-result-list li")))

# Get last page number
lastPage = int(driver.find_element(By.CSS_SELECTOR, ".pagination a:last-child").text)

# Get all HREFs for the first page and save them in hrefs list
js = 'return [...document.querySelectorAll(".entry-title a")].map(e=>e.href)'
hrefs = driver.execute_script(js)

# Iterate through all remaining pages and get all HREFs of books
# (the end of range() is exclusive, so lastPage + 1 is needed to include the last page)
for i in range(2, lastPage + 1):
    driver.get("http://www.allitebooks.com/page/" + str(i) + "/?s=python")
    hrefs.extend(driver.execute_script(js))

for href in hrefs:
    print(href)
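The same ".entry-title a" selector can also be used without JavaScript. The following is a minimal sketch, not the answer's own method: `get_title_links` is a hypothetical helper name, and `driver` is assumed to be any Selenium WebDriver that has already loaded the page.

```python
def get_title_links(driver, selector=".entry-title a"):
    """Return the href of every element matched by `selector`."""
    # "css selector" is the string value of Selenium's By.CSS_SELECTOR,
    # so this helper itself needs no selenium import.
    anchors = driver.find_elements("css selector", selector)
    hrefs = []
    for anchor in anchors:
        href = anchor.get_attribute("href")
        if href:  # skip matched elements that carry no href
            hrefs.append(href)
    return hrefs
```

The point of the selector is that it matches the a tags inside the title elements, rather than the title elements themselves, which is why calling get_attribute('href') on the title element in the question returned nothing.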

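python + selenium: getting a CAPTCHA

Ways to deal with a CAPTCHA:

Method 1: ask the developers to remove the CAPTCHA code and redeploy the environment. (Not recommended)

Method 2: set up a universal CAPTCHA code that works for every login. (Not recommended)

Method 3: use cookies to supply the login name and password and bypass the CAPTCHA. (I have not learned how to do this yet)

Method 4: actually read the CAPTCHA. (This is the focus here)

Method 4 requires the third-party library pytesseract, which depends on Tesseract, so Tesseract must be installed first.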

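1. Install Tesseract

Download page: https://digi.bib.uni-mannheim.de/tesseract/

Install a stable release without "dev" in its name. The download is an exe installer; right-click it and install.

2. If you want to use other languages, download the corresponding trained data: download the whole zip file, unzip it, and copy the files into the 'tessdata' directory: C:\Program Files (x86)\Tesseract-OCR\tessdata

3. Configure environment variables:

(1) Edit Path under the system variables and add the installation path: C:\Program Files (x86)\Tesseract-OCR

(2) Add a TESSDATA_PREFIX variable with the value: C:\Program Files (x86)\Tesseract-OCR\tessdata

To test whether the installation succeeded, run in cmd:
tesseract test.jpg text -l chi_sim

4. Install the third-party Python libraries:

pip install pillow  # a Python image processing library that pytesseract depends on
pip install pytesseract

5. Locate the installed pytesseract package, C:\Python34\Lib\site-packages\pytesseract, and edit pytesseract.py (the original author reports that skipping this step causes an error at run time):

tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract.exe'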
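As an alternative to editing the installed package, the executable path can be resolved in your own script. The sketch below is an assumption-laden illustration: `locate_tesseract` is a hypothetical helper, and the fallback path is whatever install location you pass in (for example the one from step 3).

```python
import os
import shutil


def locate_tesseract(fallback_path):
    """Return a usable path to the tesseract executable, or None."""
    # Prefer whatever is reachable through the Path environment variable.
    found = shutil.which("tesseract")
    if found:
        return found
    # Otherwise fall back to an explicit install location, if it exists.
    if os.path.isfile(fallback_path):
        return fallback_path
    return None
```

With pytesseract installed, the returned path can be assigned to pytesseract.pytesseract.tesseract_cmd at the top of the script, which has the same effect as step 5 without modifying files under site-packages.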

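There are two approaches to getting the CAPTCHA:

1. Take a screenshot of the login page, crop out the CAPTCHA image, and recognize it;

2. On the login page, locate the CAPTCHA, save its image with "Save image as", and recognize it.

Approach 1 in detail:

(1) First compute the ratio between the browser window and the login-page screenshot, then compute the corresponding position of the CAPTCHA image. Without this step, the cropped CAPTCHA position will be wrong.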

import math
import time
from PIL import Image
from selenium.webdriver.common.by import By

# Assumes `browser` (a Selenium WebDriver) and `logger` already exist.
browser.maximize_window()
# Get the browser window size
size_window = browser.get_window_size()
time.sleep(1)
# Take a screenshot
browser.save_screenshot('login.png')
login_img = Image.open('login.png')
(login_width, login_height) = login_img.size
logger.info('Screenshot width and height: %s, %s', login_width, login_height)
# Ratio between the browser window and the screenshot
scale = size_window['width'] / login_width
# Locate the CAPTCHA element (find_element_by_xpath was removed in Selenium 4)
code_elem = browser.find_element(By.XPATH, '//*[@id="valiCode"]')
code_loc = code_elem.location
code_size = code_elem.size
# Compute the CAPTCHA position in the screenshot.
# X and Y get extra offsets because the front-end style gives the CAPTCHA img tag margin-left 15 and margin-top 5
location_X = math.ceil(code_loc['x'] / scale) + 15
location_Y = math.ceil(code_loc['y'] / scale) + 5
location_height = math.ceil(code_size['height'] / scale)
location_width = math.ceil(code_size['width'] / scale)

code_img = login_img.crop((location_X, location_Y, location_X + location_width, location_Y + location_height))
code_img.save('code.png')
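The scaling arithmetic can be isolated into a pure function, which makes it easier to check by hand. This is a sketch under the same assumptions as the code above; `captcha_crop_box` and its parameter names are hypothetical, and the margins default to 0 because the 15 and 5 offsets are specific to the page in the example.

```python
import math


def captcha_crop_box(window_width, screenshot_width, location, size,
                     margin_left=0, margin_top=0):
    """Map an element's page location and size to a crop box in the screenshot."""
    # Same ratio as above: browser window width over screenshot width.
    scale = window_width / screenshot_width
    left = math.ceil(location["x"] / scale) + margin_left
    top = math.ceil(location["y"] / scale) + margin_top
    width = math.ceil(size["width"] / scale)
    height = math.ceil(size["height"] / scale)
    # Pillow's Image.crop expects (left, upper, right, lower).
    return (left, top, left + width, top + height)
```

For instance, with a 1000 px wide window and a 2000 px wide screenshot the ratio is 0.5, so an element at x=100, y=50 with size 60x20 lands at (200, 100) with size 120x40 before the margins are added.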

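(2) Next, binarize the CAPTCHA image so the code is easier to extract. This involves some image-processing algorithms.

# Binarize first
# `img` is the path of the cropped CAPTCHA image, e.g. 'code.png' from the previous step
image = Image.open(img)
# Convert to grayscale
lim = image.convert('L')
# Gray threshold of 100: pixels below it map to 0 (black), all others to 1 (white)
threshold = 100
table = []

for j in range(256):
    if j < threshold:
        table.append(0)
    else:
        table.append(1)

bim = lim.point(table, '1')
bim.save('newImg.png')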
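The lookup table is the whole of the binarization logic: it has one entry per gray level, and Pillow's point() replaces each pixel with the entry at its gray value. A compact sketch of the same table follows; `build_threshold_table` is a hypothetical name.

```python
def build_threshold_table(threshold=100):
    """Return a 256-entry lookup table: 0 below the threshold, 1 at or above it."""
    return [0 if level < threshold else 1 for level in range(256)]
```

It can be passed to Pillow exactly as in the loop version, for example lim.point(build_threshold_table(100), '1'). If dark digits disappear or background noise survives, adjusting the threshold is usually the first thing to try.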

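(3) The processed image is easier to read. pytesseract is used for the conversion here; note the config parameters.

Before the parameters were set, 1 was always read as 7; with them, recognition became very accurate.

Of course, this currently only recognizes numeric CAPTCHAs; text CAPTCHAs should work in a similar way.

lastpic = Image.open('newImg.png')
# The config makes a big difference: 1 used to be read as 7, and results are accurate with it
text = pytesseract.image_to_string(lastpic, lang='eng', config='--psm 6 --oem 3 -c tessedit_char_whitelist=0123456789').strip()
with open('output.txt', 'w') as f:
    f.write(text)
with open('output.txt', 'r') as f:
    code = f.read()
# No f.close() is needed: the with blocks already close the file
logger.info("CAPTCHA read: %s", code)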

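Approach 2 in detail:

(1) Locate the CAPTCHA on the login page, save the image to a folder with "Save image as", and return the most recently saved image.

import os
from time import sleep

import pyautogui
from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By

# Save the CAPTCHA as an image (assumes `driver` is an existing Selenium WebDriver)
def image_save_as():
    image = driver.find_element(By.ID, "valiCode")
    actions = ActionChains(driver)
    actions.context_click(image)
    actions.perform()
    pyautogui.typewrite(['down', 'down', 'enter', 'enter'])  # navigate the context menu to "Save image as"
    sleep(2)
    pyautogui.typewrite(['enter'])
    sleep(2)

def get_newest_image(image_path):
    lists = os.listdir(image_path)
    # Sort by modification time; os.path.join keeps the path portable
    lists.sort(key=lambda fn: os.path.getmtime(os.path.join(image_path, fn)))
    image_new = os.path.join(image_path, lists[-1])
    return image_new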
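If the download folder may be empty or may contain files other than images, a slightly more defensive variant can help. This is a sketch only: `newest_file` and the default list of suffixes are assumptions, not part of the original code.

```python
import os


def newest_file(directory, suffixes=(".png", ".jpg", ".jpeg")):
    """Return the most recently modified file with a matching suffix, or None."""
    candidates = [
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.lower().endswith(suffixes)
    ]
    # Ignore sub-directories that happen to have a matching name.
    candidates = [path for path in candidates if os.path.isfile(path)]
    if not candidates:
        return None
    return max(candidates, key=os.path.getmtime)
```

Returning None for an empty folder lets the caller decide whether to wait and retry, instead of failing with an IndexError.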

(2) The rest of the processing is the same as the image handling in approach 1.

Using Firefox with python selenium

The versions used in this walkthrough:

Python 3.6.0

Selenium 3.5.0

Firefox 55.0.3

geckodriver v1.0.18.0 win64

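1. Prerequisites

1.1 Install Python

1.2 Install the Firefox browser

1.3 Download geckodriver (the official WebDriver for Firefox)

2. Install selenium for Python

Python 3.x installers bundle pip and setuptools, so selenium can be installed as follows:

pip install selenium==<version>  (for example, pip install selenium==3.5.0). Without a version, the latest release is installed; with one, that specific version is installed.

If installation fails with an error about pip or setuptools (the original error screenshot is not preserved), install pip and setuptools separately:

Download pages: https://pypi.python.org/pypi/pip/#downloads

https://pypi.python.org/pypi/setuptools#downloads

Installation steps (the original screenshots are not preserved):

Install setuptools

Install pip

Once installation finishes, check that selenium is installed (for example with pip show selenium).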
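The check can also be done from Python. The sketch below assumes Python 3.8 or later, where importlib.metadata is part of the standard library (the walkthrough itself uses 3.6, where this module is not available); `installed_version` is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the selenium version, or None when selenium is not installed.
print("selenium:", installed_version("selenium"))
```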

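3. Download and install geckodriver

Download page: https://github.com/mozilla/geckodriver/releases

Download the build that matches your operating system; I downloaded the win64 build.

After downloading, unzip it and place geckodriver.exe in the same directory as python3.exe.

Note: geckodriver.exe is placed next to python3.exe because the directory containing python3.exe is already in the system Path environment variable: [D:\Program Files\Python36\]. If you put it somewhere else, add the geckodriver directory to Path yourself.

4. Visit a page

Selenium automates testing by opening a browser, simulating a person's actions as scripted, and then checking whether the actual result matches the expected one to decide whether the test passes. So the first step is to visit a page.

The original script, shown as a screenshot that is not preserved, used Firefox to open the Baidu homepage.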
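A minimal sketch of what such a script might look like is given here. It is a reconstruction under assumptions, not the author's code: `open_page` and `main` are hypothetical names, and geckodriver is assumed to be reachable through Path as described in step 3.

```python
def open_page(driver, url):
    """Navigate `driver` to `url` and return the page title."""
    driver.get(url)
    return driver.title


def main():
    # Imported here so open_page can be reused without selenium installed.
    from selenium import webdriver

    driver = webdriver.Firefox()  # launches Firefox through geckodriver
    try:
        print(open_page(driver, "https://www.baidu.com"))
    finally:
        driver.quit()  # always close the browser, even on errors
```

Calling main() opens Firefox, loads the page, prints its title, and closes the browser.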

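5. Configure Selenium in PyCharm

PyCharm is used here to run the script above, so Selenium needs to be configured in PyCharm.

Go to File > Settings.

Select the project, click Project Interpreter, then click the + on the right.

6. Run the code

Finally, run the code.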

python selenium right click on an href and choose Save link as... on Chrome.

From:https://stackoverflow.com/questions/42781483/right-click-on-an-href-and-choose-save-link-as-in-python-selenium/42783015
Original code:
from selenium import webdriver
from selenium.webdriver import ActionChains
import pyautogui

driver = webdriver.Chrome()
driver.get(link)  # `link` is the URL of the page to open
elem = driver.find_element_by_css_selector('a[target="_blank"]')
actionChain = ActionChains(driver)
actionChain.context_click(elem).perform()
pyautogui.typewrite(['down','down','down','down','enter'])

My code:

def test_bzw_install(self):
    """1.Download latest build from bzw Build web site(http://10.17.10.130/bzw) to local"""
    # Locators kept as plain locals; as written in the source they were assigned inside
    # the method but read through self, which would raise AttributeError
    download_arrow = (By.CLASS_NAME, 'glyphicon-arrow-right')
    bluezone_web_build_logo = (By.CLASS_NAME, 'no-text-decoration')
    # 1.1.Open Build web site(http://10.17.10.130/bzw)
    self.driver = BaseWebDriver().getDriver(Global().browser)
    self.build_page = BzwInstall(self.driver)
    self.build_page.go_to()
    # 1.2(PlanA).Click Master Download Arrow to download-Pass
    self.ele_visible_short(bluezone_web_build_logo)
    elem_download_arrow = self.get_elements(*download_arrow)[0]
    actionChain = ActionChains(self.driver)
    actionChain.context_click(elem_download_arrow).perform()
    pyautogui.typewrite(['down', 'down', 'down', 'down', 'enter'])

Python Selenium: getting href values

I am trying to copy href values from a website, and the HTML looks like this:

<p>
 <a href="https://www.iproperty.com.my/property/setia-eco-park/sale-1653165/">Shah Alam Setia Eco Park,Setia Eco Park
 </a>
</p>

I tried driver.find_elements_by_css_selector(".sc-eYdvao.kvdWiq").get_attribute("href"), but it failed with 'list' object has no attribute 'get_attribute'. Using driver.find_element_by_css_selector(".sc-eYdvao.kvdWiq").get_attribute("href") returns None. But I cannot use xpath, because the site has 20+ hrefs and I need to copy all of them; using xpath would only copy one.

In case it helps, all 20+ hrefs fall under the same class, sc-eYdvao kvdWiq.

Ultimately, I want to copy all 20+ hrefs and export them to a csv file.

Thanks for any help.
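The first error arises because find_elements returns a list, so get_attribute has to be called on each element in turn. The source gives no answer to this question; the sketch below is one possible approach, with hypothetical helper names `collect_hrefs` and `write_hrefs_csv`. If the class sits on a wrapper rather than on the a tag itself (which would be consistent with the None result), a descendant selector such as ".sc-eYdvao.kvdWiq a" may be needed; that is a guess, since the full page markup is not shown.

```python
import csv


def collect_hrefs(driver, selector):
    """Return the non-empty href of every element matched by `selector`."""
    # "css selector" is the string value of Selenium's By.CSS_SELECTOR.
    elements = driver.find_elements("css selector", selector)
    hrefs = (element.get_attribute("href") for element in elements)
    return [href for href in hrefs if href]


def write_hrefs_csv(hrefs, path):
    """Write one href per row to a CSV file with a single header column."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["href"])
        writer.writerows([href] for href in hrefs)
```

A typical call would be write_hrefs_csv(collect_hrefs(driver, ".sc-eYdvao.kvdWiq"), "hrefs.csv"), where the output file name is arbitrary.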
