How do I reference a custom Python file when training a model with tensorflow_cloud on GCP?

25-04-25


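In this article we explain in detail how to reference a custom Python file when training a model with tensorflow_cloud on GCP. We also cover these related topics: Ansible and GCP: using the GCP Filestore facts module; the correct way to create a client with the GCP Compute Engine Python API; the GCP Compute Engine Python API not working with instance templates; and an external Hive table on GCP Dataproc not reading data from a GCP bucket.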

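Contents:

How do I reference a custom Python file when training a model with tensorflow_cloud on GCP?
Ansible and GCP: using the GCP Filestore facts module
The correct way to create a client with the GCP Compute Engine Python API
GCP Compute Engine Python API not working with instance templates?
External Hive table on GCP Dataproc not reading data from a GCP bucket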

How do I reference a custom Python file when training a model with tensorflow_cloud on GCP?

I'm trying to train a TensorFlow model on Google Cloud using tensorflow_cloud.

I have the following code that triggers the training:

import tensorflow_cloud as tfc

tfc.run(
    requirements_txt="requirements.txt",
    distribution_strategy="auto",
    docker_image_bucket_name="<bucket-name>",  # replace with your GCS bucket name
)

I also have a simple Python file with some utility functions; let's say it is named utils.py.

In the notebook, I import this file like this:

from utils import *

My question is: how do I reference this file when the notebook runs in GCP?

Currently I get a ModuleNotFoundError saying that the utils module cannot be found.

I tried copying the file to my GCP bucket, but it still cannot be found.

Solution

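I understand that you want to import and use functions defined in a separate .py file. The error occurs because the interpreter cannot find your module (the .py file) in the current directory or anywhere on its module search path (sys.path). To fix this, add the directory that contains the file to the path and then import the module.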

To do this, follow these steps:

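1. In Jupyter, create a folder by clicking the New Folder button.
2. Upload the .py file into that folder.
3. Add the path of the folder you created in step 1 to the module search path. Then, in the .ipynb from which you want to import the file, run: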

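    import sys

    # Put the folder that contains your .py file at the front of the module search path
    sys.path.insert(0, "your_path/package")
    import your_file as pck  # imports your_file.py from that folder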

Note: replace the placeholder path and module name with your own folder path and file name.

The import then succeeds, and you can use the functions defined in the file.
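
The steps above boil down to putting the folder that contains the module on sys.path before importing it. A slightly more defensive version is sketched below; the helper name import_from_dir is hypothetical, and it only handles a single-file module. It checks that the file exists and refreshes the import caches in case the file was uploaded after the kernel started:

```python
import importlib
import sys
from pathlib import Path
from types import ModuleType


def import_from_dir(directory: str, module_name: str) -> ModuleType:
    """Import a single-file module (module_name.py) that lives in `directory`."""
    dir_path = Path(directory).resolve()
    if not (dir_path / f"{module_name}.py").is_file():
        raise FileNotFoundError(f"{module_name}.py not found in {dir_path}")
    if str(dir_path) not in sys.path:
        # Front of the list, so this folder wins over same-named modules elsewhere.
        sys.path.insert(0, str(dir_path))
    # Refresh finder caches in case the file was created after the interpreter started.
    importlib.invalidate_caches()
    return importlib.import_module(module_name)


# Usage in the notebook (folder and module names are placeholders):
# utils = import_from_dir("your_path/package", "utils")
```

Inserting at position 0 mirrors the sys.path.insert(0, ...) call in the answer; appending instead would let an installed package with the same name shadow your file.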

Ansible and GCP: using the GCP Filestore facts module

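If I go by the sample output in your question, the information is returned under the resources key of your task result. I can't test this myself, but I believe the following should do what you expect.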

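Note that resources is a list of dictionaries. In the example below I read the information from the first element of that list. If you need something else (for example, a list of all createTime values) or want to loop over these objects, you can extend this example.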

- name: get info on an instance
  gcp_filestore_instance_info:
    zone: xxxxx-xxxx-b
    project: dxxxxx-xxxxxx
    auth_kind: serviceaccount
    service_account_file: "/root/dxxxt-xxxxxxx.json"
  register: instance_info

- name: show create time for first resource
  debug:
    msg: "{{ instance_info.resources.0.createTime }}"

- name: show first ip of first network of first resource
  debug:
    msg: "{{ instance_info.resources.0.networks.0.ipAddresses.0 }}"
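
The same lookups can be sketched in plain Python against a dictionary shaped like the registered result, which makes the list-of-dicts structure explicit. The sample data is hypothetical and modeled only on the keys the tasks above use (resources, createTime, networks, ipAddresses). Inside a playbook, the "all createTime values" case would typically be written as {{ instance_info.resources | map(attribute='createTime') | list }}.

```python
from typing import Any

# Hypothetical registered result, shaped after the keys used in the tasks above.
instance_info: dict[str, Any] = {
    "resources": [
        {"createTime": "2021-01-01T00:00:00Z",
         "networks": [{"ipAddresses": ["10.0.0.2"]}]},
        {"createTime": "2021-02-01T00:00:00Z",
         "networks": [{"ipAddresses": ["10.0.1.2"]}]},
    ]
}


def create_times(info: dict[str, Any]) -> list[str]:
    """All createTime values, like resources | map(attribute='createTime') | list."""
    return [r["createTime"] for r in info.get("resources", [])]


def first_ip_per_resource(info: dict[str, Any]) -> list[str]:
    """First IP of the first network of each resource (skips resources without one)."""
    ips = []
    for r in info.get("resources", []):
        networks = r.get("networks") or []
        if networks and networks[0].get("ipAddresses"):
            ips.append(networks[0]["ipAddresses"][0])
    return ips


# Equivalent of instance_info.resources.0.createTime in the first debug task:
print(instance_info["resources"][0]["createTime"])
print(create_times(instance_info))
print(first_ip_per_resource(instance_info))
```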

The correct way to create a client with the GCP Compute Engine Python API

What is the current "standard" way to create a GCP Compute client in Python? I have seen both:

import uuid

import googleapiclient.discovery

# credentials and project_id are assumed to be defined elsewhere
compute = googleapiclient.discovery.build("compute", "v1", credentials=credentials)

body = {
    "autoCreateSubnetworks": False,
    "description": "",
    "mtu": 1460,
    "name": "test-network",
    "routingConfig": {
        "routingMode": "REGIONAL"
    }
}

network = compute.networks().insert(
    project=project_id, body=body, requestId=str(uuid.uuid4())
).execute()

and:

import uuid

from google.cloud import compute_v1

# credentials and project_id are assumed to be defined elsewhere
net = compute_v1.Network()
net.auto_create_subnetworks = False
net.description = ""
net.mtu = 1460
net.name = "test-network"
net.routing_config = compute_v1.NetworkRoutingConfig(routing_mode="REGIONAL")

request = compute_v1.InsertNetworkRequest()
request.project = project_id
request.request_id = str(uuid.uuid4())
request.network_resource = net

networks_client = compute_v1.NetworksClient(credentials=credentials)
operation = networks_client.insert(request=request)  # starts the insert operation

Does Google plan to support only one of these at some point in the future?

Solution

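According to the google-api-python-client repository, this library is still supported, and no date for ending updates has been announced. The repository states:

This library is considered complete and is in maintenance mode. This means that we will address critical bugs and security issues but will not add any new features.

Instead, Google recommends the libraries in the google-cloud-python repository (the Cloud Client Libraries, which include google-cloud-compute, the package that provides compute_v1). These are released at three support levels: GA (general availability), Beta, and Alpha.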
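
Whichever client library you choose, Compute Engine validates resource names on the server: a name such as test_network contains an underscore and is rejected, while test-network is accepted. The documented rule is 1 to 63 characters matching [a-z]([-a-z0-9]*[a-z0-9])?. A small local pre-check is sketched below; is_valid_gce_name is a hypothetical helper, not something either library provides.

```python
import re

# Compute Engine resource names: 1-63 characters, lowercase letters, digits and
# hyphens, starting with a letter and not ending with a hyphen.
_GCE_NAME = re.compile(r"[a-z](?:[-a-z0-9]{0,61}[a-z0-9])?")


def is_valid_gce_name(name: str) -> bool:
    """Return True if `name` matches the Compute Engine naming pattern."""
    return _GCE_NAME.fullmatch(name) is not None


for candidate in ("test-network", "test_network", "Test-network", "network-"):
    print(candidate, is_valid_gce_name(candidate))
```

The bounded repetition {0,61} plus the first and last characters enforces the 63-character limit inside the pattern itself.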

GCP Compute Engine Python API not working with instance templates?

Every time I try to use an instance template with the Python Compute Engine API, the request URL comes out wrong.

For example, using PyCharm and Python 3.9:

import googleapiclient.discovery

compute = googleapiclient.discovery.build('compute', 'v1')
host_project = 'testproject123'
host_zone = 'us-central1-a'
vm_name = host_project + '-api-fetch'
instance_template = 'projects/testproject123/global/instanceTemplates/testproject123-api-1'
compute.instances().insert(project=host_project, zone=host_zone, sourceInstanceTemplate=instance_template).execute()

which returns:

Error
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2021.1.1\plugins\python-ce\helpers\pydev\_pydev_bundle\pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2021.1.1\plugins\python-ce\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "[PATH]main.py", line 14, in <module>
    compute.instances().insert(project=host_project, sourceInstanceTemplate=instance_template).execute()
  File "C:\Users\[USER]\AppData\Roaming\Python\python39\site-packages\googleapiclient\_helpers.py", line 134, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "C:\Users\[USER]\AppData\Roaming\Python\python39\site-packages\googleapiclient\http.py", line 935, in execute
    raise HttpError(resp, content, uri=self.uri)
googleapiclient.errors.HttpError: <HttpError 400 when requesting https://compute.googleapis.com/compute/v1/projects/testproject123/zones/us-central1-a/instances?sourceInstanceTemplate=projects%2Ftestproject123%2Fglobal%2FinstanceTemplates%2Ftestproject123-api-1&alt=json returned "required field 'resource' not specified". Details: "[{'message': "required field 'resource' not specified", 'domain': 'global', 'reason': 'required'}]">

Any idea why it keeps converting '/' to %2F? I assume that is what causes the problem, but I can't trace it or find a simple fix.

Solution

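It turns out that although the body is listed as optional in https://googleapis.github.io/google-api-python-client/docs/dyn/compute_v1.instances.html#insert, a body containing at least name is still required. Adding it solved the problem completely.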
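
Applied to the question's code, the fix is to pass a body with the new VM's name alongside sourceInstanceTemplate. A minimal sketch follows; the wrapper function insert_from_template is hypothetical, and compute is assumed to be the object returned by googleapiclient.discovery.build("compute", "v1").

```python
from typing import Any


def insert_from_template(compute: Any, project: str, zone: str,
                         vm_name: str, template: str) -> Any:
    """Create a VM from an instance template via the discovery-based client."""
    request = compute.instances().insert(
        project=project,
        zone=zone,
        sourceInstanceTemplate=template,
        # The request body must carry at least the new instance's name; without
        # it the API answers 400 "required field 'resource' not specified".
        body={"name": vm_name},
    )
    return request.execute()  # the Operation resource, parsed into a dict
```

With the variables from the question, the call would be insert_from_template(compute, host_project, host_zone, vm_name, instance_template). Other instance properties come from the template, so the name is the only field the body needs here.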

The %2F was working correctly all along: it is simply the URL-encoded slash.
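
The %2F in the failing URL is ordinary percent-encoding of "/" inside a query-string value, and the server decodes it back. A quick check with Python's standard urllib.parse (not necessarily the exact code path google-api-python-client uses internally) shows the round trip for the template path from the question:

```python
from urllib.parse import quote, unquote

template = "projects/testproject123/global/instanceTemplates/testproject123-api-1"

# With safe="" every "/" is percent-encoded, as in the query string of the error URL.
encoded = quote(template, safe="")
print(encoded)
# → projects%2Ftestproject123%2Fglobal%2FinstanceTemplates%2Ftestproject123-api-1

# Decoding restores the original value, so nothing is lost in transit.
print(unquote(encoded) == template)  # → True
```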

External Hive table on GCP Dataproc not reading data from a GCP bucket

I have data in a GCP bucket laid out like this:

gs://bucket/my_table/data_date=2021-03-26/000
gs://bucket/my_table/data_date=2021-03-26/001
gs://bucket/my_table/data_date=2021-03-27/000
gs://bucket/my_table/data_date=2021-03-27/001

I create an external table over this data with the following statement:

CREATE EXTERNAL TABLE `my_db.my_table`(
  `col1` string,
  `col2` string)
PARTITIONED BY (
  `data_date` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'field.delim'='\t',
  'serialization.format'='\t')
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'gs://bucket/my_table/';

The table is created without errors:

hive> CREATE EXTERNAL TABLE ...
Time Taken: 0.012 seconds
OK

However, I can't see any data. Even though the bucket contains data files, the following commands return nothing:

hive> show partitions my_db.my_table;
Ok
Time taken: 0.191 seconds

hive> select * from my_db.my_table;
Ok
Time taken: 0.191 seconds

I don't see any errors either, and I have verified that I do have read access to the bucket.

Solution

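Because the table is partitioned, Hive only returns data from partitions that are registered in the metastore, and creating an external table over existing data_date=... directories does not register them. You need to repair the table so that all existing partitions of the external table are discovered; the repair command recovers the partitions and updates the Hive metastore.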

MSCK REPAIR TABLE my_db.my_table;

You can read more about the repair command here.
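
MSCK REPAIR TABLE works here because the bucket already follows Hive's key=value directory convention (data_date=2021-03-26, and so on). If you would rather register partitions explicitly, for example when only one new date has arrived, Hive also accepts ALTER TABLE ... ADD IF NOT EXISTS PARTITION (...) LOCATION '...'. The sketch below derives one such statement per partition directory from object paths like those in the question. The helper partition_statements is hypothetical; it assumes a single partition key and does not escape quotes inside values.

```python
import posixpath


def partition_statements(table: str, object_paths: list[str]) -> list[str]:
    """Build one ALTER TABLE ... ADD PARTITION statement per partition directory.

    Assumes a single-level key=value layout such as
    gs://bucket/my_table/data_date=2021-03-26/000.
    """
    statements = []
    seen = set()
    for path in object_paths:
        partition_dir = posixpath.dirname(path)  # e.g. .../data_date=2021-03-26
        key, _, value = posixpath.basename(partition_dir).partition("=")
        if not key or not value or partition_dir in seen:
            continue  # skip non-partition paths and directories already handled
        seen.add(partition_dir)
        statements.append(
            f"ALTER TABLE {table} ADD IF NOT EXISTS "
            f"PARTITION ({key}='{value}') LOCATION '{partition_dir}/';"
        )
    return statements


paths = [
    "gs://bucket/my_table/data_date=2021-03-26/000",
    "gs://bucket/my_table/data_date=2021-03-26/001",
    "gs://bucket/my_table/data_date=2021-03-27/000",
    "gs://bucket/my_table/data_date=2021-03-27/001",
]
for stmt in partition_statements("my_db.my_table", paths):
    print(stmt)
```

After either MSCK REPAIR TABLE or the explicit statements, show partitions my_db.my_table; should list the dates, and the select should return rows.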

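This concludes our discussion of how to reference a custom Python file when training a model with tensorflow_cloud on GCP. Thank you for reading. To learn more about Ansible and GCP: using the GCP Filestore facts module, the correct way to create a client with the GCP Compute Engine Python API, the GCP Compute Engine Python API not working with instance templates, and an external Hive table on GCP Dataproc not reading data from a GCP bucket, search this site.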
