Tensorflow 2.0不计算梯度（tensorflow 梯度怎么计算）

25-01-29 23

本文将为您提供关于Tensorflow2.0不计算梯度的详细介绍，我们还将为您解释tensorflow梯度怎么计算的相关知识，同时，我们还将为您提供关于Centos6安装TensorFlow及Tens

本文将为您提供关于Tensorflow 2.0不计算梯度的详细介绍，我们还将为您解释tensorflow 梯度怎么计算的相关知识，同时，我们还将为您提供关于Centos6安装TensorFlow及TensorFlowOnSpark、github/tensorflow/tensorflow/contrib/slim/、hello tensorflow，我的第一个tensorflow程序、SSD-Tensorflow： 3 步运行 TensorFlow 单图片多盒目标检测器的实用信息。

本文目录一览：

Tensorflow 2.0不计算梯度（tensorflow 梯度怎么计算）
Centos6安装TensorFlow及TensorFlowOnSpark
github/tensorflow/tensorflow/contrib/slim/
hello tensorflow，我的第一个tensorflow程序
SSD-Tensorflow： 3 步运行 TensorFlow 单图片多盒目标检测器

Tensorflow 2.0不计算梯度（tensorflow 梯度怎么计算）

我想可视化CNN中给定特征图学会的模式（在此示例中，我使用的是vgg16）。为此，我创建了一个随机图像，通过网络馈送到所需的卷积层，选择特征图并找到相对于输入的梯度。想法是，以最大程度激活所需特征图的方式更改输入。使用tensorflow
2.0我有一个GradientTape跟随该函数然后计算梯度，但是该梯度返回None，为什么它无法计算梯度？

import tensorflow as tfimport matplotlib.pyplot as pltimport timeimport numpy as npfrom tensorflow.keras.applications import vgg16class maxFeatureMap():    def __init__(self, model):        self.model = model        self.optimizer = tf.keras.optimizers.Adam()    def getNumLayers(self, layer_name):        for layer in self.model.layers:            if layer.name == layer_name:                weights = layer.get_weights()                num = weights[1].shape[0]        return ("There are {} feature maps in {}".format(num, layer_name))    def getGradient(self, layer, feature_map):        pic = vgg16.preprocess_input(np.random.uniform(size=(1,96,96,3))) ## Creates values between 0 and 1        pic = tf.convert_to_tensor(pic)        model = tf.keras.Model(inputs=self.model.inputs,                                outputs=self.model.layers[layer].output)        with tf.GradientTape() as tape:            ## predicts the output of the model and only chooses the feature_map indicated            predictions = model.predict(pic, steps=1)[0][:,:,feature_map]            loss = tf.reduce_mean(predictions)        print(loss)        gradients = tape.gradient(loss, pic[0])        print(gradients)        self.optimizer.apply_gradients(zip(gradients, pic))model = vgg16.VGG16(weights=''imagenet'', include_top=False)x = maxFeatureMap(model)x.getGradient(1, 24)

答案1

小编典典

这是一个常见的陷阱GradientTape;
磁带仅跟踪设置为“监视”的张量，并且磁带默认情况下仅监视可训练的变量（表示使用tf.Variable创建的对象trainable=True）。要观察pic张量，应tape.watch(pic)在磁带上下文中添加第一行。

另外，我不确定索引（pic[0]）是否会起作用，因此您可能要删除它-因为pic在第一维中只有一个条目，所以无论如何都没关系。

此外，您不能使用，model.predict因为这将返回一个numpy数组，该数组基本上会“破坏”计算图链，因此不会向后传播梯度。您应该仅将模型用作可调用模型，即predictions= model(pic)。

Centos6安装TensorFlow及TensorFlowOnSpark

1. 需求描述

在Centos6系统上安装Hadoop、Spark集群，并使用TensorFlowOnSpark的 YARN运行模式下执行TensorFlow的代码。（最好可以在不联网的集群中进行配置并运行）

2. 系统环境（拓扑）

操作系统：Centos6.5 Final ； Hadoop：2.7.4 ； Spark：1.5.1-Hadoop2.6； TensorFlow 1.3.0；TensorFlowOnSpark （github最新下载）；Python：2.7.12；

s0.centos.com: memory:1.5G namenode/resourcemanager ; 1核

s1.centos.com / s2.centos.com/ s3.centos.com : datanode/nodemanager ; memory： 1.2G， 1 核

其中yarn-site.xml 部分配置如下（参考默认的，TensorFlowonspark运行不起来）：

<property>
                <name>yarn.scheduler.maximum-allocation-mb</name>
                <value>2048</value>
            </property>
        <property>
                <name>yarn.nodemanager.resource.memory-mb</name>
                <value>2048</value>
        </property>
        <property>
                <name>yarn.nodemanager.resource.cpu-vcores</name>
                        <value>2</value>
                            </property>

3. 参考

https://blog.abysm.org/2016/06/building-tensorflow-centos-6/： Centos6 build TensorFlow

TensorFlow github wiki ：https://github.com/yahoo/TensorFlowOnSpark/wiki/GetStarted_YARN ; installTensorFlowOnSpark ；

TensorFlow github wiki： https://github.com/yahoo/TensorFlowOnSpark/wiki/Conversion-Guide ;conversionTensorFlow code ；

4. 步骤

步骤如下：

详细步骤如下：

1.安装devtoolset-6 及Python：

安装repo库： yum install -y centos-release-scl
安装 devtoolset:  yum install -y devtoolset-6

安装Python：

yum install python27 python27-numpy python27-python-devel python27-python-wheel

安装一些常用包：

yum install –y vim zip unzip openssh-clients

2.下载bazel，这里下载的是0.5.1（虽然也下载了0.4.X的版本，下载包难下）

先执行：
export CC=/opt/rh/devtoolset-6/root/usr/bin/gcc
接着进入编译环境：
scl enable devtoolset-6 python27 bash
接着以此执行：
 unzip bazel-0.5.1-dist.zip -d bazel-0.5.1-dist
cd bazel-0.5.1-dist

# compile
./compile.sh
 
# install
mkdir -p ~/bin
cp output/bazel ~/bin/

exit  //退出scl环境
// 耗时较久

3.下载TensorFlow1.3.0源码并解压

4.进入tensorflow-1.3.0 ，修改tensorflow/tensorflow.bzl文件中的tf_extension_linkopts函数如下形式：(添加一个-lrt)

def tf_extension_linkopts():
  return ["-lrt"]  # No extension link opts

5.编译安装TensorFlow：

安装基本软件： yum install –y patch
接着，进入编译环境：
scl enable devtoolset-6 python27 bash
cd tensorflow-1.3.0
./configure
 
# build
~/bin/bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
exit // 退出编译环境
// 耗时同样很久，同样使用bazel0.4.X的版本编译TensorFlow1.3提示版本过低

编译后在/tmp/tensorflow_pkg则会生成一个TensorFlow的安装包，并且是属于当前系统也就是Centos系统的安装包；

http://download.csdn.net/download/fansy1990/10042475 <<--- whl安装包下载地址

由于不想让现有的系统过于复杂，也就是直接在每个节点安装Python，然后安装TensorFlow等相关 Python包，所以参考TensorFlow on spark 官网进行，如下步骤：

6.安装Python自定义包(保持在联网状态下)；

由于想在未联网的情况下使用TensorFlow以及TensorFlowOnSpark，所以参考TensorFlowOnSpark github WIKI，直接编译一个Python包，并且把TensorFlow、TensorFlowOnSpark及其他常用module安装在这个Python包中，后面就可以直接把这个包上传到HDFS，使得各个子节点都可以共享共同一个Python.zip包的环境变量。

export PYTHON_ROOT=~/Python // 设置环境变量，并下载Python
curl -O https://www.python.org/ftp/python/2.7.12/Python-2.7.12.tgz
tar -xvf Python-2.7.12.tgz

编译并安装Python：

pushd Python-2.7.12
./configure --prefix="${PYTHON_ROOT}" --enable-unicode=ucs4
make
make install
popd

安装Pip：

pushd "${PYTHON_ROOT}"
curl -O https://bootstrap.pypa.io/get-pip.py
bin/python get-pip.py
popd

安装TensorFlow：

pushd "${PYTHON_ROOT}"
bin/pip install /tmp/tensorflow_pkg/tensorflow-1.3.0-cp27-none-linux_x86_64.whl
popd

在安装TensorFlow的时候会自动安装诸如 numpy等常用Python包；

安装TensorFlowOnSpark：

pushd "${PYTHON_ROOT}"
bin/pip install tensorflowonspark
popd

把“武装”好的Python打包并上传到HDFS：

pushd "${PYTHON_ROOT}"
zip -r Python.zip *
popd

hadoop fs -put ${PYTHON_ROOT}/Python.zip

现在就可以使用TensorFlow了；

7. 修改TensorFlow代码，比如下面的TensorFlow代码是可以在TensorFlow环境中运行的：

# from __future__ import absolute_import
# from __future__ import division
# from __future__ import print_function

import numpy as np

import tensorflow as tf

X_FEATURE = 'x'  # Name of the input feature.

train_percent = 0.8


def load_data(data_file_name):
    data = np.loadtxt(open(data_file_name),delimiter=",",skiprows=0)
    return data


def data_selection(iris,train_per):
    data,target = np.hsplit(iris[np.random.permutation(iris.shape[0])],np.array([-1]))

    row_split_index = int(data.shape[0] * train_per)

    x_train,x_test = (data[1:row_split_index],data[row_split_index:])
    y_train,y_test = (target[1:row_split_index],target[row_split_index:])
    return x_train,x_test,y_train.astype(int),y_test.astype(int)


def run():
    # Load dataset.
    data_file = 'iris01.csv'
    iris = load_data(data_file)
    # x_train,y_train,y_test = model_selection.train_test_split(
    #     iris.data,iris.target,test_size=0.2,random_state=42)

    x_train,y_test = data_selection(iris,train_percent)

    # print(x_test)
    # print(y_test)

    #
    # # Build 3 layer DNN with 10,20,10 units respectively.
    feature_columns = [
        tf.feature_column.numeric_column(
            X_FEATURE,shape=np.array(x_train).shape[1:])]
    classifier = tf.estimator.DNNClassifier(
        feature_columns=feature_columns,hidden_units=[10,10],n_classes=3)
    #
    # # Train.
    train_input_fn = tf.estimator.inputs.numpy_input_fn(
        x={X_FEATURE: x_train},y=y_train,num_epochs=None,shuffle=True)
    classifier.train(input_fn=train_input_fn,steps=200)
    #
    # # Predict.
    test_input_fn = tf.estimator.inputs.numpy_input_fn(
        x={X_FEATURE: x_test},y=y_test,num_epochs=1,shuffle=False)
    predictions = classifier.predict(input_fn=test_input_fn)
    y_predicted = np.array(list(p['class_ids'] for p in predictions))
    y_predicted = y_predicted.reshape(np.array(y_test).shape)
    # #
    # # # score with sklearn.
    # score = metrics.accuracy_score(y_test,y_predicted)
    # print('Accuracy (sklearn): {0:f}'.format(score))
    print(np.concatenate(( y_predicted,y_test),axis= 1))
    # score with tensorflow.
    scores = classifier.evaluate(input_fn=test_input_fn)
    print('Accuracy (tensorflow): {0:f}'.format(scores['accuracy']))

    print(classifier.params)


if __name__ == '__main__':
    run()

其中iris01.csv 数据如下：

5.1,3.5,1.4,0.2,0
4.9,3.0,0
4.7,3.2,1.3,0
4.6,3.1,1.5,0
5.0,3.6,0
5.4,3.9,1.7,0.4,3.4,0.3,0
4.4,2.9,0.1,3.7,0
4.8,1.6,0
4.3,1.1,0
5.8,4.0,1.2,0
5.7,4.4,0
5.1,3.8,1.0,3.3,0.5,1.9,0
5.2,4.1,0
5.5,4.2,0
4.5,2.3,0.6,0
5.3,0
7.0,4.7,1
6.4,4.5,1
6.9,4.9,1
5.5,1
6.5,2.8,4.6,1
5.7,1
6.3,1
4.9,2.4,1
6.6,1
5.2,2.7,1
5.0,2.0,1
5.9,1
6.0,2.2,1
6.1,1
5.6,1
6.7,1
5.8,1
6.2,2.5,4.8,1.8,4.3,1
6.8,5.0,2.6,5.1,1
5.4,1
5.1,6.0,2
5.8,2
7.1,5.9,2.1,2
6.3,5.6,2
6.5,5.8,2
7.6,6.6,2
4.9,2
7.3,6.3,2
6.7,2
7.2,6.1,2
6.4,5.3,2
6.8,5.5,2
5.7,2
7.7,6.7,6.9,2
6.0,2
6.9,5.7,2
5.6,2
6.2,2
6.1,2
7.4,2
7.9,6.4,5.4,5.2,2
5.9,2

那代码怎么修改呢？

1）. 导入必要的包：

from pyspark.context import SparkContext
from pyspark.conf import SparkConf
from tensorflowonspark import TFCluster,TFNode
#from com.yahoo.ml.tf import TFCluster,TFNode
from datetime import datetime

这里要注意，导入TFCluster的时候，不要参考官网的导入方式，而应该从tensorflowonspark导入；

2.）修改main函数，比如我这里的函数run，只需要添加两个参数即可：（argv,cxt）

3) 把原来的main函数调用，替换成下面的调用方式，比如我这里原来只需要在main函数执行run即可，这里需要调用TFCluster.run，并且把我的run函数传递给第二个参数值：

sc = SparkContext(conf=SparkConf().setAppName("your_app_name"))
    num_executors = int(sc._conf.get("spark.executor.instances"))
    num_ps = 1
    tensorboard = True

    cluster = TFCluster.run(sc,run,sys.argv,num_executors,num_ps,tensorboard,TFCluster.InputMode.TENSORFLOW)
    cluster.shutdown()

然后就可以运行了，修改后的代码如下：

# from __future__ import absolute_import
# from __future__ import division
# from __future__ import print_function
from pyspark.context import SparkContext
from pyspark.conf import SparkConf
from tensorflowonspark import TFCluster,TFNode
from datetime import datetime
import numpy as np
import sys
# from sklearn import metrics
# from sklearn import model_selection

import tensorflow as tf

X_FEATURE = 'x'  # Name of the input feature.

train_percent = 0.8


def load_data(data_file_name):
    data = np.loadtxt(open(data_file_name),y_test.astype(int)


def map_run(argv,ctx):
    # Load dataset.
    data_file = 'iris01.csv'
    iris = load_data(data_file)
    # x_train,axis= 1))
    # score with tensorflow.
    scores = classifier.evaluate(input_fn=test_input_fn)
    print('Accuracy (tensorflow): {0:f}'.format(scores['accuracy']))

    print(classifier.params)


if __name__ == '__main__':
    import tensorflow as tf
    import sys
    sc = SparkContext(conf=SparkConf().setAppName("your_app_name"))
    num_executors = int(sc._conf.get("spark.executor.instances"))
    num_ps = 1
    tensorboard = False

    cluster = TFCluster.run(sc,map_run,TFCluster.InputMode.TENSORFLOW)
    cluster.shutdown()

7. 设置环境变量，并运行：

1)上传iris01.csv到HDFS： hdfs dfs -put iris01.csv

2) 设置环境变量：

export PYTHON_ROOT=./Python
export LD_LIBRARY_PATH=${PATH}
export PYSPARK_PYTHON=${PYTHON_ROOT}/bin/python
export SPARK_YARN_USER_ENV="PYSPARK_PYTHON=Python/bin/python"
export PATH=${PYTHON_ROOT}/bin/:$PATH
#export QUEUE=gpu

# set paths to libjvm.so,libhdfs.so,and libcuda*.so
#export LIB_HDFS=/opt/cloudera/parcels/CDH/lib64                      # for CDH (per @wangyum)
export LIB_HDFS=$HADOOP_PREFIX/lib/native
export LIB_JVM=$JAVA_HOME/jre/lib/amd64/server
#export LIB_CUDA=/usr/local/cuda-7.5/lib64

# for cpu mode:
 export QUEUE=default

3）调用代码：

/usr/local/spark-1.5.1-bin-hadoop2.6/bin/spark-submit --master yarn --deploy-mode cluster --num-executors 3 --executor-memory 1024m --archives hdfs://s0:8020/user/root/Python.zip#Python,/root/iris01.csv /root/iris_c.py

4）查看yarn日志，可以看到执行成功；

5. 问题及解决

1） libc.so.6: version `GLIBC_2.14' not found

这个问题是由于Centos6的版本其GLIBC的版本是2.12 ，版本过低导致的；

解决思路：

a. 升级版本，这个选项不适用，由于这个软件是底层软件，升级后导致系统不稳定；

b. 编译一个可以在Centos6上运行的TensorFlow安装包，也就是本文的做法；

2）Cannot run program "patch" (in directory "/root/.cache/bazel/_bazel_root/6093305914d4a581ed00c0f6c06f975b/external/boringssl")

yum install patch

3)Traceback (most recent call last):
File "iris_c.py",line 6,in <module>
from com.yahoo.ml.tf import TFCluster,TFNode
ImportError: No module named com.yahoo.ml.tf

修改：
from com.yahoo.ml.tf import TFCluster,TFNode
=》
from tensorflowonspark import TFCluster,TFNode

6. 总结

1. 在编译tensorflow的时候遇到很多问题，使用bing的国际版查询效果会更好；

2. 暂时只能使用终端设置环境变量的方式执行程序，并且程序执行很慢，后面可以考虑使用开发工具直连提交任务，并着手提升效率；

分享，成长，快乐脚踏实地，专注转载请注明blog地址：http://blog.csdn.net/fansy1990

github/tensorflow/tensorflow/contrib/slim/

TensorFlow-Slim

TF-Slim 是一个轻量级的库，用来在 TF 中定义、训练和评估复杂模型。tf-slim 能够自由混入原生 TF 和其它框架（如 tf.contrib.learn 中）。

用法

import tensorflow.contrib.slim as slim

为什么用 TF-Slim?

TF-Slim 中都有什么组成部分？

定义模型

变量

层

Scopes

实例：实现 VGG16

训练模型

Training Tensorflow models requires a model, a loss function, the gradient computation and a training routine that iteratively computes the gradients of the model weights relative to the loss and updates the weights accordingly. TF-Slim provides both common loss functions and a set of helper functions that run the training and evaluation routines.

损失

The loss function defines a quantity that we want to minimize. For classification problems, this is typically the cross entropy between the true distribution and the predicted probability distribution across classes. For regression problems, this is often the sum-of-squares differences between the predicted and true values.

Certain models, such as multi-task learning models, require the use of multiple loss functions simultaneously. In other words, the loss function ultimately being minimized is the sum of various other loss functions. For example, consider a model that predicts both the type of scene in an image as well as the depth from the camera of each pixel. This model''s loss function would be the sum of the classification loss and depth prediction loss.

TF-Slim provides an easy-to-use mechanism for defining and keeping track of loss functions via the losses module. Consider the simple case where we want to train the VGG network:

Training Loop

TF-Slim provides a simple but powerful set of tools for training models found in learning.py. These include a Train function that repeatedly measures the loss, computes gradients and saves the model to disk, as well as several convenience functions for manipulating gradients. For example, once we''ve specified the model, the loss function and the optimization scheme, we can call slim.learning.create_train_op and slim.learning.train to perform the optimization:

实例：训练 VGG16 模型

To illustrate this, let''s examine the following sample of training the VGG network:

微调已存在的模型

Brief Recap on Restoring Variables from a Checkpoint

After a model has been trained, it can be restored using tf.train.Saver() which restores Variables from a given checkpoint. For many cases, tf.train.Saver() provides a simple mechanism to restore all or just a few variables.

Partially Restoring Models

It is often desirable to fine-tune a pre-trained model on an entirely new dataset or even a new task. In these situations, one can use TF-Slim''s helper functions to select a subset of variables to restore:

Restoring models with different variable names

Fine-Tuning a Model on a different task

Consider the case where we have a pre-trained VGG16 model. The model was trained on the ImageNet dataset, which has 1000 classes. However, we would like to apply it to the Pascal VOC dataset which has only 20 classes. To do so, we can initialize our new model using the values of the pre-trained model excluding the final layer:

评估模型

Once we''ve trained a model (or even while the model is busy training) we''d like to see how well the model performs in practice. This is accomplished by picking a set of evaluation metrics, which will grade the model''s performance, and the evaluation code which actually loads the data, performs inference, compares the results to the ground truth and records the evaluation scores. This step may be performed once or repeated periodically.

度量

我们定义一个度量来衡量训练效果，这不是一个损失函数（损失被用来在训练过程中进行优化的）。例如，我们训练时最小化 log 损失，但是评估模型时我们也许会用 F1 score , 或者 Intersection Over Union score（这个值不可微，因此也不能用在损失函数上）。

TF-Slim 提供了一组度量 operations。笼统地讲，计算一个度量值可以被分为三部分：

初始化：初始化用来计算度量的变量。
Aggregation: perform operations (sums, etc) used to compute the metrics.
Finalization: (optionally) perform any final operation to compute metric values. For example, computing means, mins, maxes, etc.

例如，计算 mean_absolute_error，两个变量 (count 和 total) 被初始化为 0。在 aggregation，我们得到一组 predictions 和 labels，计算它们的绝对误差并总计为 total。我们每增加一组，count也随之增加。最后，在 finalization 阶段，total 除以 count 来获得均值。

The following example demonstrates the API for declaring metrics. Because metrics are often evaluated on a test set which is different from the training set (upon which the loss is computed), we''ll assume we''re using test data:

images, labels = LoadTestData(...)
predictions = MyModel(images)

mae_value_op, mae_update_op = slim.metrics.streaming_mean_absolute_error(predictions, labels)
mre_value_op, mre_update_op = slim.metrics.streaming_mean_relative_error(predictions, labels)
pl_value_op, pl_update_op = slim.metrics.percentage_less(mean_relative_errors, 0.3)

就像例子描述的那样，创建的 metric 返回两个值： value_op 和 update_op。 value_op 是一个 idempotent operation 返回 metric 的当前值。update_op 是一个 operation，它执行 aggregation 步骤并返回 metric 的值。

跟踪 value_op 和 update_op 费时费力。为了解决这个问题，TF-Slim 提供两个方便的函数：

# 总计value和update ops 到两个列表中:
value_ops, update_ops = slim.metrics.aggregate_metrics(
    slim.metrics.streaming_mean_absolute_error(predictions, labels),
    slim.metrics.streaming_mean_squared_error(predictions, labels))

# 总起value和update ops 到两个字典中:
names_to_values, names_to_updates = slim.metrics.aggregate_metric_map({
    "eval/mean_absolute_error": slim.metrics.streaming_mean_absolute_error(predictions, labels),
    "eval/mean_squared_error": slim.metrics.streaming_mean_squared_error(predictions, labels),
})

hello tensorflow，我的第一个tensorflow程序

上代码：

import tensorflow as tf

if __name__==''__main__'': 
    g = tf.Graph()   
    # add ops to the user created graph
    with g.as_default(): 
        hello = tf.constant(''Hello Tensorflow'') 
        sess = tf.compat.v1.Session(graph=g)
        print(sess.run(hello))

输出如下图右侧：

说明：python3.7.4 ，tensorflow2.0

若对您有用，请赞助个棒棒糖~

SSD-Tensorflow： 3 步运行 TensorFlow 单图片多盒目标检测器

昨天类似的 YOLO： https://www.v2ex.com/t/392671#reply0

下载这个项目

https://github.com/balancap/SSD-Tensorflow

解压 checkpoint files in ./checkpoint

unzip ssd_300_vgg.ckpt.zip

运行 jupyter 文件命令

jupyter notebook notebooks/ssd_notebook.ipynb

项目说明： http://www.tensorflownews.com/2017/09/22/ssd-single-shot-multibox-detector-in-tensorflow/

项目地址： https://github.com/balancap/SSD-Tensorflow

更多 TensorFlow 教程： http://www.tensorflownews.com

关于Tensorflow 2.0不计算梯度和tensorflow 梯度怎么计算的介绍已经告一段落，感谢您的耐心阅读，如果想了解更多关于Centos6安装TensorFlow及TensorFlowOnSpark、github/tensorflow/tensorflow/contrib/slim/、hello tensorflow，我的第一个tensorflow程序、SSD-Tensorflow： 3 步运行 TensorFlow 单图片多盒目标检测器的相关信息，请在本站寻找。

本文标签：