TensorFlow accuracy is 0.99, but predictions are terrible (TensorFlow accuracy)

25-02-05 26

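This article looks in detail at the case where TensorFlow reports an accuracy of 0.99 but the predictions are terrible, and answers related questions about TensorFlow accuracy. It also collects useful material on installing TensorFlow and TensorFlowOnSpark on CentOS 6, on github/tensorflow/tensorflow/contrib/slim/, on "hello tensorflow" (a first TensorFlow program), and on SSD-Tensorflow: running the TensorFlow Single Shot MultiBox Detector in 3 steps.

Contents:

TensorFlow accuracy is 0.99, but predictions are terrible (TensorFlow accuracy)
Installing TensorFlow and TensorFlowOnSpark on CentOS 6
github/tensorflow/tensorflow/contrib/slim/
hello tensorflow: my first TensorFlow program
SSD-Tensorflow: run the TensorFlow Single Shot MultiBox Detector in 3 steps

TensorFlow accuracy is 0.99, but predictions are terrible (TensorFlow accuracy)

Maybe I am making the prediction incorrectly?

Here is the project: I have a grayscale input image that I want to segment. The segmentation is a simple binary classification (think foreground vs. background), so the ground truth (y) is a matrix of 0s and 1s, which means there are 2 classes. The input image is square, so I use a single variable called n_input.

My accuracy essentially converges to 0.99, but when I make a prediction I get all zeros. EDIT -> there is a single 1 in each output matrix, both in the same place...

Here is my session code (everything else works)...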

with tf.Session() as sess:
    sess.run(init)
    summary = tf.train.SummaryWriter('/tmp/logdir/', sess.graph_def)
    step = 1
    from tensorflow.contrib.learn.python.learn.datasets.scroll import scroll_data
    data = scroll_data.read_data('/home/kendall/Desktop/')
    # Keep training until reach max iterations
    flag = 0
    # while flag == 0:
    while step * batch_size < training_iters:
        batch_y, batch_x = data.train.next_batch(batch_size)
        # pdb.set_trace()
        # batch_x = batch_x.reshape((batch_size, n_input))
        batch_x = batch_x.reshape((batch_size, n_input, n_input))
        batch_y = batch_y.reshape((batch_size, n_input, n_input))
        batch_y = convert_to_2_channel(batch_y, batch_size)
        # batch_y = batch_y.reshape((batch_size, n_output, n_classes))
        batch_y = batch_y.reshape((batch_size, 200, 200, n_classes))
        sess.run(optimizer, feed_dict={x: batch_x, y: batch_y,
                                       keep_prob: dropout})
        if step % display_step == 0:
            flag = 1
            # Calculate batch loss and accuracy
            loss, acc = sess.run([cost, accuracy], feed_dict={x: batch_x,
                                                              y: batch_y,
                                                              keep_prob: 1.})
            print "Iter " + str(step*batch_size) + ", Minibatch Loss= " + \
                  "{:.6f}".format(loss) + ", Training Accuracy= " + \
                  "{:.5f}".format(acc)
        step += 1
    print "Optimization Finished!"
    save_path = "model.ckpt"
    saver.save(sess, save_path)
    im = Image.open('/home/kendall/Desktop/HA900_frames/frame0635.tif')
    batch_x = np.array(im)
    pdb.set_trace()
    batch_x = batch_x.reshape((1, n_input, n_input))
    batch_x = batch_x.astype(float)
    # pdb.set_trace()
    prediction = sess.run(pred, feed_dict={x: batch_x, keep_prob: 1.})
    print prediction
    arr1 = np.empty((n_input,n_input))
    arr2 = np.empty((n_input,n_input))
    for i in xrange(n_input):
        for j in xrange(n_input):
            for k in xrange(2):
                if k == 0:
                    arr1[i][j] = prediction[0][i][j][k]
                else:
                    arr2[i][j] = prediction[0][i][j][k]
    # prediction = np.asarray(prediction)
    # prediction = np.reshape(prediction, (200,200))
    # np.savetxt("prediction.csv", prediction, delimiter=",")
    np.savetxt("prediction1.csv", arr1, delimiter=",")
    np.savetxt("prediction2.csv", arr2, delimiter=",")

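Since there are two classes, that final part (with the nested loops) only splits the prediction into two 200x200 matrices, one per class.

I save the prediction arrays to CSV files and, as I said, they are all zeros.

I have also confirmed that all the data is correct (dimensions and values).

Why does training converge while the predictions are terrible?

If you want to see all of the code, here it is...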

import tensorflow as tf
import pdb
import numpy as np
from numpy import genfromtxt
from PIL import Image

# Import MINST data
# from tensorflow.examples.tutorials.mnist import input_data
# mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

# Parameters
learning_rate = 0.001
training_iters = 20000
batch_size = 128
display_step = 1

# Network Parameters
n_input = 200 # MNIST data input (img shape: 28*28)
n_output = 40000 # MNIST total classes (0-9 digits)
n_classes = 2
#n_input = 200
dropout = 0.75 # Dropout, probability to keep units

# tf Graph input
x = tf.placeholder(tf.float32, [None, n_input, n_input])
y = tf.placeholder(tf.float32, [None, n_input, n_input, n_classes])
keep_prob = tf.placeholder(tf.float32) #dropout (keep probability)

# Create some wrappers for simplicity
def conv2d(x, W, b, strides=1):
    # Conv2D wrapper, with bias and relu activation
    x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding='SAME')
    x = tf.nn.bias_add(x, b)
    return tf.nn.relu(x)

def maxpool2d(x, k=2):
    # MaxPool2D wrapper
    return tf.nn.max_pool(x, ksize=[1, k, k, 1], strides=[1, k, k, 1],
                          padding='SAME')

# Create model
def conv_net(x, weights, biases, dropout):
    # Reshape input picture
    x = tf.reshape(x, shape=[-1, n_input, n_input, 1])

    # Convolution Layer
    conv1 = conv2d(x, weights['wc1'], biases['bc1'])
    # Max Pooling (down-sampling)
    conv1 = maxpool2d(conv1, k=2)
    conv1 = tf.nn.local_response_normalization(conv1)

    # Convolution Layer
    conv2 = conv2d(conv1, weights['wc2'], biases['bc2'])
    # Max Pooling (down-sampling)
    conv2 = tf.nn.local_response_normalization(conv2)
    conv2 = maxpool2d(conv2, k=2)

    # Convolution Layer
    conv3 = conv2d(conv2, weights['wc3'], biases['bc3'])
    # Max Pooling (down-sampling)
    conv3 = tf.nn.local_response_normalization(conv3)
    conv3 = maxpool2d(conv3, k=2)

    # pdb.set_trace()
    # Fully connected layer
    # Reshape conv2 output to fit fully connected layer input
    fc1 = tf.reshape(conv3, [-1, weights['wd1'].get_shape().as_list()[0]])
    fc1 = tf.add(tf.matmul(fc1, weights['wd1']), biases['bd1'])
    fc1 = tf.nn.relu(fc1)
    # Apply Dropout
    fc1 = tf.nn.dropout(fc1, dropout)

    output = []
    for i in xrange(2):
        output.append(tf.nn.softmax(tf.add(tf.matmul(fc1, weights['out']), biases['out'])))
    return output
    # return tf.nn.softmax(tf.add(tf.matmul(fc1, weights['out']), biases['out']))

# Store layers weight & bias
weights = {
    # 5x5 conv, 1 input, 32 outputs
    'wc1': tf.Variable(tf.random_normal([5, 5, 1, 32])),
    # 5x5 conv, 32 inputs, 64 outputs
    'wc2': tf.Variable(tf.random_normal([5, 5, 32, 64])),
    # 5x5 conv, 32 inputs, 64 outputs
    'wc3': tf.Variable(tf.random_normal([5, 5, 64, 128])),
    # fully connected, 7*7*64 inputs, 1024 outputs
    'wd1': tf.Variable(tf.random_normal([25*25*128, 1024])),
    # 1024 inputs, 10 outputs (class prediction)
    'out': tf.Variable(tf.random_normal([1024, n_output]))
}

biases = {
    'bc1': tf.Variable(tf.random_normal([32])),
    'bc2': tf.Variable(tf.random_normal([64])),
    'bc3': tf.Variable(tf.random_normal([128])),
    'bd1': tf.Variable(tf.random_normal([1024])),
    'out': tf.Variable(tf.random_normal([n_output]))
}

# Construct model
pred = conv_net(x, weights, biases, keep_prob)
# pdb.set_trace()
pred = tf.pack(tf.transpose(pred,[1,2,0]))
pred = tf.reshape(pred, [-1,n_input,n_input,n_classes])

# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(pred, y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

# Evaluate model
correct_pred = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

# Initializing the variables
init = tf.initialize_all_variables()
saver = tf.train.Saver()

def convert_to_2_channel(x, batch_size):
    #assume input has dimension (batch_size,x,y)
    #output will have dimension (batch_size,x,y,2)
    output = np.empty((batch_size, 200, 200, 2))

    temp_arr1 = np.empty((batch_size, 200, 200))
    temp_arr2 = np.empty((batch_size, 200, 200))

    for i in xrange(batch_size):
        for j in xrange(200):
            for k in xrange(200):
                if x[i][j][k] == 1:
                    temp_arr1[i][j][k] = 1
                    temp_arr2[i][j][k] = 0
                else:
                    temp_arr1[i][j][k] = 0
                    temp_arr2[i][j][k] = 1

    for i in xrange(batch_size):
        for j in xrange(200):
            for k in xrange(200):
                for l in xrange(2):
                    if l == 0:
                        output[i][j][k][l] = temp_arr1[i][j][k]
                    else:
                        output[i][j][k][l] = temp_arr2[i][j][k]

    return output

# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    summary = tf.train.SummaryWriter('/tmp/logdir/', sess.graph_def)
    step = 1
    from tensorflow.contrib.learn.python.learn.datasets.scroll import scroll_data
    data = scroll_data.read_data('/home/kendall/Desktop/')
    # Keep training until reach max iterations
    flag = 0
    # while flag == 0:
    while step * batch_size < training_iters:
        batch_y, batch_x = data.train.next_batch(batch_size)
        # pdb.set_trace()
        # batch_x = batch_x.reshape((batch_size, n_input))
        batch_x = batch_x.reshape((batch_size, n_input, n_input))
        batch_y = batch_y.reshape((batch_size, n_input, n_input))
        batch_y = convert_to_2_channel(batch_y, batch_size)
        # batch_y = batch_y.reshape((batch_size, n_output, n_classes))
        batch_y = batch_y.reshape((batch_size, 200, 200, n_classes))
        sess.run(optimizer, feed_dict={x: batch_x, y: batch_y,
                                       keep_prob: dropout})
        if step % display_step == 0:
            flag = 1
            # Calculate batch loss and accuracy
            loss, acc = sess.run([cost, accuracy], feed_dict={x: batch_x,
                                                              y: batch_y,
                                                              keep_prob: 1.})
            print "Iter " + str(step*batch_size) + ", Minibatch Loss= " + \
                  "{:.6f}".format(loss) + ", Training Accuracy= " + \
                  "{:.5f}".format(acc)
        step += 1
    print "Optimization Finished!"

    save_path = "model.ckpt"
    saver.save(sess, save_path)

    im = Image.open('/home/kendall/Desktop/HA900_frames/frame0635.tif')
    batch_x = np.array(im)
    pdb.set_trace()
    batch_x = batch_x.reshape((1, n_input, n_input))
    batch_x = batch_x.astype(float)
    # pdb.set_trace()
    prediction = sess.run(pred, feed_dict={x: batch_x, keep_prob: 1.})
    print prediction
    arr1 = np.empty((n_input,n_input))
    arr2 = np.empty((n_input,n_input))
    for i in xrange(n_input):
        for j in xrange(n_input):
            for k in xrange(2):
                if k == 0:
                    arr1[i][j] = prediction[0][i][j][k]
                else:
                    arr2[i][j] = prediction[0][i][j][k]
    # prediction = np.asarray(prediction)
    # prediction = np.reshape(prediction, (200,200))
    # np.savetxt("prediction.csv", prediction, delimiter=",")
    np.savetxt("prediction1.csv", arr1, delimiter=",")
    np.savetxt("prediction2.csv", arr2, delimiter=",")

    # Calculate accuracy for 256 mnist test images
    print "Testing Accuracy:", \
        sess.run(accuracy, feed_dict={x: data.test.images[:256],
                                      y: data.test.labels[:256],
                                      keep_prob: 1.})

Answer 1

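Errors in the code

There are several errors in your code:

You should not call tf.nn.sigmoid_cross_entropy_with_logits with the output of a softmax layer; call it with the unscaled logits instead:

WARNING: This op expects unscaled logits, since it performs a softmax on logits internally for efficiency. Do not call this op with the output of softmax, as it will produce incorrect results.

In fact, since you have 2 classes, you should use a softmax loss, via tf.nn.softmax_cross_entropy_with_logits.

When you use tf.argmax(pred, 1), the argmax is applied only over axis 1, which is the height of the output image. You should use tf.argmax(pred, 3) to take it over the last axis (of size 2).
- This may explain why you get an accuracy of 0.99.
- On the output image, the argmax is taken over the height of the image, which is 0 by default (because all values are equal within each channel).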
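The effect of the argmax axis can be reproduced without TensorFlow. The following is a minimal NumPy sketch, not the asker's model: the 8x8 image, the 2x2 foreground square and the constant prediction are assumptions chosen for illustration, and accuracy_over_axis is a hypothetical helper that mirrors the accuracy expression in the question.

```python
import numpy as np


def accuracy_over_axis(pred, y, axis):
    """Mimic tf.equal(tf.argmax(pred, axis), tf.argmax(y, axis)) followed by a mean."""
    return float(np.mean(np.argmax(pred, axis) == np.argmax(y, axis)))


# Illustrative ground truth: one 8x8 image with a 2x2 foreground square.
size = 8
mask = np.zeros((1, size, size), dtype=int)
mask[0, 3:5, 3:5] = 1

# Same channel layout as convert_to_2_channel: channel 0 = foreground, channel 1 = background.
y = np.stack([mask, 1 - mask], axis=-1).astype(float)

# A useless prediction: every value is identical, so argmax always returns index 0.
pred = np.zeros_like(y)

acc_axis1 = accuracy_over_axis(pred, y, axis=1)  # argmax over image height (the bug)
acc_axis3 = accuracy_over_axis(pred, y, axis=3)  # argmax over the class axis (per pixel)

print("accuracy with axis=1:", acc_axis1)
print("accuracy with axis=3:", acc_axis3)
```

With axis=1 the metric compares row indices rather than class labels, so a constant prediction still scores highly. With axis=3 the same prediction is scored per pixel and the score collapses.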

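Problems with the model

The biggest drawback is that your model will in general be very hard to optimize.

You have a softmax over 40,000 classes, which is huge.
You do not take any advantage of the fact that you want to output an image (a foreground/background prediction).
- For example, the prediction at 2,345 is highly correlated with the predictions at 2,346 and 2,545, but you do not take that into account.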
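To put a number on "huge", the arithmetic below counts the parameters of the two dense layers, using the shapes from the weights dictionary in the question. The 1x1 convolution at the end is a hypothetical per-pixel classifier added only for comparison; it is an assumption for illustration, not part of the question or the answer.

```python
# Layer sizes taken from the weights/biases dictionaries in the question.
fc_inputs = 25 * 25 * 128   # flattened conv3 output fed into 'wd1'
hidden_units = 1024
n_output = 40000            # one output unit per pixel of the 200x200 image

wd1_params = fc_inputs * hidden_units + hidden_units   # weights + biases of 'wd1'
out_params = hidden_units * n_output + n_output        # weights + biases of 'out'

# Hypothetical alternative: a 1x1 convolution mapping 128 feature maps
# to 2 class scores at every spatial position (weights + biases).
conv1x1_params = 1 * 1 * 128 * 2 + 2

print("wd1 parameters:", wd1_params)
print("out parameters:", out_params)
print("1x1 conv parameters:", conv1x1_params)
```

The dense output layer alone needs tens of millions of weights and treats every pixel as unrelated to its neighbours, whereas a convolutional output shares a small set of weights across all positions.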

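I recommend reading up on semantic segmentation first:

This paper: Fully Convolutional Networks for Semantic Segmentation
These slides from CS231n (Stanford): in particular the part about upsampling and deconvolution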

Installing TensorFlow and TensorFlowOnSpark on CentOS 6

1. Requirements

Install a Hadoop and Spark cluster on CentOS 6, and run TensorFlow code with TensorFlowOnSpark in YARN mode. (Ideally the cluster can be configured and run without an internet connection.)

2. System environment (topology)

Operating system: CentOS 6.5 Final; Hadoop: 2.7.4; Spark: 1.5.1 (Hadoop 2.6 build); TensorFlow: 1.3.0; TensorFlowOnSpark: latest download from GitHub; Python: 2.7.12.

s0.centos.com: namenode/resourcemanager; memory: 1.5 GB; 1 core

s1.centos.com / s2.centos.com / s3.centos.com: datanode/nodemanager; memory: 1.2 GB; 1 core

Part of yarn-site.xml is configured as follows (with the default values, TensorFlowOnSpark would not run):

<property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>2048</value>
</property>
<property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>2048</value>
</property>
<property>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>2</value>
</property>

3. References

https://blog.abysm.org/2016/06/building-tensorflow-centos-6/ : building TensorFlow on CentOS 6

TensorFlowOnSpark GitHub wiki: https://github.com/yahoo/TensorFlowOnSpark/wiki/GetStarted_YARN : installing TensorFlowOnSpark

TensorFlowOnSpark GitHub wiki: https://github.com/yahoo/TensorFlowOnSpark/wiki/Conversion-Guide : converting TensorFlow code

4. Steps

The detailed steps are as follows:

1. Install devtoolset-6 and Python:

Install the repository: yum install -y centos-release-scl
Install devtoolset: yum install -y devtoolset-6

Install Python:

yum install python27 python27-numpy python27-python-devel python27-python-wheel

Install some commonly used packages:

yum install -y vim zip unzip openssh-clients

2. Download Bazel. Version 0.5.1 is used here (I also downloaded a 0.4.x release, but its packages were hard to download).

First run:
export CC=/opt/rh/devtoolset-6/root/usr/bin/gcc
Then enter the build environment:
scl enable devtoolset-6 python27 bash
Then run the following in order:
unzip bazel-0.5.1-dist.zip -d bazel-0.5.1-dist
cd bazel-0.5.1-dist

# compile
./compile.sh

# install
mkdir -p ~/bin
cp output/bazel ~/bin/

exit  # leave the scl environment
# this step takes a long time

3. Download the TensorFlow 1.3.0 source code and unpack it.

4. Enter tensorflow-1.3.0 and change the tf_extension_linkopts function in tensorflow/tensorflow.bzl to the following form (adding -lrt):

def tf_extension_linkopts():
  return ["-lrt"]  # No extension link opts

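5. Build and install TensorFlow:

Install the required tool: yum install -y patch
Then enter the build environment:
scl enable devtoolset-6 python27 bash
cd tensorflow-1.3.0
./configure

# build
~/bin/bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
exit  # leave the build environment
# This also takes a long time. Building TensorFlow 1.3 with Bazel 0.4.x fails with a message that the Bazel version is too low.

After the build, a TensorFlow package is generated in /tmp/tensorflow_pkg. It is built for the current system, that is, CentOS.

http://download.csdn.net/download/fansy1990/10042475 <<--- download link for the whl package

I did not want to complicate the existing systems by installing Python directly on every node and then installing TensorFlow and the related Python packages there, so I followed the official TensorFlowOnSpark instructions, as in the steps below:

6. Install a custom Python distribution (while still connected to the internet):

Because I want to use TensorFlow and TensorFlowOnSpark without network access, I followed the TensorFlowOnSpark GitHub wiki: build a Python distribution, install TensorFlow, TensorFlowOnSpark and other commonly used modules into it, and then upload the whole distribution to HDFS so that every worker node shares the same Python.zip environment.

export PYTHON_ROOT=~/Python  # set the environment variable, then download Python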
curl -O https://www.python.org/ftp/python/2.7.12/Python-2.7.12.tgz
tar -xvf Python-2.7.12.tgz

Build and install Python:

pushd Python-2.7.12
./configure --prefix="${PYTHON_ROOT}" --enable-unicode=ucs4
make
make install
popd

Install pip:

pushd "${PYTHON_ROOT}"
curl -O https://bootstrap.pypa.io/get-pip.py
bin/python get-pip.py
popd

Install TensorFlow:

pushd "${PYTHON_ROOT}"
bin/pip install /tmp/tensorflow_pkg/tensorflow-1.3.0-cp27-none-linux_x86_64.whl
popd

Installing TensorFlow automatically installs commonly used Python packages such as numpy.

Install TensorFlowOnSpark:

pushd "${PYTHON_ROOT}"
bin/pip install tensorflowonspark
popd

Package the fully equipped Python and upload it to HDFS:

pushd "${PYTHON_ROOT}"
zip -r Python.zip *
popd

hadoop fs -put ${PYTHON_ROOT}/Python.zip

TensorFlow is now ready to use.

7. Modify the TensorFlow code. For example, the following TensorFlow code runs in a plain TensorFlow environment:

# from __future__ import absolute_import
# from __future__ import division
# from __future__ import print_function

import numpy as np

import tensorflow as tf

X_FEATURE = 'x'  # Name of the input feature.

train_percent = 0.8


def load_data(data_file_name):
    data = np.loadtxt(open(data_file_name),delimiter=",",skiprows=0)
    return data


def data_selection(iris,train_per):
    data,target = np.hsplit(iris[np.random.permutation(iris.shape[0])],np.array([-1]))

    row_split_index = int(data.shape[0] * train_per)

    # Slice from the start so that the first shuffled row is not discarded.
    x_train,x_test = (data[:row_split_index],data[row_split_index:])
    y_train,y_test = (target[:row_split_index],target[row_split_index:])
    return x_train,x_test,y_train.astype(int),y_test.astype(int)


def run():
    # Load dataset.
    data_file = 'iris01.csv'
    iris = load_data(data_file)
    # x_train,x_test,y_train,y_test = model_selection.train_test_split(
    #     iris.data,iris.target,test_size=0.2,random_state=42)

    x_train,x_test,y_train,y_test = data_selection(iris,train_percent)

    # print(x_test)
    # print(y_test)

    #
    # # Build 3 layer DNN with 10,20,10 units respectively.
    feature_columns = [
        tf.feature_column.numeric_column(
            X_FEATURE,shape=np.array(x_train).shape[1:])]
    classifier = tf.estimator.DNNClassifier(
        feature_columns=feature_columns,hidden_units=[10,20,10],n_classes=3)
    #
    # # Train.
    train_input_fn = tf.estimator.inputs.numpy_input_fn(
        x={X_FEATURE: x_train},y=y_train,num_epochs=None,shuffle=True)
    classifier.train(input_fn=train_input_fn,steps=200)
    #
    # # Predict.
    test_input_fn = tf.estimator.inputs.numpy_input_fn(
        x={X_FEATURE: x_test},y=y_test,num_epochs=1,shuffle=False)
    predictions = classifier.predict(input_fn=test_input_fn)
    y_predicted = np.array(list(p['class_ids'] for p in predictions))
    y_predicted = y_predicted.reshape(np.array(y_test).shape)
    # #
    # # # score with sklearn.
    # score = metrics.accuracy_score(y_test,y_predicted)
    # print('Accuracy (sklearn): {0:f}'.format(score))
    print(np.concatenate(( y_predicted,y_test),axis= 1))
    # score with tensorflow.
    scores = classifier.evaluate(input_fn=test_input_fn)
    print('Accuracy (tensorflow): {0:f}'.format(scores['accuracy']))

    print(classifier.params)


if __name__ == '__main__':
    run()

The iris01.csv data has the following form (four feature columns followed by an integer class label of 0, 1 or 2; no header row; excerpt):

5.1,3.5,1.4,0.2,0
4.9,3.0,1.4,0.2,0
4.7,3.2,1.3,0.2,0
...
7.0,3.2,4.7,1.4,1
6.4,3.2,4.5,1.5,1
6.9,3.1,4.9,1.5,1
...
6.3,3.3,6.0,2.5,2
5.8,2.7,5.1,1.9,2
7.1,3.0,5.9,2.1,2
...

So how should the code be modified?

1) Import the required packages:

from pyspark.context import SparkContext
from pyspark.conf import SparkConf
from tensorflowonspark import TFCluster,TFNode
#from com.yahoo.ml.tf import TFCluster,TFNode
from datetime import datetime

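Note that when importing TFCluster you should not follow the import shown on the official site; import it from tensorflowonspark instead.

2) Modify the main function (here, my function run) by adding two parameters: (argv, ctx).

3) Replace the original call in the main block with the call below. Originally the main block only had to call run; now it calls TFCluster.run and passes my run function as the second argument:

sc = SparkContext(conf=SparkConf().setAppName("your_app_name"))
num_executors = int(sc._conf.get("spark.executor.instances"))
num_ps = 1
tensorboard = True

cluster = TFCluster.run(sc,run,sys.argv,num_executors,num_ps,tensorboard,TFCluster.InputMode.TENSORFLOW)
cluster.shutdown()

The job can then be run. The modified code is as follows: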

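# from __future__ import absolute_import
# from __future__ import division
# from __future__ import print_function
from pyspark.context import SparkContext
from pyspark.conf import SparkConf
from tensorflowonspark import TFCluster,TFNode
from datetime import datetime
import numpy as np
import sys

import tensorflow as tf

X_FEATURE = 'x'  # Name of the input feature.

train_percent = 0.8


def load_data(data_file_name):
    data = np.loadtxt(open(data_file_name),delimiter=",",skiprows=0)
    return data


def data_selection(iris,train_per):
    data,target = np.hsplit(iris[np.random.permutation(iris.shape[0])],np.array([-1]))

    row_split_index = int(data.shape[0] * train_per)

    x_train,x_test = (data[:row_split_index],data[row_split_index:])
    y_train,y_test = (target[:row_split_index],target[row_split_index:])
    return x_train,x_test,y_train.astype(int),y_test.astype(int)


def map_run(argv,ctx):
    # Load dataset.
    data_file = 'iris01.csv'
    iris = load_data(data_file)
    x_train,x_test,y_train,y_test = data_selection(iris,train_percent)

    # Build 3 layer DNN with 10,20,10 units respectively.
    feature_columns = [
        tf.feature_column.numeric_column(
            X_FEATURE,shape=np.array(x_train).shape[1:])]
    classifier = tf.estimator.DNNClassifier(
        feature_columns=feature_columns,hidden_units=[10,20,10],n_classes=3)

    # Train.
    train_input_fn = tf.estimator.inputs.numpy_input_fn(
        x={X_FEATURE: x_train},y=y_train,num_epochs=None,shuffle=True)
    classifier.train(input_fn=train_input_fn,steps=200)

    # Predict.
    test_input_fn = tf.estimator.inputs.numpy_input_fn(
        x={X_FEATURE: x_test},y=y_test,num_epochs=1,shuffle=False)
    predictions = classifier.predict(input_fn=test_input_fn)
    y_predicted = np.array(list(p['class_ids'] for p in predictions))
    y_predicted = y_predicted.reshape(np.array(y_test).shape)

    print(np.concatenate(( y_predicted,y_test),axis= 1))
    # score with tensorflow.
    scores = classifier.evaluate(input_fn=test_input_fn)
    print('Accuracy (tensorflow): {0:f}'.format(scores['accuracy']))

    print(classifier.params)


if __name__ == '__main__':
    sc = SparkContext(conf=SparkConf().setAppName("your_app_name"))
    num_executors = int(sc._conf.get("spark.executor.instances"))
    num_ps = 1
    tensorboard = False

    cluster = TFCluster.run(sc,map_run,sys.argv,num_executors,num_ps,tensorboard,TFCluster.InputMode.TENSORFLOW)
    cluster.shutdown()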

8. Set the environment variables and run:

1) Upload iris01.csv to HDFS: hdfs dfs -put iris01.csv

2) Set the environment variables:

export PYTHON_ROOT=./Python
export LD_LIBRARY_PATH=${PATH}
export PYSPARK_PYTHON=${PYTHON_ROOT}/bin/python
export SPARK_YARN_USER_ENV="PYSPARK_PYTHON=Python/bin/python"
export PATH=${PYTHON_ROOT}/bin/:$PATH
#export QUEUE=gpu

# set paths to libjvm.so,libhdfs.so,and libcuda*.so
#export LIB_HDFS=/opt/cloudera/parcels/CDH/lib64                      # for CDH (per @wangyum)
export LIB_HDFS=$HADOOP_PREFIX/lib/native
export LIB_JVM=$JAVA_HOME/jre/lib/amd64/server
#export LIB_CUDA=/usr/local/cuda-7.5/lib64

# for cpu mode:
export QUEUE=default

3) Submit the job:

/usr/local/spark-1.5.1-bin-hadoop2.6/bin/spark-submit --master yarn --deploy-mode cluster --num-executors 3 --executor-memory 1024m --archives hdfs://s0:8020/user/root/Python.zip#Python,/root/iris01.csv /root/iris_c.py

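4) Check the YARN logs to confirm that the job ran successfully.

5. Problems and solutions

1) libc.so.6: version `GLIBC_2.14' not found

This happens because CentOS 6 ships glibc 2.12, which is too old.

Possible approaches:

a. Upgrade glibc. This is not a good option: glibc is a low-level system library, and upgrading it can make the system unstable.

b. Build a TensorFlow package that runs on CentOS 6, which is the approach taken in this article.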
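Before choosing between the two approaches, it can help to confirm which glibc the Python interpreter is linked against. The sketch below is one way to do that with the standard library's platform.libc_ver(). The helper names parse_version and glibc_is_sufficient are hypothetical, and the 2.14 threshold is taken from the error message above.

```python
import platform


def parse_version(text):
    """Turn a dotted version string such as '2.12' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def glibc_is_sufficient(found, required="2.14"):
    """Return True when the found glibc version is at least the required one."""
    return parse_version(found) >= parse_version(required)


if __name__ == "__main__":
    # libc_ver() returns a (library name, version) pair; both are empty strings
    # when the C library cannot be determined.
    lib, version = platform.libc_ver()
    if lib == "glibc" and version:
        print("glibc", version, "meets GLIBC_2.14:", glibc_is_sufficient(version))
    else:
        print("could not detect glibc for this interpreter")
```

Versions are compared as integer tuples rather than strings, because a plain string comparison would order "2.9" after "2.14".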

2) Cannot run program "patch" (in directory "/root/.cache/bazel/_bazel_root/6093305914d4a581ed00c0f6c06f975b/external/boringssl")

Fix: yum install patch

3) Traceback (most recent call last):
File "iris_c.py",line 6,in <module>
from com.yahoo.ml.tf import TFCluster,TFNode
ImportError: No module named com.yahoo.ml.tf

Fix: change
from com.yahoo.ml.tf import TFCluster,TFNode
to
from tensorflowonspark import TFCluster,TFNode

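6. Summary

1. I ran into many problems while compiling TensorFlow; searching with the international version of Bing gave better results.

2. For now the program can only be run by setting environment variables in a terminal, and it runs very slowly. Later I may look at submitting jobs directly from a development tool and at improving performance.

Share, grow, be happy; stay grounded and focused. When reposting, please credit the blog: http://blog.csdn.net/fansy1990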

github/tensorflow/tensorflow/contrib/slim/

TensorFlow-Slim

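TF-Slim is a lightweight library for defining, training and evaluating complex models in TensorFlow. Components of tf-slim can be freely mixed with native TensorFlow as well as other frameworks, such as tf.contrib.learn.

Usage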

import tensorflow.contrib.slim as slim

Why TF-Slim?

What are the various components of TF-Slim?

Defining Models

Variables

Layers

Scopes

Working Example: Specifying the VGG16 Layers

Training Models

Training Tensorflow models requires a model, a loss function, the gradient computation and a training routine that iteratively computes the gradients of the model weights relative to the loss and updates the weights accordingly. TF-Slim provides both common loss functions and a set of helper functions that run the training and evaluation routines.

Losses

The loss function defines a quantity that we want to minimize. For classification problems, this is typically the cross entropy between the true distribution and the predicted probability distribution across classes. For regression problems, this is often the sum-of-squares differences between the predicted and true values.

Certain models, such as multi-task learning models, require the use of multiple loss functions simultaneously. In other words, the loss function ultimately being minimized is the sum of various other loss functions. For example, consider a model that predicts both the type of scene in an image as well as the depth from the camera of each pixel. This model's loss function would be the sum of the classification loss and depth prediction loss.

TF-Slim provides an easy-to-use mechanism for defining and keeping track of loss functions via the losses module. Consider the simple case where we want to train the VGG network:

Training Loop

TF-Slim provides a simple but powerful set of tools for training models found in learning.py. These include a Train function that repeatedly measures the loss, computes gradients and saves the model to disk, as well as several convenience functions for manipulating gradients. For example, once we've specified the model, the loss function and the optimization scheme, we can call slim.learning.create_train_op and slim.learning.train to perform the optimization:

Working Example: Training the VGG16 Model

To illustrate this, let's examine the following sample of training the VGG network:

Fine-Tuning Existing Models

Brief Recap on Restoring Variables from a Checkpoint

After a model has been trained, it can be restored using tf.train.Saver() which restores Variables from a given checkpoint. For many cases, tf.train.Saver() provides a simple mechanism to restore all or just a few variables.

Partially Restoring Models

It is often desirable to fine-tune a pre-trained model on an entirely new dataset or even a new task. In these situations, one can use TF-Slim's helper functions to select a subset of variables to restore:

Restoring models with different variable names

Fine-Tuning a Model on a different task

Consider the case where we have a pre-trained VGG16 model. The model was trained on the ImageNet dataset, which has 1000 classes. However, we would like to apply it to the Pascal VOC dataset which has only 20 classes. To do so, we can initialize our new model using the values of the pre-trained model excluding the final layer:

Evaluating Models

Once we've trained a model (or even while the model is busy training) we'd like to see how well the model performs in practice. This is accomplished by picking a set of evaluation metrics, which will grade the model's performance, and the evaluation code which actually loads the data, performs inference, compares the results to the ground truth and records the evaluation scores. This step may be performed once or repeated periodically.

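Metrics

We define a metric as a performance measure that is not a loss function (losses are what is directly optimized during training), but which we still want to use when evaluating the model. For example, we might minimize log loss during training, while the metric of interest at evaluation time is the F1 score or the Intersection over Union score (which is not differentiable and therefore cannot be used as a loss).

TF-Slim provides a set of metric operations. Broadly speaking, computing the value of a metric can be divided into three parts:

Initialization: initialize the variables used to compute the metrics.
Aggregation: perform operations (sums, etc) used to compute the metrics.
Finalization: (optionally) perform any final operation to compute metric values. For example, computing means, mins, maxes, etc.

For example, to compute mean_absolute_error, two variables (count and total) are initialized to zero. During aggregation, we observe a set of predictions and labels, compute their absolute differences and add the sum to total. Each time we observe another value, count is incremented. Finally, during finalization, total is divided by count to obtain the mean.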
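The three phases can be sketched in plain Python and NumPy. This is an illustration of the initialize/aggregate/finalize pattern only, not TF-Slim's implementation; the class name StreamingMeanAbsoluteError and its methods are hypothetical.

```python
import numpy as np


class StreamingMeanAbsoluteError:
    """Hypothetical stand-in for a streaming metric; not part of TF-Slim."""

    def __init__(self):
        # Initialization: both accumulators start at zero.
        self.total = 0.0
        self.count = 0

    def update(self, predictions, labels):
        # Aggregation: add this batch's absolute errors and sample count.
        predictions = np.asarray(predictions, dtype=float)
        labels = np.asarray(labels, dtype=float)
        self.total += float(np.abs(predictions - labels).sum())
        self.count += predictions.size
        return self.value()

    def value(self):
        # Finalization: divide the running total by the running count.
        # Reading the value does not change any state.
        return self.total / self.count if self.count else 0.0


metric = StreamingMeanAbsoluteError()
metric.update([1.0, 2.0], [1.0, 4.0])  # absolute errors 0 and 2
metric.update([0.0], [3.0])            # absolute error 3
print(metric.value())
```

Here update plays the role of an update op (it aggregates and returns the current value), while value plays the role of a value op and can be called repeatedly without side effects.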

The following example demonstrates the API for declaring metrics. Because metrics are often evaluated on a test set which is different from the training set (upon which the loss is computed), we'll assume we're using test data:

images, labels = LoadTestData(...)
predictions = MyModel(images)

mae_value_op, mae_update_op = slim.metrics.streaming_mean_absolute_error(predictions, labels)
mre_value_op, mre_update_op = slim.metrics.streaming_mean_relative_error(predictions, labels)
pl_value_op, pl_update_op = slim.metrics.percentage_less(mean_relative_errors, 0.3)

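As the example illustrates, creating a metric returns two values: a value_op and an update_op. The value_op is an idempotent operation that returns the current value of the metric. The update_op is an operation that performs the aggregation step and also returns the value of the metric.

Keeping track of each value_op and update_op can be laborious. To deal with this, TF-Slim provides two convenience functions:

# Aggregates the value and update ops in two lists: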
value_ops, update_ops = slim.metrics.aggregate_metrics(
    slim.metrics.streaming_mean_absolute_error(predictions, labels),
    slim.metrics.streaming_mean_squared_error(predictions, labels))

# Aggregates the value and update ops in two dictionaries:
names_to_values, names_to_updates = slim.metrics.aggregate_metric_map({
    "eval/mean_absolute_error": slim.metrics.streaming_mean_absolute_error(predictions, labels),
    "eval/mean_squared_error": slim.metrics.streaming_mean_squared_error(predictions, labels),
})
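The dictionary form amounts to splitting a mapping of name to (value_op, update_op) pairs into two parallel mappings. The sketch below shows that bookkeeping in plain Python, with strings standing in for the ops; split_metric_map is a hypothetical helper written for illustration, not the TF-Slim function.

```python
def split_metric_map(names_to_tuples):
    """Split {name: (value, update)} into ({name: value}, {name: update})."""
    names_to_values = {}
    names_to_updates = {}
    for name, (value, update) in names_to_tuples.items():
        names_to_values[name] = value
        names_to_updates[name] = update
    return names_to_values, names_to_updates


# Strings stand in for the (value_op, update_op) pairs that a metric returns.
values, updates = split_metric_map({
    "eval/mean_absolute_error": ("mae_value_op", "mae_update_op"),
    "eval/mean_squared_error": ("mse_value_op", "mse_update_op"),
})
print(values)
print(updates)
```

Keeping the two mappings keyed by the same names makes it straightforward to run every update on each batch and read every value once at the end.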

hello tensorflow: my first TensorFlow program

Here is the code:

import tensorflow as tf

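if __name__ == '__main__':
    g = tf.Graph()
    # add ops to the user created graph
    with g.as_default():
        hello = tf.constant('Hello Tensorflow')
        sess = tf.compat.v1.Session(graph=g)
        print(sess.run(hello))

The output (shown in a screenshot in the original post) is the value of the constant, printed as a bytes string: b'Hello Tensorflow'

Environment: Python 3.7.4, TensorFlow 2.0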

SSD-Tensorflow: run the TensorFlow Single Shot MultiBox Detector in 3 steps

A similar post on YOLO from yesterday: https://www.v2ex.com/t/392671#reply0

Download this project

https://github.com/balancap/SSD-Tensorflow

Unzip the checkpoint files in ./checkpoint

unzip ssd_300_vgg.ckpt.zip

Run the Jupyter notebook with this command

jupyter notebook notebooks/ssd_notebook.ipynb

Project write-up: http://www.tensorflownews.com/2017/09/22/ssd-single-shot-multibox-detector-in-tensorflow/

Project repository: https://github.com/balancap/SSD-Tensorflow

More TensorFlow tutorials: http://www.tensorflownews.com
