If you are interested in the problem of Flink writing to only some of Kafka's partitions instead of all of them, this article is a good read: it covers that problem in detail, and also collects useful material on .IllegalStateException: Previously tracked partitions [kafka.topic-0] been revoked by Kafka, .kafka.clients.consumer.NoOffsetForPartitionException: Undefined offset with no reset policy for partition, .kafka.common.utils.AppInfoParser - Kafka commitId : unknown, and partition assignment when Flink reads Kafka data.
Contents:
- Flink writes to only some of Kafka's partitions, never all of them
- .IllegalStateException: Previously tracked partitions [kafka.topic-0] been revoked by Kafka
- .kafka.clients.consumer.NoOffsetForPartitionException: Undefined offset with no reset policy for partition
- .kafka.common.utils.AppInfoParser - Kafka commitId : unknown
- Partition assignment when Flink reads Kafka data
Flink writes to only some of Kafka's partitions, never all of them
####1. Background
When doing real-time computation with Flink, jobs typically read from Kafka and write results back to Kafka. With a multi-partition topic, and especially at very large (PB-scale) data volumes where the Kafka cluster has a dozen or more nodes for performance and a single topic has dozens of partitions or more, you may find that Flink writes to only some of the partitions rather than all of them. The problem is easy to miss, but it hurts write throughput, piles too much data onto individual partitions, and in the worst case fills up the disk hosting a single partition. To understand why this happens, let's walk through the source of Flink's Kafka sink, starting with the FlinkKafkaProducer09 class.
####2. The FlinkKafkaProducer09 source
public class FlinkKafkaProducer09<IN> extends FlinkKafkaProducerBase<IN> {
    private static final long serialVersionUID = 1L;

    public FlinkKafkaProducer09(String brokerList, String topicId, SerializationSchema<IN> serializationSchema) {
        this(topicId, (KeyedSerializationSchema)(new KeyedSerializationSchemaWrapper(serializationSchema)), getPropertiesFromBrokerList(brokerList), (FlinkKafkaPartitioner)(new FlinkFixedPartitioner()));
    }

    public FlinkKafkaProducer09(String topicId, SerializationSchema<IN> serializationSchema, Properties producerConfig) {
        this(topicId, (KeyedSerializationSchema)(new KeyedSerializationSchemaWrapper(serializationSchema)), producerConfig, (FlinkKafkaPartitioner)(new FlinkFixedPartitioner()));
    }

    public FlinkKafkaProducer09(String topicId, SerializationSchema<IN> serializationSchema, Properties producerConfig, @Nullable FlinkKafkaPartitioner<IN> customPartitioner) {
        this(topicId, (KeyedSerializationSchema)(new KeyedSerializationSchemaWrapper(serializationSchema)), producerConfig, (FlinkKafkaPartitioner)customPartitioner);
    }

    public FlinkKafkaProducer09(String brokerList, String topicId, KeyedSerializationSchema<IN> serializationSchema) {
        this(topicId, (KeyedSerializationSchema)serializationSchema, getPropertiesFromBrokerList(brokerList), (FlinkKafkaPartitioner)(new FlinkFixedPartitioner()));
    }

    public FlinkKafkaProducer09(String topicId, KeyedSerializationSchema<IN> serializationSchema, Properties producerConfig) {
        this(topicId, (KeyedSerializationSchema)serializationSchema, producerConfig, (FlinkKafkaPartitioner)(new FlinkFixedPartitioner()));
    }

    public FlinkKafkaProducer09(String topicId, KeyedSerializationSchema<IN> serializationSchema, Properties producerConfig, @Nullable FlinkKafkaPartitioner<IN> customPartitioner) {
        super(topicId, serializationSchema, producerConfig, customPartitioner);
    }

    /** @deprecated */
    @Deprecated
    public FlinkKafkaProducer09(String topicId, SerializationSchema<IN> serializationSchema, Properties producerConfig, KafkaPartitioner<IN> customPartitioner) {
        this(topicId, (KeyedSerializationSchema)(new KeyedSerializationSchemaWrapper(serializationSchema)), producerConfig, (KafkaPartitioner)customPartitioner);
    }

    /** @deprecated */
    @Deprecated
    public FlinkKafkaProducer09(String topicId, KeyedSerializationSchema<IN> serializationSchema, Properties producerConfig, KafkaPartitioner<IN> customPartitioner) {
        super(topicId, serializationSchema, producerConfig, new FlinkKafkaDelegatePartitioner(customPartitioner));
    }

    protected void flush() {
        if (this.producer != null) {
            this.producer.flush();
        }
    }
}
Focus on just these two constructors:
public FlinkKafkaProducer09(String brokerList, String topicId, SerializationSchema<IN> serializationSchema) {
    this(topicId, (KeyedSerializationSchema)(new KeyedSerializationSchemaWrapper(serializationSchema)), getPropertiesFromBrokerList(brokerList), (FlinkKafkaPartitioner)(new FlinkFixedPartitioner()));
}

public FlinkKafkaProducer09(String topicId, KeyedSerializationSchema<IN> serializationSchema, Properties producerConfig, @Nullable FlinkKafkaPartitioner<IN> customPartitioner) {
    super(topicId, serializationSchema, producerConfig, customPartitioner);
}
Look mainly at the first constructor: it shows that partition selection is handed to a new FlinkFixedPartitioner(), which we examine in the next section.
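Note in passing: the second constructor marks customPartitioner as @Nullable and hands it straight to the base class, which suggests that simply passing null should achieve the same end result as the custom subclass we build later in this article. A minimal sketch under that assumption (the broker address and topic name are placeholders, and the usual connector imports are assumed):

Properties props = new Properties();
props.setProperty("bootstrap.servers", "localhost:9092");
// The cast picks the FlinkKafkaPartitioner overload over the deprecated KafkaPartitioner one.
FlinkKafkaProducer09<String> producer = new FlinkKafkaProducer09<>(
        "my-topic",
        new KeyedSerializationSchemaWrapper<>(new SimpleStringSchema()),
        props,
        (FlinkKafkaPartitioner<String>) null);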
####3. The FlinkFixedPartitioner source
public class FlinkFixedPartitioner<T> extends FlinkKafkaPartitioner<T> {
    private static final long serialVersionUID = -3785320239953858777L;
    private int parallelInstanceId;

    public FlinkFixedPartitioner() {
    }

    public void open(int parallelInstanceId, int parallelInstances) {
        Preconditions.checkArgument(parallelInstanceId >= 0, "Id of this subtask cannot be negative.");
        Preconditions.checkArgument(parallelInstances > 0, "Number of subtasks must be larger than 0.");
        this.parallelInstanceId = parallelInstanceId;
    }

    public int partition(T record, byte[] key, byte[] value, String targetTopic, int[] partitions) {
        Preconditions.checkArgument(partitions != null && partitions.length > 0, "Partitions of the target topic is empty.");
        return partitions[this.parallelInstanceId % partitions.length];
    }

    public boolean equals(Object o) {
        return this == o || o instanceof FlinkFixedPartitioner;
    }

    public int hashCode() {
        return FlinkFixedPartitioner.class.hashCode();
    }
}
From the code it is clear that return partitions[this.parallelInstanceId % partitions.length] pins each sink subtask to exactly one partition, so whenever the sink parallelism is smaller than the partition count, some partitions are never written. Let's now write our own MyFlinkKafkaProducer09, modeled on FlinkKafkaProducer09, to fix this.
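Before that, to make the effect concrete, here is a small standalone simulation of the modulo mapping above (the parallelism and partition count are made-up illustrative values): with a sink parallelism of 3 and a 40-partition topic, only partitions 0, 1 and 2 ever receive data.

// Simulates FlinkFixedPartitioner: subtask i always writes partitions[i % partitions.length].
public class FixedPartitionerSimulation {
    public static void main(String[] args) {
        int parallelism = 3;      // number of sink subtasks (assumed)
        int numPartitions = 40;   // partitions in the target topic (assumed)
        for (int subtask = 0; subtask < parallelism; subtask++) {
            // Every record from this subtask lands on this single partition.
            System.out.println("subtask " + subtask + " -> partition " + (subtask % numPartitions));
        }
        // Prints: subtask 0 -> partition 0, subtask 1 -> partition 1, subtask 2 -> partition 2.
        // Partitions 3..39 are never written.
    }
}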
####4. The logic behind the stock FlinkKafkaProducer09
1>. It must extend the FlinkKafkaProducerBase class:
//
// Source code recreated from a .class file by IntelliJ IDEA
// (powered by Fernflower decompiler)
//
package org.apache.flink.streaming.connectors.kafka;

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Properties;
import java.util.Map.Entry;
import org.apache.flink.annotation.Internal;
import org.apache.flink.annotation.VisibleForTesting;
import org.apache.flink.api.common.functions.RuntimeContext;
import org.apache.flink.api.java.ClosureCleaner;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.MetricGroup;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import org.apache.flink.streaming.api.functions.sink.SinkFunction.Context;
import org.apache.flink.streaming.api.operators.StreamingRuntimeContext;
import org.apache.flink.streaming.connectors.kafka.internals.metrics.KafkaMetricWrapper;
import org.apache.flink.streaming.connectors.kafka.partitioner.FlinkKafkaDelegatePartitioner;
import org.apache.flink.streaming.connectors.kafka.partitioner.FlinkKafkaPartitioner;
import org.apache.flink.streaming.util.serialization.KeyedSerializationSchema;
import org.apache.flink.util.NetUtils;
import org.apache.flink.util.SerializableObject;
import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.Metric;
import org.apache.kafka.common.MetricName;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.serialization.ByteArraySerializer;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

@Internal
public abstract class FlinkKafkaProducerBase<IN> extends RichSinkFunction<IN> implements CheckpointedFunction {
    private static final Logger LOG = LoggerFactory.getLogger(FlinkKafkaProducerBase.class);
    private static final long serialVersionUID = 1L;
    public static final String KEY_DISABLE_METRICS = "flink.disable-metrics";
    protected final Properties producerConfig;
    protected final String defaultTopicId;
    protected final KeyedSerializationSchema<IN> schema;
    protected final FlinkKafkaPartitioner<IN> flinkKafkaPartitioner;
    protected final Map<String, int[]> topicPartitionsMap;
    protected boolean logFailuresOnly;
    protected boolean flushOnCheckpoint = true;
    protected transient KafkaProducer<byte[], byte[]> producer;
    protected transient Callback callback;
    protected transient volatile Exception asyncException;
    protected final SerializableObject pendingRecordsLock = new SerializableObject();
    protected long pendingRecords;

    public FlinkKafkaProducerBase(String defaultTopicId, KeyedSerializationSchema<IN> serializationSchema, Properties producerConfig, FlinkKafkaPartitioner<IN> customPartitioner) {
        Objects.requireNonNull(defaultTopicId, "TopicID not set");
        Objects.requireNonNull(serializationSchema, "serializationSchema not set");
        Objects.requireNonNull(producerConfig, "producerConfig not set");
        ClosureCleaner.clean(customPartitioner, true);
        ClosureCleaner.ensureSerializable(serializationSchema);
        this.defaultTopicId = defaultTopicId;
        this.schema = serializationSchema;
        this.producerConfig = producerConfig;
        this.flinkKafkaPartitioner = customPartitioner;
        if (!producerConfig.containsKey("key.serializer")) {
            this.producerConfig.put("key.serializer", ByteArraySerializer.class.getName());
        } else {
            LOG.warn("Overwriting the '{}' is not recommended", "key.serializer");
        }
        if (!producerConfig.containsKey("value.serializer")) {
            this.producerConfig.put("value.serializer", ByteArraySerializer.class.getName());
        } else {
            LOG.warn("Overwriting the '{}' is not recommended", "value.serializer");
        }
        if (!this.producerConfig.containsKey("bootstrap.servers")) {
            throw new IllegalArgumentException("bootstrap.servers must be supplied in the producer config properties.");
        } else {
            this.topicPartitionsMap = new HashMap();
        }
    }

    public void setLogFailuresOnly(boolean logFailuresOnly) {
        this.logFailuresOnly = logFailuresOnly;
    }

    public void setFlushOnCheckpoint(boolean flush) {
        this.flushOnCheckpoint = flush;
    }

    @VisibleForTesting
    protected <K, V> KafkaProducer<K, V> getKafkaProducer(Properties props) {
        return new KafkaProducer(props);
    }

    public void open(Configuration configuration) {
        this.producer = this.getKafkaProducer(this.producerConfig);
        RuntimeContext ctx = this.getRuntimeContext();
        if (null != this.flinkKafkaPartitioner) {
            if (this.flinkKafkaPartitioner instanceof FlinkKafkaDelegatePartitioner) {
                ((FlinkKafkaDelegatePartitioner)this.flinkKafkaPartitioner).setPartitions(getPartitionsByTopic(this.defaultTopicId, this.producer));
            }
            // The partitioner learns this subtask's index here; FlinkFixedPartitioner uses it to pick its single target partition.
            this.flinkKafkaPartitioner.open(ctx.getIndexOfThisSubtask(), ctx.getNumberOfParallelSubtasks());
        }
        LOG.info("Starting FlinkKafkaProducer ({}/{}) to produce into default topic {}", new Object[]{ctx.getIndexOfThisSubtask() + 1, ctx.getNumberOfParallelSubtasks(), this.defaultTopicId});
        if (!Boolean.parseBoolean(this.producerConfig.getProperty("flink.disable-metrics", "false"))) {
            Map<MetricName, ? extends Metric> metrics = this.producer.metrics();
            if (metrics == null) {
                LOG.info("Producer implementation does not support metrics");
            } else {
                MetricGroup kafkaMetricGroup = this.getRuntimeContext().getMetricGroup().addGroup("KafkaProducer");
                Iterator var5 = metrics.entrySet().iterator();
                while(var5.hasNext()) {
                    Entry<MetricName, ? extends Metric> metric = (Entry)var5.next();
                    kafkaMetricGroup.gauge(((MetricName)metric.getKey()).name(), new KafkaMetricWrapper((Metric)metric.getValue()));
                }
            }
        }
        if (this.flushOnCheckpoint && !((StreamingRuntimeContext)this.getRuntimeContext()).isCheckpointingEnabled()) {
            LOG.warn("Flushing on checkpoint is enabled, but checkpointing is not enabled. Disabling flushing.");
            this.flushOnCheckpoint = false;
        }
        if (this.logFailuresOnly) {
            this.callback = new Callback() {
                public void onCompletion(RecordMetadata metadata, Exception e) {
                    if (e != null) {
                        FlinkKafkaProducerBase.LOG.error("Error while sending record to Kafka: " + e.getMessage(), e);
                    }
                    FlinkKafkaProducerBase.this.acknowledgeMessage();
                }
            };
        } else {
            this.callback = new Callback() {
                public void onCompletion(RecordMetadata metadata, Exception exception) {
                    if (exception != null && FlinkKafkaProducerBase.this.asyncException == null) {
                        FlinkKafkaProducerBase.this.asyncException = exception;
                    }
                    FlinkKafkaProducerBase.this.acknowledgeMessage();
                }
            };
        }
    }

    public void invoke(IN next, Context context) throws Exception {
        this.checkErroneous();
        byte[] serializedKey = this.schema.serializeKey(next);
        byte[] serializedValue = this.schema.serializeValue(next);
        String targetTopic = this.schema.getTargetTopic(next);
        if (targetTopic == null) {
            targetTopic = this.defaultTopicId;
        }
        int[] partitions = (int[])this.topicPartitionsMap.get(targetTopic);
        if (null == partitions) {
            partitions = getPartitionsByTopic(targetTopic, this.producer);
            this.topicPartitionsMap.put(targetTopic, partitions);
        }
        ProducerRecord record;
        // Key point: with no partitioner the record carries no partition and Kafka chooses one;
        // otherwise the configured partitioner decides.
        if (this.flinkKafkaPartitioner == null) {
            record = new ProducerRecord(targetTopic, serializedKey, serializedValue);
        } else {
            record = new ProducerRecord(targetTopic, this.flinkKafkaPartitioner.partition(next, serializedKey, serializedValue, targetTopic, partitions), serializedKey, serializedValue);
        }
        if (this.flushOnCheckpoint) {
            synchronized(this.pendingRecordsLock) {
                ++this.pendingRecords;
            }
        }
        this.producer.send(record, this.callback);
    }

    public void close() throws Exception {
        if (this.producer != null) {
            this.producer.close();
        }
        this.checkErroneous();
    }

    private void acknowledgeMessage() {
        if (this.flushOnCheckpoint) {
            synchronized(this.pendingRecordsLock) {
                --this.pendingRecords;
                if (this.pendingRecords == 0L) {
                    this.pendingRecordsLock.notifyAll();
                }
            }
        }
    }

    protected abstract void flush();

    public void initializeState(FunctionInitializationContext context) throws Exception {
    }

    public void snapshotState(FunctionSnapshotContext ctx) throws Exception {
        this.checkErroneous();
        if (this.flushOnCheckpoint) {
            this.flush();
            synchronized(this.pendingRecordsLock) {
                if (this.pendingRecords != 0L) {
                    throw new IllegalStateException("Pending record count must be zero at this point: " + this.pendingRecords);
                }
                this.checkErroneous();
            }
        }
    }

    protected void checkErroneous() throws Exception {
        Exception e = this.asyncException;
        if (e != null) {
            this.asyncException = null;
            throw new Exception("Failed to send data to Kafka: " + e.getMessage(), e);
        }
    }

    public static Properties getPropertiesFromBrokerList(String brokerList) {
        String[] elements = brokerList.split(",");
        String[] var2 = elements;
        int var3 = elements.length;
        for(int var4 = 0; var4 < var3; ++var4) {
            String broker = var2[var4];
            NetUtils.getCorrectHostnamePort(broker);
        }
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", brokerList);
        return props;
    }

    protected static int[] getPartitionsByTopic(String topic, KafkaProducer<byte[], byte[]> producer) {
        List<PartitionInfo> partitionsList = new ArrayList(producer.partitionsFor(topic));
        Collections.sort(partitionsList, new Comparator<PartitionInfo>() {
            public int compare(PartitionInfo o1, PartitionInfo o2) {
                return Integer.compare(o1.partition(), o2.partition());
            }
        });
        int[] partitions = new int[partitionsList.size()];
        for(int i = 0; i < partitions.length; ++i) {
            partitions[i] = ((PartitionInfo)partitionsList.get(i)).partition();
        }
        return partitions;
    }

    @VisibleForTesting
    protected long numPendingRecords() {
        synchronized(this.pendingRecordsLock) {
            return this.pendingRecords;
        }
    }
}
2>. Pay attention to this constructor in FlinkKafkaProducerBase:
public FlinkKafkaProducerBase(String defaultTopicId, KeyedSerializationSchema<IN> serializationSchema, Properties producerConfig, FlinkKafkaPartitioner<IN> customPartitioner) {
    Objects.requireNonNull(defaultTopicId, "TopicID not set");
    Objects.requireNonNull(serializationSchema, "serializationSchema not set");
    Objects.requireNonNull(producerConfig, "producerConfig not set");
    ClosureCleaner.clean(customPartitioner, true);
    ClosureCleaner.ensureSerializable(serializationSchema);
    this.defaultTopicId = defaultTopicId;
    this.schema = serializationSchema;
    this.producerConfig = producerConfig;
    this.flinkKafkaPartitioner = customPartitioner;
    if (!producerConfig.containsKey("key.serializer")) {
        this.producerConfig.put("key.serializer", ByteArraySerializer.class.getName());
    } else {
        LOG.warn("Overwriting the '{}' is not recommended", "key.serializer");
    }
    if (!producerConfig.containsKey("value.serializer")) {
        this.producerConfig.put("value.serializer", ByteArraySerializer.class.getName());
    } else {
        LOG.warn("Overwriting the '{}' is not recommended", "value.serializer");
    }
    if (!this.producerConfig.containsKey("bootstrap.servers")) {
        throw new IllegalArgumentException("bootstrap.servers must be supplied in the producer config properties.");
    } else {
        this.topicPartitionsMap = new HashMap();
    }
}
3>. Also note the following code in the invoke() method: when no partitioner is configured, the ProducerRecord is created without an explicit partition, leaving the choice to the Kafka producer itself.
if (this.flinkKafkaPartitioner == null) {
    record = new ProducerRecord(targetTopic, serializedKey, serializedValue);
}
4>. Next, look at the ProducerRecord class to see what a null partition means:
//
// Source code recreated from a .class file by IntelliJ IDEA
// (powered by Fernflower decompiler)
//
package org.apache.kafka.clients.producer;

public final class ProducerRecord<K, V> {
    private final String topic;
    private final Integer partition;
    private final K key;
    private final V value;

    public ProducerRecord(String topic, Integer partition, K key, V value) {
        if (topic == null) {
            throw new IllegalArgumentException("Topic cannot be null");
        } else {
            this.topic = topic;
            this.partition = partition;
            this.key = key;
            this.value = value;
        }
    }

    public ProducerRecord(String topic, K key, V value) {
        this(topic, (Integer)null, key, value);
    }

    public ProducerRecord(String topic, V value) {
        this(topic, (Object)null, value);
    }

    public String topic() {
        return this.topic;
    }

    public K key() {
        return this.key;
    }

    public V value() {
        return this.value;
    }

    public Integer partition() {
        return this.partition;
    }

    public String toString() {
        String key = this.key == null ? "null" : this.key.toString();
        String value = this.value == null ? "null" : this.value.toString();
        return "ProducerRecord(topic=" + this.topic + ", partition=" + this.partition + ", key=" + key + ", value=" + value;
    }

    public boolean equals(Object o) {
        if (this == o) {
            return true;
        } else if (!(o instanceof ProducerRecord)) {
            return false;
        } else {
            ProducerRecord that;
            label56: {
                that = (ProducerRecord)o;
                if (this.key != null) {
                    if (this.key.equals(that.key)) {
                        break label56;
                    }
                } else if (that.key == null) {
                    break label56;
                }
                return false;
            }
            label49: {
                if (this.partition != null) {
                    if (this.partition.equals(that.partition)) {
                        break label49;
                    }
                } else if (that.partition == null) {
                    break label49;
                }
                return false;
            }
            if (this.topic != null) {
                if (!this.topic.equals(that.topic)) {
                    return false;
                }
            } else if (that.topic != null) {
                return false;
            }
            if (this.value != null) {
                if (!this.value.equals(that.value)) {
                    return false;
                }
            } else if (that.value != null) {
                return false;
            }
            return true;
        }
    }

    public int hashCode() {
        int result = this.topic != null ? this.topic.hashCode() : 0;
        result = 31 * result + (this.partition != null ? this.partition.hashCode() : 0);
        result = 31 * result + (this.key != null ? this.key.hashCode() : 0);
        result = 31 * result + (this.value != null ? this.value.hashCode() : 0);
        return result;
    }
}
5>. Focus on this constructor of the class:
public ProducerRecord(String topic, Integer partition, K key, V value) {
    if (topic == null) {
        throw new IllegalArgumentException("Topic cannot be null");
    } else {
        this.topic = topic;
        this.partition = partition;
        this.key = key;
        this.value = value;
    }
}
6>. The constructor simply stores whatever partition it is given; with a null partition the Kafka producer falls back to its default partitioning and can therefore spread records across all partitions. That makes our custom class very simple.
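For reference, this is roughly what the Kafka client does when a record carries no explicit partition. The following is a simplified sketch, not the real implementation (the actual 0.9-era DefaultPartitioner uses a murmur2 hash and considers only the currently available partitions, and the behavior varies by client version): keyed records are hashed to a stable partition, unkeyed records are spread round-robin.

import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified model of the Kafka client's default partitioning.
public class DefaultPartitioningSketch {
    private final AtomicInteger counter = new AtomicInteger(0);

    public int partition(byte[] keyBytes, int numPartitions) {
        if (keyBytes == null) {
            // No key: spread records round-robin over all partitions.
            return (counter.getAndIncrement() & Integer.MAX_VALUE) % numPartitions;
        }
        // Keyed record: a stable hash keeps the same key on the same partition.
        return (Arrays.hashCode(keyBytes) & Integer.MAX_VALUE) % numPartitions;
    }
}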
####5. Writing our own Flink-to-Kafka sink: MyFlinkKafkaProducer09
package com.run;

import org.apache.flink.api.common.serialization.SerializationSchema;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerBase;
import org.apache.flink.streaming.connectors.kafka.partitioner.FlinkKafkaPartitioner;
import org.apache.flink.streaming.util.serialization.KeyedSerializationSchema;
import org.apache.flink.streaming.util.serialization.KeyedSerializationSchemaWrapper;

// Use javax.annotation.Nullable (as Flink itself does); the original
// org.codehaus.commons.nullanalysis.Nullable was an accidental auto-import from Janino.
import javax.annotation.Nullable;

import java.util.Properties;

public class MyFlinkKafkaProducer09<IN> extends FlinkKafkaProducerBase<IN> {

    public MyFlinkKafkaProducer09(String brokerList, String topicId, SerializationSchema<IN> serializationSchema) {
        // Unlike FlinkKafkaProducer09, pass null instead of new FlinkFixedPartitioner().
        this(topicId, (KeyedSerializationSchema)(new KeyedSerializationSchemaWrapper(serializationSchema)), getPropertiesFromBrokerList(brokerList), null);
    }

    public MyFlinkKafkaProducer09(String topicId, KeyedSerializationSchema<IN> serializationSchema, Properties producerConfig, @Nullable FlinkKafkaPartitioner<IN> customPartitioner) {
        super(topicId, serializationSchema, producerConfig, customPartitioner);
    }

    protected void flush() {
        if (this.producer != null) {
            this.producer.flush();
        }
    }
}
Instead of supplying a FlinkKafkaPartitioner (new FlinkFixedPartitioner()), we pass null, so FlinkKafkaProducerBase lets Kafka write across all partitions. Use it in your streaming job:

distributeDataStream.addSink(new MyFlinkKafkaProducer09<String>("localhost:9092", "my-topic", new SimpleStringSchema()));
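If you also need to hand tuning options to the Kafka producer, the four-argument constructor of the custom class takes a Properties object and accepts null for the partitioner directly; a sketch with placeholder values:

Properties props = new Properties();
props.setProperty("bootstrap.servers", "localhost:9092");
props.setProperty("batch.size", "16384"); // illustrative tuning value

distributeDataStream.addSink(new MyFlinkKafkaProducer09<String>(
        "my-topic",
        new KeyedSerializationSchemaWrapper<>(new SimpleStringSchema()),
        props,
        null)); // null partitioner: let Kafka distribute across all partitions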
.IllegalStateException: Previously tracked partitions [kafka.topic-0] been revoked by Kafka
.IllegalStateException: Previously tracked partitions [kafka.topic-0] been revoked by Kafka because of consumer rebalance. This is mostly due to another stream with same group id joined, please check if there're different streaming application misconfigure to use same group id. Fundamentally different stream should use different group id
at org.apache.spark.streaming.kafka010.DirectKafkaInputDStream.latestOffsets(DirectKafkaInputDStream.scala:200)
at org.apache.spark.streaming.kafka010.DirectKafkaInputDStream.compute(DirectKafkaInputDStream.scala:228)
at org.apache.spark.streaming.dstream.DStream
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at org.apache.spark.streaming.dstream.DStream
at org.apache.spark.streaming.dstream.DStream
at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:416)
at org.apache.spark.streaming.dstream.DStream
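The exception text already names the fix: two streaming applications sharing one group.id trigger consumer rebalances that revoke each other's partitions, so each application needs its own group.id. A minimal sketch of the Kafka consumer parameters (all values are placeholders):

import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.common.serialization.StringDeserializer;

Map<String, Object> kafkaParams = new HashMap<>();
kafkaParams.put("bootstrap.servers", "localhost:9092");
kafkaParams.put("key.deserializer", StringDeserializer.class);
kafkaParams.put("value.deserializer", StringDeserializer.class);
// Every independent streaming application must use its own group id.
kafkaParams.put("group.id", "my-app-unique-group-id");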