Kafka producer

半兽人, published 2015-03-10, last updated 2019-09-18 10:30:34

Load balancing

The producer sends data directly to the broker that is the leader for the partition, without any intervening routing tier. To make this possible, every Kafka node can answer a metadata request that reports which servers are alive and where the leaders for a topic's partitions are at any given time, so the producer can direct its requests appropriately.
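The routing step described above can be sketched as a lookup against cached metadata. This is a toy model, not the Kafka client API: `PartitionInfo`, `ClusterMetadata`, `route`, the topic name `clicks`, and the broker addresses are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionInfo:
    topic: str
    partition: int
    leader: str  # "host:port" of the broker currently leading this partition


class ClusterMetadata:
    """Toy stand-in for the metadata any broker can return (hypothetical type)."""

    def __init__(self, live_brokers, partitions):
        self.live_brokers = set(live_brokers)
        self._leaders = {(p.topic, p.partition): p.leader for p in partitions}

    def leader_for(self, topic, partition):
        leader = self._leaders.get((topic, partition))
        if leader is None or leader not in self.live_brokers:
            # A real client would re-fetch metadata from any broker at this point.
            raise LookupError(f"no live leader known for {topic}-{partition}")
        return leader


def route(metadata, topic, partition):
    """Return the broker address a produce request should be sent to directly."""
    return metadata.leader_for(topic, partition)


# Illustrative data only; these brokers and this topic are made up.
metadata = ClusterMetadata(
    live_brokers=["broker1:9092", "broker2:9092"],
    partitions=[
        PartitionInfo("clicks", 0, "broker1:9092"),
        PartitionInfo("clicks", 1, "broker2:9092"),
    ],
)
```

With this data, `route(metadata, "clicks", 1)` returns the address of the broker leading partition 1, so the request goes straight to that broker with no routing tier in between.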

The client controls which partition it publishes messages to. This can be done at random, which gives a simple form of random load balancing, or by a semantic partitioning function. Kafka exposes semantic partitioning by letting the user specify a key to partition by, which is hashed to choose a partition (the partition function can also be overridden if need be). For example, if the key is a user ID, all data for a given user is sent to the same partition. This in turn lets consumers make locality assumptions about the data they consume. This style of partitioning is explicitly designed to allow locality-sensitive processing in consumers.
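A minimal sketch of the two strategies, assuming keys are byte strings. The function name `choose_partition` is hypothetical, and `zlib.crc32` is only a stand-in for whatever hash function the real client uses; it is chosen here because it gives the same result in every process.

```python
import random
import zlib


def choose_partition(key, num_partitions, rng=random):
    """Pick a partition: hash the key if one is given, otherwise pick at random."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    if key is None:
        # No key: random load balancing across all partitions.
        return rng.randrange(num_partitions)
    # Keyed: the same key always maps to the same partition.
    # crc32 is an assumption for illustration, not Kafka's actual hash.
    return zlib.crc32(key) % num_partitions
```

Calling `choose_partition(b"user-42", 6)` repeatedly always yields the same partition, which is what lets a consumer of that partition see all data for that user. Note that the mapping changes if the number of partitions changes.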

Asynchronous send

Batching is one of the big drivers of efficiency. To enable it, the Kafka producer attempts to accumulate data in memory and send out larger batches in a single request. Batching can be configured to accumulate no more than a fixed number of messages and to wait no longer than a fixed latency bound (say 64k or 10 ms). This lets the producer send more bytes per request and results in fewer, larger I/O operations on the servers. The buffering is configurable and provides a way to trade a small amount of additional latency for better throughput.
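The size-or-time rule can be sketched as a small accumulator. This is an illustration under stated assumptions, not the producer's implementation: the class `BatchAccumulator` and its parameters are hypothetical, the 64k bound is interpreted as bytes, and the caller is expected to invoke `maybe_flush()` periodically where a real producer would use a background thread.

```python
import time


class BatchAccumulator:
    """Buffers messages and sends them as one batch when full or when the wait is up."""

    def __init__(self, send, max_batch_bytes=64 * 1024, linger_seconds=0.010,
                 clock=time.monotonic):
        self._send = send                      # callable that takes a list of messages
        self._max_batch_bytes = max_batch_bytes
        self._linger_seconds = linger_seconds
        self._clock = clock                    # injectable so the timing can be tested
        self._batch = []
        self._batch_bytes = 0
        self._first_append_at = None

    def append(self, message: bytes):
        if self._first_append_at is None:
            self._first_append_at = self._clock()
        self._batch.append(message)
        self._batch_bytes += len(message)
        self.maybe_flush()

    def maybe_flush(self):
        """Flush if the batch hit the size bound or has waited past the latency bound."""
        if not self._batch:
            return
        full = self._batch_bytes >= self._max_batch_bytes
        waited = self._clock() - self._first_append_at >= self._linger_seconds
        if full or waited:
            self.flush()

    def flush(self):
        if self._batch:
            self._send(self._batch)            # one request carries the whole batch
        self._batch = []
        self._batch_bytes = 0
        self._first_append_at = None
```

Raising `linger_seconds` or `max_batch_bytes` produces fewer, larger sends at the cost of a little extra latency per message, which is the trade-off the paragraph above describes.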

Details on configuration and the API for the producer can be found elsewhere in the documentation.


..... 5 years ago

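...to allow the user to specify a partition by key and use use a hash to map to the partition (the partition function can be overridden if needed)...
"使用使用": the word 使用 ("use") is duplicated.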

半兽人 -> ..... 5 years ago

Fixed, thanks. Further corrections are welcome.

. 8 years ago

To help the producer do this all Kafka nodes can answer a request for metadata about which
servers are alive and where the leaders for the partitions of a topic
are at any given time to allow the producer to appropriate direct its
requests.
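(Article's translation, rendered back into English:) To help the producer obtain the metadata of all Kafka nodes, it uses request responses to determine which servers are alive and where the leaders of a topic's partitions allow the producer to make direct requests at a given time.
I don't think this translation is quite right. I would suggest:
To help the producer do this, all Kafka nodes can respond to a request for metadata: which servers are alive, and which leaders of a topic's partitions the producer can send requests to directly at a given time.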

半兽人 -> . 8 years ago

Fixed, thanks for sharing.
