突发消息难题突发消息难题

突发消息是许多流式应用程序的克星。它们就像那个突然带着50个人来你家吃晚饭的朋友。你没有准备好，感到不知所措，肯定也不会觉得愉快。

引入 Kafka Streams 和 Quarkus

那么，为什么选择 Kafka Streams 和 Quarkus 来应对这项艰巨的任务呢？这就像问为什么要选择法拉利来参加比赛。Kafka Streams 是为高吞吐量事件处理而构建的，而 Quarkus 则带来了其超音速、亚原子的 Java 能力。

Kafka Streams：分布式、可扩展且容错。非常适合处理海量数据流。
Quarkus：轻量级、启动速度快、内存占用低。理想的云原生环境。

它们就像突发消息处理的蝙蝠侠和罗宾。让我们看看如何利用它们的力量。

为突发消息架构设计

在深入代码之前，让我们了解一下 Kafka Streams 如何处理数据。这一切都与拓扑有关，伙计！


StreamsBuilder builder = new StreamsBuilder();
KStream inputStream = builder.stream("input-topic");

KStream processedStream = inputStream
    .filter((key, value) -> value != null)
    .mapValues(value -> value.toUpperCase());

processedStream.to("output-topic");

Topology topology = builder.build();

这个简单的拓扑从输入主题读取，过滤掉空值，将消息转换为大写，并写入输出主题。但我们如何使其对突发消息具有抵抗力呢？

并行宇宙：配置并发性

处理突发消息的关键是并行性。让我们调整 Quarkus 配置以释放 Kafka Streams 的全部威力：


# application.properties
kafka-streams.num.stream.threads=4
kafka-streams.max.poll.records=500
quarkus.kafka-streams.topics=input-topic,output-topic

这里发生了什么：

num.stream.threads：我们告诉 Kafka Streams 使用4个线程进行处理。根据你的 CPU 核心数进行调整。
max.poll.records：限制单个轮询周期内处理的记录数量，防止应用程序处理过多数据。

缓冲区溢出：管理数据流

在处理突发消息时，缓冲是你的好朋友。它就像是你的消息的候诊室。让我们配置一些与缓冲相关的属性：


kafka-streams.buffer.memory=67108864
kafka-streams.batch.size=16384
kafka-streams.linger.ms=100

这些设置有助于管理数据流：

buffer.memory：生产者可以用来缓冲记录的内存总字节数。
batch.size：请求的最大字节大小。
linger.ms：如果批次未满，等待多长时间再发送。

背压：说“慢下来”的艺术

在处理突发消息时，背压至关重要。就像告诉你健谈的朋友，“等一下，我需要时间来处理你刚才说的话。”在 Kafka Streams 中，我们可以使用 Produced 类实现背压：


processedStream.to("output-topic", Produced.with(Serdes.String(), Serdes.String())
    .withStreamPartitioner((topic, key, value, numPartitions) -> {
        // 自定义分区逻辑以分配负载
        return Math.abs(key.hashCode()) % numPartitions;
    }));

这个自定义分区器有助于将负载分配到各个分区，防止任何单个分区成为瓶颈。

心态：优化状态存储

Kafka Streams 中的状态存储在突发处理期间可能成为性能瓶颈。让我们优化它们：


kafka-streams.state.dir=/path/to/state/dir
kafka-streams.commit.interval.ms=100
kafka-streams.cache.max.bytes.buffering=10485760

这些设置有助于更高效地管理状态：

state.dir：存储状态的位置。使用快速 SSD 以获得最佳性能。
commit.interval.ms：保存处理进度的频率。
cache.max.bytes.buffering：提交前用于缓冲记录的最大内存。

压缩以惊艳：消息压缩

在处理突发消息时，每个字节都很重要。让我们启用压缩：


kafka-streams.compression.type=lz4

LZ4 提供了良好的压缩比和速度平衡，非常适合处理突发。

信任但要验证：测试和监控

现在我们已经优化了应用程序，如何知道它能否处理突发？进入压力测试和监控。

使用 JMeter 进行压力测试

创建一个 JMeter 测试计划来模拟 50K 消息的突发：


<?xml version="1.0" encoding="UTF-8"?>
<jmeterTestPlan version="1.2" properties="5.0" jmeter="5.4.1">
  <hashTree>
    <TestPlan guiclass="TestPlanGui" testclass="TestPlan" testname="Kafka Burst Test" enabled="true">
      <stringProp name="TestPlan.comments"></stringProp>
      <boolProp name="TestPlan.functional_mode">false</boolProp>
      <boolProp name="TestPlan.tearDown_on_shutdown">true</boolProp>
      <boolProp name="TestPlan.serialize_threadgroups">false</boolProp>
      <elementProp name="TestPlan.user_defined_variables" elementType="Arguments" guiclass="ArgumentsPanel" testclass="Arguments" testname="User Defined Variables" enabled="true">
        <collectionProp name="Arguments.arguments"/>
      </elementProp>
      <stringProp name="TestPlan.user_define_classpath"></stringProp>
    </TestPlan>
    <hashTree>
      <ThreadGroup guiclass="ThreadGroupGui" testclass="ThreadGroup" testname="Kafka Producers" enabled="true">
        <stringProp name="ThreadGroup.on_sample_error">continue</stringProp>
        <elementProp name="ThreadGroup.main_controller" elementType="LoopController" guiclass="LoopControlPanel" testclass="LoopController" testname="Loop Controller" enabled="true">
          <boolProp name="LoopController.continue_forever">false</boolProp>
          <stringProp name="LoopController.loops">50000</stringProp>
        </elementProp>
        <stringProp name="ThreadGroup.num_threads">10</stringProp>
        <stringProp name="ThreadGroup.ramp_time">1</stringProp>
        <boolProp name="ThreadGroup.scheduler">false</boolProp>
        <stringProp name="ThreadGroup.duration"></stringProp>
        <stringProp name="ThreadGroup.delay"></stringProp>
        <boolProp name="ThreadGroup.same_user_on_next_iteration">true</boolProp>
      </ThreadGroup>
      <hashTree>
        <JavaSampler guiclass="JavaTestSamplerGui" testclass="JavaSampler" testname="Java Request" enabled="true">
          <elementProp name="arguments" elementType="Arguments" guiclass="ArgumentsPanel" testclass="Arguments" enabled="true">
            <collectionProp name="Arguments.arguments">
              <elementProp name="kafka.topic" elementType="Argument">
                <stringProp name="Argument.name">kafka.topic</stringProp>
                <stringProp name="Argument.value">input-topic</stringProp>
                <stringProp name="Argument.metadata">=</stringProp>
              </elementProp>
              <elementProp name="kafka.key" elementType="Argument">
                <stringProp name="Argument.name">kafka.key</stringProp>
                <stringProp name="Argument.value">${__UUID()}</stringProp>
                <stringProp name="Argument.metadata">=</stringProp>
              </elementProp>
              <elementProp name="kafka.message" elementType="Argument">
                <stringProp name="Argument.name">kafka.message</stringProp>
                <stringProp name="Argument.value">Test message ${__threadNum}</stringProp>
                <stringProp name="Argument.metadata">=</stringProp>
              </elementProp>
            </collectionProp>
          </elementProp>
          <stringProp name="classname">com.example.KafkaProducerSampler</stringProp>
        </JavaSampler>
        <hashTree/>
      </hashTree>
    </hashTree>
  </hashTree>
</jmeterTestPlan>

这个测试计划模拟了10个线程，每个线程发送5,000条消息，总计50,000条突发消息。

使用 Prometheus 和 Grafana 进行监控

设置 Prometheus 和 Grafana 来监控你的 Quarkus 应用程序。在 application.properties 中添加以下内容：


quarkus.micrometer.export.prometheus.enabled=true
quarkus.micrometer.binder.kafka.enabled=true

创建一个 Grafana 仪表板来可视化消息吞吐量、处理时间和资源使用等指标。

大结局：整合一切

现在我们已经优化、配置并测试了在 Quarkus 上的 Kafka Streams 应用程序，让我们看看它的实际表现：


@ApplicationScoped
public class BurstMessageProcessor {

    @Inject
    StreamsBuilder streamsBuilder;

    @Produces
    @ApplicationScoped
    public Topology buildTopology() {
        KStream inputStream = streamsBuilder.stream("input-topic");

        KStream processedStream = inputStream
            .filter((key, value) -> value != null)
            .mapValues(value -> value.toUpperCase())
            .peek((key, value) -> System.out.println("Processing: " + value));

        processedStream.to("output-topic", Produced.with(Serdes.String(), Serdes.String())
            .withStreamPartitioner((topic, key, value, numPartitions) -> {
                return Math.abs(key.hashCode()) % numPartitions;
            }));

        return streamsBuilder.build();
    }
}

这个由 Quarkus 驱动的 Kafka Streams 应用程序现在已经准备好像冠军一样处理那50K的突发消息！

总结：学到的经验

在 Quarkus 上处理 Kafka Streams 的突发消息并非易事，但使用正确的技术，它是完全可管理的。我们学到了什么：

并行是关键：使用多个线程和分区来分配负载。
明智地缓冲：配置你的缓冲区以平滑突发。
实现背压：不要让你的应用程序处理过多数据。
优化状态存储：快速高效的状态管理对于高吞吐量处理至关重要。
压缩消息：通过智能压缩节省带宽和处理能力。
测试和监控：始终验证你的优化并关注性能。

记住，处理突发消息既是一门艺术，也是一门科学。不断实验、测试和优化。你的 Kafka Streams 应用程序会感谢你，当用户在最繁忙的时刻体验到闪电般的处理速度时，他们也会感谢你。

现在，去驯服那些消息突发吧，就像你是流处理的超级英雄一样！

“在流处理的世界里，不是你能打多重，而是你能承受多重打击并继续前进。” - 洛奇·巴尔博亚（如果他是一名数据工程师的话）

祝流处理愉快！

引入 Kafka Streams 和 Quarkus

为突发消息架构设计

并行宇宙：配置并发性

缓冲区溢出：管理数据流

背压：说“慢下来”的艺术

心态：优化状态存储

压缩以惊艳：消息压缩

信任但要验证：测试和监控

使用 JMeter 进行压力测试

使用 Prometheus 和 Grafana 进行监控

大结局：整合一切

总结：学到的经验

更多 此分类 的内容 Programming

自愈架构：从理论到实践

为后端系统设计优雅降级：当情况变得艰难时，你的系统依然坚韧

超越数组和哈希映射：让你的后端如歌般流畅的高级数据结构

Kubernetes 无需 Docker：摆脱鲸鱼的束缚

加入我们的社区 👋

更多此分类的内容 Programming