Tuning Throughput from Azure Event Hub to Azure Stream Analytics

,

[read this post on Mr. Fox SQL blog]

Recently I had a requirement to load streaming JSON data to provide a data feed for near real-time reporting.  The solution streamed data into an “Ingress” Azure Event Hub, shred the JSON via Azure Stream Analytics and then push subsections of data as micro-batches (1 sec) into a “Egress” Azure Event Hub (for loading into a stage table in Azure SQL DW).

In Event Hubs and Stream Analytics there are only a few performance levers to help tune a solution like this, or said another way, doing nothing with these levers can affect your ongoing performance!

So this blog is to show the performance differences when using different Azure Event Hub partition configurations and the Azure Stream Analytics PARTITION BY clause.

Understanding the Environment Levers

When you configure Azure Event Hub you only have 2 levers;

  • Throughput Units (TU) – TU’s apply to all event hubs in a namespace and are pre-purchased units of capacity between 1 and 20 (or more by contacting support). Each partition has a maximum scale of 1 TU.  Each TU is 1MB/sec ingress + 2MB/sec egress + 84GB event storage.  For our test we wont change TU
  • Partitions – The number of partitions is specified at creation and must be between 2 and 32. The partition count is not changeable. Partitions are a data organization mechanism that relates to the downstream parallelism required when consuming data.
  • You can learn more about Event Hubs herehttps://docs.microsoft.com/en-us/azure/event-hubs/event-hubs-what-is-event-hubs

When you configure Azure Stream Analytics you only have 2 levers;

  • Streaming Units (SU) – Each SU is a blend of compute, memory and throughput between 1 and 48 (or more by contacting support).  The factors that impact SU are query complexity, latency, and volume of data. SU can be used to scale out a job to achieve higher throughput. Depending on query complexity and throughput required, more SU units may be necessary to achieve your performance requirements.  A level of SU6 assigns an entire Stream Analytics node.   For our test we wont change SU
  • SQL Query Design – Queries are expressed in a SQL-like query language. These queries are documented in the query language reference guide and includes several common query patterns.  The design of the query can greatly affect the job throughput, in particular if and/or how the PARTITION BY clause is used.
  • You can learn more about Stream Analytics herehttps://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-introduction

 

Test Patterns and Scenarios

When running streaming solutions its important to aim to get maximum parallelism whch allows the greatest throughput.

I’m going to keep it simple and verify the difference between running ingress and egress Event Hub with different partition settings, and combine a Stream Analytics SQL query design using the PARTITION BY clause.

 

TESTINPUT

[Ingress AEH]

QUERY

[ASA]

OUTPUT

[Egress AEH]

TUPartitionsSUSQL QueryTUPartitions
1521NONE516
2521PARTITION BY516
35161NONE52
45161PARTITION BY52
5521NONE52
6521PARTITION BY52
75161NONE516
85161PARTITION BY516

 

Test Workload Configuration

The particulars of the test workload are as follows;

  • Each test will send 16000 events using 16 clients/threads with no wait time between each send event.
  • The JSON event is 88 bytes, and is shown below.  Each event will have an incremental event id, a random deviceId (1-10) and random value (1-1000000).
  • The events are not compressed
  • Each client will send data without using a partition key (ie round robin will allocate the incoming events into the Ingress Event Hub partitions).
  • The send clients will run from a DS13-4_v2 (4 v-cpus, 56 GB memory) Azure VM to minimise latency.
  • The VM is running in Australia South East.
  • The Event Hubs and Stream Analytics services are running in Australia South East.
  • Stream Analytics will run at SU1 (just so its easy to observe SU% differences)
  • Event Hub will run at TU5 (Igress Hub & Egress hub share the same namespace)
  • The ASA SQL query will perform a relatively complex aggregate tumbling window
  • I will use the awesomely cool Service Bus Explorer to send the events.  In case you haven’t seen it before, then strongly recommend to check it out – https://code.msdn.microsoft.com/windowsapps/Service-Bus-Explorer-f2abca5a
  • The tests will measure the SU% and the elapsed sec (first event sent [from the client] to last event received [in the event hub]).

 

Test JSON Event Message

{
    "eventId":#####,
    "deviceId":##,
    "value":#######,
    "timestamp":"####-##-##T##:##:##.#######Z"
}

 

Test Stream Analytics Query

My ASA SQL query below can use a “parallel” input step from the ingress event hub, but I needed an aggregated output step to load into the egress event hub.  I know I cannot achieve “maximum” parallelism for the query, however I wanted to identify if there are still observable performance benefits in this configuration.

For each test I will change the ASA SQL Query as follows…

  • Add / remove the “PARTITION BY” clause
  • Swap the pointers for “aAEHIngress##” and “aAEHEgress##” Event Hubs.  I have pre-configured several event hubs with either 2 or 16 partitions, so I can just swap the pointers around and rerun the query again!
WITH Step1 AS
(
   SELECT
     PartitionId,
     deviceId,
     COUNT(deviceId) as TOTRows,
     SUM(value) as SUMValue,
     AVG(value) as AVGValue,
     MIN(value) as MINValue,
     MAX(value) as MAXValue
     FROM
       [aAEHIngress##] --PARTITION BY PartitionId
     GROUP BY
       PartitionId,
       deviceId,
       TumblingWindow(second, 1)
)
SELECT deviceId,
   SUM(TOTRows) as TOTRows,
   SUM(SUMValue) as SUMValue,
   AVG(AVGValue) as AVGValue,
   MIN(MINValue) as MINValue,
   MAX(MAXValue) as MAXValue
INTO
   [aAEHEgress##]
FROM
   Step1
GROUP BY
   deviceId,
   TumblingWindow(second, 1)

If you are interested in other common ASA queries then check out the patterns here – https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-stream-analytics-query-patterns

 

Test Results/Outcomes

And so the results are in!

#Test Type
Total Events
Sent
Avg Event Send
Rate / Sec
Elapsed
(sec)
ASA
SU %
12:16:N16000201849
22:16:P16000202839
316:2:N1600017410211
416:2:P1600016710814
52:2:N160001988510
62:2:P16000200839
716:16:N160001839011
816:16:P160001839814

 

Key Observations of this Test Workload

Given the above tests and observed results, you could conclude that…

  1. Mismatching ingress and egress partitions can negatively affect performance specifically when the ingress partitions is higher than the egress partitions (ie the IN pipe is bigger than the OUT pipe).  My test showed as much as a 20% difference in elapsed time.  This also results in a 30% higher SU% utilisation.
  2. Mismatching egress and ingress partitions does not seem to negatively affect performance when the egress partitions are higher than the ingress partitions (ie the IN pipe is smaller than the OUT pipe).  My test shows the elapsed times and SU% are about the same, which kind of makes sense.
  3. Matching ingress and egress partitions can positively affect performance, however when higher partitions are used (16 partitions) there seems to be a degree of overhead to process data through the solution, as is shown by lower send rate, and higher SU%.  This could be related to send overhead at the client – I note that MS DOCS states 4 partitions is “generally enough” for most solutions.  Probably needs further investigation and retesting at 4 or 8 partitions…
  4. There was only a very small benefit from using the PARTITION BY in my specific ASA Query.  Some performed better by a few seconds, and others worse.  Its probably not enough to call it as beneficial or not.  As above I suspected this may occur as my ASA SQL query needed 2 steps; the 1st could be partitioned, however the 2nd couldn’t be partitioned (GROUP BY without partition id).

 

Summary

So there you have it, a pretty simple breakdown of how the different partition sizes in Azure Event Hubs and partition by clasue in Azure Stream Anaytics can change the performance!

My test was focused on my specific needs in my specific scenario; the outcomes will differ depending on the (a) SQL query, (b) the type/size of the event payload, (c) the number of clients sending data into the ingress event hub, and (d) if those clients send to a specific partition or use round robin.  Ideally if I had time it would be great to rerun this using a workload of a few million events over a few hundred clients!

So as usual, and as I always say, please test this out yourself with your own solution, data and event stream as your milage may vary!

 

References

There’s some exceptional MS DOCS articles I recommend that you review and validate the fit within your own streaming data solutions.


Disclaimer: all content on Mr. Fox SQL blog is subject to the disclaimer found here

Original post (opens in new tab)
View comments in original post (opens in new tab)

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating