Why is the observed latency much higher than what's configured in the latency experiment?

Modified on Thu, 7 Sep, 2023 at 3:56 PM

You may sometimes observe a higher latency impact on your application than you configured in your latency experiment. Gremlin injects latency on a per-packet basis on layer 3 of the OSI model. This is representative of common networking failure modes, such as queuing at network switches or latency due to increased route distance. This contributes to higher observed latency primarily in two ways.

First, for TCP connections, if your request or response doesn't fit into a single congestion window, then latency will be added for each round trip. Applications that reuse large connection pools on reliable networks typically have very large congestion windows, so the latency is applied only once. However, if the latency increase causes requests to queue and overwhelm the existing pool, new connections may be created with new congestion windows. Each of these new connections will have to reopen the connection window over multiple round trips, applying the latency each time.

Second, applications may make multiple requests to the same dependency in serial. For example, if you send two queries to the same database, adding 100ms of latency will increase the application request time by 200ms (100ms for each request).

In both cases, the experiment is working as intended and revealed a reliability risk in the design of the application being tested.

For more information on Latency Experiments, please see this page.