From TCP to Sigmoid: A Journey of Odds

I started with a networking question. I ended up understanding why sigmoid exists.

This is the story of one question, followed honestly, connecting four domains: networking, algebra, machine learning, and Bayesian inference. All through a simple transform: r = p/(1-p).

The Question

I was studying TCP congestion control for my networking course. The textbook said TCP uses AIMD (Additive Increase, Multiplicative Decrease) for fairness. Other policies, like MIMD (Multiplicative Increase, Multiplicative Decrease), don't converge to a fair share.

The question: Why?

I stared at the page. I had no idea how to even think about it.

Geometric Thinking

The trick is to visualize the problem. Imagine two computers sharing a link. Plot their sending rates as a point (x₁, x₂) in 2D space.

  • Fairness line (x₁ = x₂): diagonal with slope +1
  • Efficiency line (x₁ + x₂ = C): diagonal with slope -1
  • Goal: green dot at intersection (fair AND efficient)
  • Current state: red star below fairness line (A has more than B)

Now trace what each operation does:

  • Additive (+k to both): Translates the point. Direction: 45°.
  • Multiplicative (×m to both): Scales the point. Direction: toward/away from origin.

For AIMD:

  • Additive increase moves at 45° (parallel to fairness line)
  • Multiplicative decrease moves toward origin (along a ray)

The combination spirals toward the fair point. But why does additive help fairness while multiplicative doesn’t?

Caveat: This analysis assumes both flows have the same RTT (round-trip time) and move together. In reality, flows with smaller RTT can increase faster (more ACKs per second), leading to RTT-based unfairness. The course material covers this in a later chapter—AIMD is fair among equals, not universally.

The Smaller Gains More

Let’s trace through numbers. Start unfair: A = 8, B = 2.

Step   A   B   Ratio A/B   Action
0      8   2   4.0         start
1      4   1   4.0         ×0.5 (ratio preserved!)
2      5   2   2.5         +1 (ratio improved!)
3      6   3   2.0         +1 (ratio improved!)

Multiplicative preserves the ratio. Additive improves it.

Why? Adding 1 to B (which is 2) is a 50% increase. Adding 1 to A (which is 8) is only 12.5%. Same absolute amount, but the smaller one gains more proportionally.

This is the core insight: the relative gain k/x is larger when x is smaller.

Geometrically: the ratio is the slope of the ray from the origin. Adding +k to both rotates the ray toward 45° (the fairness angle). In the trace above, the angle climbs from about 14° to about 27°, converging toward 45°, where the ratio is 1:1 (the sketch below runs the numbers).
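
Here is a minimal sketch of the same dynamic run for many rounds. The capacity of 10, the +1 increase, the ×0.5 decrease, and the round count are toy values chosen only for illustration:

    import math

    # Toy AIMD loop: additive increase of +1 until the link is full
    # (capacity 10, an assumed toy value), then multiplicative decrease by 0.5.
    a, b = 8.0, 2.0
    for _ in range(50):
        if a + b >= 10:
            a, b = a * 0.5, b * 0.5   # congestion: halve both, ratio a/b unchanged
        else:
            a, b = a + 1, b + 1       # room left: add 1 to both, ratio pulled toward 1

    angle = math.degrees(math.atan2(b, a))
    print(f"A = {a:.2f}, B = {b:.2f}, ratio = {a / b:.2f}, angle = {angle:.1f} deg")
    # Ratio approaches 1.0 and the angle approaches 45 deg: convergence to fairness.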

I wrote this observation out, and as I wrote, something clicked. The understanding came during the writing, not before it. (That’s a separate insight: writing is thinking, not recording thinking.)

Wait, This Sounds Familiar

“The smaller one gains more proportionally.”

This reminded me of a sugar-water problem. Add 10 g of sugar to a dilute solution (10%) and to a concentrated solution (50%): the dilute one's concentration changes more.

But wait — are these the same thing?

I worked through it. They’re not the same formula, but they’re related. Here’s the distinction:

Representation   Formula           Example (1 sugar, 3 water)
Ratio (odds)     r = s / w         1/3 ≈ 0.33
Proportion       p = s / (s + w)   1/4 = 0.25

Different formulas. But they convert to each other:

  r = p / (1 - p)    (proportion → odds)
  p = r / (1 + r)    (odds → proportion)

The key result is r = p/(1-p): this quantity is called "odds" in statistics. Probability 25% = odds 1:3.
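
A tiny sketch of the conversion; the function names are mine, just for illustration:

    def odds_from_prob(p):
        return p / (1 - p)        # proportion -> ratio (odds)

    def prob_from_odds(r):
        return r / (1 + r)        # ratio (odds) -> proportion

    print(odds_from_prob(0.25))   # 0.333..., i.e. odds of 1:3
    print(prob_from_odds(1 / 3))  # 0.25, the round trip is lossless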

Where Does Sigmoid Come From?

Here’s where it gets fun.

In machine learning, we use logistic regression for classification. The textbook says: “use sigmoid to squash outputs to [0,1], same range as probability.”

But why sigmoid specifically? Why that exact formula?

Most CS-flavored ML courses introduce sigmoid as “squashes to [0,1]” and move on. Statistics courses (Statistical Learning, Biostatistics) tend to teach from the odds perspective, since odds ratios are fundamental in clinical trials and epidemiology. But if you came through the CS path like me, you might have missed that framing entirely.

Now I had the tools to derive it.

The problem: We want to model probability with linear tools. But probability is bounded [0,1]. Linear models output unbounded values (-∞ to +∞).

The solution: Transform probability to something unbounded, do linear modeling there, transform back.

Odds gets you halfway there: r = p/(1-p) runs from 0 to +∞. Taking the log removes the remaining bound:

  z = logit(p) = ln(p / (1 - p)) = ln(odds),   which ranges over (-∞, +∞)

This is called the logit.

Now, how do we transform back? Start from z = ln(p/(1-p)) and solve for p:

  e^z = p / (1 - p)   ⇒   p = e^z / (1 + e^z) = 1 / (1 + e^(-z))

That's the sigmoid function. I didn't look it up. It fell out of the math.

I sat there for a moment. The sigmoid isn’t arbitrary. It’s the only function that correctly inverts log-odds back to probability. The shape is mathematically necessary.

This was the first time I truly understood sigmoid, not just memorized it.
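
As a sanity check, here is a minimal sketch confirming numerically that sigmoid inverts the logit (the test probabilities are arbitrary):

    import math

    def logit(p):
        return math.log(p / (1 - p))     # probability -> log-odds

    def sigmoid(z):
        return 1 / (1 + math.exp(-z))    # log-odds -> probability

    for p in (0.1, 0.25, 0.5, 0.9):
        print(p, round(sigmoid(logit(p)), 10))   # round trip returns p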

The Bayesian Connection

One more connection. In Bayesian inference, there’s something called “Bayes in odds form.”

Standard Bayes (messy):

  P(H|E) = P(E|H) · P(H) / [ P(E|H) · P(H) + P(E|¬H) · P(¬H) ]

Odds form (clean):

  O(H|E) = O(H) × P(E|H) / P(E|¬H)    (posterior odds = prior odds × likelihood ratio)

Log-odds form (cleanest):

  log O(H|E) = log O(H) + log [ P(E|H) / P(E|¬H) ]

Evidence adds to your belief in log-odds space. This is why logistic regression works — each feature contributes a log-likelihood-ratio, and they sum up.

Logistic regression isn’t just “using sigmoid.” It’s Bayesian updating in log-odds space.
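
Here is a minimal sketch of a single Bayesian update done in odds and log-odds space; the 10% prior and the likelihood ratio of 4 are made-up numbers, purely for illustration:

    import math

    prior_p = 0.10              # P(H): made-up prior
    likelihood_ratio = 4.0      # P(E|H) / P(E|not H): made-up evidence strength

    prior_odds = prior_p / (1 - prior_p)             # 1:9
    posterior_odds = prior_odds * likelihood_ratio   # odds form: one multiplication

    # Same update in log-odds space: the evidence simply adds.
    posterior_log_odds = math.log(prior_odds) + math.log(likelihood_ratio)

    print(round(posterior_odds / (1 + posterior_odds), 3))      # 0.308
    print(round(1 / (1 + math.exp(-posterior_log_odds)), 3))    # 0.308 again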

Full Circle: Back to TCP

Here’s where it gets delicious. Remember how this started with TCP AIMD? There’s a newer algorithm called TCP CUBIC that replaces linear growth with a cubic polynomial. And understanding why CUBIC uses cubic instead of logistic brings everything together.

The CUBIC Equation

  W(t) = C · (t - K)³ + Wmax

Where:

  • W(t) = window size at time t
  • K = time to reach Wmax again after a loss
  • Wmax = window size when loss occurred
  • C = aggressiveness parameter (a scaling constant)
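
A minimal sketch that just evaluates the curve at a few points to see the shape; Wmax = 100 is a toy value, and β = 0.7, C = 0.4 are, as far as I know, the RFC 8312 defaults:

    # Evaluate W(t) = C·(t - K)³ + Wmax at a few points (no tuning, just the shape).
    def cubic_window(t, w_max=100.0, beta=0.7, c=0.4):
        k = ((w_max * (1 - beta)) / c) ** (1 / 3)   # time to climb back to w_max
        return c * (t - k) ** 3 + w_max

    k = ((100.0 * (1 - 0.7)) / 0.4) ** (1 / 3)      # ~4.2 for these numbers
    for t in (0.0, k / 2, k, 2 * k):
        print(f"t = {t:4.1f}  W = {cubic_window(t):6.1f}")
    # Starts at 70 (= beta · Wmax), flattens to 100 (= Wmax) at t = K,
    # then accelerates past Wmax to probe for new capacity.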

Why Not Logistic?

Both cubic and logistic are S-curves. So why did TCP choose cubic?

The key difference is at the inflection point:

  • Logistic: maximum growth rate at inflection (aggressive → conservative)
  • Cubic: minimum growth rate at inflection (conservative → aggressive)

TCP CUBIC places its inflection point at Wmax, the window size where it last detected congestion. This means:

  1. Before Wmax: cautiously approaching the danger zone
  2. At Wmax: most conservative (slowest growth)
  3. After Wmax: aggressive again, probing for more bandwidth

Logistic can't do this. It approaches Wmax only as an asymptote, never exceeding it. But network capacity changes! CUBIC needs to probe past the old limit.

Logistic mindset: "Wmax is the ceiling"     (stuck forever)
CUBIC mindset:    "Wmax is a checkpoint"    (probe beyond)
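
To make the inflection-point contrast concrete, here is a rough sketch comparing the numerical growth rate of a logistic curve and a cubic curve at their respective inflection points (all constants are toy values):

    from math import exp

    def logistic(t, cap=100.0, r=1.0, t0=5.0):
        return cap / (1 + exp(-r * (t - t0)))   # inflection at t = t0

    def cubic(t, w_max=100.0, c=0.4, k=4.2):
        return c * (t - k) ** 3 + w_max         # inflection at t = k

    def growth_rate(f, t, h=1e-3):
        return (f(t + h) - f(t - h)) / (2 * h)  # central-difference derivative

    print(growth_rate(logistic, 5.0))   # ~25: logistic grows FASTEST at its inflection
    print(growth_rate(cubic, 4.2))      # ~0:  cubic grows SLOWEST at its inflection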

Why Does e Appear in Logistic?

This connects to the sigmoid derivation above. The number e appears because:

  1. Continuous growth: If the window grows in proportion to itself, dW/dt = r·W, solving gives W(t) = W₀·e^(rt)
  2. e^x is special: It's the only function (up to a constant factor) that equals its own derivative
  3. Logistic = bounded exponential: Adding a capacity limit K gives the logistic curve, dW/dt = r·W·(1 - W/K) (worked out just below)
  4. Sigmoid = inverse of log-odds: Since log-odds uses ln (base e), the inverse uses e
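
To make item 3 concrete, here is the standard solution of that bounded-growth equation, written in LaTeX (r is the growth rate, K the capacity, t₀ the inflection time):

    \frac{dW}{dt} = r\,W\left(1 - \frac{W}{K}\right)
    \quad\Longrightarrow\quad
    W(t) = \frac{K}{1 + e^{-r(t - t_0)}}

That is the sigmoid shape again, with r(t - t₀) playing the role of z.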

The unified view:

All three (continuous growth, bounded growth, and log-odds inversion) lead to e.

The Irony

I started with TCP AIMD, discovered odds and sigmoid, and now I’m back to TCP — but CUBIC instead of AIMD. And CUBIC explicitly rejects logistic in favor of cubic, precisely because of the inflection point behavior.

The journey was: linear TCP → odds → sigmoid/logistic → back to TCP, which chose cubic instead of logistic.

Full circle, with a twist.

The Insight: Odds as Coordinate System

Here’s the meta-lesson.

Probability is what we care about — it’s interpretable, it’s what we communicate. But odds and log-odds are computational coordinate systems.

Operation          Easier in
Interpret result   Probability
Bayes update       Odds (multiply)
Combine evidence   Log-odds (add)
Linear modeling    Log-odds

They’re isomorphic — same information, lossless conversion. You choose based on what operations you need.

This pattern is everywhere:

  • Signals: work in frequency domain (FFT), interpret in time domain
  • Multiplication: work in log space (becomes addition), interpret in linear space
  • Probability: work in log-odds, interpret in probability

Odds isn’t “more real” than probability. It’s a coordinate system you choose for convenience.

The Thread

One question: “Why does AIMD converge?”

One thread, pulled honestly:

TCP AIMD convergence
  → geometric reasoning (phase space)
  → "smaller gains more proportionally"
  → ratio vs proportion
  → r = p/(1-p) (odds!)
  → log-odds (logit)
  → sigmoid (derived, not memorized)
  → Bayesian updating
  → "it's all coordinate systems"
  → TCP CUBIC (why cubic, not logistic?)
  → e in continuous growth
  → full circle!

Four domains. One underlying idea.

This is what learning feels like when you follow questions instead of memorizing answers. You start with TCP and end up understanding sigmoid. Not because someone told you they’re connected, but because you pulled the thread yourself.

The whiteboard derivation is mine now. Next time I see sigmoid in a neural network, I won’t think “squashes to [0,1].” I’ll think “inverts log-odds.”

That’s the difference between knowing and understanding.