Tor’s growing popularity and user diversity has resulted in network performance problems that are not well understood, though performance is understood to be a significant factor in Tor’s security. A large body of work has attempted to solve performance problems without a complete understanding of where congestion occurs in Tor. In this article, we first study congestion in Tor at individual relays as well as along the entire end-to-end Tor path and find that congestion occurs almost exclusively in egress kernel socket buffers. We then analyze Tor’s socket interactions and discover two major contributors to Tor’s congestion: Tor writes sockets sequentially, and Tor writes as much as possible to each socket. To improve Tor’s performance, we design, implement, and test KIST: a new socket management algorithm that uses real-time kernel information to dynamically compute the amount to write to each socket while considering all circuits of all writable sockets when scheduling cells. We find that, in the medians, KIST reduces circuit congestion by more than 30%, reduces network latency by 18%, and increases network throughput by nearly 10%. We also find that client and relay performance with KIST improves as more relays deploy it and as network load and packet loss rates increase. We analyze the security of KIST and find an acceptable performance and security tradeoff, as it does not significantly affect the outcome of well-known latency, throughput, and traffic correlation attacks. KIST has been merged and configured as the default socket scheduling algorithm in Tor version 0.3.2.1-alpha (released September 18, 2017) and became stable in Tor version 0.3.2.9 (released January 9, 2018). While our focus is Tor, our techniques and observations should help analyze and improve overlay and application performance, both for security applications and in general.