Using SQL Server’s SNITrace to Troubleshoot Networking Issues
In the process of tracking down a few TCP 10054 issues (highlighted here: https://bobsql.com/sql-mysteries-why-is-my-sql-server-experiencing-lots-of-17830-tcp-10054-errors) I also used the SNITrace (SNI Trace) capabilities.
SNITrace, formally BIDTrace, is an deeper level of tracing for the Microsoft SQL Server TDS consumers and providers.
Terms and Brief History
Before I dive into the details here is a brief history and related terminology.
BID
Built-In Diagnostics (BID) is the former name replaced with SNI. The original tracing was added as BIDTrace and since then many of the functions and macros were renamed to SNITrace* as improvements were made and features added.
SNI
Server Network Interface (SNI) is centralized code, shared among the SQL Server and SQL Server client providers, both Windows and Linux that can send, parse and act upon TDS communications.
ODS
Open Data Services (ODS) was a separate, public API set allowing a TDS gateway to be created. The API was deprecated several releases ago and the behaviors consumed internally by SQL Server. ODS provides an evening model for processing TDS requests as parsed by the SNI layer.
ODBC Trace
To capture an ODBC trace use the driver manager, trace capabilities (https://docs.microsoft.com/en-us/sql/odbc/admin/setting-tracing-options?view=sql-server-ver15)
-
Launch odbcad32.exe
-
Select the tracing tab and then start tracing
-
Execute the reproduction
-
Stop tracing and view the log file
Execution: sqlcmd -E -S.sql17
ODBC trace records the ENTER and EXIT activities for the ODBC APIs as well as the various parameters, values and messages.
740c-592c ENTER SQLDriverConnectW
HDBC 0x0000019DE23D8CC0
HWND 0x0000000000000000
WCHAR * 0x00007FFF581D21C0 [ -3] “****** 0”
SWORD -3
WCHAR * 0x00007FFF581D21C0
SWORD -3
SWORD * 0x0000000000000000
UWORD 0 <SQL_DRIVER_NOPROMPT>
740c-592c EXIT SQLDriverConnectW with return code 1 (SQL_SUCCESS_WITH_INFO)
HDBC 0x0000019DE23D8CC0
HWND 0x0000000000000000
WCHAR * 0x00007FFF581D21C0 [ -3] “****** 0”
SWORD -3
WCHAR * 0x00007FFF581D21C0 <Invalid buffer length!> [-3]
SWORD -3
SWORD * 0x0000000000000000
UWORD 0 <SQL_DRIVER_NOPROMPT>
DIAG [01000] [Microsoft][ODBC Driver 13 for SQL Server][SQL Server]Changed database context to ‘master’. (5701)
DIAG [01000] [Microsoft][ODBC Driver 13 for SQL Server][SQL Server]Changed language setting to us_english. (5703)
Going Beyond ODBC Trace
For my specific issue I was attempting to debug why the SQLDriverConnectW errored with TCP 10054 because it was failing the pre-login handshake. For that I first needed to understand and capture the flow of the handshake activities and this is where SNITrace is helpful.
10054 The Direction Matters
Understanding where the RST (TCP closure) occurs if helpful in understanding the problem and how to troubleshoot it.
Each stage in the login process uses asynchronous read and write timeouts. For example, when the SQL Server sends the initial Pre-Login Response the SQL Server posts a read request using a 5 second timeout. If the first Ssl/Tls context packet takes longer than 5 seconds to arrive, from the client, SQL Server closes the TCP connection (RST occurs from the SQL Server) and the next write attempt from the SQL Client fails with a 10054 error, the connection has been reset. The Ssl/Tls reads use similar timeout mechanisms and the SQL client provider honors the ‘login timeout.’ For example, if the login packet is sent by the client but the SQL Server does not respond before the established login timeout (30 second default) the client closes the TCP connection (RST from the SQL Client provider) and the server records the 10054, connection reset error and terminates the login attempt.
Before diving into SNITrace let’s breakdown the network trace of a successful and failed login.
Good Login Sequence (SYN, ACK, … FIN)
10054 sequence from SQL Server | Start login at 48.8 and server terminates at 52.9 (SYN, ACK … RST)