【SQL server】利用状況モニターの情報取得について

個人的にパフォーマンス障害時によく利用する「利用状況モニター」ですが、いまいち情報取得の方法や周期について理解が不足しておりましたので、ここで整理をしてみます。
（※プロファイラーの取得結果SQLをそのまま記載してありますので、内容問題がある場合はSQL部分の削除を致しますのでご一報願います。）

SQL server2016のSSMSで利用状況モニターを開くと以下のようになっています。
※SQL server2014までですと、一番下の「アクティブなコストの高いクエリ」はありません。
以下、ペインを開いている間は、開いているペインそれぞれに対し、一定の周期で内部でSQLを発行して情報を採取する仕組みとなっています。

f:id:utiowa:20160922195040p:plain

◆概要
「概要」ペインでは、全体のサマリのような形でグラフィカルに内部を確認できます。
「最新のコストの高いクエリ」ペインを除くそれぞれの情報採取周期は、ここの「概要」ペインで指定します。
f:id:utiowa:20160922195050p:plain

情報採取の周期は右クリック⇒更新間隔を選択して、任意の時間を選択します。
尚、デフォルトでは、10秒に設定されております。

f:id:utiowa:20160922195101p:plain

上記で指定した更新間隔に基づき、「概要」ペインを開いている間のみ情報を採取します。

情報採取で内部で実行されるSQLは以下となります。

EXEC #am_generate_waitstats;

SELECT 
    SUM (interval_wait_time_per_sec) / 1000 AS avg_waiting_task_count
FROM #am_resource_mon_snap;



SET NOCOUNT ON;

DECLARE @previous_collection_time datetime;
DECLARE @previous_total_io_mb numeric (28, 1);
DECLARE @current_collection_time datetime;
DECLARE @current_total_io_mb numeric (28, 1);
DECLARE @mb_per_sec numeric (20, 1);

-- Get the previous snapshot's total I/O
SELECT TOP 1 @previous_collection_time = collection_time, @previous_total_io_mb = total_io_bytes 
FROM #am_dbfileio
ORDER BY collection_time DESC;

-- Get the current total I/O.  
SET @current_collection_time = GETDATE();
SELECT @current_total_io_mb = SUM(num_of_bytes_read + num_of_bytes_written) / 1024.0 / 1024.0
FROM sys.dm_io_virtual_file_stats(default, default);

-- Calc the total I/O rate (MB/sec) for the just-completed time interval. 
-- Round values larger than 2MB/sec to the nearest MB.
SET @mb_per_sec = (@current_total_io_mb - @previous_total_io_mb) / DATEDIFF (millisecond, @previous_collection_time, @current_collection_time) * 1000;
IF @mb_per_sec > 2 
BEGIN
  SET @mb_per_sec = ROUND (@mb_per_sec, 0);
END; 

-- Save off current total I/O
INSERT INTO #am_dbfileio (collection_time, total_io_bytes) 
VALUES (@current_collection_time, @current_total_io_mb);

-- Return the I/O rate for the just-completed time interval. 
SELECT ISNULL (@mb_per_sec, 0) AS mb_per_sec;

-- Get rid of all but the most recent snapshot's data
DELETE FROM #am_dbfileio WHERE collection_time < @current_collection_time;



SET NOCOUNT ON;

DECLARE @previous_collection_time datetime;
DECLARE @previous_request_count bigint;
DECLARE @current_collection_time datetime;
DECLARE @current_request_count bigint;
DECLARE @batch_requests_per_sec bigint;
DECLARE @interval_sec bigint;

-- Get the previous snapshot's time and batch request count
SELECT TOP 1 @previous_collection_time = collection_time, @previous_request_count = request_count 
FROM #am_request_count
ORDER BY collection_time DESC;

-- Get the current total time and batch request count
SET @current_collection_time = GETDATE();
SELECT @current_request_count = cntr_value 
FROM sys.sysperfinfo
WHERE counter_name = 'Batch Requests/sec' COLLATE Latin1_General_BIN;

SET @interval_sec = 
    -- Avoid divide-by-zero
    CASE
        WHEN DATEDIFF (second, @previous_collection_time, @current_collection_time) = 0 THEN 1
        ELSE DATEDIFF (second, @previous_collection_time, @current_collection_time)
    END;

-- Calc the Batch Requests/sec rate for the just-completed time interval. 
SET @batch_requests_per_sec = (@current_request_count - @previous_request_count) / @interval_sec;

-- Save off current batch count
INSERT INTO #am_request_count (collection_time, request_count) 
VALUES (@current_collection_time, @current_request_count);

-- Return the batch requests/sec rate for the just-completed time interval. 
SELECT ISNULL (@batch_requests_per_sec, 0) AS batch_requests_per_sec;

-- Get rid of all but the most recent snapshot's data
DELETE FROM #am_request_count WHERE collection_time < @current_collection_time;

◆プロセス
「プロセス」ペインでは、セッションID毎の実行プロセスが確認できます。

f:id:utiowa:20160922195135p:plain

さらに右クリックを押下することで、以下を確認することができます。
詳細 ⇒実行されているSQLテキスト
ライブ実行プランの表示 ⇒後述
強制終了 ⇒kill によるセッションkill

f:id:utiowa:20160922195222p:plain

「概要」ペインで指定した更新間隔に基づき、「プロセス」ペインを開いている間のみ情報を採取します。

情報採取で内部で実行されるSQLは以下となります。

WITH profiled_sessions as (
	SELECT DISTINCT session_id profiled_session_id from sys.dm_exec_query_profiles
)
SELECT 
   [Session ID]    = s.session_id, 
   [User Process]  = CONVERT(CHAR(1), s.is_user_process),
   [Login]         = s.login_name,   
   [Database]      = case when p.dbid=0 then N'' else ISNULL(db_name(p.dbid),N'') end, 
   [Task State]    = ISNULL(t.task_state, N''), 
   [Command]       = ISNULL(r.command, N''), 
   [Application]   = ISNULL(s.program_name, N''), 
   [Wait Time (ms)]     = ISNULL(w.wait_duration_ms, 0),
   [Wait Type]     = ISNULL(w.wait_type, N''),
   [Wait Resource] = ISNULL(w.resource_description, N''), 
   [Blocked By]    = ISNULL(CONVERT (varchar, w.blocking_session_id), ''),
   [Head Blocker]  = 
        CASE 
            -- session has an active request, is blocked, but is blocking others or session is idle but has an open tran and is blocking others
            WHEN r2.session_id IS NOT NULL AND (r.blocking_session_id = 0 OR r.session_id IS NULL) THEN '1' 
            -- session is either not blocking someone, or is blocking someone but is blocked by another party
            ELSE ''
        END, 
   [Total CPU (ms)] = s.cpu_time, 
   [Total Physical I/O (MB)]   = (s.reads + s.writes) * 8 / 1024, 
   [Memory Use (KB)]  = s.memory_usage * (8192 / 1024), 
   [Open Transactions] = ISNULL(r.open_transaction_count,0), 
   [Login Time]    = s.login_time, 
   [Last Request Start Time] = s.last_request_start_time,
   [Host Name]     = ISNULL(s.host_name, N''),
   [Net Address]   = ISNULL(c.client_net_address, N''), 
   [Execution Context ID] = ISNULL(t.exec_context_id, 0),
   [Request ID] = ISNULL(r.request_id, 0),
   [Workload Group] = ISNULL(g.name, N''),
   [Profiled Session Id] = profiled_session_id
FROM sys.dm_exec_sessions s LEFT OUTER JOIN sys.dm_exec_connections c ON (s.session_id = c.session_id)
LEFT OUTER JOIN sys.dm_exec_requests r ON (s.session_id = r.session_id)
LEFT OUTER JOIN sys.dm_os_tasks t ON (r.session_id = t.session_id AND r.request_id = t.request_id)
LEFT OUTER JOIN 
(
    -- In some cases (e.g. parallel queries, also waiting for a worker), one thread can be flagged as 
    -- waiting for several different threads.  This will cause that thread to show up in multiple rows 
    -- in our grid, which we don't want.  Use ROW_NUMBER to select the longest wait for each thread, 
    -- and use it as representative of the other wait relationships this thread is involved in. 
    SELECT *, ROW_NUMBER() OVER (PARTITION BY waiting_task_address ORDER BY wait_duration_ms DESC) AS row_num
    FROM sys.dm_os_waiting_tasks 
) w ON (t.task_address = w.waiting_task_address) AND w.row_num = 1
LEFT OUTER JOIN sys.dm_exec_requests r2 ON (s.session_id = r2.blocking_session_id)
LEFT OUTER JOIN sys.dm_resource_governor_workload_groups g ON (g.group_id = s.group_id)
LEFT OUTER JOIN sys.sysprocesses p ON (s.session_id = p.spid)
LEFT OUTER JOIN profiled_sessions ON profiled_session_id = s.session_id
ORDER BY s.session_id;

◆リソースの待機
「リソースの待機」ペインでは、待機イベント毎に待機している時間および待機数を表示できます。

f:id:utiowa:20160922195308p:plain

「概要」ペインで指定した更新間隔に基づき、「リソースの待機」ペインを開いている間のみ情報を採取します。

情報採取で内部で実行されるSQLは以下となります。

EXEC #am_generate_waitstats;

SELECT 
    category_name AS [Wait Category], 
    SUM (interval_wait_time_per_sec) AS [Wait Time (ms/sec)], 
    SUM (weighted_average_wait_time_per_sec) AS [Recent Wait Time (ms/sec)], 
    SUM (interval_avg_waiter_count) AS [Average Waiter Count], 
    SUM (resource_wait_time_cumulative) /1000 AS [Cumulative Wait Time (sec)]
FROM #am_resource_mon_snap
GROUP BY category_name 
ORDER BY SUM (weighted_average_wait_time_per_sec) DESC;

◆データファイル I/O
「データファイル I/O」ペインでは、各データベースのデータファイルおよびログファイルなどにおけるI/O情報を確認することができます。

f:id:utiowa:20160922195342p:plain

「概要」ペインで指定した更新間隔に基づき、「データファイル I/O」ペインを開いている間のみ情報を採取します。

情報採取で内部で実行されるSQLは以下となります。

DECLARE @current_collection_time datetime;
SET @current_collection_time = GETDATE();

-- Grab a snapshot
INSERT INTO #am_dbfilestats
SELECT 
    @current_collection_time AS collection_time, 
    d.name AS [Database], 
    f.physical_name AS [File], 
    (fs.num_of_bytes_read / 1024.0 / 1024.0) [Total MB Read], 
    (fs.num_of_bytes_written / 1024.0 / 1024.0) AS [Total MB Written], 
    (fs.num_of_reads + fs.num_of_writes) AS [Total I/O Count], 
    fs.io_stall AS [Total I/O Wait Time (ms)], 
    fs.size_on_disk_bytes / 1024 / 1024 AS [Size (MB)]
FROM sys.dm_io_virtual_file_stats(default, default) AS fs
INNER JOIN sys.master_files f ON fs.database_id = f.database_id AND fs.file_id = f.file_id
INNER JOIN sys.databases d ON d.database_id = fs.database_id; 

-- Get the timestamp of the previous collection time
DECLARE @previous_collection_time datetime;
SELECT TOP 1 @previous_collection_time = collection_time 
FROM #am_dbfilestats 
WHERE collection_time < @current_collection_time
ORDER BY collection_time DESC;

DECLARE @interval_ms int;
SET @interval_ms = DATEDIFF (millisecond, @previous_collection_time, @current_collection_time); 

-- Return the diff of this snapshot and last
SELECT 
    cur.[Database], 
    cur.[File] AS [File Name], 
    CONVERT (numeric(28,1), (cur.[Total MB Read] - prev.[Total MB Read]) * 1000 / @interval_ms) AS [MB/sec Read], 
    CONVERT (numeric(28,1), (cur.[Total MB Written] - prev.[Total MB Written]) * 1000 / @interval_ms) AS [MB/sec Written], 
    -- protect from div-by-zero
    CASE 
        WHEN (cur.[Total I/O Count] - prev.[Total I/O Count]) = 0 THEN 0
        ELSE
            (cur.[Total I/O Wait Time (ms)] - prev.[Total I/O Wait Time (ms)]) 
                / (cur.[Total I/O Count] - prev.[Total I/O Count])
    END AS [Response Time (ms)]
FROM #am_dbfilestats AS cur
INNER JOIN #am_dbfilestats AS prev ON prev.[Database] = cur.[Database] AND prev.[File] = cur.[File]
WHERE cur.collection_time = @current_collection_time 
    AND prev.collection_time = @previous_collection_time;

-- Delete the older snapshot
DELETE FROM #am_dbfilestats
WHERE collection_time != @current_collection_time;

◆最新のコストの高いクエリ
「最新のコストの高いクエリ」ペインでは、クエリ毎のCPUTimeや平均実行時間などのパフォーマンス情報を確認することができます。

f:id:utiowa:20160922195435p:plain

右クリックを押下すると以下を確認できます。

クエリテキストの編集　　　⇒SQLテキストを一部出力
実行プランの表示　　　　　⇒推定の実行プランを表示

f:id:utiowa:20160922195447p:plain

「概要」ペインで指定した更新間隔ではなく、こちらは「最新のコストの高いクエリ」ペインを開いている間のみ30秒ごとに情報を採取します。

※SQLプロファイラーでトレースした結果、20秒ごとに採取しているように見えましたが、公式には30秒ごととのこと。
情報採取で内部で実行されるSQLは以下となります。

EXEC #am_get_querystats

◆アクティブなコストの高いクエリ
「アクティブなコストの高いクエリ」ペインでは、現在実行中のクエリのみに対して、セッション毎のSQLのCPUTimeや平均実行時間などのパフォーマンス情報を確認することができます。

f:id:utiowa:20160922195553p:plain

右クリックを押下すると以下を確認できます。

クエリテキストの編集　　　⇒SQLテキストを一部出力
ライブ実行プランの表示　　⇒後述
実行プランの表示　　　　　⇒推定の実行プランを表示

f:id:utiowa:20160922195643p:plain

「概要」ペインで指定した更新間隔に基づき、「アクティブなコストの高いクエリ」ペインを開いている間のみ情報を採取します。

情報採取で内部で実行されるSQLは以下となります。

WITH profiled_sessions as (
	SELECT DISTINCT session_id profiled_session_id from sys.dm_exec_query_profiles
)
SELECT TOP 10 SUBSTRING(qt.TEXT, (er.statement_start_offset/2)+1,
((CASE er.statement_end_offset
WHEN -1 THEN DATALENGTH(qt.TEXT)
ELSE er.statement_end_offset
END - er.statement_start_offset)/2)+1) as [Query],
er.session_id as [Session Id],
er.cpu_time as [CPU (ms/sec)],
db.name as [Database Name],
er.total_elapsed_time as [Elapsed Time],
er.reads as [Reads],
er.writes as [Writes],
er.logical_reads as [Logical Reads],
er.row_count as [Row Count],
mg.granted_memory_kb as [Allocated Memory],
mg.used_memory_kb as [Used Memory],
mg.required_memory_kb as [Required Memory],
/* We must convert these to a hex string representation because they will be stored in a DataGridView, which can't handle binary cell values (assumes anything binary is an image) */
master.dbo.fn_varbintohexstr(er.plan_handle) AS [sample_plan_handle], 
er.statement_start_offset as [sample_statement_start_offset],
er.statement_end_offset as [sample_statement_end_offset],
profiled_session_id as [Profiled Session Id]
FROM 
sys.dm_exec_requests er
LEFT OUTER JOIN sys.dm_exec_query_memory_grants mg 
	ON er.session_id = mg.session_id
LEFT OUTER JOIN profiled_sessions
	ON profiled_session_id = er.session_id
CROSS APPLY sys.dm_exec_sql_text(er.sql_handle) qt,
sys.databases db
WHERE db.database_id = er.database_id
AND er.session_id  <> @@spid

ライブ実行プランの表示は、拡張イベントでquery_post_complilation_showplanイベントを取得している場合、SQL server2016の新機能であるライブ実行プランの表示をさせることができます。
現在実行中のプランに対し、実際の実行プラン(actualなplan)のどのアクセスパスを実行中か確認することができる機能となっております。

利用状況モニターは、ペインを閉じている間は情報採取しないようですので、特にパフォーマンス障害中などは不要なペインは閉じて必要なもののみ開いておくことが必要です。
また、「最新のコストの高いクエリ」ペインは30秒ごとに更新とのことですので、リアルタイムで見る際には更新間隔に注意が必要ようそうです。
以下まとめ。

ペイン名	採取周期	採取トリガー
概要	"更新間隔"の設定値	ペインを開いているとき
プロセス	概要ペインと同じ	ペインを開いているとき
リソースの待機	概要ペインと同じ	ペインを開いているとき
データファイル I/O	概要ペインと同じ	ペインを開いているとき
最新のコストの高いクエリ	30秒毎	ペインを開いているとき
アクティブなコストの高いクエリ	概要ペインと同じ	ペインを開いているとき

MSの解説も以下に貼っておきます。
https://technet.microsoft.com/ja-jp/library/cc879320(v=sql.105).aspx

長時間リアルタイムで実行中の高負荷SQLを監視するのに、「アクティブなコストの高いクエリ」を監視するのは、非常に効果が高いと思います。
以下は、長時間化しているSQLを監視しているときの画面キャプチャですが、「最新のコストの高いクエリ」「アクティブなコストの高いクエリ」ペインそれぞれで同じSQLを監視しているはずなのに監視結果に大きな差が出てしまっています。
これは「アクティブなコストの高いクエリ」がセッション毎に情報を取得しているため、実質そのSQL単体をリアルタイムで監視できていることに対し、「最新のコストの高いクエリ」は、過去分や他セッション分を含めて実行回数で割っているので、結果がサマられてしまっているのではないかと考えられます。

f:id:utiowa:20160922195749p:plain