I recently ran into an interesting issue that drove me crazy for a couple of days! A very frequently executed statement on a database was generating some waits on “SQL*Net more data from client”. ASH shows the statement, as does the V$SESSION view, but the statement never shows up in the V$SQL view nor in the AWR tables. Here’s a brief run down of how the investigation went.
The reason for looking at this in the first place was that the application was timing out for brief periods of time. So a quick look at an AWR report for the time period showed a significant amount of time being spent waiting on the “SQL*Net more data from client” event. This event was taking a much larger slice of the pie than we saw during periods of “normal” activity. Here’s a little excerpt from one of the AWR reports:
Top 5 Timed Foreground Events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Avg
wait % DB
Event Waits Time(s) (ms) time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
DB CPU 2,045 57.7
SQL*Net more data from client 165,760 868 5 24.5 Network
db file sequential read 58,083 449 8 12.7 User I/O
log file sync 33,656 85 3 2.4 Commit
direct path read 22,026 42 2 1.2 User I/O
...
% of Waits
-----------------------------------------------
Total
Event Waits <1ms <2ms <4ms <8ms <16ms <32ms <=1s >1s
-------------------------- ----- ----- ----- ----- ----- ----- ----- ----- -----
SQL*Net break/reset to cli 833 95.1 .6 .2 .1 3.6 .4
SQL*Net message to client 7537K 100.0 .0
SQL*Net more data from cli 165K 87.5 .4 .1 .0 .0 7.1 4.8
SQL*Net more data to clien 410K 100.0 .0
Obviously this is not good. This is on an 11g system so the AWR report provides a nice little histogram section on the wait events. The histogram indicates that most of these events are under 1ms, but a significant percentage are in the 16ms – 1sec range. So the question was, what could be causing those waits.
No problem, just check the Active Session History, find the sql_id for the statement(s) waiting on this event, and then look in V$SQL for the text. But not so fast bucko!
SQL> !cat find_sql_waiting_on_event.sql
select event, sql_id, count(*),
avg(time_waited) avg_time_waited
from v$active_session_history
where event like nvl('&event','%more data from%')
group by event, sql_id
order by event, 3
/
SQL> @find_sql_waiting_on_event
Enter value for event:
EVENT SQL_ID COUNT(*)
------------------------------ ------------- ----------
SQL*Net more data from client dkc2kbj7kdsu7 21
SQL*Net more data from client 62
SQL*Net more data from client 40du2847gh6sq 156
SQL*Net more data from client 7vtb7h3kpfat0 2965
SQL> -- Ah there's the culpret, 7vtb7h3kpfat0
SQL>
SQL> @find_sql
Enter value for sql_text:
Enter value for sql_id: 7vtb7h3kpfat0
Enter value for plan_hash_value:
no rows selected
SQL> -- hmm, must have aged out of the shared pool (seems unlikely since ASH has it so many times)
SQL> -- but maybe it's in the AWR tables
SQL>
SQL> @find_sql_awr
Enter value for sql_text:
Enter value for sql_id: 7vtb7h3kpfat0
no rows selected
SQL> -- nope, not there either
So I tried a couple of things to see if I could prove to myself that it was never in V$SQL (and thus never in the AWR tables). First I tried coloring the statement with the dbms_workload_repository.add_colored_sql procedure. (See Dion Cho’s post on this feature for more info). No luck! Then I tried using Tanel Poder’s sample.sql script (more info on that here) to run a bunch of queries against the V$SQLAREA view to see if the statement ever showed up. It did not! I then traced (10046) a bunch of the sessions that were executing the statement, hoping that the trace files would have the statement text. They did not, nor was the SQL_ID even contained in any of the trace files!
Turns out it is never in the shared pool and it is never in the AWR tables. But it is captured by ASH and it shows up in the SQL_ID column in V$SESSION. Here’s a little more output from the system:
SQL> select username, sid, serial#, status, sql_id, event, seconds_in_wait
2 from v$session
3 where username like nvl('&username',username)
4 and sid like nvl('&sid',sid)
5 and sql_id='7vtb7h3kpfat0'
6* order by username, sid, serial#;
Enter value for username:
Enter value for sid:
USERNAME SID SERIAL# STATUS SQL_ID EVENT SECONDS_IN_WAIT
------------------------- ------ ---------- -------- ------------- ------------------------------ ---------------
XX_OWNER 4661 4929 INACTIVE 7vtb7h3kpfat0 SQL*Net message from client 823
XX_OWNER 4676 3835 INACTIVE 7vtb7h3kpfat0 SQL*Net message from client 0
XX_OWNER 4682 9170 INACTIVE 7vtb7h3kpfat0 SQL*Net message from client 16
XX_OWNER 4699 11522 INACTIVE 7vtb7h3kpfat0 SQL*Net message from client 537
XX_OWNER 4712 9592 INACTIVE 7vtb7h3kpfat0 SQL*Net message from client 93
XX_OWNER 4717 6659 INACTIVE 7vtb7h3kpfat0 SQL*Net message from client 814
XX_OWNER 4727 25467 INACTIVE 7vtb7h3kpfat0 SQL*Net message from client 155
XX_OWNER 4739 8796 INACTIVE 7vtb7h3kpfat0 SQL*Net message from client 17
XX_OWNER 4755 7327 INACTIVE 7vtb7h3kpfat0 SQL*Net message from client 9
XX_OWNER 4800 4002 INACTIVE 7vtb7h3kpfat0 SQL*Net message from client 11
XX_OWNER 4826 15699 INACTIVE 7vtb7h3kpfat0 SQL*Net message from client 153
XX_OWNER 4838 6283 INACTIVE 7vtb7h3kpfat0 SQL*Net message from client 68
XX_OWNER 4849 11468 INACTIVE 7vtb7h3kpfat0 SQL*Net message from client 1052
XX_OWNER 5089 8610 INACTIVE 7vtb7h3kpfat0 SQL*Net message from client 1
XX_OWNER 5091 7709 INACTIVE 7vtb7h3kpfat0 SQL*Net message from client 31
XX_OWNER 5168 4728 INACTIVE 7vtb7h3kpfat0 SQL*Net message from client 585
XX_OWNER 5185 8298 INACTIVE 7vtb7h3kpfat0 SQL*Net message from client 157
XX_OWNER 5186 6969 INACTIVE 7vtb7h3kpfat0 SQL*Net message from client 1197
XX_OWNER 5210 10830 INACTIVE 7vtb7h3kpfat0 SQL*Net message from client 0
XX_OWNER 5232 6509 INACTIVE 7vtb7h3kpfat0 SQL*Net message from client 1
XX_OWNER 5251 12231 INACTIVE 7vtb7h3kpfat0 SQL*Net message from client 155
XX_OWNER 5400 5217 INACTIVE 7vtb7h3kpfat0 SQL*Net message from client 697
XX_OWNER 5423 12903 INACTIVE 7vtb7h3kpfat0 SQL*Net message from client 5
XX_OWNER 5434 14154 INACTIVE 7vtb7h3kpfat0 SQL*Net message from client 191
XX_OWNER 5441 33222 INACTIVE 7vtb7h3kpfat0 SQL*Net message from client 3
XX_OWNER 5442 5669 INACTIVE 7vtb7h3kpfat0 SQL*Net message from client 217
26 rows selected.
SQL> -- so there are a lot of sessions that execute our statement and then wait
SQL> -- (sometimes for a long time) before the user requests another action
SQL>
SQL> -- Using Tanel's sample.sql script - does about 1000 queries against V$SQLAREA in a sec or so
SQL>
SQL> @sample sql_text v$sqlarea "sql_id = '7vtb7h3kpfat0'" 1000
no rows selected
SQL> -- doesn't look like we'll ever catch it in the shared pool
Grasping at straws, I started looking for any views that had a sql_id and/or sql_text field, and I found one that I hadn’t thought to look in – V$OPEN_CURSOR. Ah, that’s probably a good place to look since surely a cursor must be created for this statement. Sure enough, this turned out to provide the necessary clue.
SQL> select distinct * from v$open_cursor
2 where rownum < 10
3* and sql_id like '%fat%';
SADDR SID USER_NAME ADDRESS HASH_VALUE SQL_ID SQL_TEXT LAST_SQL_
---------------- ------ ------------------------------ ---------------- ---------- ------------- ----------------------------------------- ---------
SQL_EXEC_ID
-----------
0000000210ED3980 4750 XX_OWNER 000000021AFEB748 3847695136 7vtb7h3kpfat0 table_4_9_fbd5_0_0_0
00000002111EC2E8 5043 SIGNON 0000000219E02E80 1527693606 6pfatm1dhxh96 SELECT u.app_user_id, u.username, u.pass
word, u.first_name,
000000021163BDA8 5451 XX_OWNER 000000021AFEB748 3847695136 7vtb7h3kpfat0 table_4_9_fbd5_0_0_0
00000002114067E0 5242 XX_OWNER 000000021AFEB748 3847695136 7vtb7h3kpfat0 table_4_9_fbd5_0_0_0
00000002114372F0 5260 SIGNON 0000000219E02E80 1527693606 6pfatm1dhxh96 SELECT u.app_user_id, u.username, u.pass
word, u.first_name,
0000000210F06FD8 4769 XX_OWNER 000000021AFEB748 3847695136 7vtb7h3kpfat0 table_4_9_fbd5_0_0_0
0000000211439E38 5261 SIGNON 0000000219E02E80 1527693606 6pfatm1dhxh96 SELECT u.app_user_id, u.username, u.pass
word, u.first_name,
00000002111E97A0 5042 SIGNON 0000000219E02E80 1527693606 6pfatm1dhxh96 SELECT u.app_user_id, u.username, u.pass
word, u.first_name,
000000021161E190 5440 XX_OWNER 000000021AFEB748 3847695136 7vtb7h3kpfat0 table_4_9_fbd5_0_0_0
9 rows selected.
So the SQL_ID was present in the V$OPEN_CURSOR view and there was a value in the SQL_TEXT field. Unfortunately, it didn't provide the text of the original statement, but at least I now had something to work with. A quick google for table_4_9_fbd5_0_0_0 (which is what was in the SQL_TEXT field) turned up nothing, of course. But a little digging turned up this post on Ask Tom. You'll have to read down a ways in that post to find that the real culpret was LOBs. Also, see this post on the Oracle-l list (which is what I originally found that pointed me to the Ask Tom article).
So it appears that the issue is that LOBs are being accessed, which causes Oracle to generate a new SQL_ID for the LOB access part of the execution. Apparently, these types of statements do not get exposed via the V$SQL and V$SQLAREA views and thus do not get captured by AWR. There are other types of statements that exhibit this behavior as well, by the way. The Tom Kyte post talks about nested tables for example. So back to the question, what statement generated the SQL_ID with the missing sql_text in the first place? We'll the first clue came from the Tom Kyte post. Apparently the middle bit of the text in the V$OPEN_CURSOR view is a hex representation of the object_id of the object that is being accessed (in our case, "fbd5"). So if we convert it to decimal we can find the object that is contains the LOB, like so:
SQL> select owner, object_name, object_type
2 from dba_objects
3 where object_id = (select to_number('&hex_value','XXXXXX') from dual);
Enter value for hex_value: fbd5
OWNER OBJECT_NAME OBJECT_TYPE
------------------------------ ------------------------------ -------------------
XX_OWNER XXX_MY_TABLE TABLE
So now I know that I'm looking for a statement that is accessing XXX_MY_TABLE (note: names have been changed to protect the innocent). But how to find the statement, that's the question that was never answered in the Tom Kyte post. It was at about this point, while I was filling my coffee cup, that I had one of those ah-ha moments (usually I have them when I'm doing something else - not even actively thinking about the problem). Anyway, I got to thinking about the PREV_SQL_ID in the V$SESSION view. If a statement that accesses a LOB, immediately generates a new SQL_ID for the LOB access part of the execution, perhaps the original statement's SQL_ID would be in the PREV_SQL_ID field. And that's exactly what I found.
SQL> select username, sid, serial#, status, sql_id, prev_sql_id, event, seconds_in_wait
2 from v$session
3 where username like nvl('&username',username)
4 and sid like nvl('&sid',sid)
5 and sql_id='7vtb7h3kpfat0'
6* order by username, sid, serial#;
Enter value for username:
Enter value for sid:
USERNAME SID SERIAL# STATUS SQL_ID PREV_SQL_ID EVENT SECONDS_IN_WAIT
------------------------- ------ ---------- -------- ------------- ------------- ------------------------------ ---------------
XX_OWNER 4665 8707 INACTIVE 7vtb7h3kpfat0 78jdubkb65p2w SQL*Net message from client 1235
XX_OWNER 4745 9383 INACTIVE 7vtb7h3kpfat0 78jdubkb65p2w SQL*Net message from client 1
XX_OWNER 4792 23304 INACTIVE 7vtb7h3kpfat0 78jdubkb65p2w SQL*Net message from client 1235
XX_OWNER 4816 11098 INACTIVE 7vtb7h3kpfat0 78jdubkb65p2w SQL*Net message from client 0
XX_OWNER 4844 3050 INACTIVE 7vtb7h3kpfat0 78jdubkb65p2w SQL*Net message from client 773
XX_OWNER 4907 8184 INACTIVE 7vtb7h3kpfat0 78jdubkb65p2w SQL*Net message from client 232
XX_OWNER 4959 35967 INACTIVE 7vtb7h3kpfat0 78jdubkb65p2w SQL*Net message from client 633
XX_OWNER 5154 33701 INACTIVE 7vtb7h3kpfat0 78jdubkb65p2w SQL*Net message from client 25
XX_OWNER 5230 12792 INACTIVE 7vtb7h3kpfat0 78jdubkb65p2w SQL*Net message from client 510
XX_OWNER 5248 49715 INACTIVE 7vtb7h3kpfat0 78jdubkb65p2w SQL*Net message from client 456
XX_OWNER 5249 32300 INACTIVE 7vtb7h3kpfat0 78jdubkb65p2w SQL*Net message from client 147
XX_OWNER 5282 21331 INACTIVE 7vtb7h3kpfat0 78jdubkb65p2w SQL*Net message from client 182
XX_OWNER 5285 386 INACTIVE 7vtb7h3kpfat0 78jdubkb65p2w SQL*Net message from client 185
XX_OWNER 5295 792 INACTIVE 7vtb7h3kpfat0 78jdubkb65p2w SQL*Net message from client 40
XX_OWNER 5416 49780 INACTIVE 7vtb7h3kpfat0 78jdubkb65p2w SQL*Net message from client 7
XX_OWNER 5419 22645 INACTIVE 7vtb7h3kpfat0 78jdubkb65p2w SQL*Net message from client 890
XX_OWNER 5442 6200 INACTIVE 7vtb7h3kpfat0 78jdubkb65p2w SQL*Net message from client 1326
17 rows selected.
SQL> @find_sql_long
Enter value for sql_id: 78jdubkb65p2w
SQL_TEXT
----------------------------------------------------------------------
SELECT XYZ_COLUMN, BLAH ... FROM XXX_MY_TABLE ...
SQL> @find_sql
Enter value for sql_text:
Enter value for sql_id: 78jdubkb65p2w
SQL_ID CHILD PLAN_HASH EXECS AVG_ETIME AVG_LIO SQL_TEXT
------------- ------ ---------- ---------- ------------- ------------ -----------------------------------------
78jdubkb65p2w 0 147781538 9,534,632 .0006 11 SELECT XYZ_COLUMN, BLAH
78jdubkb65p2w 1 147781538 191 .0005 11
One last comment (and then I'll shut up):
I was surprised to find that the statement was short - only a few lines long. I was expecting a really long string that would not fit in a single SQL*Net packet. After all, we are waiting on the "SQL*Net more data FROM client" event, right? Which is supposed to be the time waited for an additional packet from the client when Oracle knows that more is coming. Clearly this was not the case. Rather it is waiting because Oracle is sending data back to the client that takes more than one packet (i.e. the LOB). You would expect that time would be logged under the "SQL*Net more data TO client" event in this case. However, it appears that time is logged to that event only until the packet is turned over to the network. It appears that Oracle then starts logging time to the "SQL*Net more data FROM client" event while it waits for the next response from the client. Therefore, the actual network transfer time is clocked to the wrong event. You can understand why the developers made this choice, because Oracle has no way of knowing how long it actually takes the packet to get to the client. It only knows how long the round trip takes. So maybe they should have named the event slightly differently, like "SQL*Net more data from client including the entire round trip network time" or something. Anyway, Tanel Poder has a post that briefly mentions this quirk of the SQL*Net events.
Note: this is somewhat speculative on my part as I have not tried to prove exactly where the "TO" event stops and the "FROM" event begins. But I am pretty sure Tanel is correct on this point.
So there you have it. Just a little story about wandering through the weeds on the way to enlightenment.
=============================
Note: This part was added after Dion Cho commented that he wasn't able to re-produce the behavior, so I gave it shot (I got close). Here's the output:
> sqlplus / as sysdba
SQL*Plus: Release 11.1.0.7.0 - Production on Mon Apr 20 14:59:25 2009
Copyright (c) 1982, 2008, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> -- session (SID 124) connected as KSO issues "select xml_clob from test_lob"
...
SQL> -- what's going on?
SQL> @sessions
Enter value for username:
Enter value for sid:
USERNAME SID SERIAL# STATUS SQL_ID EVENT SECONDS_IN_WAIT
------------------------- ---- ---------- -------- ------------- ------------------------------ ---------------
KSO 124 2402 ACTIVE cu98u9ukjqj19 SQL*Net message from client 0
SYS 128 3580 ACTIVE bsbmyw5thg8qf SQL*Net message to client 0
SQL> -- KSO is running cu98u9ukjqj19, what's that doing?
SQL> @find_sql
Enter value for sql_text:
Enter value for sql_id: cu98u9ukjqj19
no rows selected
SQL> -- so that's probably the LOB access, let's check
SQL> select * from v$open_cursor where sql_id like 'cu98u9ukjqj19';
SADDR SID USER_NAME ADDRESS HASH_VALUE SQL_ID SQL_TEXT LAST_SQL_ SQL_EXEC_ID
-------- ---- ----------------------- -------- ---------- ------------- ------------------------------------- --------- -----------
DB1A69C8 124 KSO E0BD4C0C 2770027561 cu98u9ukjqj19 table_4_9_19a1d_0_0_0
SQL> -- yep, LOB access, let' see if we can find the original statement
SQL> @sessionsp
Enter value for username:
Enter value for sid:
USERNAME SID SERIAL# STATUS SQL_ID PREV_SQL_ID EVENT SECONDS_IN_WAIT
------------------------- ---- ---------- -------- ------------- ------------- ------------------------------ ---------------
KSO 124 2402 ACTIVE cu98u9ukjqj19 9babjv8yq8ru3 direct path read 0
SYS 128 3580 ACTIVE grdgy5dcmqd0f 9babjv8yq8ru3 SQL*Net message to client 0
SQL> -- says prev statement is 9babjv8yq8ru3
SQL> @find_sql
Enter value for sql_text:
Enter value for sql_id: 9babjv8yq8ru3
SQL_ID CHILD PLAN_HASH EXECS AVG_ETIME AVG_LIO SQL_TEXT
------------- ------ ---------- ---------- ------------- ------------ ------------------------------------------------------------
9babjv8yq8ru3 1 0 39 .01 54 BEGIN DBMS_OUTPUT.GET_LINES(:LINES, :NUMLINES); END;
SQL> -- oops, SQL*Plusism - need to turn off serveroutput
/*
-- go to other session here and set serveroutput off and re-run select statement
*/
SQL> -- wait a little while until the statement finishes
SQL> @sessionsp
Enter value for username:
Enter value for sid:
USERNAME SID SERIAL# STATUS SQL_ID PREV_SQL_ID EVENT SECONDS_IN_WAIT
------------------------- ---- ---------- -------- ------------- ------------- ------------------------------ ---------------
KSO 124 2402 INACTIVE djqbj30qjyngw SQL*Net message from client 74
SYS 128 3580 ACTIVE grdgy5dcmqd0f 9babjv8yq8ru3 SQL*Net message to client 0
SQL> -- note that the SQL_ID field has been wiped after the statement completes - need to look a little closer at this behavior
SQL> -- the java application I was looking at in production did not cause this behavior - i.e. the SQL_ID field would not be cleared
SQL> -- so finding the SQL_ID for the LOB access and the PREV_SQL_ID for the original statement didn't happen in my test case
SQL> -- although the PREV_SQL_ID field does contain the value for the original statement
SQL> @find_sql
Enter value for sql_text:
Enter value for sql_id: djqbj30qjyngw
SQL_ID CHILD PLAN_HASH EXECS AVG_ETIME AVG_LIO SQL_TEXT
------------- ------ ---------- ---------- ------------- ------------ ------------------------------------------------------------
djqbj30qjyngw 0 825212230 2 2.61 53,065 select xml_clob from test_lob