| Home | Technical Info | Yukon | Software Products | Security | Consulting Training | Business | Resources |
SQL Server 7.0 is shaking up the OnLine Analytical Processing industry through low cost, amazing functionality, incredible ease of use and extreme high performance. One of its most powerful features is an extension of a core concept in OLAP, Fast Analysis Shared Multidimensional Information (FASMI). FASMI can be explained as:
We want to know that we request data, we get it returned as Fast as possible, regardless how complex the query. So that we can execute an Analysis of our business trends. Because the data that is key to our company is important to many people, that same data must Shared with many users. While viewing the data, we realize that it does not fit well in a relational, 2 dimensional, world. Our business has many hierarchies of data which must be expressed using a Multidimensional schema. All this is to provide the Information that knowledge workers need to execute their jobs in a timely fashion.
I would doubt that it was by accident that the word fast appears first in FASMI. It is for that reason, Microsoft's team of developers put together a unique ability to examine a history of queries executed against the cube and combined it with the ability to optimize the cube for all queries, or a given subset of queries.
It is important to know which users are accessing your cubes, and how long they are waiting for the returned data. However, you can only run the OLAP Services manager from Windows NT. It will not run from Windows 95/98. Also, only members of the local NT Administrators group can access the manager. This can be quite limiting in a highly secure environment. For those individuals that are concerned with the operation and performance of the cube, but do not have direct access to the shares on the server, or are not administrators, this can be very frustrating. This is where the query log comes in. However, I could never find any documentation, even within Microsoft, that explained the log.
So, when a major customer asked me how they could get at the data, I began to dig and uncovered the information presented here. Since this information is based upon my own research, I do not believe it to represent any trade secrets, as you could easily come up with the same solutions.
When you install OLAP Services, 2 independent Microsoft Access databases in the installation directory. On my current system, that is located at "C:\Program Files\OLAP Services". The files are named msmdrep.mdb and msmdqlog.mdb. You probably have already guessed that the first datbase I mentioned is the database in which the repository is stored. The second is our query log database.
The Query Log database is simple enough to understand for the most part. There is only one table, and it is called...
Get this...
QueryLog. I bet you wouldn't have guessed that one ;-)
For our purposes, the first few fields give only baseline information, including the database, User and cube accessed. Slice is mildly important, because it let's us know exactly which data slice was accessed to fullfill the request. It is the following fields that we care most about...
Obviously, the start time indicates the time and date that the query was executed. This gives us a good measure of trends over time, particularly when combined with the duration of the query in seconds. It is in Dataset that the most important data is retrieved. To many of you, when examining a new installation, it may only have 1 or 2 records, or even none. This is because OLAP Services is set to log only 1 out of every 10 queries that is passed through the system. I strongly recomend changing this setting in a development environment so that you can get accurate estimates of how long it takes to get to the data and build the resultset. A common query you could pose to the log might resemble:
SELECT DISTINCT MSOLAP_User, Dataset, Duration, SamplingRate
FROM QueryLog
WHERE StartTime > #1/1/2000# AND Duration > 2 AND MSOLAP_Cube = 'FoodMart'
ORDER BY 3 DESC
When you examine the Dataset column, it will no doubt look strange to
you. It certainly did to me anyway. It is this field that generated
my interest in creating this document. In my log, my first record looks
like 111211111111. If you are not familiar with some key OLAP
concepts, this is nearly undecipherable. As it turns out that, this is
really just a list of levels for each hierarchy used in the query. It was
understanding the hierarchy side of this that finally allowed me to break most
of the code.
Each dimension in a cube has at least one hierarchy, sometimes more. When I first began looking at the dataset field, I had not considered this fact, and assumed that each digit represented a given dimension. That worked for quite a while, until Time took its toll (a time dimension that is). It turned out that a time dimension that I was working with had 2 hierarchies, so I was running out of dimesions before running out of digits in the dataset.
If I had the following cube design:
Time
Quarter
PayPeriod
Region
Fact
I might get a dataset that looked like "1213", in which it is telling me that the query used the first level in the Time.Quarter hierarchy, the second level in the Time.Period Hierarchy, the third level in my Region dimension and the third level in the Fact dimension (some of you might take exception to my calling the fact table a dimension, but for the purpose of OLAP Services, it is included as a dimesion called "Measures").
Given this information, how do we translate the dataset into a list of dimensions, hierarchies and levels that were queried? Well, first, we must connect to the OLAP database and cube using ADO-MD. We do this using a typical ADO connection object with code much like:
Set con = CreateObject("ADODB.Connection")
con.Open("Provider=MSOLAP.1;Data Source=localhost;" + _
"Client Cache Size=25;Auto Synch Period=10000;" + _
"Initial Catalog=" & Database)
Now that we have created the connection, we will need to attach that connection to an ADO-MD catalog like this:
Set cat = CreateObject("ADOMD.Catalog")
Set cat.ActiveConnection = con
Next, we will iterate through the list of CubeDefs until we find a cube with a name that matches 'FoodMart". Once we have located the FoodMart cube, we will need to examine the available dimensions. This can be a bit tricky, as they are not required, or even inclined, to return in the same order as the cube definition. I used the following code to retrieve my data, however, you may be able think of a more efficient method.
'X is my current dimensional ordinal
For Each Dms In cub.Dimensions
If Dms.Properties("DIMENSION_ORDINAL").Value = x Then
Set Dimension = Dms
Exit For
End If
Next
Notice the convenient property called "DIMENSIONAL_ORDINAL", it is the key to the order in which the cube dimensions were created. In the above code fragment, I searched through all dimensions until I found a matching ordinal. If none existed, then I may not be finished just yet. As it turns out, if I create a dimension, then later delete it and replace it with another dimension, the original ordinal is forever lost. So, we must use some mechanism to know if we have exceeded the number of available ordinals. For simplicity, I have excluded that peice of code here.
After locating the appropriate dimension, you should then work through the available hierarchies within the dimension. In our case, we used code very similar to:
For Each hrc in Dimension.Hierarchies
For Each lvl in hrc.Levels
If lvl.Properties("LEVEL_NUMBER") = Y Then
Set Level = lvl
Exit For
End If
Next
'Process the level
Debug.Print "Retrieved: " + Dimension.Name + "." + Level.Caption
... 'Some other code
Next
If you are a quick study, you may have already wondered what happens when there are more than 9 levels in a dimesion. We can have up to 64 levels in any dimension or hierarchy. This produces an interesting side effect that I had a difficult time understanding. When the level ordinal hits 10, a colon (:) appears in the stream. At first, I thought it was producing specific codes to resolve the problem. One night while dreaming about 486 op-codes and CPU registers, I recalled that a colon was the next character in the ASCII characterset which follows 9. So, if you took the ASCII value of the character displayed and subtracted 48, you would receive the ordinal value of the level. So to achieve the above Y, we would have some code like:
Y = Asc(Mid(Dataset, X, 1)) - 48
It kinda makes you wonder if they did that to be tricky on purpose.
Now, one thing you will notice that this code produces, is a bunch of "(ALL)" levels. That is because any query that does not explicitly state a dimension, implies the default level of that dimension. In most cases, the "(ALL)" level is the default level. However, it does not have to be. When the default is not the all level, it can lead to some very confusing results in the resultset (but that is another document). In my code, I checked if it was the all level. If so, I ignored the level and continued to the next hierarchy/dimension.
Using something like the above code will provide you with most of the incformation you need to generate meaningful information from the query log. In fact, it can be quite useful in knowing which queries to optimize within the cube. Using a combination of DSO and the above ADO-MD code you can build complex applications which rebuild your cubes, pre-aggregate your measures, and modify which aggregations are generated based upon use. Given such capabilities, it is not unreasonable to expect 1 second query response times regardless of the size of the data or the complexity of the query. We can easily manage the specific queries asked, while ignoring those rarely asked which helps us to conserve disk space.
One thing that I have not accounted for, is the possibility of changing designs, including new dimensions, etc. Specifically when we are looking at a dataset from a cube design that no longer has the dimension referenced. I would like to hear if any of you have encountered this problem, and if so, what was your solution.
In the above example, I have only given you a case where there was only one level queried per dimension or hierarchy. What happens if there were levels queried? This might happen in the Measures dimension.
With SQL 2000, we will see a huge number of changes. It will be interesting to see how those changes affect the query log. With the second beta now available to the public, I am anxious to hear your reports.