Problem:

Our team needed to create a data set listing retail stores belonging to different territories for performance tests that involved store selection. The data set was to be used by virtual users signing-in as different territory owners. The challenge was that the territory and store numbers in the data set should be as random as possible. For example, if there are 20 territories and each territory has 10 stores, the data set should contain the 20 territories with their first stores, followed by the same 20 territories with their second stores and so on.

Solution:

Billy Tilson of CapTech solved this sorting issue using the RANK function of Teradata. In fact, this Online Analytical Processing (OLAP) function is part of the ANSI SQL-99 standard and so this approach can be used in most databases.

The SQL we used was something like,

SELECT

USERID,

TERRITORY,

STORE_NO,

RANK() OVER (PARTITION BY TERRITORY ORDER BY STORE_NO) AS RANK_ID

FROM STORE_DIMENSION

ORDER BY 4, 2

The output will be,

USERID

TERRITORY

STORE_NO

RANK_ID

user1

123

342134

1

user2

234

234567

1

user3

345

456789

1

user20

789

764590

1

user1

123

890021

2

user2

234

992435

2

user3

345

896543

2

user20

789

667843

3

user1

123

436577

3

user2

234

298766

3

user3

345

903345

3

user20

789

217789

3

The PARTITION BY clause determines the group (in this case the TERRITORY) on which the RANK function executes. If this is not used, the entire result set delivered by the FROM clause will be treated as a single group.

If what's needed is a simple sort by territory and store numbers, order by can be used. However, the problem here is the need to shuffle territories and stores to the maximum extent possible. The way the performance testing tool uses the data file is that multiple virtual users pick chunks of data out of it to generate the transactions. The objective of this sort is to make sure that when multiple virtual users throw out SQLs concurrently, the database cache does not end up fetching the same territory's or store's information from its cache. In this approach for sorting, the stores with rank 1 are the first stores from all the territories and those of rank 2 are the second stores from all the territories, and so on.

Typically, RANK function is used to get the top few rows based ascending or the descending orders, but in this situation, the OLAP function is used to solve a unique sorting need.

Conclusion:

OLAP functions such as RANK can be very powerful and be used for many challenging situations. This article detailed one creative way to solve an unusual sorting problem with the RANK function.