ad

Performance Improvement Tips for String Calculations

This post describes several tips and guidelines for creating efficient string calculations in Tableau. These guidelines will help you to improve workbook performance. Myself, I use these tips regularly in my project implementations. Practically I seen performance improvement after applying these techniques.

Tip 1: Try to minimize usage of same field more than once in same calculation.

Example 1:

Let's say you create a calculated field that uses a complicated multiple line calculation to find mentions, or Twitter handles, in tweets. The calculated field is titled, Twitter Handle. Each handle that is returned starts with the '@' sign (for example: @user).
For your analysis, you want to remove the '@' symbol.
To do so, you can use the following calculation to remove the first character from the string:
RIGHT([Twitter Handle], LEN([Twitter Handle]) -1)
This calculation is quite simple. However, since it references the Twitter Handle calculation twice, it performs that calculation twice for each record in your data source: once for the RIGHT function and again for the LEN function.
In order to avoid calculating the same calculation more than once, you can rewrite the calculation to one that uses the Twitter Handle calculation only once. In this example, you can use MID to accomplish the same goal:
MID([Twitter Handle], 2)
Tip 2: Convert multiple equality comparisons to a CASE expression or a group

Let's say you have the following calculation, which uses the calculated field, Person (calc), multiple times and employs a series of OR functions. This calculation, though a simple logical expression, will cause query performance issues because it performs the Person (calc) calculation at least ten times.

IF [Person (calc)] = 'Henry Wilson'
OR [Person (calc)] = 'Jane Johnson'
OR [Person (calc)] = 'Michelle Kim'
OR [Person (calc)] = 'Fred Suzuki'
OR [Person (calc)] = 'Alan Wang'
THEN 'Lead'
ELSEIF [Person (calc)] = 'Susan Nguyen'
OR [Person (calc)] = 'Laura Rodriguez'
OR [Person (calc)] = 'Ashley Garcia'
OR [Person (calc)] = 'Andrew Smith'
OR [Person (calc)] = 'Adam Davis'
THEN 'IC'
END

Instead of using an equality comparison, try the following solutions.

Solution 1
Use a CASE expression. For example:

CASE [Person (calc)]
WHEN 'Henry Wilson' THEN 'Lead'
WHEN 'Jane Johnson' THEN 'Lead'
WHEN 'Michelle Kim' THEN 'Lead'
WHEN 'Fred Suzuki' THEN 'Lead'
WHEN 'Alan Wang' THEN 'Lead'

WHEN 'Susan Nguyen' THEN 'IC'
WHEN 'Laura Rodriguez' THEN 'IC'
WHEN 'Ashley Garcia' THEN 'IC'
WHEN 'Andrew Smith' THEN 'IC'
WHEN 'Adam Davis' THEN 'IC'
END

In this example, the calculated field, Person (calc), is only referenced once. Therefore, it is only performed once. CASE expressions are also further optimized in the query pipeline, so you gain an additional performance benefit.

Solution 2
Create a group instead of a calculated field.

Tip 3: Convert multiple string calculations into a single REGEXP expression
Note: REGEXP calculations are available only when using Tableau data extracts or when connected to Text File, Hadoop Hive, Google BigQuery, PostgreSQL, Tableau Data Extract, Microsoft Excel, Salesforce, Vertica, Pivotal Greenplum, Teradata (version 14.1 and above), and Oracle data sources.
Example 1: CONTAINS
Let's say you have the following calculation, which uses the calculated field, Category (calc), multiple times. This calculation, though also a simple logical expression, will cause query performance issues because it performs the Category (calc) calculation multiple times.
IF CONTAINS([Segment (calc)],'UNKNOWN')
OR CONTAINS([Segment (calc)],'LEADER')
OR CONTAINS([Segment (calc)],'ADVERTISING')
OR CONTAINS([Segment (calc)],'CLOSED')
OR CONTAINS([Segment (calc)],'COMPETITOR')
OR CONTAINS([Segment (calc)],'REPEAT')
THEN 'UNKNOWN'
ELSE [Segment (calc)] END
You can use a REGEXP expression to get the same results without as much repetition.

Solution:

IF REGEXP_MATCH([Segment (calc)], 'UNKNOWN|LEADER|ADVERTISING|CLOSED|COMPETITOR|REPEAT') THEN 'UNKNOWN'
ELSE [Segment (calc)] END

With string calculations that use a similar pattern, you can use the same REGEXP expression.

Example 2: STARTSWITH
IF STARTSWITH([Segment (calc)],'UNKNOWN')
OR STARTSWITH([Segment (calc)],'LEADER')
OR STARTSWITH([Segment (calc)],'ADVERTISING')
OR STARTSWITH([Segment (calc)],'CLOSED')
OR STARTSWITH([Segment (calc)],'COMPETITOR')
OR STARTSWITH([Segment (calc)],'REPEAT')
THEN 'UNKNOWN'

Solution

IF REGEXP_MATCH([Segment (calc)], '^(UNKNOWN|LEADER|ADVERTISING|CLOSED|COMPETITOR|REPEAT)') THEN 'UNKNOWN'
ELSE [Segment (calc)] END
Note that the '^' symbol is used in this solution.

Example 3: ENDSWITH

IF ENDSWITH([Segment (calc)],'UNKNOWN')
OR ENDSWITH([Segment (calc)],'LEADER')
OR ENDSWITH([Segment (calc)],'ADVERTISING')
OR ENDSWITH([Segment (calc)],'CLOSED')
OR ENDSWITH([Segment (calc)],'COMPETITOR')
OR ENDSWITH([Segment (calc)],'REPEAT')
THEN 'UNKNOWN'
ELSE [Segment (calc)] END

Solution

IF REGEXP_MATCH([Segment (calc)], '(UNKNOWN|LEADER|ADVERTISING|CLOSED|COMPETITOR|REPEAT)$') THEN 'UNKNOWN'
ELSE [Segment (calc)] END
Note that the '$' symbol is used in this solution.

Tip 4: Manipulate strings with REGEXP instead of LEFT, MID, RIGHT, FIND, LEN

Regular expressions can be a very powerful tool. When doing complex string manipulation, consider using regular expressions. In a lot of cases, using a regular expression will result in a shorter and more efficient calculation.

Example 1

Let's say you have the following calculation, which removes protocols from URLs. For example: "https://www.tableau.com" becomes "www.tableau.com".
IF (STARTSWITH([Server], "http://")) THEN
MID([Server], Len("http://") + 1)
ELSEIF(STARTSWITH([Server], "https://")) THEN
MID([Server], Len("https://") + 1)
ELSEIF(STARTSWITH([Server], "tcp:")) THEN
MID([Server], Len("tcp:") + 1)
ELSEIF(STARTSWITH([Server], "\\")) THEN
MID([Server], Len("\\") + 1)
ELSE [Server]
END

Solution

You can simplify the calculation and improve performance by using a REGEXP_REPLACE function.
REGEXP_REPLACE([Server], "^(http://|https://|tcp:|\\\\)", "")

Example 2

Let's say you have the following calculation, which returns the second part of an IPv4 address. For example: "172.16.0.1" becomes "16".
IF (FINDNTH([Server], ".", 2) > 0) THEN
MID([Server],
FIND([Server], ".") + 1,
FINDNTH([Server], ".", 2) - FINDNTH([Server], ".", 1) - 1
)
END

Solution

You can simplify the calculation and improve performance by using a REGEXP_EXTRACT function.
REGEXP_EXTRACT([Server], "\.([^\.]*)\.")
Tip 5: Do not use sets in calculations

If you are using sets in a calculation, consider replacing them with an alternative, but equivalent calculation.

Example

Let's say you have the following calculation, which uses the set, Top Customers (set).
IF ISNULL([Customer Name]) OR [Top customers (set)] THEN [Segment] ELSE [Customer Name] END

Solution 1

If the set is simple, you can create a calculated field that returns the same result as the set. For example:
CASE [Customer Name]
WHEN 'Henry Wilson' THEN True
WHEN 'Jane Johnson' THEN True
WHEN 'Michelle Kim' THEN True
WHEN 'Fred Suzuki' THEN True
WHEN 'Alan Wang' THEN True
ELSE False
END
Note: Using the pattern WHEN TRUE … ELSE is recommended in this situation to avoid performance issues due to the use of sets. It is not a recommended pattern in most scenarios.

Solution 2

If the set is more complex, consider creating a group that maps all the elements in the set to a given value or attribute, such as 'IN', and then modify the calculation to check for that value/attribute. For example:
IF ISNULL([Customer Name]) OR [Top Customers(group)]='IN' THEN [Segment] ELSE [Customer Name] END
Tip 6: Do not use sets to group your data

Sets are meant to make comparisons on subsets of data. Groups are meant to combine related members in a field. Converting sets to groups, such as with the following example, is not recommended:
IF [Americas Set] THEN "Americas"
ELSEIF [Africa Set] THEN "Africa"
ELSEIF [Asia Set] THEN "Asia"
ELSEIF [Europe Set] THEN "Europe"
ELSEIF [Oceania Set] THEN "Oceania"
ELSE "Unknown"
END
This is not recommended for the following reasons:
·         Sets are not always exclusive. Some members can appear in multiple sets. For example, Russia could be placed both in the Europe set and the Asia set.
·         Sets cannot always be translated to groups. If the sets are defined by exclusion, conditions, or limits, it might be difficult or even impossible to create an equivalent group.

Solution

Group your data using the Group feature. 



Do you feel that you are Tableau Expert? Then try this challenge 30


Hello All…

Here I am coming with new challenge. This is very simple scenario. I am not going to provide detail information about this challenge. You just need to look at information provided and try to replicate image. This is how generally we get requirement when we work with clients. They just provide you single line requirement and then you have to write logic to fulfill those requirements.

Click Here to download Data . Use this data set to solve this challenge.

Below image describes customers sales of their first 365 days and after 365 days of their journey with vendor.


For more challenges: Click Here

Experiment… Observe… Learn… 

Do you feel that you are Tableau Expert? Then try this challenge 29

Hello All…

Here I am coming with new challenge. Its simple but interesting for me to solve. Have you ever asked or thought to implement metric drill down? Let’s see…

Long back someone asked me to drill down from summary to detail based on what ever metric they click on summary page. At that time I told it's not possible in tableau. This concept exists in OBIEE report.

Now metric specific drill down is possible in Tableau. See below scenario and try to solve this challenge.

If you see below image. If I click on Profit, I can see sheet 2 filtered for profit.














If you see below image. If I click on Sales, I can see sheet 2 filtered for Sales. 











Similarly, If I click on central region for Profit, I can see sheet 2 filtered for Central and profit will show . 

















Similarly, If I click on East region for Sales, I can see sheet 2 filtered for East and Sales will show. 

















See here how metric action works in tableau. Now Try by your self...
























For more challenges: Click Here

Experiment… Observe… Learn…

Do you feel that you are Tableau Expert? Then try this challenge 27


Hello All…

Here I am coming with new challenge. Its simple but interesting for me to solve. Have you ever asked or thought to implement this challenge? Let’s see…

Just look at simple drill down scenario below. Here I am showing sales by sub-category in sheet 1 and details for those sub-categories in sheet 2.



In general drill down process if you click on any bar of sheet 1 you can see details of that sub-category in sheet 2. Looks like below. In this case it’s machines sub-category

Have ever been asked to show like when ever you click on machines it should show only one bar Machines in sheet 1 and details of machine in sheet 2. Again, if you click on Machines it will reset view to all sub-categories. See below

If you click on Machines again it will reset to all sub-categories. See below.

You can see the animation steps in below image.

For more challenges: Click Here

Experiment… Observe… Learn…

Do you feel that you are Tableau Expert? Then try this challenge 26

Hello All…

Here I am coming with new challenge. Most of challenges which I have been posting here are not what I faced. Different people colleagues, friends or blog followers sends these challenges. I solved for them.

Click Here to download Data . Use this data set to solve this challenge.

Challenge here to build views shown below.

Step 1:
Showing number of customers by year of order date.


Step 2:
  • 595 customers ordered in 2011. Out of 595 Customers 437 customers ordered again in 2012. 136 customers are new customers ordered in 2012.
  • Similarly, out of 595 customers who ordered in 2011, 484 customers ordered in 2013. Out of 136 customers who ordered in 2012, 102 customers ordered again in 2013. 51 customers are new customer for year 2013.
  • Similarly, out of 595 customers who ordered in 2011, 517 customers ordered in 2014. Out of 136 customers who ordered in 2012, 120 customers ordered again in 2014. Out of 51 customers who ordered customer for year 2013, 45 customers ordered again in 2014. 



Step 3:
Calculating percentage with respect to first time when they ordered.
  • For 2012 437/595 is 73% customer retention that who ordered in 2011
  • For 2013 484/595 is 81% customer retention that who ordered in 2011. Similarly, 102/136 is 75 % that who ordered in 2012.
  • Same for rest of the years.


For more challenges: Click Here

Experiment… Observe… Learn…

Do you feel that you are Tableau Expert? Then try this challenge 25

Hello All…

I am back with new challenge. It’s been a while that I posted challenge here. It’s very interesting and complex challenge.

If you have this kind of challenges or looking for tableau expert help in your work reach me on Suresh.n2008@gmail.com

Note: Please download the Data from below link to solve this challenge. If you use this data, results should match with images posted.

You can see data set preview in below image. I have Admission date, Discharge date and Patient Count. Using this data, we can build Admission Discharge triangle.


Step 1: Build this view.

Step 2:
Now don’t show numbers for January 1900 (first column) and then display running sum across table. See below.

Step 3:
Now calculate Avg as shown in below. Calculate avg of blue color cells and display. Same for Green, Yellow and Red. Similarly, for all other cells.



Step 4:
Finally, Here I am comparing Avg calculated above with running sum which shown in Step 1. If Running sum (shown in Step 1) is higher than Avg calculated above (shown in Step 3) then showing up arrow with green color else down arrow with red color.

For more challenges: Click Here
Experiment… Observe… Learn…

Do you feel that you are Tableau Expert? Then try this challenge 24

Hello All…

I am back with new challenge. It’s been a while that I posted challenge here. This is very simple challenge. If you are following tableau new features and using them, you might know how to solve it. If not, then you might feel this is not possible.

If you have this kind of challenges or looking for tableau expert help in your work reach me on Suresh.n2008@gmail.com

Note: Please download the Data from below link to solve this challenge. If you use this data, results should match with images posted.


Below I am showing number of orders by Category, Sub-Category and Segment. 
Now I am applying running sum across table. 


Now I want to calculate avg by Furniture and Segment. For Example, see below
Finally, I want to see report which looks like below. 


Clue: I did not used LOD calculations. Try with table calculations. 


For more challenges: Click Here
Experiment… Observe… Learn…

Scenario 18: Day, Week to date, Month to date, Quarter to Date and year to date data filtering using parameter selection


For example. today is 12th May 2018. 
When you select Day then we must show 12th May 2018 data. 
When you select Week, we must show 5th May 2018 to 12th May 2018, i.e. One-week data.
When you select Month, we must show 11th Apr 2018 to 12th May 2018 
similarly, for quarter and year

With help of single calculation, we can filter data for above requirement.

I created two parameters.

“Date selection” to Select Date (we don’t have current date data, so I am selecting date and filtering Day, week, MTD, QTD and YTD with respect to that date.)

“Time Period Selection” to select Day, Week, Month, Quarter or Year to date.

Finally, I written calculation shown below to filter data

“Time Period Selection Filter”

Case [Time Period Selection]
when 'Day' then
If [Order Date] = [Date Selection] then TRUE else FALSE end
when 'Week' then
if [Order Date] >= dateadd('week',-1,[Date Selection]) and [Order Date] <= [Date Selection] then TRUE else FALSE end
when 'Month' then
if [Order Date] >= dateadd('month',-1,[Date Selection]) and [Order Date] <= [Date Selection] then TRUE else FALSE end
when 'Quarter' then
if [Order Date] >= dateadd('quarter',-1,[Date Selection]) and [Order Date] <= [Date Selection] then TRUE else FALSE end
when 'Year' then
if [Order Date] >= dateadd('year',-1,[Date Selection]) and [Order Date] <= [Date Selection] then TRUE else FALSE end
END



Drag this calculation field “Time Period Selection Filter” to filter shelf and select “True”

By changing parameter and date selection you can see expected results.