This post describes several
tips and guidelines for creating efficient string calculations in Tableau. These
guidelines will help you to improve workbook performance. Myself, I use these
tips regularly in my project implementations. Practically I seen performance
improvement after applying these techniques.

Tip 1: Try to minimize usage of same field
more than once in same calculation.

**Example 1:**

Let's say you create a calculated field
that uses a complicated multiple line calculation to find mentions, or Twitter
handles, in tweets. The calculated field is titled, Twitter Handle. Each handle
that is returned starts with the '@' sign (for example: @user).

For
your analysis, you want to remove the '@' symbol.

To
do so, you can use the following calculation to remove the first character from
the string:

RIGHT([Twitter
Handle], LEN([Twitter Handle]) -1)

This
calculation is quite simple. However, since it references the Twitter Handle
calculation twice, it performs that calculation twice for each record in your
data source: once for the RIGHT function and again for the LEN function.

In
order to avoid calculating the same calculation more than once, you can rewrite
the calculation to one that uses the Twitter Handle calculation only once. In
this example, you can use MID to accomplish the same goal:

MID([Twitter
Handle], 2)

Tip 2: Convert multiple equality
comparisons to a CASE expression or a group

Let's say you have the following
calculation, which uses the calculated field, Person (calc), multiple times and
employs a series of OR functions. This calculation, though a simple logical
expression, will cause query performance issues because it performs the Person
(calc) calculation at least ten times.

IF [Person
(calc)] = 'Henry Wilson'

OR [Person (calc)] = 'Jane Johnson'

OR [Person (calc)] = 'Michelle Kim'

OR [Person (calc)] = 'Fred Suzuki'

OR [Person (calc)] = 'Alan Wang'

THEN 'Lead'

ELSEIF [Person (calc)] = 'Susan Nguyen'

OR [Person (calc)] = 'Laura Rodriguez'

OR [Person (calc)] = 'Ashley Garcia'

OR [Person (calc)] = 'Andrew Smith'

OR [Person (calc)] = 'Adam Davis'

THEN 'IC'

END

OR [Person (calc)] = 'Jane Johnson'

OR [Person (calc)] = 'Michelle Kim'

OR [Person (calc)] = 'Fred Suzuki'

OR [Person (calc)] = 'Alan Wang'

THEN 'Lead'

ELSEIF [Person (calc)] = 'Susan Nguyen'

OR [Person (calc)] = 'Laura Rodriguez'

OR [Person (calc)] = 'Ashley Garcia'

OR [Person (calc)] = 'Andrew Smith'

OR [Person (calc)] = 'Adam Davis'

THEN 'IC'

END

Instead
of using an equality comparison, try the following solutions.

**Solution 1**

Use
a CASE expression. For example:

CASE [Person
(calc)]

WHEN 'Henry Wilson' THEN 'Lead'

WHEN 'Jane Johnson' THEN 'Lead'

WHEN 'Michelle Kim' THEN 'Lead'

WHEN 'Fred Suzuki' THEN 'Lead'

WHEN 'Alan Wang' THEN 'Lead'

WHEN 'Susan Nguyen' THEN 'IC'

WHEN 'Laura Rodriguez' THEN 'IC'

WHEN 'Ashley Garcia' THEN 'IC'

WHEN 'Andrew Smith' THEN 'IC'

WHEN 'Adam Davis' THEN 'IC'

END

WHEN 'Henry Wilson' THEN 'Lead'

WHEN 'Jane Johnson' THEN 'Lead'

WHEN 'Michelle Kim' THEN 'Lead'

WHEN 'Fred Suzuki' THEN 'Lead'

WHEN 'Alan Wang' THEN 'Lead'

WHEN 'Susan Nguyen' THEN 'IC'

WHEN 'Laura Rodriguez' THEN 'IC'

WHEN 'Ashley Garcia' THEN 'IC'

WHEN 'Andrew Smith' THEN 'IC'

WHEN 'Adam Davis' THEN 'IC'

END

In this example, the calculated field,
Person (calc), is only referenced once. Therefore, it is only performed once.
CASE expressions are also further optimized in the query pipeline, so you gain
an additional performance benefit.

**Solution 2**

Create
a group instead of a calculated field.

Tip 3: Convert multiple string
calculations into a single REGEXP expression

**Note:**REGEXP calculations are available only when using Tableau data extracts or when connected to Text File, Hadoop Hive, Google BigQuery, PostgreSQL, Tableau Data Extract, Microsoft Excel, Salesforce, Vertica, Pivotal Greenplum, Teradata (version 14.1 and above), and Oracle data sources.

**Example 1: CONTAINS**

Let's
say you have the following calculation, which uses the calculated field,
Category (calc), multiple times. This calculation, though also a simple logical
expression, will cause query performance issues because it performs the
Category (calc) calculation multiple times.

`IF CONTAINS([Segment (calc)],'UNKNOWN')`

`OR CONTAINS([Segment (calc)],'LEADER')`

```
OR CONTAINS([Segment
(calc)],'ADVERTISING')
```

`OR CONTAINS([Segment (calc)],'CLOSED')`

```
OR CONTAINS([Segment
(calc)],'COMPETITOR')
```

`OR CONTAINS([Segment (calc)],'REPEAT')`

`THEN 'UNKNOWN'`

`ELSE [Segment (calc)] END`

You
can use a REGEXP expression to get the same results without as much repetition.

**Solution:**

```
IF REGEXP_MATCH([Segment (calc)],
'UNKNOWN|LEADER|ADVERTISING|CLOSED|COMPETITOR|REPEAT') THEN 'UNKNOWN'
```

ELSE [Segment (calc)] END

With string calculations that use a similar pattern, you can use the same REGEXP expression.

**Example 2: STARTSWITH**

```
IF STARTSWITH([Segment
(calc)],'UNKNOWN')
```

`OR STARTSWITH([Segment (calc)],'LEADER')`

```
OR STARTSWITH([Segment
(calc)],'ADVERTISING')
```

`OR STARTSWITH([Segment (calc)],'CLOSED')`

```
OR STARTSWITH([Segment
(calc)],'COMPETITOR')
```

`OR STARTSWITH([Segment (calc)],'REPEAT')`

`THEN 'UNKNOWN'`

####
**Solution**

```
IF REGEXP_MATCH([Segment (calc)], '^(UNKNOWN|LEADER|ADVERTISING|CLOSED|COMPETITOR|REPEAT)')
THEN 'UNKNOWN'
```

`ELSE [Segment (calc)] END`

**Note**that the '^' symbol is used in this solution.

####
**Example 3: ENDSWITH**

`IF ENDSWITH([Segment (calc)],'UNKNOWN')`

`OR ENDSWITH([Segment (calc)],'LEADER')`

```
OR ENDSWITH([Segment
(calc)],'ADVERTISING')
```

`OR ENDSWITH([Segment (calc)],'CLOSED')`

```
OR ENDSWITH([Segment
(calc)],'COMPETITOR')
```

`OR ENDSWITH([Segment (calc)],'REPEAT')`

`THEN 'UNKNOWN'`

`ELSE [Segment (calc)] END`

####
**Solution**

```
IF REGEXP_MATCH([Segment (calc)],
'(UNKNOWN|LEADER|ADVERTISING|CLOSED|COMPETITOR|REPEAT)$') THEN 'UNKNOWN'
```

`ELSE [Segment (calc)] END`

**Note**that the '$' symbol is used in this solution.

Tip 4: Manipulate strings with
REGEXP instead of LEFT, MID, RIGHT, FIND, LEN

Regular expressions
can be a very powerful tool. When doing complex string manipulation, consider
using regular expressions. In a lot of cases, using a regular expression will
result in a shorter and more efficient calculation.

####
**Example 1**

Let's say you have
the following calculation, which removes protocols from URLs. For example:
"https://www.tableau.com" becomes "www.tableau.com".

```
IF (STARTSWITH([Server],
"http://")) THEN
```

```
MID([Server], Len("http://")
+ 1)
```

```
ELSEIF(STARTSWITH([Server],
"https://")) THEN
```

```
MID([Server], Len("https://")
+ 1)
```

```
ELSEIF(STARTSWITH([Server],
"tcp:")) THEN
```

`MID([Server], Len("tcp:") + 1)`

```
ELSEIF(STARTSWITH([Server],
"\\")) THEN
```

`MID([Server], Len("\\") + 1)`

`ELSE [Server]`

`END`

####
**Solution**

You
can simplify the calculation and improve performance by using a REGEXP_REPLACE
function.

```
REGEXP_REPLACE([Server],
"^(http://|https://|tcp:|\\\\)", "")
```

####
**Example 2**

Let's say you have
the following calculation, which returns the second part of an IPv4 address.
For example: "172.16.0.1" becomes "16".

```
IF (FINDNTH([Server], ".",
2) > 0) THEN
```

`MID([Server],`

`FIND([Server], ".") + 1,`

```
FINDNTH([Server], ".", 2) -
FINDNTH([Server], ".", 1) - 1
```

`)`

`END`

####
**Solution**

You
can simplify the calculation and improve performance by using a REGEXP_EXTRACT
function.

```
REGEXP_EXTRACT([Server],
"\.([^\.]*)\.")
```

Tip 5: Do not use sets in
calculations

If you are using
sets in a calculation, consider replacing them with an alternative, but
equivalent calculation.

####
**Example**

Let's say you have
the following calculation, which uses the set, Top Customers (set).

```
IF ISNULL([Customer Name]) OR [Top customers (set)] THEN
[Segment] ELSE [Customer Name] END
```

####
**Solution 1**

If the set is
simple, you can create a calculated field that returns the same result as the
set. For example:

`CASE [Customer Name]`

`WHEN 'Henry Wilson' THEN True`

`WHEN 'Jane Johnson' THEN True`

`WHEN 'Michelle Kim' THEN True`

`WHEN 'Fred Suzuki' THEN True`

`WHEN 'Alan Wang' THEN True`

`ELSE False`

`END`

**Note:**Using the pattern WHEN TRUE … ELSE is recommended in this situation to avoid performance issues due to the use of sets. It is not a recommended pattern in most scenarios.

####
**Solution 2**

If the set is more
complex, consider creating a group that maps all the elements in the set to a
given value or attribute, such as 'IN', and then modify the calculation to
check for that value/attribute. For example:

```
IF ISNULL([Customer Name]) OR [Top Customers(group)]='IN' THEN
[Segment] ELSE [Customer Name] END
```

Tip 6: Do not use sets to group your data

Sets
are meant to make comparisons on subsets of data. Groups are meant to combine
related members in a field. Converting sets to groups, such as with the following
example, is not recommended:

`IF [Americas Set] THEN "Americas"`

`ELSEIF [Africa Set] THEN "Africa"`

`ELSEIF [Asia Set] THEN "Asia"`

`ELSEIF [Europe Set] THEN "Europe"`

`ELSEIF [Oceania Set] THEN "Oceania"`

`ELSE "Unknown"`

`END`

This
is not recommended for the following reasons:

·

**Sets are not always exclusive.**Some members can appear in multiple sets. For example, Russia could be placed both in the Europe set and the Asia set.
·

**Sets cannot always be translated to groups.**If the sets are defined by exclusion, conditions, or limits, it might be difficult or even impossible to create an equivalent group.