Eric Cressey

Tech writer - Content strategist - Developer

Using Regular Expressions with MadCap Flare and Notepad++

You can use regular expressions in MadCap Flare to find, modify, and remove HTML elements from your topic files. Regular expressions are very useful if you’re searching for specific elements that might have different content or different attributes.

One of the best times to use regular expressions is when you need to remove or replace all elements that contain a specific phrase. Because elements may contain various classes and line breaks, a simple find and replace won’t work for this. Let’s take a look at an example:

Let’s suppose many of our topics have a drop-down section that we don’t want to use anymore, and it looks something like this:

      <MadCap:dropDown>
      <MadCap:dropDownHotspot><span style="color: #00008b;">In this product</span></MadCap:dropDownHotspot>
      <li>Here's some content for this section</li>
      <li>Here's more content for this section</li>
      </MadCap:dropDown>

For the sake of simplicity, let’s assume that there aren’t any other drop-down elements on these topic pages (if there were, we’d need to use a more precise regular expression). If we wanted to remove this entire section from each topic with a plain find and replace, it’d take quite a bit of manual work. Instead, we can use a regular expression to find each occurrence and remove it.

<MadCap:dropDown>.*?In this product.*?</MadCap:dropDown>

This regular expression starts by finding the first <MadCap:dropDown> element on the page, matches until it finds the phrase “In this product,” and then matches until it finds the next </MadCap:dropDown> closing tag. The period is a special character in regular expressions that matches any character. The asterisk that follows means “zero or more times,” and it is greedy by default, so .* means “match any character as many times as possible.” Adding a question mark after the asterisk makes it lazy, so .*? tries to find the smallest match possible.
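If you’d like to try the pattern outside of an editor, here’s a minimal sketch using Python’s re module. Python is my choice for illustration only, and the sample topic content is made up. The re.DOTALL flag is there so that the period also matches line breaks, which this pattern needs because the section spans several lines.

```python
import re

# Hypothetical topic content; element names follow the example above.
topic = """<h1>Overview</h1>
<MadCap:dropDown>
<MadCap:dropDownHotspot>In this product</MadCap:dropDownHotspot>
<li>Here's some content for this section</li>
<li>Here's more content for this section</li>
</MadCap:dropDown>
<p>Keep this paragraph.</p>
"""

# re.DOTALL lets "." match newlines, so the match can span multiple lines.
pattern = re.compile(
    r"<MadCap:dropDown>.*?In this product.*?</MadCap:dropDown>",
    re.DOTALL,
)

# Replace every match with nothing, which removes the whole section.
cleaned = pattern.sub("", topic)
print(cleaned)
```

The heading and the final paragraph survive, and everything from the opening drop-down tag through its closing tag is gone.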

Although you can use regular expressions in Flare, I don’t really recommend using them there. Some of my co-workers have had some weird experiences with Flare’s results for regular expression searches. Instead, I recommend using Notepad++’s regular expression search and replace in files. Notepad++ is a free source code editor that has pretty strong regular expression support and can search directories on your computer.

Once you’ve installed Notepad++, open it, click Search, and select Find in Files.... In the Directory field, browse to your MadCap Flare project’s Content folder. In the Filters field, enter *.htm (this restricts the search to your topic files). Under Search Mode, select Regular expression and then select the ". matches newline" check box.
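If you’d rather preview what a search like this would hit before changing anything, the same idea can be sketched in a few lines of Python. This is not how Notepad++ works internally, just a rough equivalent: the find_matches function and its content_dir parameter are names I made up, and the sketch assumes your topic files are saved as UTF-8.

```python
import re
from pathlib import Path

# Same pattern as above; re.DOTALL mirrors the ". matches newline" option.
PATTERN = re.compile(
    r"<MadCap:dropDown>.*?In this product.*?</MadCap:dropDown>",
    re.DOTALL,
)


def find_matches(content_dir):
    """Return {path: match count} for each .htm file that has a match.

    content_dir is a hypothetical path to a project's content folder.
    Files are only read, never modified.
    """
    results = {}
    # rglob("*.htm") plays the role of the Filters field.
    for path in Path(content_dir).rglob("*.htm"):
        text = path.read_text(encoding="utf-8")  # assumes UTF-8 topics
        count = len(PATTERN.findall(text))
        if count:
            results[path] = count
    return results
```

Calling find_matches on your content folder returns a dictionary of the topics that contain the section, which makes a handy checklist to compare against the Notepad++ results.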

Particularly Useful Regular Expression Patterns

If you’re interested in finding an element and its contents, you’ll probably want to use something like <element>.*?</element>, which finds everything between the element’s tags without accidentally running on to a later closing tag. The question mark, which makes the match lazy, is critical here.
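Here’s a quick way to see the difference the question mark makes, sketched with Python’s re module (again, just an illustration with made-up sample text):

```python
import re

html = "<li>first</li><li>second</li>"

# Greedy: .* grabs as much as it can, so both items end up in one match.
greedy = re.findall(r"<li>.*</li>", html)

# Lazy: .*? stops at the first closing tag, so each item matches separately.
lazy = re.findall(r"<li>.*?</li>", html)

print(greedy)  # ['<li>first</li><li>second</li>']
print(lazy)    # ['<li>first</li>', '<li>second</li>']
```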

If you want to match something only when it doesn’t follow something else, you can use a negative lookbehind. For example, (?<!super)hero matches hero, but not the hero in superhero. The syntax for a negative lookbehind is (?<!...).
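A small Python sketch of the hero example, with a sample sentence I made up, shows that only the standalone word is touched:

```python
import re

text = "Every hero knows a superhero."

# (?<!super) rejects any "hero" that has "super" directly in front of it.
replaced = re.sub(r"(?<!super)hero", "HERO", text)

print(replaced)  # Every HERO knows a superhero.
```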

Another example: let’s say we want to find a <div> that starts with “In product B” as long as it doesn’t come directly after the text “<div>In product A.” Keep in mind that a lookbehind only checks the characters immediately before the match.

You’d use this regular expression:

(?<!<div>In product A)<div>In product B.*?</div>
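To make the behavior concrete, here’s a sketch in Python with three made-up snippets. It uses a form of the pattern that ends at the closing </div>. Notice the last case: because the lookbehind only looks at the text directly in front of the match, a “product A” div that is closed first does not block the match.

```python
import re

pattern = re.compile(
    r"(?<!<div>In product A)<div>In product B.*?</div>",
    re.DOTALL,
)

# Hypothetical snippets for illustration.
standalone = "<p>Intro</p><div>In product B you can export.</div>"
preceded = "<div>In product A<div>In product B you can export.</div></div>"
separated = "<div>In product A</div>\n<div>In product B you can export.</div>"

print(bool(pattern.search(standalone)))  # True
print(bool(pattern.search(preceded)))    # False
print(bool(pattern.search(separated)))   # True
```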

Similarly, you can use a regular expression to match an element as long as it isn’t followed by something else. This is called a negative lookahead, and its syntax is (?!...). For example, you could match the last paragraph element on the page, provided it starts with “In this product,” using the following regular expression. The \b and [^>]* parts let the pattern handle <p> tags that have attributes.

<p\b(?!.*<p\b)[^>]*>In this product.*?</p>
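Here’s a sketch of the lookahead idea in Python. It uses a pattern of the same shape, written with \b and [^>]* so that attributes on the <p> tag are handled, and the page content is made up. The lookahead rejects every <p that still has another <p somewhere after it, so only the last paragraph can match.

```python
import re

# (?!.*<p\b) fails whenever another <p tag appears later on the page.
last_p = re.compile(
    r"<p\b(?!.*<p\b)[^>]*>In this product.*?</p>",
    re.DOTALL,
)

# Hypothetical page content.
page = (
    '<p class="intro">In this product, start here.</p>\n'
    "<p>Some other text.</p>\n"
    '<p class="note">In this product, finish here.</p>\n'
)

match = last_p.search(page)
print(match.group(0))  # <p class="note">In this product, finish here.</p>
```

If the last paragraph on the page doesn’t start with the phrase, the pattern finds nothing at all, which is usually what you want for this kind of cleanup.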

The uses for regular expressions are limited in Flare and, hopefully, you won’t have to use them very often. If you’re working with very large projects and you need to remove or modify elements from many topics, regular expressions might make things a lot easier.