As content owners, we’re sometimes asked to take on big projects to maintain that content. For example:
Last year at Symantec, we stopped using the verisign.com domain. We needed to update all VeriSign URL references to Symantec URLs in our products and emails. The project scope was all the URLs in more than 20,000 files.
At a previous company, we used several old, large Flare projects. Some of the files had weird HTML that caused issues with our CSS files. We needed to remove this legacy content from the 10,000-page project.
Big projects like these go beyond what you can do with a few regular expressions. The expanded scope requires more planning and more regular expressions applied to more locations.
To tackle these projects, I wrote a program to apply regular expressions to the files in a directory. This program let me focus on writing regular expressions and made it easy to test when I wanted to measure my progress. You can get the C# program here.
Getting started with the programmatic approach
Let’s take a look at how to customize the program for your own purposes.
The program has a main function and two named functions:
- Main function
- ProcessDirectory
- ProcessFile
Updating the directory location in the Main function
The main function contains a variable for the directory you want to update.
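A minimal sketch of what that part of the main function might look like, not the original listing. The variable name `directory`, the placeholder path, and the `ProcessDirectory(string)` signature are assumptions based on the description above:

```csharp
using System;
using System.IO;

public class Program
{
    public static void Main(string[] args)
    {
        // Assumption: a hard-coded placeholder path; replace it with the folder you want to update.
        string directory = @"C:\path\to\your\files";

        if (Directory.Exists(directory))
        {
            ProcessDirectory(directory);
        }
        else
        {
            Console.WriteLine("Directory not found: " + directory);
        }
    }

    // Stub so the sketch compiles on its own; the real method walks the folder.
    public static void ProcessDirectory(string targetDirectory)
    {
        Console.WriteLine("Processing " + targetDirectory);
    }
}
```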
By default, the directory variable points to a hard-coded folder. If that’s all you need, change the path to the folder you want to update. There’s no need to build the application separately to run it; just hit the Start button on the Visual Studio toolbar.
If you want to run the program as an executable in the directory of your choice, you can do that by making a few changes:
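One way to make that change, sketched under the assumption that "the directory of your choice" means the folder the executable is started from. `GetTargetDirectory` is a hypothetical helper, not part of the original program:

```csharp
using System;
using System.IO;

public class Program
{
    // Hypothetical helper: returns the folder the program was started in.
    public static string GetTargetDirectory()
    {
        return Directory.GetCurrentDirectory();
    }

    public static void Main(string[] args)
    {
        string directory = GetTargetDirectory();
        Console.WriteLine("Updating files in " + directory);
        // Call the real ProcessDirectory(directory) here.
    }
}
```

`Directory.GetCurrentDirectory()` returns the working directory, which is normally the folder you launched the .exe from. If you would rather use the folder the .exe itself lives in, `AppDomain.CurrentDomain.BaseDirectory` is an alternative.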
Now when you’re ready to test, build the application (Ctrl+Shift+B) and then grab the RegexForWriters.exe file from the Visual Studio project’s RegexForWriters\RegexForWriters\bin\Debug folder.
Run the .exe file in the directory you want to update.
Setting allowed file extensions with ProcessDirectory()
If you only want to process specific file types, you can make a few changes to the ProcessDirectory method. Here’s the code:
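A minimal sketch of what the method might look like, not the original listing. It assumes the method takes the folder path as a string, filters on the extension check discussed in this section, and recurses into subfolders (the recursion is a guess):

```csharp
using System;
using System.IO;

public class Program
{
    public static void ProcessDirectory(string targetDirectory)
    {
        // Process the files directly inside this folder.
        foreach (string file in Directory.GetFiles(targetDirectory))
        {
            if (Path.GetExtension(file) == ".txt" || Path.GetExtension(file) == ".html")
            {
                ProcessFile(file);
            }
        }

        // Assumption: subfolders are processed too.
        foreach (string subdirectory in Directory.GetDirectories(targetDirectory))
        {
            ProcessDirectory(subdirectory);
        }
    }

    // Stub so the sketch compiles on its own; the real method edits the file.
    public static void ProcessFile(string path)
    {
        Console.WriteLine("Would process " + path);
    }

    public static void Main(string[] args)
    {
        ProcessDirectory(Directory.GetCurrentDirectory());
    }
}
```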
If you want to change the file types to edit, change this line:
if (Path.GetExtension(file) == ".txt" || (Path.GetExtension(file) == ".html")) {
As written, only .txt and .html files are processed. To process .properties, .xml, and .htm files instead:
if (Path.GetExtension(file) == ".properties" || (Path.GetExtension(file) == ".xml") || (Path.GetExtension(file) == ".htm")) {
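As the list of extensions grows, a chain of `||` comparisons gets hard to read. One alternative is to keep the extensions in a set. This is a variation, not what the original program does: `ExtensionFilter`, `Allowed`, and `ShouldProcess` are hypothetical names, and unlike the `==` check this sketch ignores case.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ExtensionFilter
{
    // Hypothetical list of extensions to process, compared case-insensitively.
    static readonly HashSet<string> Allowed =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { ".properties", ".xml", ".htm" };

    public static bool ShouldProcess(string file)
    {
        return Allowed.Contains(Path.GetExtension(file));
    }

    public static void Main()
    {
        Console.WriteLine(ShouldProcess("emails.PROPERTIES")); // prints True
        Console.WriteLine(ShouldProcess("index.html"));        // prints False
    }
}
```

To change the file types, edit the set instead of the `if` statement.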
Adding regular expressions to ProcessFile()
The ProcessFile method is the place you’ll add regular expressions and tell the program how to update the text. Here’s the entire method:
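A minimal sketch of what the method might look like, not the original listing. It assumes the method reads the whole file into a `text` variable, applies one `Regex.Replace` call per change, and writes the file back only when something changed. The VeriSign pattern is purely illustrative:

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

public class Program
{
    public static void ProcessFile(string path)
    {
        string text = File.ReadAllText(path);
        string original = text;

        // Add one Regex.Replace call per change you want to make.
        // Illustrative pattern only: swap the old domain for the new one.
        text = Regex.Replace(text, @"verisign\.com", "symantec.com", RegexOptions.IgnoreCase);

        // Assumption: only touch the file if the text actually changed.
        if (text != original)
        {
            File.WriteAllText(path, text);
            Console.WriteLine("Updated " + path);
        }
    }

    public static void Main(string[] args)
    {
        foreach (string path in args)
        {
            ProcessFile(path);
        }
    }
}
```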
Use the Regex.Replace() method to apply regular expressions to the text. The method takes three or four arguments: Regex.Replace(text to update, regular expression, replacement text, regex options)
The text to update is the file text, stored in the text variable. After that, specify your regular expression. If your regular expression has quotes in it, you may want to write it as a verbatim string: put an @ at the beginning and then double each quote inside it, as shown here: @"<p.*?class="".*?unnecessary.*?"".*?>.*?<\/p>". Always put the regex and replacement text in quotes because they’re strings.
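To make the quoting concrete, here is a small sketch that runs the pattern above through the four-argument form of `Regex.Replace`. The class name, the `Clean` helper, and the sample HTML are made up for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

public static class RemoveLegacyParagraphs
{
    public static string Clean(string text)
    {
        // Verbatim string: the @ prefix turns off backslash escapes,
        // and each literal quote is written as two quotes.
        string pattern = @"<p.*?class="".*?unnecessary.*?"".*?>.*?<\/p>";

        // Four-argument form: input, pattern, replacement, options.
        return Regex.Replace(text, pattern, "", RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        string html = @"<p class=""note unnecessary"">Old text</p><p>Keep me</p>";
        Console.WriteLine(Clean(html)); // prints <p>Keep me</p>
    }
}
```

One caution when adapting it: `.*?` can run past the end of one tag and into the next on the same line, so a pattern like this may match more than a single paragraph. Test it against real files before running it on a whole directory.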
Using group text in replacements
One of the cool things about regex groups is that you can reference group values in your replacement text. This can save a lot of time. The syntax is the same as usual: $1 refers to the first group, $2 to the second, and so on. Read this post to learn more about regex groups.
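As a sketch of a group reference in replacement text, the example below keeps the path of a URL while swapping the domain. The URLs, the class name, and the `Update` helper are illustrative; the real mapping between old and new URLs may not be this simple:

```csharp
using System;
using System.Text.RegularExpressions;

public static class UrlUpdater
{
    public static string Update(string text)
    {
        // Group 1 captures everything after the domain, up to the next whitespace.
        string pattern = @"https?://www\.verisign\.com/(\S*)";

        // $1 in the replacement inserts whatever group 1 captured.
        string replacement = "https://www.symantec.com/$1";

        return Regex.Replace(text, pattern, replacement);
    }

    public static void Main()
    {
        Console.WriteLine(Update("See http://www.verisign.com/ssl/index.html for details."));
        // prints: See https://www.symantec.com/ssl/index.html for details.
    }
}
```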
More uses for this program
Because this program quickly updates files in a directory, it is broadly applicable to file-related grunt work. You might use it to:
- rename files in bulk
- customize a web help output by inserting HTML and CSS and JavaScript references
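Bulk renaming, for instance, could be sketched by applying the regular expression to each file name instead of the file contents. This is not part of the original program; `BulkRename` and `RenameFiles` are hypothetical names:

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

public static class BulkRename
{
    public static void RenameFiles(string directory, string pattern, string replacement)
    {
        foreach (string path in Directory.GetFiles(directory))
        {
            string oldName = Path.GetFileName(path);
            string newName = Regex.Replace(oldName, pattern, replacement);

            if (newName != oldName)
            {
                // File.Move throws if a file with the new name already exists.
                File.Move(path, Path.Combine(directory, newName));
                Console.WriteLine(oldName + " -> " + newName);
            }
        }
    }

    public static void Main(string[] args)
    {
        // Usage: BulkRename <directory> <pattern> <replacement>
        if (args.Length == 3)
        {
            RenameFiles(args[0], args[1], args[2]);
        }
    }
}
```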
Recently, we migrated some emails from XML to properties and I used this program to fetch content from XML and create the appropriate properties files. If you’re interested in using the programmatic approach to text editing tasks, this program is a great place to start.