Custom RegEx Search For Huge C Projects

(Updated on 05/30/2017)


Have you ever worked on a C project so huge that a search returns many results, some of which aren’t relevant at all? The kernel source code is one such example, but the same happens in any project where not every file gets compiled.

Today’s sample aims to simplify these regular expression (RE) searches and save you time!



Attachment Content

The zip file you can download at the end of this post contains one file:

  1. the Bash script responsible for searching RE patterns with either grep or egrep.

Before You Start Using The Script

It’s important to select which program you’d like the script to use: grep or egrep. By default, egrep is selected.

Internally, the PROG variable abstracts this choice, so if you want to switch programs, change its value in the script.

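The choice affects how you write your regular expressions: grep interprets patterns as basic regular expressions (BRE), in which grouping and intervals must be escaped, as in \( \) and \{ \}, while egrep (equivalent to grep -E) uses extended regular expressions (ERE), where ( ), { }, +, ? and | work unescaped.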
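To make the difference concrete, here is a minimal sketch of how a PROG-style switch changes the pattern you must write. The helper search_with and the two pattern variables are hypothetical names for illustration, not code from the attached script; it uses grep -E in place of egrep, which behaves the same.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: PROG plays the role of the script's grep/egrep switch,
# but search_with and the pattern variables are illustrative names only.
# "grep -E" behaves like egrep; newer GNU grep releases warn that egrep is obsolescent.
PROG="grep -E"

search_with() {
    # $1 = pattern, remaining arguments = files to search.
    # $PROG is left unquoted on purpose so "grep -E" splits into command + option.
    local pattern=$1
    shift
    $PROG -n -- "$pattern" "$@"
}

# The same intent ("ab" followed by one or more "cd") in each dialect:
BRE_PATTERN='ab\(cd\)\{1,\}'   # plain grep: grouping and intervals need backslashes
ERE_PATTERN='ab(cd)+'          # grep -E / egrep: ( ) and + are operators as written
```

If you switch PROG to plain grep, the ERE pattern stops working as intended: grep reads 'ab(cd)+' literally and only matches that exact text, so the pattern has to be rewritten in BRE form.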

For more information on regular expressions, you can check the post A Simple Regular Expression Tutorial.

General Overview

The attached script is basically a recursive search from the current folder, but with some improvements.

Each search can target one of four main file groups, and each group accepts further parameters so you can narrow down what you are looking for.

Like a regular recursive search, it can scan every single file regardless of its type; just omit the first set of options. To be more specific, you can search only the source code (.c, .s, and .S files) or only the headers (.h files), but that’s not all this script can do!
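The source-only and header-only groups map naturally onto GNU grep’s --include filter. The following is a minimal sketch assuming GNU grep; the function names are hypothetical and do not correspond to the script’s own options.

```shell
#!/usr/bin/env bash
# Sketch of the "source only" and "headers only" file groups using GNU grep's
# --include filter. search_sources/search_headers are hypothetical names.

search_sources() {
    # .c files plus assembly (.s and .S), recursively from the current folder
    grep -rnE --include='*.c' --include='*.s' --include='*.S' -- "$1" .
}

search_headers() {
    # only .h files, recursively from the current folder
    grep -rnE --include='*.h' -- "$1" .
}
```

For example, running search_headers 'struct[[:space:]]+task_struct' from the project root would report only matches found in header files.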

What really makes this script a great tool for huge C projects is that it can restrict the search to the exact files that were used to compile the target:

  1. First, it lists all .o files and uses each name to find its .c (or assembly) counterpart;
  2. It then collects every header each source file includes, keeping track of them to avoid duplicate entries;
  3. It does the same for each .h file found in the previous step, looping until all headers have been found;
  4. Once the list is complete, it saves it so you can skip the whole process next time;
  5. Finally, your RE pattern is matched against this list.
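The five steps above can be sketched in Bash roughly as follows. This is not the attached script’s code, only an illustration under stated assumptions: each foo.o sits next to its foo.c, foo.S or foo.s (true for in-tree builds, not for separate build directories), headers are resolved first relative to the including file and then against the hypothetical INCLUDE_DIRS variable, and the names build_file_list, search_list and LIST_FILE are made up for this example. It requires Bash 4 or later for associative arrays, plus GNU xargs.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the object-based search (not the attached script).
# Assumptions: objects live next to their sources; includes are resolved
# relative to the including file, then against INCLUDE_DIRS (space-separated).
INCLUDE_DIRS=${INCLUDE_DIRS:-include}
LIST_FILE=${LIST_FILE:-.search_list}

resolve_header() {
    # $1 = including file, $2 = header name as written in the #include line
    local d p
    for d in "$(dirname "$1")" $INCLUDE_DIRS; do
        p="$d/$2"
        if [ -f "$p" ]; then
            printf '%s\n' "${p#./}"   # drop a leading "./" so paths compare equal
            return 0
        fi
    done
    return 1
}

build_file_list() {
    declare -A seen=()        # files already visited (avoids duplicate entries)
    local -a queue=()
    local obj base ext file hdr name i=0
    # Step 1: every .o gives the base name of its .c (or assembly) counterpart
    while IFS= read -r obj; do
        base=${obj#./}
        base=${base%.o}
        for ext in c S s; do
            if [ -f "$base.$ext" ]; then
                queue+=("$base.$ext")
                break
            fi
        done
    done < <(find . -name '*.o')
    # Steps 2 and 3: follow #include lines until no new header shows up
    while [ "$i" -lt "${#queue[@]}" ]; do
        file=${queue[i]}
        i=$((i + 1))
        if [ -n "${seen[$file]-}" ]; then continue; fi
        seen[$file]=1
        while IFS= read -r name; do
            hdr=$(resolve_header "$file" "$name") || continue
            if [ -z "${seen[$hdr]-}" ]; then queue+=("$hdr"); fi
        done < <(sed -n 's/^[[:space:]]*#[[:space:]]*include[[:space:]]*[<"]\([^>"]*\)[>"].*/\1/p' "$file")
    done
    # Step 4: save the sorted list so later runs can skip steps 1 to 3
    { if [ "${#seen[@]}" -gt 0 ]; then printf '%s\n' "${!seen[@]}"; fi; } | sort > "$LIST_FILE"
}

search_list() {
    # Step 5: match the pattern only against the files in the saved list
    xargs -r -d '\n' grep -HnE -- "$1" < "$LIST_FILE"
}
```

A breadth-first queue plus the seen array is one simple way to reach a fixed point: a header is parsed at most once, no matter how many files include it. Headers reached through different relative paths (for instance via "../") may still appear twice, which a fuller implementation would normalize.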

For a detailed list of the options and parameters the script handles, call it with --help (the output is colored in your console).

Observations Regarding Object Search

Searching Delay

This sample is especially useful if the program you normally use can’t cope with really large source trees, but it has one drawback: the bigger the project, the longer the five-step process described in the previous section takes.

To alleviate this, whenever you use the -o option and know that no new headers were added, pass the skip parameter as well. That way, you only have to wait the first time you run a search.

Leftovers From Previous Compilations

To build the list of files that make up the target, the script lists all .o files, and this can also be a problem.

Projects like the kernel let you enable or disable modules (in the kernel’s case, through some form of the menuconfig option), and once you disable something, the Makefile doesn’t delete the old objects on its own. The problem arises if you are unlucky enough to search for something inside a module you have disabled: its stale objects still put its sources on the list, so the results include code that is no longer compiled.

To avoid this, clean and recompile the whole project before using the script; all results will then be valid again.

New Objects And Modified Sources

Conversely, what happens if you change a file or add new modules?

Note that these changes go unnoticed if you use the skip (-s) parameter. This is not a problem at all if you only edited files without adding more modules or new included headers to the sources being compiled.

Calling the script without the skip parameter triggers an automatic internal skip, which checks whether any file was modified since the last time you ran the script. This is not the same as explicitly telling the script to skip via -s: here, only the files’ timestamps count, and any change forces the (tedious) rebuild of the file list. Even when no changes are found, the check itself takes a little time to complete.

That being said, the script already handles important modifications on its own. If you add a new module that has nothing to do with any source code the target used before, you can always delete the old list and force a new one with the -of option.
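The timestamp-based automatic skip described above can be approximated with find’s -newer test. This is a minimal sketch, not the script’s actual check: it assumes the saved list lives in a file named by the hypothetical LIST_FILE variable and that only sources, headers and objects matter; list_is_fresh is likewise a made-up name.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a timestamp-based "automatic skip".
# The saved list is reused only if it exists and no source, header or object
# file has a modification time newer than the list itself.
LIST_FILE=${LIST_FILE:-.search_list}

list_is_fresh() {
    [ -f "$LIST_FILE" ] || return 1
    local newer
    # -newer compares modification times; -print -quit stops at the first hit
    newer=$(find . \( -name '*.[chsS]' -o -name '*.o' \) -newer "$LIST_FILE" -print -quit)
    [ -z "$newer" ]
}
```

Stopping at the first newer file keeps the check cheap, but find still has to walk the tree, which matches the observation that the check takes a little time even when nothing changed. The expensive list rebuild would then run only when list_is_fresh fails.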

Final Words

The attached script may be used and modified as you wish, except for commercial purposes.

If you need some help, don’t forget to leave your questions in the comments. Good luck!



Download Attachments
