Summary: This article will help you learn the grep command in Linux together with regular expressions (regex).

What is a Regular Expression?

A regular expression (regex) describes a set of possible input strings. In simple terms, regular expressions let us search for text in files. They descend from a fundamental concept in computer science called “finite automata theory”.

The simplest regular expression is a string of literal characters to match. A string matches the regular expression if it contains those characters as a substring.
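As a small illustration of literal matching, the sketch below pipes a few made-up words (not from the article's dictionary file) into grep. The variable names `words` and `matches` are only for this example.

```shell
# Hypothetical sample input: one word per line.
words=$(printf '%s\n' cat concatenate dog scatter)

# A literal pattern matches every line that contains it as a substring.
matches=$(printf '%s\n' "$words" | grep 'cat')

# Prints cat, concatenate and scatter, one per line ("dog" has no "cat" in it).
printf '%s\n' "$matches"
```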


Regex Meta-characters

Meta-characters are special characters that have a special meaning during pattern processing. The following are some of the most common meta-characters used in regular expressions.

  • \d – Matches any digit character (0-9). GNU grep supports this only in Perl-compatible mode (grep -P); in other modes use [0-9] instead.
  • \w – Matches any alphanumeric character or the underscore (0-9, A-Z, a-z, _)
  • \W – Matches any character that is not a word character (alphanumeric or underscore).
  • | (alternation) – Acts like a Boolean “OR”. Matches the expression before or after the |
  • ? (quantifier) – Matches 0 or 1 of the preceding token, effectively making it optional.
  • * (quantifier) – Matches 0 or more of the preceding token.
  • . (period) – Matches any single character
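The meta-characters above can be tried out with a short sketch like the one below. It assumes grep is run with the -E option (extended regular expressions), because in grep's default basic mode ? and | are treated as ordinary characters. The sample words and variable names are made up for the example.

```shell
# Hypothetical sample input: one word per line.
lines=$(printf '%s\n' color colour cat cut ct dog)

# ? makes the preceding "u" optional.
optional=$(printf '%s\n' "$lines" | grep -E 'colou?r')   # color, colour

# * allows zero or more of the preceding "a".
star=$(printf '%s\n' "$lines" | grep -E 'ca*t')          # cat, ct

# . stands for exactly one character of any kind.
any_char=$(printf '%s\n' "$lines" | grep -E 'c.t')       # cat, cut

# | matches the expression on either side.
either=$(printf '%s\n' "$lines" | grep -E 'dog|cut')     # cut, dog

printf '%s\n' "$optional" "$star" "$any_char" "$either"
```

Note that "ct" does not match c.t, because the period needs one character to consume, while it does match ca*t, because the star accepts zero repetitions.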

Regex Character Classes (Character Sets)

Character classes match any one character from a specific set of characters. The order of the characters inside the character class does not matter.


For example, [aeiou] matches any one of the characters “a”, “e”, “i”, “o”, “u”.


Negated Character Classes in Regex

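Character classes can be negated by placing “^” as the first character inside the brackets. For example, [^aeiou] matches any one character that is not a lowercase vowel.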


Named Character Classes in Regex

Commonly used character classes can be referred to by name.

  • [a-zA-Z] = [[:alpha:]]
  • [a-zA-Z0-9] = [[:alnum:]]
  • [0-9] = [[:digit:]]
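To see the named classes in action, here is a minimal sketch using made-up input lines. The last command also combines a named class with negation. The variable names are illustrative only.

```shell
# Hypothetical sample input.
lines=$(printf '%s\n' abc 123 a1b2 '---')

# Lines that contain at least one digit.
with_digit=$(printf '%s\n' "$lines" | grep '[[:digit:]]')      # 123, a1b2

# Lines that contain at least one letter.
with_alpha=$(printf '%s\n' "$lines" | grep '[[:alpha:]]')      # abc, a1b2

# Lines that contain at least one character that is NOT alphanumeric.
non_alnum=$(printf '%s\n' "$lines" | grep '[^[:alnum:]]')      # ---

printf '%s\n' "$with_digit" "$with_alpha" "$non_alnum"
```

The named class goes inside an outer pair of brackets, which is why the pattern is written [[:digit:]] rather than [:digit:].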

Regex Anchors

Anchors will match a position within a string, not a character.

  • ^ – Matches the beginning of a line.
  • $ – Matches the end of a line.
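The two anchors can be sketched with a few made-up words, one per line, so that "line" and "word" mean the same thing. Using both anchors together matches only lines that consist of the pattern and nothing else.

```shell
# Hypothetical sample input: one word per line.
lines=$(printf '%s\n' cat catalog bobcat concatenate)

# ^ anchors the match to the start of the line.
starts=$(printf '%s\n' "$lines" | grep '^cat')    # cat, catalog

# $ anchors the match to the end of the line.
ends=$(printf '%s\n' "$lines" | grep 'cat$')      # cat, bobcat

# Both anchors together: the whole line must be exactly "cat".
whole=$(printf '%s\n' "$lines" | grep '^cat$')    # cat

printf '%s\n' "$starts" "$ends" "$whole"
```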

Regex Match Length Limit

Quantifiers are greedy: starting from the leftmost position where a match is possible, a match will be the longest string that satisfies the regular expression.
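One way to observe the longest-match rule is with the -o option, which prints only the matched part of each line. This option is available in GNU and BSD grep but is not part of POSIX, so treat it as an assumption about your system. The sample text is made up.

```shell
# Hypothetical sample text containing several tag-like pieces.
text='<b>bold</b> and <i>italic</i>'

# .* is greedy, so the match runs from the first "<" to the last ">".
longest=$(printf '%s\n' "$text" | grep -o '<.*>')

# Excluding ">" from the repeated part keeps each match to a single tag.
tags=$(printf '%s\n' "$text" | grep -o '<[^>]*>')

printf '%s\n' "$longest"   # the whole line
printf '%s\n' "$tags"      # <b>, </b>, <i>, </i> on separate lines
```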


Regular Expression Repetition Ranges

We can also specify ranges in regular expressions. The “{ }” notation specifies a range of repetitions for the immediately preceding token.

  • {n} – Repeat the previous symbol exactly n times
  • {n,} – Repeat the previous symbol n or more times
  • {n,m} – Repeat the previous symbol at least n times but no more than m times.

For example, n{1,3} will match any text that has between 1 and 3 consecutive “n” characters.
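The sketch below tries out repetition ranges on made-up input. It assumes grep -E, since in grep's default basic mode the braces must be written as \{ and \}, and it uses -o (a GNU and BSD grep option) to print each match on its own line.

```shell
# Print every match of one to three consecutive "n" characters.
matches=$(printf '%s\n' 'an inn nnnn' | grep -oE 'n{1,3}')

# The run of four "n"s is split into a greedy match of three, then one more.
printf '%s\n' "$matches"   # n, nn, nnn, n

# {2} requires exactly two consecutive "n" characters somewhere in the line.
exact=$(printf '%s\n' an inn | grep -E 'n{2}')
printf '%s\n' "$exact"     # inn
```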


How To Use grep With Regex In Linux

grep takes its name from the g/re/p command in ed (the Unix text editor), which stands for “global regular expression print”. The command was so useful that it was written as a standalone utility. Two other variants, egrep and fgrep, make up the rest of the grep family.

grep is the answer to the moment when you know you want the file that contains a specific phrase but you can’t remember its name.

Family Differences

  • grep – uses basic regular expressions for pattern matching.
  • fgrep – fixed-string grep; does not use regular expressions and only matches fixed strings, but can read search strings from a file. Equivalent to grep -F.
  • egrep – extended grep; uses a more powerful set of regular expressions. Equivalent to grep -E. POSIX extended regular expressions do not define backreferences, though GNU grep accepts them as an extension.
  • agrep – approximate grep, not standard.
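The difference between the family members shows up when a pattern contains characters such as . or +. The sketch below uses the -F and -E options of grep, which behave like fgrep and egrep. The input lines are made up for the example.

```shell
# Hypothetical sample input.
lines=$(printf '%s\n' 'a.c' 'abc' 'a+c' 'aac')

# Basic regex: "." matches any character, so all four lines match.
basic=$(printf '%s\n' "$lines" | grep 'a.c')

# Fixed string (fgrep behaviour): only the literal text "a.c" matches.
fixed=$(printf '%s\n' "$lines" | grep -F 'a.c')       # a.c

# Extended regex (egrep behaviour): "+" means one or more of the preceding "a".
extended=$(printf '%s\n' "$lines" | grep -E 'a+c')    # aac

# In basic mode "+" is an ordinary character, so the literal "a+c" matches.
basic_plus=$(printf '%s\n' "$lines" | grep 'a+c')     # a+c

printf '%s\n' "$basic" "$fixed" "$extended" "$basic_plus"
```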

How To Use grep Command In Linux With Examples

Now let’s see some uses of regex together with the grep command in the Linux terminal.

How To Search A Word In A File In Linux

For demonstration purposes, I’m going to use the standard American English dictionary file as the search file.

Let’s say you want to find every word that has the substring “cat” in it. You can find them easily using the grep command with the help of regex.

grep "cat" /usr/share/dict/american-english

Let’s say you want to find every word that contains a substring starting with “c” and ending with “t”, with exactly one of the characters “a”, “e”, “i”, “o”, “u” between those two letters. You can do this using character classes in regex.

grep "c[aeiou]t" /usr/share/dict/american-english

You can also negate the above character class using a “^”.

grep "c[^aeiou]t" /usr/share/dict/american-english

If you want to find words that have the substring “cat” only at the beginning of the word, you can use anchors.

grep "^cat" /usr/share/dict/american-english

In the same way, if you want the substring “cat” at the end of the line, you can use the “$” anchor.

grep "cat$" /usr/share/dict/american-english

The grep command in Linux is a very useful and handy tool that can make your daily tasks easier. The time you spend getting comfortable with grep and all its options will not be wasted.

In addition, when using tools like grep you will naturally get used to regex, a.k.a. regular expressions. Regular expressions are used in many programs, including text editors such as Atom, Vim, and VS Code. Most modern programming languages support regex as well.

Thank you for reading. Hope you learned something new.
