Home
"Take big bites. Anything worth doing is worth overdoing." — Robert A. Heinlein, Time Enough for Love
See ugrep.com for a helpful and compact user guide.
Very honored to receive the Google OSPB 2022 award for my work on ugrep. But let's not forget all the people who offered suggestions, comments and otherwise contributed to the project!
Ugrep has a clear roadmap. Ugrep is relatively new, so there is still some room for new features and improvements:
- the highest priority is testing and quality assurance to continue to make sure ugrep has no bugs and is reliable
- make ugrep even faster, see benchmarks
- listen to users to improve ugrep
- continue developing ugrep file indexing to speed up cold search performance
We were looking for an efficient grep tool to quickly dig through hundreds of zip- and tar-archived project repos with thousands of source code files, documentation files, images, and binary files. We wanted to do this without having to expand archives, to save time and storage resources. With ugrep we have the ability to specifically search source code (with option -t) while ignoring everything else in these huge zip- and tar-archives. Even better, ugrep can ignore matches in strings and comments in source code using "negative patterns", e.g. with pre-defined patterns ugrep -f c++/zap_strings -f c++/zap_comments ... To keep ugrep clean BSD-3 source code unencumbered by GPL or LGPL terms and conditions, I wrote my own tar, zip, pax and cpio unarchivers from scratch in C++ that call external decompression libraries linked with ugrep.
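To illustrate the idea of searching an archive without expanding it to disk, here is a minimal Python sketch using only the standard library. It is not how ugrep is implemented (ugrep has its own C++ unarchivers); the function names `search_tar` and `make_demo_tar`, the suffix filter, and the demo file names are all assumptions made up for this example.

```python
import io
import re
import tarfile

def search_tar(data: bytes, pattern: str, suffixes=(".cpp", ".h")):
    """Yield (member name, line number, line) for matches in a tar archive held in memory."""
    regex = re.compile(pattern)
    # Mode defaults to transparent decompression, so .tar and .tar.gz both work.
    with tarfile.open(fileobj=io.BytesIO(data)) as archive:
        for member in archive:
            # Skip directories and anything that is not a source file of interest.
            if not member.isfile() or not member.name.endswith(suffixes):
                continue
            stream = archive.extractfile(member)  # reads from the archive, nothing is written to disk
            for number, raw in enumerate(stream, start=1):
                line = raw.decode("utf-8", errors="replace").rstrip("\n")
                if regex.search(line):
                    yield member.name, number, line

def make_demo_tar() -> bytes:
    """Build a small gzip-compressed tarball in memory (hypothetical demo data)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, text in [("src/main.cpp", "int main() {\n  return 0;\n}\n"),
                           ("docs/notes.txt", "main idea\n")]:
            payload = text.encode()
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()

matches = list(search_tar(make_demo_tar(), r"\bmain\b"))
print(matches)
```

Only the `.cpp` member is reported; the text file is skipped by the suffix filter, which loosely mirrors what restricting a search by file type does.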
Later on, we started to make ugrep a lot faster (see the part below). After that, many users offered suggestions to add more features, such as Boolean search queries, fuzzy search, improved TUI, binary search with hexdumps, and file indexing.
Ugrep uses the new method I presented in my talk at the Performance Summit IV. I explain in more detail the new method and performance results in my article. Ugrep is almost always faster than other grep tools for common search patterns and usage scenarios. Ugrep uses new methods from our research with new logic and arithmetic techniques to predict matches. When a possible match is predicted, a pattern match is performed with our high-performance RE/flex library. This DFA-based regex library is much faster to match patterns than other libraries such as PCRE2, even when PCRE2's JIT is enabled. In addition, ugrep's worker threads are optimally load-balanced. We also use AVX/SSE/ARM-NEON/AArch64 instructions and utilize efficient non-blocking asynchronous IO.
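The general "predict first, then verify with the regex engine" idea can be sketched in a few lines of Python. This is only an illustration under strong assumptions: ugrep's actual prediction method is described in the talk and article mentioned above and is not reproduced here. In this sketch the predictor is simply a literal substring that every match must contain, and `search_lines` and `required_literal` are hypothetical names.

```python
import re

def search_lines(lines, regex: str, required_literal: str):
    """Return (matching lines, number of lines that needed a full regex match)."""
    compiled = re.compile(regex)
    verified = 0
    hits = []
    for line in lines:
        # Cheap prediction: if the required literal is absent, the regex cannot match.
        if required_literal not in line:
            continue
        # Prediction says "possible match", so run the (more expensive) regex.
        verified += 1
        if compiled.search(line):
            hits.append(line)
    return hits, verified

demo_lines = ["alpha", "error 42", "errors none", "ok"]
hits, verified = search_lines(demo_lines, r"error \d+", "error")
print(hits, verified)
```

The regex runs on two of the four lines here; the benefit of any such scheme depends on how cheap and how selective the predictor is.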
We at our research lab (and many others) use, test, and evaluate ugrep regularly and we cannot accept errors. Our RE/flex library that is used by ugrep has been around for several years and is stable. Ugrep also meets the highest quality standards (A+) for C++ source code according to lgtm. We continue field-testing ugrep. If there is any problem, let us know by opening an issue, so everyone benefits!
Some examples of what's new that other grep tools don't offer:
Option -Q opens a query UI to search files as you type (press F1 or CTRL-Z for help and options).
Option -t searches files by file type, and predefined source code search patterns can be specified with option -f.
Option -z searches archives (cpio, pax, tar, zip) and compressed files and tarballs (zip, gz, bz2, xz, lzma, Z, lz4, zstd).
Options -U, -W and -X search binary files, displayed as hexdumps.
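For readers unfamiliar with hexdumps, the following Python sketch shows the general shape of one: an offset, the bytes in hexadecimal, and a printable-text column. The layout is an assumption chosen for this example and is not ugrep's exact output format; `hexdump` is a hypothetical helper.

```python
def hexdump(data: bytes, width: int = 16):
    """Return hexdump rows: offset, hex bytes, and printable ASCII (others shown as '.')."""
    rows = []
    pad = width * 3 - 1  # width of a full hex column: two digits per byte plus separators
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{byte:02x}" for byte in chunk).ljust(pad)
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        rows.append(f"{offset:08x}  {hex_part}  {text_part}")
    return rows

rows = hexdump(b"ugrep\x00\xff")
for row in rows:
    print(row)
```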
Option --filter searches pdf, office documents, and more.
Option -Z searches for fuzzy (approximate) matches within an optionally specified max error.
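What "a match within a max error" means can be sketched with a textbook approximate-matching routine. The Python below is a minimal sketch, assuming errors are counted as character insertions, deletions, and substitutions (edit distance) and using Sellers' dynamic-programming algorithm. It is not ugrep's implementation, and `fuzzy_find` is a hypothetical name.

```python
def fuzzy_find(pattern: str, text: str, max_errors: int):
    """Return the smallest edit distance between pattern and any substring of text,
    or None if that distance exceeds max_errors."""
    # previous[i] = cost of matching pattern[:i] against an empty piece of text.
    previous = list(range(len(pattern) + 1))
    best = previous[-1]
    for ch in text:
        # The top cell is always 0, so a match may start at any position in text.
        current = [0]
        for i, p in enumerate(pattern, start=1):
            cost = 0 if p == ch else 1
            current.append(min(previous[i - 1] + cost,  # match or substitution
                               previous[i] + 1,         # extra character in text
                               current[i - 1] + 1))     # pattern character missing from text
        best = min(best, current[-1])
        previous = current
    return best if best <= max_errors else None

print(fuzzy_find("foobar", "xx foobar yy", 0))
print(fuzzy_find("foobar", "xx fobar yy", 1))
print(fuzzy_find("foobar", "zzzz", 1))
```

The first call finds an exact occurrence, the second tolerates one missing character, and the third finds nothing within one error.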
Option --pretty enhances the output to the terminal. You can specify pretty in a .ugrep configuration file so that ug -l lists directory trees instead of the traditional flat grep list.
Context options -ABC also work with option -o to display the context of the only-matching pattern part on a line, by fitting the match in the specified number of columns. This is particularly useful when searching files with very long lines!
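The effect of fitting a match into a limited number of columns can be sketched as follows. This Python example assumes the same number of context columns on each side of the match and marks truncation with "..."; how ugrep actually distributes the columns and marks cut-off text may differ. `fit_match` is a hypothetical helper.

```python
import re

def fit_match(line: str, pattern: str, columns: int):
    """Return the first match with up to `columns` characters of context on each side."""
    match = re.search(pattern, line)
    if match is None:
        return None
    start = max(match.start() - columns, 0)
    end = min(match.end() + columns, len(line))
    # Mark where the long line was cut off.
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(line) else ""
    return f"{prefix}{line[start:end]}{suffix}"

long_line = "a" * 50 + "NEEDLE" + "b" * 50
print(fit_match(long_line, "NEEDLE", 5))
```

Instead of printing all 106 characters of the line, only the match and a few surrounding columns are shown.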
Not really. We carefully designed and gradually implemented ugrep without limits, unlike some other grep tools that warn about potential truncated output under certain conditions. For example, unlike other grep tools, there are no practical limitations on the match size for multiline patterns, even when its context (option -C) is large. There is no limit on the file size, which may exceed 2GB. The maximum regex pattern length is 2GB. If the pattern causes excessive memory requirements due to its size and complexity, then an error message may be generated before ugrep starts searching. This should not happen in any practical use case.
U name it. The U wasn’t used by any other grep tools I could find, so “ugrep” was a logical choice. But if you really must, take your pick:
- User friendly grep (not very meaningful, but OK...)
- Universal grep (includes features of other grep tools, but "Universal" means much more than that...)
- Unicode grep (ugrep is not the only grep that supports Unicode though)
- Ultra grep (yes it is fast, but "ultra" in general is a bit over the top...)
- Ultimate grep (not there yet, if ever, ha ha...)
- Uberty grep (sounds too über...)
- Unzymotic grep (too fab...)
- u grep ("you grep"? Maybe that sounds just right!)
Absolutely! There are many ways to contribute. If you have a suggestion or if you're not happy with something then post it as an issue.
A shout out and a big thank you to our heroes, the project contributors: rbnor, ribalda, theUncanny, ucifs, NightMachinary, jonassmedegaard, cdluminate, grylem, ISO8807, 0x7FFFFFFFFFFFFFFF, bolddane, marc-guenther, rrthomas, illiliti, stdedos, bmwiedemann, pete-woods, paoloschi, mmuman, alex-bender, smac89, htgoebel, gaeulbyul, dicktyr, andresroldan, AlexanderS, NapVMk, chy-causer, camuffo, trantor, essays-on-esotericism, hanyfarid, reneeotten, wahjava, idigdoug, ericonr, juhopp, emaste, zoomosis, ChrisMoutsos, wimstefan, navarroaxel, korziner, carlwgeorge and others.
Please ⭐️ the project if you use ugrep (even occasionally) to thank the contributors for their hard work!
-- Robert