Group Anagrams
Problem Statement
Given an array of strings strs
, group the anagrams together. You can return the answer in any order.
An Anagram is a word or phrase formed by rearranging the letters of a different word or phrase, typically using all the original letters exactly once
For example,
Input: strs = ["eat","tea","tan","ate","nat","bat"]
An acceptable output could be like:
Note: All anagrams would have the same frequency count. Thus, their sorted letters would also be the same
Hashing + Sorting
We’ll use Hashmap with keys as the sorted letters of word. All anagrams present in input array that match it will be grouped together as value of that key
If input contains ’\(n\)’ strings of average length ’\(m\)’ and let ’\(k\)’ be the number of groups, which is \(n\) at worst
Time taken:
- \( O(n) \) to iterate through input strings
- \( O(m \cdot logm) \) to sort each word i.e. \( O(n \cdot m \cdot logm) \) to sort all words
- \( O(1) \) for appending word entry into map
- When we construct answer at end, we will iterate through the constructed map with anagram groups of all elements of input i.e. \( O(k) = O(n) \) at worst
Total time \( = O(n) + O(n \cdot m \cdot logm) + O(1) + O(n) = O(n \cdot m \cdot logm) \)
Metric | Complexity |
---|---|
Time | \( O(n \cdot m \cdot logm) \) |
Space | \( O(n) \) … Frequency map |
Optimization: Instead of sorting letters to form key, use hashing function
Regardless of the order in which characters are processed within the loop, the final hash value H
will be the same for all anagrams. This is because the order of multiplication doesn’t affect the final result when dealing with modulo operations.
Hash functions are provided by deafult in Java, but in C++, we need to provide our own hash function where the data-type of key is not a primitive data-type
The step of sorting letters of word in first method has been replaced with calculating the hash value of the word. In our getHash()
function below, we are iterating over the letters of word i.e. \( O(m) \) time to calculate hash of a word, and thereby \( O(n \cdot m) \) time to calculate hash of all words
Metric | Complexity |
---|---|
Time | \( O(n \cdot m ) \) |
Space | \( O(n) \) … Hashmap |