Skip to content

Latest commit

 

History

History
123 lines (93 loc) · 5.85 KB

30.Substring_with_Concatenation_of_All_Words(Hard).md

File metadata and controls

123 lines (93 loc) · 5.85 KB

30. Substring with Concatenation of All Words (Hard)

Date and Time: Oct 11, 2024, 16:38 (EST)

Link: https://leetcode.com/problems/substring-with-concatenation-of-all-words/


Question:

You are given a string s and an array of strings words. All the strings of words are of the same length.

A concatenated string is a string that exactly contains all the strings of any permutation of words concatenated.

  • For example, if words = ["ab","cd","ef"], then "abcdef", "abefcd", "cdabef", "cdefab", "efabcd", and "efcdab" are all concatenated strings. "acdbef" is not a concatenated string because it is not the concatenation of any permutation of words.

Return an array of the starting indices of all the concatenated substrings in s. You can return the answer in any order.


Example 1:

Input: s = "barfoothefoobarman", words = ["foo","bar"]

Output: [0,9]

Explanation:
The substring starting at 0 is "barfoo". It is the concatenation of ["bar","foo"] which is a permutation of words.
The substring starting at 9 is "foobar". It is the concatenation of ["foo","bar"] which is a permutation of words.

Example 2:

Input: s = "wordgoodgoodgoodbestword", words = ["word","good","best","word"]

Output: []

Explanation:
There is no concatenated substring.

Example 3:

Input: s = "barfoofoobarthefoobarman", words = ["bar","foo","the"]

Output: [6,9,12]

Explanation:
The substring starting at 6 is "foobarthe". It is the concatenation of ["foo","bar","the"].
The substring starting at 9 is "barthefoo". It is the concatenation of ["bar","the","foo"].
The substring starting at 12 is "thefoobar". It is the concatenation of ["the","foo","bar"].


Constraints:

  • 1 <= s.length <= 10^4

  • 1 <= words.length <= 5000

  • 1 <= words[i].length <= 30

  • s and words[i] consist of lowercase English letters.


Walk-through:

  1. Use word_counts{} to record each word in words and their occurancy.

  2. Start from range(word_len) (we can form substring starting from range(0, word_len) and each time we step by word_len), we create l = i, counts = 0, sub_counts{}. And we check from l to r in range(l, len(s)-word_len + 1, word_len), so each increment of r will let us create a sub_word and we check if this sub_word in word_counts or not.

  3. In the for loop, we want to check the range of l, r and find if the words in this range equal to the counts of counts, and if we detect repeated word in this range, ie. sub_counts[sub_word] > word_counts[sub_word], we increment l += word_len and decrement counts -= 1. Later, if we find equal counts == len(words), we find a substring of all words, and we append this l index to res[].

  4. If a current sub_word does not exist in word_counts, we need to increment l = r + word_len and clear sub_counts{}, reset counts = 0.

  5. Finally, return res.


Example 1: from index 0, we can form substring = barfoothefoobarm, so we can find any valid concatenation from this substring. Later, we go to index 1, we can form arfoothefoobarma, for index 2, we can form rfoothefoobarman. This is how we use range(word_len) to find all the concatenations.


Python Solution:

class Solution:
    def findSubstring(self, s: str, words: List[str]) -> List[int]:
        # Count words frequency
        # start from each index of s, try if we can find all words from i
        # if yes, append i to res, otherwise, continue
        # if word in curr_counts > word_counts, decrement by moving l

        # TC: O(n^2), SC: O(n)
        res = []
        word_counts = {}
        word_len = len(words[0])
        for word in words:
            word_counts[word] = word_counts.get(word, 0) + 1
        
        for i in range(word_len):
            l = i
            counts = 0
            sub_counts = {}
            # Check sub_word
            for r in range(l, len(s) - word_len + 1, word_len):
                sub_word = s[r:r + word_len]
                # Check if sub_word exists in word_counts
                if sub_word in word_counts:
                    sub_counts[sub_word] = sub_counts.get(sub_word, 0) + 1
                    counts += 1
                    # If repeated word found, shrink the window from left
                    while sub_counts[sub_word] > word_counts[sub_word]:
                        sub_counts[s[l:l + word_len]] -= 1
                        l += word_len
                        counts -= 1

                    if counts == len(words):
                        res.append(l)
                # If sub_word doesn't exist, update l
                else:
                    l = r + word_len
                    sub_counts.clear()
                    counts = 0
        return res

Time Complexity: $O(n^2)$
Space Complexity: $O(n)$


CC BY-NC-SABY: credit must be given to the creatorNC: Only noncommercial uses of the work are permittedSA: Adaptations must be shared under the same terms