828. Count Unique Characters of All Substrings of a Given String #
Problem #
Let’s define a function countUniqueChars(s) that returns the number of unique characters on s, for example if s = "LEETCODE" then "L", "T","C","O","D" are the unique characters since they appear only once in s, therefore countUniqueChars(s) = 5.On this problem given a string s we need to return the sum of countUniqueChars(t) where t is a substring of s. Notice that some substrings can be repeated so on this case you have to count the repeated ones too.
Since the answer can be very large, return the answer modulo 10 ^ 9 + 7.
Example 1:
Input: s = "ABC"
Output: 10
Explanation: All possible substrings are: "A","B","C","AB","BC" and "ABC".
Evey substring is composed with only unique letters.
Sum of lengths of all substring is 1 + 1 + 1 + 2 + 2 + 3 = 10
Example 2:
Input: s = "ABA"
Output: 8
Explanation: The same as example 1, except countUniqueChars("ABA") = 1.
Example 3:
Input: s = "LEETCODE"
Output: 92
Constraints:
0 <= s.length <= 10^4scontain upper-case English letters only.
Problem Summary #
If a character appears in string S exactly once, then we call it a unique character. For example, in the string S = “LETTER” , “L” and “R” can be called unique characters. We then define UNIQ(S) as the number of unique characters in string S. Thus, in S = “LETTER” , UNIQ(“LETTER”) = 2.
For a given string S, calculate the sum of the number of unique characters (i.e. UNIQ(substring)) over all non-empty substrings. If two or even more identical substrings appear at different positions in S, we consider these substrings different. Considering that the answer may be very large, the return format is specified as: result mod 10 ^ 9 + 7.
Solution Ideas #
- For this problem, you can first try using a brute-force solution, but after submitting you will find that the judge result is time limit exceeded. One failing test case is a string with 10000 characters. The brute-force solution times out because it traverses too many subintervals.
- Think about this problem from another angle. When character X appears more than 2 times in a substring, it has no effect on the final result, so only when a certain character appears exactly once will it affect the final result. Furthermore, the total number of non-repeated characters in a substring is exactly the UNIQ value of this substring. For example, for “ABC”, the UNIQ value of this substring is 3. It can be calculated like this: it is a unique string belonging to A, also a unique string belonging to B, and also a unique string belonging to C. Then the problem of calculating this substring can be decomposed into calculating how many unique substrings A has, how many unique substrings B has, and how many unique substrings C has. When calculating how many substrings A has, it will definitely include the substring “ABC”. Therefore, the original problem is transformed into separately calculating the sum of the total number of times each character in the given string appears in unique strings.
- Suppose the original string is BAABBABBBAAABA. This string contains many A’s and many B’s. Suppose we have currently calculated the position of the 3rd A (index = 5), that is, the red A. How do we calculate in which substrings this A is unique? Since the problem requires substrings to be continuous intervals, this problem is very simple. Find the index position of the previous A before this A (index = 2), then find the index position of the next A after this A (index = 9), namely BAABBABBBAAABA. There are 2 characters in the interval between the first A and the current A, and 3 characters between the third A and the current A. Then the currently calculated A appears as unique in
(2 + 1) * (3 + 1) = 12substrings. These 12 strings are:A,BA,BBA,AB,ABB,ABBB,BAB,BABB,BABBB,BBAB,BBABB,BBABBB. Calculation method: suppose the index of the character currently to be calculated is i, find the index position left where the current character last appeared, then find the index position right where the current character next appears. Then the number of characters contained in the open interval* of the left interval (left,i) is i - left - 1, and the number of characters contained in the ***open interval**** of the right interval (i,right) is right - i - 1. On both the left and right sides, the empty string case also needs to be considered, that is, neither side needs to take any characters; this corresponds to having only the middle characterAto be calculated. Therefore, the empty string case needs to be added on both sides: left side i - left - 1 + 1 = i - left, right side right - i - 1 + 1 = right - i. Combine the cases on the left and right sides, that is, (i - left) * (right - i). Calculate such a value for every character in the string, and the final accumulated sum is the total UNIQ value required by the problem.
Code #
package leetcode
func uniqueLetterString(S string) int {
res, left, right := 0, 0, 0
for i := 0; i < len(S); i++ {
left = i - 1
for left >= 0 && S[left] != S[i] {
left--
}
right = i + 1
for right < len(S) && S[right] != S[i] {
right++
}
res += (i - left) * (right - i)
}
return res % 1000000007
}
// Brute-force solution, times out! Time complexity O(n^2)
func uniqueLetterString1(S string) int {
if len(S) == 0 {
return 0
}
res, mod := 0, 1000000007
for i := 0; i < len(S); i++ {
letterMap := map[byte]int{}
for j := i; j < len(S); j++ {
letterMap[S[j]]++
tmp := 0
for _, v := range letterMap {
if v > 1 {
tmp++
}
}
if tmp == len(letterMap) {
continue
} else {
res += len(letterMap) - tmp
}
}
}
return res % mod
}