Substrings with single character

Task:- Given a string, find the number of substrings in which all the characters are the same.

For eg:- input = “aabccc” , answer = 10

Explanation:- The substrings which have same characters are -

  • “a”(0), “aa”(0–1) ,
  • “a”(1) ,
  • “b”(2) ,
  • “c”(3) , “cc” (3–4), “ccc”(3–5) ,
  • “c”(4) , “cc”(4–5) ,
  • “c”(5)

Brute Force Solution:- Take all possible substrings into consideration and pass it to a function let’s say is_same() which takes a string as parameter and returns True if all the characters in string are same else returns False.

The time complexity of this approach would be O(N³) as it takes O(N²) to get each possible substring and for each substring we go linearly( O(N) ) to check whether all the characters are the same.

A little better approach to this would be, instead of creating a function that will check all the characters are the same, we can check the current character whether that matches with the previous character in the substring. If it does then it’s valid and we just increment the count of such strings.

The time complexity of this approach would be O(N²) as we iterate through two for loops to generate every possible substring ( which are valid ).

Brute Force Code:-

Note:- We don’t even need to maintain the substring variable. Removing line no 5 & 8 would still give the same answer.

Efficient Approach:- Instead of generating every possible substring, we can calculate the number of substrings that would be valid by just dividing the whole string into a series of segments, each segment would contain the substring with the same characters.

See the image below for better understanding.

We can see there are 3 segments in the given example:-

  1. “aa” . The length of segment = 2. The number of substrings that are valid ( have the same characters ) is 3. { “a”, “a”, “aa” }
  2. “b” . The length of segment = 1. The number of substrings that are valid is 1. { “b” }
  3. “ccc” . The length of segment = 3. The number of substrings that are valid is 6. { “c”, “c”, “c”, “cc”, “cc”, “ccc” }

So we can observe that for a segment of length “P”. The no of valid substrings that can be formed are: P * (P+1) / 2

We can iterate through the string and increment counter if the current char is the same as previous or else we reset the counter. When resetting the counter, we add the no of substrings of that segment to a variable ( answer ). In this way, we can get the required answer in one iteration.

The time complexity of this approach would be O(N) as we only iterate through the string once.

Efficient Code:-