Skip to main content

139. Word Break

mediumAsked at Hugging Face

Determine if a string can be segmented into words from a dictionary. Hugging Face uses this as a DP signal problem with direct ML relevance — tokenization itself is a form of word breaking, and understanding how dynamic programming explores segmentation space helps engineers reason about subword tokenizer design.

By Sam K., Founder, InterviewChamp.AI · Last verified

Source citations

Public interview reports confirming this problem appears in Hugging Face loops.

  • Glassdoor (2026-Q1)Cited in Hugging Face SWE onsite reports as a DP problem with strong tokenization relevance.
  • Blind (2025-11)Hugging Face threads highlight Word Break as a medium problem frequently asked for ML and NLP engineering roles.

Problem

Given a string s and a dictionary of strings wordDict, return true if s can be segmented into a space-separated sequence of one or more dictionary words. Note that the same word in the dictionary may be reused multiple times in the segmentation.

Constraints

  • 1 <= s.length <= 300
  • 1 <= wordDict.length <= 1000
  • 1 <= wordDict[i].length <= 20
  • s and wordDict[i] consist of only lowercase English letters.
  • All the strings of wordDict are unique.

Examples

Example 1

Input
s = "leetcode", wordDict = ["leet","code"]
Output
true

Explanation: "leetcode" = "leet" + "code".

Example 2

Input
s = "applepenapple", wordDict = ["apple","pen"]
Output
true

Explanation: "apple" + "pen" + "apple". Reuse is allowed.

Example 3

Input
s = "catsandog", wordDict = ["cats","dog","sand","and","cat"]
Output
false

Explanation: No valid segmentation exists.

Approaches

1. Bottom-up DP

dp[i] = true if s[0..i-1] can be segmented. For each position i, check all substrings s[j..i-1] that are in the dictionary and dp[j] is true.

Time
O(n² * m) where m is the average word length for the slice check
Space
O(n)
function wordBreak(s, wordDict) {
  const wordSet = new Set(wordDict);
  const dp = new Array(s.length + 1).fill(false);
  dp[0] = true; // empty prefix is always segmentable
  for (let i = 1; i <= s.length; i++) {
    for (let j = 0; j < i; j++) {
      if (dp[j] && wordSet.has(s.slice(j, i))) {
        dp[i] = true;
        break;
      }
    }
  }
  return dp[s.length];
}

Tradeoff: O(n²) in practice (the slice is O(n) in the worst case). Clear recurrence, easy to explain. Preferred in interviews for its directness.

2. BFS / memoized DFS

Treat each index as a graph node. BFS from index 0: for each position, try all words in the dictionary; if s.startsWith(word, i), add i + word.length to the queue. Reach s.length = success.

Time
O(n * w * L) where w=wordDict size, L=max word length
Space
O(n)
function wordBreak(s, wordDict) {
  const visited = new Set();
  const queue = [0];
  while (queue.length) {
    const start = queue.shift();
    if (visited.has(start)) continue;
    visited.add(start);
    for (const word of wordDict) {
      if (s.startsWith(word, start)) {
        const next = start + word.length;
        if (next === s.length) return true;
        queue.push(next);
      }
    }
  }
  return false;
}

Tradeoff: BFS avoids revisiting positions (visited set). Good when wordDict is small but s is long. The DP approach is generally preferred for its simpler implementation.

Hugging Face-specific tips

Hugging Face will immediately connect this to tokenization: 'This is essentially what a greedy tokenizer does — it tries to match the longest word in the vocabulary at each position. A DP approach finds ALL valid segmentations, not just the greedy one.' Then ask: 'What changes if you need to return all valid segmentations instead of just a boolean?' (Word Break II — LC 140). The ability to articulate this connection signals NLP engineering depth.

Common mistakes

  • Not initializing dp[0] = true — the empty prefix is the base case; without it no word can start at index 0.
  • Using wordDict as a list for membership checks instead of converting to a Set — O(n) per lookup vs. O(1).
  • Forgetting to break early when dp[i] is set to true — otherwise the inner loop continues unnecessarily.
  • Off-by-one: s.slice(j, i) in a length-indexed loop — verify that dp indices and string slice indices align.

Follow-up questions

An interviewer at Hugging Face may pivot to one of these next:

  • Word Break II (LC 140) — return all valid segmentations; use backtracking with memoization.
  • How would you modify the algorithm to prefer the fewest segments?
  • How does this relate to the Viterbi algorithm used in sequence decoding for NLP models?

Solve it now

Free. No sign-up. Python and JavaScript run instantly in your browser.

Output

Press Run or Cmd+Enter to execute

FAQ

Why must dp[0] = true?

It represents the empty prefix, which is trivially segmentable. Without this base case, no word match at the start of the string can ever set dp[word.length] = true.

What is the time complexity?

O(n² * L) where n = s.length and L = average word length (for the string slice). With a Trie instead of a Set, lookups become O(L) total and the inner loop can be eliminated.

How does a Trie optimize this?

Instead of trying all j < i for each i, walk the Trie from each position i backward — the Trie prunes branches early when no word starts with the current prefix, reducing the search space.

Companies that also ask Word Break