字典树初探

2019-12-17 约 2430 字预计阅读 5 分钟

字典树

在计算机科学中，trie，又称前缀树或字典树，字典树设计的核心思想是空间换时间，所以数据结构本身比较消耗空间。但它利用了字符串的**共同前缀（Common Prefix）**作为存储依据，以此来节省存储空间，并加速搜索时间。Trie 的字符串搜索时间复杂度为 O(m)，m 为最长的字符串的长度，其查询性能与集合中的字符串的数量无关。其在搜索字符串时表现出的高效，使得特别适用于构建文本搜索和词频统计等应用

性质

与二分搜索树不同，键不是直接保存在节点中，而是由节点在树中的位置决定。一个节点的所有子孙都有相同的前缀，也就是这个节点对应的字符串，而根节点对应空字符串。一般情况下，不是所有的节点都有对应的值，只有叶子节点和部分内部节点所对应的键才有相关的值

实现

使用 TreeMap 实现

import java.util.*;
public class Trie{
    private class Node{
        public boolean isWord; //是否是一个完整单词
        public TreeMap<Character,Node> next;

        public Node(boolean isWord){
            this.isWord = isWord;
            next = new TreeMap<>();
        }

        public Node(){
            this(false);
        }
    }

    private Node root;

    private int size;

    public Trie(){
        root=new Node();
        size=0;
    }

    public int getSize(){
        return size;
    }

    //向 Trie 中添加 word
    public void addLoop(String word){
        Node cur=root;
        for (int i=0;i<word.length();i++) {
            char c=word.charAt(i);
            if (cur.next.get(c)==null) {
                cur.next.put(c,new Node());
            }
            cur=cur.next.get(c);
        }
        if (!cur.isWord) {
            size++;
            cur.isWord=true;
        }
    }

    //递归的添加
    public void add(String word){
        add(root,word,0);
    }

    public void add(Node cur,String word,int index){
        if (index==word.length()) {
            if (!cur.isWord) {
                size++;
                cur.isWord=true;
            }
            return;
        }
        char c=word.charAt(index);
        if (cur.next.get(c)==null) {
            cur.next.put(c,new Node());
        }
        add(cur.next.get(c),word,index+1); //尾递归
    }

    //查询 word 是否在 Trie 中
    public boolean contains(String word){
        return contains(root,word,0);
    }

    public boolean contains(Node cur,String word,int index){
        if (index==word.length()) {
            return cur.isWord;
        }
        char c=word.charAt(index);
        return cur.next.containsKey(c) && contains(cur.next.get(c),word,index+1);
    }

    //循环
    public boolean containsLoop(String word){
        Node cur=root;
        for (int i=0;i<word.length();i++) {
            Character c=word.charAt(i);
            if (!cur.next.containsKey(c)) {
                return false;   
            }
            cur=cur.next.get(c);
        }
        return cur.isWord;
    }

    //是否有某个前缀
    public boolean hasPerfix(String perfix){
        return hasPerfix(root,perfix,0);
    }

    public boolean hasPerfix(Node cur,String perfix,int index){
        if (index==perfix.length()) {
            return true;
        }
        char c=perfix.charAt(index);
        return cur.next.containsKey(c) && hasPerfix(cur.next.get(c),perfix,index+1);
    }

    //懒得写循环了。
}

整体上来说其实很简单，这里用的是 TreeMap 来存储当前节点的下一层节点，可以看到，在 Node 中并没有直接存储某个字符，而是对应了 TreeMap 的 Key，但是由于 TreeMap 底层是红黑树，其实在数据量比较小的时候并没有优势，所以在某些情况下，我们也可以直接使用数组来存储节点，比如下面这道题

LeetCode 练手例题

208. 实现 Trie （前缀树）

实现一个 Trie （前缀树），包含 insert, search, 和 startsWith 这三个操作。

示例：

Trie trie = new Trie();

trie.insert("apple");
trie.search("apple");   // 返回 true
trie.search("app");     // 返回 false
trie.startsWith("app"); // 返回 true
trie.insert("app");   
trie.search("app");     // 返回 true

说明：

你可以假设所有的输入都是由小写字母 a-z 构成的。
保证所有输入均为非空字符串

解法一

class Trie {
    private class Node{
        public boolean isWord; //是否是一个完整单词
        public Node[] next;

        public Node(boolean isWord){
            this.isWord = isWord;
            next = new Node[26];
        }

        public Node(){
            this(false);
        }
    }

    private Node root;

    /** Initialize your data structure here. */
    public Trie() {
        root=new Node();
    }
    
    /** Inserts a word into the trie. */
    public void insert(String word) {
        insert(root,word,0);
    }
    
    public void insert(Node cur,String word,int index){
        if (index==word.length()) {
            if (!cur.isWord) {
                cur.isWord=true;
            }
            return;
        }
        char c=word.charAt(index);
        if (cur.next[c-'a']==null) {
            cur.next[c-'a']=new Node();
        }
        insert(cur.next[c-'a'],word,index+1); //尾递归
    }

    /** Returns if the word is in the trie. */
    public boolean search(String word) {
        return search(root,word,0);
    }
    
    public boolean search(Node cur,String word,int index){
        if (index==word.length()) {
            return cur.isWord;
        }
        char c=word.charAt(index);
        return cur.next[c-'a']!=null&& search(cur.next[c-'a'],word,index+1);
    }
    /** Returns if there is any word in the trie that starts with the given prefix. */
    public boolean startsWith(String prefix) {
        return startsWith(root,prefix,0);
    }
    
    public boolean startsWith(Node cur,String prefix,int index){
        if (index==prefix.length()) {
            return true;
        }
        char c=prefix.charAt(index);
        return cur.next[c-'a']!=null && startsWith(cur.next[c-'a'],prefix,index+1);
    }
}

/**
 * Your Trie object will be instantiated and called as such:
 * Trie obj = new Trie();
 * obj.insert(word);
 * boolean param_2 = obj.search(word);
 * boolean param_3 = obj.startsWith(prefix);
 */

这里题目说明了所有的输入都是小写字符，所以可以直接使用固定大小的 Node 数组来实现，相比于上面的 TreeMap 时间空间上都会有很大的提升

211. 添加与搜索单词 - 数据结构设计

设计一个支持以下两种操作的数据结构：

void addWord(word)
bool search(word)

search(word) 可以搜索文字或正则表达式字符串，字符串只包含字母 . 或 a-z 。 . 可以表示任何一个字母。

示例：

addWord("bad")
addWord("dad")
addWord("mad")
search("pad") -> false
search("bad") -> true
search(".ad") -> true
search("b..") -> true

说明：

你可以假设所有单词都是由小写字母 a-z 组成的。

解法一

private class Node{
    public boolean isWord; //是否找到了一个单词
    public Node[] next;

    public Node(boolean isWord){
        this.isWord = isWord;
        next = new Node[26];
    }

    public Node(){
        this(false);
    }
}

private Node root;

/** Initialize your data structure here. */
public WordDictionary() {
    root=new Node();
}

/** Adds a word into the data structure. */
public void addWord(String word) {
    addWord(root,word,0);
}

public void addWord(Node cur,String word,int index) {
    if (index == word.length()) {
        cur.isWord=true;
        return;
    }
    char c=word.charAt(index);
    if (cur.next[c-'a']==null) {
        cur.next[c-'a']= new Node();
    }
    addWord(cur.next[c-'a'],word,index+1);
}

/** Returns if the word is in the data structure. A word could contain the dot character '.' to represent any one letter. */
public boolean search(String word) {
    return search(root,word,0);
}

public boolean search(Node cur,String word,int index) {
    if (index == word.length()) {
        return cur.isWord;
    }
    char c=word.charAt(index);
    if (c=='.') {
        for (int i=0;i<cur.next.length;i++) {
            if(cur.next[i]!=null && search(cur.next[i],word,index+1)){
                return true;
            }
        }
        return false;
    }
    return cur.next[c-'a']!=null && search(cur.next[c-'a'],word,index+1);
}

没啥好说的，遇到点就循环判断每个子节点就 ok

677. 键值映射

实现一个 MapSum 类里的两个方法，insert 和 sum。

对于方法 insert，你将得到一对（字符串，整数）的键值对。字符串表示键，整数表示值。如果键已经存在，那么原来的键值对将被替代成新的键值对。

对于方法 sum，你将得到一个表示前缀的字符串，你需要返回所有以该前缀开头的键的值的总和。

示例 1:

输入：insert("apple", 3), 输出：Null
输入：sum("ap"), 输出：3
输入：insert("app", 2), 输出：Null
输入：sum("ap"), 输出：5

解法一

有了前面的铺垫，其实这个也很简单

class MapSum {

    private class Node{
        public boolean isWord; //是否找到了一个单词
        public Node[] next;
        public int value;

        public Node(boolean isWord,int value){
            this.isWord = isWord;
            this.value=value;
            next = new Node[26];
        }

        public Node(int value){
            this(false,value);
        }
    }
    
    private Node root;

    public MapSum() {
        root=new Node(0);
    }
    
    public void insert(String key, int val) {
        insert(root,key,val,0);
    }

    //a p p l e
    public void insert(Node cur,String key, int val,int index) {
        if (index==key.length()) {
            cur.isWord=true;
            cur.value=val;
            return;
        }
        char c=key.charAt(index);
        if (cur.next[c-'a']==null) {
            cur.next[c-'a']=new Node(0);
        }
        insert(cur.next[c-'a'],key,val,index+1);
    }
    
    public int sum(String prefix) {
        return sum(root,prefix,0);
    }

    public int sum(Node cur,String prefix,int index) {
        if (index == prefix.length()) {
            return tireSum(cur);
        }
        char c=prefix.charAt(index);
        if (cur.next[c-'a']==null) {
        	return 0;
        }
        return sum(cur.next[c-'a'],prefix,index+1);
    }

    public int tireSum(Node cur){
    	int sum=0;
    	if (cur.isWord) {
    		sum+=cur.value;
    	}
    	for (int i=0;i<26;i++) {
    		sum+=cur.next[i]!=null?tireSum(cur.next[i]):0;
    	}
    	return sum;
    }
}

还是和上面一样，利用前缀树，不过要在上面的基础上做一些改动添加一个 value 字段用来保存值，先在树中找到要求的前缀的最后一个单词所在的节点，然后就直接求这个节点所有的子节点中 isWord 的单词的 value 累加和就 ok 了，我上面的实现并不好，都是写的递归，而且是尾递归，很鸡肋其实

820. 单词的压缩编码

给定一个单词列表，我们将这个列表编码成一个索引字符串 S 与一个索引列表 A。

例如，如果这个列表是 ["time", "me", "bell"]，我们就可以将其表示为 S = "time#bell#" 和 indexes = [0, 2, 5]。

对于每一个索引，我们可以通过从字符串 S 中索引的位置开始读取字符串，直到 “#” 结束，来恢复我们之前的单词列表。

那么成功对给定单词列表进行编码的最小字符串长度是多少呢？

示例：

输入：words = ["time", "me", "bell"]
输出：10
说明：S = "time#bell#" ， indexes = [0, 2, 5] 。

提示：

1 <= words.length <= 2000
1 <= words[i].length <= 7
每个单词都是小写字母。

解法一

2020.3.28 打卡题，简单写了个 Trie，纠结了半天要不要排序

public int res=0;

private Node root;

public int minimumLengthEncoding(String[] words) {
    root=new Node();
    //纠结了半天要不要排序
    Arrays.sort(words,(a,b)->b.length()-a.length());
    for (int i=0;i<words.length;i++) {
        if(!hasPrefix(words[i])){
            add(words[i]);
            res+=(words[i].length()+1);
        }
    }
    return res;
}

private class Node{
    
    public Node[] next;
    
    public Node(){
        next = new Node[26];
    }
}

//递归的添加
public void add(String word){
    //后缀树
    add(root,word,word.length()-1);
}

public void add(Node cur,String word,int index){
    if (index==-1) {
        return;
    }
    char c=word.charAt(index);
    if (cur.next[c-'a']==null) {
        cur.next[c-'a']=new Node();
    }
    add(cur.next[c-'a'],word,index-1); //尾递归
}

public boolean hasPrefix(String word){
    return hasPrefix(root,word,word.length()-1);
}

public boolean hasPrefix(Node cur,String word,int index){
    if(index==-1){
        return true;
    }
    char c=word.charAt(index);
    return cur.next[c-'a']!=null && hasPrefix(cur.next[c-'a'],word,index-1);
}

解法二

直接逆序后按字典序排序，这样单词的前缀包含关系就排好了，比如 time , me, bell, 最后就会变为 em emit lleb

然后遍历所有单词，如果前面的单词是后面的前缀，就把前面的单词丢掉，否则就需要加上当前字符长度，代码后面有时间再补

目录

字典树初探

字典树

性质

实现

使用 TreeMap 实现

LeetCode 练手例题

208. 实现 Trie （前缀树）

211. 添加与搜索单词 - 数据结构设计

677. 键值映射

820. 单词的压缩编码