Friday, September 10, 2010

Trie data structure - In C++



Having had a comprehensive coverage of the TRIE data structure in Java, me and my roommate thought it would achieve completion if we have the same implemented in C++. Don't get carried away by the length of the code. Its as simple and easy as the equivalent one in Java. You may go through the comprehensive tutorial here. Trust me, it takes only 10 minutes!!.

Please feel free to ask any questions if you face difficulties in understanding any part of the resource. I would respond to you immediately.

/*
    ASSUMPTIONS:
        All characters are case insensitive
    ENHANCEMENTS:
        Auto correction of word
        Display all words in postorder

    SAMPLE DATA: - ALGO, ALL, ALSO, ASSOC, TREE, TRIE

                                +--> [G] ---+--> [O]
                                |
                    +--> [L] ---+--> [L]
                    |           |
        +--> [A] ---+           +--> [S] ---+--> [O]
        |           |
        |           +--> [S] ---+--> [S] ---+--> [O] ---+--> [C]
[\0] ---+
        |                       +--> [E] ---+--> [E]
        |                       |
        +--> [T] ---+--> [R] ---+
                                |
                                +--> [I] ---+--> [E]

*/

#include <iostream>
#include <cctype>

using namespace std;

class trie
{
    private:
        struct node
        {
            char  character;       // character of the node
            bool  eow;             // indicates a complete word
            int   prefixes;        // indicates how many words have the prefix
            node* edge[26];        // references to all possible sons
        }*root;                        // trie root node

        void preorder_display(node *); // preorder display
        void truncate_node(node *);    // Deletes node and sub-nodes
        void delete_node(node *);      // Deletes node if prefixes is 1

    public:
        trie();                        // constructor
        ~trie();                       // destructor

        void insert_word(char *);      // to insert a word
        void delete_word(char *);      // to delete a word
        bool search_word(char *);      // to search a word

        void display();                // display complete trie
};

trie::trie()
{
    root = new node();
    root->character = '\0';
    root->prefixes = 0;
    root->eow = false;
    for(int i=0;i<26;i++)
    {
        root->edge[i] = NULL;
    }
}

trie::~trie()
{
    truncate_node(root);
}

void trie::truncate_node(node* n)
{
    for(int i=0;i<26;i++)
    {
        if(n->edge[i] != NULL)
        {
            truncate_node(n->edge[i]);
        }
    }
    delete n;
}

void trie::insert_word(char* s)
{
    node *t = root;
    while(*s != '\0')
    {
        int c = toupper(*s) - 'A';
        if(t->edge[c] == NULL)
        {
            node* n = new node();
            n->character = *s;
            n->eow = false;
            n->prefixes = 1;
            for(int i=0;i<26;i++)
            {
                n->edge[i] = NULL;
            }
            t->edge[c] = n;
            t = n;
        }
        else
        {
            t = t->edge[c];
            t->prefixes++;
        }
        *s++;
    }
    t->eow = true;
}

bool trie::search_word(char* s)
{
    node *t = root;
    while(*s != '\0')
    {
        int c = toupper(*s) - 'A';
        if(t->edge[c] == NULL)
        {
            return false;
        }
        else
        {
            t = t->edge[c];
        }
        *s++;
    }
    if(t->eow == true)
        return true;
    else
        return false;
}

void trie::delete_word(char* s)
{
    node* t = root;
    while(*s != '\0')
    {
        int c = toupper(*s) - 'A';
        if(t->edge[c] == NULL)
        {
            return;
        }
        else if(t->edge[c]->prefixes == 1)
        {
            truncate_node(t->edge[c]);
            t->edge[c] = NULL;
            return;
        }
        else
        {
            t = t->edge[c];
        }
        *s++;
    }
}

void trie::display()
{
    preorder_display(root);
}

void trie::preorder_display(node* t)
{
    if(t == NULL)
        return;

    cout << "iterating :: " << t->character << " :: " << t->eow << " :: " << t->prefixes << endl;

    for(int i=0;i<26;i++)
    {
        if(t->edge[i] != NULL)
            preorder_display(t->edge[i]);
    }
} 


Demonstration of trie operations

int main()
{
    trie mytrie;
    char *s[] = {"tree","trie","algo","assoc","all","also","ass"};
    for(int i=0;i<sizeof(s)/sizeof(*s);i++)
    {
        mytrie.insert_word(s[i]);
    }

    mytrie.display();

    if(mytrie.search_word("all") == true) cout << "all exist" << endl;
    else cout << "all do not exist" << endl;

    mytrie.delete_word("all");

    if(mytrie.search_word("all") == true) cout << "all exist" << endl;
    else cout << "all do not exist" << endl;

    mytrie.display();
}

Cheers!!
Jack.

4 comments:

तपेश महेश्वरी said...

I think
'void trie::delete_word(char* s)' is not perfect. it just look for the condition '(t->edge[c]->prefixes == 1' and then delete the node. Actually we can get this situation and possibly the whole word is not validate yet, it means suppose we have a string in data structure say "abcdef" where "e" has "1" as prefixes. Now suppose we pass a string to be deleted as "abcdeM", this code will delete the string from DS without validating that it does not exists at all.

Please let me know if i am wrong any where.

Prabhu Jayaraman said...

Yes, you are very true Tapesh. We need to check for word existence before deleting it from the trie DS. Thanks for bringing this to our notice. We will update the code accordingly.

Kyle said...

Hi, I'm still learning c++ at the moment, and I'm doing a project where a trie would work best.

First of all thanks a lot for posting this resource, I've had so much trouble finding a good trie resource online.

I am having trouble understanding the node struct within the trie class. Specifically the node* member and the root part (line 39-40). My understanding of struct goes as far as using them like classes with typical data types like int, char etc, so I don't understand why it accepts node as a data type. Also a word on the pointers in the program if you can as I'm still fuzzy on usage and implementation.

Thanks a lot!

Chiranjeevi said...

Hey, What happens if I insert same string two times by doing this:
insert("Hello");
insert("Hello");

Above insert() implementation, will simply increase the prefixes value for each character in the string thus making to difficult to delete that string later.

Its better, if we check for the existence of the string in the trie before inserting it again.

Pls correct me if I am wrong.

Your Ad Here