Friday, April 2, 2010

TRIE data structure Part 1: The TRIE ADT in Java



TRIE is an interesting data-structure used mainly for manipulating with Words in a language. This word is got from the word retrieve. TRIE (pronounced as 'try') has a wide variety of applications in
  • Spell checking
  • Data compression
  • Computational biology
  • Routing table for IP addresses
  • Storing/Querying XML documents etc.,
We shall see how to construct a basic TRIE data structure in Java.

The main abstract methods of the TRIE ADT are,
public void insert(String s);
public boolean search(String s);

In this data-structure, each node holds a character instead of a String. Each node has something called as 'marker' to mark the end of a word. And each node has a Collection of child nodes. This Collection can be either a Set or a List based on the speed vs space criterion.

The basic element - Node of a TRIE data structure looks like this,
char content;
boolean marker;
Collection<Node> child;

A TRIE tree would typically look like the following


The above TRIE is constructed by inserting the words ball, bat, doll, dork, do, dorm, send, sense. The markers are denoted on the Node using a red star(*). We shall look into more about the TRIE data structure in depth in the next post.

6 comments:

sony said...

this might be a stupid question but...is there any method to have index for a node... i have to return the file index as soon as the word is found ..

BragBoy said...

@sony: Can you be a bit elaborative on what do you mean by index here?

John Ortiz said...

It is so clear. Thanks for this information.

BragBoy said...

@John: Glad to see it helped you ! Thank you for stopping by!

Jenish Shah said...

Hi,

Can you please let me know why would we need to mark if node is end of word or not. Node not having any child anyways means that it is end of word.

Please correct me if I am wrong.

Thanks,
Jenish

BragBoy said...

@Jenish,

First up, thanks for reading my post!

For your question, assume if we don't have a marker to mark the end of the word. Then you will never be able to find words like BE, SEE etc., because BE can be within BEst, BEnt, BEt. All these have the same parent route. Same case for SEE. SEEk, SEEm, SEEn etc.,

Hope this helps, if you still have doubts let me know, i will try to clarify them!