How to easily break text into sentences, lines, and words with JDK classes

I remember that when I was an undergrad, I had trouble splitting long pages of text into sentences. (This was for my final-year project, an NLP (Natural Language Processing) application.) Not realizing that such seemingly trivial functionality was already available in the JDK, I wrote the splitting code by hand.

But recently, while browsing the java.text package, I found this awesome class: java.text.BreakIterator. It goes beyond plain splitting and also takes Locale-specific differences between languages into account when finding these breaks.

It supports identifying four types of boundaries: character, word, line, and sentence.
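
As a rough illustration, each boundary type has its own factory method on BreakIterator. The sketch below counts how many segments each kind of iterator finds in a short sample string. The class name BoundaryTypes, the helper countSegments, and the sample text are made up for this example; only the BreakIterator calls come from the JDK.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical demo class; not part of the JDK.
class BoundaryTypes {

    // Counts the segments between consecutive boundaries reported by the iterator.
    static int countSegments(BreakIterator it, String text) {
        it.setText(text);
        it.first(); // position the iterator at the first boundary (index 0)
        int count = 0;
        while (it.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "Hi there. Bye."; // sample text, chosen only for illustration
        Locale locale = Locale.US;

        System.out.println("characters: "
                + countSegments(BreakIterator.getCharacterInstance(locale), text));
        System.out.println("words:      "
                + countSegments(BreakIterator.getWordInstance(locale), text));
        System.out.println("lines:      "
                + countSegments(BreakIterator.getLineInstance(locale), text));
        System.out.println("sentences:  "
                + countSegments(BreakIterator.getSentenceInstance(locale), text));
    }
}
```

Note that the word iterator also reports spaces and punctuation as segments of their own, so the word count here is higher than the number of actual words.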

Methods in this class, such as next() and previous(), make it feel like an Iterator, but they return an int giving the position of a boundary rather than the item itself. There are some neat code samples in the documentation page. Do try it out!
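
Here is a minimal sketch of how those int positions can be used, assuming English text and Locale.US. The class SentenceSplitter and its two helper methods are hypothetical names for this post. The first helper walks forward with next() and cuts the text between consecutive boundaries; the second walks backward with previous() just to show that the same positions come back in reverse.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical demo class; not part of the JDK.
class SentenceSplitter {

    // Walks forward with next() and slices the text between boundary positions.
    static List<String> splitSentences(String text, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        List<String> sentences = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            // A sentence segment includes its trailing whitespace, so trim it.
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    // Walks backward with previous(), collecting the raw boundary positions.
    static List<Integer> boundariesBackward(String text, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        List<Integer> positions = new ArrayList<>();
        for (int pos = it.last(); pos != BreakIterator.DONE; pos = it.previous()) {
            positions.add(pos);
        }
        return positions;
    }

    public static void main(String[] args) {
        String text = "Hello world. How are you? Fine."; // sample text for illustration
        for (String sentence : splitSentences(text, Locale.US)) {
            System.out.println(sentence);
        }
        System.out.println(boundariesBackward(text, Locale.US));
    }
}
```

Swapping getSentenceInstance for getWordInstance or getLineInstance gives the same loop for words or line-break opportunities. Both loops end when the iterator returns BreakIterator.DONE.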

Written by Team ReadMe
