We live in an era in which we generate vast, unprecedented quantities of data. This data needs to be stored and managed both efficiently and effectively, and this is where big data comes into play. In an age when more than 200 million emails are sent every minute, big data has a very important role to play, especially as each step we take in developing advanced systems brings us closer to the era of the Internet of Things.
Understanding this importance, and the need for an effective session for like-minded professionals and enthusiasts, the good people at Virtusa hosted the Colombo Big Data Meetup on the 26th of January 2016 at the Virtusa Auditorium. The meetup became a place for exchanging knowledge, with attendees opening new discussion topics and sharing what they had learned about big data over the years.
The meetup kicked off with a presentation on “Handling Uncertainty in Big Data” by Dinesh Asanka, a Microsoft MVP and Senior Architect at Virtusa.
Dinesh explained, with a few examples, how recorded data can be uncertain. One example from the Sri Lankan context is a real estate system that records the user who purchased a house put up for sale. That user, however, is actually a broker rather than the end customer, yet the system shows the broker as the end customer.
He also pointed out three types of uncertainty:
- Inability to measure (undefined, unknown and null)
- Imprecise values
- Data modified after it has been stored
The rest of the presentation explained how such data can be stored and how uncertainty-type queries can be run against it. These are known as fuzzy queries: queries run against the data in the database where the input parameters are uncertain, or where typical crisp boundaries are not set. Dinesh explained that in big data, uncertainty can enter in various ways, and that to manage the data properly it is important to know where the recorded data and the outcomes contain uncertainty. He also explained how and where this can be implemented in the systems we actually develop and use.
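Dinesh did not present code, but the idea of a fuzzy query can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not his implementation: the `Listing` record, the `affordable` membership function with its 10 and 20 breakpoints, and the `alpha` threshold are all hypothetical. Instead of a crisp filter such as `price <= 15`, each row gets a degree of membership between 0 and 1, and the query returns the rows whose degree meets the threshold, ranked by how well they match.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    """A hypothetical house listing; fields and units are illustrative only."""
    listing_id: str
    price: float  # asking price, in millions (assumed unit)


def affordable(price: float, full: float = 10.0, zero: float = 20.0) -> float:
    """Degree (0.0 to 1.0) to which a price counts as 'affordable'.

    Fully affordable at or below `full`, not affordable at or above `zero`,
    and linearly decreasing in between. The breakpoints are hypothetical.
    """
    if price <= full:
        return 1.0
    if price >= zero:
        return 0.0
    return (zero - price) / (zero - full)


def fuzzy_query(listings, membership, alpha=0.5):
    """Return (listing, degree) pairs whose degree is at least `alpha`,
    ranked from best to worst match, instead of applying a crisp cut-off."""
    scored = [(item, membership(item.price)) for item in listings]
    hits = [(item, degree) for item, degree in scored if degree >= alpha]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


listings = [
    Listing("A", 8.0),
    Listing("B", 12.5),
    Listing("C", 16.0),
    Listing("D", 25.0),
]

for item, degree in fuzzy_query(listings, affordable, alpha=0.5):
    print(item.listing_id, round(degree, 2))
```

With these sample values, listing A (degree 1.0) and listing B (degree 0.75) are returned, while C (degree 0.4) falls just below the threshold. Lowering `alpha` widens the result set gradually rather than at a hard boundary.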
Dinesh then outlined three areas in data warehousing where this can be applied: direct fuzzy, measure as range, and fuzzy categorization. He went on to show the audience how categorization creates uncertainty, taking coconut oil as an example: at times it is considered a cooking ingredient, while at other times it is used as a cosmetic or as an item in religious offerings.
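Fuzzy categorization, the third of those areas, can be illustrated with the coconut oil example. In the hypothetical sketch below, a product belongs to several categories with a degree of membership, and each sale is split across categories in proportion to those degrees. The product names come from the talk's example, but the degrees (0.7, 0.2, 0.1), the `CATEGORY_MEMBERSHIP` table, and the `sales_by_category` helper are invented for illustration; in practice the degrees would have to come from domain knowledge or from data on how customers actually use the product.

```python
from collections import defaultdict

# Hypothetical membership degrees: how strongly each product belongs to each
# category. A crisp model would put coconut oil in exactly one category.
# The degrees for each product sum to 1.0, so a sale is split, not double-counted.
CATEGORY_MEMBERSHIP = {
    "coconut oil": {"cooking": 0.7, "cosmetics": 0.2, "religious offerings": 0.1},
    "rice": {"cooking": 1.0},
    "shampoo": {"cosmetics": 1.0},
}


def sales_by_category(sales):
    """Aggregate (product, amount) sales into categories, weighting each
    amount by the product's membership degree in that category."""
    totals = defaultdict(float)
    for product, amount in sales:
        for category, degree in CATEGORY_MEMBERSHIP[product].items():
            totals[category] += amount * degree
    return dict(totals)


sales = [("coconut oil", 1000.0), ("rice", 500.0), ("shampoo", 300.0)]
print(sales_by_category(sales))
```

Here cooking receives 1,200 (700 from coconut oil plus 500 from rice), cosmetics 500, and religious offerings 100. Because the degrees for each product sum to 1.0, the category totals still add up to the overall sales figure of 1,800, whereas a crisp model would place the full 1,000 of coconut oil under a single category.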
“Technology cannot be the uncertainty factor but you should look from the user’s perspective”
Dinesh also showed that when such data is analyzed, the results can be inaccurate depending on how an item has been categorized. He added that architects tend to consider data from a technical point of view rather than from the end user's perspective, which can create problems. He then stressed how concerned we should be with uncertainty and how important it is to tackle it.
“Many real-world things are uncertain. So, as technical people, when we are modelling we can’t simply know the uncertainty. What we do now is, if it is uncertain, we go and put null. We need to move forward and we need to apply correct theories. The theories are there; you need to adapt to those theories and you need to implement those theories,” said Dinesh, concluding his presentation.
After Dinesh’s presentation on handling uncertainty in big data, the session moved on to its second part: a panel discussion with Dr. Thushari Silva, Vice President of the Sri Lanka Association for Artificial Intelligence; Sriskandarajah Suhothayan, Technical Lead at WSO2; Dinesh Asanka, Senior Architect at Virtusa; and Selvendra Selvaraj, Senior Consultant at Virtusa. The panel discussion was moderated by Dinesh Priyankara, Senior Architect at Virtusa.
Priyankara kicked off the discussion by asking the panel whether projects actually need a big data component. Asanka answered that, as highlighted in the previous presentation, the answer is “fuzzy”. He reminded us that on projects we all like to go the extra mile and try something new on the technical side, but that vendors, advances in computational power, and a genuine need for big data (not just its use as a marketing term) have, over the years, all contributed to its use in projects.
Sriskandarajah added to Asanka’s answer by mentioning that some firms want to implement big data even when there is no real need, simply because it is the current trend. When these firms are presented with the actual costs of a big data solution, with several servers to run services such as Hadoop and Cassandra, they tend to reverse their decision and go with a conventional database solution. Therefore, Sriskandarajah said, the need for big data arises only when a firm has difficulty managing its data, regardless of the firm’s size.
When the floor was opened to the audience, one attendee asked whether we are asking the right questions about big data and using those questions to decide which technologies are needed to address them. Asanka replied that big data is sometimes used to discover things that are unknown; seen that way, the questions that need to be asked are themselves the unknown, but it all depends on the situation.
An important question was also raised about the factors that have to be considered when deciding whether a particular project needs a big data solution. Selvendra answered that the right time to move to a big data solution is when the existing system can no longer handle the increasing data volume.
Sriskandarajah also shared an instance where privacy could have been violated through a large data set. He explained that such a problem can be overcome by reducing the accuracy of the stored data, that is, by anonymizing it. Answering a question from an audience member working in the telecommunications industry, he also stressed that we need to address the privacy concerns that arise when handling big data.
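Sriskandarajah did not name a specific technique, but one common way of reducing the accuracy of stored data is generalization: coarsening quasi-identifiers such as age, location, and time so that individuals blend into groups. The Python sketch below assumes a hypothetical `CallRecord` from a telecom data set, and the band widths (10-year age bands, one decimal place of latitude and longitude, 6-hour time blocks) are illustrative choices rather than recommendations. The hypothetical `smallest_group` helper reports k in the k-anonymity sense, i.e. how many records share the least common combination of values.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CallRecord:
    """A hypothetical telecom record; all fields are illustrative."""
    age: int
    latitude: float
    longitude: float
    hour_of_day: int


def generalize(record: CallRecord) -> tuple:
    """Reduce the accuracy of a record: age to a 10-year band, location to
    one decimal place (about 11 km of latitude), time to a 6-hour block."""
    band_start = (record.age // 10) * 10
    return (
        f"{band_start}-{band_start + 9}",
        round(record.latitude, 1),
        round(record.longitude, 1),
        record.hour_of_day // 6,
    )


def smallest_group(rows) -> int:
    """k in the k-anonymity sense: size of the smallest group of rows
    that share exactly the same values."""
    return min(Counter(rows).values())


records = [
    CallRecord(34, 6.9271, 79.8612, 9),
    CallRecord(37, 6.9123, 79.8731, 11),
    CallRecord(52, 7.2906, 80.6337, 20),
    CallRecord(58, 7.2551, 80.5989, 22),
]

print(smallest_group(records))                            # raw rows -> 1
print(smallest_group([generalize(r) for r in records]))   # generalized -> 2
```

Each raw record is unique (k = 1), so any one of them could point to an individual; after generalization, every combination of values is shared by at least two records (k = 2). Real deployments typically need a much larger k and must also guard against other attacks, so this shows only the basic idea of trading accuracy for privacy.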
In response to a question about big data implementations in Sri Lanka, Dr. Thushari brought up ICTA’s ongoing efforts to use big data to identify patterns in public behavior in order to improve the quality of transportation.
Throughout the session, key points were discussed that gave attendees a new level of understanding of big data and its relevance to us as Sri Lankans. In that sense, we would say the meetup was an immense success. It helped all attendees not only to participate actively and get their doubts cleared but also to broaden their knowledge of big data.
If you were unable to attend the meetup, a video of the entire session is available on the Big Data meetup page for anyone to watch. However, we recommend joining future meetups to take part in discussions with the professionals in person.