To say that storing every webpage on the internet, along with books, movies, audio and software, digitally and making it all available to everyone is a challenging task would be an understatement. Yet that is what the San Francisco-based non-profit the Internet Archive has been doing since it was founded. And that, for the record, was in 1996 by Brewster Kahle, who had sold WAIS Inc. to AOL in 1995 and would go on to sell Alexa Internet (yup, that Alexa) to Amazon in 1999.
Head over to www.archive.org and you can access this library. When it was first founded, it stored only a few webpages. Today it holds 1,876,584 movies, 2,310,628 audio recordings and 7,481,674 texts from various books. It also has one of the greatest collections of classic software on the planet. Impressive as those numbers are, they all pale in comparison to the famous Wayback Machine.
In case you’re lost: the Wayback Machine is the Internet Archive’s initiative to save webpages and, ultimately, archive the ENTIRE internet from 1996 onwards. In other words, the Wayback Machine is an (awesome) internet time machine.
At the time of writing, the Wayback Machine has 452 BILLION webpages saved. Want to see Microsoft.com in its original 1996 form? No problem! Want to know what Google looked like in August 2003? Here’s your answer!
Why is it doing this? Because its mission, they say, is to build the greatest library on Earth.
To use the Wayback Machine, simply head over to www.archive.org/web, enter the website you want to see in the search bar and press enter.
You should then be greeted with a calendar like the one shown below. Click one of the dates highlighted with a blue circle to view a snapshot of the website as it appeared on that particular day. To go further back in time, click a year in the black bar at the top.
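Alongside the calendar view, the Wayback Machine exposes a public availability API at archive.org/wayback/available that answers with the snapshot closest to a given date. A minimal sketch of building such a query and reading the shape of the JSON it returns (the JSON below is a hand-written sample of the response format, not a live response):

```python
import json
from urllib.parse import urlencode

# Build a query against the Wayback Machine's public availability API.
# Passing a timestamp (YYYYMMDD) asks for the snapshot closest to that date.
def availability_url(site, timestamp=None):
    params = {"url": site}
    if timestamp:
        params["timestamp"] = timestamp
    return "https://archive.org/wayback/available?" + urlencode(params)

print(availability_url("microsoft.com", "19961101"))

# A sample of the JSON shape the API answers with (illustrative, not fetched):
sample = json.loads("""{
  "archived_snapshots": {
    "closest": {
      "available": true,
      "url": "http://web.archive.org/web/19961101000000/http://microsoft.com/",
      "timestamp": "19961101000000"
    }
  }
}""")
closest = sample["archived_snapshots"]["closest"]
print(closest["url"])
```

Fetching the first URL in a browser (or with any HTTP client) returns JSON like the sample, from which the snapshot URL can be pulled.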
Books, videos, audio recordings and software are digitized and added to the library manually by the Internet Archive. When it comes to collecting web pages for the Wayback Machine, things are different: while anyone can upload webpages, most of the work is done by web crawlers.
Web crawlers are automated bots. They visit a link, save the resulting web page and the content on it, then repeat the process for every other link on that page. Once a website has been saved, the crawlers revisit it anywhere from a few weeks to a few months later to grab an updated version. Simple as this process is, it can still take anywhere between 6 and 14 months after a crawler’s visit before a website appears on the Wayback Machine.
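The crawl-save-follow loop described above is essentially a breadth-first traversal of the link graph. A minimal sketch, using an invented in-memory "web" in place of real HTTP fetches (the site names and links are made up for illustration):

```python
from collections import deque

# A toy web: each URL maps to (page content, outgoing links).
# These pages and links are invented purely for illustration.
WEB = {
    "a.com":      ("home page", ["a.com/news", "b.com"]),
    "a.com/news": ("news page", ["a.com"]),
    "b.com":      ("other site", []),
}

def crawl(start):
    """Breadth-first crawl: save each page, then queue every link on it."""
    archive = {}                  # url -> saved snapshot of the page
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if url in archive or url not in WEB:
            continue              # already saved, or unreachable
        content, links = WEB[url]
        archive[url] = content    # "save the resulting web page"
        queue.extend(links)       # repeat the process for every link
    return archive

snapshot = crawl("a.com")
print(sorted(snapshot))
```

The `archive` dict plays the role of a saved snapshot set; a real crawler would also record a timestamp per page and schedule the revisit weeks or months later.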
There are requirements, though. A crawler will only archive a website if it is listed in the Alexa Rankings, is not password protected and its owners have not used the robots exclusion standard. Even if a website meets these requirements, certain content on it may still not be archived, for various reasons – files exceeding the 10MB limit, or publishers simply restricting access. Which is why any website archived on the Wayback Machine is considered a snapshot.
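The robots exclusion standard mentioned above is just a plain-text robots.txt file on the site, and Python’s standard library can evaluate one. A small sketch, with made-up rules that block one directory (ia_archiver has historically been the user agent associated with the Archive’s crawling, though any user-agent string works here):

```python
from urllib import robotparser

# A made-up robots.txt: block everything under /private/, allow the rest.
rules = [
    "User-agent: *",
    "Disallow: /private/",
]

rp = robotparser.RobotFileParser()
rp.parse(rules)

# A well-behaved crawler checks before saving a page.
print(rp.can_fetch("ia_archiver", "https://example.com/index.html"))   # allowed
print(rp.can_fetch("ia_archiver", "https://example.com/private/x"))    # blocked
```

A site owner who wants nothing archived would simply use `Disallow: /`, which is why a robots.txt can keep a whole website out of the Wayback Machine.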
So how much space does the Wayback Machine need? 9.6 petabytes, as of December 1st 2014. And as the internet keeps growing at its rapid pace, so does the Wayback Machine’s archive – currently by approximately 20TB each WEEK. That’s like downloading TWENTY THOUSAND 1080p movies every week!
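The movie comparison is easy to sanity-check, assuming roughly 1 GB per 1080p movie (an assumption, not a figure from the article):

```python
# Back-of-envelope check of the "twenty thousand movies a week" comparison.
growth_tb_per_week = 20     # weekly archive growth, per the article
gb_per_tb = 1000            # decimal units, as storage is usually counted
movie_size_gb = 1           # assumed average size of a 1080p movie

movies_per_week = growth_tb_per_week * gb_per_tb // movie_size_gb
print(movies_per_week)      # 20000

# Total archive size as of December 2014, in terabytes:
total_tb = 9.6 * 1000       # 9.6 PB
print(total_tb)             # 9600.0
```

At that rate, the weekly growth alone is about 0.2% of the entire 2014 archive.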
All this data is stored on specially designed servers called PetaBoxes (pictured above), each holding 1 petabyte, across 4 data centres. One is inside the Internet Archive headquarters in San Francisco itself. Two more are located in Redwood City and Richmond. The fourth is at the modern-day Library of Alexandria, acting as a backup to ensure that humanity never loses the Internet Archive’s library the way the original Library of Alexandria was lost.
It’s probably safe to say that archiving the internet doesn’t come cheap. Even as a non-profit, the Internet Archive still needs money for everything it does. According to Wikipedia, it has an annual budget of $10 million. So where does the money come from? Like any good library, from a variety of sources.
Despite its ambitious goals, the Internet Archive’s business model seems to be very simple.
The average Joe may never use the Wayback Machine, except maybe once or twice to satisfy his curiosity about what his favourite websites looked like back in the day. But the average Joe was never the target market to begin with! The main users of the Wayback Machine, and of the Internet Archive in general, are researchers, historians and scholars.
Furthermore, the Wayback Machine is just like any other museum or library preserving our history. Take one look at the modern era and you’ll find that much of our culture and our records of important events are stored digitally somewhere on the internet. That doesn’t mean they’ll be there forever – a webpage lasts only 77 days on average. The Wayback Machine is the keeper of modern history: history that future generations can learn from so that they don’t repeat our mistakes. Especially the design of, say, the Microsoft website back in the day.
Subscribe to our mailing list and get interesting stuff and updates to your email inbox.
Workshop Agenda –
The main purpose of the workshop is to give students the ability to analyze and present data by using Azure Machine Learning, and to provide an introduction to the use of machine learning and big data.
Module 1: Introduction to Machine Learning
This module introduces machine learning and discusses how algorithms and languages are used.
· What is machine learning?
· Introduction to machine learning algorithms
· Introduction to machine learning languages
Module 2: Introduction to Azure Machine Learning
This module describes the purpose of Azure Machine Learning and lists the main features of Azure Machine Learning Studio.
· Azure Machine Learning overview
· Introduction to Azure Machine Learning Studio
· Developing and hosting Azure Machine Learning applications
Module 3: Managing Datasets
At the end of this module, the student will be able to explore various types of data in Azure Machine Learning.
· Categorizing your data
· Importing data into Azure Machine Learning
· Exploring and transforming data in Azure Machine Learning
Module 4: Building Azure Machine Learning Models
This module describes how to use regression algorithms and neural networks with Azure Machine Learning.
· Azure Machine Learning workflows
· Using regression algorithms
· Using neural networks
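To give a flavour of what Module 4's regression step involves, here is a plain-Python ordinary-least-squares fit of a straight line. Azure ML Studio does this through drag-and-drop modules rather than code, and the data points below are invented for illustration (they lie on y = 2x + 1):

```python
# Fit y = a*x + b by ordinary least squares, standard library only.
# The (x, y) points are invented illustration data on the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Slope: covariance(x, y) / variance(x); intercept from the means.
a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
    / sum((x - mean_x) ** 2 for x in xs)
b = mean_y - a * mean_x

print(round(a, 6), round(b, 6))

def predict(x):
    """Score a new data point with the fitted model."""
    return a * x + b

print(predict(10))
```

Training recovers a = 2 and b = 1, and `predict` then plays the role of the trained model being used to score new data, which is what later modules build on.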
Module 5: Using Azure Machine Learning Models
This module explores how to provide end users with Azure Machine Learning services, and how to share data generated from Azure Machine Learning models.
· Deploying and publishing models
· Consuming Experiments
Module 6: Using Cognitive Services
This module introduces the Cognitive Services APIs for text and image processing to create a recommendation application, and describes the use of neural networks with Azure Machine Learning.
· Cognitive services overview
· Processing language
· Processing images and video
· Recommending products
Register URL –
FB Event page-
(Tuesday) 12:00 am - 11:59 pm
Blue Chip Training – 0716092918
Discover new dimensions in connecting the Internet of Things with Narrowband IoT technology at the NB-IoT Forum and Hackathon, organized by Mobitel.
Date – 23rd March 2018 at Trace Expert City – Colombo 10.
Entrance – Free for a limited number of participants.
Register now – https://goo.gl/3cRdHJ
(Friday) 9:00 am - 5:00 pm
Trace Expert City
Maradana Rd, Colombo
Tech Coders V1.0 will be an online 12-hour problem solving competition. During this 12-hour period your problem solving skills will be put to the test through a series of questions.
Competition will be conducted on HackerRank.
*Please note that you will be given access to the contest on HackerRank only if you fill this form on or before 11.59 pm on 22nd March (Thursday).
Organized by: Tech Seekers – Sri Lankan Community
24 (Saturday) 8:00 pm - 25 (Sunday) 8:00 am
One-on-one talk, Q&A and networking session with Manju Nishshanka, Founder and CEO, KRMG Capital.
Mr. Nishshanka is a serial entrepreneur with extensive experience in financial markets and disruptive technologies. He has founded and invested in several successful startups in the fintech, blockchain, AI, AR & VR and social media sectors.
He is a keynote speaker at the Digital Asset Investment Forum (DAIF 2018) and also serves on the board of the NYU Stern Blockchain and Digital Asset Forum.
KRMG Capital is an investment and advisory firm focused on early- and growth-stage startups and digital assets.
University of Sri Jayawardenepura, Sri Lanka partnered with KRMG Capital to establish the first-ever blockchain laboratory in Sri Lanka.
This will be an eye-opener to the vast potential and opportunities in blockchain, digital assets and cryptonomics.
All are welcome to join the event and the networking session.
(Wednesday) 6:30 pm - 8:30 pm
Lakshman Kadirgamar Institute
Horton Place, Colombo 00700
Chandimal Alahakoon – 077 22 44 905