Web scraping is a form of data scraping used to extract data from websites. Web scraping software can access the web directly over the Hypertext Transfer Protocol (HTTP) or through a web browser.
I am going to use Vue.js 3 and a few npm packages, so let's get started.
Here is what we need to do to build a web scraper. First, make sure Node.js is installed on your computer. If it is not, download and install it from the official Node.js website.
Next, we need to create a Vue.js project. You can follow the Vue.js guide for details. A demo project is created with this command:
vue create folderName
# In case you do not have the Vue CLI installed, run this command first:
sudo npm install --global @vue/cli@latest
So I create the project with this command:
vue create scraper
# Once the project is created, move into its folder:
cd scraper
# Now start the development server:
npm run serve
You should now have the app up and running (the Vue CLI dev server listens on http://localhost:8080 by default).
Now open "src/App.vue". The welcome message and the Vue logo you see in the browser come from this component. Here we can change the code to show the content that we fetch with the scraper.
Next, I create a function that gets the data from the website. This function holds the web scraping logic. Start from this empty component:
<template>
  <div>
  </div>
</template>

<script>
export default {
  data() {
    return {
      lastestArticles: [], // empty array that will hold the scraped articles
    }
  },
  methods: {
  },
  created() {
  }
}
</script>
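// These imports go at the top of the <script> block.
// Install the packages first with: npm install axios cheerio
import axios from 'axios';
import * as cheerio from 'cheerio';

// Inside export default { ... }:
methods: {
  getWebsiteData() {
    let self = this;
    let url = 'https://www.msn.com/en-au?AR=1'; // URL we get the data from
    let dataArray = []; // scraped items are collected in this array
    // Send a GET request for the page HTML
    axios({
      method: 'get',
      url: url,
    })
    .then(function (response) {
      let html = response.data;
      let $ = cheerio.load(html);
      // Each list item under ul.tertiary is treated as one article
      $("ul.tertiary li").each(function () {
        const title = $(this).find('h3').attr('aria-label');
        const image = $(this).find('img').attr('src');
        // Put the data in the array
        dataArray.push({
          'title': title,
          'image': image
        });
      });
      self.lastestArticles = dataArray; // assign the result to the Vue data property
      console.log(dataArray);
    });
  }
},
created() {
  this.getWebsiteData(); // run the scraper when the component is created
}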
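One thing to watch with the loop in getWebsiteData: cheerio's .attr() returns undefined when an element does not have the requested attribute, so dataArray can end up with entries that have no title or no image. The sketch below is one possible way to tidy the list before assigning it to lastestArticles. The helper name cleanArticles and the sample data are assumptions made for illustration; they are not part of the original code.

```javascript
// Hypothetical helper (not in the original tutorial): tidy the raw
// { title, image } objects collected inside the .each() loop.
function cleanArticles(rawItems, baseUrl) {
  const cleaned = [];
  for (const item of rawItems) {
    // Skip entries where the title attribute was missing or blank
    if (typeof item.title !== 'string' || item.title.trim() === '') {
      continue;
    }
    let image = null;
    if (typeof item.image === 'string' && item.image !== '') {
      try {
        // Resolve relative or protocol-relative paths against the page URL
        image = new URL(item.image, baseUrl).href;
      } catch (err) {
        image = null; // leave out values that cannot be parsed as a URL
      }
    }
    cleaned.push({ title: item.title.trim(), image: image });
  }
  return cleaned;
}

// Made-up sample data standing in for what the scraper might collect
const sample = [
  { title: ' First headline ', image: '//img.example.com/a.jpg' },
  { title: undefined, image: '/b.jpg' },
  { title: 'Second headline', image: undefined },
];
console.log(cleanArticles(sample, 'https://www.msn.com/en-au?AR=1'));
```

Inside the .then() callback this could be used as self.lastestArticles = cleanArticles(dataArray, url); so the template only receives entries that have a title.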
Then update the template to loop over the articles and show each title:
<template>
  <div>
    <h1>Web Scraper</h1>
    <div class="wrapper">
      <div v-for="(article,index) in lastestArticles" :key="index">
        <span v-text="article.title"></span>
        <hr>
      </div>
    </div>
  </div>
</template>
For an explanation, check our video guide.