Issa.Abdullah

Creating an API that scrapes stock prices

Unlike the major exchanges in the UK or the US, there are no accessible financial data APIs like Alpha Vantage that provide data on Malaysian companies for retail investors like myself. The closest alternative is EOD Historical Data, but their limit of 20 API calls per day and the fact that (at the time of writing) they only offer end-of-day (EOD) quotes made their service impractical for my use case. As I only require delayed intraday quotes to monitor my portfolio's performance, I could not justify the cost of subscribing. Hence, I decided to create an API that scrapes the popular i3investor website for essential financial data such as the company name, daily change and current stock price.

Requirements

I wanted an API that returns financial data for a given stock symbol so that requests could be sent to a URI like:

/[symbol or stock id]

Luckily enough, i3investor shows stock information by the stock number, in the form:

klse.i3investor.com/servlets/stk/[stock number].jsp

This makes scraping straightforward: because the URL structure is predictable, a parsing library like cheerio can load each page and extract the data we're interested in.

Overview

The scraper API is built on NodeJS using ExpressJS, which allows us to write the program using the model-view-controller design pattern and create an API endpoint effortlessly. Axios and CheerioJS are then used to load the webpage programmatically and to parse it for the information we're interested in, respectively. Dynamic routes carrying the stock number or symbol are handled with Express route parameters, and the body-parser middleware is registered to parse incoming request bodies. The file structure is shown below.

/
|- routes/
|   |- index.js
|
|- controllers/
|   |- stQuoteByID.js
|
|- server.js
|- package.json
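
The project depends on a handful of npm packages. A minimal dependencies block in package.json might look like the following (the version ranges are illustrative, not taken from the repository):

```json
{
  "dependencies": {
    "axios": "^0.21.0",
    "body-parser": "^1.19.0",
    "cheerio": "^1.0.0-rc.3",
    "dotenv": "^8.2.0",
    "express": "^4.17.0"
  }
}
```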

Setting up the server

To set up the server, we simply need to import the ExpressJS library and specify a port. We will also register the body-parser middleware before the routes, so it can process request bodies on incoming requests.

require('dotenv').config()
const express = require('express')
const bodyParser = require('body-parser')

const app = express()
const PORT = process.env.PORT || 3000

// Middleware must be registered before the routes that rely on it
app.use(bodyParser.urlencoded({ extended: false }))

// Routers
app.use('/', require('./routes/index'))

app.listen(PORT, () => console.log(`Server running on port ${PORT}`))

Scraping for data

First, an asynchronous function, stQuoteByID, is defined in the controller file, taking the stock number as a parameter so that it can be called from the routes file later on. Then, we use axios to send a GET request and load the webpage.

const axios = require('axios')
const cheerio = require('cheerio')

async function stQuoteByID(stockNumber) {
    const URI = 'https://klse.i3investor.com/servlets/stk/' + stockNumber + '.jsp'
    const response = await axios.get(URI)

    ...

}

module.exports = stQuoteByID

Next, we use cheerio to parse the source code and obtain the information we need. This is done in a manner similar to how you would select an element on a page using jQuery.

const $ = cheerio.load(response.data)
const stockPrice  = $('table#stockhdr > tbody > tr:last-child > td:first-child').text().trim()
const stockName   = $('#content > table:nth-child(2) > tbody > tr > td:nth-child(1) > div.margint10 > table:nth-child(2) > tbody > tr > td:nth-child(1) > span').text().trim()
const companyName = $("#content > table:nth-child(2) > tbody > tr > td:nth-child(1) > div.margint10 > table:nth-child(2) > tbody > tr > td:nth-child(3) > span").text().trim()
const dailyChange = $("#stockhdr > tbody > tr:nth-child(2) > td:nth-child(2) > span").text().trim().split(" ")

The location of the data we'd like can be found easily by using the browser's inspect-element feature and copying the selector. We then use trim() to remove any surrounding whitespace. If you are attempting to replicate this, you might want to use regular expressions instead, but in this instance the data obtained required very little sanitisation and trim() sufficed.
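For heavier cleanup, a regex-based approach might look like the sketch below. The sample cell contents and the pattern are assumptions for illustration, not taken from i3investor's actual markup:

```javascript
// Illustrative: pull a decimal price out of a messy table cell
const rawCell = '\n  1.55  '
const match = rawCell.match(/\d+(?:\.\d+)?/)
const price = match ? match[0] : null  // "1.55"
```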

Finally, we return the data as an Object.

const dateRetrieved = new Date()

const output = {
    dateRetrieved: dateRetrieved.toISOString(),
    companyName: companyName,
    name: stockName.split(':')[1].trim().split(' ')[0],
    ticker: stockName.split(':')[1].trim().split(' ')[1].slice(1, -1),
    stockPrice: stockPrice,
    change: {
        amount: dailyChange[0],
        percentage: dailyChange[0].charAt(0) + dailyChange[1].slice(1, -1)
    }
}

return output
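The slicing above assumes the scraped strings have a particular shape. The snippet below walks through that logic with hypothetical inputs; the real values depend on i3investor's markup, and the "MAYBANK (1155)" sample is only an illustration:

```javascript
// Hypothetical inputs shaped like the scraped strings
// (real values depend on i3investor's markup)
const stockName = 'Stock: MAYBANK (1155)'
const dailyChange = '-0.05 (3.23%)'.trim().split(' ')

const name = stockName.split(':')[1].trim().split(' ')[0]                 // "MAYBANK"
const ticker = stockName.split(':')[1].trim().split(' ')[1].slice(1, -1)  // "1155"
const change = {
    amount: dailyChange[0],                                               // "-0.05"
    // the percentage in parentheses is unsigned, so prepend the sign from the amount
    percentage: dailyChange[0].charAt(0) + dailyChange[1].slice(1, -1)    // "-3.23%"
}
```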

Setting up API routes

The API contains two routes: one to accept requests sent with the stock number, and another that accepts a company's stock symbol (e.g. AAPL). This can be done easily in the /routes/index.js file using the Router object. In the file, we can define the route /id/[stock id] to parse the stock ID and scrape the corresponding stock on i3investor.

const express = require('express')
const router = express.Router()

// Import the controller defined earlier
const stQuoteByID = require('../controllers/stQuoteByID')

// Get quotes by ID
router.get('/id/:stid', async (req, res) => {
    try {
        const data = await stQuoteByID(req.params.stid)
        res.json(data)

    } catch (err) {
        res.status(500).send(`Internal Server Error: ${err}`)
    }
})

As you can see, whenever a request is sent to the endpoint we call the asynchronous stQuoteByID function we defined earlier and simply return the data as a response. The same is done for the stock symbol route, shown below.

// Get quotes by symbol
router.get('/s/:stsym', async (req, res) => {
    try {
        const data = await stQuoteBySym(req.params.stsym)
        res.json(data)
    } catch (err) {
        res.status(500).send(`Internal Server Error: ${err}`)
    }
})

module.exports = router
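The stQuoteBySym controller isn't shown here, but one way it might work is to resolve the symbol to its i3investor stock number and delegate to the by-ID scraper. The sketch below assumes a small lookup table; the table entries and function names are illustrative, and the repository's actual implementation may differ:

```javascript
// Sketch: resolve a stock symbol to its stock number before delegating
// to the by-ID scraper. Table entries are illustrative placeholders.
const symbolToNumber = {
    MAYBANK: '1155',
    TENAGA: '5347'
}

function resolveStockNumber(symbol) {
    const stockNumber = symbolToNumber[symbol.toUpperCase()]
    if (!stockNumber) throw new Error(`Unknown symbol: ${symbol}`)
    return stockNumber
}

// stQuoteBySym could then simply be:
// const stQuoteBySym = (symbol) => stQuoteByID(resolveStockNumber(symbol))
```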

Running the API

Now that the core functionality is complete, we can run the server by adding the following start scripts to the package.json file.

...,
"scripts": {
    "start": "node server.js",
    "dev": "nodemon server.js"
},
...

To run the server, simply use the command npm run start or yarn start.

Source Code

This article presents a summary of how to write a scraper and expose it through an API endpoint. If you'd like to see the full source code, or to modify and use this scraper yourself, feel free to clone the project from the GitHub repository.