I wrote some code that showed monthly stats for one of my websites; the stats tracked how often I updated the site. The line shows monthly updates and the bars show the average for each year...
Quite soon after this I separated the database from the graph. To begin with, I believe, the database parts were removed from the code and an array was passed in instead. The data did not change very often, so one version used cron to generate XML files once a day, and the XML was then used to make the graph. Other versions used JSON. Here is a multi-line version of the graph...
At the same time as making the line graphs, I also made pie charts, bar charts and world heat maps.
Another use for the line graph code was to display sterling against other currencies. There are two Y-axes: CNY uses the red axis on the right, while USD and EUR use the black axis on the left...
Up to this point each graph was similar but different. Things like combining line and bar charts, and changing the axes for different data, made each graph quite specific to its use case.
When I made a WordPress plugin it also used statistics and graphs. This time the WordPress plugin showed a specific metric, but it was general enough to be used on any WordPress blog. The plugin can show a line graph for tags, categories and any custom taxonomy the person has added...
You can make the case that a pre-existing library should be used instead of creating one yourself, but for a WordPress plugin, having no dependencies worked really well.
The WordPress plugin, Post Volume Stats also uses pie charts and bar charts and is still available on the WordPress plugin repo.
Starting to keep the graph functionality separate from the database made it very straightforward to put the code into a composer package.
The shortdark/socket composer package was originally copy-pasted from a project where it was being used, with some slight tweaks to make it work standalone. Because the code was old and had originally been written for a specific task, it had bugs and needed some improvement.
In order to show it working, I've used stock market data. You can select one or more stocks and pass them into the graph in an array. You are able to look at one stock (with 50 day and 200 day moving averages), or you can compare up to 10 stocks on one graph.
You can change the start date of the data as far back as I have the data collected. The end date will always be the previous weekday unless the stock(s) being shown have ceased trading.
Apart from my time, two things cost money in this project: the server and the API. I wanted to make an API call only when I needed to. The data does not change over the weekend, so I do not make any API calls for weekend days. The script then goes through several other checks before we even hit the API. The list of stocks only includes active stocks that have a start date and no end date, and then the script checks that we do not already have the data. So, we only hit the API when we're pretty sure we need to.
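The checks described above can be sketched as follows. The actual project is PHP; this is an illustrative Python sketch, and the field names (`active`, `start_date`, `end_date`) and the `have_data_for` lookup are assumptions, not the project's real schema:

```python
from datetime import date

def needs_api_call(stock, today, have_data_for):
    """Decide whether we actually need to hit the API for this stock.

    `stock` is a dict with hypothetical keys: 'ticker', 'active',
    'start_date', 'end_date'. `have_data_for(ticker, day)` is a callable
    that checks the database for an existing row.
    """
    # Markets are closed at the weekend, so weekend days never need a call.
    if today.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return False
    # Only active stocks with a start date and no end date get updated.
    if not stock["active"] or stock["start_date"] is None or stock["end_date"] is not None:
        return False
    # Finally, skip the call if the data is already in the database.
    if have_data_for(stock["ticker"], today):
        return False
    return True

# Example: an active stock with no stored data yet, on a Tuesday.
stock = {"ticker": "NVDA", "active": True, "start_date": date(2020, 1, 2), "end_date": None}
print(needs_api_call(stock, date(2021, 7, 20), lambda t, d: False))  # True
```

The cheap checks (calendar, stock status) run before the database lookup, so most days cost almost nothing.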
Now that we're calculating the moving averages for the stocks ourselves, we update them whenever we collect the data. Calculating two averages per stock should not create too much load, but even so I do not want to attempt a calculation if it is not needed. Before I attempt to create an average I make some more checks so that I'm not wasting computing power.
An anomaly concerned with getting the data from the API was Reckitt Benckiser changing its name 1. The issue was that it also changed its Tradable Instrument Display Mnemonic ("TIDM"), or ticker, from "RB.L" to "RKT.L". Unfortunately, on the day the name changed, the old ticker stopped working and the new ticker began. The API call identifies the stock by its ticker, so we had a problem. The two tickers are set up as two different stocks with the correct start and end dates. Once we have all the data, the old ticker is made inactive so it will not appear in the list of stocks to update. I'm sure this will happen again, so it's good to know there's a quick fix.
The data is simply taken from the API, and all the graph does is take an array and display it. The hardest part of the project is processing the data into an array that will give us the graph we want. The pains we have taken to call the API only the minimum number of times should mean that the data is perfect, but it may not be. Whatever data we have in the database, we need to give the composer package an array it can make a graph out of, i.e. we have to present the array in the correct format for the graph to read.
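As a rough illustration of that formatting step (in Python rather than the project's PHP, with a made-up row shape and made-up prices), the idea is to pivot flat database rows into one entry per date, each holding a value per line:

```python
# Flat database rows of (date, ticker, close); the prices are made-up example values.
rows = [
    ("2021-07-19", "PYPL", 100.0),
    ("2021-07-19", "SQ", 200.0),
    ("2021-07-20", "PYPL", 110.0),
    ("2021-07-20", "SQ", 210.0),
]

def rows_to_graph_array(rows):
    """Pivot flat rows into one dict per date, newest first, one value per line."""
    by_date = {}
    for day, ticker, close in rows:
        by_date.setdefault(day, {})[ticker] = close
    return [{"date": day, **values} for day, values in sorted(by_date.items(), reverse=True)]

print(rows_to_graph_array(rows)[0])  # newest date first, one key per stock
```

The graph code then only ever sees a uniform array and never touches the database.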
Creating the graph with a composer package ensures the final stage of the process is completely separate. It's in a separate Git repo. Being able to modify both repos mimics a larger website, where different parts of the site may be completely separate from each other.
The graph class itself is pretty simple, and I've tried to break it down into methods that are fairly small and well-named. I also wanted it to be as customizable as possible, so instead of hard-coding anything we are able to change the settings when we call the class. Other than the user defined settings there are some rules concerning the data.
There were four anomalies that arose during developing the package...
I'm going to discuss these anomalies and use some screenshots to show how some of the bugs were fixed.
Most of the issues arose because the code was copy-pasted from old code with a specific use case. It didn't need a huge amount of untangling; there were just a couple of places that mainly needed simplifying.
The NVDA stock had a 4:1 stock split on 2021/07/20 2. Below is how this affected the data we were collecting from the API.
The easy way to fix this was to update all the stock data to match the new post-split data. Then the chart looked less crazy and showed the change in stock price more accurately.
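The adjustment itself is simple arithmetic: for a 4:1 split, every close before the split date is divided by four, so the whole history is in post-split terms. A minimal sketch (Python, with a hypothetical date-keyed price map; the real fix was a database update in the PHP project):

```python
def apply_split(prices, split_date, ratio):
    """Convert pre-split closes to post-split terms.

    prices: dict of ISO date string -> close price.
    ratio: 4 for a 4:1 split, i.e. each old share became 4 new shares,
    so every close strictly before `split_date` is divided by the ratio.
    """
    return {day: (close / ratio if day < split_date else close)
            for day, close in prices.items()}

# Made-up example values: a 400.0 close the day before a 4:1 split
# becomes 100.0, matching the post-split price scale.
adjusted = apply_split({"2021-07-19": 400.0, "2021-07-20": 100.0}, "2021-07-20", 4)
print(adjusted)
```

ISO-formatted date strings sort chronologically, which is why the plain `<` comparison works here.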
The 50-day and 200-day moving average data is missing in the new data. The moving averages came from the API and were not calculated locally.
Calculating the averages on the fly on every page load would be too much work. Calculating them once and storing them in the database, which is what I eventually did, ensures that the graph does not load any slower.
Another change in this version is that I have made the week numbers across the top optional. As no-one is using this composer package but me, I made the default not to show them. Ideally, when making existing functionality optional, the default should preserve the previous behaviour, so that people using the code do not update composer and wonder why something is missing.
I have seen a few different websites all give different values for 50 and 200 day moving averages. I'm not sure if there is one universally-recognised way to calculate them. My way is different to the previous values, but I have seen different versions that are different again.
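For reference, the plainest variant is the simple moving average: the arithmetic mean of the most recent N closes. A sketch in Python (the project is PHP; sites showing different values may be using exponential or differently-windowed averages instead):

```python
def simple_moving_average(closes, window):
    """Simple moving average: mean of the most recent `window` closes.

    `closes` is a list ordered oldest to newest. Returns None until there
    are enough data points, which is why a newly listed stock has no
    200-day average for its first 200 trading days.
    """
    if len(closes) < window:
        return None
    return sum(closes[-window:]) / window

print(simple_moving_average([1, 2, 3, 4, 5], 5))  # 3.0
print(simple_moving_average([1, 2], 5))           # None: not enough data yet
```

Even with this simple definition, choices like whether to use closing prices, adjusted closes, or calendar vs. trading days will produce slightly different numbers, which may explain the disagreement between websites.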
There is an API that gives stock split information. If I trust the API I could deal with stock splits automatically in the future.
While I don't mind hitting an API once a day for a test project like this, I do not want an API to be able to cause this project to do a lot of unnecessary work without my knowledge. The safest thing would be for the API to flag up a possible stock split, which I could then manually approve.
New listings have no data before the IPO date. That just means that the earliest the single graph can show is the IPO date.
The graph composer package simply displays the data you send to it. It does not make decisions on what to do with different data.
When we display the data for a single stock the graph will only show the dates it receives in the array. When we're comparing a stock that has recently been launched to other older stocks, we need to make a decision which data we want to put in the array we send to the graph.
From a previous version of the package we have the Paypal and Square data from the start of the year, so we can compare the two like this...
But, if we want to compare Paypal and Square to the newly public Coinbase, we have a choice to make. Either we show the two lines in full and only begin the third line when it starts trading, or we simply start the graph when all three stocks are trading. I have decided that we have to start at the IPO date or later. This is controlled by the array we give the graph, if we wanted to display different data we change the array.
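One way to build such an array is to trim every series to the newest IPO date before handing it to the graph. A hypothetical Python sketch of that decision (the real processing is PHP, and the data shape here is assumed):

```python
def align_to_latest_ipo(series):
    """Trim every line so all lines start together.

    series: dict of ticker -> {ISO date string: close}.
    The common start is the newest first-trading date among the series,
    so when a recent IPO like Coinbase is included, the older stocks'
    earlier data is dropped from the comparison.
    """
    start = max(min(prices) for prices in series.values())
    return {ticker: {day: close for day, close in prices.items() if day >= start}
            for ticker, prices in series.items()}

# Made-up example values: two older stocks plus one that listed later.
series = {
    "PYPL": {"2021-01-04": 1.0, "2021-04-14": 2.0},
    "SQ":   {"2021-01-04": 3.0, "2021-04-14": 4.0},
    "COIN": {"2021-04-14": 5.0},
}
print(align_to_latest_ipo(series))  # everything before 2021-04-14 is dropped
```

Because the graph just draws whatever array it is given, changing this one function (e.g. to keep the full history and start the new line partway across) changes the chart without touching the package.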
At the time I took these screenshots, the code was able to squash the graph so that the full range of data it received was displayed, but it could not stretch shorter date ranges to fill the whole graph. I have since changed this, but at the time there was a bug that meant the graph lines were crossing the Y-axis.
Originally gaps in the data caused problems in the graph. This was due to the original implementation not expecting any gaps at all. This new composer package was therefore not processing the missing data correctly so the graph was not getting displayed correctly.
We need each line to be continuous, so I filled in the public-holiday dates with the data from the previous working day. This was a simple and cheap solution because it did not need any API calls. Once the public holidays had data, the arrays going into the graphs had no gaps and the graphs looked normal. This was a one-off script.
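The one-off fill can be sketched like this (Python, with an assumed date-keyed price map): walk the expected trading days in order and carry the last known close forward into any missing day:

```python
def fill_gaps(expected_days, prices):
    """Forward-fill missing trading days with the previous working day's close.

    expected_days: the full ordered list of weekdays the graph should show.
    prices: dict of day -> close, with public holidays absent.
    Assumes the first expected day has data, as it did for this one-off fix.
    """
    filled, last_close = {}, None
    for day in expected_days:
        if day in prices:
            last_close = prices[day]
        filled[day] = last_close
    return filled

# Made-up example values: 2021-12-27 is a holiday with no API data,
# so it inherits the close from 2021-12-24.
print(fill_gaps(
    ["2021-12-24", "2021-12-27", "2021-12-28"],
    {"2021-12-24": 1.0, "2021-12-28": 2.0},
))
```

Forward-filling a holiday is safe here because no trading happened, so "the price did not move" is the truthful reading of the gap.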
The next step was to modify the graph code to allow gaps in the graph lines. This means that if the API stops sending data for a period of time, the graph will still display as normally as possible. Now we are able to have graph lines that can be broken (start-stop-start-stop).
The day after Nvidia's stock split, on July 21st, 2021, Salesforce completed the acquisition of Slack 3.
Acquisitions and de-listings are basically the same as new listings, just at the other end of the line. As long as the data going into the graph is correct, it should produce a graph that stops on the final closing date. The value of the stock after acquisition can't be zero, because then on the percentage graph you would get something like this...
So, the solution is to make sure the value after acquisition is null. Then, if the first (most recent) data point is null the graph must know how to deal with this.
In the same way that public holidays were not correctly dealt with if the data was missing, the graph package did not deal with the first data point being null very well either. Changing the data in the array from zero to null, and fixing the graph to deal with null values better meant that the graph looked less crazy...
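One way to handle this, sketched in Python (the package itself is PHP, and this is not its actual implementation), is to split each line's points into runs of consecutive non-null values and draw each run as its own segment, which handles both leading nulls and mid-line gaps:

```python
def split_segments(points):
    """Split a list of values into runs of consecutive non-null points.

    Each returned run can be drawn as a separate line segment, so a
    series can start late, stop early, or break in the middle
    (start-stop-start-stop) without distorting the graph.
    """
    segments, current = [], []
    for value in points:
        if value is None:
            if current:
                segments.append(current)
                current = []
        else:
            current.append(value)
    if current:
        segments.append(current)
    return segments

# A leading null (e.g. a stock that has been acquired, newest point first)
# and a mid-series gap both just shorten or split the drawn line.
print(split_segments([None, 1.0, 2.0, None, None, 3.0]))  # [[1.0, 2.0], [3.0]]
```

With nulls treated this way, a zero is free to mean "the price was zero" rather than "no data", which is what the percentage graph needs.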
Missing data points (null values) are not displayed but all the data points that are supplied get displayed.
My TODO list for this is mainly around calculating the widths of the branding and legend boxes automatically, without the user having to specify them. Other than that, the composer package seems usable at the moment.
Making an SVG with a PHP composer package may not be the best thing to do for all use cases. I've talked about this SVG graph composer package with some history to show where the idea came from.
The issues that arose when first using the package were variously down to the data, the processing of the data, or the graph itself. We can't always control the quality of the data from the API, so by making the data as complete as possible when we collect it, and by understanding what data may or may not exist, we can create a graph that shows what we want to show. Many of the bugs were solved by simplifying and modernizing the code. These changes also make any future changes much easier.