March 19, 2025


Changing Forecasts for Python Questions on Stack Overflow


[This article was first published on R – Win Vector LLC, and kindly contributed to R-bloggers].

I recently conducted a small time series workshop session for AI+ training hosted by ODSC. It went really well, and I’d be happy to offer longer interactive workshops going forward (please reach out if your team would like one!).

One of the examples I shared was derived from the following graph presented in The Incredible Growth of Python on Stack Overflow.


In this graph the x-axis is the date (by month) and the y-axis is the percent of Stack Overflow question views attributed to each labeled curve. Later releases of the graph appear to track questions asked rather than question views. Everything before September 2017 had already happened when the graph was prepared; everything after that date is a projection into the future.

The question was: how reasonable was this late 2017 prediction? The answer was: quite reasonable. The methodology was good (a seasonal model with LOESS), the prediction duration was reasonable (shorter than the training period), and related future data ended up matching the prediction when it arrived.


In my talk I also mentioned fitting a more radical model, such as the Bass product diffusion model. As I have mentioned before, the Bass model is radical, in that it assumes all things end (and often quite soon, with or without evidence).
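To make the "all things end" property concrete, here is a minimal sketch of the standard closed-form Bass curve. The function names and the parameter values (p = 0.01, q = 0.4) are illustrative assumptions, not values fitted to the Stack Overflow data. The adoption rate starts at p, rises to a single peak, and then declines toward zero no matter what parameters are chosen.

```python
import math

def bass_cumulative(t, p, q):
    """Fraction of eventual adopters reached by time t (F(0) = 0)."""
    e = math.exp(-(p + q) * t)
    return (1.0 - e) / (1.0 + (q / p) * e)

def bass_rate(t, p, q):
    """Adoption rate f(t) = dF/dt = (p + q*F) * (1 - F)."""
    F = bass_cumulative(t, p, q)
    return (p + q * F) * (1.0 - F)

def bass_peak_time(p, q):
    """Time of maximum adoption rate (meaningful when q > p)."""
    return math.log(q / p) / (p + q)

# Illustrative parameter values only; not fitted to any data.
p, q = 0.01, 0.4
t_peak = bass_peak_time(p, q)
for t in (0.0, 5.0, t_peak, 15.0, 30.0):
    print(f"t={t:6.2f}  F={bass_cumulative(t, p, q):.3f}  f={bass_rate(t, p, q):.4f}")
```

The rate column climbs up to t_peak and falls afterward, which is the built-in assumption that makes the model "radical".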

I’ve played with a few fitting methodologies, and settled on using Stan to do a full Bayesian inference on the model parameters from past data. This appears to be less biased than the fit and non-linear transform methods I was using before. The idea, from Sanjiv Ranjan Das, Data Science: Theories, Models, Algorithms, and Analytics, of fitting the relation between events and cumulative events is critical. The various Bass curves are so self-similar that fitting a curve through the time-ordered points does not strongly constrain the inferred parameters.
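As a rough illustration of fitting events against cumulative events, the sketch below uses plain least squares on synthetic data. This is not the Stan-based Bayesian fit described above; it is a simpler stand-in, and every name and parameter value in it is an assumption made for the example. In the discrete Bass recurrence, per-period events are a quadratic in prior cumulative events N: events = p*m + (q - p)*N - (q/m)*N^2, so fitting that quadratic recovers p, q, and the market size m.

```python
import math

def simulate_bass(p, q, m, steps):
    """Per-period events from the discrete Bass recurrence (synthetic data)."""
    events, cumulative = [], 0.0
    for _ in range(steps):
        n = (p + q * cumulative / m) * (m - cumulative)
        events.append(n)
        cumulative += n
    return events

def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, 3):
            factor = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= factor * M[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (M[r][3] - sum(M[r][c] * x[c] for c in range(r + 1, 3))) / M[r][r]
    return x

def fit_bass(events):
    """Least-squares fit of events = a + b*N + c*N^2, N = prior cumulative events."""
    prior, total = [], 0.0
    for n in events:
        prior.append(total)
        total += n
    scale = max(prior) or 1.0  # rescale N to [0, 1] to keep the system well conditioned
    basis = [[1.0, N / scale, (N / scale) ** 2] for N in prior]
    A = [[sum(r[i] * r[j] for r in basis) for j in range(3)] for i in range(3)]
    rhs = [sum(r[i] * y for r, y in zip(basis, events)) for i in range(3)]
    a, b_scaled, c_scaled = solve3(A, rhs)
    b, c = b_scaled / scale, c_scaled / scale ** 2
    # a = p*m, b = q - p, c = -q/m, so m is the positive root of c*m^2 + b*m + a = 0.
    m = (-b - math.sqrt(b * b - 4.0 * a * c)) / (2.0 * c)
    return a / m, -c * m, m  # p, q, m

# Synthetic, noise-free data with made-up parameters.
events = simulate_bass(p=0.01, q=0.4, m=1000.0, steps=25)
p_hat, q_hat, m_hat = fit_bass(events)
print(p_hat, q_hat, m_hat)
```

On noise-free synthetic data the quadratic fit recovers the generating parameters. Real data is noisy, which is one reason a full Bayesian fit can be preferable to this direct inversion.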

This let me prepare the following graph. Each curve is a model fit on data before a given date, so each line is a different prediction. In this case, even the Bass model predicts future growth until observations near the ChatGPT introduction become available.

This isn’t a perfect fit. But it is interesting and evocative how the later models do start to digest the changing trends. One doesn’t have to fully fit a Bass model to know growth can stop. It is enough to know there are common trends (like the Bass model or the S-curve) that look like exponential growth, until they do not.
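A small numeric sketch of that last point, using a logistic S-curve with made-up parameters (the ceiling, rate, and midpoint below are assumptions for illustration only). Well before the midpoint, the logistic curve is nearly indistinguishable from a pure exponential with the same growth rate; near and after the midpoint the two diverge sharply.

```python
import math

def logistic(t, ceiling, rate, midpoint):
    """Logistic S-curve that saturates at `ceiling`."""
    return ceiling / (1.0 + math.exp(-rate * (t - midpoint)))

def early_exponential(t, ceiling, rate, midpoint):
    """Exponential that the logistic curve approaches for t far below the midpoint."""
    return ceiling * math.exp(rate * (t - midpoint))

# Illustrative values only; not fitted to any data.
ceiling, rate, midpoint = 100.0, 0.5, 20.0
ratios = {}
for t in (0, 5, 10, 15, 20, 25):
    s = logistic(t, ceiling, rate, midpoint)
    e = early_exponential(t, ceiling, rate, midpoint)
    ratios[t] = s / e
    print(f"t={t:2d}  logistic={s:8.3f}  exponential={e:9.3f}  ratio={ratios[t]:.3f}")
```

The ratio stays close to 1 over the early periods, so data from that range alone cannot tell the two curves apart.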

R code and data to produce these graphs are here.




