Today was ... interesting. If you've followed me over the past months on the shitbird site, you might have seen a bunch of angry German words, lots of graphs, and the occasional newspaper, radio, or TV snippet with yours truly. Let me explain.
In Austria, inflation is way above the EU average. There's no end in sight. This is especially true for basic needs like energy and food.
Our government stated in May that they'd build a food price database together with the big grocery chains. But...
the responsible minister claimed it's an immense task and will take until autumn. It will only include 16 product categories (think flour, milk, etc.). And it will only be updated once a week.
Given how Austria works, some corp close to the minister would have gotten the contract for a million or two to create a POS just good enough so the minister can say "look, I did something!"
Well. I heard that and built a prototype for all products of the two biggest chains in 2 hours. The media picked it up...
Here's a selection of media coverage of the entire thing.
https://heisse-preise.io/media.html
It spread like wildfire and made the minister look like an idiot.
I took the thing down in fear of retaliation by the grocery chains. My plan: get a big NGO, news outlet or political party to host the thing and be a legal shield for the endeavour.
Almost every NGO, media outlet and political party got in contact with me (not the other way around). There were lots of promises and big words but zero action.
All these orgs only had their self-interest in mind. After two weeks of this bullshit, I figured I might as well gamble and put this thing up in my own name.
Surely the grocery chains won't sue me. The bad PR would easily outweigh whatever little income loss they'd suffer from a few hundred people using the site to find the cheapest product.
You see, I'm basically just crawling the stores' online shops. Most of them have an API. I then normalize the data across the stores, and expose it.
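To make the "normalize the data across the stores" step concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the store names, the raw field names (`sku`, `priceCents`, `productId`, ...) and the `Item` schema are hypothetical and are not taken from any real store API or from the heissepreise code.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical common schema shared by all stores."""
    store: str
    id: str
    name: str
    price: float  # euros
    unit: str


def normalize_store_a(raw: dict) -> Item:
    # Assumed shape for a made-up "storeA": price as integer cents.
    return Item(
        store="storeA",
        id=str(raw["sku"]),
        name=raw["title"].strip(),
        price=raw["priceCents"] / 100,
        unit=raw.get("unit", "pc"),
    )


def normalize_store_b(raw: dict) -> Item:
    # Assumed shape for a made-up "storeB": price as a string with a decimal comma.
    return Item(
        store="storeB",
        id=str(raw["productId"]),
        name=raw["name"].strip(),
        price=float(raw["price"].replace(",", ".")),
        unit=raw.get("packageUnit", "pc"),
    )


# One normalizer per store; adding a store means adding one function here.
NORMALIZERS = {"storeA": normalize_store_a, "storeB": normalize_store_b}


def normalize_all(raw_by_store: dict) -> list:
    """Turn {store: [raw records]} into one flat list of Items."""
    items = []
    for store, raws in raw_by_store.items():
        items.extend(NORMALIZERS[store](raw) for raw in raws)
    return items
```

The point of the per-store function is that all the store-specific weirdness stays in one place, and everything downstream only ever sees `Item`.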
The whole thing runs client-side. The server fetches the latest data from the stores once a day. All data fits into 5 MB of gzipped JSON. Small enough for the client to do anything with it. The server just serves 8 static files. It can serve all of Austria easily and could be scaled trivially. It's just static files.
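A rough Python sketch of that idea, assuming a made-up record layout: pack the normalized items into one gzipped JSON blob that can be served as a static file, then do the searching entirely on whatever side unpacks it. The function names are hypothetical, and the real site does the client part in the browser, not in Python.

```python
import gzip
import json


def pack(items: list) -> bytes:
    """Serialize items to compact JSON and gzip the result."""
    payload = json.dumps(items, separators=(",", ":"), ensure_ascii=False)
    return gzip.compress(payload.encode("utf-8"))


def unpack(blob: bytes) -> list:
    """Reverse of pack(): gunzip and parse the JSON."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))


def search(items: list, query: str) -> list:
    """Naive in-memory search: every query word must occur in the name."""
    words = query.lower().split()
    return [it for it in items if all(w in it["name"].lower() for w in words)]
```

Product data is very repetitive (same store names, same keys, similar product names), which is why it tends to compress well and why a single static download is feasible here.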
Being the idiot I am, I also made it open-source:
https://github.com/badlogic/heissepreise
And as usual, people flocked to it and contributed. In no time we had all stores in Austria in there.
Then we also got German and Slovenian stores. Then we normalized product categories across stores and added some light data science techniques to match the same or similar products across stores to make prices more easily comparable. You know, iterative improvements.
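The post doesn't say which techniques were used for matching products across stores. As one illustrative stand-in (an assumption, not the project's actual method), here is a small Python sketch that compares product names by token overlap (Jaccard similarity) and picks the best candidate above a threshold. The names and the threshold of 0.5 are made up.

```python
import re


def tokens(name: str) -> set:
    """Lowercase the name and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9äöüß]+", name.lower()))


def jaccard(a: str, b: str) -> float:
    """Shared tokens divided by all tokens; 0.0 if either name is empty."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def best_match(name: str, candidates: list, threshold: float = 0.5):
    """Return the most similar candidate name, or None if nothing is close enough."""
    scored = [(jaccard(name, c), c) for c in candidates]
    score, match = max(scored, default=(0.0, None))
    return match if score >= threshold else None
```

Something this simple already pairs "Vollmilch 3,5% 1l" with "Bio Vollmilch 3,5% 1l", which also shows its limits: whether an organic and a regular product count as "the same" is a judgment call the threshold alone can't make.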
And then some anonymous guy on Twitter sent me the data he crawled for the two biggest chains. Starting in 2017. And that's when things really got interesting...
@badlogic Rookie question: in the countries where the data has been scraped to date, is the data on the grocers' websites behind a login, or is it just available?
Here you have to log in to even see the products and the prices.
@sabrinadent they aren't behind a login. In Austria, most have a REST API to handle product searches on the store website. That's where we get our data from. They expose their entire online product line-up, which matches local stores for the most part.
@badlogic No APIs for any grocer in Ireland and no online shopping for Lidl or Aldi in Ireland. So even if we scraped data (slowly and quietly, obviously) the dataset would not be of value without those retailers.
Fucksocks.
Anyway, kudos on your heroic endeavours and outstanding results.