FOIMonkey

@foimonkey

Recovering FOI enthusiast and polyglot with a developing machine learning habit.

16 Followers · 6 Following · 40 Posts · Joined 23.11.2024

Latest posts by FOIMonkey @foimonkey

I'm working on a project right now where I want to pull content from authority disclosure logs. I've combined a detection script for these failed redactions with one that finds hidden data in spreadsheets, which should hopefully reduce the risk of inadvertently spreading breach data.

26.02.2026 10:38 👍 0 🔁 0 💬 1 📌 0

This type of failed redaction is detectable programmatically. Every PDF is basically a list of drawing instructions with a sequence number. This means you can detect text with a black box on top or a black box with dark text on top. If it is ambiguous (1% of cases) you can compare OCR output against the underlying text.

26.02.2026 10:31 👍 0 🔁 0 💬 1 📌 0
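
A minimal sketch of the draw-order check described above, in Python. It assumes the page's drawing instructions have already been parsed into simple records; the `DrawOp` record, its fields, and the verdict labels are hypothetical names for illustration, not the actual script or any PDF library's API.

```python
from dataclasses import dataclass


@dataclass
class DrawOp:
    """One drawing instruction (hypothetical, simplified record)."""
    seq: int            # position in the drawing order; higher = painted later
    kind: str           # "text" or "rect"
    bbox: tuple         # (x0, y0, x1, y1)
    dark: bool = True   # whether the fill or text colour is dark
    text: str = ""


def overlaps(a, b):
    """True if two (x0, y0, x1, y1) boxes intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def classify_redactions(ops):
    """Return (text, verdict) for each text op that overlaps a dark rectangle."""
    rects = [o for o in ops if o.kind == "rect" and o.dark]
    findings = []
    for t in (o for o in ops if o.kind == "text"):
        for r in rects:
            if not overlaps(t.bbox, r.bbox):
                continue
            if t.seq < r.seq:
                # Text drawn first, box painted over it afterwards.
                findings.append((t.text, "text_under_box"))
                break
            if t.dark:
                # Box drawn first, dark text painted on top of it.
                findings.append((t.text, "dark_text_over_box"))
                break
            # Light text on a dark box is readable by design; not flagged.
    return findings
```

The ambiguous cases that need an OCR comparison are outside the scope of this sketch.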

We've always described failed PDF redaction as "copying the text from behind the black boxes". I just ran a z-order analysis on 14,007 failed redactions and only 4% had the text underneath. The text has been IN FRONT of the box this whole time 🤯

26.02.2026 10:25 👍 0 🔁 0 💬 1 📌 0

Went out to eat tonight, and was sat at a table next to someone famous enough for me to clock other people half recognising them. Obviously I had no idea who they were 🙈 I eventually worked it out from clues about a charity. That's probably the right amount of famous to be, if you wanted such a thing.

25.02.2026 20:33 👍 0 🔁 0 💬 0 📌 0

Something @zarino.co.uk said at the end of the AI session at TICTeC last year stuck with me and continues to inspire. Currently doing exactly what was described to get a new tool online. I wanted to say that out loud, as people don't always get to know that they've motivated someone in that way :)

25.02.2026 15:16 👍 0 🔁 0 💬 1 📌 0
Screenshot of a form asking 'Does your request have a serious purpose or value?' with three radio button options: 'Yes - it's about a matter of genuine concern', 'I think so, but I can see how the authority might disagree', and 'Honestly, I'm not sure - I might have got carried away' with the third option selected.

I'm building a refusal advice tool for FOISA requesters and you can tell that I'm an eternal optimist.

22.02.2026 01:14 👍 0 🔁 0 💬 0 📌 0

I think I'm going to stick with ones that are formally required, as the argument/outcome extraction is already per exemption, so these can be resurfaced that way. That was my original intent for it, but this data seems useful still, so worth pausing to think about.

05.02.2026 11:42 👍 0 🔁 0 💬 0 📌 0

Now I need to decide if I want to use that for only formally required tests, or to note if the public interest was considered at all. Section 38(1)(b) sometimes uses public-interest-like arguments when considering the rights and freedoms of the data subject etc, but that's probably a step too far.

05.02.2026 11:22 👍 0 🔁 0 💬 1 📌 0

Every time I think it's safe to use a boolean, the edge cases come and say hi. 36(2) of FOISA is absolute, but the public interest can decide whether a breach is actionable. The extraction logic captures the public interest arguments, but validation says it's absolute, so pit_conducted should be 0

05.02.2026 11:20 👍 0 🔁 0 💬 1 📌 0
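
One way to picture the edge case is a three-state value that still collapses to the 0/1 flag. This is only a sketch under stated assumptions: `PitStatus`, `ABSOLUTE_EXEMPTIONS`, and `pit_status` are hypothetical names, and the set holds only the one exemption named in the post. Only `pit_conducted` comes from the post itself.

```python
from enum import Enum


class PitStatus(Enum):
    NONE = "none"          # no public interest arguments extracted
    FORMAL = "formal"      # a formally required public interest test
    INFORMAL = "informal"  # arguments present, but the exemption is absolute


# Illustrative only: the single absolute exemption mentioned in the post.
ABSOLUTE_EXEMPTIONS = {"36(2)"}


def pit_status(exemption: str, arguments_found: bool) -> PitStatus:
    """Combine the extraction result with the validation rule."""
    if not arguments_found:
        return PitStatus.NONE
    if exemption in ABSOLUTE_EXEMPTIONS:
        return PitStatus.INFORMAL
    return PitStatus.FORMAL


def pit_conducted(status: PitStatus) -> int:
    """Collapse to the boolean column: 1 only for a formal test."""
    return int(status is PitStatus.FORMAL)
```

With this shape, a 36(2) decision that contains public interest arguments still yields `pit_conducted == 0`, while the extracted arguments stay distinguishable from cases where none were found.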

This has got me wondering if the picture is the same for the rest of the UK (a side quest for later perhaps). I found that the commissioner was more likely to find that the public interest favoured disclosure in EIR cases vs FOISA, which was in line with what I was expecting.

03.02.2026 12:40 👍 0 🔁 0 💬 0 📌 0

I've extracted and analysed all the Public Interest arguments made in every decision ever issued by the Scottish Information Commissioner. It's completely changed my understanding of how PITs work in practice. Looking at the 25 most common "winning" factors, there were some I'd not thought of.

03.02.2026 12:31 👍 1 🔁 0 💬 1 📌 0

2025 maybe?

02.01.2026 14:38 👍 0 🔁 0 💬 0 📌 0

It might actually be worse, because I edited things after the cut-off date. Only 2,477 of them are now defunct.

28.11.2025 12:11 👍 3 🔁 0 💬 0 📌 0

TIL that over the years, I have added or updated 34,724 unique public authorities on WhatDoTheyKnow. That's 74% of the total. It's all my fault 😅

28.11.2025 12:10 👍 3 🔁 0 💬 1 📌 0

The decision by a number of local councils to run adverts on their websites didn't quite sit right with me. I couldn't quite work out why until I saw an advert for a credit card on the crisis loan page.

14.11.2025 12:11 👍 1 🔁 1 💬 0 📌 0

When I find failed redactions or accidental releases of PII in FOI responses I will usually notify the authority. The response is mixed, but far too often there is just silence. Often the only way I know they've got my message is by seeing if the file has disappeared. You'd think they'd want to know

22.09.2025 11:15 👍 0 🔁 0 💬 0 📌 0

Microsoft seems to have pulled the larger VibeVoice TTS model from Hugging Face, and the GitHub repo 404s: github.com/microsoft/Vi.... It's not been out for long, but I can't be alone in having downloaded both, and it's MIT licensed, so there is nothing to stop mirrors. I wonder what the issue is? 🤔

04.09.2025 09:03 👍 0 🔁 0 💬 0 📌 0

I suspect the bigger issue will be with AI use that's off the books. It must be tempting to use your own account if you knew you'd get a better response that way. The way Copilot is built into Windows makes that easy enough to do by mistake anyway. There's a real gap for a proper privacy-aware tool.

28.08.2025 16:45 👍 0 🔁 0 💬 0 📌 0

Of course it's still linked to my account, but the price of 10m free tokens a day is letting them train on what is sent. I get to approve the masked version before it goes, and it adds about 10 seconds on. Don't know if there is a commercial tool - big corporate seems to just trust Copilot/Azure.

28.08.2025 15:51 👍 0 🔁 0 💬 1 📌 0

I built an anonymisation layer to sit between me & the API. I wouldn't trust it with anything really sensitive, but getting an LLM running locally to detect things to mask & paraphrase was pretty easy. It stores the keys, so it unmasks on the return. Big corporate seems to just trust Copilot/Azure.

28.08.2025 15:49 👍 0 🔁 0 💬 1 📌 0
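
A rough sketch of the mask-then-unmask round trip, in Python. The post describes a local LLM doing the detection; here a plain email regex stands in for that step, purely as an assumption to keep the example self-contained. The `mask` and `unmask` names and the `[MASK_n]` placeholder format are hypothetical.

```python
import re

# Stand-in detector: the post uses a local LLM; a regex is swapped in here.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def mask(text):
    """Replace each detected value with a placeholder; return text and key map."""
    mapping = {}   # placeholder -> original value
    seen = {}      # original value -> placeholder, so repeats reuse one key

    def repl(match):
        original = match.group(0)
        if original not in seen:
            key = f"[MASK_{len(mapping) + 1}]"
            mapping[key] = original
            seen[original] = key
        return seen[original]

    return EMAIL.sub(repl, text), mapping


def unmask(text, mapping):
    """Restore the original values in a response that contains placeholders."""
    for key, original in mapping.items():
        text = text.replace(key, original)
    return text
```

The masked text is what would be reviewed and sent; the mapping never leaves the local machine and is applied to whatever comes back.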

The SSD drive of shame has hit 2,752,077 files. Of course I haven't plugged in a new one rather than face clearing it 😅

27.08.2025 12:42 👍 0 🔁 0 💬 0 📌 0
HMC83/request_writer_smol · Hugging Face: We're on a journey to advance and democratize artificial intelligence through open source and open science.

As a bit of fun, I used some of the WDTK keywords that I created last week to create synthetic FOI requests using mistral-small. I then fine-tuned SmolLM2-360M-Instruct on those outputs to generate requests from 3 keywords and the authority name: huggingface.co/HMC83/reques...

18.08.2025 12:02 👍 0 🔁 0 💬 0 📌 0

I've uploaded descriptive keywords for over 1 million public FOI requests. I have left it at request_id, keywords and the name of the public authority for now: github.com/FOIMonkey/fo...

15.08.2025 12:21 👍 0 🔁 1 💬 0 📌 0

As a side effect of this, I have generated keywords for over 1 million FOI requests. Figuring out how to do that well in the most lightweight way was a journey in and of itself. I've not looked yet, but combined with authority and outcome data, it should be possible to spot some interesting trends.

14.08.2025 13:50 👍 2 🔁 0 💬 0 📌 0

Turns out you don't need to read an FOI response to start to be able to guess the outcome. I trained a TF-IDF classifier with a 73% macro F1-score in predicting success using just 3 keywords about the request and metadata. Adding the full request text hits 76% & a snippet of the response email 84%.

14.08.2025 13:43 👍 1 🔁 1 💬 0 📌 0
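
For readers unfamiliar with the metric quoted above, macro F1 is the unweighted mean of the per-class F1 scores, so a rare outcome counts as much as a common one. The sketch below computes it in plain Python; the outcome labels and the toy predictions are made up for illustration and are not the classifier's real classes or data.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data with hypothetical outcome labels.
y_true = ["success", "success", "refused", "refused", "partial", "partial"]
y_pred = ["success", "refused", "refused", "refused", "partial", "success"]
score = macro_f1(y_true, y_pred)
```

Here the per-class scores are 0.5 for success, 0.8 for refused, and 2/3 for partial, so `score` is their plain average.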
12 hour Public Transport challenge. Objective: Travel on as many different types of public transport as possible within a 12 hour window, starting and finishing in the same location.

I wrote up some quick notes on yesterday's journey to nowhere:
foimonkey.github.io/posts/12-hou...

07.08.2025 14:50 👍 0 🔁 0 💬 0 📌 0

I think it may be possible to do more. Once the new electric hydrofoils reach the Solent, that's certainly going to be true.

Best: Hovercraft, by a margin
Worst: Cable Car

06.08.2025 17:30 👍 0 🔁 0 💬 0 📌 0

Made it back to Cowes in 11 hours 39 minutes, taking the floating bridge x 2, a double-decker bus, a hovercraft, a single-decker bus, a tram, the Overground, the DLR x 2 (got on the wrong train), the cable car, a catamaran, the Underground, an automatic people mover, 3 x trains, and the vehicle ferry.

06.08.2025 17:26 👍 0 🔁 1 💬 1 📌 0

Cable car ✅
Catamaran ✅
Underground ✅

Just the Car ferry back to the Island to go now.

06.08.2025 13:43 👍 0 🔁 0 💬 0 📌 0

✅ Cable car

06.08.2025 11:52 👍 0 🔁 0 💬 0 📌 0