Quintilian, Institutio Oratoria IX.2.67: "But what will happen is that the judge searches out that indescribable something for himself, which perhaps he would not believe if he heard it said, and believes in what he thinks he has discovered on his own."
Some things never change.
I don’t have an opinion about emojis in bios, but when I saw your “red flag” comment, I assumed you were making some kind of pun. And I’ll make another: I don’t see any red flags in that person’s bio.
He gives this example: "A certain man directed in his will that a golden statue holding a spear be set up. The question: must the statue holding the spear be golden, or may the spear be golden on a statue of some other material?" (Ibid.)
It especially pleases me to find in Quintilian the kind of ambiguity that often vexes me: "through word placement, where it is doubtful which word should be taken with which, and most frequently when a word sits in the middle and can be pulled in either direction" (Inst. VI.9)
Now I see that it was written in archaic Latin.
Reading through the last two days' posts in my feed, I found a passage beyond my abilities: it has sigmatic future verb forms and other grammatical features unknown to me. Which is not entirely unwelcome, especially since that doesn't happen often these days.
Yesterday evening I had dinner with a friend who is skilled in Latin. We talked for two or three hours, entirely in Latin. It was delightful.
I worked in Belfast for a while. One of the admins sent out a weekly trivia quiz. Shortly after I joined, he gave it an American theme. Coworkers asked me, “What do Americans call courgettes?” I said, “I don’t know. What’s a courgette?”
The error that gave me the most trouble today was in Quintilian book 5: “…, cum pardem ējēcissent”. I think it’s “patrem”, but Perseus says “pardem” is an adverb meaning “equal”, and the Latin Library does not appear to have this section in it at all.
I’ve been thinking about how future paleographers will talk about OCR: “The misspelling of rein for rem suggests this comes from a text that was originally digitized in the 1990s…”
A sentence in Quintilian that I like: "Just as a sure hand can be content with a single weapon, an uncertain one must scatter many, so that there is also a place for luck."
I’ve actually heard some people argue that Nixon might have survived Watergate if he had managed the economy more responsibly.
I’ve long known the English word “imprimatur”, but today is the day I realized it comes directly from Latin.
As I said before, the validation views for the latest version of the model immediately looked more sensible than previous versions. It feels good to think I solved a general, fundamental data science issue while simultaneously being able to show clear business benefits to my boss.
And that means I could add the importance scores for the two variables to get a single measure of importance for the original variable. Since SHAP values are also additive, I could do the same thing for all my validation views.
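As a toy illustration (the numbers here are made up, not from the actual model): because SHAP values are additive within each row, the contributions of the two derived columns can simply be summed to recover a single contribution for the original variable.

```python
import numpy as np

# Hypothetical SHAP matrix: one row per customer, one column per feature.
# Columns 0 and 1 stand in for the numeric and categorical halves of the
# original hybrid variable (layout assumed for illustration).
shap_values = np.array([
    [0.10, -0.02, 0.30],
    [-0.05, 0.08, -0.10],
])

# Per-row contribution of the original variable = sum of its two halves.
combined = shap_values[:, 0] + shap_values[:, 1]
```

The same trick works for any disjoint split of features, which is what makes the validation views comparable to those of a single-variable model.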
Another thing I did was to create a single importance score for these hybrid variables. For the variable selection phase, I used "total gain" in XGBoost instead of the default "average gain". Totals can be meaningfully added together, unlike averages.
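In XGBoost this is `booster.get_score(importance_type="total_gain")` versus the default `"gain"`. Here is a toy sketch (made-up split gains, not real model output) of why totals combine meaningfully while averages do not:

```python
# Per-split gains for the two halves of one hybrid variable (toy numbers).
numeric_gains = [5.0, 3.0, 2.0]   # the numeric half was used in 3 splits
categorical_gains = [4.0]         # the categorical half in 1 split

# Total gain adds meaningfully across the two halves:
combined_total = sum(numeric_gains) + sum(categorical_gains)  # 14.0

# Average gain does not: the sum of the two per-half averages...
sum_of_averages = (sum(numeric_gains) / len(numeric_gains)
                   + sum(categorical_gains) / len(categorical_gains))

# ...is not the average gain of the combined variable:
true_average = combined_total / (len(numeric_gains) + len(categorical_gains))
```

Summing the averages here gives about 7.33, while the true combined average is 3.5, so only the totals give an honest single score.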
This isn’t entirely new, but it’s not built into most open-source data science packages. I wrote some code this week to successfully implement this as part of an XGBoost model, and I could immediately see the benefits when I generated validation views for my model.
The model can use the numerical variable to understand how customers with calculated values behave and use the categorical variable to make any adjustments necessary to accurately represent the behavior of the other types of customers.
The categorical variable will then record which category the customer falls in: “Regular value”, “True missing value”, “No loans”, “No loans in 12 months”, “No minimum payments”, etc.
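A minimal sketch of such a split (the function name, inputs, and exact category labels are my illustrative assumptions, not the author's actual code):

```python
import math

def split_payment_ratio(actual_paid, min_required, has_loans, has_recent_loans):
    """Turn one hybrid variable into a (numeric_value, category) pair.

    Returns NaN for the numeric part whenever the ratio cannot be
    calculated, plus a category label recording why.
    """
    if not has_loans:
        return math.nan, "No loans"
    if not has_recent_loans:
        return math.nan, "No loans in 12 months"
    if min_required is None:
        return math.nan, "True missing value"
    if min_required == 0:
        return math.nan, "No minimum payments"
    return actual_paid / min_required, "Regular value"
```

A model like XGBoost can then consume the numeric column directly (it tolerates NaN) alongside an encoded version of the category column.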
The numeric variable has the actual values for customers when it can be calculated and either a missing value or an imputed value for those customers for whom it cannot be calculated.
This week what I did was to turn this single variable into two variables: a numeric variable and a categorical variable.
This variable, which initially sounded like a numeric variable, has categories of customers for which it cannot be calculated. It's partly numeric and partly categorical. How do we handle this?
For some customers, that is a straightforward calculation: 100% or 66.7% or 125%. But for other customers it makes no sense. Some might not have any loans, some might not have any in the past 12 months, some might not have a minimum payment amount.
Take an example variable: the ratio of a customer’s actual payments to their minimum required payments on all their loans in the past 12 months.
All of this so far is stuff that you are likely to see in intro statistics and data science textbooks. One thing that rarely gets mentioned in textbooks is that some variables can be partially numeric and partially categorical. That’s what I was working on this week.
There are lots of techniques for coding categorical variables, and some algorithms (like XGBoost) do a good job of handling most of this for us.
But we have to be careful how we do this. If we’re not careful, the predictions of our model will be based more on how we coded our data rather than any actual patterns in the data itself.
However, when we want to use categorical variables in statistical or machine learning models, we have to convert them to numbers somehow, like "Red = 1, Orange = 2, etc."
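For instance (a small sketch, not the only option): naive label encoding imposes an arbitrary ordering, while one-hot encoding, e.g. via `pandas.get_dummies`, gives each category its own 0/1 column and avoids pretending that Red < Orange.

```python
import pandas as pd

colors = pd.Series(["Red", "Orange", "Red", "Blue"])

# Label encoding: one integer per category, in alphabetical order
# (Blue=0, Orange=1, Red=2) -- an ordering the colors don't really have.
labels = colors.astype("category").cat.codes

# One-hot encoding: one 0/1 indicator column per category instead.
onehot = pd.get_dummies(colors)
```

Tree-based models often cope with label encoding anyway, but for linear models the fake ordering leaks straight into the predictions.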
Categorical variables are variables for which we can’t meaningfully use numbers. It doesn’t usually make sense to say “Red < Orange” or “Red - Orange = Blue” or “Red / Orange = Green”. Categorical variables do not have any inherent ordering or any numerical relationships between the categories.
First, definitions: a numeric variable is a variable that takes on numeric values. For numeric variables, we can sensibly say things like “3 > 2” or “3 - 2 = 1” or “3 / 2 = 1.5”. It is relatively straightforward to incorporate these variables in statistical or machine learning models.