Results (3/3)
TFM-learned rules generalize better to unseen data than rules from algorithmic or neurosymbolic methods.
Co-authored with @dfdazac.bsky.social, @p-groth.bsky.social, Martijn C Schut, and @vdegeler.bsky.social.
🧵8/8
19.02.2026 10:30
Results (2/3)
TFMs can scale to mid-size tables (~50 columns, ~10,000 rows) in under a minute on affordable resources for association rule learning. On larger datasets, however, neurosymbolic methods (PyAerial) remain the only scalable approach.
🧵7/8
19.02.2026 10:30
Results (1/3)
TFMs with our TabProbe can both address rule explosion and remain robust in low-data scenarios, in terms of statistical rule quality as well as interpretable rule-based classification.
🧵6/8
19.02.2026 10:30
The research gap (2/2)
Neurosymbolic methods, such as our PyAerial, can address rule explosion and scale to larger datasets. However, they suffer from degraded performance in low-data scenarios.
🧵5/8
19.02.2026 10:30
The research gap (1/2)
Algorithmic rule miners, without effective search-space reduction (which requires heuristics over the data), lead to rule explosion: an excessive number of rules that are hard to interpret and post-process.
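To see why exhaustive mining explodes: over n items, the number of possible association rules X -> Y (disjoint, non-empty itemsets) is 3^n - 2^(n+1) + 1. A quick illustrative sanity check of that growth (not from the paper):

```python
def num_rules(n_items: int) -> int:
    """Count all association rules X -> Y over n_items, where X and Y are
    disjoint, non-empty itemsets: 3^n - 2^(n+1) + 1."""
    return 3 ** n_items - 2 ** (n_items + 1) + 1

# Growth is exponential: even a modest 50-column table is far beyond
# exhaustive enumeration without search-space reduction.
for n in (5, 10, 20, 50):
    print(n, num_rules(n))
```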
🧵4/8
19.02.2026 10:30
What's new? (2/2)
TabProbe - We instantiate our framework with TFMs trained with a classification objective (TabPFNv2.5, TabICL, and TabDPT), and show that TFMs can be repurposed to learn association rules out of the box, with no additional training and without frequent itemset mining!
🧵3/8
19.02.2026 10:30
Preprint: arxiv.org/pdf/2602.146...
Code: github.com/DiTEC-projec...
What's new? (1/2)
A model-agnostic rule learning framework - We show that any conditional probabilistic model that supports querying for features conditioned on others can be repurposed to learn association rules.
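A minimal sketch of that idea (illustrative only, not the paper's implementation): any model that can answer conditional queries P(consequent | antecedent) can drive rule learning by keeping the queries that clear a confidence threshold. Here a simple empirical frequency estimator stands in for a TFM; all names are hypothetical.

```python
import numpy as np

# Tiny transaction table: rows = baskets, columns = items
# (0 = bread, 1 = butter, 2 = milk).
data = np.array([
    [1, 1, 0],
    [1, 1, 1],
    [1, 1, 0],
    [0, 1, 1],
])

def empirical_conditional(a, c):
    """Stand-in for a conditional probabilistic model: P(item c = 1 | item a = 1)."""
    rows = data[data[:, a] == 1]
    return rows[:, c].mean()

def learn_rules(query, n_items, min_conf=0.8):
    """Keep every single-antecedent rule a -> c whose queried confidence clears min_conf."""
    return [(a, c, conf)
            for a in range(n_items)
            for c in range(n_items)
            if a != c and (conf := query(a, c)) >= min_conf]

rules = learn_rules(empirical_conditional, data.shape[1])
print(rules)  # bread -> butter and milk -> butter, each with confidence 1.0
```

Swapping `empirical_conditional` for any model's conditional query interface leaves `learn_rules` unchanged, which is the sense in which the framework is model-agnostic.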
🧵2/8
19.02.2026 10:30
Tabular foundation models can learn association rules out of the box!
For the non-expert:
If you work with tables and want to discover interesting patterns, tabular foundation models (TFMs) can quickly find significant, interesting, and generalizable patterns with no additional training.
🧵1/8
19.02.2026 10:30
No. 2 #BJGPTop10 AI in lung cancer detection: AI algorithm identifies risk of lung cancer 4 months earlier doi.org/10.3399/BJGP...
#LungCancer #CancerResearch #AIHealthcare #DigitalHealth #EarlyDetection #MedicalInnovation #ArtificialIntelligence #PrimaryCare #BJGP #research
26.01.2026 10:15
TRL Seminar | TRL Lab
Join the TRL Seminar tomorrow for talks on query ambiguity in open-domain data analysis @daniel-gomm.bsky.social, the SQaLE dataset for training text-to-SQL models @cowolff.bsky.social, and pattern/rule inference from tabular data @erkankarabulut.bsky.social.
Info: trl-lab.github.io/trl-seminar/
22.01.2026 17:09
PyAerial documentation is now much more comprehensive and has a lot of examples: pyaerial.readthedocs.io/en/latest/in...
If you are interested in interpretable machine learning or knowledge discovery, check out PyAerial!
14.11.2025 14:34
UvA Data Science Center Seminar: Scalable Knowledge Discovery
A walkthrough of scalable knowledge discovery from tabular data using neurosymbolic methods, covering association rule mining, its challenges, and neurosymbolic solutions with Aerial.
Yesterday, I gave a seminar on Scalable Knowledge Discovery with Neurosymbolic Rule Learning at the University of Amsterdam's Data Science Center.
Here's a blog post with code samples and experiments on knowledge discovery from tabular data:
erkankarabulut.github.io/blog/uva-dsc....
12.11.2025 09:51
Another question came to my mind about this definition: can such consciousness be transferred to a new body?
Or if one's body were cryopreserved, would it retain the same consciousness, and would it still be the same person?
Practically, weβd want consciousness to be transferable.
26.10.2025 10:29
The more aware you are (of yourself, others, and the connections between them, I suppose), the higher your level of consciousness.
So a baby (or an AI) learns more about itself, others, and its surroundings over time, and by doing so becomes more conscious.
26.10.2025 10:29
One particularly interesting view I learned was from psychologist Markus Ofner. Summarizing in my own words:
Consciousness is in the air and in us. We don't own it, but we become conscious. We are all individuals, but also part of a whole, all interconnected.
26.10.2025 10:29
Similar to the question "did numbers (or math) exist in nature, or did we invent them?": does consciousness exist in nature, or should we invent it?
Assuming we can't define it, and hence can't fully understand what it is, can't we come up with a definition that works better for us, as we did with math?
26.10.2025 10:02
Why is defining consciousness not trivial?
I am attending the ECAI2025 workshop ACAI - Awakening Consciousness in Artificial Intelligence. Hearing different definitions of and assumptions about consciousness, each sound and consistent at first, made me think of this question.
26.10.2025 10:02
If you're attending and interested in knowledge discovery, interpretable machine learning, or XAI, say hi!
Paper: arxiv.org/pdf/2509.20113
Code: tinyurl.com/3z8cmuhw
Python Library: github.com/DiTEC-projec...
Co-authored with @dfdazac.bsky.social, @p-groth.bsky.social, and @vdegeler.bsky.social.
🧵2/2
22.10.2025 14:25
Discovering Association Rules in High-Dimensional Small Tabular Data
I'm excited to present our new method for enhancing knowledge discovery from tabular data using tabular foundation models at the #ECAI2025 conference workshops (ANSyA) this Sunday, October 26th.
See our paper, Discovering Association Rules in High-Dimensional Small Tabular Data, below.
🧵1/2
22.10.2025 14:25
🧠 Neurosymbolic rule learning methods are starting a paradigm shift in knowledge discovery, allowing us to utilize prior knowledge while discovering new knowledge!
Together with @dfdazac.bsky.social, @p-groth.bsky.social, and @vdegeler.bsky.social.
🧵8/8
26.09.2025 15:09
On 5 real-world gene expression datasets with 18K+ columns and <100 rows, both methods led to significantly higher-quality association rules in terms of confidence and association strength, with a limited increase in execution time (at most 1.2–2×).
🧵7/8
26.09.2025 15:09
1️⃣ Aerial+WI. Weight initialization of Aerial+ based on tabular data embeddings from a foundation model, using a projection encoder.
2️⃣ Aerial+DL. Tabular embeddings are aligned with Aerial+'s reconstructions via a projection encoder and a joint loss, ensuring better semantic column alignment.
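A rough numpy sketch of the weight-initialization idea (shapes and names are illustrative assumptions, not the PyAerial API): per-column embeddings from a foundation model are projected down to the autoencoder's hidden width and used as the encoder's initial weights, so semantically similar columns start from similar weights.

```python
import numpy as np

rng = np.random.default_rng(42)
n_cols, d_fm, d_hidden = 8, 16, 4   # table columns, TFM embedding dim, AE hidden dim

# Stand-in for per-column table embeddings from a foundation model (e.g., TabPFN).
col_embeddings = rng.normal(size=(n_cols, d_fm))

# Projection encoder: maps foundation-model embeddings into the
# autoencoder's weight space.
projection = rng.normal(size=(d_fm, d_hidden)) / np.sqrt(d_fm)

# Initialize the autoencoder's first-layer weights from the projected embeddings.
W_enc_init = col_embeddings @ projection
print(W_enc_init.shape)
```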
🧵6/8
26.09.2025 15:09
💡 In the scope of knowledge discovery via rule learning, we adapted and evaluated two transfer learning methods that utilize table embeddings from a tabular foundation model, TabPFN.
🧵5/8
26.09.2025 15:09
Therefore, we introduce the problem of discovering association rules in high-dimensional small tabular data for the first time.
Tabular foundation models have addressed this issue by pre-training on large datasets and transferring to small datasets for predictive tasks.
🧵4/8
26.09.2025 15:09
However, neurosymbolic methods inherit the limitations of neural networks into rule learning, particularly reduced performance in low-data regimes. In rule learning, this translates to failing to capture high-quality patterns in the data.
🧵3/8
26.09.2025 15:09
Library: github.com/DiTEC-projec...
We show that knowledge discovery from high-dimensional tables, as in gene expression datasets (~18K columns), is scalable with neurosymbolic rule learning via Aerial+, a method we proposed earlier (arxiv.org/pdf/2504.19354, presented at NeSy 2025).
🧵2/8
26.09.2025 15:09
Tabular foundation models can boost knowledge discovery from high-dimensional tabular data with few instances (e.g., gene expression or rare disease data, 1K+ columns and <100 rows)!
Short Paper (accepted at the ECAI 2025 workshops, ANSyA): arxiv.org/pdf/2509.20113
Code: tinyurl.com/3z8cmuhw
🧵1/8
26.09.2025 15:09
PyAerial runs 1–2 orders of magnitude faster than major rule mining libraries in C/C++, R, and Python.
Using an under-complete autoencoder, it avoids non-informative patterns (rule explosion) while capturing the most significant associations, with higher confidence, stronger associations, and full coverage.
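The under-complete design can be sketched with a toy numpy autoencoder (assumed dimensions and data, not PyAerial's implementation): the bottleneck is narrower than the input, so the network can only keep informative structure, which is what suppresses non-informative patterns.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)

# Toy binary table with redundant structure: 3 independent columns, each
# repeated 4 times, so a 3-unit bottleneck can capture all the signal.
base = rng.integers(0, 2, size=(200, 3)).astype(float)
X = np.repeat(base, 4, axis=1)                  # shape (200, 12)

n_in, n_hid = X.shape[1], 3                     # bottleneck < input: under-complete
W1 = rng.normal(scale=0.1, size=(n_in, n_hid)); b1 = np.zeros(n_hid)
W2 = rng.normal(scale=0.1, size=(n_hid, n_in)); b2 = np.zeros(n_in)

def reconstruction_loss():
    return ((sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2) - X) ** 2).mean()

loss0 = reconstruction_loss()

lr = 0.5
for _ in range(1000):
    H = sigmoid(X @ W1 + b1)                    # compressed code
    Xh = sigmoid(H @ W2 + b2)                   # reconstruction
    G = (Xh - X) * Xh * (1 - Xh) / len(X)       # MSE gradient through output sigmoid
    GH = (G @ W2.T) * H * (1 - H)               # backprop into the hidden layer
    W2 -= lr * (H.T @ G); b2 -= lr * G.sum(0)
    W1 -= lr * (X.T @ GH); b1 -= lr * GH.sum(0)

loss_final = reconstruction_loss()
```

Aerial-style rule extraction then probes the trained network with marked inputs and reads high-probability reconstructions as associations; that step is omitted here.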
🧵8/8
14.09.2025 12:21
It can also be integrated with rule visualization software, such as NiaARM.
(coming soon) PyAerial supports transfer learning from a tabular foundation model (such as TabPFN), e.g., by reusing model weights or by semantically aligning with a given set of table embeddings.
🧵7/8
14.09.2025 12:21
Rules of different forms (e.g., classification rules, item constraints) can be learned through the exposed Aerial+ interfaces.
PyAerial can run on parallel threads and on a GPU, and scales to high-dimensional tables (1K+ columns).
🧵6/8
14.09.2025 12:21