#LanguageModel
Posts tagged #LanguageModel on Bluesky
Two Reasons Sarvam AI's Open-Source Model Is Innovative - 기술 덕후 한가닥. February 18, 2026: At an AI summit held in New Delhi, India, Sarvam AI unveiled a new large language model and drew the market's attention. The announcement reflects India's strong determination to secure independent technological capability rather than relying on expensive American and Chinese AI systems. Efficiency and…

Two Reasons Sarvam AI's Open-Source Model Is Innovative

https://bit.ly/4kJp2l4

#SarvamAI #OpenSourceAI #IndianAI #AIInnovation #LanguageModel #TechLeadership #GlobalAI

0 0 0 0
"As illustrated in Fig 1, the system follows a loop that mirrors how clinicians gather evidence, generate a provisional explanation, and reassess whether their reasoning is sufficiently supported. At each iteration, the model retrieves context passages, produces an answer and rationale, then evaluates that rationale through a scoring module. If parts of the rationale are unsupported or contradictory, the system reformulates the query to target missing information and repeats retrieval and generation. This reflection cycle allows Self MedRAG to progressively strengthen factual grounding while ensuring that the final answer and rationale remain clinically coherent and evidence-based."
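The reflection loop quoted above can be sketched in a few lines. This is an illustrative outline only, not the paper's implementation: `retrieve`, `generate`, `score_rationale`, and `reformulate` are hypothetical stand-ins for the retriever, generator, critic, and query-rewriting components, and the `threshold` stopping rule is an assumption.

```python
def self_reflective_rag(question, retrieve, generate, score_rationale,
                        reformulate, max_iters=3, threshold=0.8):
    """Sketch of a self-reflective RAG loop: retrieve, answer, critique,
    and re-query until the rationale is judged sufficiently grounded."""
    query = question
    answer = rationale = None
    for _ in range(max_iters):
        passages = retrieve(query)                     # gather evidence
        answer, rationale = generate(question, passages)
        support = score_rationale(rationale, passages)  # critic score in [0, 1]
        if support >= threshold:
            break  # rationale judged sufficiently supported
        # target missing or contradicted information with a reformulated query
        query = reformulate(question, rationale)
    return answer, rationale
```

Each component here is pluggable, which matches the post's observation that the gains come from the iteration mechanism rather than any one critic.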


"Medical question answering (QA) benchmarks evaluate a model’s ability to generate clinically reliable, evidence-grounded responses. Widely used datasets include MedQA for diagnostic reasoning from medical exams [and] PubMedQA for evidence-based biomedical inference over research abstracts...."


"The results presented in Table 1 demonstrate the performance trends across retrieval strategies and critic configurations. For Base RAG methods, hybrid retrieval combining BM25 and Contriever via Reciprocal Rank Fusion (RRF) achieves substantially stronger performance than any single retriever on both the PubMedQA and MedQA datasets. While BM25 and Contriever individually reach accuracies of 66.80% and 67.90% on PubMedQA, their fusion through RRF slightly increases accuracy to 69.10%. The effect is more pronounced on MedQA, where the method introduces a large jump in performance from 41.74% (BM25 alone) and 43.30% (Contriever alone) to 80.00% accuracy. This dramatic improvement shows that fused retrieval via RRF provides broader coverage of clinically relevant evidence by integrating both high-precision lexical signals from BM25 and semantically aligned passages recovered by Contriever."
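Reciprocal Rank Fusion itself is simple: each document's fused score is the sum of 1/(k + rank) over the rankers that returned it, where k is a smoothing constant (60 in the original RRF paper). A minimal sketch, with made-up document IDs standing in for the BM25 and Contriever result lists:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.
    rankings: list of lists, each ordered best-first.
    k: smoothing constant that damps the influence of top ranks."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["d1", "d2", "d3"]        # lexical retriever's ranking
contriever_hits = ["d3", "d1", "d4"]  # dense retriever's ranking
fused = reciprocal_rank_fusion([bm25_hits, contriever_hits])
# fused → ['d1', 'd3', 'd2', 'd4']
```

Note how `d1`, ranked highly by both retrievers, outscores documents that only one retriever found; this is the coverage effect the excerpt credits for the MedQA jump.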


"...both critics surpass the non-critic, non-iterative baseline, demonstrating that the improvement in performance is due to the iteration mechanism itself, rather than the specific critic choice."

"Fig 3 details the cumulative impact of the iterative process in the Self-Reflective module for both accuracy and F1 scores. We observe a substantial performance leap between the first and second iterations across both datasets, with MedQA accuracy rising from 79.3% to 86.1% and PubMedQA from 69.8% to 83.3%. The upward trend confirms the performance gains the Self-Reflective module achieves by identifying and correcting unsupported rationales. Extending the process to a third iteration, however, yields diminishing returns, with performance either plateauing on PubMedQA or slightly declining on MedQA."


Can Socratic reflection improve #AI answers to medical questions?

Adding a critic to a #languageModel pipeline improved performance on two measures of medical question-answering.

The improvement didn't depend on the critic's model.

doi.org/10.48550/arX...

#tech #medicine #edu

2 0 1 0
Evaluating Spanish Translations of Emergency Department Discharge Instructions by a Large Language Model: Tool Validation and Reliability Study When given a sample of 100 emergency department discharge instructions, Claude Sonnet, a large language model, produced accurate Spanish translations as evaluated by Spanish-speaking physicians and medical interpreters.

JMIR Formative Res: Evaluating Spanish Translations of Emergency Department Discharge Instructions by a Large Language Model: Tool Validation and Reliability Study #SpanishTranslations #EmergencyMedicine #HealthcareResearch #LanguageModel #MedicalInterpreting

0 0 0 0

A book about Claude Code: "実践Claude Code入門 | 技術評論社" (Practical Introduction to Claude Code, from Gijutsu-Hyoron) https://gihyo.jp/book/2026/978-4-297-15354-0 #LanguageModel #book

0 0 0 0

This week is another chance to equalize opportunity. To design for inclusion. To make sure no one is left behind by technology.

Welcome to the week. Let’s do today’s work with tomorrow in mind.

#EqualyzAI #NewWeek #MondayMotivation #LanguageModel

0 0 0 0
Large Language Model Evaluation in Traditional Chinese Medicine for Stroke: Quantitative Benchmarking Study Background: The application of large language models (LLMs) in medicine is rapidly advancing. However, evaluating LLM capabilities in specialized domains such as traditional Chinese medicine (TCM), which possesses a unique theoretical system and cognitive framework, remains a sizable challenge. Objective: This study aimed to provide an empirical evaluation of different LLM types in the specialized domain of TCM stroke. Methods: The Traditional Chinese Medicine-Stroke Evaluation Dataset (TCM-SED), a 203-question benchmark, was systematically constructed. The dataset includes 3 paradigms (short-answer questions, multiple-choice questions, and essay questions) and covers multiple knowledge dimensions, including diagnosis, pattern differentiation and treatment, herbal formulas, acupuncture, interpretation of classical texts, and patient communication. Gold standard answers were established through a multiexpert cross-validation and consensus process. The TCM-SED was subsequently used to comprehensively test 2 representative LLM models: GPT-4o (a leading international general-purpose model) and DeepSeek-R1 (a large model primarily trained on Chinese corpora). Results: The test results revealed a differentiation in model capabilities across cognitive levels. In objective sections emphasizing precise knowledge recall, DeepSeek-R1 comprehensively outperformed GPT-4o, achieving an accuracy lead of more than 17% in the multiple-choice section (96/137, 70.1% vs 72/137, 52.6%, respectively). Conversely, in the essay section, which tested knowledge integration and complex reasoning, GPT-4o’s performance notably surpassed that of DeepSeek-R1. For instance, in the interpretation of classical texts category, GPT-4o achieved a scoring rate of 90.5% (181/200), far exceeding DeepSeek-R1 (147/200, 73.5%). 
Conclusions: This empirical study demonstrates that Chinese-centric models have a substantial advantage in static knowledge tasks within the TCM domain, whereas leading general-purpose models exhibit stronger dynamic reasoning and content generation capabilities. The TCM-SED, developed as the benchmark for this study, serves as an effective quantitative tool for evaluating and selecting appropriate LLMs for TCM scenarios. It also offers a valuable data foundation and a new research direction for future model optimization and alignment.

JMIR Formative Res: Large Language Model Evaluation in Traditional Chinese Medicine for Stroke: Quantitative Benchmarking Study #TraditionalChineseMedicine #TCM #StrokeRecovery #LanguageModel #HealthcareInnovation

0 0 1 0
Post image

Where does the data that AI models are trained on come from, and why does it so often determine model quality?

Read our new article on datasets, transparency, and data ethics in AI - azurro.pl/skad-biora-s...

#innovation #ArtificialIntelligence #LLM #AI #languagemodel

1 0 0 0
The dumbest person you know is being told "you're absolutely right" by ChatGPT


Interesting how ChatGPT knows so much about things I know nothing about and is wrong about 70% of the time on topics I'm an expert in. #chatgpt #ai #artificialinteligence #googlegemini #microsoftcopilot #digitalera #chatbot #languagemodel

1 1 0 0

On how AI agents don't understand design systems: "Storybook Design Systems with Agents RFC · storybookjs/ds-mcp-experiment-reshaped · Discussion #1" github.com/storybookjs/ds-mcp-exper... #LanguageModel

1 0 0 0
LLM Training Data Services for Fine-Tuning & RLHF Boost your AI development with LLM training data services tailored for fine-tuning, RLHF, annotation, and RAG. Get high-quality, domain-specific datasets.

Fuel Your LLM with High-Quality Training Data

Scale smarter. Train faster. Perform better.

Learn more: shorturl.at/BJZIA

#LLM #DataServices #Data #MachineLearning #GenerativeAI #TrainingData #DataAnnotation #LanguageModel #NLP

1 0 0 0

An epub translation tool using LM Studio's API: "sumik5/llm-translate" https://github.com/sumik5/llm-translate/tree/main #translate #LanguageModel

0 0 0 0
Video thumbnail

It's not AI. It's a Language Model.
#ItsNotAI #LanguageModel #PrecisionTechnology #StopConfusion #RealEngineering

1 0 0 0
Preview
How to Run a RAG Powered Language Model on Android With the Help of MediaPipe A couple of months ago, I gave a talk about running on-device SLMs in apps. It was well received, and it was refreshing to be able to give a talk where mobile apps can gain an advantage from the rise in language...

How to Run a RAG Powered Language Model on Android With the Help of MediaPipe #Technology #EmergingTechnologies #ArtificialIntelligence #LanguageModel #MediaPipe #AIOnAndroid

1 0 0 0
Latent Thought Modeling Improves Data Efficiency in LM Pretraining


A 1B-parameter language model boosted data efficiency via latent-thought inference, gaining improvements after three EM cycles without an external teacher model. Read more: getnews.me/latent-thought-modeling-... #languagemodel #latentthought

0 0 0 0
DiDi‑Instruct Boosts Language Generation Speed by Up to 64×


DiDi‑Instruct speeds language generation up to 64× and reaches a perplexity of 62.2 with just eight NFEs. Training time drops about twenty‑fold versus standard fine‑tuning. getnews.me/didi-instruct-boosts-lan... #didiinstruct #languagemodel #ai

0 0 0 0
The graph with green domesticity score dots shows a rising trendline in the 19th century.


How “domestic” is a #Victorian novel?
Guhr et al. fine-tune a #LanguageModel to detect implicit domestic spaces – rooms, gardens, even #ships – beyond obvious keywords like 'house' or 'home.' – A new way to read #19th-century #fiction through the lens of #space and study the rise of #domesticity.

1 0 0 0
Backdoor Detection for Language Models Faces Robustness Challenges


A new EMNLP paper (Sept 2025) finds backdoor detection drops when training intensity is either aggressive or very low, exposing limits of current tools. Read more: getnews.me/backdoor-detection-for-l... #backdoor #languagemodel #security

0 0 0 0
Post image

Training and running LLMs can cost millions and require massive AI computing infrastructure. SLMs, on the other hand, require significantly less computational power, allowing them to be trained and fine-tuned on a single GPU. buff.ly/uNwzK7r

#AI #LanguageModel #Research

1 0 0 0
Mechanistic Study Reduces Language Confusion in English‑Focused LLMs


Researchers identified a handful of neurons causing language switches in English‑centric LLMs; editing them cut confusion points on the Language Confusion Benchmark. Read more: getnews.me/mechanistic-study-reduce... #languagemodel #neuraledits

0 0 0 0
Rethinking Linguistic Rules in AI Language Model Evaluation


A new paper urges moving past strict rule‑based tests, noting benchmarks like GLUE and SuperGLUE still favor binary grammaticality despite language’s gradient nature. Read more: getnews.me/rethinking-linguistic-ru... #languagemodel #evaluation

0 0 0 0
CreativeAct Technologies AI-powered solutions: Exploration into Emerging Technology Systems and real-world applications.

Creativeact.net

Try out our beta prompt enhancement agent today!

Input a prompt and receive a professional quality reusable template.

#llm #promptengineering #chatgpt #claude #ai #languagemodel

1 0 0 0
Post image

Can I get an opinion on Qwen AI? Is it good? I think it's the funniest and most exaggerated LLM out there, but is it reliable?
#qwen #ai #llm #languagemodel #question #artificialintelligence
#artificial
#chatgpt
#google

1 0 0 0

Copilot Code Review can now apply instructions scoped to specific files: "Copilot code review: Path-scoped custom instruction file support - GitHub Changelog" github.blog/changelog/2025-09-03-cop... #Github #LanguageModel

1 0 0 0
Post image

What are LLM benchmarks, and do they really tell you which model is "better"? 🤖
We explain:
– what they measure,
– which ones are the most popular,
– what limitations they have.

Visit our blog: azurro.pl/jak-porownac...

#innovation #HelloWorld #ArtificialIntelligence #largelanguagemodels #AI #languagemodel

0 1 0 0

A prompt markup language: "microsoft/poml: Prompt Orchestration Markup Language" https://github.com/microsoft/poml #LanguageModel #program

1 1 0 0
ChatGPT-5 from OpenAI: What the New AI Model Can Do Smarter, faster, clearer, and with minimal nonsense – that’s how the new language model ChatGPT-5 is being presented. It’s expected to soon replace the current lineup of models. Most importantly, it will be available even in the free version. Reportedly, it won’t just be for geeks and professionals – even casual users who’ve only clicked […] Post ChatGPT-5 from OpenAI: What the New AI Model Can Do at Root-Nation.com.

ChatGPT-5 from OpenAI: What the New AI Model Can Do #ChatGPT5 #OpenAI #ArtificialIntelligence #LanguageModel #AI

1 1 0 0
What Is the Language Model in AI – Fundamentals of AI
YouTube video by Juan Garcia Asensio

Fundamentals of #IA: What is the #LanguageModel in #InteligenciaArtificial youtube.com/shorts/9xUCF...

0 0 0 0
Post image

AI in sport is now an everyday reality. Match analysis, personalized training, injury detection, fan support – and that's only some of its applications. See how technology is changing sport before our eyes: azurro.pl/ai-w-sporcie...

#innovation #LLM #AI #languagemodel #ModelJęzykowy #technews #tech

2 1 0 0

Latin American nations to launch own AI model in September, entering global AI race.
A key goal is preserving #Indigenous #languages.
potatonews.com/ai-news/lati...
via @cybernews.bsky.social
#xl8 #latamgpt #ai #languagemodel #langsky

0 0 0 0