#LanguageModel
Posts tagged #LanguageModel on Bluesky
Two Reasons Sarvam AI's Open-Source Model Is Innovative - 기술 덕후 한가닥. February 18, 2026: At an AI summit held in New Delhi, India, Sarvam AI unveiled a new large language model and drew the market's attention. The announcement reflects India's strong determination to secure independent technological capability rather than relying on expensive American and Chinese AI systems. Efficiency and…

Two Reasons Sarvam AI's Open-Source Model Is Innovative

https://bit.ly/4kJp2l4

#SarvamAI #OpenSourceAI #IndianAI #AIInnovation #LanguageModel #TechLeadership #GlobalAI

0 0 0 0
"As illustrated in Fig 1, the system follows a loop that mirrors how clinicians gather evidence, generate a provisional explanation, and reassess whether their reasoning is sufficiently supported. At each iteration, the model retrieves context passages, produces an answer and rationale, then evaluates that rationale through a scoring module. If parts of the rationale are unsupported or contradictory, the system reformulates the query to target missing information and repeats retrieval and generation. This reflection cycle allows Self MedRAG to progressively strengthen factual grounding while ensuring that the final answer and rationale remain clinically coherent and evidence-based."
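The reflection loop quoted above can be sketched in a few lines. This is an illustrative outline only, not the paper's implementation: `retrieve`, `generate`, `score_rationale`, and `reformulate` are hypothetical stand-ins for the retriever, generator, critic, and query-rewriting components, and the `threshold` stopping rule is an assumption.

```python
def self_reflective_rag(question, retrieve, generate, score_rationale,
                        reformulate, max_iters=3, threshold=0.8):
    """Sketch of a self-reflective RAG loop: retrieve, answer, critique,
    and re-query until the rationale is judged sufficiently grounded."""
    query = question
    answer = rationale = None
    for _ in range(max_iters):
        passages = retrieve(query)                     # gather evidence
        answer, rationale = generate(question, passages)
        support = score_rationale(rationale, passages)  # critic score in [0, 1]
        if support >= threshold:
            break  # rationale judged sufficiently supported
        # target missing or contradicted information with a reformulated query
        query = reformulate(question, rationale)
    return answer, rationale
```

Each component here is pluggable, which matches the post's observation that the gains come from the iteration mechanism rather than any one critic.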


"Medical question answering (QA) benchmarks evaluate a model’s ability to generate clinically reliable, evidence-grounded responses. Widely used datasets include MedQA for diagnostic reasoning from medical exams [and] PubMedQA for evidence-based biomedical inference over research abstracts...."


"The results presented in Table 1 demonstrate the performance trends across retrieval strategies and critic configurations. For Base RAG methods, hybrid retrieval combining BM25 and Contriever via Reciprocal Rank Fusion (RRF) achieves substantially stronger performance than any single retriever on both the PubMedQA and MedQA datasets. While BM25 and Contriever individually reach accuracies of 66.80% and 67.90% on PubMedQA, their fusion through RRF slightly increases accuracy to 69.10%. The effect is more pronounced on MedQA, where the method introduces a large jump in performance from 41.74% (BM25 alone) and 43.30% (Contriever alone) to 80.00% accuracy. This dramatic improvement shows that fused retrieval via RRF provides broader coverage of clinically relevant evidence by integrating both high-precision lexical signals from BM25 and semantically aligned passages recovered by Contriever."
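Reciprocal Rank Fusion itself is simple: each document's fused score is the sum of 1/(k + rank) over the rankers that returned it, where k is a smoothing constant (60 in the original RRF paper). A minimal sketch, with made-up document IDs standing in for the BM25 and Contriever result lists:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.
    rankings: list of lists, each ordered best-first.
    k: smoothing constant that damps the influence of top ranks."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["d1", "d2", "d3"]        # lexical retriever's ranking
contriever_hits = ["d3", "d1", "d4"]  # dense retriever's ranking
fused = reciprocal_rank_fusion([bm25_hits, contriever_hits])
# fused → ['d1', 'd3', 'd2', 'd4']
```

Note how `d1`, ranked highly by both retrievers, outscores documents that only one retriever found; this is the coverage effect the excerpt credits for the MedQA jump.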


"...both critics surpass the non-critic, non-iterative baseline, demonstrating that the improvement in performance is due to the iteration mechanism itself, rather than the specific critic choice."

"Fig 3 details the cumulative impact of the iterative process in the Self-Reflective module for both accuracy and F1 scores. We observe a substantial performance leap between the first and second iterations across both datasets, with MedQA accuracy rising from 79.3% to 86.1% and PubMedQA from 69.8% to 83.3%. The upward trend confirms the performance gains the Self-Reflective module achieves by identifying and correcting unsupported rationales. Extending the process to a third iteration, however, yields diminishing returns, with performance either plateauing on PubMedQA or slightly declining on MedQA."


Can Socratic reflection improve #AI answers to medical questions?

Adding a critic to a #languageModel pipeline improved performance on two measures of medical question-answering.

The improvement didn't depend on the critic's model.

doi.org/10.48550/arX...

#tech #medicine #edu

2 0 1 0
Evaluating Spanish Translations of Emergency Department Discharge Instructions by a Large Language Model: Tool Validation and Reliability Study When given a sample of 100 emergency department discharge instructions, Claude Sonnet, a large language model, produced accurate Spanish translations as evaluated by Spanish-speaking physicians and medical interpreters.

JMIR Formative Res: Evaluating Spanish Translations of Emergency Department Discharge Instructions by a Large Language Model: Tool Validation and Reliability Study #SpanishTranslations #EmergencyMedicine #HealthcareResearch #LanguageModel #MedicalInterpreting

0 0 0 0

A book about Claude Code: "実践Claude Code入門 | 技術評論社" (Practical Introduction to Claude Code, from Gijutsu-Hyoron) https://gihyo.jp/book/2026/978-4-297-15354-0 #LanguageModel #book

0 0 0 0

This week is another chance to equalize opportunity. To design for inclusion. To make sure no one is left behind by technology.

Welcome to the week. Let’s do today’s work with tomorrow in mind.

#EqualyzAI #NewWeek #MondayMotivation #LanguageModel

0 0 0 0
Large Language Model Evaluation in Traditional Chinese Medicine for Stroke: Quantitative Benchmarking Study Background: The application of large language models (LLMs) in medicine is rapidly advancing. However, evaluating LLM capabilities in specialized domains such as traditional Chinese medicine (TCM), which possesses a unique theoretical system and cognitive framework, remains a sizable challenge. Objective: This study aimed to provide an empirical evaluation of different LLM types in the specialized domain of TCM stroke. Methods: The Traditional Chinese Medicine-Stroke Evaluation Dataset (TCM-SED), a 203-question benchmark, was systematically constructed. The dataset includes 3 paradigms (short-answer questions, multiple-choice questions, and essay questions) and covers multiple knowledge dimensions, including diagnosis, pattern differentiation and treatment, herbal formulas, acupuncture, interpretation of classical texts, and patient communication. Gold standard answers were established through a multiexpert cross-validation and consensus process. The TCM-SED was subsequently used to comprehensively test 2 representative LLM models: GPT-4o (a leading international general-purpose model) and DeepSeek-R1 (a large model primarily trained on Chinese corpora). Results: The test results revealed a differentiation in model capabilities across cognitive levels. In objective sections emphasizing precise knowledge recall, DeepSeek-R1 comprehensively outperformed GPT-4o, achieving an accuracy lead of more than 17% in the multiple-choice section (96/137, 70.1% vs 72/137, 52.6%, respectively). Conversely, in the essay section, which tested knowledge integration and complex reasoning, GPT-4o’s performance notably surpassed that of DeepSeek-R1. For instance, in the interpretation of classical texts category, GPT-4o achieved a scoring rate of 90.5% (181/200), far exceeding DeepSeek-R1 (147/200, 73.5%). 
Conclusions: This empirical study demonstrates that Chinese-centric models have a substantial advantage in static knowledge tasks within the TCM domain, whereas leading general-purpose models exhibit stronger dynamic reasoning and content generation capabilities. The TCM-SED, developed as the benchmark for this study, serves as an effective quantitative tool for evaluating and selecting appropriate LLMs for TCM scenarios. It also offers a valuable data foundation and a new research direction for future model optimization and alignment.

JMIR Formative Res: Large Language Model Evaluation in Traditional Chinese Medicine for Stroke: Quantitative Benchmarking Study #TraditionalChineseMedicine #TCM #StrokeRecovery #LanguageModel #HealthcareInnovation

0 0 1 0
Post image

Where does the data that AI models are trained on come from, and why does it so often determine model quality?

Read our new article on datasets, transparency, and data ethics in AI - azurro.pl/skad-biora-s...

#innovation #ArtificialIntelligence #LLM #AI #languagemodel

1 0 0 0
The dumbest person you know is being told "you're absolutely right" by ChatGPT


Interesting how ChatGPT knows so much about things I know nothing about and is wrong about 70% of the time on topics I'm an expert in. #chatgpt #ai #artificialinteligence #googlegemini #microsoftcopilot #digitalera #chatbot #languagemodel

1 1 0 0

On how AI agents don't understand design systems: "Storybook Design Systems with Agents RFC · storybookjs/ds-mcp-experiment-reshaped · Discussion #1" github.com/storybookjs/ds-mcp-exper... #LanguageModel

1 0 0 0
LLM Training Data Services for Fine-Tuning & RLHF Boost your AI development with LLM training data services tailored for fine-tuning, RLHF, annotation, and RAG. Get high-quality, domain-specific datasets.

Fuel Your LLM with High-Quality Training Data

Scale smarter. Train faster. Perform better.

Learn more: shorturl.at/BJZIA

#LLM #DataServices #Data #MachineLearning #GenerativeAI #TrainingData #DataAnnotation #LanguageModel #NLP

1 0 0 0

An epub translation tool using LM Studio's API: "sumik5/llm-translate" https://github.com/sumik5/llm-translate/tree/main #translate #LanguageModel

0 0 0 0
Video thumbnail

It's not AI. It's a Language Model.
#ItsNotAI #LanguageModel #PrecisionTechnology #StopConfusion #RealEngineering

1 0 0 0
Preview
How to Run a RAG Powered Language Model on Android With the Help of MediaPipe A couple of months ago, I gave a talk about running on-device SLMs in apps. It was well received, and it was refreshing to be able to give a talk where mobile apps can gain an advantage from the rise in language...

How to Run a RAG Powered Language Model on Android With the Help of MediaPipe #Technology #EmergingTechnologies #ArtificialIntelligence #LanguageModel #MediaPipe #AIOnAndroid

1 0 0 0
Latent Thought Modeling Improves Data Efficiency in LM Pretraining


A 1B-parameter language model boosted data efficiency via latent-thought inference, gaining improvements after three EM cycles without an external teacher model. Read more: getnews.me/latent-thought-modeling-... #languagemodel #latentthought

0 0 0 0
DiDi‑Instruct Boosts Language Generation Speed by Up to 64×


DiDi‑Instruct speeds language generation up to 64× and reaches a perplexity of 62.2 with just eight NFEs. Training time drops about twenty‑fold versus standard fine‑tuning. getnews.me/didi-instruct-boosts-lan... #didiinstruct #languagemodel #ai

0 0 0 0
The graph with green domesticity score dots shows a rising trendline in the 19th century.


How “domestic” is a #Victorian novel?
Guhr et al. fine-tune a #LanguageModel to detect implicit domestic spaces – rooms, gardens, even #ships – beyond obvious keywords like 'house' or 'home.' – A new way to read #19th-century #fiction through the lens of #space and study the rise of #domesticity.

1 0 0 0
Backdoor Detection for Language Models Faces Robustness Challenges


A new EMNLP paper (Sept 2025) finds backdoor detection drops when training intensity is either aggressive or very low, exposing limits of current tools. Read more: getnews.me/backdoor-detection-for-l... #backdoor #languagemodel #security

0 0 0 0
Post image

Training and running LLMs can cost millions and require massive AI computing infrastructure. SLMs, on the other hand, require significantly less computational power, allowing them to be trained and fine-tuned on a single GPU. buff.ly/uNwzK7r

#AI #LanguageModel #Research

1 0 0 0
Mechanistic Study Reduces Language Confusion in English‑Focused LLMs


Researchers identified a handful of neurons causing language switches in English‑centric LLMs; editing them cut confusion points on the Language Confusion Benchmark. Read more: getnews.me/mechanistic-study-reduce... #languagemodel #neuraledits

0 0 0 0
Rethinking Linguistic Rules in AI Language Model Evaluation


A new paper urges moving past strict rule‑based tests, noting benchmarks like GLUE and SuperGLUE still favor binary grammaticality despite language’s gradient nature. Read more: getnews.me/rethinking-linguistic-ru... #languagemodel #evaluation

0 0 0 0
CreativeAct Technologies AI-powered solutions: Exploration into Emerging Technology Systems and real-world applications.

Creativeact.net

Try out our beta prompt enhancement agent today!

Input a prompt and receive a professional quality reusable template.

#llm #promptengineering #chatgpt #claude #ai #languagemodel

1 0 0 0
Post image

Can I get an opinion on Qwen AI? Is it good? I think it's the funniest and most exaggerated LLM out there, but is it reliable?
#qwen #ai #llm #languagemodel #question #artificialintelligence
#artificial
#chatgpt
#google

1 0 0 0

Copilot Code Review can now apply instructions scoped to specific files: "Copilot code review: Path-scoped custom instruction file support - GitHub Changelog" github.blog/changelog/2025-09-03-cop... #Github #LanguageModel

1 0 0 0
Post image

What are LLM benchmarks, and do they really tell you which model is "better"? 🤖
We explain:
– what they measure,
– which ones are the most popular,
– what limitations they have.

Visit our blog: azurro.pl/jak-porownac...

#innovation #HelloWorld #ArtificialIntelligence #largelanguagemodels #AI #languagemodel

0 1 0 0

A prompt markup language: "microsoft/poml: Prompt Orchestration Markup Language" https://github.com/microsoft/poml #LanguageModel #program

1 1 0 0
ChatGPT-5 from OpenAI: What the New AI Model Can Do Smarter, faster, clearer, and with minimal nonsense – that’s how the new language model ChatGPT-5 is being presented. It’s expected to soon replace the current lineup of models. Most importantly, it will be available even in the free version. Reportedly, it won’t just be for geeks and professionals – even casual users who’ve only clicked […] Post ChatGPT-5 from OpenAI: What the New AI Model Can Do at Root-Nation.com.

ChatGPT-5 from OpenAI: What the New AI Model Can Do #ChatGPT5 #OpenAI #ArtificialIntelligence #LanguageModel #AI

1 1 0 0
What Is the Language Model in AI – Fundamentals of AI
YouTube video by Juan Garcia Asensio

Fundamentals of #IA: What is the #LanguageModel in #InteligenciaArtificial youtube.com/shorts/9xUCF...

0 0 0 0
Post image

AI in sport is now an everyday reality. Match analysis, personalized training, injury detection, fan support – and that's only some of its applications. See how technology is changing sport before our eyes: azurro.pl/ai-w-sporcie...

#innovation #LLM #AI #languagemodel #ModelJęzykowy #technews #tech

2 1 0 0

Latin American nations to launch own AI model in September, entering global AI race.
A key goal is preserving #Indigenous #languages.
potatonews.com/ai-news/lati...
via @cybernews.bsky.social
#xl8 #latamgpt #ai #languagemodel #langsky

0 0 0 0