No existing corpus fits your niche research topic? Build your own! With seed words, the corpus theme can be anything – even Christmas. www.sketchengine.eu/guide/create...
#textdata #textcorpus
Very interesting work on extracting 'scalar constructs' from #TextData with #LLMs by @haukelicht.bsky.social and colleagues: arxiv.org/abs/2509.03116
Are you an R user tired of missing out on the LLM craze?
In my new tutorial, I show how to use OpenAI’s GPT and Google’s Gemini models to classify political texts, connecting to the APIs directly from R using reticulate.
alhdzsz.net/posts/llms_r...
#rstats #dataviz #ai #python #textdata
Shoutout to @ivelasq3.bsky.social and @posit.co for the opportunity to write a blog post about how I'm using `library(mall)` and integrating large language models into our energy security research! #textdata #LLM #energy #energysecurity #socialscience #datascience #NLProc
In this paper, we demonstrate that different #topics in #textdata exhibit varying degrees of #geospatiality, with some containing more #geographic mentions or #geotagged #locations than others.
Following our recent update for Lithuanian, we’re introducing the new Lithuanian Web corpus 2021! It's lemmatized, part-of-speech tagged, and classified by genres and topics.
#corpuslinguistics #digitalhumanities #textdata
www.sketchengine.eu/lttenten-lit...
1/2. 🖼️📝🤖 AI advances with clustering swap prediction for image-text pre-training, enhancing data efficiency and model performance. www.azoai.com/news/2024053... #AI #Innovation #Technology #MachineLearning #DataScience #Efficiency #ModelTraining #VisualData #TextData #Future
Sources reveal that OpenAI has explored training GPT-5 on public YouTube video transcripts. Additionally, experts suggest that the AI industry's demand for high-quality text data could surpass supply within the next two years. #OpenAI #GPT5 #YouTubeTranscripts #TextData #AIIndustry
Frequently cited QLR papers (2019):
1) Latent Dirichlet Allocation for #textdata
2) Test-retest reliability via intraclass correlation
3) #SysReview of impacts of patient reported outcome measures
#HRQoL