On the Importance of Pretraining Data Alignment for Atomic Property Prediction
Yasir M. Ghunaim, Hasan Abed Al Kader Hammoud, Bernard Ghanem
Action editor: Changyou Chen
https://openreview.net/forum?id=jfD9BsrDTb
#dataset #datasets #inception
But large #datasets bring challenges:
• Bias in digital data sources
• Measurement validity issues
• Risks of overfitting models
Therefore, validation and replication are essential in CSS research.
Executive summary of the report on Spanish datasets in Zenodo
The report on #datasets from Spanish universities in #Zenodo, with data from December 2025, is now published. More datasets, but a lower level of description. We must not let our guard down. University libraries ought to do something about it. www.javima.info/ciencia-abie...
#CienciaAbierta
👀 📣 To all users of eye-tracking-while-reading datasets: check out our comprehensive, filterable dataset overview!
Dataset overview: dili-lab.github.io/datasets.html
Preprint: arxiv.org/abs/2602.19598
Add or edit your dataset: www.cl.uzh.ch/en/research-...
#FAIR #eyetracking #datasets
"By analyzing massive #datasets .. #researchers uncovered networks involving “paper mills,” brokers, and compromised journals that systematically produce and sell fake #research, authorship slots, and #citations.": buff.ly/YJ4bqBU
via sciencedaily
#science #MedSky #research #ResearchJournals
Enter 100% verified active #AustriaWhatsApp #numberdata from trusted #WhatsAppDatabase companies. These premium #datasets offer a #gamechanging solution for #telemarketing and direct call marketing #campaigns, delivering unmatched accuracy and ROI.
buywhatsappdatabase247.blogspot.com/2026/03/aust...
The scryptIQ #machinelearning module covers both supervised and unsupervised learning methods: namely the classification and clustering of different #biological #datasets, including images.
scryptiq.ai
Science is more than papers
153M+ research outputs in the #OpenAIREGraph are linked to #datasets & #software
A growing web of connections allowing us to see how knowledge is built across publications, data & code, not just the final paper.
Explore connections
🔗 #GraphAPI shorturl.at/oRotk
🔗 #OpenAIRE EXPLORE shorturl.at/RIZoh
New #J2C Certification:
Probabilistic Pretraining for Improved Neural Regression
Boris N. Oreshkin, Shiv Kumar Tavker, Dmitry Efimov
https://openreview.net/forum?id=F6BTATGXaf
#datasets #tabpfn #regression
BGS' BritPits map shows the distribution of worked mineral commodities across the UK - tinyurl.com/5ydmtaf6
#Aspermont #BritishGeologicalSurvey #BritPits #MineralResources #MineralPlanningAuthority #Geology #Datasets
"From Reflection to Repair: A Scoping Review of Dataset Documentation Tools" (new preprint via arXiv) arxiv.org/abs/2602.15968 #data #datasets #rdm
Discussing AI in the sphere of geological modelling with respect to the tunnelling industry - tinyurl.com/54bxc7bs
#Aspermont #COWIfonden #UniversityofStrathclyde #TechnicalUniversityofDenmark #COWI #AI #Tunnelling #GroundInvestigation #DataSets #GeologicalModelling
How can AI classify multilingual research datasets?
doi.org/10.1108/EL-0...
Why read? It shows a practical pipeline using a fine-tuned Qwen2 to assign CLC codes to multilingual datasets.
Next step: More detailed cross-language evaluation (authors).
#ShortReview #AI #LLM #Classification #Datasets
Industry holds some of the richest #ocean #datasets — yet only 3% reach global #biodiversity repositories (Tides of Transparency, 2024).
📺 Ocean Literacy Webinar 2
🗓️ 17 March 2026 | Online
Register now on our website! 🔗 tinyurl.com/3993rj9t
#agentarium
#intelligence_module
#cognitive_infrastructure
#vdb
#ai
#data
#datasets
#agenticai
#rag
#graphrag
Occam’s Razor for SSL: Memory-Efficient Parametric Instance Discrimination
Eric Gan, Patrik Reizinger, Alice Bizeul et al.
Action editor: Georgios Leontidis
https://openreview.net/forum?id=GFNTbsVFlP
#supervised #regularization #datasets
1) Do #datasets have #DOIs? How are #data cited?
"At Pensoft we can do it in 2 ways: authors can cite both Data Papers and/or #Dataset. We recommend to cite both, and this is in our opinion the right way to do that" - Prof. Penev.
#lovedata26
@lovedataweek.bsky.social
AllenAI Introduces #AutoDiscovery: Automated Scientific Discovery Now Available in Asta Labs allenai.org/blog/autodis... #AI #datasets #data @ai2.bsky.social #research
List of Ethical Requirements for the study "Co-Design of a Trustworthy AI-based Prognostic Tool for Predicting Patient Outcome in Acute Stroke" zenodo.org/records/1848... #hvhebron #datasets #neuro
I pre-registered our #qualitative #datasets: Jeanrenaud et al. (in review): Digitalisierung als Chance für Frauen in MINT (digiMINT) doi.pangaea.de/10.1594/PANG...
Webinars | This morning, we hosted a webinar on the HiQLCD database version 1.4.0, together with the HiQLCD team.
If you missed it, the recording is available on our YouTube channel: www.youtube.com/watch?v=73jg...
#openLCA #webinar #database #chinese #update #datasets #lifecycleassessment
Real Estate Data Explained - Examples, Datasets & Top Providers
The real estate market is expected to grow at an annual rate of 2.69% (CAGR 2025-2029).
Learn about real estate data and top providers: www.hitechbpo.com/blog/real-es...
#realestatedata #datasets #realestatedataprovider
Do you want to create synthetic datasets for your AI projects? Try Creative Dataset Maker. It lets you use OpenAI LLMs to generate realistic synthetic datasets.
#AI #datasets #LLM
zerooneeta.gumroad.com/l/hhlwt
Resilience in Times of Crisis: Strengthening Open Science Against Geopolitical Pressures (via @leidenmadtrics.bsky.social) www.leidenmadtrics.nl/articles/res... #openscience #datasets @datarescueproject.org
On a Related Note...
#Guidelines and Best Practices For Making Government #Datasets Ready For #AI (via Gov of UK) www.gov.uk/government/p... #bestpractices #data
ICYMI UPDATE on the UK Government Effort to Create a National #Data Library www.gov.uk/government/p... #datasets
SPONGE: Competing Sparse Language Representations for Effective Knowledge Transfer
Jens-Michalis Papaioannou, Alexei Figueroa, Conor Fallon et al.
Action editor: Changjian Shui
https://openreview.net/forum?id=OevFdPgk3h
#nlp #annotated #datasets
FetchSeries - www.fetchseries.com
Freely Downloadable Data Sets Updated Daily
#Datasets #Commodities #Transport #EnvironmentAndClimate #Finance #Macroeconomics