#llm

124 posts · 110 participants · 14 posts today

Call for participation: *ClimateCheck* Shared Task (sdproc.org/2025/climatecheck.h)

@NFDI4DS members Raia Abu Ahmad, Aida Usmanova and Georg Rehm are organizing the shared task “Scientific Fact-Checking of Social Media Claims on Climate Change (ClimateCheck)”, to be held on July 31 or August 1, 2025 in Vienna, Austria, as part of the SDP 2025 Workshop.

Deadline for system submissions: May 16, 2025

#NLP
#LLM
#climatecheck
#socialmediaposts
#climatechange
#misinformation
#SDP2025
#ACL2025
#Vienna
#NFDI4DS

sdproc.org · 5th Workshop on Scholarly Document Processing

I wonder if..

..tagging online (and, for that matter, offline) content to indicate support for '#woke' ideals, you know, evil things like equality, anti-fascism, diversity, abortion access and especially #trans rights..

..would help in various ways, other than just the declaration of support. Such as:

- self-censorship by bigots and their lackeys, depriving them of access to the value we create
- reduced scraping and appropriation (e.g. for #LLM)
- making it harder to spot and censor #wokeness
etc.

#GenerativeAI’s “black box” nature has brought quality assurance into sharp focus. From evals and benchmarking to guardrails, these approaches all play a crucial role in improving the reliability and accuracy of this technology.

Here are our top picks for understanding #LLM evals (a minimal eval sketch follows the links):
Large language model evaluation: A key to GenAI success: ter.li/cn7pbk

LLM benchmarks, evals and tests: A mental model: ter.li/juy34e

AI testing, benchmarks and evals: ter.li/smjohc
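
To make "evals" concrete: at its simplest, an eval is a fixed set of prompts with expected answers, scored automatically against a model's replies. Here is a minimal sketch; the names (EVAL_SET, run_eval, the call_model placeholder) are illustrative and not taken from the linked articles.

```python
# Minimal eval-harness sketch: run a fixed question set through a model
# and report accuracy. call_model() is a placeholder for whatever LLM
# client you actually use.
from typing import Callable

EVAL_SET = [
    {"prompt": "What is the capital of Ireland?", "expected": "dublin"},
    {"prompt": "What is 12 * 12?", "expected": "144"},
]

def run_eval(call_model: Callable[[str], str]) -> float:
    """Return the fraction of prompts whose reply contains the expected string."""
    correct = 0
    for case in EVAL_SET:
        reply = call_model(case["prompt"]).lower()
        if case["expected"] in reply:
            correct += 1
    return correct / len(EVAL_SET)

# Dummy "model" so the script runs end to end; swap in a real client call.
print(run_eval(lambda prompt: "Dublin is the capital, and 12 * 12 = 144."))
```

Real eval suites differ mostly in scale and scoring (exact match, model-graded rubrics, regression thresholds), but the shape is the same.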

Search Engine Journal: AI Researchers Warn: Hallucinations Persist In Leading AI Models. “Despite billions in research investment, AI factuality remains largely unsolved. According to the report, even the most advanced models from OpenAI and Anthropic ‘correctly answered less than half of the questions’ on new benchmarks like SimpleQA, a collection of straightforward questions.”

https://rbfirehose.com/2025/04/01/ai-researchers-warn-hallucinations-persist-in-leading-ai-models-search-engine-journal/

No, #AI frontier models don't "just guess words"; it's far more complicated than that.

#Anthropic built an #LLM "brain scanner" (so far AIs have been black boxes).

According to Anthropic, "it currently takes a few hours of human effort to understand the circuits we see, even on prompts with only tens of words." And the research doesn't explain how the structures inside LLMs are formed in the first place.

This is a nice write-up that matches my stance on #AI as well: sgnt.ai/p/hell-out-of-llms/. I have nothing against #machinelearning, but a general-purpose #llm can't be great at everything. That's the problem with #IT in general: everyone is expected to be great at everything (#devops), and that will never be the case; #software #craftsmanship is suffering.

sgnt.ai · Get the hell out of the LLM as soon as possible: Don’t let an LLM make decisions or implement business logic: they suck at that.
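
In the spirit of the linked post, here is a hypothetical sketch of the pattern it argues for: the LLM only turns free text into structured data, while the actual decision lives in plain, testable code. The names (RefundRequest, extract_request, decide_refund) are illustrative and not from the article.

```python
# Sketch of "keep business logic out of the LLM": the model's only job is to
# return structured fields as JSON; the decision itself is ordinary Python.
from dataclasses import dataclass
import json

@dataclass
class RefundRequest:
    order_total: float
    days_since_purchase: int
    item_damaged: bool

def extract_request(llm_reply: str) -> RefundRequest:
    """Parse the JSON the model was asked to produce; the model decides nothing."""
    data = json.loads(llm_reply)
    return RefundRequest(
        order_total=float(data["order_total"]),
        days_since_purchase=int(data["days_since_purchase"]),
        item_damaged=bool(data["item_damaged"]),
    )

def decide_refund(req: RefundRequest) -> bool:
    """Deterministic business rule: easy to unit-test, audit and change."""
    return req.item_damaged or req.days_since_purchase <= 30

# The only LLM prompt would be "extract these three fields from the customer's
# email as JSON"; everything downstream is regular code.
reply = '{"order_total": 59.99, "days_since_purchase": 12, "item_damaged": false}'
print(decide_refund(extract_request(reply)))  # True: within the 30-day window
```

Because decide_refund is just code, it can be tested and changed without touching any prompts, which is exactly the boundary the post recommends.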

You know, we invented systems before there were computers.
'Forms' were on paper, rather than on screens.
An 'in tray' was an actual metal-wire or wooden tray for paper letters, notes, memos and forms.
A database was called a 'filing cabinet'.
An 'interface' was a mail box.
A 'front end' was a person, with a job title like administrator or clerk.
These systems were described, in excruciating detail, in procedure manuals.
The processes were run not by CPUs, but by people.
'Bugs' were when people made mistakes.

Systems were difficult to understand, even harder to diagnose, and very very hard to fix or change.
Changing the way a department worked, accounts receivable for example, was so hard that most companies never even tried.

And yet somehow people are under the impression that it is the code that is the difficult bit about modern business systems.
So they try to make the code part easier.
#LowCode #LoCode #NoCode #AI #GenAI #LLM

It was never the code. Code was never the bottleneck.

raganwald.com/2012/01/08/duck-

raganwald.com · Duck Programming

"If I’m 4 years old and my partner is 3x my age – how old is my partner when I’m 20?"
Do you know the answer?

🤥 An older Llama model (by Meta) said 23.
🤓 A newer Llama model said 28 – correct (the age gap is fixed at 8 years: the partner is 12 when you're 4, so 28 when you're 20).

So what made the difference?

Today I kicked off the 5-day Kaggle Generative AI Challenge.
Day 1: Fundamentals of LLMs, prompt engineering & more.

Three highlights from the session:
☕ Chain-of-Thought Prompting
→ Models that "think" step by step tend to produce more accurate answers. Sounds simple – but just look at the screenshots...

☕ Parameters like temperature and top_p
→ Try this on together.ai: Prompt a model with “Suggest 5 colors” – once with temperature 0 and once with 2.
Notice the difference?

☕ Zero-shot, One-shot, Few-shot prompting
→ The more examples you provide, the better the model understands what you want (a quick sketch of the temperature and few-shot experiments follows below).
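
For anyone who wants to try the temperature experiment and a quick few-shot prompt outside the course notebooks, here is a rough sketch. It assumes an OpenAI-compatible chat endpoint (together.ai exposes one); the base URL, API key and model name are placeholders to adjust.

```python
# Sketch only: assumes an OpenAI-compatible chat API; base_url, api_key and
# MODEL are placeholders, not a tested configuration.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.together.xyz/v1",  # any OpenAI-compatible host works
    api_key="YOUR_API_KEY",
)
MODEL = "meta-llama/Llama-3-8b-chat-hf"  # placeholder model name

def ask(prompt: str, temperature: float = 0.0) -> str:
    """Single-turn prompt; returns the model's reply text."""
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
    )
    return resp.choices[0].message.content

# Temperature experiment from the post: same prompt, two settings.
print(ask("Suggest 5 colors", temperature=0))  # near-deterministic picks
print(ask("Suggest 5 colors", temperature=2))  # far more varied, sometimes odd

# Few-shot plus a chain-of-thought cue: the worked example shows the format,
# and "think step by step" nudges the model to reason before answering.
few_shot = """Q: I'm 4 and my partner is 3x my age. How old is my partner when I'm 20?
A: The partner is 12 now, so 8 years older. 20 + 8 = 28.

Q: I'm 6 and my sibling is 2x my age. How old is my sibling when I'm 30?
A: Let's think step by step."""
print(ask(few_shot))
```

At temperature 0 the two colour runs should come back nearly identical; at 2 the output usually gets noticeably more erratic, which is the point of the exercise.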