The phenomenon of "I can't not do this"... until "I can't do this" underpins so much of this work. #FaceInterface
Audience comment: Look at the foundations you're building it on. With software, realized it wasn't a stable foundation and had to rewrite a lot of code. A tool nearly collapsed because of macOS updates; it took multiple people 18 months to fix and nearly crippled a whole corner of the ecosystem.
Audience comment: so many projects are passions and hobbies, 80%+ donated time. Started thinking about business models and revenue streams from day 1, instead of making it and open-sourcing it and realizing I couldn't maintain it any longer. #FaceInterface
@tsmullaney.bsky.social Little taxes here and there (SFO landing fees, ICANN fees) enable people to get things done. This is how ICANN can run weeklong workshops to sort out non-Latin script domains, or curated exhibits at SFO airport. No one's going to argue about a $.99 surcharge on a plane ticket.
Marc Weber: the question of freezing language in amber -- the ship sailed with writing. If you don't update to the current changes in writing, you're doing harm. It's a much more weighty decision to orphan the languages. Necessary but not sufficient to encode languages. #FaceInterface
Hrant Papazian: "Preservation is for museums. Young people don't like to go to museums because they don't have a past. Young people want to make a future, and the future is the only way out. We have to build the future and get tools for the future. Young women matter more for Nüshu." #FaceInterface
@tsmullaney.bsky.social "Open source" is the great-great-grandchild of this romantic idea, and there are communities that don't buy it. How does one confront this conflict between "shared heritage" and linguistic/cultural ownership? "Uni-" has a history; are we structurally repeating it? #FaceInterface
@tsmullaney.bsky.social Notion of shared human heritage is a free market concept, an open sound-stage of existence. Romantic vision of a shared destiny, beauty, etc. But when that vision was literally the guiding principle of marching gunboats to remove an obstacle... #FaceInterface
Kamal Mansour: "What we're speaking about here is enabling digital writing, which is not the same as language. By representing the writing of particular languages in Unicode, enable them to create digital patrimony if they choose. It doesn't mean that's all they create." #FaceInterface
Audience comment: "I'm constantly running into how Unicode screws up my world. It doesn't take into consideration the full things necessary for the expression of the language. Nothing about the rules of typography and representation in layout. When fonts disagree you have chaos." #FaceInterface
Anushah Hossain: Holding up Unicode inclusion or digitization as the ultimate goal... is that too simplistic? What's the goal we're trying to achieve?
Peter Constable: "We should be enablers of local choice. And yes, that means that some languages will die. Make sure local communities know they have choices, choices are there, and we're there to support when they choose a path where we can help." #FaceInterface
Discussion question: Are we preserving language or enabling communication? Are we creating a Tower of Babel? Are we ossifying language through the unchanging nature of Unicode? #FaceInterface
@tsmullaney.bsky.social "Do you want to see the world saved, or do you want to be the one who saved the world?" In a structural way, this leads to fragmentation and silo-ization. Ego plays a part. What organization says "let's merge, and I'll take your name"? #FaceInterface
@tsmullaney.bsky.social wrapping up #FaceInterface. This is the third iteration of the conference: urge to create "compostable" organizations, whose work can obviate the need for them to exist. Can get drunk on the legacy of it, never really want to fully solve the problem, want people to come back.
Related to LoCoS is a preceding and influential pictographic writing system, Blissymbolics: letterformarchive.org/news/blissym... #FaceInterface
Image from the Face / Interface conference of Dr. Tawfik Jelassi, UNESCO Assistant Director-General for Communication and Information
"Linguistic diversity is not just a technical challenge to be solved; it is a cultural treasure to be cherished" -- Dr. Tawfik Jelassi, UNESCO Assistant Director-General for Communication and Information #FaceInterface #langsky
@tsmullaney.bsky.social at #FaceInterface "Privilege is the ability to move onto cooler problems", in the context of digitally-disadvantaged languages. For English we can worry about VR/AR, color fonts, etc. Most languages still need basic OCR/HTR/NLP. #MultilingualDH
Embroidered cover
Close up of embroidered cover
Nüshu script
Nüshu script
Eason Lu passed around a souvenir example of Nüshu script / 3rd day letter (with some embroidery on the cover!) #DHmakes #FaceInterface #MultilingualDH
dolma
Sina Ahmadi closes out the #FaceInterface slides the best way: "these are some dolma my wife and I made."
Sina Ahmadi: Goal is to create a machine translation system. Limited amount of data, so fine-tuning existing models. Meta's No Language Left Behind model covers 200 languages. Super-low BLEU score for these languages with NLLB, fine-tuning had big improvements. #FaceInterface
Sina Ahmadi: Hawrami had almost 10 people contributing. 46 hours of speech data collected using DOLMA speech bot. Used same multilingual corpus, asked people to select language and read sentences. 28k utterances! #FaceInterface
Sina Ahmadi: Gave volunteers a set of sentences in a highly resourced language they know and in English. Community-driven multilingual parallel corpus, > 50,000 sentences total. Previously some of the languages only had 100 sentences online. All sentences aligned with English. #FaceInterface
Sina Ahmadi: Some skepticism, "adding fuel to cultural hegemony of Turkish language". (Someone on Reddit was mad because of dolma reference.) Dolma is a food, but thought it'd be a good name because it'd be outside of politics. It's an acronym! Nothing to do with Turkish! #FaceInterface
Sina Ahmadi: Vision was community building, data collection, NLP development, scientific dissemination, sustainability & impact -- in that order. Intensive outreach campaign last fall, publishers, language experts, academics, native speakers. 30 highly active volunteers. #FaceInterface
NLP support
languages
Last but not least, SILICON Practitioner Sina Ahmadi on "Dolma-NLP: Developing Language Technologies for Middle Eastern Languages". Middle East is a linguistically complex place, rich diversity of languages. Only a few are officially recognized. Goal is sustainable lang tech community #FaceInterface
Loïc Marleix: As Ota said at the end of his book, the work cannot be done alone, and it's up to us to continue.
More on Nüshu from Lisa Huang's 2021 Letterform Lecture: letterformarchive.org/events/view/...
One thing that makes @stanfordsilicon.bsky.social's #FaceInterface a unique conference: despite hailing from many different disciplines (engineering, design, linguistics), almost everyone here knows by heart the difference between "character" & "glyph".
(Okay, maybe also the Unicode Conference.)
Loïc Marleix: Next week, symbol maker will be available. Database will be available, open-source, no gatekeeping here. Website is just the starting point; Ota's goals 15 years ago are still valid now. Wanted Ota to see that people are still interested. #FaceInterface