
Placeholders can protect personal data from AI chatbots.
Large language models (LLMs) form the basis of established AI systems. Many of them were trained on all available online data, and only some respect copyright law.
Now, a Swiss-based research team is focusing on mining only data that is publicly accessible and legally usable. Their LLM was fed 15 billion data units from more than 1,000 languages, around 60% of which came from English-language sources. Personal data was removed.
The model is available in a small and a large version, both released under an open-source licence that permits use in education and research as well as for societal or commercial purposes.
Training AIs using amplified laser light also saves a lot of energy.



