Do you think Internet SEARCH has gone sucky-sucky-so-so? Can you imagine a better experience? Do you have some coding (dis)ability, perhaps even friends-with-similar-benefits?
Then you -- yes, you -- might be interested in a project a bunch of European research institutions have been working on for the past two years, and now -- June 6 -- have released to the public.
The project -- imaginatively named the Open Web Search Initiative -- offers all the elements of a modern-day search engine in convenient open source packages, along with 6.61 billion URLs, 923 TiB in total, and 1 TiB of daily crawled data. The only thing left for you to do is download a partial index of all that data to your own server(s) and develop your own custom software on top of it. Then ...
- Off to some VC millions
- ???
- Internet billions!!!
Please do return a percent of your revenue to this site though -- those private massages for the editor do not come cheaply, you know -- and an additional percent for this sub's author. Thank you's!
(Postscript: in case you're looking for funding as an open source developer; also, there's a free event on June 19-20 in Brussels.)
(Score: 0, Insightful) by DadaDoofy on Wednesday June 11, @11:51AM (16 children)
"Free, open and unbiased access to information"
Just who do they think they are fooling? This is a product of the very same EU that jails people for posting internet memes. What could possibly go wrong?
(Score: 4, Touché) by epitaxial on Wednesday June 11, @01:33PM (4 children)
You seem to be confused about the first amendment and geography.
(Score: 1, Touché) by Anonymous Coward on Wednesday June 11, @04:35PM (3 children)
Maybe you are the one that's confused... The EU has no First Amendment to protect real free speech rights. And no, the UN charter is no substitute; it doesn't even come close.
So, if this thing becomes sufficiently popular, it will be heavily regulated.
(Score: 1, Insightful) by Anonymous Coward on Wednesday June 11, @08:00PM
Neither does the US, when the constitution is not even upheld by those in power, especially the head cheetos.
(Score: 3, Informative) by pe1rxq on Wednesday June 11, @09:50PM (1 child)
Not a UN charter, but the "European Convention on Human Rights".
It covers freedom of expression. Not exactly the same as "free speech", but pretty damn close. A big difference is that it explicitly mentions limitations, while Americans live with a fantasy of 'absolute rights'.
Article 10 – Freedom of expression
Everyone has the right to freedom of expression. This right shall include freedom to hold opinions and to receive and impart information and ideas without interference by public authority and regardless of frontiers. This article shall not prevent States from requiring the licensing of broadcasting, television or cinema enterprises.
The exercise of these freedoms, since it carries with it duties and responsibilities, may be subject to such formalities, conditions, restrictions or penalties as are prescribed by law and are necessary in a democratic society, in the interests of national security, territorial integrity or public safety, for the prevention of disorder or crime, for the protection of health or morals, for the protection of the reputation or rights of others, for preventing the disclosure of information received in confidence, or for maintaining the authority and impartiality of the judiciary.
(Score: 0) by Anonymous Coward on Sunday June 15, @04:09AM
Exactly. "Congress shall make no law..." without explicitly mentioning those limitations is pretty absolute. To limit speech in the US, the Constitution must be amended to explicitly include those limitations. Our problem is the corrupt judges who have decided to infer limitations that are not explicitly written into the law, which is a violation of the law as explicitly written. It's all pretty straight up.
(Score: 5, Insightful) by JoeMerchant on Wednesday June 11, @02:46PM (4 children)
Everything is biased.
I prefer my bias to be toward verifiable and/or reproducible, fact-oriented information, but that's just my bias.
🌻🌻🌻 [google.com]
(Score: 0, Troll) by DadaDoofy on Wednesday June 11, @03:26PM (3 children)
I prefer to impose my own biases on an uncensored flow of information rather than be subject to the biases of authoritarian censors in Brussels. To each his own.
(Score: 3, Insightful) by JoeMerchant on Wednesday June 11, @04:07PM
>an uncensored flow of information
Censored or uncensored, every flow is biased.
(Score: 5, Insightful) by Thexalon on Wednesday June 11, @04:08PM
Oh, I see. Your problem is one or both of the following:
A. You don't know what words mean. The top authorities in Brussels were either elected directly by EU citizens or appointed by governments that were elected by EU citizens. Everybody else can be overridden. Ergo, they aren't authoritarian in the way that, say, Vladimir Putin, Kim Jong Un, or Xi Jinping are.
B. You are in favor of stuff that has gotten people into trouble with the EU government. Since the only speech that has ever gotten people into trouble with the EU government is explicit bigotry, particularly espousing the beliefs of the Nazi Party, I must conclude that you would like more bigotry.
"Think of how stupid the average person is. Then realize half of 'em are stupider than that." - George Carlin
(Score: 4, Touché) by epitaxial on Wednesday June 11, @04:22PM
Recycled Brexit talking points are funny.
(Score: 4, Interesting) by gnuman on Wednesday June 11, @08:43PM (5 children)
The EU has the Convention on Human Rights, which, coincidentally, is more expansive than your idiotic "First Amendment":
https://www.coe.int/en/web/human-rights-convention/our-rights [coe.int]
You see, in the EU, corporations don't have a right to unlimited political spending... Furthermore, individual states can have restrictions on individual speech that do not violate these charters. You are *NOT* free to shit on minorities and incite hatred. Sadly, the lessons of this don't seem to have been learned here, and definitely not in current fascist America.
Please, go fuck yourself with your fake news. Thank you. -- Reality
There isn't even such a thing as "EU jails". You might as well start bitching about UN jails here. No such thing.
(Score: 4, Informative) by linuxrocks123 on Wednesday June 11, @11:08PM (4 children)
No, it's not. It's drastically less expansive. See this case about a man convicted in the UK for silent, non-disruptive, expressive religious conduct. Any attempt to prosecute someone in the US for something like that would be laughed out of court.
https://reason.com/2024/10/17/british-man-convicted-of-criminal-charges-for-praying-silently-near-abortion-clinic/ [reason.com]
Not fake. Really happened. Dude silently prayed to himself on public property and got arrested and convicted for wrongthink. You guys really don't know how to do free speech right.
The specific governmental entity that runs the jails in your part of the world is a semantic point irrelevant to the topic of whether you can unfairly be put in one for saying things other people don't like. Guess you're focusing on irrelevant bullshit since you don't have any good counterpoints.
(Score: 4, Informative) by quietus on Thursday June 12, @05:47PM
That man was not arrested for praying, but because he breached a safe zone around an abortion clinic, and refused to move after an hour and 40 minutes [bbc.com] of polite requests to please move away.
(Score: 3, Informative) by gnuman on Friday June 13, @11:57AM (2 children)
From your very link:
Trespassing.
You can set up your free speech zone in the middle of Walmart or its parking lot too, but that will get you arrested for the same reason. And your argument about freedom of speech would be laughed out of court too.
(Score: 2) by linuxrocks123 on Saturday June 14, @04:23PM (1 child)
Nope. Not trespassing. The so-called "buffer zone" is public property and does not belong to the clinic. A law purporting to ban silent prayer on public property would get laughed at by the judge and declared unconstitutional in any court in the US.
(Score: 3, Informative) by gnuman on Sunday June 15, @10:09AM
https://www.euronews.com/green/2023/04/25/berlin-activists-glue-themselves-to-roads-causing-massive-disruption-across-the-city [euronews.com]
Look at that... silent protest too. Not even a word uttered. Yet still arrested? Heck, some end up with jail sentences too. I'm sure you'd say these things would be completely legal in the US, right?
You should actually read the history of what happens in America instead of believing imaginary bullshit about what happens in America.
https://en.wikipedia.org/wiki/Free_speech_zone [wikipedia.org]
(Score: 2, Interesting) by Anonymous Coward on Wednesday June 11, @04:43PM (4 children)
How does this compare to YaCy [yacy.net]?
(Score: 5, Informative) by Unixnut on Wednesday June 11, @05:20PM (2 children)
They seem completely different.
YaCy is:
Open Web Search Initiative is (based on what I've gleaned from searching on their website):
Also, so far I can't find any way to actually test their search engine. I guess being an "initiative" means nothing has been built yet. The website is typical of the EU (pages and pages of waffle, with very little useful content, and very little achieved so far except funding rounds and symposiums).
To me, based on what I've read so far, this "Open Web initiative" is inferior to the current centralised search engines available. Perhaps in future that will change depending on the direction the initiative goes and whether they actually get something up and running that can be tested.
As things stand, if you can't or don't want to use YaCy, you are probably better off using one of the US-based search engines, IMO. Having said that, the way the web is going, I really should put some effort into setting up my own YaCy node; that may well be the last chance at a privacy-protecting, uncensored search engine.
(Score: 3, Informative) by quietus on Thursday June 12, @05:09PM
Here's their GitLab repository [it4i.eu]. They use the MIT license.
(Score: 3, Insightful) by Lester on Saturday June 14, @03:52AM
I can't understand how your comment is rated informative. It is plain wrong.
First, it is not a search engine; it's a crawling engine and a public index database of crawling results.
Second, everything is open source.
Everyone will be able to offer a search service using that database (even Google).
On the other hand, you are right. There is nothing yet: good intentions, but little done.
(Score: 2) by quietus on Thursday June 12, @05:32PM
This initiative is also aimed at decentralized web indexing, but they're not aiming to build a search engine, only to provide the infrastructure on which others -- for free -- can build (prototype) alternatives to the existing search engine landscape. Their ultimate aim -- under the umbrella of the Open Search Foundation [opensearchfoundation.org] -- is for search to become a public (i.e. non-commercial) utility. From their FAQ:
On a side note (albeit an important one), it seems that the EU -- after the privacy pillar -- is looking towards open source as an alternative model vis-a-vis reliance on Big Tech.
(Score: 4, Insightful) by ShovelOperator1 on Wednesday June 11, @06:24PM (1 child)
Another "carrot on the string" initiative; maybe they will even try to tempt some free software advocates into working on this. The problem is that Google, while it has done its share to make the current Internet what it is (mostly by prioritizing commercial initiatives and amplifying artificial scarcity), is in no way the culprit here. It's rather an indicator of the situation, and even the best search engine will not find something that isn't there. It's not there because websites with useful, practical knowledge are mostly dead, and the modern Internet is focused on "content": passive things to stuff ads into. Language models are good at generating "content", because "content" doesn't have to be useful: it should be entertaining, and it may even be totally fake, just as LMs provide fake info when they cannot get to the real thing.
So sorry, but as in that Monty Python scene, the parrot is dead, and no matter how many times we say it's resting, it will stay dead, because the Internet is now a playground of corporate-sponsored, ad-sharing bots instead of communities.
(Score: 2) by quietus on Thursday June 12, @05:12PM
If you won't even try to imagine a better Internet, then, yes, the Internet is dead as a dodo. What are you doing here, anyway?
(Score: 2) by isj on Thursday June 12, @05:31PM (3 children)
The web page shows that they already build an index from the crawled data. I wonder if they take care of all the special cases with errors, e.g. incorrect diacritics.
Years ago I worked on the Findx search engine and encountered a lot of weird corner cases, e.g. digraphs versus ligatures, the many forms of superscript/subscript, etc. I also hope they do language detection well, because there are lots of erroneous but useful web pages.
I have some of my blog entries from back then at http://i1.dk/privacore_findx_blog/ [i1.dk]
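The corner cases above (ligatures, superscripts and subscripts, stray diacritics) are what Unicode compatibility normalization and case folding are designed to flatten. The sketch below is illustrative only: it assumes Python's standard `unicodedata` module and an index that stores terms in NFKC-normalized, case-folded form, and `fold_term` is a hypothetical helper, not code from Findx or the Open Web Index.

```python
import unicodedata


def fold_term(term: str, strip_marks: bool = False) -> str:
    """Normalize a token before indexing it (hypothetical helper).

    NFKC replaces compatibility characters with plain equivalents: the
    ligature 'ﬁ' (U+FB01) becomes 'fi', the Dutch 'ĳ' (U+0133) becomes 'ij',
    and superscript/subscript digits such as '²' and '₂' become '2'.
    casefold() then lowercases aggressively, e.g. German 'ß' -> 'ss'.
    """
    text = unicodedata.normalize("NFKC", term).casefold()
    if strip_marks:
        # Optional, lossy step: decompose, then drop combining marks,
        # so 'café' and 'cafe' index to the same term.
        decomposed = unicodedata.normalize("NFD", text)
        text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Case folding can leave text that is not normalized, so normalize again.
    return unicodedata.normalize("NFKC", text)


for word in ["ﬁnd", "ĳs", "H₂O", "m²", "Straße"]:
    print(word, "->", fold_term(word))
print(fold_term("Café", strip_marks=True))  # cafe
```

Stripping marks is left optional because it is lossy: it helps with pages that get diacritics wrong, but it also merges words that differ only in their accents. Neither step can decide language-dependent questions, such as whether a two-letter sequence is a digraph.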
(Score: 2) by quietus on Thursday June 12, @05:41PM (2 children)
Interesting, thanks. From a look at the Open Web Index datasets [openwebindex.eu], it looks like they have language- or country-specific crawls. One supposes that you'd need to be aware of ligatures in the German language.
(Score: 2) by isj on Thursday June 12, @05:56PM (1 child)
Ligatures exist in many typographies. Whether they are treated as ligatures or as digraphs depends on the language/orthography.
One particularly nasty one is the Dutch ij. I seem to remember that it was once treated as its own letter, but there was no space to put it on modern typewriters, etc. And now you end up with ij (two letters) and ĳ (ligature), and sometimes ÿ when OCR-scanned.
(Score: 2) by hendrikboom on Tuesday June 17, @02:23AM
And when ij occurs at the start of a sentence, both the I and the J get capitalized. But it is possible for an i to be followed by a j without the pair of them being the compound letter ij. In this case they would not both get capitalized.
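The capitalization rule above can be sketched as a small helper. This is a heuristic under stated assumptions: the function name `capitalize_dutch` and the `not_digraph` exception set are hypothetical, and a real implementation would need a dictionary to tell the digraph ij apart from an i that merely happens to precede a j.

```python
def capitalize_dutch(word: str, not_digraph: frozenset = frozenset()) -> str:
    """Capitalize the first letter of a Dutch word (heuristic sketch).

    A word starting with the digraph 'ij' gets both letters capitalized,
    e.g. 'ijsbeer' -> 'IJsbeer'. `not_digraph` is a hypothetical,
    caller-supplied set of lowercase words whose leading 'i' + 'j' is NOT
    the digraph; for those only the 'i' is capitalized.
    """
    if not word:
        return word
    if word[:2].lower() == "ij" and word.lower() not in not_digraph:
        return "IJ" + word[2:]
    # Ordinary first letters, and the single-code-point ligature 'ĳ' (U+0133),
    # whose uppercase form is 'Ĳ' (U+0132), only need str.upper() on one character.
    return word[0].upper() + word[1:]


print(capitalize_dutch("ijsbeer"))    # IJsbeer
print(capitalize_dutch("ĳsbeer"))     # Ĳsbeer
print(capitalize_dutch("amsterdam"))  # Amsterdam
```

The exception set is the weak point: without knowing which words contain a non-digraph i + j, the helper simply assumes every leading "ij" is the compound letter.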