Since it wraps existing browsers (Chromium/Safari/Electron/WebKit), is it really fair not to say that in the readme? Sure, it mentions that you should install them under "Quickstart", but all the other parts compare it directly against other browsers, when this really seems to fill more of a "plugin-like" role than a full browser.
Also, when writing readmes for ambitious projects, it's probably best to mark which features are complete and which ones are just planned.
Right now it looks like the whole V-lang debacle IMO, with lots of features promised and no way to know what is implemented and what isn't, all while wrapping other projects without acknowledging it.
> Also, when writing readmes for ambitious projects, it's probably best to mark which features are complete and which ones are just planned.
I think that's what issues and milestones are for. I tried to document as much as possible. Currently, a lot of UI features are not ready, and the HTML and CSS parser both need major rework.
> Right now it looks like the whole V-lang debacle IMO, with lots of features promised and no way to know what is implemented and what isn't, all while wrapping other projects without acknowledging it.
Not a single external library is included. Pretty much everything was implemented by myself over the last one and a half years.
Please don't confuse a Browser with a Browser Engine. I never claimed to replace WebKit, Blink, or Gecko.
For now, I, as a single person working on the project, just didn't have the time to fork Servo into a reduced library that can be bundled, because that's a shitload of work. For now I'm focusing on parsing and filtering, because the underlying network concept was proven to work. I had to build a test runner, too, just to be able to test these behaviours.
Currently the prototype of the UI is implemented in HTML5, reusing as much as possible from the nodejs-side ES2018 modules.
I'm not saying that you have to fork or implement a browser, I'm just saying that if you wrap an existing browser it's best to be upfront about it and not make it sound like you have implemented a browser. I wouldn't have objected if the readme said something like "It uses chromium/electron or other browsers under the hood" at the beginning.
This is especially critical if you claim improved privacy or security.
> For now, I, as a single person working on the project
I'm not complaining about the quality or the scope of the work. I'm saying that what the readme presents as the current status of the project does not match reality.
I saw that you updated the readme to include that it is based on other browsers, great!
The feature list still seems intact though, so if that is intended to represent the current state, perhaps you can help me understand these points (and please forgive me if you just intended to update it later):
> It uses trust-based Peers to share the local cache. Peers can receive, interchange, and synchronize their downloaded media
What is a trusted peer? How is trust established?
> It is peer-to-peer and always uses the most efficient way to share resources and to reduce bandwidth, which means downloaded websites are readable even when being completely offline.
What P2P tech is used here? Again, how is trust handled for resources downloaded over P2P networks?
If the network isn't free and data is centralized, one day you think you have it all and the next you could have nothing. Tor pretends to be secure, but is dark and compromised. This project seems to understand that and wants to try again to fix this via P2P, in a way that has promise.
The simple implementation of web forms is broken in today’s web. It’s an input field or other element styled as an input field that may or may not be grouped or in a form object, possibly generated dynamically. Websockets, timers, validations, ... it’s a huge PITA.
The DOM is a freaking mess. It’s not there until it’s there, it’s detached, it’s shared. It’s been gangbanged so much, there’s no clear parent anymore.
ECMAScript: which version and which interpretation should Babel translate for you, and would you like obfuscation via webpack, and how about some source maps with that, so you can de-obfuscate to debug it? Yarn, RequireJS, npm, and you need an end script tag; should it go in the body or the head? You know the page isn't fully loaded yet, and it won't ever be. There, it's done, until that timer goes off. Each element generated was just regenerated and the old ones are hidden, but the new ones have the same script with different references. Sorry, that was the old framework; use this one, it's newer and this blog or survey says more people use it.
For a P2P open data sharing network over https, a proxy could allow a request to reach someone else further down the path. Not everything is direct.
> Tor pretends to be secure, but is dark and compromised.
Citation needed. Please stop with the "tor is compromised" meme... and what do you even mean by "dark"? What the hell... Tor is by no means a perfect anonymity solution but it's to my knowledge the best we've got. It's certainly way better than a VPN or no anonymization at all.
More specifically, tor anonymity is limited by the fact that it's low-latency. This is a fundamental limitation of any low-latency transport layer and not the fault of the tor developers or any obscure forces. In particular, if your attacker has control of both your entry point (your tor guard node or your ISP) and your exit point (tor exit node, or the tor hidden service or website you are connecting to), it becomes possible to de-anonymize your connection (to the specific exit point in question) through traffic analysis. There's just no way around that for a network meant to transport real-time traffic (as opposed to plain data or email, for instance). And yes, it stands to reason that various intelligence agencies will have invested in running exit nodes or entry nodes, but this is just unavoidable. What you can do to counteract this is to run your own nodes or to donate to (presumably) trustworthy node operators.
I think it's also worth noting that although tor can by no means 100% guarantee that you will be free from government surveillance at all times, it does make mass surveillance more difficult and more error-prone, and to me that's the whole point. Furthermore, although government surveillance cannot be thwarted 100%, tor does make corporate surveillance basically impossible (assuming you can avoid browser fingerprinting; this is what the tor browser is for).
All in all, I can't claim tor is perfect (because it can't be!) but the more people use it the better it gets and it's certainly better than anything else, so please stop spreading FUD and encourage people to use it instead.
Also, it's unclear to me how Stealth helps at all with hiding the IP addresses of its participants... It claims to be "private" but the README doesn't say anything about network privacy...
The code doesn’t strike me as concerning itself with protecting privacy so much as changing who will get to log your traffic. Interesting effort though; I’ll hope for more details from them in the future!
Chill, bro. I said “seems hand-wavy” and “I’d love to be wrong”. I was hedging my bets and clearly indicating this was a surface-level read. I shouldn’t have to have a better alternative on deck to point out something in the codebase that didn’t seem to be privacy-friendly. No offense was meant.
Since you asked how I would do things: I would have had a clear and detailed security-specific document or section of the readme to detail in what ways it is peer-to-peer and in what ways it is private. I would have probably gestured towards the threat model I used when designing the protocols, but, let's be honest, I'd probably be too lazy to document it adequately. As far as I can tell, there's one paragraph in its developer guide on security and two paragraphs on peer-to-peer communication, and I wasn't able to get a good read on its concrete design or characteristics.
> Note that the DNS queries are only done when 1) there's no host in the local cache and 2) no trusted peer has resolved it either.
This wasn’t clear to me from my first spelunk through the readme or the docs. Are you affiliated with the project? Is there a good security overview of the project you know of?
> I mean, DNS is how the internet works. Can't do much about it except caching and delegation to avoid traceable specificity.
What I meant to say is, I was not so sure that the google public dns could be considered private. But nevermind on that, I can’t confirm their logging policies. I’m probably just paranoid about how easy google seems to build a profile on me. So yeah, as mentioned, just my initial read.
Hey, my comment wasn't meant in a defending manner...I'm just curious whether I maybe missed a new approach to gathering DNS data :)
I've seen some new protocols that try to build a trustless blockchain inspired system, but they aren't really there yet and sometimes still have recursion problems.
When I was visiting a friend in France I first realized how much is censored there by ISPs and cloudflare/google and others, so that's why I decided it might be a good approach to have a ronin here.
I totally agree that the threat model isn't documented. Currently the peer-to-peer stuff is mostly manual, as there's no way to discover peers (yet). So you would have to add other local machines yourself in the browser settings.
Security-wise there are currently a lot of things changing, such as the upcoming DNS tunnel protocol that can use other dedicated peers that are already connected to the clearnet, by encapsulating e.g. https inside dns via fake TXT queries etc.
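To make the TXT-query idea concrete, here is a minimal sketch (not the project's actual protocol; all names and framing are hypothetical) of splitting a payload into base64url chunks that fit inside DNS hostname labels, so each chunk can ride along as what looks like an ordinary TXT lookup:

```javascript
// Hypothetical sketch: encapsulating payload bytes inside fake DNS TXT
// queries. DNS labels are limited to 63 bytes, so the payload is split
// into base64url chunks that survive hostname rules.

const encodeChunk = (buffer) =>
    buffer.toString('base64')
        .replace(/\+/g, '-')
        .replace(/\//g, '_')
        .replace(/=+$/, '');

const buildTunnelQueries = (payload, sessionId, domain) => {
    const CHUNK = 45; // 45 raw bytes -> 60 base64 chars, under the 63-byte label limit
    const queries = [];
    for (let offset = 0, seq = 0; offset < payload.length; offset += CHUNK, seq++) {
        const label = encodeChunk(payload.slice(offset, offset + CHUNK));
        // To a passive observer, each of these looks like a normal TXT lookup:
        queries.push({ type: 'TXT', name: `${sessionId}-${seq}.${label}.${domain}` });
    }
    return queries;
};
```

A responding peer would then answer with TXT records carrying the reply payload in the same encoding; the hard parts (ordering, retransmits, avoiding resolver caching) are exactly why this is a protocol in progress.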
> public dns could be considered private
Totally agree here. I tried to find as many DoT and DoH DNS servers as possible, and the list was actually longer before.
In 2019 a lot of DNS providers either went broke or went commercial (like NextDNS, which now requires a unique id per user, which defeats the purpose of it completely)... But maybe someone knows a good DoH/DoT directory that's better than the curl wiki on GitHub?
Thanks for following up with added info! I'll look forward to seeing the project progress; it's an area I'm super interested in. As far as naming systems better at privacy than DNS, I'm not aware of any serious options. Personally, I'm working on implementing something that hopes to improve the verifiability of naming resolutions, but that's a long way off: https://tools.ietf.org/html/draft-watson-dinrg-delmap-02
Large scale actors (read: ISPs and government agencies) have a huge amount of entry and exit nodes. They can simply measure timestamps and stream bytesizes, which allows them to trace your IP and geolocation.
They do not have to decrypt HTTPS traffic for that, because the order of those streams is pretty unique when it comes to target IPs and timestamps.
Yes, hidden services are safe (well, no system is really safe). But if e.g. a hidden service includes a web resource from the clearnet, it can be traced.
I was talking about the "using tor to anonymize my IP" use case, where exit nodes get a huge amount of traffic per session.
In order to be really anon you would need a custom client side engine that randomizes the order of external resources, and pauses/resumes requests (given 206 or chunked encoding is supported), and/or introduces null bytes to have a different stream bytesize after TLS encryption is added.
Hidden services are safer in the sense that your connection can't be deanonymized with the help of your third relay (which would have been an exit node in the case of a clearnet connection) but if the hidden service in question were to be a honeypot and your entrypoint (ISP or tor guard node) were to be monitored by the same entity (this second requirement also holds for clearnet connection monitoring BTW), it would be possible to deanonymize your connection to the hidden service.
How easy it is to perform the traffic analysis would have to depend on the amount of data being transferred, if I had to guess, so downloading a video would probably be worse than browsing a plaintext forum like hackernews. But if we're talking about a honeypot, your browser could be easily tricked into downloading large-enough files even from a plaintext website (just add several megabytes of comments in the webpage source for instance).
> In order to be really anon you would need a custom client side engine that randomizes the order of external resources, and pauses/resumes requests (given 206 or chunked encoding is supported), and/or introduces null bytes to have a different stream bytesize after TLS encryption is added.
It's unclear to me how any of this helps avoid traffic analysis. I believe tor already pads data into 512-byte cells, which might help a little bit.
https is used primarily. If there's only http available, trusted peers are asked two things: their host caches for that domain, and whether or not the data was transferred securely via https (port and protocol).
If either of those isn't statistically confirmed, it is assumed that the targeted website is compromised.
Currently I think this approach is as good as possible, but otherwise I have no idea how to verify that the website is legit without introducing too much traffic overhead for the network.
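A hedged sketch of that "statistically confirmed" check (the shape of the reports is hypothetical, not Stealth's actual API): trusted peers report whether they saw the domain served over https, and a majority has to agree before the site is trusted:

```javascript
// Hypothetical majority vote over trusted peer reports. A report looks
// like { peer: 'laptop', secure: true } and states whether that peer
// saw the domain transferred securely via https.
const isStatisticallyConfirmed = (peerReports, threshold = 0.5) => {
    if (peerReports.length === 0) {
        return false; // no peers to ask -> nothing can be confirmed
    }
    const secure = peerReports.filter((report) => report.secure === true).length;
    return secure / peerReports.length > threshold;
};
```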
Personally, I wouldn't trust any http or https with tls < 1.2 website anyways. But whether or not that assumption can be extrapolated...dunno.
Do you have another way to verify its authenticity in mind?
I don't. Authentication is a pain, and the later in the stack you try to solve it the harder and hackier it gets. We have the gross hack of certificate authorities because we failed to deliver authenticated information via the domain name system.
> their host caches for that domain
Hm, would this trip a MITM flag if someone switched hosting providers? Like, if example.com is http only and is moved to a new datacenter, is there a way to distinguish this from someone MITMing the traffic?
Other posters have already pointed at some examples; it boils down to the fact that you are taking on a relatively large attack surface, and one that is rather difficult to validate effectively and exhaustively.
Not exactly. The Chromium/Chrome sandbox isn't dependent on how and what code you execute; the Electron/Node one is, because the latter was designed to execute code across many more privilege levels than what a "dedicated" browser needs.
If I download and build chromium (as long as I don’t disable the sandbox altogether) I don’t actually need to think about those issues while I do need to do that with Electron.
Between this and the Sciter (edit: added missing r, thanks everyone :-) project the other day and a number of other projects I'm starting to get optimistic that the web might soon be ready for a real overhaul.
I'm not sure if GP meant sciter or scite.ai (both of which had a few posts about them recently), or even SciTE.
However, I don't see how either of those indicates some "real overhaul of the web", as Sciter seems to be "just" an embeddable HTML/CSS engine, which doesn't seem like a big change compared to e.g. webview.
> but it would indeed not overhaul the web in the slightest.
Kind of agree.
What I am hoping for is
either a new rendering engine that is purposely incompatible with abusive websites, and so much faster that people like us will use it anyway everywhere it works, keeping a mainstream browser as a backup for abusive web pages,
Author of the project here. I didn't expect this to be posted on HN because the project is kind of still in its infancy.
The motivation behind the Browser was that I usually "use the Web" as a knowledge resource: I am reading articles online, on blogs, on news websites, on social media and so on. But there are a couple of problems with "what a Browser is" currently. A Browser is currently made for manual human interaction, and not for self-automation of repetitive tasks. These are currently only available at the mercy of programming or extensions, which I do not think is the reasonable way to go.
Why block everything except 2 things on a website when you could just grab the information you expect a website to contain?
I'm calling it a Semantic Web Browser because I want to build a p2p network that understands the knowledge the websites contain via its site adapters (beacons) and workflows (echoes), whereas the underlying concept tries to decentralize as many automation aspects as possible.
In the past a lot of websites were taken down (some for BS reasons, some not), but what's more important to me is the knowledge that is lost forever. Even if the knowledge is web-archived, the discovery (and sharing) aspects are gone, too.
My goal with the project is to be a network that tries to find truth and knowledge in the semantic content of the web, whereas I'm trying to build something that understands bias in articles, the authors of said articles, and history of articles that were (re-)posted on social media with biased perspectives.
I know that NLP currently isn't that far along, but I think with swarm intelligence ideas (taken from Honeybee Democracy and similar research on bees) and compositional game theory, it might be possible to have a self-defending system against financial actors.
Currently the Browser doesn't have a fully functioning UI/UX yet, and the parsers are being refactored. So it's still a prototype.
It is a decentralized Browser in the sense that if you have trusted peers in your local (or global) network, you can reuse their caches and share automation aspects with them (your friend's or your family's browsers), which allows fully decentralized (static) ways to archive websites and their links to other websites.
I'm not sure where the journey is heading, to be honest, but I think the Tholian race and the naming make it clear: "Be correct; we do not tolerate deceit." pretty much sums up why I built this thing.
Currently I don't have funding, and I'm trying to build a startup around the idea of this "Web Intelligence Network", where I see a potential business model for large-scale actors that want to do web scraping and/or gathering via the "extraction adapters" of websites that are maintained by the network.
I think this project turned out to be very important to me, especially when taking a glimpse at the post-COVID social media that contains so much bullshit that you could easily lose hope for humanity.
This project looks amazingly promising, thank you for creating it and I wish you the best of luck in its success.
One humble suggestion/idea I offer to think about, related to:
> It uses trust-based Peers to share the local cache. Peers can receive, interchange, and synchronize their downloaded media. This is especially helpful in rural areas, where internet bandwidth is sparse; and redundant downloads can be saved. Just bookmark Stealth as a Web App on your Android phone and you have direct access to your downloaded wikis, yay!
Trusted peers with a shared web cache is a good start, but how about _trustless_ peers? Is this possible?
Possibly using something like https://tlsnotary.org - which uses TLS to provide cryptographic proof of the authenticity of saved HTTPS pages (but unfortunately only works with TLS 1.0)
All requests are shareable. Conditions for this are:
1. You have a trusted peer with a local IP configured (Peer A knows Peer B and vice versa)
2. Peer A is downloading the url currently (stash) or is done downloading (cache)
3. Peer B can then reuse the same stream or download the file via Peer A
Note that Stealth also has an HTML5 UI for this reason. Download a video on desktop, leave Stealth running, and go to your Android or iOS tablet... connect to desktop-ip:65432. Open up the video and get the same stream, too :)
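The three conditions above can be sketched like this (object shapes are hypothetical, not Stealth's actual API): before downloading a URL itself, Peer B first asks its trusted Peer A for the resource:

```javascript
// Hypothetical sketch of peer cache/stash reuse (conditions 1-3 above).
const resolveViaPeer = (peer, url) => {
    if (peer.cache.has(url)) {
        return { source: 'cache', from: peer.host }; // condition 2: download finished
    }
    if (peer.stash.has(url)) {
        return { source: 'stash', from: peer.host }; // condition 2: in flight, reuse the stream
    }
    return null; // condition not met: fall back to a direct download (condition 3 otherwise)
};

// Peer A as configured manually in Peer B's settings (condition 1):
const peerA = {
    host: '192.168.0.10',
    cache: new Set(['https://example.com/video.mp4']),
    stash: new Set(['https://example.com/wiki.html'])
};
```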
Any proxy could act as a MITM, so someone using a malicious fork of Stealth may cause problems.
But the net is like this already. One site may send you to another site that tricks you into giving up your data. And a relatively recent vulnerability prevented any WebKit-based browser from indicating whether the site's URL belonged to the correct server, so you'd have no visible way of knowing that a site using HTTPS was legitimate.
Using a VPN could be better, but it's sometimes worse, because you only change who is trusted (the VPN provider): they know one of the addresses you're coming from and everything you're doing, and can record and sell that data.
I would highly recommend trying the antidetection browser GoLogin for multi-accounting.
All profiles are separated and protected.
Each profile runs in a separate container so that their data don't conflict with each other.
Before using the browser, GoLogin will open a page with your connection data, so you can make sure it is safe and anonymous. It anti-detects without installing software:
you only need a regular browser and Internet access, and you are not tied to a specific place.
You can automate any emulation process in a real browser. This will make your digital fingerprints look natural, and your accounts will definitely not be blocked.
There is one-click access to any profile for each team member, without any risk of blocking or leaking account data.