https://www.cryptokitties.co/. You buy a virtual cat with virtual currency. And then, unlike regular cats, you pay to get them knocked up and have kittens. But it’s OK, it’s with virtual currency.
Is this the end-goal of the ‘I own nothing’ world we are entering, where we use Uber + self-driving cars instead of owning, Airbnb instead of buying, etc.? Where we own virtual pets paid for with numbers we’ve never seen, created by people we’ve never seen? It’s very, um, ephemeral.
So, let’s look at a sample kitten. Unlike a real cat website (e.g. kittenwar.com … go there now, I demand it!), there’s not much in the way of pictures. This one, ‘Swampgreen!’, has this to say:
Bio Ciao! I’m Swampgreen !. If you also can’t stand the smell of wet food, we’re going to be fast friends. Honestly, eating lasagna is all I care about at this stage in my life. I like your face.
Swampgreen! is yours for the low low price of 0.0149 ETH. If we convert that to a nominal USD amount, that is $13.08. You’d be a fool not to, given that ‘Raga+3xMorning’ is going for 0.0798 ETH ($70.07).
Now the most expensive on here are asking 1M ETH (so about $1B), probably just hoping Jeff Bezos clicks by accident.
OK, fess up. Most of you run adblock(plus/adaware/pihole/…). And you love it. Other than Forbes, the world is a kinder, gentler place.
But recently some of you may have noticed the battery life on your laptop is poor and your fans run more. You are a victim of drive-by crypto-mining (probably Monero, but could be Bitcoin or Ethereum). O noes! How did this sneak in? Did you forget to update your adblock list? But it is up to date. Hmm.
Well, have a look at antipopads. Do you think this will scale for you? It seems the malvertising people have found the same Domain Generation Algorithm (DGA) trick that the botnets have been using for a decade. They rotate the domain name using an algorithm that is predictable (to them), and so stymie your /etc/hosts or adblock approach, which is dictionary-based.
This slide (from Cisco) shows an example of how these domains might be generated. And each piece of malware uses a different one, meaning you can’t just code up the algorithm yourself.
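To make the idea concrete, here is a toy DGA in Python. It is not taken from any real malware or from the slide: the seed string, the use of SHA-256, the 12-letter names and the `.com` suffix are all made up for illustration. The only point is that anyone holding the seed can recompute today’s list, while a static blocklist cannot.

```python
import hashlib
from datetime import date


def toy_dga(seed: str, day: date, count: int = 5, length: int = 12) -> list[str]:
    """Derive `count` pseudo-random domains from a shared seed and a date.

    Hypothetical scheme for illustration only, not a real malware family.
    """
    domains = []
    for i in range(count):
        material = f"{seed}:{day.isoformat()}:{i}".encode()
        digest = hashlib.sha256(material).digest()
        # Map each digest byte onto a-z so the name looks like letter noise.
        name = "".join(chr(ord("a") + b % 26) for b in digest[:length])
        domains.append(name + ".com")
    return domains
```

Calling `toy_dga("not-a-real-seed", date(2018, 1, 15))` twice gives the same five names; change the date by one day and you get five different ones. That is the whole trick.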
What should we do? Should firewall and adblock vendors give up?
Well, you and I, looking at the domain list, might have an inkling of which domains are good and which are bad. Let’s try. Here is a list of ~10 generated malware sites, and ~10 normal sites.
Can you, gentle reader, see which are real and which are not? Yes, it turns out you can. Now, let’s examine your algorithm:
Can I pronounce it (Englishness)?
Is it too short? Too long?
It just looks like/not like a real domain
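Those gut checks can be roughly turned into numbers. The sketch below is my own stand-in, not something from the paper: it uses vowel ratio and longest consonant run as a crude proxy for ‘can I pronounce it’, plus length and character entropy. The feature names are invented, and it only looks at the leftmost label, so it assumes you pass a bare domain like `yahoo.com` rather than `www.yahoo.com`.

```python
import math
from collections import Counter

VOWELS = set("aeiou")


def domain_features(domain: str) -> dict:
    """Crude numeric stand-ins for the human checks (illustrative only)."""
    label = domain.lower().split(".")[0]  # leftmost label, e.g. 'yahoo'
    letters = [c for c in label if c.isalpha()]
    vowel_ratio = sum(c in VOWELS for c in letters) / len(letters) if letters else 0.0

    # Shannon entropy of the characters: random-looking names score higher.
    counts = Counter(label)
    entropy = (
        -sum((n / len(label)) * math.log2(n / len(label)) for n in counts.values())
        if label
        else 0.0
    )

    # Longest run of consonants: 'xkqzjv' is hard to say out loud.
    longest = run = 0
    for c in label:
        if c.isalpha() and c not in VOWELS:
            run += 1
            longest = max(longest, run)
        else:
            run = 0

    return {
        "length": len(label),
        "vowel_ratio": vowel_ratio,
        "entropy": entropy,
        "max_consonant_run": longest,
    }
```

None of these features catches ‘it just looks wrong’, which is exactly the gap a learned model is meant to fill.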
OK, so I can hire you to fix this problem, right? You sit in my basement, and for each DNS name I look up you hit a red/green button? And I sometimes reward you/punish you? Ha!
OK, so if we can teach you to do this, we can teach a machine. More specifically, we teach a machine to teach itself. Here is an example implementation (from a paper here). We have a ‘good’ set (Alexa top N), and a ‘bad’ set (from here). These are called labelled data. We then train a machine-learning algorithm on them (in this case a Long Short-Term Memory (LSTM) network). It has a type of memory, so it can ‘vaguely’ remember things (sort of like you do), e.g. ‘yahoo.com’ looks kind of like what I expect.
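Before any network sees a domain, the labelled data has to become numbers. This is a minimal sketch of that step, not the paper’s code: the alphabet, the padding length of 32 and the 0 = good / 1 = dga label convention are all my assumptions. Each character becomes a small integer, and every domain is padded to the same length so it can be fed to a sequence model such as an LSTM.

```python
import string

# Assumed alphabet: lowercase letters, digits, hyphen and dot.
# Id 0 is reserved for padding (and, in this sketch, for unknown characters).
ALPHABET = string.ascii_lowercase + string.digits + "-."
CHAR_TO_ID = {c: i + 1 for i, c in enumerate(ALPHABET)}


def encode(domain: str, max_len: int = 32) -> list[int]:
    """Turn a domain into a fixed-length list of character ids."""
    ids = [CHAR_TO_ID.get(c, 0) for c in domain.lower()[:max_len]]
    return ids + [0] * (max_len - len(ids))  # right-pad with zeros


def build_dataset(good: list[str], bad: list[str], max_len: int = 32):
    """Combine the 'good' and 'bad' sets into inputs X and labels y."""
    X = [encode(d, max_len) for d in good] + [encode(d, max_len) for d in bad]
    y = [0] * len(good) + [1] * len(bad)  # 0 = good, 1 = dga
    return X, y
```

You would shuffle X and y together and hold some back for testing before training on them.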
So we create a neural network. We feed it some known good and known bad, and correct its guesses, letting it iterate. Eventually it gets good enough that we let it rip. We then integrate it in w/ dnsmasq on our firewall. Each new domain that comes through, we ask it: ‘good or dga’? If it says ‘dga’ (it gives a probability, so we would set the threshold to e.g. 80% likely), we respond 127.0.0.1.
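The decision at lookup time is tiny. Here is a sketch of just that part, with the caveat that `score` and `upstream` are hypothetical stand-ins (the trained model and the real resolver), and that how you actually hook this into dnsmasq is not shown here.

```python
from typing import Callable

SINKHOLE = "127.0.0.1"


def resolve(
    domain: str,
    score: Callable[[str], float],   # hypothetical: model's P(domain is DGA)
    upstream: Callable[[str], str],  # hypothetical: the real DNS lookup
    threshold: float = 0.8,
) -> str:
    """Answer with the sinkhole address if the model thinks it is a DGA domain."""
    if score(domain) >= threshold:
        return SINKHOLE
    return upstream(domain)
```

The threshold is the knob: lower it and you block more junk but break more real sites, raise it and the opposite happens.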
So every day there is a new site which fesses up that it has been pwned. Someone came in, stole the lot, but don’t worry, they only got something minor. And then a few days later, well, it seems there was a bit more. And by the time you stop caring about the story, it comes out that they got the universe.
And you, despite being a loyal reader of this blog, have used the same password on two sites. And you are pwned. [[ Side note: if you stop reading now, go to this link and check yourself out https://haveibeenpwned.com/ ]]
If you administer a ‘real $ network’, one of your concerns is your team. You are only as strong as the weakest link. And you just know someone on your team uses the same password on some irrelevant blog as on your key customer data server.
So you concoct a plan. You will download all the leaks, and build a big database of them. A little google, a little dark-web-fu, you are there. You will make some pre-check script on your password db that checks people’s proposed passwords.
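The pre-check itself is the easy bit. A minimal sketch, assuming the leaks have already been boiled down to one SHA-1 hex digest per line (that file format is my assumption, not something the leaks give you for free):

```python
import hashlib


def sha1_hex(password: str) -> str:
    """Uppercase SHA-1 hex digest of a password."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest().upper()


def load_leaked_hashes(lines) -> set[str]:
    """Build the lookup set from an iterable of lines, one hex digest each."""
    return {line.strip().upper() for line in lines if line.strip()}


def is_leaked(password: str, leaked_hashes: set[str]) -> bool:
    """True if the proposed password appears in the leaked set."""
    return sha1_hex(password) in leaked_hashes
```

With a real dump you would pass an open file to `load_leaked_hashes`, and the set would need to hold every digest in memory.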
And then you run into a bit of an issue. You really need to have this on everyone’s desktop. And it’s kind of big. And it’s not obvious you want to do that.
So the path I have been researching is to use a type of AI called a ‘Generative Adversarial Network’. The idea is to train a model on this dataset. You then ship the model to each desktop (and the model cannot be reversed since it’s lossy). The model would say “this password you propose is *similar* to the dataset, and thus you should not use it”. But I found this pretty difficult to get correct enough to use.
This is a tough slog of a read (167 pages), but there is a proposal from France on cyber security (the parent web page here). I tried a machine translation to English, but, well, the fonts are embedded as images somehow. Hmm, it’s like a scan, it’s a set of pictures. Boo. OCR? What is this, 1990? Plus the font is some sort of comic-sans/cursive script, with accents, so I’m not optimistic. So I’ll slog on w/ my amateur French.
In a nutshell:
Don’t lose individual privacy or freedoms while fighting oppressors
Private companies cannot fight back
Companies can be liable for cyber-vulnerability in their products as long as commercially available
And should release source and docs at end of life
A hotline (think the red phone from the cold-war) to call other ‘actors’
Sovereign state can use cyber-offense as part of defense
Education of people
249 systemically important organisations
PS, I tried the OCR. Tesseract has a Long Short-Term Memory machine-learning approach. The training set is small, which surprises me. Here is what I got:
DeSs MONACES en éVOUTION emecm emcem ssem m osmme me smme mem emme vamem n eee 11 1.1.1. — lL‘espionnage .c ccc ec ecec rsc rs r rrr rrr rs rs 11 1.1.2. — La CYDEPCTIMÎIMAUTÉ ccce ce sec ose se se se sem eme se se eme se 12 1.1—3, LÀ L SALLOÔTI ssme nsme eme nsem ne ee ee eeme o 13
From this input.
So, I’m not going to spend more time on this. I mean, yeah, it’s kinda working, and I guess I could go find and train the font, etc.
So, in the comments… If you are a native cyber-french speaker, what did you find in the doc?
Also, if you are an OCRist-extraordinaire, and want to have a crack, let me know.