- DNS name. Perhaps the user who created this cloud service registered it in internal DNS, and a zone transfer will find it?
- Inference. A service or host we know of is using it. If enough of them are, we can use a Venn-diagram or Bayesian-style approach to infer that this shared service is internal.
- Firewall logs, *Flow logs. We can see flows from our internal users going there (but what if our users are nomadic or never work from the office?)
- HTML scraping. Perhaps it gets linked in your Wiki?
- Follow the link. Once you find one app, perhaps it has pointers to others?
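The Bayesian-style inference idea above can be sketched in a few lines. This is only an illustration, not a tuned model: the function name `posterior_internal` and the two likelihood values are assumptions chosen for the example, and each observation simply records whether a known-internal host was seen using the candidate service.

```python
import math


def posterior_internal(prior, observations, p_if_internal=0.8, p_if_external=0.1):
    """Naive-Bayes style estimate that a service is internal.

    prior          -- initial belief (0..1, exclusive) that the service is internal
    observations   -- iterable of booleans: True if a known-internal host
                      was seen using the service, False if it was not
    p_if_internal  -- assumed chance a known host uses it, given it IS internal
    p_if_external  -- assumed chance a known host uses it, given it is NOT
    """
    # Work in log-odds so many observations do not underflow.
    log_odds = math.log(prior / (1 - prior))
    for seen in observations:
        if seen:
            log_odds += math.log(p_if_internal / p_if_external)
        else:
            log_odds += math.log((1 - p_if_internal) / (1 - p_if_external))
    # Convert log-odds back to a probability.
    return 1 / (1 + math.exp(-log_odds))


if __name__ == "__main__":
    # Three known-internal hosts all talk to the candidate service.
    print(round(posterior_internal(0.1, [True, True, True]), 3))
```

Each extra host seen using the service pushes the estimate up, and each host that does not pushes it down, which is the "if enough of them are" intuition in numeric form.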
Now for an interesting taxonomy: what if something could, with zero initial knowledge, wander around your network and report back on:
- First party apps on premise
- First party apps in cloud [sanctioned]
- First party apps in cloud [not yet sanctioned]
- Third party apps in cloud [sanctioned]
- Third party apps in cloud [not yet sanctioned]
That would be an interesting tool. And I think the appropriate Python (using Python WebKit or perhaps Lighthouse, plus TensorFlow, NumPy, and maybe BayesPy) could achieve this goal. Perhaps give it a feed from DNS (either from your DNS server, or just a packet capture on port 53), and let it wander around your network for a bit.
If you are (un)lucky you might even find another category, the covert exfiltration category, and watch your data going somewhere else!
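As a rough starting point, the five-way taxonomy and the DNS feed could be tied together like this. It is a sketch under heavy assumptions: the suffix lists, the sanctioned set, and the idea that domain ownership alone separates first party from third party are all hypothetical stand-ins for what a real tool would have to learn, and the query names are assumed to be already extracted from the DNS server or a packet capture.

```python
from collections import Counter
from enum import Enum


class Category(Enum):
    FIRST_PARTY_ON_PREM = "First party apps on premise"
    FIRST_PARTY_CLOUD_SANCTIONED = "First party apps in cloud [sanctioned]"
    FIRST_PARTY_CLOUD_UNSANCTIONED = "First party apps in cloud [not yet sanctioned]"
    THIRD_PARTY_CLOUD_SANCTIONED = "Third party apps in cloud [sanctioned]"
    THIRD_PARTY_CLOUD_UNSANCTIONED = "Third party apps in cloud [not yet sanctioned]"


# Hypothetical inputs; a real tool would infer or be given these.
ON_PREM_SUFFIXES = ("corp.example.com",)      # internal-only DNS zone
FIRST_PARTY_SUFFIXES = ("example.com",)       # domains the organisation owns
SANCTIONED = {"app.example.com", "crm.vendor.example"}


def has_suffix(name, suffixes):
    """True if name equals a suffix or is a subdomain of it."""
    return any(name == s or name.endswith("." + s) for s in suffixes)


def classify(name):
    """Place one DNS query name into one of the five categories."""
    name = name.rstrip(".").lower()  # normalise trailing dot and case
    if has_suffix(name, ON_PREM_SUFFIXES):
        return Category.FIRST_PARTY_ON_PREM
    sanctioned = name in SANCTIONED
    if has_suffix(name, FIRST_PARTY_SUFFIXES):
        return (Category.FIRST_PARTY_CLOUD_SANCTIONED if sanctioned
                else Category.FIRST_PARTY_CLOUD_UNSANCTIONED)
    return (Category.THIRD_PARTY_CLOUD_SANCTIONED if sanctioned
            else Category.THIRD_PARTY_CLOUD_UNSANCTIONED)


def summarize(query_names):
    """Count how many observed names fall into each category."""
    return Counter(classify(n) for n in query_names)


if __name__ == "__main__":
    feed = ["wiki.corp.example.com.", "app.example.com", "shadow.example.com",
            "crm.vendor.example", "files.unknown.example"]
    for category, count in summarize(feed).items():
        print(f"{count:3d}  {category.value}")
```

The interesting work is in replacing the hard-coded lists with something learned from the network itself; the "not yet sanctioned" buckets are where a human would look first.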