String extraction is one of the main tasks that all programmers need. Sometimes it's getting even harder because we don't get an easy string presentation to extract useful data/information. Here some,
We need to extract all MAC address from an arbitrary string
mac = "ads fs:ad fa:fs:fe: Wind00-0C-29-38-1D-61ows 1100:50:7F:E6:96:20dsfsad fas fa1 3c:77:e6:68:66:e9 f2"Using Regex
The regex should supports windows and Linux mac address formats.
lets to find our mac
mac_regex = /(?:[0-9A-F][0-9A-F][:\-]){5}[0-9A-F][0-9A-F]/i
mac.scan mac_regexReturns
["00-0C-29-38-1D-61", "00:50:7F:E6:96:20", "3c:77:e6:68:66:e9"]
We need to extract all IPv4 address from an arbitrary string
ip = "ads fs:ad fa:fs:fe: Wind10.0.4.5ows 11192.168.0.15dsfsad fas fa1 20.555.1.700 f2"ipv4_regex = /(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)/
Let's to find our IPs
ip.scan ipv4_regexReturns
[["10", "0", "4", "5"], ["192", "168", "0", "15"]]
https://gist.github.com/cpetschnig/294476
http://snipplr.com/view/43003/regex--match-ipv6-address/
Assume we have the following string
string = "text here http://foo1.example.org/bla1 and http://foo2.example.org/bla2 and here mailto:test@example.com and here also."**Using Regex** ```ruby string.scan(/https?:\/\/[\S]+/) ```
Using standard URI module This returns an array of URL's
require 'uri'
URI.extract(string, ["http" , "https"])Using above tricks
require 'net/http'
URI.extract(Net::HTTP.get(URI.parse("http://rubyfu.net")), ["http", "https"])or using Regex
require 'net/http'
Net::HTTP.get(URI.parse("http://rubyfu.net")).scan(/https?:\/\/[\S]+/)require 'net/http'
Net::HTTP.get(URI.parse("http://pastebin.com/khAmnhsZ")).scan(/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i).uniqAssume we have the following HTML contents and we need to get strings only and eliminate all HTML tags.
string = "<!DOCTYPE html>
<html>
<head>
<title>Page Title</title>
</head>
<body>
<h1>This is a Heading</h1>
<p>This is another <strong>contents</strong>.</p>
</body>
</html>"
puts string.gsub(/<.*?>/,'').stripReturns
Page Title
This is a Heading
This is another contents.