php - How do I find all YouTube video ids in a string using a regex?


I have a text field where users can write anything.

For example:

Lorem Ipsum is simply dummy text. http://www.youtube.com/watch?v=duqi_r4sgwo of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. http://www.youtube.com/watch?v=a_6gnzckaju&feature=relmfu It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.

Now I want to parse it and find all YouTube video URLs and their IDs.

Any idea how this works?

A YouTube video URL may be encountered in a variety of formats:

  • latest short format: http://youtu.be/nlqaf9hrvby
  • iframe: http://www.youtube.com/embed/nlqaf9hrvby
  • iframe (secure): https://www.youtube.com/embed/nlqaf9hrvby
  • object param: http://www.youtube.com/v/nlqaf9hrvby?fs=1&hl=en_us
  • object embed: http://www.youtube.com/v/nlqaf9hrvby?fs=1&hl=en_us
  • watch: http://www.youtube.com/watch?v=nlqaf9hrvby
  • users: http://www.youtube.com/user/scobleizer#p/u/1/1p3vcrhsygo
  • ytscreeningroom: http://www.youtube.com/ytscreeningroom?v=nrhvzbjvx8i
  • any/thing/goes!: http://www.youtube.com/sandalsresorts#p/c/54b8c800269d7c1b/2/pps-8dmran4
  • any/subdomain/too: http://gdata.youtube.com/feeds/api/videos/nlqaf9hrvby
  • more params: http://www.youtube.com/watch?v=spdj54kf-vy&feature=g-vrec
  • query may have dot: http://www.youtube.com/watch?v=spdj54kf-vy&feature=youtu.be
  • nocookie domain: http://www.youtube-nocookie.com

Here is a PHP function with a commented regex that matches each of these URL forms and converts them to links (if they are not links already):

// Linkify YouTube URLs which are not already links.
function linkifyYouTubeURLs($text) {
    $text = preg_replace('~(?#!js youtubeid rev:20160125_1800)
        # Match non-linked YouTube URL in the wild. (rev:20130823)
        https?://          # Required scheme. Either http or https.
        (?:[0-9a-z-]+\.)?  # Optional subdomain.
        (?:                # Group host alternatives.
          youtu\.be/       # Either youtu.be,
        | youtube          # or youtube.com or
          (?:-nocookie)?   # youtube-nocookie.com
          \.com            # followed by
          \S*?             # Allow anything up to VIDEO_ID,
          [^\w\s-]         # but char before ID is non-ID char.
        )                  # End host alternatives.
        ([\w-]{11})        # $1: VIDEO_ID is exactly 11 chars.
        (?=[^\w-]|$)       # Assert next char is non-ID or EOS.
        (?!                # Assert URL is not pre-linked.
          [?=&+%\w.-]*     # Allow URL (query) remainder.
          (?:              # Group pre-linked alternatives.
            [\'"][^<>]*>   # Either inside a start tag,
          | </a>           # or inside <a> element text contents.
          )                # End recognized pre-linked alts.
        )                  # End negative lookahead assertion.
        [?=&+%\w.-]*       # Consume any URL (query) remainder.
        ~ix', '<a href="http://www.youtube.com/watch?v=$1">YouTube link: $1</a>',
        $text);
    return $text;
}


And here is a JavaScript version with the exact same regex (with comments removed):

// Linkify YouTube URLs which are not already links.
function linkifyYouTubeURLs(text) {
    var re = /https?:\/\/(?:[0-9a-z-]+\.)?(?:youtu\.be\/|youtube(?:-nocookie)?\.com\S*?[^\w\s-])([\w-]{11})(?=[^\w-]|$)(?![?=&+%\w.-]*(?:['"][^<>]*>|<\/a>))[?=&+%\w.-]*/ig;
    return text.replace(re,
        '<a href="http://www.youtube.com/watch?v=$1">YouTube link: $1</a>');
}
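Since the question asks for the IDs themselves rather than for links, the same pattern can also be used to collect them. The following is only a minimal sketch: the helper name findYouTubeIds and the shortened sample string are made up for illustration and are not part of the original answer. The pattern allows any non-whitespace run (\S*?) between the host and the ID.

```javascript
// Hypothetical helper (not from the original answer): return every video ID
// found in non-linked YouTube URLs instead of rewriting the text.
function findYouTubeIds(text) {
    // Same pattern as the linkify function; group 1 is the 11-char video ID.
    var re = /https?:\/\/(?:[0-9a-z-]+\.)?(?:youtu\.be\/|youtube(?:-nocookie)?\.com\S*?[^\w\s-])([\w-]{11})(?=[^\w-]|$)(?![?=&+%\w.-]*(?:['"][^<>]*>|<\/a>))[?=&+%\w.-]*/ig;
    var ids = [];
    var match;
    // With the "g" flag, exec() resumes after the previous match on each call.
    while ((match = re.exec(text)) !== null) {
        ids.push(match[1]);
    }
    return ids;
}

// Shortened stand-in for the text in the question.
var sample = 'intro http://www.youtube.com/watch?v=duqi_r4sgwo middle ' +
    'http://www.youtube.com/watch?v=a_6gnzckaju&feature=relmfu end';
console.log(findYouTubeIds(sample)); // ids: duqi_r4sgwo, a_6gnzckaju
```

URLs that already sit inside an href attribute or an <a> element are skipped, because the negative lookahead is still in place.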

Notes:

  • The VIDEO_ID portion of the URL is captured in the one and only capture group: $1.
  • If you know that your text does not contain any pre-linked URLs, you can safely remove the negative lookahead assertion which tests for this condition (the assertion beginning with the comment: "Assert URL is not pre-linked."). This will speed up the regex somewhat.
  • The replace string can be modified to suit. The one provided above creates a link to the generic "http://www.youtube.com/watch?v=VIDEO_ID" style URL and sets the link text to: "YouTube link: VIDEO_ID".
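The last two notes can be combined in one small variant. This is a sketch under assumptions, not the answer's own code: the function name linkifyPlainYouTubeUrls is made up, the negative lookahead has been dropped, and the replacement string is an arbitrary choice that links to the short youtu.be form. It is only appropriate when the input is known to contain no pre-linked URLs.

```javascript
// Hypothetical variant: no pre-linked check and a different replacement string.
// Only safe when the text is known to contain no existing <a> links.
function linkifyPlainYouTubeUrls(text) {
    // Same pattern as above with the (?! ... ) negative lookahead removed.
    var re = /https?:\/\/(?:[0-9a-z-]+\.)?(?:youtu\.be\/|youtube(?:-nocookie)?\.com\S*?[^\w\s-])([\w-]{11})(?=[^\w-]|$)[?=&+%\w.-]*/ig;
    // $1 is the captured video ID; the link target and text are free to change.
    return text.replace(re, '<a href="http://youtu.be/$1">video $1</a>');
}

console.log(linkifyPlainYouTubeUrls(
    'see http://www.youtube.com/watch?v=spdj54kf-vy&feature=youtu.be now'));
// see <a href="http://youtu.be/spdj54kf-vy">video spdj54kf-vy</a> now
```

Running this variant on text that does contain links would wrap the URL inside the existing href, which is the case the lookahead exists to prevent.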

Edit 2011-07-05: Added - hyphen to the ID char class.

Edit 2011-07-17: Fixed regex to consume any remaining part (e.g. query) of the URL following the YouTube ID. Added 'i' ignore-case modifier. Renamed function to camelCase. Improved pre-linked lookahead test.

Edit 2011-07-27: Added new "user" and "ytscreeningroom" formats of YouTube URLs.

Edit 2011-08-02: Simplified/generalized to handle new "any/thing/goes" YouTube URLs.

Edit 2011-08-25: Several modifications:

  • Added a JavaScript version of the linkifyYouTubeURLs() function.
  • The previous version had the scheme (HTTP protocol) part optional and thus would match invalid URLs. Made the scheme part required.
  • The previous version used the \b word boundary anchor around the VIDEO_ID. However, this will not work if the VIDEO_ID begins or ends with a - dash. Fixed so that it handles this condition.
  • Changed the VIDEO_ID expression so that it must be exactly 11 characters long.
  • The previous version failed to exclude pre-linked URLs if they had a query string following the VIDEO_ID. Improved the negative lookahead assertion to fix this.
  • Added + and % to the character class matching the query string.
  • Changed the PHP version regex delimiter from: % to a: ~.
  • Added a "Notes" section with some handy notes.
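The \b problem mentioned in the list above can be reproduced in a few lines. The two patterns below are deliberately reduced illustrations, not the earlier revision of the regex (which is not shown here), and the video ID is made up so that it starts with a dash.

```javascript
// Made-up 11-character ID that begins with a dash.
var url = 'http://youtu.be/-abcdefghij';

// Reduced pattern using \b around the ID (illustration only).
var withWordBoundary = /youtu\.be\/\b([\w-]{11})\b/;
// Reduced pattern using the lookahead from the current regex instead.
var withLookahead = /youtu\.be\/([\w-]{11})(?=[^\w-]|$)/;

// "/" and "-" are both non-word characters, so there is no word boundary
// between them and the \b version cannot match.
console.log(withWordBoundary.test(url));  // false
console.log(withLookahead.exec(url)[1]);  // -abcdefghij
```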

Edit 2011-10-12: The YouTube URL host part may now have any subdomain (not just www.).

Edit 2012-05-01: The consume URL section may now allow for '-'.

Edit 2013-08-23: Added additional format provided by @mei. (The query part may have a . dot.)

Edit 2013-11-30: Added additional format provided by @cronus: youtube-nocookie.com.

Edit 2016-01-25: Fixed regex to handle error case provided by cronus.
