Some help with a regexp, please ? :)

By Oliver (AKA the Admin) on 18 comments
in Categories: Just Talking

I suck at search-and-replacing as soon as regular expressions come into play, so, please, if this is the kind of stuff you’re good at, could I humbly ask you for help ?

No worries if that’s not your strong suit, just move on then ;)

The reason I’m asking: the old posts of Hentairules, in which I used tables inside the posts (images on the left, text on the right), break the in-testing layout of the future new theme for the blog. I can search-and-replace all the table-related HTML code and replace it with nothing (= deletion), but I’m unable to get rid of strings containing variable data, like the never-the-bloody-same height and width.

I want to be able to replace
<td width=”xxx” height=”yyy”><img src=”
with
<img width=”xxx” height=”yyy” src=”

or, if that is impossible,
simply delete all occurrences of
<td width=”xxx” height=”yyy”>

A word of caution: we cannot simply search-and-replace this snippet:
width=”xxx” height=”yyy”
because TONS of legit images also use it.
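
To pin down what is being asked, here is a minimal sketch in Python (not one of the environments the post asks about, just a neutral way to test the pattern). It assumes straight double quotes and purely numeric width/height values; the sample HTML and the names TD_IMG and move_size_to_img are made up for illustration.

```python
import re

# Anchoring on "<td" is what keeps legit <img width=... height=...> tags safe:
# a width/height pair that does not sit inside a <td> tag never matches.
TD_IMG = re.compile(r'<td width="(\d+)" height="(\d+)"><img src="')


def move_size_to_img(html: str) -> str:
    """Move width/height from the <td> onto the <img> that follows it."""
    return TD_IMG.sub(r'<img width="\1" height="\2" src="', html)


sample = (
    '<td width="500" height="600"><img src="a.jpg">'
    '<img width="120" height="80" src="legit.jpg">'
)
print(move_size_to_img(sample))
```

The second image in the sample already carries its own width and height and is left alone, which is the point of the warning above.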

This can be run in 3 environments:
1 – ideally, within the SQL window of PhpMyAdmin
2 – if there is no other choice, inside Notepad++ (but, really, really, PhpMyAdmin would be better suited: because of the file size limits in N++, I’d have to split off the posts table and reintegrate it separately, which is, to be frank, a pain in the ass… unless you know of a text editor that accepts huge files and ALSO runs regexp operations)
3 – a bash script run against an .sql dump of the database

Please, would you know how to do it ?
If you can figure out how to get it to work, thank you so much for sharing how to do it, I’ll be grateful!! :D
(The comments engine of Hentairules will most likely alter or eat away special characters, so I strongly recommend using a pastebin-like site to share code.)

Oliver AKA The Admin
Admin
8 years ago

Wow, I wrote above that you should use a pastebin-like website to provide code in answers, but I myself overlooked that my blog engine was going to replace all straight double quotes with tilted double quotes, haha.

So, me too, I'll be providing a pastebin :D
http://pastebin.com/qw9vQFSF

godginrai
8 years ago

http://ix.io/jpb

Call it like this: ./script file

Also, if you want it to edit the file in place, add a -i flag in the sed command before the -E.

klawthathawu
8 years ago

<td width="(.*)" height="(.*)"><img src="

<img width="$1" height="$2" src="

klawthathawu
8 years ago
Reply to  klawthathawu

for Search/replace with Regexp in notepad++

RegExpRules
8 years ago
Reply to  klawthathawu

Nope, that won't work.
You either need to put lazy quantifiers (.*?) or specify the character you want to be repeated (d*), or you might end up gobbling everything till the last occurrence.
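
A small sketch of that difference, in Python, on a made-up line holding two table cells (a database dump tends to put a lot of HTML on a single line, which is where greedy matching hurts). The three patterns are the greedy one from the parent comment, a lazy variant, and a digits-only variant, assuming the eaten-away form was \d*.

```python
import re

line = (
    '<td width="500" height="600"><img src="a.jpg"></td>'
    '<td width="300" height="200"><img src="b.jpg"></td>'
)

greedy = re.compile(r'<td width="(.*)" height="(.*)"><img src="')
lazy = re.compile(r'<td width="(.*?)" height="(.*?)"><img src="')
digits = re.compile(r'<td width="(\d*)" height="(\d*)"><img src="')

# The greedy version finds a single match that swallows the first cell whole.
print(len(greedy.findall(line)))     # 1
print(greedy.search(line).group(1))  # runs from 500 all the way to 300

# Lazy and digits-only versions find each cell separately.
print(lazy.findall(line))
print(digits.findall(line))
```

With the greedy pattern, the first group ends up containing the entire first cell, so the replacement would mangle it instead of converting it.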

Oliver AKA The Admin
Admin
8 years ago
Reply to  klawthathawu

Thank you, however that won't work directly in N++, as my database dump is over the size above which N++ refuses to open files for me.

I can dump only the posts table, but then reintegrating it is more complicated. Too bad for me ^^

Also, there are RegExpRules' notes below, apparently more is required… I'll look at that now…

nope
8 years ago

I would try sed, probably smth like that:

sed -i -E ‘s/<img src="/<img width="\1" height="\2" src="/g' dump.sql

http://pastebin.com/PvNBU8gz

Oliver AKA The Admin
Admin
8 years ago
Reply to  nope

Thanks Nope!

May I ask you, that part,
width="\1" height="\2"
(or whatever remains of it after I click to send this comment ^^;;)
is it supposed to paste back the contents of the previous width and height ?

No shame in admitting, I don't see what a \1 and \2 would be for :o

John Doe
8 years ago

Yes. Each \N pastes the content of the Nth parenthesis block of the search pattern, for short.
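
A tiny illustration in Python, whose re module uses the same \1 and \2 notation as sed (Notepad++ writes them as $1 and $2 in the replace box). The input string is made up.

```python
import re

# Two parenthesis blocks in the search pattern: one for the width, one for the height.
pattern = r'<td width="(\d+)" height="(\d+)">'

# In the replacement, \1 is whatever the first block matched, \2 the second.
result = re.sub(pattern, r'first=\1 second=\2', '<td width="500" height="600">')
print(result)  # first=500 second=600
```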

olivier
8 years ago

Write a script in whatever language you like to do the job: retrieve the data and, if a line matches your regexp, update that line.
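
One possible shape for such a script, sketched in Python under a few assumptions: it works on an .sql dump rather than on the live database, it reads the dump one line at a time so the file size should not matter much, and it allows an optional backslash before each double quote because mysqldump-style dumps usually store them as \" inside string values. The function name convert_dump and the file names in the usage note are hypothetical.

```python
import re

# Group 1 remembers the quote style of the first quote (plain " or escaped \")
# and the later \1 references require the same style for the other quotes.
TD_IMG = re.compile(rb'<td width=(\\?")(\d+)\1 height=\1(\d+)\1><img src=\1')
REPLACEMENT = rb'<img width=\1\2\1 height=\1\3\1 src=\1'


def convert_dump(src_path: str, dst_path: str) -> int:
    """Copy src_path to dst_path line by line; return how many replacements were made."""
    total = 0
    # Binary mode: no guessing about the dump's encoding or line endings.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in src:
            new_line, count = TD_IMG.subn(REPLACEMENT, line)
            dst.write(new_line)
            total += count
    return total
```

Calling convert_dump("dump.sql", "dump_fixed.sql") would leave the original dump untouched and write the converted copy next to it; the returned count is a quick sanity check before reimporting anything.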

Oliver AKA The Admin
Admin
8 years ago
Reply to  olivier

Yeah, but that's beyond my skill. I always screw up with regexps, it's as if I had a block :-/

RegExpRules
8 years ago

Notepad++ does handle large files (100+ MB) and does allow regexp operations on them. Not sure about the performance, though.
Anyway:
search for: <td width="(d+)" height="(d+)"><img ([^>]+)>s*</td>
replace with: <img width="$1" height="$2" $3>
This should also get rid of the closing </td> tag (provided there is only whitespace between <img> and </td>).

Use regexr dot com to test your regular expressions.
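
For anyone who wants to try this search/replace pair outside Notepad++, here is a rough Python equivalent. It assumes the d+ and s* in the comment were meant as \d+ and \s* (backslashes tend to get lost in these comments), and it uses \1 where Notepad++ uses $1. The name unwrap_cell and the sample cell are made up.

```python
import re

# Assumption: "d+" and "s*" in the comment stand for \d+ (digits) and \s* (whitespace).
TD_CELL = re.compile(r'<td width="(\d+)" height="(\d+)"><img ([^>]+)>\s*</td>')


def unwrap_cell(html: str) -> str:
    """Replace a whole <td ...><img ...></td> cell with a sized <img> tag."""
    return TD_CELL.sub(r'<img width="\1" height="\2" \3>', html)


cell = '<td width="500" height="600"><img src="a.jpg" alt="x">\n </td>'
print(unwrap_cell(cell))
```

The third group carries over everything inside the original img tag (src, alt and so on), so nothing but the table cell wrapper is lost.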

Oliver AKA The Admin
Admin
8 years ago
Reply to  RegExpRules

Thank you :)

First things first, N++ cannot handle the size of my entire database dump (I imagine it's linked to RAM and to how N++'s internal Scintilla engine dislikes large files; in my case, N++ gives up on opening files above 400 MB). That is why I'm reluctant to use N++ for this: I would have to extract only my posts table, and reintegrating it alone would be a fair bit more complicated.

Could you provide in a pastebin the code you were giving please ?
It's already proven that the comments engine eats backslashes for dinner, supper, breakfast and eleven o'clock snack, so maybe some other chars get eaten too.

Lastly: there's a website that lets you TEST how regular expressions are interpreted ?!? O_o

RegExpRules
8 years ago

Yes, 400 MB is well beyond reasonable limits for Notepad++, but there's a chance it can work anyway.
In fact, N++ can search and replace in files without even opening them (Search > Find in Files…). This will also permanently change the files, so make backups!

This is the pastebin: /zjQQYa1E
Anyway, it only swallowed backslashes in front of d+ and s*.

And yes, regexr will allow you to test your regular expressions, but beware: its regexp engine is quite feature-rich, while most environments don't support as many regexp features. Notepad++ used to be quite poor with regexps, but improved a lot with version 6.x.

toomyzoom
8 years ago

Hi, this is the solution using sed

1st:

sed -r -e ‘s/<img src="/ If you want to replace the output only, no change to original file.

Example:

printf ‘<img src="' | sed -r -e 's/<img src="/<img width="\1" height="\2" src="/g'

Output: <img width="500" height="600" src="

This one pipes your original text through sed and prints the desired text, so you can save it to a new file.

2nd:

sed -r -e -i 's/<img src="/<img width="\1" height="\2" src="/g' yourfilenamehere

Pay attention to the -i switch: it actually modifies your file in place instead of printing the result to the output.
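
Since an in-place edit cannot be undone, a cautious variant keeps a copy of the original first. A minimal sketch of that idea in Python, assuming straight double quotes and numeric sizes; the function name replace_in_place and the .bak and .tmp suffixes are arbitrary choices for illustration, not something from the commands above.

```python
import os
import re
import shutil

TD_IMG = re.compile(rb'<td width="(\d+)" height="(\d+)"><img src="')


def replace_in_place(path: str, backup_suffix: str = ".bak") -> None:
    """Rewrite the file at path, keeping the original as path + backup_suffix."""
    backup_path = path + backup_suffix
    tmp_path = path + ".tmp"
    shutil.copy2(path, backup_path)  # the safety net
    with open(backup_path, "rb") as src, open(tmp_path, "wb") as dst:
        for line in src:
            dst.write(TD_IMG.sub(rb'<img width="\1" height="\2" src="', line))
    # Swap the converted file into place only once it is fully written.
    os.replace(tmp_path, path)
```

If something looks wrong afterwards, the untouched original is still sitting in the .bak file.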

toomyzoom
8 years ago
Reply to  toomyzoom

Sorry, your site doesn't allow inline code in comments.
Here is what I tried to say http://pastebin.com/fbQsT3cy

David
8 years ago

Hello Oliver.

FINALLY I can help Oliver.
My first comment as well. THANK YOU for your hentai :) the best site on the web :)

Does this help?

FIND:
<td width="(.*?)" height="(.*?)"><img src="

REPLACE:
<img width="$1" height="$2" src="

GLR
8 years ago

What a nice challenge! OK, I wrote and casually tested a MySQL function that does just this:
http://pastebin.com/iKQ99VW6

Notes: it works on strings up to 4 KB long, may be slow (it analyzes the string char by char), and may fail in some non-valid corner cases.