An update regarding my database encoding issues
Hello guys, thank you very much for the help and suggestions you gave regarding my database encoding issues!
I haven’t replied to anyone, but there has been lots of feedback, I’m truly grateful
(If you missed it, I explain everything in the call for help I posted 2 days ago)
I thought I could post an udpate on the topic. Once again, I’m humbly asking for your opinion and feedback regarding the progress so far
*
-1- Several suggestions I received were too technical and/or too generic for me. Telling that’s a broad problem requiring to check on everything from Apache to the headers… well, yeah… but I’m skilled in neither of these. Jack of all trades, master of none, I may host websites but it’s not my job and I’m self-taught with what I know.
-2- I received 2 groups of solutions that would work. They’re not 100% satisfactory because they would cure the symptoms of the problem but not address the problem itself, would you see the idea? If I apply them, I suspect that, the next time I must dump and re-import the database of Hentairules from a server to a new server, the problem may happen again.
*
Those solutions that would work are:
*
(A) A massive list of SQL commands to run, UPDATE wp_posts SET post_content = REPLACE, targeting each of the characters that was transformed. Issues: some characters pose a problem (e.g. damn you semicolon), and there’s a strong chance to forget some rarely used bug-characters in the process.
I occasionally replace myself strings like that in the database of Hentairules (unless you ask the waybackmachine, you’ll never have proof I made the same spelling mistake in a certain verb during 5 years, oh the embarrassment), so it’s a procedure I’m not stranger to.
*
(B) the Notepad++ route. I was kindly suggested by a commenter called Andrew M to open the .SQL database dump in Notepad++, click to Encode > Convert to ANSI, and then, afterwards, Encode > Encode in UTF-8.
That method makes almost everything become normal again, save rare exceptions such as the characters in the middle of kaomojis like ?\_(?)_/?
EDIT: oh, god damnit. There’s another issue. I pasted a working kaomoji and got a question mark in its middle when I save my post. On the previous server, I could post kaomojis into my posts, they would work. Screenshot:
I thought it wouldn’t work with me, as the Notepad++ would freeze on me when I asked it to convert even a naked dump of just wp_posts (“only” 170 MB for its sql file).
However, in fact, if after I click to ask for a conversion or encoding, I leave the program alone for several minutes without clicking anymore inside N++, then when I’m back to my PC, the N++ is responding again and agrees to save the file, WOOT!! YESSSSS!!!
I’ve tested that wp_posts table, reintegrated into a test database, and indeed, the problem is gone for all the old posts.
…
… And yet, I’m not sure it is REALLY the final fix I must apply.
Please, tell me if I’m wrong, but my problem, in the past, came from the fact my database was in Latin1 instead of UTF8, or even better, UTF8-mb4, but I attempted to make it work in an UTF-8 environment, right?
So, even though I ask the Notepad++ to convert the file to Notepad++, after I upload the .sql file back to my server and execute it as a query to re-import it, my database engine will be none the wiser and will still think it’s latin1, right?
I confirmed it with phpmyadmin, the test database, its wp_posts was still marked as being latin1.
Thus, the same improper export-import could happen again in the future if I’m unlucky ?
Following that, should I make additional operations, to “officially” make it an UTF-8 or UTF-8-mb4 database?
Or is it now okay, really okay, no future worries, to leave it as latin-1? What would you think?
(If the info is relevant, here is the encoding of my blog tables)
*
Unless someone comes up with the magical SQL query that fixes every problem and leaves me with an UTF-8mb4 database
*
So, yeah, I think at the end of the day, the question would be: do you think I can safely go with the Notepad++ convert+encode route, and feel safe for the future, or must additional operations be performed? In your opinion?
Once again, thank you if you can provide feedback or help, I’m seriously grateful to you guys
*
Super late edit: scroll up, control-F “kaomoji”: and there’s the problem I can’t paste kaomojis anymore into the posts without their central character becoming corrupted, I could post them on the previous server that hosted hentairules, and yet, I tripled-checked, still latin1_swedish_ci, same encoding in phpmyadmin on the two servers, old and new.
It can’t be normal, maybe it’s also a symptom, or a clue? O_o
If you send a list of tables with fields, i can do a sql script that applies the solution on the link you posted before.
Link > https://coderwall.com/p/gjyuwg/mysql-convert-encoding-to-utf8-without-garbled-data
Email just sent ^_^
If you don’t receive it, lmk or check your spam folder.
Hello Oliver
Have you removed the site RSS feed?
I get an error trying to access it from my Reader, and the validator gives a 500 Error.
Someone else reported about it.
Apparently, the plugin for the blog feed that I have been using without pause over the last ten years, and that has never received a single update since then, isn’t compatible with latest server and wordpress software anymore.
I had to deactivate it, because it would still attempt to hijack the feed generation but would die in the process, leading to a whole lot of nothing.
Hopefully (I’ve got SO MUCH on my plate, I really really hope it’s enough!), give it an hour or two for the cache to be cleared, and it will work again, even if it’s a much more basical way.
LMK if it’s still broken, like, tomorrow?
The feed began feeding again. Yey!
Now, that’s a relief.
Does it work normally, say?
For me it seemed to, for one batch of posts at least (which I checked against the site to make sure nothing was missing). I’ll keep an eye on it as well.
It was working yesterday, but now it seems to be invalid.
The feed? I tested just now in Chrome (never worked for me in Firefox, even in old times, it launched the download of a file), and it works O_o
The RSS reader I use is quite fast to refuse a half-wrong feed, but I also tried with Thunderbird and it refuses it.
If I open it on Firefox, the last entry is Mujaki Na Kaibutsu, and coincidentally the validator points to an error in the following post (azukiko-single-mother-to-issho-ni-boku-no-mamakatsu-1)
http://www.feedvalidator.org/check.cgi?url=http%3a%2f%2fwww.hentairules.net%2ffeed%2f
Do a mySQL dump schema on both databases (the old one and the new version), and post it here. There’s a setting for character set when you create the table. You probably did it wrong/different in the new database.
https://www.mysqltutorial.org/mysql-character-set/
https://electrictoolbox.com/mysqldump-schema-only/
Erm, thanks for the suggestion FSK, if you want to take a look, me I only see as true difference the change from utf-8 (“the fake”) to utf-8 mb4 (“the true”).
old
https://www.hentairules.net/misc/structureoldserver.sql
new
https://www.hentairules.net/misc/structurenewserver.sql
Which is weird, no? As in phpmyadmin, well… this is like that:
Hey Oliver,
not only the charset matters, also the collate (which is what you see in your screenshot). It is in charge of how things are sorted when you do comparison SQL Queries.
But still in both dumps your charset is still either latin1 or utf8 and not the utf8mb4 you want to use ;-)
Mathias https://mathiasbynens.be/notes/mysql-utf8mb4 has written an excelent guide on that matter, but it is very technical.
I replied in another comment you wrote :o
Hey Oliver,
Notepad++ in gerneal is fine, but as you noticed, doing replaces on huge files is not the task it was made for. For such use a linux cli tool called sed, with this little champ you could work on 50GB+ Datafiles ;-)
Still people are correct, when they state you got to check other stuff as well. Currently your HTML Headers are fine, they state utf8. But the php connect command also has to use utf8 on connecting. Last but not least (current WIP problem) you data has to be stored in utf8.
p.s. no magic SQL command, but some that could get quite close.
But to answer your questions:
– generally move to utf8mb4
– Notepad++ convert+encode route: fesable but would not recommend. Use tools like sed, iconv etc. which are made for the job
– first fix your data encoding
– second fix maybe old leftover stuff with sed regex replace commands
– third fix table struc charset and collate
– forth import data correctly
– fifth use mysqldump with correct parameters to backup your DB (if possible) and not PHPmyadmin
Have a good day sir and stay healthy!
A quick answer, it’s getting late for me:
– thanks

– yep, my bad, your other comment, I didn’t see that line with utf8mb4 was a comment type, argh. So, ut8 and later latin1, orz.
– notepad vs sed or iconv: the problem is that… wait, you know that old saying, that technical things that are too advanced are accepted without thinking anymore, and treated like magic, right? This is it with Notepad++ fixes almost all encoded characters: no fucking clue what happened and why it worked, but it worked
To reproduce it it with iconv (which I very rarely used in the past, not even a clue why anymore, it’s old) or sed (I may have heard of it before; or not, not even sure), while I have no idea *what* has actually happened in N++, that’s… not very feasable, I shall delicately say.
I tried one or two iconv lines, I mention them in the previous post, but there’s no telling what lied behind the thing N++ called ANSI, to make it even trickier.
And maybe that’s not even why it worked with N++.
– “first fix your data encoding” – not sure what you mean, it’s too generic in my unskilled eyes. Do you mean convert latin1 tables to utf8mb4?
Haaaaaaa! I just thought. I could test that, the Notepad++ method, re-import it into the test database, and then convert the posts table to utf8mb4. Hell, nothing to lose, that’s what test databases are for, a control-S to tell my blog to use the test database, an F5 to see what it’s like, and back to the wordpress config file to tell it to use the official database again.
– fix the leftover stuff: wordpress search and replace queries are an old friend of mine. However, that will be ran last, you don’t patch a broken thing, you patch an almost working thing.
I said I would be quick and look at the time it took me to respond. I’ll end it now or I’ll lose precious sleep
Take care, and if my weirdly written replies mean something, feel free to react lol
Haha its late for me too. What I mean with fix the data is the big SQL Dump you have (from whatever it is to uft8 encoding, then the wierd leftover stuff). Then create the correct table stucture and import.
I also send you a very technical email :-) which bascially says make a plan, then execute and test
Oh my, that kaomoji is cutely lewd! You pervy Oliveey~?
Your RSS feed is broken now.