Major multilingualism URL issues

  • Further, as you create new URLs for separate languages, why don't you do so for all pages?


    https://www.domain.com/
    https://www.domain.com/members-list/
    https://www.domain.com/gallery/
    https://www.domain.com/blog/


    Etc.


    This means that parts of a multilingual website will be indexed and rank normally, BUT others will compete with each other and confuse Google.


    Searching Google for WoltLab pages turns up many examples of this problem.


    Basically, if a new URL is created, Google indexes one for DE and one for EN, which is fine: a new URL for a new language.


    But if the page URL stays the same and only the language changes, Google gets confused. With the app start pages, the members list, the entry list, the image list and many other pages, there is no separate URL for each language.


    This means Google doesn't know which language to show in the results. My homepage could be displayed in English one day and in Russian another, depending on what Google saw when it crawled that day and which language it picked.


    I would consider both of these issues bugs, and the second one very serious.


    Google has published numerous documents clearly stating that different languages need to be served on separate URLs. As a hard rule, they should all be domain.com/de/ or de.domain.com.
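
    To make the domain.com/de/ rule concrete, here is a minimal Python sketch that derives one URL per language for a page path and emits the `<link rel="alternate" hreflang=...>` annotations Google documents for tying language versions together. The language list, English as the unprefixed default, and the helper names are assumptions for illustration, not anything WoltLab ships.

```python
from urllib.parse import urljoin

BASE = "https://www.domain.com/"  # example domain used in this thread
DEFAULT_LANG = "en"               # assumption: the default language stays unprefixed
LANGS = ["en", "de", "ru"]        # assumption: enabled content languages


def localized_url(path: str, lang: str, scheme: str = "subdirectory") -> str:
    """Return the URL of `path` in `lang`, e.g. /gallery/ -> /de/gallery/."""
    path = path.lstrip("/")
    if lang == DEFAULT_LANG:
        return urljoin(BASE, path)
    if scheme == "subdirectory":
        # domain.com/de/... variant
        return urljoin(BASE, f"{lang}/{path}")
    # de.domain.com/... variant
    return f"https://{lang}.domain.com/{path}"


def hreflang_tags(path: str) -> list:
    """One alternate link per language, plus x-default pointing at the fallback."""
    tags = [
        f'<link rel="alternate" hreflang="{lang}" href="{localized_url(path, lang)}" />'
        for lang in LANGS
    ]
    tags.append(
        f'<link rel="alternate" hreflang="x-default" href="{localized_url(path, DEFAULT_LANG)}" />'
    )
    return tags


for tag in hreflang_tags("/members-list/"):
    print(tag)
```

    Each language version then has a stable address of its own, and the alternate links describe how the versions relate instead of leaving the crawler to guess.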

  • @Alexander Ebert @Marcel Werk


    Here is a list of woltlab.com user profile pages indexed in Google.


    https://www.google.co.uk/search?hl=en&safe=off&q=site:community.woltlab.com&btnG=Search&gws_rd=cr,ssl&ei=jAXRVMfSLNLWatrJgcgO#safe=off&hl=en&q=site:community.woltlab.com%2Fuser%2F


    Because you use the same URL for user pages regardless of the selected language, Google gets confused and indexes some in German and some in English.


    The next time Google crawls them, the pages may be indexed in a different language again, depending on what Google sees that day, which is pure chance.


    Granted, user profiles are not that important.


    But what happens with other important pages?


    The forum homepage for example.


    One day it's listed in Google in English, the next day in German.


    Do you think that is good for rankings, changing a page's main language on a daily basis?


    I'm afraid you really need to give this some thought. The only solution I can see is global secondary pages whenever multilingualism is enabled.


    So all pages must have a new URL when another language is enabled. Everything...

  • The content Google sees depends on several factors. One of them is the browser language: the software automatically switches the language based on it.


    The multiple content languages seen in Google are simply the result of multiple crawlers using different user agents and browser languages.
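
    A small Python simulation of this effect, under loudly simplified assumptions: one URL serves two hypothetical page versions, the server only looks at the first tag in the Accept-Language header (real negotiation also weighs q-values), and the search index is modeled as a plain dict keyed by URL. It is not how Google stores pages; it only shows why a single URL can end up indexed in whichever language the last crawler happened to request.

```python
# Hypothetical page served at one URL in two languages.
PAGES = {"en": "Members List", "de": "Mitgliederliste"}
DEFAULT = "en"


def serve(accept_language: str) -> str:
    """Simplified server: use the first listed language if it is available.

    Real content negotiation also weighs q-values; that is skipped on purpose.
    """
    first = accept_language.split(",")[0].split(";")[0].strip().lower()
    return PAGES.get(first.split("-")[0], PAGES[DEFAULT])


url = "https://www.domain.com/members-list/"
index = {}  # URL-keyed index: it can hold only one version per URL

for header in ["de-DE,de;q=0.9", "en-US,en;q=0.8", "de-AT,de;q=0.7"]:
    index[url] = serve(header)  # each crawl overwrites the previous version
    print(header, "->", index[url])
```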


    If that's not the reason, Google needs to be stopped from requesting the website with the l parameter (using robots.txt, for example), because that parameter is how the language is switched "manually".
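
    What such a robots.txt rule would match can be sketched in Python. This assumes rules of the form `Disallow: /*?l=` and `Disallow: /*&l=`, and the `*` (any sequence) and trailing `$` (end of URL) wildcards described in Google's robots.txt documentation. The rules and example paths are hypothetical, and only Disallow lines are handled, without Allow precedence.

```python
import re

# Hypothetical rules: block any URL whose query string carries the l parameter.
DISALLOW = ["/*?l=", "/*&l="]


def rule_to_regex(rule: str) -> re.Pattern:
    """Translate a robots.txt path rule: '*' = any sequence, trailing '$' = end."""
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    pattern = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))


def is_blocked(path_and_query: str) -> bool:
    # Rules match from the start of the path, hence .match() rather than .search().
    return any(rule_to_regex(rule).match(path_and_query) for rule in DISALLOW)


for path in ["/members-list/?l=2", "/index.php?user/1-john/&l=2", "/members-list/"]:
    print(path, "blocked" if is_blocked(path) else "allowed")
```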


    The other "issues" you mentioned also depend on several factors. One of them is the filename of the controller being used. While the default set of controllers is always named in English, others might use German names, and there is no way to translate them in the URL (for example, you cannot call /Beispiel/ to reach a controller named ExamplePage).
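
    The naming constraint can be illustrated with a simplified resolver: the URL segment is turned into a class name by convention, so a translated segment has nothing to resolve to. The dash-to-CamelCase-plus-`Page` convention and the controller set below are assumptions for this sketch, not WoltLab's actual routing code.

```python
from typing import Optional

# Hypothetical controller classes, named after their English URL segment.
CONTROLLERS = {"ExamplePage", "MembersListPage", "UserPage"}


def resolve(segment: str) -> Optional[str]:
    """Map a URL segment to a controller class by naming convention only."""
    # '/members-list/' -> ['members', 'list'] -> 'MembersListPage'
    words = segment.strip("/").split("-")
    class_name = "".join(word.capitalize() for word in words) + "Page"
    return class_name if class_name in CONTROLLERS else None


print(resolve("/Example/"))   # found: ExamplePage
print(resolve("/Beispiel/"))  # no BeispielPage class exists, so nothing matches
```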


    Cyrillic letters are not displayed in the URL because they are allowed neither in it nor in a filename. There can't be a file named "примерPage.class.php", so there is no URL /пример. If a topic title is added to the URL, Cyrillic letters might be stripped out or URL-encoded. That's how the internet works :)
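
    Both behaviours, stripping and URL-encoding, can be sketched with Python's standard library. The two slug functions are hypothetical helpers; `urllib.parse.quote` percent-encodes the UTF-8 bytes of each Cyrillic letter.

```python
import re
from urllib.parse import quote


def slug_stripped(title: str) -> str:
    """Keep only ASCII letters and digits; every other run collapses to '-'."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def slug_encoded(title: str) -> str:
    """Keep the words (\\w is Unicode-aware) and percent-encode their UTF-8 bytes."""
    words = re.findall(r"\w+", title.lower())
    return quote("-".join(words))


title = "Пример темы 2015"
print(slug_stripped(title))
print(slug_encoded(title))
```

    For this title, the stripping variant leaves only "2015", while the encoded variant keeps the words as percent-encoded byte sequences.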

  • Thanks for the reply. I do understand why all this happens; the question is how to fix it.


    Regarding Google crawling: a new URL should be created for every page in each additional language added through the multilingualism feature.


    It must be implemented completely.


    There are two areas: non-user-generated and user-generated content.


    Non-user-generated pages, such as forum categories, are already served via two URLs, one for each language. This needs to be extended to all pages: homepages, list pages, terms and conditions, everything.


    User-generated pages are also currently catered for, but only in part. Profiles should be multilingual like threads and blog posts: if a user signs up in Russian, they will fill in their profile in Russian, so why serve that page from an English URL with an English menu, and then let Google decide again the next day?


    To conclude, all pages must have their own URL. Two languages cannot share the same URL; it is not the way it is done. It is not the way "THE INTERNET WORKS" ;)


    Regarding Cyrillic in URLs, I understand it is not possible now, so for now let's focus on the big issue.

  • To conclude, all pages must have their own URL. Two languages cannot share the same URL; it is not the way it is done. It is not the way "THE INTERNET WORKS"

    Actually, that is what the Accept-Language header is for, and that's also what WoltLab uses if the l parameter is absent.


    If the l parameter is set, the language it denotes is used. So yes, if the parameter is set, you already have a distinct URL for each language. If it is not set, the Accept-Language header of the request is checked, which is a perfectly valid way to determine the client's preferred language.
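
    The precedence described here (l parameter first, then the Accept-Language header, then a site default) can be sketched as follows. The numeric l values and their language mapping are hypothetical, and matching is reduced to the primary language subtag weighted by q-value, a simplification of full HTTP content negotiation.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Hypothetical mapping of l parameter values to installed languages.
LANGUAGE_IDS = {"1": "en", "2": "de"}
DEFAULT = "en"


def from_accept_language(header: str) -> Optional[str]:
    """Best installed language from an Accept-Language header, by q-value."""
    best = None  # (q, -position, language): higher q wins, then earlier position
    for position, item in enumerate(header.split(",")):
        tag, *params = [part.strip() for part in item.split(";")]
        q = 1.0
        for param in params:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        language = tag.lower().split("-")[0]
        if q > 0 and language in LANGUAGE_IDS.values():
            candidate = (q, -position, language)
            if best is None or candidate > best:
                best = candidate
    return best[2] if best else None


def resolve_language(url: str, accept_language: str = "") -> str:
    """l parameter first, then Accept-Language, then the site default."""
    l_value = parse_qs(urlsplit(url).query).get("l", [None])[0]
    if l_value in LANGUAGE_IDS:
        return LANGUAGE_IDS[l_value]
    return from_accept_language(accept_language) or DEFAULT
```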


    I agree that some optimizations could be made; however, some things should also be considered:
    Cyrillic letters in URLs will never happen, as the protocol simply doesn't allow them. You would only ever have URL-encoded values, which, depending on the browser, look far uglier than plain English URLs. Take example.com/пример, for example. If you encode this to be a valid URL, you get example.com/%D0%BF%D1%80%D0%B8%D0%BC%D0%B5%D1%80. Some, but not all, browsers might pretty-print it as "example.com/пример". So I doubt WL will implement such a thing, as whether it looks good or not would depend on the browser. An alternative approach might be to use the ISO 9 system to transliterate Russian characters into Latin ones (test it here http://www.lexilogos.com/keyboard/russian_conversion.htm).
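
    A minimal Python sketch of the transliteration idea for URL slugs. Strict ISO 9 maps several letters to Latin characters with diacritics (ж to ž, ш to š), which would again need percent-encoding in a URL, so this sketch assumes a simplified ASCII-only table instead; the table is illustrative and is not the ISO 9 standard.

```python
import re

# Simplified ASCII-only Russian -> Latin table (assumption for this sketch, not ISO 9).
TABLE = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "yo",
    "ж": "zh", "з": "z", "и": "i", "й": "j", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "kh", "ц": "c", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "", "ы": "y", "ь": "", "э": "e", "ю": "yu", "я": "ya",
}


def transliterate_slug(title: str) -> str:
    """Transliterate Cyrillic letters, then reduce to an ASCII, dash-separated slug."""
    latin = "".join(TABLE.get(char, char) for char in title.lower())
    return re.sub(r"[^a-z0-9]+", "-", latin).strip("-")


print(transliterate_slug("Пример темы"))
```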


    You are right: offering a way to translate routes, e.g. example.com/user/1-xxx to example.com/член/1-xxx, would be nice (please don't hold me to the accuracy of that translation; I just put it through Google Translate). Maybe I'll develop a plugin for that.
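
    A plugin along those lines could keep a per-language alias table for the controller segment and rewrite it in both directions. The table, the function names and the Russian alias (reused from the example above, with the same caveat about its accuracy) are hypothetical; incoming paths are URL-decoded first because browsers send Cyrillic segments percent-encoded.

```python
from urllib.parse import unquote

# Hypothetical per-language aliases: localized segment -> controller's own segment.
ROUTE_ALIASES = {
    "de": {"benutzer": "user", "mitglieder": "members-list"},
    "ru": {"член": "user"},
}


def to_canonical(path: str, lang: str) -> str:
    """Rewrite a localized first segment (incoming request) to the controller's segment."""
    first, sep, rest = path.lstrip("/").partition("/")
    first = unquote(first).lower()
    canonical = ROUTE_ALIASES.get(lang, {}).get(first, first)
    return "/" + canonical + sep + rest


def to_localized(path: str, lang: str) -> str:
    """Inverse direction, used when generating links for pages in `lang`."""
    first, sep, rest = path.lstrip("/").partition("/")
    reverse = {target: alias for alias, target in ROUTE_ALIASES.get(lang, {}).items()}
    return "/" + reverse.get(first, first) + sep + rest
```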


    Basically, all the issues you mentioned boil down to the same source: the controller part of the URL is not translated.

  • ?l= is only appended on the first page load; subsequent page views drop it, meaning it is NOT a new URL. It is simply for tracking, like cookies, and is not best practice.
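
    The observation above can be modeled in a few lines of Python: once the l parameter is removed, the English and German versions collapse onto one address. The URLs and the l values 1 and 2 are hypothetical; this only models the behaviour described in this post, not WoltLab's code.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def without_l(url: str) -> str:
    """Drop the l parameter, i.e. the address left after the first page load."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "l"]
    return urlunsplit(parts._replace(query=urlencode(query)))


english = without_l("https://www.domain.com/members-list/?l=1")
german = without_l("https://www.domain.com/members-list/?l=2")
# Both language versions now share a single address.
print(english == german, english)
```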


    This is not about detecting the language; it is about duplicate content, mixed-language URLs, and massive technical shortcomings regarding SEO, crawling and search engines.


    If a page's content changes significantly, i.e. it is in a completely new language, it should be on a new URL of its own. Period.
