Developer forum


How does the "normalise characters" feature work?

Signe

Hi all

We need to redirect a lot of URLs on a website, because links have been indexed without being normalised. We want to enable the "Normalize latin characters (ø->oe, é->e etc.)" feature in the Management Center, but since it unfortunately doesn't redirect the old indexed URLs, we need to do this manually.

So we need to know which rules DW uses to rewrite the URLs when "Normalize latin characters" is enabled. As explained, "ø" becomes "oe", but what about all the other characters, like í? There must be some sort of list somewhere :)

And also: what about Russian letters? What are the rules there?

 


Replies

 
Nicolai Pedersen

Hi Signe

In Dynamicweb 8 we have a character table that is used to replace the characters (originally crafted by a guy called Evan Stein).

The table is below. The basic rule is that for most characters we simply remove the diacritics: the slash from ø, the 'bolle' (ring) over å, the acute from é, etc. There are a few exceptions, e.g. ø is replaced with oe. Which version are you using? We can probably make a simple change to do the replacement automatically.

In Dynamicweb 9 we have a different approach.

A: A
B: B
C: C
D: D
E: E
F: F
G: G
H: H
I: I
J: J
K: K
L: L
M: M
N: N
O: O
P: P
Q: Q
R: R
S: S
T: T
U: U
V: V
W: W
X: X
Y: Y
Z: Z
a: a
b: b
c: c
d: d
e: e
f: f
g: g
h: h
i: i
j: j
k: k
l: l
m: m
n: n
o: o
p: p
q: q
r: r
s: s
t: t
u: u
v: v
w: w
x: x
y: y
z: z
ª: a
º: o
À: A
Á: A
Â: A
Ã: A
Ä: A
Å: Aa
Æ: Ae
Ç: C
È: E
É: E
Ê: E
Ë: E
Ì: I
Í: I
Î: I
Ï: I
Ð: D
Ñ: N
Ò: O
Ó: O
Ô: O
Õ: O
Ö: O
Ø: Oe
Ù: U
Ú: U
Û: U
Ü: U
Ý: Y
Þ: Th
ß: s
à: a
á: a
â: a
ã: a
ä: a
å: aa
æ: Ae
ç: c
è: e
é: e
ê: e
ë: e
ì: i
í: i
î: i
ï: i
ð: d
ñ: n
ò: o
ó: o
ô: o
õ: o
ö: o
ø: oe
ù: u
ú: u
û: u
ü: u
ý: y
þ: th
ÿ: y
Ā: A
ā: a
Ă: A
ă: a
Ą: A
ą: a
Ć: C
ć: c
Ĉ: C
ĉ: c
Ċ: C
ċ: c
Č: C
č: c
Ď: D
ď: d
Đ: D
đ: d
Ē: E
ē: e
Ĕ: E
ĕ: e
Ė: E
ė: e
Ę: E
ę: e
Ě: E
ě: e
Ĝ: G
ĝ: g
Ğ: G
ğ: g
Ġ: G
ġ: g
Ģ: G
ģ: g
Ĥ: H
ĥ: h
Ħ: H
ħ: h
Ĩ: I
ĩ: i
Ī: I
ī: i
Ĭ: I
ĭ: i
Į: I
į: i
İ: I
ı: i
Ĳ: I
ĳ: i
Ĵ: J
ĵ: j
Ķ: K
ķ: k
ĸ: k
Ĺ: L
ĺ: l
Ļ: L
ļ: l
Ľ: L
ľ: l
Ŀ: L
ŀ: l
Ł: L
ł: l
Ń: N
ń: n
Ņ: N
ņ: n
Ň: N
ň: n
ŉ: 'n
Ŋ: NG
ŋ: ng
Ō: O
ō: o
Ŏ: O
ŏ: o
Ő: O
ő: o
Œ: OE
œ: oe
Ŕ: R
ŕ: r
Ŗ: R
ŗ: r
Ř: R
ř: r
Ś: S
ś: s
Ŝ: S
ŝ: s
Ş: S
ş: s
Š: S
š: s
Ţ: T
ţ: t
Ť: T
ť: t
Ŧ: T
ŧ: t
Ũ: U
ũ: u
Ū: U
ū: u
Ŭ: U
ŭ: u
Ů: U
ů: u
Ű: U
ű: u
Ų: U
ų: u
Ŵ: W
ŵ: w
Ŷ: Y
ŷ: y
Ÿ: Y
Ź: Z
ź: z
Ż: Z
ż: z
Ž: Z
ž: z
ſ: s
ƀ: b
Ɓ: B
Ƃ: B
ƃ: b
Ƅ: 6
ƅ: 6
Ɔ: O
Ƈ: C
ƈ: c
Ɖ: D
Ɗ: D
Ƌ: D
ƌ: d
ƍ: d
Ǝ: E
Ə: E
Ɛ: E
Ƒ: F
ƒ: f
Ɠ: G
Ɣ: G
ƕ: hv
Ɩ: I
Ɨ: I
Ƙ: K
ƙ: k
ƚ: l
ƛ: l
Ɯ: M
Ɲ: N
ƞ: n
Ɵ: O
Ơ: O
ơ: o
Ƣ: OI
ƣ: oi
Ƥ: P
ƥ: p
Ʀ: YR
Ƨ: 2
ƨ: 2
Ʃ: S
ƪ: s
ƫ: t
Ƭ: T
ƭ: t
Ʈ: T
Ư: U
ư: u
Ʊ: u
Ʋ: V
Ƴ: Y
ƴ: y
Ƶ: Z
ƶ: z
Ʒ: Z
Ƹ: Z
ƹ: Z
ƺ: z
ƻ: 2
Ƽ: 5
ƽ: 5
ƾ: ´
ƿ: w
ǀ: !
ǁ: !
ǂ: !
ǃ: !
Ǆ: DZ
ǅ: DZ
ǆ: d
Ǉ: Lj
ǈ: Lj
ǉ: lj
Ǌ: NJ
ǋ: NJ
ǌ: nj
Ǎ: A
ǎ: a
Ǐ: I
ǐ: i
Ǒ: O
ǒ: o
Ǔ: U
ǔ: u
Ǖ: U
ǖ: u
Ǘ: U
ǘ: u
Ǚ: U
ǚ: u
Ǜ: U
ǜ: u
ǝ: e
Ǟ: A
ǟ: a
Ǡ: A
ǡ: a
Ǣ: Ae
ǣ: Ae
Ǥ: G
ǥ: g
Ǧ: G
ǧ: g
Ǩ: K
ǩ: k
Ǫ: O
ǫ: o
Ǭ: O
ǭ: o
Ǯ: Z
ǯ: Z
ǰ: j
Ǳ: DZ
ǲ: DZ
ǳ: dz
Ǵ: G
ǵ: g
Ƕ: hv
Ƿ: w
Ǹ: N
ǹ: n
Ǻ: A
ǻ: a
Ǽ: Ae
ǽ: Ae
Ǿ: O
ǿ: o
Ȁ: A
ȁ: a
Ȃ: A
ȃ: a
Ȅ: E
ȅ: e
Ȇ: E
ȇ: e
Ȉ: I
ȉ: i
Ȋ: I
ȋ: i
Ȍ: O
ȍ: o
Ȏ: O
ȏ: o
Ȑ: R
ȑ: r
Ȓ: R
ȓ: r
Ȕ: U
ȕ: u
Ȗ: U
ȗ: u
Ș: S
ș: s
Ț: T
ț: t
Ȝ: Z
ȝ: z
Ȟ: H
ȟ: h
Ƞ: N
ȡ: d
Ȣ: OU
ȣ: ou
Ȥ: Z
ȥ: z
Ȧ: A
ȧ: a
Ȩ: E
ȩ: e
Ȫ: O
ȫ: o
Ȭ: O
ȭ: o
Ȯ: O
ȯ: o
Ȱ: O
ȱ: o
Ȳ: Y
ȳ: y
ȴ: l
ȵ: n
ȶ: t
ɐ: a
ɑ: a
ɒ: a
ɓ: b
ɔ: o
ɕ: c
ɖ: d
ɗ: d
ɘ: e
ə: e
ɚ: e
ɛ: e
ɜ: e
ɝ: e
ɞ: e
ɟ: j
ɠ: g
ɡ: g
ɢ: G
ɣ: g
ɤ: y
ɥ: h
ɦ: h
ɧ: h
ɨ: i
ɩ: i
ɪ: I
ɫ: l
ɬ: l
ɭ: l
ɮ: lz
ɯ: m
ɰ: m
ɱ: m
ɲ: n
ɳ: n
ɴ: N
ɵ: o
ɶ: OE
ɷ: o
ɸ: ph
ɹ: r
ɺ: r
ɻ: r
ɼ: r
ɽ: r
ɾ: r
ɿ: r
ʀ: R
ʁ: r
ʂ: s
ʃ: s
ʄ: j
ʅ: s
ʆ: s
ʇ: y
ʈ: t
ʉ: u
ʊ: u
ʋ: u
ʌ: v
ʍ: w
ʎ: y
ʏ: Y
ʐ: z
ʑ: z
ʒ: z
ʓ: z
ʔ: '
ʕ: '
ʖ: '
ʗ: C
ʘ: O˜
ʙ: B
ʚ: e
ʛ: G
ʜ: H
ʝ: j
ʞ: k
ʟ: L
ʠ: q
ʡ: '
ʢ: '
ʣ: dz
ʤ: dz
ʥ: dz
ʦ: ts
ʧ: ts
ʨ:
ʩ: fn
ʪ: ls
ʫ: lz
ʬ: w
ʭ: t
ʮ: h
ʯ: h
ʰ: h
ʱ: h
ʲ: j
ʳ: r
ʴ: r
ʵ: r
ʶ: R
ʷ: w
ʸ: y
ˡ: l
ˢ: s
ˣ: x
ˤ: '
ᴀ: A
ᴁ: Ae
ᴂ: Ae
ᴃ: B
ᴄ: C
ᴅ: D
ᴆ: TH
ᴇ: E
ᴈ: e
ᴉ: i
ᴊ: J
ᴋ: K
ᴌ: L
ᴍ: M
ᴎ: N
ᴏ: O
ᴐ: O
ᴑ: o
ᴒ: o
ᴓ: o
ᴔ: oe
ᴕ: ou
ᴖ: o
ᴗ: o
ᴘ: P
ᴙ: R
ᴚ: R
ᴛ: T
ᴜ: U
ᴝ: u
ᴞ: u
ᴟ: m
ᴠ: V
ᴡ: W
ᴢ: Z
ᴣ: EZH
ᴤ: '
ᴥ: L
ᴬ: A
ᴭ: Ae
ᴮ: B
ᴯ: B
ᴰ: D
ᴱ: E
ᴲ: E
ᴳ: G
ᴴ: H
ᴵ: I
ᴶ: J
ᴷ: K
ᴸ: L
ᴹ: M
ᴺ: N
ᴻ: N
ᴼ: O
ᴽ: OU
ᴾ: P
ᴿ: R
ᵀ: T
ᵁ: U
ᵂ: W
ᵃ: a
ᵄ: a
ᵆ: Ae
ᵇ: b
ᵈ: d
ᵉ: e
ᵊ: e
ᵋ: e
ᵌ: e
ᵍ: g
ᵎ: i
ᵏ: k
ᵐ: m
ᵑ: g
ᵒ: o
ᵓ: o
ᵔ: o
ᵕ: o
ᵖ: p
ᵗ: t
ᵘ: u
ᵙ: u
ᵚ: m
ᵛ: v
ᵢ: i
ᵣ: r
ᵤ: u
ᵥ: v
ᵫ: ue
Ḁ: A
ḁ: a
Ḃ: B
ḃ: b
Ḅ: B
ḅ: b
Ḇ: B
ḇ: b
Ḉ: C
ḉ: c
Ḋ: D
ḋ: d
Ḍ: D
ḍ: d
Ḏ: D
ḏ: d
Ḑ: D
ḑ: d
Ḓ: D
ḓ: d
Ḕ: E
ḕ: e
Ḗ: E
ḗ: e
Ḙ: E
ḙ: e
Ḛ: E
ḛ: e
Ḝ: E
ḝ: e
Ḟ: F
ḟ: f
Ḡ: G
ḡ: g
Ḣ: H
ḣ: h
Ḥ: H
ḥ: h
Ḧ: H
ḧ: h
Ḩ: H
ḩ: h
Ḫ: H
ḫ: h
Ḭ: I
ḭ: i
Ḯ: I
ḯ: i
Ḱ: K
ḱ: k
Ḳ: K
ḳ: k
Ḵ: K
ḵ: k
Ḷ: L
ḷ: l
Ḹ: L
ḹ: l
Ḻ: L
ḻ: l
Ḽ: L
ḽ: l
Ḿ: M
ḿ: m
Ṁ: M
ṁ: m
Ṃ: M
ṃ: m
Ṅ: N
ṅ: n
Ṇ: N
ṇ: n
Ṉ: N
ṉ: n
Ṋ: N
ṋ: n
Ṍ: O
ṍ: o
Ṏ: O
ṏ: o
Ṑ: O
ṑ: o
Ṓ: O
ṓ: o
Ṕ: P
ṕ: p
Ṗ: P
ṗ: p
Ṙ: R
ṙ: r
Ṛ: R
ṛ: r
Ṝ: R
ṝ: r
Ṟ: R
ṟ: r
Ṡ: S
ṡ: s
Ṣ: S
ṣ: s
Ṥ: S
ṥ: s
Ṧ: S
ṧ: s
Ṩ: S
ṩ: s
Ṫ: T
ṫ: t
Ṭ: T
ṭ: t
Ṯ: T
ṯ: t
Ṱ: T
ṱ: t
Ṳ: U
ṳ: u
Ṵ: U
ṵ: u
Ṷ: U
ṷ: u
Ṹ: U
ṹ: u
Ṻ: U
ṻ: u
Ṽ: V
ṽ: v
Ṿ: V
ṿ: v
Ẁ: W
ẁ: w
Ẃ: W
ẃ: w
Ẅ: W
ẅ: w
Ẇ: W
ẇ: w
Ẉ: W
ẉ: w
Ẋ: X
ẋ: x
Ẍ: X
ẍ: x
Ẏ: Y
ẏ: y
Ẑ: Z
ẑ: z
Ẓ: Z
ẓ: z
Ẕ: Z
ẕ: z
ẖ: h
ẗ: t
ẘ: w
ẙ: y
ẚ: a
ẛ: s
Ạ: A
ạ: a
Ả: A
ả: a
Ấ: A
ấ: a
Ầ: A
ầ: a
Ẩ: A
ẩ: a
Ẫ: A
ẫ: a
Ậ: A
ậ: a
Ắ: A
ắ: a
Ằ: A
ằ: a
Ẳ: A
ẳ: a
Ẵ: A
ẵ: a
Ặ: A
ặ: a
Ẹ: E
ẹ: e
Ẻ: E
ẻ: e
Ẽ: E
ẽ: e
Ế: E
ế: e
Ề: E
ề: e
Ể: E
ể: e
Ễ: E
ễ: e
Ệ: E
ệ: e
Ỉ: I
ỉ: i
Ị: I
ị: i
Ọ: O
ọ: o
Ỏ: O
ỏ: o
Ố: O
ố: o
Ồ: O
ồ: o
Ổ: O
ổ: o
Ỗ: O
ỗ: o
Ộ: O
ộ: o
Ớ: O
ớ: o
Ờ: O
ờ: o
Ở: O
ở: o
Ỡ: O
ỡ: o
Ợ: O
ợ: o
Ụ: U
ụ: u
Ủ: U
ủ: u
Ứ: U
ứ: u
Ừ: U
ừ: u
Ử: U
ử: u
Ữ: U
ữ: u
Ự: U
ự: u
Ỳ: Y
ỳ: y
Ỵ: Y
ỵ: y
Ỷ: Y
ỷ: y
Ỹ: Y
ỹ: y
ⁱ: i
ⁿ: n
K: K
Å: A
ℬ: B
ℭ: C
ℯ: e
ℰ: E
ℱ: F
Ⅎ: F
ℳ: M
ℴ: 0
℺: 0
⅁: G
⅂: L
⅃: L
⅄: Y
ⅅ: D
ⅆ: d
ⅇ: e
ⅈ: i
ⅉ: j
ﬀ: ff
ﬁ: fi
ﬂ: fl
ﬃ: ffi
ﬄ: ffl
ﬅ: st
ﬆ: st
Ａ: A
Ｂ: B
Ｃ: C
Ｄ: D
Ｅ: E
Ｆ: F
Ｇ: G
Ｈ: H
Ｉ: I
Ｊ: J
Ｋ: K
Ｌ: L
Ｍ: M
Ｎ: N
Ｏ: O
Ｐ: P
Ｑ: Q
Ｒ: R
Ｓ: S
Ｔ: T
Ｕ: U
Ｖ: V
Ｗ: W
Ｘ: X
Ｙ: Y
Ｚ: Z
ａ: a
ｂ: b
ｃ: c
ｄ: d
ｅ: e
ｆ: f
ｇ: g
ｈ: h
ｉ: i
ｊ: j
ｋ: k
ｌ: l
ｍ: m
ｎ: n
ｏ: o
ｐ: p
ｑ: q
ｒ: r
ｓ: s
ｔ: t
ｕ: u
ｖ: v
ｗ: w
ｘ: x
ｙ: y
ｚ: z
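
For anyone building the redirect list by hand, the basic rule described above (strip the diacritics, with a few multi-letter exceptions) can be approximated in a few lines. This is only a sketch in Python, not the Dynamicweb implementation: the function name `normalize_latin` and the `EXCEPTIONS` dictionary are made up for illustration, and the dictionary holds just a handful of rows copied from the table. Characters that have no Unicode decomposition (ł or đ, for example) pass through unchanged here, whereas the real table maps them explicitly, so the table stays the authority.

```python
import unicodedata

# A few multi-letter rows copied from the table above (illustrative excerpt only).
EXCEPTIONS = {
    "Å": "Aa", "å": "aa",
    "Ø": "Oe", "ø": "oe",
    "Þ": "Th", "þ": "th",
    "ß": "s",
}


def normalize_latin(text: str) -> str:
    """Approximate the table: apply the exceptions first, then strip diacritics."""
    # Compose first so that "a" + combining ring is seen as the single character "å".
    text = unicodedata.normalize("NFC", text)
    out = []
    for ch in text:
        if ch in EXCEPTIONS:
            out.append(EXCEPTIONS[ch])
            continue
        # Decompose into base letter + combining marks, then drop the marks.
        decomposed = unicodedata.normalize("NFD", ch)
        out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)


print(normalize_latin("smørrebrød"))  # smoerrebroed
print(normalize_latin("Århus-café"))  # Aarhus-cafe
```

The exceptions are checked before decomposition because å would otherwise decompose to a plain "a" and lose the "aa" mapping from the table.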

 
Signe

Hi Nicolai

Thank you for your reply :)

The website is using v. 8.6.1.14 :) Can we do something smart here, do you think?

 
Nicolai Pedersen

Hi Signe

If you can upgrade the customer to the latest 8.8.1, we can give you a release with a built-in feature to do this.

The alternative is to write a small notification subscriber (Dynamicweb.Notifications.Standard.Application.BeginRequest) that detects whether the URL contains the unwanted characters, normalizes the latin characters in it, and does a 301 redirect to the normalized URL. That would take about an hour to do.
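
The decision such a subscriber has to make can be sketched independently of the platform. The snippet below is Python for illustration only; a real subscriber would be .NET code hooked to the BeginRequest notification, which is not shown. The names `redirect_for`, `normalize_latin` and `EXCEPTIONS` are hypothetical, and the exceptions are a tiny excerpt of the character table posted earlier in the thread.

```python
import unicodedata
from typing import Optional, Tuple
from urllib.parse import unquote

# Tiny excerpt of the character table (illustrative only).
EXCEPTIONS = {"Å": "Aa", "å": "aa", "Ø": "Oe", "ø": "oe"}


def normalize_latin(text: str) -> str:
    """Apply the exceptions, then strip combining diacritics."""
    out = []
    for ch in unicodedata.normalize("NFC", text):
        if ch in EXCEPTIONS:
            out.append(EXCEPTIONS[ch])
        else:
            decomposed = unicodedata.normalize("NFD", ch)
            out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)


def redirect_for(raw_path: str) -> Optional[Tuple[int, str]]:
    """Return (301, normalized_path) if the path needs a redirect, else None."""
    path = unquote(raw_path)  # browsers send "ø" as "%C3%B8"
    normalized = normalize_latin(path)
    if normalized == path:
        return None  # already normalized: serve the page as usual
    return (301, normalized)


print(redirect_for("/produkter/r%C3%B8d-gr%C3%B8d"))  # (301, '/produkter/roed-groed')
print(redirect_for("/produkter/roed-groed"))          # None
```

Returning None for paths that are already normalized matters: without that check the redirect would point at itself and loop.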

BR Nicolai

 
Signe

Hi Nicolai

Sounds good - should I email you (?) about the notification subscriber? Because that sounds like a solution we would like to go with :)

 

 
Nicolai Pedersen

Yes, if you want us to do it.

 
Signe

Hi Nicolai

We were able to solve this with the help of the list you sent us - thanks :)

But what about the Russian characters? They don't seem to be normalised.

See here: http://rasi.d.pr/gX5t/CNh7vr1v

How can we ensure SEO-friendly URLs with DW here?

 

 
Nicolai Pedersen

Hi Signe

Glad you figured it out.

According to Google, non-latin characters in URLs are not a problem: https://productforums.google.com/forum/?hl=en#!category-topic/webmasters/crawling-indexing--ranking/CrjbCMU8MtM

Wikipedia uses Russian letters in their URLs: https://ru.wikipedia.org/wiki/

It is a W3C standard: https://www.w3.org/International/articles/idn-and-iri/

And according to other SEO experts, it's just fine: http://webmasters.stackexchange.com/questions/92163/seo-impact-of-using-non-latin-characters-in-url

Same goes for æ, ø and å.

So my advice would be: let it be. It is just fine, and probably a benefit for Russian queries in Google. The URLs are search friendly already.

But if you want, it can be converted using code: http://stackoverflow.com/questions/1841874/how-to-transliterate-cyrillic-to-latin-text
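
As a rough idea of what such a conversion involves, here is a minimal sketch in Python (for illustration only; it is not a Dynamicweb feature). The mapping below is one informal romanization chosen for this example. Several competing conventions exist, so the table, the name `CYRILLIC_TO_LATIN` and the function `transliterate` are all assumptions to adapt.

```python
# One informal romanization of the 33 Russian letters (an assumption; pick
# whichever convention suits the site and keep it stable once URLs are indexed).
CYRILLIC_TO_LATIN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "yo",
    "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "", "ы": "y", "ь": "", "э": "e", "ю": "yu", "я": "ya",
}


def transliterate(text: str) -> str:
    """Replace Russian letters with Latin ones; leave everything else as-is."""
    out = []
    for ch in text:
        latin = CYRILLIC_TO_LATIN.get(ch.lower())
        if latin is None:
            out.append(ch)  # not a Russian letter: keep unchanged
        elif ch.isupper():
            out.append(latin.capitalize())
        else:
            out.append(latin)
    return "".join(out)


print(transliterate("новости"))  # novosti
print(transliterate("Москва"))   # Moskva
```

The result could then be typed into the URL field on the page properties mentioned below, or generated in bulk for a redirect list.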

You can also use the URL field on page properties to override what names are used in URLs, see manual: http://doc.dynamicweb.com/documentation-8/content/content/pages#sideNavTitle1-1-7

Hope this clarifies.

BR Nicolai

 
Signe

Hi Nicolai

Russian letters are okay for SEO, yes, but for tracking (see previous screenshot - http://rasi.d.pr/gX5t/CNh7vr1v) it doesn't work properly. Some URLs are also difficult to handle in Analytics and GTM because the URLs are encoded as very long strings, so best practice would be to normalize these letters as well. Can DW handle this?
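
The "very long strings" come from percent-encoding: each Cyrillic letter is two bytes in UTF-8, and each byte is written as three characters. A small Python illustration, where the path is a made-up example and not taken from the site in question:

```python
from urllib.parse import quote, unquote

path = "/новости/о-компании"  # hypothetical page path
encoded = quote(path)          # what typically ends up in tracking reports

print(encoded)
print(len(path), "->", len(encoded))  # 19 -> 99

# The encoding is reversible, so reports can be decoded for readability.
assert unquote(encoded) == path
```

Since the encoding is lossless, decoding the paths in the reporting layer is an alternative to changing the URLs themselves.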

 

 
Nicolai Pedersen

Hi Signe

DW cannot convert Russian URLs to something else automatically.

Can you provide some links where Google verifies this best practice?

My experience with GTM and, e.g., in-page analytics in Google Analytics is limited, but reading their docs tells me that it should not be a problem in general. Some characters that are not standard UTF-8 characters could be an issue, which can be solved by converting the URL to punycode in the tracking script.

BR Nicolai

 
