Implemented Changing a few words in the filter

Discussion in 'Archive (Suggestion and Feedback)' started by LordFungi, May 15, 2016.

Thread Status:
Not open for further replies.
  1. tyler489

    tyler489 Well-Known Member

    I'm going to add chugga_fan to the censored word list ;) That way everything he says is censored...

    Because it works that way, right? jk
     
    Yorinar and F4lconwings like this.
  2. F4lconwings

    F4lconwings Well-Known Member

    11/10
     
  3. Yorinar

    Yorinar Well-Known Member

    Because servers can't be constantly policed, automatic filtering is necessary. And because simply filtering words can't take context and the flexibility of language into account, there will always be false positives ("i worked hard on this") and false negatives ("this list sucks"). Ultimately, the issue is keeping discourse civil, because no one wants to hear some kid spouting off a string of obscenities like we're playing Call of Duty. And no word filter is ever going to accomplish that.
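The false-positive problem can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the one-word blocklist and the helper names `naive_flag` and `boundary_flag` are hypothetical, and nothing here reflects how the server's plugin actually works.

```python
import re

# Hypothetical blocklist, for illustration only.
BLOCKLIST = ["hell"]

def naive_flag(message: str) -> bool:
    """Flag a message if a blocked word appears anywhere as a substring."""
    lowered = message.lower()
    return any(word in lowered for word in BLOCKLIST)

def boundary_flag(message: str) -> bool:
    """Flag a message only if a blocked word appears as a whole word."""
    alternatives = "|".join(re.escape(word) for word in BLOCKLIST)
    pattern = r"\b(?:" + alternatives + r")\b"
    return re.search(pattern, message, re.IGNORECASE) is not None
```

With this sketch, `naive_flag("hello everyone")` flags an innocent greeting, while `boundary_flag` lets it through. Neither version flags "this list sucks", because a list-based filter only knows the words it was given.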

    I agree that list is antiquated though. It has things like "spook" on it, that even my grandfather forgot used to be a racist slur.
     
  4. aD0UBLEj

    aD0UBLEj Well-Known Member

    These are actual swear words that I feel are inappropriate for the public forum (if it doesn't filter them out on here; I haven't tested). Is it possible to PM you about it?
     
  5. chugga_fan

    chugga_fan ME 4M storage cell of knowledge, all the time

    @SirWill because this is the proper thread for it (you posted the pastebin link in the wrong thread), can you post it here and also tell me how they do case-insensitive regexes? Thanks a lot.
     
  6. SirWill

    SirWill Founder

    Oops, too many forum tabs open :lurking:
    Do you mean ignoring or respecting case sensitivity?


    Here is the list:
    Pastebin.com
     
  7. chugga_fan

    chugga_fan ME 4M storage cell of knowledge, all the time

    Ignoring case sensitivity, because if I don't, my regex is going to hit over 3k columns.
     
  8. The_Icy_One

    The_Icy_One Procrastinates by doing work

    I'd assume they just convert the string to lowercase before checking against the regex.
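A minimal sketch of that assumption, written in Python for illustration. The pattern and the name `is_blocked` are hypothetical, and whether the plugin really normalizes input this way is not confirmed anywhere in this thread.

```python
import re

# Hypothetical filter pattern, written in lowercase only.
FILTER = re.compile(r"insult")

def is_blocked(message: str) -> bool:
    """Lowercase the message first, then test it against the lowercase pattern."""
    return FILTER.search(message.lower()) is not None
```

Because the input is lowercased before matching, the pattern itself never needs to list uppercase variants.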
     
  9. chugga_fan

    chugga_fan ME 4M storage cell of knowledge, all the time

    I'm not sure how the plugin works; never assume that, ever. As it goes, I think I have it down to only 100 words to match, and it works in such a way that it detects any word with one of those words inside it, but not when they show up in a larger English sentence that wouldn't put them in that context. Fun, right? :D
     
  10. SirWill

    SirWill Founder

    Use i as a modifier.
    Like:
    /test/i
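The `/test/i` form is the delimiter-and-flag syntax used by PCRE-style and JavaScript regexes. As a rough equivalent in Python (chosen here only for illustration; the plugin's regex engine is not stated in this thread), the same effect comes from the `re.IGNORECASE` flag or the inline `(?i)` prefix:

```python
import re

# Flag passed as a separate argument.
with_flag = re.compile(r"test", re.IGNORECASE)

# Same behaviour using the inline modifier at the start of the pattern.
with_inline = re.compile(r"(?i)test")

# Without either, matching is case-sensitive.
case_sensitive = re.compile(r"test")

def found(pattern: re.Pattern, text: str) -> bool:
    """Return True if the pattern matches anywhere in the text."""
    return pattern.search(text) is not None
```

Both case-insensitive variants match "TeSt" and "TEST", while the plain pattern only matches the exact lowercase form.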
     
  11. chugga_fan

    chugga_fan ME 4M storage cell of knowledge, all the time

    Got it, so I SHOULD be done soon. I just have to make this one regex for spaces so that it doesn't catch whole sentences that by chance use both words, and then I should be done :D
     
  12. SirWill

    SirWill Founder

    But I think the plugin already does this. Just test it by writing a blacklisted word in upper case on a server ;)
     
  13. chugga_fan

    chugga_fan ME 4M storage cell of knowledge, all the time

    I'm not on atm, so that's why I asked ;) But yeah, I should be done soon, and I have added and removed some words in the context of others.
     
  14. The_Icy_One

    The_Icy_One Procrastinates by doing work

    The assumption was mostly based on the fact that the words are all in lowercase on the filter list, but are blocked in any case when used.
     
  15. F4lconwings

    F4lconwings Well-Known Member

    I am not sure which code they use, but in most cases the binary code for a single letter is split into two parts:
    The letter itself (for example 01000001 for the letter a) and a 1 after it meaning capital letter. So 01000001 1 = A.
    So the part that counts is the first one, and the filter would even recognize InSulT as inappropriate, if it was in the list.
     
  16. chugga_fan

    chugga_fan ME 4M storage cell of knowledge, all the time

    None of what you said makes sense; a standard char is one byte long. Here, have this ASCII conversion chart to explain it: Ascii Table - ASCII character codes and html, octal, hex and decimal chart conversion


    Edit: for the word "ass", change it to "ass[^ess]", as I completely messed that one up and can't edit the paste.
    Here is my revised regular expression; critique it as you will. I removed "hell" (that's not really a swear) and added another. I also removed a lot of words from the list, as they contained other words and as such made the regular expression catch the words that contain them as well. Words with spaces don't match if you have an actual sentence where they're irrelevant, and the regex still catches it if people put any character in between them in hopes of bypassing the filter. It's not too complex. To test it I used Online regex tester and debugger: JavaScript, Python, PHP, and PCRE to make sure that it worked. I can fix anything if there's a complaint about it.
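As a side note on the `ass[^ess]` pattern from the edit above: in standard regex syntax `[^ess]` is a negated character class, so it matches exactly one character that is neither "e" nor "s" (the repeated "s" changes nothing). The Python sketch below only illustrates that behaviour; the helper name `hits` is hypothetical, and the sample inputs are mine, not taken from the pastebin list.

```python
import re

# The pattern as posted; [^ess] behaves the same as [^es].
PATTERN = re.compile(r"ass[^ess]", re.IGNORECASE)

def hits(text: str) -> bool:
    """Return True if the posted pattern matches anywhere in the text."""
    return PATTERN.search(text) is not None
```

Under this reading, "assess" and "passed" are left alone, but "assume" and "class is over" still match, and the bare word at the very end of a message does not match because the class requires one more character after it.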
     
    Last edited: May 15, 2016
  17. F4lconwings

    F4lconwings Well-Known Member

    Oh yeah that's how it was:
    01000001 = A
    11000001 = a
    Sorry, that is roughly how it works.
     
  18. chugga_fan

    chugga_fan ME 4M storage cell of knowledge, all the time

    10000000 = 128, which is past both, so no, it doesn't work like that at all. Sorry, but this is off-topic.
     
  19. F4lconwings

    F4lconwings Well-Known Member

    Dude, I am not a developer, and I only have limited knowledge about programming, but I know that there is 1 bit that, depending on whether it is 0 or 1, decides if the letter is capital or not. And that is the bit that is being ignored in the code of the list. That's all I wanted to say.
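For reference, in standard ASCII the upper- and lowercase forms of a letter do differ in a single bit, but it is the bit with value 32 (0x20), not the highest bit: "A" is 01000001 (65) and "a" is 01100001 (97). The Python sketch below only demonstrates that arithmetic. The names `CASE_BIT`, `to_binary` and `equal_ignoring_case` are hypothetical, and nothing in this thread confirms that the filter compares letters this way.

```python
# In ASCII, letter case is carried by the bit with value 32.
CASE_BIT = 0x20

def to_binary(ch: str) -> str:
    """Return the 8-bit binary form of a single character's code."""
    return format(ord(ch), "08b")

def equal_ignoring_case(x: str, y: str) -> bool:
    """Compare two single characters, ignoring the case bit for ASCII letters."""
    both_letters = x.isascii() and x.isalpha() and y.isascii() and y.isalpha()
    if not both_letters:
        # The bit trick is only valid for letters; fall back to exact comparison.
        return x == y
    return (ord(x) | CASE_BIT) == (ord(y) | CASE_BIT)
```

The guard matters because the same bit also separates non-letter pairs such as "@" and "`", which should not be treated as equal.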
     
  20. chugga_fan

    chugga_fan ME 4M storage cell of knowledge, all the time

    As I just showed, this is not the case, but can we keep this somewhere else? It's off-topic.
     